From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=DNln=IT=suse.de=rguenther@sourceware.org>
Received: from smtp-out2.suse.de (smtp-out2.suse.de [IPv6:2a07:de40:b251:101:10:150:64:2])
	by sourceware.org (Postfix) with ESMTPS id 8E6963858C74
	for <gcc-patches@gcc.gnu.org>; Tue,  9 Jan 2024 14:19:43 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 8E6963858C74
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=suse.de
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.de
ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 8E6963858C74
Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2a07:de40:b251:101:10:150:64:2
ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1704809986; cv=none;
	b=AxneFk2Q+gQIuaEqiVjWjVv7T7nlz+s8UcvrqPVtj/vBrV9oDFkcmwUux/L6dUEpTdJqjOGIyw428RgVti3sdfoPH3fBTrNJtdseK33hVn8n2X5B2Oy1va8FpGRhEWVZCAs2j+CYmqGB0jTIgUKJWiNiZy7jf+x50lAExMuXtSo=
ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key;
	t=1704809986; c=relaxed/simple;
	bh=Ysdj7bqkJsYd6UalBYefXQzC+/9FdvTjEM2fmUCsTZM=;
	h=DKIM-Signature:DKIM-Signature:DKIM-Signature:DKIM-Signature:Date:
	 From:To:Subject:Message-ID:MIME-Version; b=wOpu5QCcxxZEw2BqebQ5Yh57dMXnPwiI6jg7X9gnNrpP+crziFEdUVMJ10IBc69hBsZqch0uSBlVRDc0uethWf9N65CibgAujKQxG1+9JwBsNaJFGBVsZ5ZuP4DcM7mtLKCmqgo7H4dpFzAcbtgdIuG9uJr1IHNb01vYI6+IB9Y=
ARC-Authentication-Results: i=1; server2.sourceware.org
Received: from [10.168.4.150] (unknown [10.168.4.150])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256)
	(No client certificate requested)
	by smtp-out2.suse.de (Postfix) with ESMTPS id EAB731F807;
	Tue,  9 Jan 2024 14:19:41 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa;
	t=1704809982; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
	 mime-version:mime-version:content-type:content-type:
	 in-reply-to:in-reply-to:references:references;
	bh=Kjj3/DNLtFxCvl/h/ewRP8DpzHndqx/MIFGAtksWe7w=;
	b=OKCNDw82ItZLLaEqjFetq8ywYM66xoMh98D42FVJvYe+Ju6Gojlc62hqiFvW9mw77EM1Dj
	HF9mN6o2sOhzW5VqJI2/PvIKYFRv5T0G0/3uc5W4MSptIMTsWCKvd3WOBfHJD5rGfq5m2l
	6RXotamm9skrq21PgKiGj8Ql7OnJoDo=
DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de;
	s=susede2_ed25519; t=1704809982;
	h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
	 mime-version:mime-version:content-type:content-type:
	 in-reply-to:in-reply-to:references:references;
	bh=Kjj3/DNLtFxCvl/h/ewRP8DpzHndqx/MIFGAtksWe7w=;
	b=ZWzgx9XP3fTrKIOQzP182DBNICT4PiOGxKbFwr5FRxCE7FUlSx58oQ+gdMdbAAr3R4O3fj
	YPzl7Ms7ijwCU8Cw==
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa;
	t=1704809982; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
	 mime-version:mime-version:content-type:content-type:
	 in-reply-to:in-reply-to:references:references;
	bh=Kjj3/DNLtFxCvl/h/ewRP8DpzHndqx/MIFGAtksWe7w=;
	b=OKCNDw82ItZLLaEqjFetq8ywYM66xoMh98D42FVJvYe+Ju6Gojlc62hqiFvW9mw77EM1Dj
	HF9mN6o2sOhzW5VqJI2/PvIKYFRv5T0G0/3uc5W4MSptIMTsWCKvd3WOBfHJD5rGfq5m2l
	6RXotamm9skrq21PgKiGj8Ql7OnJoDo=
DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de;
	s=susede2_ed25519; t=1704809982;
	h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
	 mime-version:mime-version:content-type:content-type:
	 in-reply-to:in-reply-to:references:references;
	bh=Kjj3/DNLtFxCvl/h/ewRP8DpzHndqx/MIFGAtksWe7w=;
	b=ZWzgx9XP3fTrKIOQzP182DBNICT4PiOGxKbFwr5FRxCE7FUlSx58oQ+gdMdbAAr3R4O3fj
	YPzl7Ms7ijwCU8Cw==
Date: Tue, 9 Jan 2024 15:14:42 +0100 (CET)
From: Richard Biener <rguenther@suse.de>
To: Tamar Christina <Tamar.Christina@arm.com>
cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>, nd <nd@arm.com>, 
    "jlaw@ventanamicro.com" <jlaw@ventanamicro.com>
Subject: RE: [PATCH]middle-end: Fix dominators updates when peeling with
 multiple exits [PR113144]
In-Reply-To:  <VI1PR08MB5325C3F3E294A75B4CA322D6FF6A2@VI1PR08MB5325.eurprd08.prod.outlook.com>
Message-ID: <q7n83730-r017-qo07-8339-n96os6s8196p@fhfr.qr>
References: <patch-18113-tamar@arm.com> <q60q1345-6po3-5p08-425p-7687rs03p95r@fhfr.qr>  <VI1PR08MB5325B761117C84559A5E7776FF6A2@VI1PR08MB5325.eurprd08.prod.outlook.com> <8pn44s40-pqs9-s99r-6qs4-1p9432692p7q@fhfr.qr> 
 <VI1PR08MB5325E8D2FF908B2062A09816FF6A2@VI1PR08MB5325.eurprd08.prod.outlook.com> <q8p4pnq5-qnq1-8770-p4qo-51prp680n7r1@fhfr.qr> <4p54n752-rs8o-o37o-7q5r-npp5onp9sqrr@fhfr.qr> 
 <VI1PR08MB5325C3F3E294A75B4CA322D6FF6A2@VI1PR08MB5325.eurprd08.prod.outlook.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
X-Spam-Level: 
Authentication-Results: smtp-out2.suse.de;
	none
X-Spam-Level: 
X-Spam-Score: 0.20
X-Spamd-Result: default: False [0.20 / 50.00];
	 ARC_NA(0.00)[];
	 TO_DN_EQ_ADDR_SOME(0.00)[];
	 FROM_HAS_DN(0.00)[];
	 RCPT_COUNT_THREE(0.00)[4];
	 TO_DN_SOME(0.00)[];
	 TO_MATCH_ENVRCPT_ALL(0.00)[];
	 MIME_GOOD(-0.10)[text/plain];
	 DKIM_SIGNED(0.00)[suse.de:s=susede2_rsa,suse.de:s=susede2_ed25519];
	 NEURAL_HAM_SHORT(-0.20)[-0.998];
	 NEURAL_SPAM_LONG(3.50)[1.000];
	 DBL_BLOCKED_OPENRESOLVER(0.00)[ventanamicro.com:email,arm.com:email,tree-vect-loop-manip.cc:url,suse.de:email];
	 FUZZY_BLOCKED(0.00)[rspamd.com];
	 RCVD_COUNT_ZERO(0.00)[0];
	 FROM_EQ_ENVFROM(0.00)[];
	 MIME_TRACE(0.00)[0:+];
	 BAYES_HAM(-3.00)[100.00%]
X-Spam-Status: No, score=-11.2 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,KAM_SHORT,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

On Tue, 9 Jan 2024, Tamar Christina wrote:

> > -----Original Message-----
> > From: Richard Biener <rguenther@suse.de>
> > Sent: Tuesday, January 9, 2024 1:51 PM
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; jlaw@ventanamicro.com
> > Subject: RE: [PATCH]middle-end: Fix dominators updates when peeling with
> > multiple exits [PR113144]
> > 
> > On Tue, 9 Jan 2024, Richard Biener wrote:
> > 
> > > On Tue, 9 Jan 2024, Tamar Christina wrote:
> > >
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Richard Biener <rguenther@suse.de>
> > > > > Sent: Tuesday, January 9, 2024 12:26 PM
> > > > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; jlaw@ventanamicro.com
> > > > > Subject: RE: [PATCH]middle-end: Fix dominators updates when peeling with
> > > > > multiple exits [PR113144]
> > > > >
> > > > > On Tue, 9 Jan 2024, Tamar Christina wrote:
> > > > >
> > > > > > > This makes it quadratic in the number of vectorized early exit loops
> > > > > > > in a function.  The vectorizer CFG manipulation operates in a local
> > > > > > > enough bubble that programmatic updating of dominators should be
> > > > > > > possible (after all we manage to produce correct SSA form!), the
> > > > > > > proposed change gets us too far off to a point where re-computating
> > > > > > > dominance info is likely cheaper (but no, we shouldn't do this either).
> > > > > > >
> > > > > > > Can you instead give manual updating a try again?  I think
> > > > > > > versioning should produce up-to-date dominator info, it's only
> > > > > > > when you redirect branches during peeling that you'd need
> > > > > > > adjustments - but IIRC we're never introducing new merges?
> > > > > > >
> > > > > > > IIRC we can't wipe dominators during transform since we query them
> > > > > > > during code generation.  We possibly could code generate all
> > > > > > > CFG manipulations of all vectorized loops, recompute all dominators
> > > > > > > and then do code generation of all vectorized loops.
> > > > > > >
> > > > > > > But then we're doing a loop transform and the exits will ultimatively
> > > > > > > end up in the same place, so the CFG and dominator update is bound to
> > > > > > > where the original exits went to.
> > > > > >
> > > > > > Yeah that's a fair point, the issue is specifically with at_exit.  So how about:
> > > > > >
> > > > > > When we peel at_exit we are moving the new loop at the exit of the
> > previous
> > > > > > loop.  This means that the blocks outside the loop dat the previous loop
> > used to
> > > > > > dominate are no longer being dominated by it.
> > > > >
> > > > > Hmm, indeed.  Note this does make the dominator update O(function-size)
> > > > > and when vectorizing multiple loops in a function this becomes
> > > > > quadratic.  That's quite unfortunate so I wonder if we can delay the
> > > > > update to the parts we do not need up-to-date dominators during
> > > > > vectorization (of course it gets fragile with having only partly
> > > > > correct dominators).
> > > >
> > > > Fair, I created https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113290 and will
> > > > tackle it when I add SLP support in GCC 15.
> > > >
> > > > I think the problem is, and the reason we do early dominator correction and
> > > > validation is because the same function is used by loop distribution.
> > > >
> > > > But you're right that during vectorization we perform dominators update twice
> > > > now.
> > >
> > > We're performing it at least once per multi-exit loop that is vectorized,
> > > covering all downstream blocks.
> > 
> > That is, consider sth like
> > 
> > int a[77];
> > 
> > int bar ();
> > void foo ()
> > {
> >   int val;
> > #define LOOP \
> >   val = bar (); \
> >   for (int i = 0; i < 77; ++i) \
> >     { \
> >       if (a[i] == val) \
> >         break; \
> >       a[i]++; \
> >     }
> > #define LOOP10 LOOP LOOP LOOP LOOP LOOP LOOP LOOP LOOP LOOP LOOP
> > #define LOOP100 LOOP10 LOOP10 LOOP10 LOOP10 LOOP10 LOOP10 LOOP10
> > LOOP10
> > LOOP10 LOOP10
> > #define LOOP1000 LOOP100 LOOP100 LOOP100 LOOP100 LOOP100 LOOP100
> > LOOP100
> > LOOP100 LOOP100 LOOP100
> >   LOOP1000
> > }
> > 
> > where on x86_64 with -O3 -msse4.1 -fno-gcse -fno-gcse-after-reload we're
> > currently "fine" (calling iterate_fix_dominators 2000 times).  But
> > with geting all dominated blocks you should get every block to exit
> > "fixed" and maybe get dominance compute to show up in the profile.
> > 
> 
> Yeah, that makes sense.  If we can move loop distribution to a different method,
> then we can just perform dominators update only once for all loops at the end
> of vectorization right?

Maybe.  We'll have to see.  It would be good to have a way to mark
dominator regions as invalid (you could simply disconnect the nodes
from the tree).  What we know is that within loops dominator queries
should be OK.

I still think the invalidated region is bound, for example to blocks
dominating the sons of the scalar loop header, that is, I doubt
you really need to use get_all_dominated_blocks, get_dominated_by
should eventually suffice (or get_dominated_by_region with the
region being the scalar loop body and maybe exit blocks)?

> Thanks,
> Tamar
> 
> > Richard.
> > 
> > 
> > > > So Maybe we should have a parameter to indicate whether dominators should
> > > > be updated?
> > >
> > > I think we should possibly try making loop distribution use another
> > > mechanism for its copying ... there's duplicate_loop_body_to_header_edge
> > > that's also used by loop_version, the core parts doing the new
> > > loop creation could be split out and the detail how the final CFG
> > > is set up be retained in two workers.
> > >
> > > Richard.
> > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > > >
> > > > > > The new dominators however are hard to predict since if the loop has
> > multiple
> > > > > > exits and all the exits are an "early" one then we always execute the scalar
> > > > > > loop.  In this case the scalar loop can completely dominate the new loop.
> > > > > >
> > > > > > If we later have skip_vector then there's an additional skip edge added that
> > > > > > might change the dominators.
> > > > > >
> > > > > > The previous patch would force an update of all blocks reachable from the
> > new
> > > > > > exits.  This one updates *only* blocks that we know the scalar exits
> > dominated.
> > > > > >
> > > > > > For the examples this reduces the blocks to update from 18 to 3.
> > > > > >
> > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> > > > > > and no issues normally and with --enable-checking=release --enable-lto
> > > > > > --with-build-config=bootstrap-O3 --enable-checking=yes,rtl,extra.
> > > > > >
> > > > > > Ok for master?
> > > > >
> > > > > See below.
> > > > >
> > > > > > Thanks,
> > > > > > Tamar
> > > > > >
> > > > > > gcc/ChangeLog:
> > > > > >
> > > > > > 	PR tree-optimization/113144
> > > > > > 	PR tree-optimization/113145
> > > > > > 	* tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg):
> > > > > > 	Update all BB that the original exits dominated.
> > > > > >
> > > > > > gcc/testsuite/ChangeLog:
> > > > > >
> > > > > > 	PR tree-optimization/113144
> > > > > > 	PR tree-optimization/113145
> > > > > > 	* gcc.dg/vect/vect-early-break_94-pr113144.c: New test.
> > > > > >
> > > > > > --- inline copy of patch ---
> > > > > >
> > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_94-pr113144.c
> > > > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_94-pr113144.c
> > > > > > new file mode 100644
> > > > > > index
> > > > >
> > 0000000000000000000000000000000000000000..903fe7be6621e81db6f294
> > > > > 41e4309fa213d027c5
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_94-pr113144.c
> > > > > > @@ -0,0 +1,41 @@
> > > > > > +/* { dg-do compile } */
> > > > > > +/* { dg-add-options vect_early_break } */
> > > > > > +/* { dg-require-effective-target vect_early_break } */
> > > > > > +/* { dg-require-effective-target vect_int } */
> > > > > > +
> > > > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > > > +
> > > > > > +long tar_atol256_max, tar_atol256_size, tar_atosl_min;
> > > > > > +char tar_atol256_s;
> > > > > > +void __errno_location();
> > > > > > +
> > > > > > +
> > > > > > +inline static long tar_atol256(long min) {
> > > > > > +  char c;
> > > > > > +  int sign;
> > > > > > +  c = tar_atol256_s;
> > > > > > +  sign = c;
> > > > > > +  while (tar_atol256_size) {
> > > > > > +    if (c != sign)
> > > > > > +      return sign ? min : tar_atol256_max;
> > > > > > +    c = tar_atol256_size--;
> > > > > > +  }
> > > > > > +  if ((c & 128) != (sign & 128))
> > > > > > +    return sign ? min : tar_atol256_max;
> > > > > > +  return 0;
> > > > > > +}
> > > > > > +
> > > > > > +inline static long tar_atol(long min) {
> > > > > > +  return tar_atol256(min);
> > > > > > +}
> > > > > > +
> > > > > > +long tar_atosl() {
> > > > > > +  long n = tar_atol(-1);
> > > > > > +  if (tar_atosl_min) {
> > > > > > +    __errno_location();
> > > > > > +    return 0;
> > > > > > +  }
> > > > > > +  if (n > 0)
> > > > > > +    return 0;
> > > > > > +  return n;
> > > > > > +}
> > > > > > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> > > > > > index
> > > > >
> > 76d4979c0b3b374dcaacf6825a95a8714114a63b..9bacaa182a3919cae1cb99dfc
> > > > > 5ae4923e1f93376 100644
> > > > > > --- a/gcc/tree-vect-loop-manip.cc
> > > > > > +++ b/gcc/tree-vect-loop-manip.cc
> > > > > > @@ -1719,8 +1719,6 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class
> > loop
> > > > > *loop, edge loop_exit,
> > > > > >  	  /* Now link the alternative exits.  */
> > > > > >  	  if (multiple_exits_p)
> > > > > >  	    {
> > > > > > -	      set_immediate_dominator (CDI_DOMINATORS, new_preheader,
> > > > > > -				       main_loop_exit_block);
> > > > > >  	      for (auto gsi_from = gsi_start_phis (loop->header),
> > > > > >  		   gsi_to = gsi_start_phis (new_preheader);
> > > > > >  		   !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
> > > > > > @@ -1776,7 +1774,14 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class
> > loop
> > > > > *loop, edge loop_exit,
> > > > > >  	{
> > > > > >  	  update_loop = new_loop;
> > > > > >  	  for (edge e : get_loop_exit_edges (loop))
> > > > > > -	    doms.safe_push (e->dest);
> > > > > > +	    {
> > > > > > +	      /* Basic blocks that the old loop dominated are now dominated
> > by
> > > > > > +		 the new loop and so we have to update those.  */
> > > > > > +	      for (auto bb : get_all_dominated_blocks (CDI_DOMINATORS, e-
> > >src))
> > > > > > +		if (!flow_bb_inside_loop_p (loop, bb))
> > > > > > +		  doms.safe_push (bb);
> > > > > > +	      doms.safe_push (e->dest);
> > > > > > +	    }
> > > > >
> > > > > I think you'll get duplicate blocks that way.  Maybe simplify this
> > > > > all by instead doing
> > > > >
> > > > >           auto doms = get_all_dominated_blocks (CDI_DOMINATORS, loop-
> > >header);
> > > > >           for (unsigned i = 0; i < doms.length (); ++i)
> > > > >             if (flow_bb_inside_loop_p (loop, doms[i]))
> > > > >               doms.unordered_remove (i);
> > > > >
> > > > > ?
> > > > >
> > > > > OK with that change, but really we should see to avoid this
> > > > > quadraticness :/  It's probably not too bad right now given we have
> > > > > quite some restrictions on vectorizing loops with multiple exits,
> > > > > but I suggest you try an artificial testcase with the "same"
> > > > > loop repeated N times to see whether dominance compute creeps up
> > > > > in the profile.
> > > > >
> > > > > Richard.
> > > >
> > >
> > >
> > 
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)