public inbox for gcc-patches@gcc.gnu.org
From: Tamar Christina <Tamar.Christina@arm.com>
To: Richard Biener <rguenther@suse.de>
Cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
	nd <nd@arm.com>, "jlaw@ventanamicro.com" <jlaw@ventanamicro.com>
Subject: RE: [PATCH 7/21]middle-end: update IV update code to support early breaks and arbitrary exits
Date: Thu, 16 Nov 2023 12:01:22 +0000
Message-ID: <VI1PR08MB53257E3B7238B6D3E4265807FFB0A@VI1PR08MB5325.eurprd08.prod.outlook.com>
In-Reply-To: <nycvar.YFH.7.77.849.2311161116580.8772@jbgna.fhfr.qr>

> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Thursday, November 16, 2023 11:28 AM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; jlaw@ventanamicro.com
> Subject: RE: [PATCH 7/21]middle-end: update IV update code to support early
> breaks and arbitrary exits
> 
> On Thu, 16 Nov 2023, Tamar Christina wrote:
> 
> > > -----Original Message-----
> > > From: Richard Biener <rguenther@suse.de>
> > > Sent: Thursday, November 16, 2023 10:40 AM
> > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>;
> jlaw@ventanamicro.com
> > > Subject: RE: [PATCH 7/21]middle-end: update IV update code to
> > > support early breaks and arbitrary exits
> > >
> > > On Wed, 15 Nov 2023, Tamar Christina wrote:
> > >
> > > > > -----Original Message-----
> > > > > From: Richard Biener <rguenther@suse.de>
> > > > > Sent: Wednesday, November 15, 2023 1:23 PM
> > > > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>;
> > > jlaw@ventanamicro.com
> > > > > Subject: RE: [PATCH 7/21]middle-end: update IV update code to
> > > > > support early breaks and arbitrary exits
> > > > >
> > > > > On Wed, 15 Nov 2023, Tamar Christina wrote:
> > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Richard Biener <rguenther@suse.de>
> > > > > > > Sent: Wednesday, November 15, 2023 1:01 PM
> > > > > > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > > > > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>;
> > > > > jlaw@ventanamicro.com
> > > > > > > Subject: RE: [PATCH 7/21]middle-end: update IV update code
> > > > > > > to support early breaks and arbitrary exits
> > > > > > >
> > > > > > > On Wed, 15 Nov 2023, Tamar Christina wrote:
> > > > > > >
> > > > > > > > Patch updated to latest trunk:
> > > > > > > >
> > > > > > > > Hi All,
> > > > > > > >
> > > > > > > > > This changes the PHI node updates to support early breaks.
> > > > > > > > > It has to support both the case where the loop's exit
> > > > > > > > > matches the normal loop exit and one where the early exit is
> > > > > > > > > "inverted", i.e. it's an early exit edge.
> > > > > > > >
> > > > > > > > > In the latter case we must always restart the loop for VF iterations.
> > > > > > > > > For an early exit the reason is obvious, but there are
> > > > > > > > > cases where the "normal" exit is located before the early
> > > > > > > > > one.  This exit then does a check on ivtmp resulting in us
> > > > > > > > > leaving the loop since it thinks we're done.
> > > > > > > > >
> > > > > > > > > In these cases we may still have side-effects to perform so
> > > > > > > > > we also go to the scalar loop.
> > > > > > > > >
> > > > > > > > > For the "normal" exit niters has already been adjusted for
> > > > > > > > > peeling, for the early exits we must find out how many
> > > > > > > > > iterations we actually did.  So we have to recalculate the
> > > > > > > > > new position for each exit.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Tamar
> > > > > > > >
> > > > > > > > gcc/ChangeLog:
> > > > > > > >
> > > > > > > > 	* tree-vect-loop-manip.cc (vect_set_loop_condition_normal):
> > > > > > > > > 	Hide unused.
> > > > > > > > 	(vect_update_ivs_after_vectorizer): Support early break.
> > > > > > > > 	(vect_do_peeling): Use it.
> > > > > > > >
> > > > > > > > --- inline copy of patch ---
> > > > > > > >
> > > > > > > > diff --git a/gcc/tree-vect-loop-manip.cc
> > > > > > > > b/gcc/tree-vect-loop-manip.cc index
> > > > > > > >
> > > > > > >
> > > > >
> > >
> d3fa8699271c4d7f404d648a38a95beabeabc99a..e1d210ab4617c894dab3
> > > > > > > d2654cf1
> > > > > > > > c842baac58f5 100644
> > > > > > > > --- a/gcc/tree-vect-loop-manip.cc
> > > > > > > > +++ b/gcc/tree-vect-loop-manip.cc
> > > > > > > > @@ -1200,7 +1200,7 @@
> > > > > > > > vect_set_loop_condition_partial_vectors_avx512
> > > > > > > (class loop *loop,
> > > > > > > >     loop handles exactly VF scalars per iteration.  */
> > > > > > > >
> > > > > > > >  static gcond *
> > > > > > > > -vect_set_loop_condition_normal (loop_vec_info loop_vinfo,
> > > > > > > > edge exit_edge,
> > > > > > > > +vect_set_loop_condition_normal (loop_vec_info /*
> > > > > > > > +loop_vinfo */, edge exit_edge,
> > > > > > > >  				class loop *loop, tree niters, tree step,
> > > > > > > >  				tree final_iv, bool niters_maybe_zero,
> > > > > > > >  				gimple_stmt_iterator loop_cond_gsi)
> > > @@ -
> > > > > > > 1412,7 +1412,7 @@
> > > > > > > > vect_set_loop_condition (class loop *loop, edge loop_e,
> > > > > > > > loop_vec_info
> > > > > > > loop_vinfo
> > > > > > > >     When this happens we need to flip the understanding of
> > > > > > > > main and
> > > > > other
> > > > > > > >     exits by peeling and IV updates.  */
> > > > > > > >
> > > > > > > > -bool inline
> > > > > > > > +bool
> > > > > > > >  vect_is_loop_exit_latch_pred (edge loop_exit, class loop *loop)  {
> > > > > > > >    return single_pred (loop->latch) == loop_exit->src; @@
> > > > > > > > -2142,6
> > > > > > > > +2142,7 @@ vect_can_advance_ivs_p (loop_vec_info
> > > > > > > > +loop_vinfo)
> > > > > > > >       Input:
> > > > > > > >       - LOOP - a loop that is going to be vectorized. The
> > > > > > > > last few
> > > iterations
> > > > > > > >                of LOOP were peeled.
> > > > > > > > +     - VF   - The chosen vectorization factor for LOOP.
> > > > > > > >       - NITERS - the number of iterations that LOOP executes (before
> it is
> > > > > > > >                  vectorized). i.e, the number of times the
> > > > > > > > ivs should be
> > > bumped.
> > > > > > > >       - UPDATE_E - a successor edge of LOOP->exit that is
> > > > > > > > on the
> > > > > > > > (only) path
> > > > > > >
> > > > > > > > the comment on this is now a bit misleading, can you try to
> > > > > > > > update it and/or move the comment bits to the docs on
> > > > > > > > EARLY_EXIT?
> > > > > > >
> > > > > > > > @@ -2152,6 +2153,9 @@ vect_can_advance_ivs_p
> > > > > > > > (loop_vec_info
> > > > > > > loop_vinfo)
> > > > > > > >                    The phi args associated with the edge UPDATE_E in the
> bb
> > > > > > > >                    UPDATE_E->dest are updated accordingly.
> > > > > > > >
> > > > > > > > +     - restart_loop - Indicates whether the scalar loop
> > > > > > > > + needs to restart the
> > > > > > >
> > > > > > > params are ALL_CAPS
> > > > > > >
> > > > > > > > +		      iteration count where the vector loop began.
> > > > > > > > +
> > > > > > > >       Assumption 1: Like the rest of the vectorizer, this function
> assumes
> > > > > > > >       a single loop exit that has a single predecessor.
> > > > > > > >
> > > > > > > > @@ -2169,18 +2173,22 @@ vect_can_advance_ivs_p
> > > > > > > > (loop_vec_info
> > > > > > > loop_vinfo)
> > > > > > > >   */
> > > > > > > >
> > > > > > > >  static void
> > > > > > > > -vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
> > > > > > > > -				  tree niters, edge update_e)
> > > > > > > > +vect_update_ivs_after_vectorizer (loop_vec_info
> > > > > > > > +loop_vinfo,
> > > > > > > > +poly_uint64 vf,
> > > > > > >
> > > > > > > LOOP_VINFO_VECT_FACTOR?
> > > > > > >
> > > > > > > > +				  tree niters, edge update_e, bool
> > > > > > > restart_loop)
> > > > > > >
> > > > > > > > I think 'bool early_exit' is better here?  I wonder, if we have an
> > > > > > > > "early" exit after the main exit, whether we can be sure there are no
> > > > > > > > side-effects to re-execute and could avoid this restarting?
> > > > > >
> > > > > > > Side effects yes, but the actual check may not have been performed yet.
> > > > > > > If you remember
> > > > > > > https://gist.github.com/Mistuke/66f14fe5c1be32b91ce149bd9b8bb35f
> > > > > > There in the clz loop through the "main" exit you still have
> > > > > > to see if that iteration did not contain the entry.  This is
> > > > > > because the loop counter is incremented before you iterate.
> > > > > >
> > > > > > >
> > > > > > > >  {
> > > > > > > >    gphi_iterator gsi, gsi1;
> > > > > > > >    class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > > > > > > >    basic_block update_bb = update_e->dest;
> > > > > > > > -
> > > > > > > > -  basic_block exit_bb = LOOP_VINFO_IV_EXIT
> > > > > > > > (loop_vinfo)->dest;
> > > > > > > > -
> > > > > > > > -  /* Make sure there exists a single-predecessor exit bb:
> > > > > > > > */
> > > > > > > > -  gcc_assert (single_pred_p (exit_bb));
> > > > > > > > -  gcc_assert (single_succ_edge (exit_bb) == update_e);
> > > > > > > > +  bool inversed_iv
> > > > > > > > +	= !vect_is_loop_exit_latch_pred (LOOP_VINFO_IV_EXIT
> > > (loop_vinfo),
> > > > > > > > +					 LOOP_VINFO_LOOP
> > > (loop_vinfo));
> > > > > > > > +  bool needs_interm_block = LOOP_VINFO_EARLY_BREAKS
> > > (loop_vinfo)
> > > > > > > > +			    && flow_bb_inside_loop_p (loop,
> > > update_e->src);
> > > > > > > > +  edge loop_e = LOOP_VINFO_IV_EXIT (loop_vinfo);  gcond
> > > > > > > > + *cond = get_loop_exit_condition (loop_e);  basic_block
> > > > > > > > + exit_bb = loop_e->dest;  basic_block iv_block = NULL;
> > > > > > > > + gimple_stmt_iterator last_gsi = gsi_last_bb (exit_bb);
> > > > > > > >
> > > > > > > >    for (gsi = gsi_start_phis (loop->header), gsi1 =
> > > > > > > > gsi_start_phis
> > > > > (update_bb);
> > > > > > > >         !gsi_end_p (gsi) && !gsi_end_p (gsi1); @@ -2190,7
> > > > > > > > +2198,6 @@ vect_update_ivs_after_vectorizer (loop_vec_info
> > > loop_vinfo,
> > > > > > > >        tree step_expr, off;
> > > > > > > >        tree type;
> > > > > > > >        tree var, ni, ni_name;
> > > > > > > > -      gimple_stmt_iterator last_gsi;
> > > > > > > >
> > > > > > > >        gphi *phi = gsi.phi ();
> > > > > > > >        gphi *phi1 = gsi1.phi (); @@ -2222,11 +2229,52 @@
> > > > > > > > vect_update_ivs_after_vectorizer
> > > > > > > (loop_vec_info loop_vinfo,
> > > > > > > >        enum vect_induction_op_type induction_type
> > > > > > > >  	= STMT_VINFO_LOOP_PHI_EVOLUTION_TYPE (phi_info);
> > > > > > > >
> > > > > > > > -      if (induction_type == vect_step_op_add)
> > > > > > > > +      tree iv_var = PHI_ARG_DEF_FROM_EDGE (phi,
> > > > > > > > + loop_latch_edge
> > > > > (loop));
> > > > > > > > +      /* create_iv always places it on the LHS.  Alternatively we can
> set a
> > > > > > > > +	 property during create_iv to identify it.  */
> > > > > > > > +      bool ivtemp = gimple_cond_lhs (cond) == iv_var;
> > > > > > > > +      if (restart_loop && ivtemp)
> > > > > > > >  	{
> > > > > > > > +	  type = TREE_TYPE (gimple_phi_result (phi));
> > > > > > > > +	  ni = build_int_cst (type, vf);
> > > > > > > > +	  if (inversed_iv)
> > > > > > > > +	    ni = fold_build2 (MINUS_EXPR, type, ni,
> > > > > > > > +			      fold_convert (type, step_expr));
> > > > > > > > +	}
> > > > > > > > +      else if (induction_type == vect_step_op_add)
> > > > > > > > +	{
> > > > > > > > +
> > > > > > > >  	  tree stype = TREE_TYPE (step_expr);
> > > > > > > > -	  off = fold_build2 (MULT_EXPR, stype,
> > > > > > > > -			     fold_convert (stype, niters), step_expr);
> > > > > > > > +
> > > > > > > > +	  /* Early exits always use last iter value not niters. */
> > > > > > > > +	  if (restart_loop)
> > > > > > > > +	    {
> > > > > > > > +	      /* Live statements in the non-main exit shouldn't
> > > > > > > > +be
> > > adjusted.  We
> > > > > > > > +		 normally didn't have this problem with a single exit
> > > > > > > > +as
> > > live
> > > > > > > > +		 values would be in the exit block.  However when
> > > dealing with
> > > > > > > > +		 multiple exits all exits are redirected to the merge
> > > block
> > > > > > > > +		 and we restart the iteration.  */
> > > > > > >
> > > > > > > > Hmm, I fail to see how this works - we're either using the
> > > > > > > > value to continue the induction or not, independent of
> > > > > > > > STMT_VINFO_LIVE_P.
> > > > > >
> > > > > > > That becomes clear in the patch to update live reductions.
> > > > > > > Essentially any live reductions inside an alternative exit
> > > > > > > will reduce to the first element rather than the last and use
> > > > > > > that as the seed for the scalar loop.
> > > > >
> > > > > > Hum.  Reductions are vectorized as N separate reductions.  I
> > > > > > don't think you can simply change the reduction between the lanes to
> > > > > > "skip" part of the vector iteration.  But you can use the value of the
> > > > > > vector from before the vector iteration - the loop header PHI
> > > > > > result, and fully reduce that to get at the proper value.
> > > >
> > > > > That's what it's supposed to be doing though.  The reason live
> > > > > operations are skipped here is that if we don't, we'll re-adjust
> > > > > the IV even though the value will already be correct after vectorization.
> > > >
> > > > Remember that this code only gets so far for IV PHI nodes.
> > > >
> > > > The loop phi header result itself can be live, i.e. see testcases
> > > > vect-early-break_70.c to vect-early-break_75.c
> > > >
> > > > you have i_15 = PHI <i_14 (6), 1(2)>
> > > >
> > > > > we use i_15 in the early exit. This should not be adjusted because
> > > > > when it's vectorized the value at 0[lane 0] is already correct.
> > > > > This is why for any PHI inside the early exits it uses the value
> > > > > 0[0] instead of N[lane_max].
> > > >
> > > > Perhaps I'm missing something here?
> > >
> > > OK, so I refreshed my mind of what vect_update_ivs_after_vectorizer
> > > does.
> > >
> > > I still do not understand the (complexity of the) patch.  Basically
> > > the function computes the new value of the IV "from scratch" based
> > > on the number of scalar iterations of the vector loop, the 'niter'
> > > argument.  I would have expected that for the early exits we either
> > > pass in a different 'niter' or alternatively a 'niter_adjustment'.
> >
> > But for an early exit there's no static value for adjusted niter,
> > since you don't know which iteration you exited from. Unlike the
> > normal exit when you know if you get there you've done all possible
> > iterations.
> >
> > So you must compute the scalar iteration count on the exit itself.
> 
> ?  You do not need the actual scalar iteration you exited (you don't compute
> that either), you need the scalar iteration the vector iteration started with
> when it exited prematurely and that's readily available?

For a normal exit yes, but not for an early exit, no?  niters_vector_mult_vf is
only valid for the main exit.

There's the unadjusted scalar count, which is what the code uses to adjust the
IV to the final count.  Unless I'm missing something?
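
To make the two cases concrete, here is a minimal scalar model of the
situation.  It is only a sketch and not GCC code: find_scalar, find_blocked
and the `early` flag are hypothetical names made up for illustration, and the
loop is assumed to be a simple search with a constant VF.  Only
niters_vector_mult_vf and VF correspond to names used in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar reference: index of the first element equal to `key`, or n.
std::size_t find_scalar(const std::vector<int>& a, int key) {
    for (std::size_t i = 0; i < a.size(); ++i)
        if (a[i] == key)
            return i;
    return a.size();
}

// Hypothetical model of a vectorized early-break loop with factor `vf`,
// followed by the scalar epilog.
std::size_t find_blocked(const std::vector<int>& a, int key, std::size_t vf) {
    assert(vf > 0);
    const std::size_t n = a.size();
    // Iteration count covered by the vector loop on the main exit.
    const std::size_t niters_vector_mult_vf = n - n % vf;

    std::size_t i = 0;   // scalar IV at the start of the current vector iteration
    bool early = false;  // set when an early exit is taken
    for (; i < niters_vector_mult_vf; i += vf) {
        bool any = false;
        for (std::size_t lane = 0; lane < vf; ++lane)
            any |= (a[i + lane] == key);
        if (any) {       // early exit: i still holds the block start
            early = true;
            break;
        }
    }

    // Main exit: i == niters_vector_mult_vf, known without looking at the data.
    // Early exit: i is the start of the exiting vector iteration, known only at
    // run time, and at most vf scalar iterations remain before the break fires.
    const std::size_t limit = early ? i + vf : n;
    for (; i < limit; ++i)
        if (a[i] == key)
            return i;
    return n;
}
```

In this model the epilog start value on the main exit is always
niters_vector_mult_vf, while on the early exit it depends on which vector
iteration contained the match, which is why a single precomputed niters does
not cover both.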

> 
> > >
> > > It seems your change handles different kinds of inductions differently.
> > > Specifically
> > >
> > >       bool ivtemp = gimple_cond_lhs (cond) == iv_var;
> > >       if (restart_loop && ivtemp)
> > >         {
> > >           type = TREE_TYPE (gimple_phi_result (phi));
> > >           ni = build_int_cst (type, vf);
> > >           if (inversed_iv)
> > >             ni = fold_build2 (MINUS_EXPR, type, ni,
> > >                               fold_convert (type, step_expr));
> > >         }
> > >
> > > it looks like for the exit test IV we use either 'VF' or 'VF - step'
> > > as the new value.  That seems to be very odd special casing for
> > > unknown reasons.  And while you adjust vect_step_op_add, you don't
> > > adjust vect_peel_nonlinear_iv_init (maybe not supported - better assert
> > > here).
> >
> > The VF case is for a normal "non-inverted" loop, where if you take an
> > early exit you know that you have to do at most VF iterations.  The VF
> > - step is to account for the inverted loop control flow where you exit
> > after adjusting the IV already by + step.
> 
> But doesn't that assume the IV counts from niter to zero?  I don't see this
> special case is actually necessary, no?
> 

I needed it because otherwise the scalar loop iterates one iteration too few,
so I got a miscompile with the inverted loop stuff.  I'll look at it again;
perhaps it can be solved differently.

> >
> > Peeling doesn't matter here, since you know you were able to do a
> > vector iteration so it's safe to do VF iterations.  So having peeled
> > doesn't affect the remaining iters count.
> >
> > >
> > > Also the vect_step_op_add case will keep the original scalar IV live
> > > even when it is a vectorized induction.  The code recomputing the
> > > value from scratch avoids this.
> > >
> > >       /* For non-main exit create an intermediate edge to get any updated iv
> > >          calculations.  */
> > >       if (needs_interm_block
> > >           && !iv_block
> > >           && (!gimple_seq_empty_p (stmts)
> > >               || !gimple_seq_empty_p (new_stmts)))
> > >         {
> > >           iv_block = split_edge (update_e);
> > >           update_e = single_succ_edge (update_e->dest);
> > >           last_gsi = gsi_last_bb (iv_block);
> > >         }
> > >
> > > this is also odd, can we adjust the API instead?  I suppose this is
> > > because your computation uses the original loop IV, if you based the
> > > computation off the initial value only this might not be necessary?
> >
> > No, on the main exit the code updates the value in the loop header and
> > puts the calculation in the merge block.  This works because it only
> > needs to consume PHI nodes in the merge block and things like niters are
> > adjusted in the guard block.
> >
> > For an early exit, we don't have a guard block, only the merge block.
> > We have to update the PHI nodes in that block, but can't do so since
> > you can't produce a value and consume it in a PHI node in the same BB.
> > So we need to create the block to put the values in for use in the
> > merge block, because there's no "guard" block for early exits.
> 
> ?  then compute niters in that block as well.

We can't, since it won't be reachable through the right edge.  What we can
do, if you want, is slightly change peeling.  We currently peel as:

  \        \             /
  E1     E2        Normal exit
    \       |          |
       \    |          Guard
          \ |          |
         Merge block
                  |
             Pre Header

If we instead peel as:


  \        \             /
  E1     E2        Normal exit
    \       |          |
       Exit join   Guard
          \ |          |
         Merge block
                  |
             Pre Header

We can use the exit join block.  This would also mean vect_update_ivs_after_vectorizer
doesn't need to iterate over all exits and only really needs to adjust the phi nodes
coming out of the exit join and guard block.
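
As a rough illustration of the difference between the two layouts, the sketch
below models both diagrams as plain edge lists and counts the predecessors of
the merge block.  It is not vectorizer code: preds, current_peeling and
proposed_peeling are hypothetical helpers, and "E1"/"E2" stand for the source
blocks of the two early exit edges, which is an assumption made only so the
diagrams can be written down as block-to-block edges.

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

// One CFG edge: (source block, destination block).
using Edge = std::pair<std::string, std::string>;

// Collect the predecessors of `block`.  Each predecessor corresponds to one
// incoming edge, i.e. one PHI argument that has to be set up in `block`.
std::set<std::string> preds(const std::vector<Edge>& cfg,
                            const std::string& block) {
    std::set<std::string> out;
    for (const Edge& e : cfg)
        if (e.second == block)
            out.insert(e.first);
    return out;
}

// First diagram: every early exit goes straight to the merge block.
std::vector<Edge> current_peeling() {
    return {{"E1", "Merge"},
            {"E2", "Merge"},
            {"Normal exit", "Guard"},
            {"Guard", "Merge"},
            {"Merge", "Pre Header"}};
}

// Second diagram: the early exits are first combined in an exit join block.
std::vector<Edge> proposed_peeling() {
    return {{"E1", "Exit join"},
            {"E2", "Exit join"},
            {"Normal exit", "Guard"},
            {"Exit join", "Merge"},
            {"Guard", "Merge"},
            {"Merge", "Pre Header"}};
}
```

With the second layout the merge block has exactly two predecessors (exit join
and guard) regardless of how many early exits the loop has, and the exit join
block gives a place to put the recomputed IV values before they are consumed
by the PHI nodes in the merge block.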

Does this work for you?

Thanks,
Tamar
> 
> > The API can be adjusted by always creating the empty block during
> > peeling.
> > That would prevent us from having to do anything special here.  Would
> > that work better?  Or I can do it in the loop that iterates over the
> > exits, before the call to vect_update_ivs_after_vectorizer, which I think
> > might be more consistent.
> >
> > >
> > > That said, I wonder why we cannot simply pass in an adjusted niter
> > > which would be niters_vector_mult_vf - vf and be done with that?
> > >
> >
> > We can of course not have this and recompute it from niters itself,
> > however this does affect the epilog code layout.  Particularly knowing
> > the static number of iterations left causes it to usually unroll the
> > loop and share some of the computations, i.e. the scalar code is often more
> > efficient.
> >
> > The computation would be niters_vector_mult_vf - iters_done * vf,
> > since the value put here is the remaining iteration count.  It's static for
> > early exits.
> 
> Well, it might be "static" in that it doesn't really matter what you use for the
> epilog main IV initial value as long as you are sure you're not going to take that
> exit as you are sure we're going to take one of the early exits.  So yeah, the
> special code is probably OK, but it needs a better comment and as said the
> structure of vect_update_ivs_after_vectorizer is a bit hard to follow now.
> 
> As said an important part for optimization is to not keep the scalar IVs live in
> the vector loop.
> 
> > But can do whatever you prefer here.  Let me know what you prefer for the
> > above.
> >
> > Thanks,
> > Tamar
> >
> > > Thanks,
> > > Richard.
> > >
> > >
> > > > Regards,
> > > > Tamar
> > > > >
> > > > > > It has to do this since you have to perform the side effects
> > > > > > for the non-matching elements still.
> > > > > >
> > > > > > Regards,
> > > > > > Tamar
> > > > > >
> > > > > > >
> > > > > > > > +	      if (STMT_VINFO_LIVE_P (phi_info))
> > > > > > > > +		continue;
> > > > > > > > +
> > > > > > > > +	      /* For early break the final loop IV is:
> > > > > > > > +		 init + (final - init) * vf which takes into account peeling
> > > > > > > > +		 values and non-single steps.  The main exit can use
> > > niters
> > > > > > > > +		 since if you exit from the main exit you've done all
> > > vector
> > > > > > > > +		 iterations.  For an early exit we don't know when we
> > > exit
> > > > > > > > +so
> > > > > > > we
> > > > > > > > +		 must re-calculate this on the exit.  */
> > > > > > > > +	      tree start_expr = gimple_phi_result (phi);
> > > > > > > > +	      off = fold_build2 (MINUS_EXPR, stype,
> > > > > > > > +				 fold_convert (stype, start_expr),
> > > > > > > > +				 fold_convert (stype, init_expr));
> > > > > > > > +	      /* Now adjust for VF to get the final iteration value.  */
> > > > > > > > +	      off = fold_build2 (MULT_EXPR, stype, off,
> > > > > > > > +				 build_int_cst (stype, vf));
> > > > > > > > +	    }
> > > > > > > > +	  else
> > > > > > > > +	    off = fold_build2 (MULT_EXPR, stype,
> > > > > > > > +			       fold_convert (stype, niters), step_expr);
> > > > > > > > +
> > > > > > > >  	  if (POINTER_TYPE_P (type))
> > > > > > > >  	    ni = fold_build_pointer_plus (init_expr, off);
> > > > > > > >  	  else
> > > > > > > > @@ -2238,6 +2286,8 @@ vect_update_ivs_after_vectorizer
> > > > > > > > (loop_vec_info
> > > > > > > loop_vinfo,
> > > > > > > >        /* Don't bother call vect_peel_nonlinear_iv_init.  */
> > > > > > > >        else if (induction_type == vect_step_op_neg)
> > > > > > > >  	ni = init_expr;
> > > > > > > > +      else if (restart_loop)
> > > > > > > > +	continue;
> > > > > > >
> > > > > > > This looks all a bit complicated - why wouldn't we simply
> > > > > > > always use the PHI result when 'restart_loop'?  Isn't that
> > > > > > > the correct old start value in
> > > > > all cases?
> > > > > > >
> > > > > > > >        else
> > > > > > > >  	ni = vect_peel_nonlinear_iv_init (&stmts, init_expr,
> > > > > > > >  					  niters, step_expr, @@ -
> 2245,9 +2295,20 @@
> > > > > > > > vect_update_ivs_after_vectorizer
> > > > > > > (loop_vec_info
> > > > > > > > loop_vinfo,
> > > > > > > >
> > > > > > > >        var = create_tmp_var (type, "tmp");
> > > > > > > >
> > > > > > > > -      last_gsi = gsi_last_bb (exit_bb);
> > > > > > > >        gimple_seq new_stmts = NULL;
> > > > > > > >        ni_name = force_gimple_operand (ni, &new_stmts,
> > > > > > > > false, var);
> > > > > > > > +
> > > > > > > > +      /* For non-main exit create an intermediat edge to
> > > > > > > > + get any
> > > updated iv
> > > > > > > > +	 calculations.  */
> > > > > > > > +      if (needs_interm_block
> > > > > > > > +	  && !iv_block
> > > > > > > > +	  && (!gimple_seq_empty_p (stmts) || !gimple_seq_empty_p
> > > > > > > (new_stmts)))
> > > > > > > > +	{
> > > > > > > > +	  iv_block = split_edge (update_e);
> > > > > > > > +	  update_e = single_succ_edge (update_e->dest);
> > > > > > > > +	  last_gsi = gsi_last_bb (iv_block);
> > > > > > > > +	}
> > > > > > > > +
> > > > > > > >        /* Exit_bb shouldn't be empty.  */
> > > > > > > >        if (!gsi_end_p (last_gsi))
> > > > > > > >  	{
> > > > > > > > @@ -3342,8 +3403,26 @@ vect_do_peeling (loop_vec_info
> > > > > > > > loop_vinfo, tree
> > > > > > > niters, tree nitersm1,
> > > > > > > >  	 niters_vector_mult_vf steps.  */
> > > > > > > >        gcc_checking_assert (vect_can_advance_ivs_p (loop_vinfo));
> > > > > > > >        update_e = skip_vector ? e : loop_preheader_edge (epilog);
> > > > > > > > -      vect_update_ivs_after_vectorizer (loop_vinfo,
> > > niters_vector_mult_vf,
> > > > > > > > -					update_e);
> > > > > > > > +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > > > > +	update_e = single_succ_edge (e->dest);
> > > > > > > > +      bool inversed_iv
> > > > > > > > +	= !vect_is_loop_exit_latch_pred (LOOP_VINFO_IV_EXIT
> > > (loop_vinfo),
> > > > > > > > +					 LOOP_VINFO_LOOP
> > > (loop_vinfo));
> > > > > > >
> > > > > > > > You are computing this here and in
> > > > > > > > vect_update_ivs_after_vectorizer?
> > > > > > >
> > > > > > > > +
> > > > > > > > +      /* Update the main exit first.  */
> > > > > > > > +      vect_update_ivs_after_vectorizer (loop_vinfo, vf,
> > > > > niters_vector_mult_vf,
> > > > > > > > +					update_e, inversed_iv);
> > > > > > > > +
> > > > > > > > +      /* And then update the early exits.  */
> > > > > > > > +      for (auto exit : get_loop_exit_edges (loop))
> > > > > > > > +	{
> > > > > > > > +	  if (exit == LOOP_VINFO_IV_EXIT (loop_vinfo))
> > > > > > > > +	    continue;
> > > > > > > > +
> > > > > > > > +	  vect_update_ivs_after_vectorizer (loop_vinfo, vf,
> > > > > > > > +					    niters_vector_mult_vf,
> > > > > > > > +					    exit, true);
> > > > > > >
> > > > > > > ... why does the same not work here?  Wouldn't the proper
> > > > > > > condition be !dominated_by_p (CDI_DOMINATORS, exit->src,
> > > > > > > LOOP_VINFO_IV_EXIT
> > > > > > > (loop_vinfo)->src) or similar?  That is, whether the exit is
> > > > > > > at or after the main IV exit?  (consider having two)
> > > > > > >
> > > > > > > > +	}
> > > > > > > >
> > > > > > > >        if (skip_epilog)
> > > > > > > >  	{
> > > > > > > >
> > > > > >
> > > > >
> > > > > --
> > > > > Richard Biener <rguenther@suse.de> SUSE Software Solutions
> > > > > Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > > > > Nuernberg)
> > > >
> > >
> > > --
> > > Richard Biener <rguenther@suse.de>
> > > SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461
> > > Nuernberg, Germany;
> > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > > Nuernberg)
> >
> 
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> Nuernberg)

2023-11-16 13:35                         ` Richard Biener
2023-11-16 14:14                           ` Tamar Christina
2023-11-16 14:17                             ` Richard Biener
2023-11-16 15:19                               ` Tamar Christina
2023-11-16 18:41                                 ` Tamar Christina
2023-11-17 10:40                                   ` Tamar Christina
2023-11-17 12:13                                     ` Richard Biener
2023-11-20 21:54                                       ` Tamar Christina
2023-11-24 10:18                                         ` Tamar Christina
2023-11-24 12:41                                           ` Richard Biener
2023-11-06  7:39 ` [PATCH 8/21]middle-end: update vectorizable_live_reduction with support for multiple exits and different exits Tamar Christina
2023-11-15  0:05   ` Tamar Christina
2023-11-15 13:41     ` Richard Biener
2023-11-15 14:26       ` Tamar Christina
2023-11-16 11:16         ` Richard Biener
2023-11-20 21:57           ` Tamar Christina
2023-11-24 10:20             ` Tamar Christina
2023-11-24 13:23               ` Richard Biener
2023-11-27 22:47                 ` Tamar Christina
2023-11-29 13:28                   ` Richard Biener
2023-11-29 21:22                     ` Tamar Christina
2023-11-30 13:23                       ` Richard Biener
2023-12-06  4:21                         ` Tamar Christina
2023-12-06  9:33                           ` Richard Biener
2023-11-06  7:39 ` [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code Tamar Christina
2023-11-27 22:49   ` Tamar Christina
2023-11-29 13:50     ` Richard Biener
2023-12-06  4:37       ` Tamar Christina
2023-12-06  9:37         ` Richard Biener
2023-12-08  8:58           ` Tamar Christina
2023-12-08 10:28             ` Richard Biener
2023-12-08 13:45               ` Tamar Christina
2023-12-08 13:59                 ` Richard Biener
2023-12-08 15:01                   ` Tamar Christina
2023-12-11  7:09                   ` Tamar Christina
2023-12-11  9:36                     ` Richard Biener
2023-12-11 23:12                       ` Tamar Christina
2023-12-12 10:10                         ` Richard Biener
2023-12-12 10:27                           ` Tamar Christina
2023-12-12 10:59                           ` Richard Sandiford
2023-12-12 11:30                             ` Richard Biener
2023-12-13 14:13                               ` Tamar Christina
2023-12-14 13:12                                 ` Richard Biener
2023-12-14 18:44                                   ` Tamar Christina
2023-11-06  7:39 ` [PATCH 10/21]middle-end: implement relevancy analysis support for control flow Tamar Christina
2023-11-27 22:49   ` Tamar Christina
2023-11-29 14:47     ` Richard Biener
2023-12-06  4:10       ` Tamar Christina
2023-12-06  9:44         ` Richard Biener
2023-11-06  7:40 ` [PATCH 11/21]middle-end: wire through peeling changes and dominator updates after guard edge split Tamar Christina
2023-11-06  7:40 ` [PATCH 12/21]middle-end: Add remaining changes to peeling and vectorizer to support early breaks Tamar Christina
2023-11-27 22:48   ` Tamar Christina
2023-12-06  8:31   ` Richard Biener
2023-12-06  9:10     ` Tamar Christina
2023-12-06  9:27       ` Richard Biener
2023-11-06  7:40 ` [PATCH 13/21]middle-end: Update loop form analysis to support early break Tamar Christina
2023-11-27 22:48   ` Tamar Christina
2023-12-06  4:00     ` Tamar Christina
2023-12-06  8:18   ` Richard Biener
2023-12-06  8:52     ` Tamar Christina
2023-12-06  9:15       ` Richard Biener
2023-12-06  9:29         ` Tamar Christina
2023-11-06  7:41 ` [PATCH 14/21]middle-end: Change loop analysis from looking at at number of BB to actual cfg Tamar Christina
2023-11-06 14:44   ` Richard Biener
2023-11-06  7:41 ` [PATCH 15/21]middle-end: [RFC] conditionally support forcing final edge for debugging Tamar Christina
2023-12-09 10:38   ` Richard Sandiford
2023-12-11  7:38     ` Richard Biener
2023-12-11  8:49       ` Tamar Christina
2023-12-11  9:00         ` Richard Biener
2023-11-06  7:41 ` [PATCH 16/21]middle-end testsuite: un-xfail TSVC loops that check for exit control flow vectorization Tamar Christina
2023-11-06  7:41 ` [PATCH 17/21]AArch64: Add implementation for vector cbranch for Advanced SIMD Tamar Christina
2023-11-28 16:37   ` Richard Sandiford
2023-11-28 17:55     ` Richard Sandiford
2023-12-06 16:25       ` Tamar Christina
2023-12-07  0:56         ` Richard Sandiford
2023-12-14 18:40           ` Tamar Christina
2023-12-14 19:34             ` Richard Sandiford
2023-11-06  7:42 ` [PATCH 18/21]AArch64: Add optimization for vector != cbranch fed into compare with 0 " Tamar Christina
2023-11-06  7:42 ` [PATCH 19/21]AArch64: Add optimization for vector cbranch combining SVE and " Tamar Christina
2023-11-06  7:42 ` [PATCH 20/21]Arm: Add Advanced SIMD cbranch implementation Tamar Christina
2023-11-27 12:48   ` Kyrylo Tkachov
2023-11-06  7:43 ` [PATCH 21/21]Arm: Add MVE " Tamar Christina
2023-11-27 12:47   ` Kyrylo Tkachov
2023-11-06 14:25 ` [PATCH v6 0/21]middle-end: Support early break/return auto-vectorization Richard Biener
2023-11-06 15:17   ` Tamar Christina
2023-11-07  9:42     ` Richard Biener
2023-11-07 10:47       ` Tamar Christina
2023-11-07 13:58         ` Richard Biener
2023-11-27 18:30           ` Richard Sandiford
2023-11-28  8:11             ` Richard Biener
