public inbox for gcc-patches@gcc.gnu.org
From: Richard Biener <rguenther@suse.de>
To: Tamar Christina <Tamar.Christina@arm.com>
Cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
	nd <nd@arm.com>,  "jlaw@ventanamicro.com" <jlaw@ventanamicro.com>
Subject: RE: [PATCH 7/21]middle-end: update IV update code to support early breaks and arbitrary exits
Date: Fri, 17 Nov 2023 12:13:13 +0000 (UTC)
Message-ID: <nycvar.YFH.7.77.849.2311171206070.8772@jbgna.fhfr.qr>
In-Reply-To: <VI1PR08MB5325C8599C15649F7ACB0C11FFB7A@VI1PR08MB5325.eurprd08.prod.outlook.com>

On Fri, 17 Nov 2023, Tamar Christina wrote:

> > > > > > Yes, but that only works for the inductions marked so.  We'd
> > > > > > need to mark the others as well, but only for the early exits.
> > > > > >
> > > > > > > although I don't understand why we use the scalar count,  I
> > > > > > > suppose the reasoning is that we don't really want to keep it
> > > > > > > around, and referencing it forces it to be kept?
> > > > > >
> > > > > > Referencing it will cause the scalar compute to be retained, but
> > > > > > since we do not adjust the scalar compute during vectorization
> > > > > > (but expect it to be dead) the scalar compute will compute the
> > > > > > wrong thing (as shown by the reduction example - I suspect
> > > > > > inductions will suffer from the same problem).
> > > > > >
> > > > > > > At the moment it just does `init + (final - init) * vf` which is correct no?
> > > > > >
> > > > > > The issue is that 'final' is not computed correctly in the
> > > > > > vectorized loop.  This formula might work for affine evolutions of
> > > > > > course.
> > > > > >
> > > > > > Extracting the correct value from the vectorized induction would
> > > > > > be the preferred solution.
> > > > >
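[Editorial illustration] The point about affine evolutions can be checked with a small model. This is only a sketch under a stated assumption: the scalar IV that is kept alive in the vector loop is left unadjusted, so it steps once per vector iteration instead of once per scalar iteration. The helper names and the value VF = 4 are hypothetical and do not come from the patch.

```c
#include <stdint.h>

enum { VF = 4 };  /* assumed vectorization factor, for illustration only */

/* The formula quoted in the thread: init + (final - init) * vf.
   `final` is the value of the unadjusted scalar IV at the exit.  */
int64_t rescale_by_vf (int64_t init, int64_t final)
{
  return init + (final - init) * VF;
}

/* Affine IV (i += step) after n updates.  */
int64_t affine_after (int64_t init, int64_t step, int n)
{
  int64_t i = init;
  while (n-- > 0)
    i += step;
  return i;
}

/* Non-affine IV (i *= 2) after n updates.  */
int64_t doubling_after (int64_t init, int n)
{
  int64_t i = init;
  while (n-- > 0)
    i *= 2;
  return i;
}
```

With init = 3, step = 5 and 6 vector iterations, rescale_by_vf (3, affine_after (3, 5, 6)) gives 123, which equals affine_after (3, 5, 24), the value after 24 scalar iterations. For the doubling IV the same rescaling gives 759, while the value after 24 scalar iterations is 50331648, so the formula only holds in the affine case.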
> > > > > Ok, so I should be able to just mark IVs as live during
> > > > > process_use if there are multiple exits right? Since it's just
> > > > > gonna be unused on the main exit since we use niters?
> > > > >
> > > > > Because since it's the PHI inside the loop that needs to be marked
> > > > > live I can't just do it for a specific exit, no?
> > > > >
> > > > > If I create a copy of the PHI node during peeling for use in early
> > > > > exits and mark it live it won't work no?
> > > >
> > > > I guess I wouldn't actually mark it STMT_VINFO_LIVE_P but somehow
> > > > arrange vectorizable_live_operation to be called, possibly adding a
> > > > edge argument to that as well.
> > > >
> > > > Maybe the thing to do for the moment is to reject vectorization with
> > > > early breaks if there's any (non-STMT_VINFO_LIVE_P?) induction or
> > > > reduction besides the main counting IV one you can already special-case?
> > >
> > > Ok so I did a quick hack with:
> > >
> > >       if (!virtual_operand_p (PHI_RESULT (phi))
> > > 	  && !STMT_VINFO_LIVE_P (phi_info))
> > > 	{
> > > 	  use_operand_p use_p;
> > > 	  imm_use_iterator imm_iter;
> > > 	  bool non_exit_use = false;
> > > 	  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, PHI_RESULT (phi))
> > > 	    if (!flow_bb_inside_loop_p (loop, gimple_bb (USE_STMT (use_p))))
> > > 	      for (auto exit : get_loop_exit_edges (loop))
> > > 		{
> > > 		  if (exit == LOOP_VINFO_IV_EXIT (loop_vinfo))
> > > 		    continue;
> > >
> > > 		  if (gimple_bb (USE_STMT (use_p)) != exit->dest)
> > > 		    {
> > > 		      non_exit_use = true;
> > > 		      goto fail;
> > > 		    }
> > > 		}
> > > fail:
> > > 	  if (non_exit_use)
> > > 	    return false;
> > > 	}
> > >
> > > And it does seem to still allow all the cases I want.  I've placed
> > > this in vect_can_advance_ivs_p.
> > >
> > > Does this cover what you meant?
> > >
> > 
> > Ok, I've rewritten this in a nicer form, but doesn't this mean we now block any
> > loop where the index is not live?
> > i.e. we block such simple loops like
> > 
> > #ifndef N
> > #define N 800
> > #endif
> > unsigned vect_a[N];
> > 
> > unsigned test4(unsigned x)
> > {
> >  unsigned ret = 0;
> >  for (int i = 0; i < N; i++)
> >  {
> >    if (vect_a[i]*2 != x)
> >      break;
> >    vect_a[i] = x;
> >  }
> >  return ret;
> > }
> > 
> > because it does a simple `break`.  If I force it to be live it works, but then I need
> > to differentiate between the counter and the IV.
> > 
> > # i_15 = PHI <i_12(6), 0(2)>
> > # ivtmp_7 = PHI <ivtmp_14(6), 803(2)>
> > 
> > It seems like if we don't want to keep i_15 around (at the moment it will be kept
> > because of its usage in the exit block it won't be DCEd) then we need to mark it
> > live early during analysis.
> > 
> > Most likely if we do this I don't need to care about the "inverted" workflow
> > here at all. What do you think?
> > 
> > Yes that doesn't work for SLP, but I don't think I can get SLP working in the
> > remaining time anyway..
> > 
> > I'll fix reduction and multiple exit live values in the mean time.
> > 
> 
> Ok, so I currently have the following solution.  Let me know if you agree with it
> and I'll polish it up today and tomorrow and respin things.
> 
> 1. During vect_update_ivs_after_vectorizer we no longer touch any PHIs aside from
>      just updating IVtemps with the expected remaining iteration count.

OK

> 2. During vect_transform_loop after vectorizing any induction or reduction I call vectorizable_live_operation
>      for any phi node that still has any uses in the early exit merge block.

OK, I suppose you need to amend the vectorizable_live_operation API to
tell it whether it is working on the early exits or the main exit (and
not complain when !STMT_VINFO_LIVE_P for the early exit case).

> 3. vectorizable_live_operation is taught to materialize the same PHI in multiple exits

For the main exit you'd get here via STMT_VINFO_LIVE_P handling and
vect_update_ivs_after_vectorizer would handle the rest.  For the
early exits I think you only have to materialize once (in the merge 
block)?

> 4. vectorizable_reduction or maybe vect_create_epilog_for_reduction needs to be modified to materialize
>     the previous iteration value for early exits.

I think you need to only touch vect_create_epilog_for_reduction, the
early exit merge block needs another reduction epilog.  Well, in theory
just another vector to reduce but not sure if the control flow supports
having the same actual epilog for both the main and the early exits.
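
[Editorial illustration] To illustrate why the early exit needs the previous-iteration value of the reduction, here is a rough scalar model of a vectorized sum with an early break. It is a sketch, not vectorizer output: VF = 4, the function names and the lane-array representation of the accumulator are all assumptions made for the example.

```c
#include <stddef.h>

enum { VF = 4 };  /* assumed vectorization factor, for illustration only */

/* Scalar reference: sum elements until one equals `stop`.  */
unsigned sum_until_scalar (const unsigned *a, size_t n, unsigned stop)
{
  unsigned sum = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (a[i] == stop)
	break;
      sum += a[i];
    }
  return sum;
}

/* Model of the vectorized loop followed by the scalar epilog.  */
unsigned sum_until_vector_model (const unsigned *a, size_t n, unsigned stop)
{
  unsigned acc[VF] = { 0 };	/* vector accumulator, one partial sum per lane */
  size_t i = 0;

  for (; i + VF <= n; i += VF)
    {
      int hit = 0;
      for (int l = 0; l < VF; l++)
	hit |= (a[i + l] == stop);
      if (hit)
	break;	/* early exit: acc still holds the previous iteration's value */
      for (int l = 0; l < VF; l++)
	acc[l] += a[i + l];
    }

  /* Reduction epilog: reduce the lanes to one scalar.  In this model the
     same code serves the main exit and the early exit.  */
  unsigned sum = 0;
  for (int l = 0; l < VF; l++)
    sum += acc[l];

  /* Scalar epilog: redo the interrupted vector iteration (or the tail).  */
  for (; i < n; i++)
    {
      if (a[i] == stop)
	break;
      sum += a[i];
    }
  return sum;
}
```

For the array 1..10 with stop = 7 both functions return 21: the vector loop leaves with the accumulator from before the iteration containing 7, and the scalar epilog adds 5 and 6. Whether the real control flow can share one epilog the way this model does is exactly the open question above.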

Richard.

> This seems to work and produces now for the simple loop above:
> 
> .L2:
>         str     q27, [x1, x3]
>         str     q29, [x2, x1]
>         add     x1, x1, 16
>         cmp     x1, 3200
>         beq     .L11
> .L4:
>         ldr     q31, [x2, x1]
>         mov     v28.16b, v30.16b
>         add     v30.4s, v30.4s, v26.4s
>         shl     v31.4s, v31.4s, 1
>         add     v27.4s, v28.4s, v29.4s
>         cmeq    v31.4s, v31.4s, v29.4s
>         not     v31.16b, v31.16b
>         umaxp   v31.4s, v31.4s, v31.4s
>         fmov    x4, d31
>         cbz     x4, .L2
>         fmov    w1, s28
>         mov     w6, 4
> .L3:
> 
> so now the scalar index is no longer kept and it reduces the value from the vector IV in the exit:
> 
> fmov    w1, s28
> 
> Does this work as you expected?
> 
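[Editorial illustration] The `fmov w1, s28` above reads lane 0 of the copy of the vector IV taken before the increment. A rough C model of that idea follows; it is a sketch under assumptions (VF = 4, hypothetical function names, and the store `vect_a[i] = x` from the test case left out for brevity), not what the vectorizer emits.

```c
#include <stddef.h>

enum { VF = 4 };  /* assumed vectorization factor, for illustration only */

/* Scalar reference: first index with a[i] * 2 != x, or n if none.  */
size_t first_mismatch_scalar (const unsigned *a, size_t n, unsigned x)
{
  size_t i = 0;
  for (; i < n; i++)
    if (a[i] * 2 != x)
      break;
  return i;
}

/* Model of the vector loop with a vector IV and no live scalar IV.  */
size_t first_mismatch_vector_model (const unsigned *a, size_t n, unsigned x)
{
  size_t iv[VF];		/* vector IV: { i, i + 1, i + 2, i + 3 } */
  for (int l = 0; l < VF; l++)
    iv[l] = (size_t) l;

  while (iv[0] + VF <= n)
    {
      int hit = 0;
      for (int l = 0; l < VF; l++)
	hit |= (a[iv[l]] * 2 != x);
      if (hit)
	break;			/* iv is not yet advanced here */
      for (int l = 0; l < VF; l++)
	iv[l] += VF;
    }

  /* Lane 0 of the vector IV is the scalar index at which the interrupted
     vector iteration started; the scalar epilog restarts from there.  */
  size_t i = iv[0];
  for (; i < n; i++)
    if (a[i] * 2 != x)
      break;
  return i;
}
```

With ten elements equal to 2 except a[6] = 5 and x = 4, the vector loop breaks in its second iteration, lane 0 yields 4, and the scalar epilog walks to index 6, matching the scalar reference.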
> Thanks,
> Tamar
> 
> > Thanks,
> > Tamar
> > > Thanks,
> > > Tamar
> > >
> > > >
> > > > Richard.
> > > >
> > > > > Tamar
> > > > > >
> > > > > > > Also you missed the question below about how to avoid the
> > > > > > > creation of the block, You ok with changing that?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Tamar
> > > > > > >
> > > > > > > > Or for now disable early-break for inductions that are not
> > > > > > > > the main exit control IV (in vect_can_advance_ivs_p)?
> > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > It seems your change handles different kinds of
> > > > > > > > > > > > inductions differently.  Specifically
> > > > > > > > > > > >
> > > > > > > > > > > >       bool ivtemp = gimple_cond_lhs (cond) == iv_var;
> > > > > > > > > > > >       if (restart_loop && ivtemp)
> > > > > > > > > > > >         {
> > > > > > > > > > > >           type = TREE_TYPE (gimple_phi_result (phi));
> > > > > > > > > > > >           ni = build_int_cst (type, vf);
> > > > > > > > > > > >           if (inversed_iv)
> > > > > > > > > > > >             ni = fold_build2 (MINUS_EXPR, type, ni,
> > > > > > > > > > > >                               fold_convert (type, step_expr));
> > > > > > > > > > > >         }
> > > > > > > > > > > >
> > > > > > > > > > > > it looks like for the exit test IV we use either 'VF' or 'VF - step'
> > > > > > > > > > > > as the new value.  That seems to be very odd special
> > > > > > > > > > > > casing for unknown reasons.  And while you adjust
> > > > > > > > > > > > vec_step_op_add, you don't adjust
> > > > > > > > > > > > vect_peel_nonlinear_iv_init (maybe not supported -
> > > > > > > > > > > > better assert here).
> > > > > > > > > > >
> > > > > > > > > > > The VF case is for a normal "non-inverted" loop, where
> > > > > > > > > > > if you take an early exit you know that you have to do
> > > > > > > > > > > at most VF iterations.  The VF - step is to account for
> > > > > > > > > > > the inverted loop control flow where you exit after
> > > > > > > > > > > adjusting the IV already by + step.
> > > > > > > > > >
> > > > > > > > > > But doesn't that assume the IV counts from niter to zero?
> > > > > > > > > > I don't see this special case is actually necessary, no?
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > I needed it because otherwise the scalar loop iterates one
> > > > > > > > > iteration too little, so I got a miscompile with the
> > > > > > > > > inverted loop stuff.  I'll look at it again, perhaps it can
> > > > > > > > > be solved differently.
> > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Peeling doesn't matter here, since you know you were
> > > > > > > > > > > able to do a vector iteration so it's safe to do VF iterations.
> > > > > > > > > > > So having peeled doesn't affect the remaining iters count.
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Also the vec_step_op_add case will keep the original
> > > > > > > > > > > > scalar IV live even when it is a vectorized induction.
> > > > > > > > > > > > The code recomputing the value from scratch avoids this.
> > > > > > > > > > > >
> > > > > > > > > > > >       /* For non-main exit create an intermediat edge to get
> > > > > > > > > > > >          any updated iv calculations.  */
> > > > > > > > > > > >       if (needs_interm_block
> > > > > > > > > > > >           && !iv_block
> > > > > > > > > > > >           && (!gimple_seq_empty_p (stmts)
> > > > > > > > > > > >               || !gimple_seq_empty_p (new_stmts)))
> > > > > > > > > > > >         {
> > > > > > > > > > > >           iv_block = split_edge (update_e);
> > > > > > > > > > > >           update_e = single_succ_edge (update_e->dest);
> > > > > > > > > > > >           last_gsi = gsi_last_bb (iv_block);
> > > > > > > > > > > >         }
> > > > > > > > > > > >
> > > > > > > > > > > > this is also odd, can we adjust the API instead?  I
> > > > > > > > > > > > suppose this is because your computation uses the
> > > > > > > > > > > > original loop IV, if you based the computation off
> > > > > > > > > > > > the initial value only this might not be necessary?
> > > > > > > > > > >
> > > > > > > > > > > No, on the main exit the code updates the value in the
> > > > > > > > > > > loop header and puts the Calculation in the merge block.
> > > > > > > > > > > This works because it only needs to consume PHI nodes
> > > > > > > > > > > in the merge block and things like niters are adjusted
> > > > > > > > > > > in the guard block.
> > > > > > > > > > >
> > > > > > > > > > > For an early exit, we don't have a guard block, only
> > > > > > > > > > > the merge block.  We have to update the PHI nodes in
> > > > > > > > > > > that block, but can't do so since you can't produce a
> > > > > > > > > > > value and consume it in a PHI node in the same BB.
> > > > > > > > > > > So we need to create the block to put the values in
> > > > > > > > > > > for use in the merge block, because there's no "guard"
> > > > > > > > > > > block for early exits.
> > > > > > > > > >
> > > > > > > > > > ?  then compute niters in that block as well.
> > > > > > > > >
> > > > > > > > > We can't since it'll not be reachable through the right edge.
> > > > > > > > > What we can do if you want is slightly change peeling, we
> > > > > > > > > currently peel as:
> > > > > > > > >
> > > > > > > > >   \        \             /
> > > > > > > > >   E1     E2        Normal exit
> > > > > > > > >     \       |          |
> > > > > > > > >        \    |          Guard
> > > > > > > > >           \ |          |
> > > > > > > > >          Merge block
> > > > > > > > >                   |
> > > > > > > > >              Pre Header
> > > > > > > > >
> > > > > > > > > If we instead peel as:
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >   \        \             /
> > > > > > > > >   E1     E2        Normal exit
> > > > > > > > >     \       |          |
> > > > > > > > >        Exit join   Guard
> > > > > > > > >           \ |          |
> > > > > > > > >          Merge block
> > > > > > > > >                   |
> > > > > > > > >              Pre Header
> > > > > > > > >
> > > > > > > > > We can use the exit join block.  This would also mean
> > > > > > > > > vect_update_ivs_after_vectorizer doesn't need to iterate
> > > > > > > > > over all exits and only really needs to adjust the phi
> > > > > > > > > nodes coming out of the exit join and guard block.
> > > > > > > > >
> > > > > > > > > Does this work for you?
> > > > > >
> > > > > > Yeah, I think that would work.  But I'd like to sort out the
> > > > > > correctness details of the IV update itself before sorting out
> > > > > > this code placement detail.
> > > > > >
> > > > > > Richard.
> > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Tamar
> > > > > > > > > >
> > > > > > > > > > > The API can be adjusted by always creating the empty
> > > > > > > > > > > block during peeling.
> > > > > > > > > > > That would prevent us from having to do anything special here.
> > > > > > > > > > > Would that work better?  Or I can do it in the loop
> > > > > > > > > > > that iterates over the exits before the call to
> > > > > > > > > > > vect_update_ivs_after_vectorizer, which I think
> > > > > > > > > > > might be more consistent.
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > That said, I wonder why we cannot simply pass in an
> > > > > > > > > > > > adjusted niter which would be niters_vector_mult_vf
> > > > > > > > > > > > - vf and be done with that?
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > We can of course not have this and recompute it from
> > > > > > > > > > > niters itself, however this does affect the epilog code layout.
> > > > > > > > > > > Particularly knowing the static number of iterations
> > > > > > > > > > > left causes it to usually unroll the loop and share
> > > > > > > > > > > some of the computations.  i.e. the scalar code is
> > > > > > > > > > > often more efficient.
> > > > > > > > > > >
> > > > > > > > > > > The computation would be niters_vector_mult_vf -
> > > > > > > > > > > iters_done * vf, since the value put here is the
> > > > > > > > > > > remaining iteration count.  It's static for early
> > > > > > > > > > > exits.
> > > > > > > > > >
> > > > > > > > > > Well, it might be "static" in that it doesn't really
> > > > > > > > > > matter what you use for the epilog main IV initial value
> > > > > > > > > > as long as you are sure you're not going to take that
> > > > > > > > > > exit as you are sure we're going to take one of the
> > > > > > > > > > early exits.  So yeah, the special code is probably OK,
> > > > > > > > > > but it needs a better comment and as said the structure
> > > > > > > > > > of vect_update_ivs_after_vectorizer is a bit hard to follow now.
> > > > > > > > > >
> > > > > > > > > > As said an important part for optimization is to not
> > > > > > > > > > keep the scalar IVs live in the vector loop.
> > > > > > > > > >
> > > > > > > > > > > But can do whatever you prefer here.  Let me know what
> > > > > > > > > > > you prefer for the above.
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Tamar
> > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > > Richard.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > Regards,
> > > > > > > > > > > > > Tamar
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > It has to do this since you have to perform
> > > > > > > > > > > > > > > the side effects for the non-matching elements still.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Regards,
> > > > > > > > > > > > > > > Tamar
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > +	      if (STMT_VINFO_LIVE_P (phi_info))
> > > > > > > > > > > > > > > > > +		continue;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +	      /* For early break the final loop IV is:
> > > > > > > > > > > > > > > > > +		 init + (final - init) * vf which takes into account peeling
> > > > > > > > > > > > > > > > > +		 values and non-single steps.  The main exit can use niters
> > > > > > > > > > > > > > > > > +		 since if you exit from the main exit you've done all vector
> > > > > > > > > > > > > > > > > +		 iterations.  For an early exit we don't know when we exit so we
> > > > > > > > > > > > > > > > > +		 must re-calculate this on the exit.  */
> > > > > > > > > > > > > > > > > +	      tree start_expr = gimple_phi_result (phi);
> > > > > > > > > > > > > > > > > +	      off = fold_build2 (MINUS_EXPR, stype,
> > > > > > > > > > > > > > > > > +				 fold_convert (stype, start_expr),
> > > > > > > > > > > > > > > > > +				 fold_convert (stype, init_expr));
> > > > > > > > > > > > > > > > > +	      /* Now adjust for VF to get the final iteration value.  */
> > > > > > > > > > > > > > > > > +	      off = fold_build2 (MULT_EXPR, stype, off,
> > > > > > > > > > > > > > > > > +				 build_int_cst (stype, vf));
> > > > > > > > > > > > > > > > > +	    }
> > > > > > > > > > > > > > > > > +	  else
> > > > > > > > > > > > > > > > > +	    off = fold_build2 (MULT_EXPR, stype,
> > > > > > > > > > > > > > > > > +			       fold_convert (stype, niters), step_expr);
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > >  	  if (POINTER_TYPE_P (type))
> > > > > > > > > > > > > > > > >  	    ni = fold_build_pointer_plus (init_expr, off);
> > > > > > > > > > > > > > > > >  	  else
> > > > > > > > > > > > > > > > > @@ -2238,6 +2286,8 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
> > > > > > > > > > > > > > > > >        /* Don't bother call vect_peel_nonlinear_iv_init.  */
> > > > > > > > > > > > > > > > >        else if (induction_type == vect_step_op_neg)
> > > > > > > > > > > > > > > > >  	ni = init_expr;
> > > > > > > > > > > > > > > > > +      else if (restart_loop)
> > > > > > > > > > > > > > > > > +	continue;
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > This looks all a bit complicated - why
> > > > > > > > > > > > > > > > wouldn't we simply always use the PHI result
> > > > > > > > > > > > > > > > when 'restart_loop'?  Isn't that the correct old
> > > > > > > > > > > > > > > > start value in all cases?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >        else
> > > > > > > > > > > > > > > > >  	ni = vect_peel_nonlinear_iv_init (&stmts, init_expr,
> > > > > > > > > > > > > > > > >  					  niters, step_expr,
> > > > > > > > > > > > > > > > > @@ -2245,9 +2295,20 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >        var = create_tmp_var (type, "tmp");
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > -      last_gsi = gsi_last_bb (exit_bb);
> > > > > > > > > > > > > > > > >        gimple_seq new_stmts = NULL;
> > > > > > > > > > > > > > > > >        ni_name = force_gimple_operand (ni, &new_stmts, false, var);
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +      /* For non-main exit create an intermediat edge to get any updated iv
> > > > > > > > > > > > > > > > > +	 calculations.  */
> > > > > > > > > > > > > > > > > +      if (needs_interm_block
> > > > > > > > > > > > > > > > > +	  && !iv_block
> > > > > > > > > > > > > > > > > +	  && (!gimple_seq_empty_p (stmts) || !gimple_seq_empty_p (new_stmts)))
> > > > > > > > > > > > > > > > > +	{
> > > > > > > > > > > > > > > > > +	  iv_block = split_edge (update_e);
> > > > > > > > > > > > > > > > > +	  update_e = single_succ_edge (update_e->dest);
> > > > > > > > > > > > > > > > > +	  last_gsi = gsi_last_bb (iv_block);
> > > > > > > > > > > > > > > > > +	}
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > >        /* Exit_bb shouldn't be empty.  */
> > > > > > > > > > > > > > > > >        if (!gsi_end_p (last_gsi))
> > > > > > > > > > > > > > > > >  	{
> > > > > > > > > > > > > > > > > @@ -3342,8 +3403,26 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> > > > > > > > > > > > > > > > >  	 niters_vector_mult_vf steps.  */
> > > > > > > > > > > > > > > > >        gcc_checking_assert (vect_can_advance_ivs_p (loop_vinfo));
> > > > > > > > > > > > > > > > >        update_e = skip_vector ? e : loop_preheader_edge (epilog);
> > > > > > > > > > > > > > > > > -      vect_update_ivs_after_vectorizer (loop_vinfo, niters_vector_mult_vf,
> > > > > > > > > > > > > > > > > -					update_e);
> > > > > > > > > > > > > > > > > +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > > > > > > > > > > > > > +	update_e = single_succ_edge (e->dest);
> > > > > > > > > > > > > > > > > +      bool inversed_iv
> > > > > > > > > > > > > > > > > +	= !vect_is_loop_exit_latch_pred (LOOP_VINFO_IV_EXIT (loop_vinfo),
> > > > > > > > > > > > > > > > > +					 LOOP_VINFO_LOOP (loop_vinfo));
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > You are computing this here and in
> > > > > > > > > > > > > > > > vect_update_ivs_after_vectorizer?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +      /* Update the main exit first.  */
> > > > > > > > > > > > > > > > > +      vect_update_ivs_after_vectorizer (loop_vinfo, vf, niters_vector_mult_vf,
> > > > > > > > > > > > > > > > > +					update_e, inversed_iv);
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +      /* And then update the early exits.  */
> > > > > > > > > > > > > > > > > +      for (auto exit : get_loop_exit_edges (loop))
> > > > > > > > > > > > > > > > > +	{
> > > > > > > > > > > > > > > > > +	  if (exit == LOOP_VINFO_IV_EXIT (loop_vinfo))
> > > > > > > > > > > > > > > > > +	    continue;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +	  vect_update_ivs_after_vectorizer (loop_vinfo, vf,
> > > > > > > > > > > > > > > > > +					    niters_vector_mult_vf,
> > > > > > > > > > > > > > > > > +					    exit, true);
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > ... why does the same not work here?
> > > > > > > > > > > > > > > > Wouldn't the proper condition be
> > > > > > > > > > > > > > > > !dominated_by_p (CDI_DOMINATORS,
> > > > > > > > > > > > > > > > exit->src, LOOP_VINFO_IV_EXIT
> > > > > > > > > > > > > > > > (loop_vinfo)->src) or similar?  That is,
> > > > > > > > > > > > > > > > whether the exit is at or after the main IV exit?
> > > > > > > > > > > > > > > > (consider having
> > > > > > > > > > > > > > > > two)
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > +	}
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >        if (skip_epilog)
> > > > > > > > > > > > > > > > >  	{
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > --
> > > > > > > > > > > > > > Richard Biener <rguenther@suse.de> SUSE Software
> > > > > > > > > > > > > > Solutions Germany GmbH, Frankenstrasse 146,
> > > > > > > > > > > > > > 90461 Nuernberg, Germany;
> > > > > > > > > > > > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich;
> > > > > > > > > > > > > > (HRB 36809, AG
> > > > > > > > > > > > > > Nuernberg)
> > > > > > > > > > > > >
> > > > > > > > > > > >
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

2023-11-07 23:10     ` Tamar Christina
2023-11-13 20:11     ` Tamar Christina
2023-11-14  7:56       ` Richard Biener
2023-11-14  8:07         ` Tamar Christina
2023-11-14 23:59           ` Tamar Christina
2023-11-15 12:14             ` Richard Biener
2023-11-06  7:38 ` [PATCH 6/21]middle-end: support multiple exits in loop versioning Tamar Christina
2023-11-07 14:54   ` Richard Biener
2023-11-06  7:39 ` [PATCH 7/21]middle-end: update IV update code to support early breaks and arbitrary exits Tamar Christina
2023-11-15  0:03   ` Tamar Christina
2023-11-15 13:01     ` Richard Biener
2023-11-15 13:09       ` Tamar Christina
2023-11-15 13:22         ` Richard Biener
2023-11-15 14:14           ` Tamar Christina
2023-11-16 10:40             ` Richard Biener
2023-11-16 11:08               ` Tamar Christina
2023-11-16 11:27                 ` Richard Biener
2023-11-16 12:01                   ` Tamar Christina
2023-11-16 12:30                     ` Richard Biener
2023-11-16 13:22                       ` Tamar Christina
2023-11-16 13:35                         ` Richard Biener
2023-11-16 14:14                           ` Tamar Christina
2023-11-16 14:17                             ` Richard Biener
2023-11-16 15:19                               ` Tamar Christina
2023-11-16 18:41                                 ` Tamar Christina
2023-11-17 10:40                                   ` Tamar Christina
2023-11-17 12:13                                     ` Richard Biener [this message]
2023-11-20 21:54                                       ` Tamar Christina
2023-11-24 10:18                                         ` Tamar Christina
2023-11-24 12:41                                           ` Richard Biener
2023-11-06  7:39 ` [PATCH 8/21]middle-end: update vectorizable_live_reduction with support for multiple exits and different exits Tamar Christina
2023-11-15  0:05   ` Tamar Christina
2023-11-15 13:41     ` Richard Biener
2023-11-15 14:26       ` Tamar Christina
2023-11-16 11:16         ` Richard Biener
2023-11-20 21:57           ` Tamar Christina
2023-11-24 10:20             ` Tamar Christina
2023-11-24 13:23               ` Richard Biener
2023-11-27 22:47                 ` Tamar Christina
2023-11-29 13:28                   ` Richard Biener
2023-11-29 21:22                     ` Tamar Christina
2023-11-30 13:23                       ` Richard Biener
2023-12-06  4:21                         ` Tamar Christina
2023-12-06  9:33                           ` Richard Biener
2023-11-06  7:39 ` [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code Tamar Christina
2023-11-27 22:49   ` Tamar Christina
2023-11-29 13:50     ` Richard Biener
2023-12-06  4:37       ` Tamar Christina
2023-12-06  9:37         ` Richard Biener
2023-12-08  8:58           ` Tamar Christina
2023-12-08 10:28             ` Richard Biener
2023-12-08 13:45               ` Tamar Christina
2023-12-08 13:59                 ` Richard Biener
2023-12-08 15:01                   ` Tamar Christina
2023-12-11  7:09                   ` Tamar Christina
2023-12-11  9:36                     ` Richard Biener
2023-12-11 23:12                       ` Tamar Christina
2023-12-12 10:10                         ` Richard Biener
2023-12-12 10:27                           ` Tamar Christina
2023-12-12 10:59                           ` Richard Sandiford
2023-12-12 11:30                             ` Richard Biener
2023-12-13 14:13                               ` Tamar Christina
2023-12-14 13:12                                 ` Richard Biener
2023-12-14 18:44                                   ` Tamar Christina
2023-11-06  7:39 ` [PATCH 10/21]middle-end: implement relevancy analysis support for control flow Tamar Christina
2023-11-27 22:49   ` Tamar Christina
2023-11-29 14:47     ` Richard Biener
2023-12-06  4:10       ` Tamar Christina
2023-12-06  9:44         ` Richard Biener
2023-11-06  7:40 ` [PATCH 11/21]middle-end: wire through peeling changes and dominator updates after guard edge split Tamar Christina
2023-11-06  7:40 ` [PATCH 12/21]middle-end: Add remaining changes to peeling and vectorizer to support early breaks Tamar Christina
2023-11-27 22:48   ` Tamar Christina
2023-12-06  8:31   ` Richard Biener
2023-12-06  9:10     ` Tamar Christina
2023-12-06  9:27       ` Richard Biener
2023-11-06  7:40 ` [PATCH 13/21]middle-end: Update loop form analysis to support early break Tamar Christina
2023-11-27 22:48   ` Tamar Christina
2023-12-06  4:00     ` Tamar Christina
2023-12-06  8:18   ` Richard Biener
2023-12-06  8:52     ` Tamar Christina
2023-12-06  9:15       ` Richard Biener
2023-12-06  9:29         ` Tamar Christina
2023-11-06  7:41 ` [PATCH 14/21]middle-end: Change loop analysis from looking at at number of BB to actual cfg Tamar Christina
2023-11-06 14:44   ` Richard Biener
2023-11-06  7:41 ` [PATCH 15/21]middle-end: [RFC] conditionally support forcing final edge for debugging Tamar Christina
2023-12-09 10:38   ` Richard Sandiford
2023-12-11  7:38     ` Richard Biener
2023-12-11  8:49       ` Tamar Christina
2023-12-11  9:00         ` Richard Biener
2023-11-06  7:41 ` [PATCH 16/21]middle-end testsuite: un-xfail TSVC loops that check for exit control flow vectorization Tamar Christina
2023-11-06  7:41 ` [PATCH 17/21]AArch64: Add implementation for vector cbranch for Advanced SIMD Tamar Christina
2023-11-28 16:37   ` Richard Sandiford
2023-11-28 17:55     ` Richard Sandiford
2023-12-06 16:25       ` Tamar Christina
2023-12-07  0:56         ` Richard Sandiford
2023-12-14 18:40           ` Tamar Christina
2023-12-14 19:34             ` Richard Sandiford
2023-11-06  7:42 ` [PATCH 18/21]AArch64: Add optimization for vector != cbranch fed into compare with 0 " Tamar Christina
2023-11-06  7:42 ` [PATCH 19/21]AArch64: Add optimization for vector cbranch combining SVE and " Tamar Christina
2023-11-06  7:42 ` [PATCH 20/21]Arm: Add Advanced SIMD cbranch implementation Tamar Christina
2023-11-27 12:48   ` Kyrylo Tkachov
2023-11-06  7:43 ` [PATCH 21/21]Arm: Add MVE " Tamar Christina
2023-11-27 12:47   ` Kyrylo Tkachov
2023-11-06 14:25 ` [PATCH v6 0/21]middle-end: Support early break/return auto-vectorization Richard Biener
2023-11-06 15:17   ` Tamar Christina
2023-11-07  9:42     ` Richard Biener
2023-11-07 10:47       ` Tamar Christina
2023-11-07 13:58         ` Richard Biener
2023-11-27 18:30           ` Richard Sandiford
2023-11-28  8:11             ` Richard Biener
