public inbox for gcc-patches@gcc.gnu.org
From: Richard Biener <rguenther@suse.de>
To: Tamar Christina <Tamar.Christina@arm.com>
Cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
	nd <nd@arm.com>,  "jlaw@ventanamicro.com" <jlaw@ventanamicro.com>
Subject: RE: [PATCH 8/21]middle-end: update vectorizable_live_reduction with support for multiple exits and different exits
Date: Wed, 6 Dec 2023 10:33:06 +0100 (CET)
Message-ID: <337r0po6-79qo-6r13-7o32-1n9796430872@fhfr.qr>
In-Reply-To: <VI1PR08MB53250A8BF46DFA507C3E9338FF84A@VI1PR08MB5325.eurprd08.prod.outlook.com>

On Wed, 6 Dec 2023, Tamar Christina wrote:

> > > > is the exit edge you are looking for without iterating over all loop exits.
> > > >
> > > > > +		gimple *tmp_vec_stmt = vec_stmt;
> > > > > +		tree tmp_vec_lhs = vec_lhs;
> > > > > +		tree tmp_bitstart = bitstart;
> > > > > +		/* For early exit where the exit is not in the BB that leads
> > > > > +		   to the latch then we're restarting the iteration in the
> > > > > +		   scalar loop.  So get the first live value.  */
> > > > > +		restart_loop = restart_loop || exit_e != main_e;
> > > > > +		if (restart_loop)
> > > > > +		  {
> > > > > +		    tmp_vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0];
> > > > > +		    tmp_vec_lhs = gimple_get_lhs (tmp_vec_stmt);
> > > > > +		    tmp_bitstart = build_zero_cst (TREE_TYPE (bitstart));
> > > >
> > > > Hmm, that gets you the value after the first iteration, not the one before which
> > > > would be the last value of the preceding vector iteration?
> > > > (but we don't keep those, we'd need a PHI)
> > >
> > > I don't fully follow.  The comment on top of this hunk under if (loop_vinfo) states
> > > that lhs should be pointing to a PHI.
> > >
> > > When I inspect the statement I see
> > >
> > > i_14 = PHI <i_11(6), 0(14)>
> > >
> > > so i_14 is the value at the start of the current iteration: 0 if we're coming from
> > > the header, otherwise i_11, which is the value from the previous iteration?
> > >
> > > The peeling code explicitly leaves i_14 in the merge block and not i_11 for
> > > this exact reason.
> > > So I'm confused, my understanding is that we're already *at* the right PHI.
> > >
> > > Is it perhaps that you thought we put i_11 here for the early exits? In which case
> > > yes, I'd agree that would be wrong, and there we would have had to look at
> > > the defs, but i_11 is the def.
> > >
> > > I already kept this in mind and leveraged peeling to make this part easier.
> > > i_11 is used in the main exit and i_14 in the early one.
> > 
> > I think the important detail is that this code is only executed for
> > vect_induction_defs, which are indeed PHIs, so we're sure the live
> > value is the one before any modification and it is fine to feed it as
> > the initial value for the PHI in the epilog.
> > 
> > Maybe we can assert the def type here?
> 
> We can't assert because until cfg cleanup the dead value is still seen and still
> vectorized.  That said I've added a guard here.  We vectorize the non-induction
> value as normal now and if it's ever used it'll fail.
> 
> > 
> > > >
> > > > Why do we need (non-induction) live values from the vector loop to the
> > > > epilogue loop again?
> > >
> > > They can appear as the result value of the main exit.
> > >
> > > e.g. in testcase (vect-early-break_17.c)
> > >
> > > #define N 1024
> > > unsigned vect_a[N];
> > > unsigned vect_b[N];
> > >
> > > unsigned test4(unsigned x)
> > > {
> > >  unsigned ret = 0;
> > >  for (int i = 0; i < N; i++)
> > >  {
> > >    vect_b[i] = x + i;
> > >    if (vect_a[i] > x)
> > >      return vect_a[i];
> > >    vect_a[i] = x;
> > >    ret = vect_a[i] + vect_b[i];
> > >  }
> > >  return ret;
> > > }
> > >
> > > The only situation in which they can appear on an early break is when
> > > the main exit != the latch-connected exit.
> > >
> > > However in these cases they are unused, and only there because
> > > normally you would have exited (i.e. there was a return) but the
> > > vector loop needs to start over so we ignore it.
> > >
> > > These happen in testcase vect-early-break_74.c and
> > > vect-early-break_78.c
> > 
> > Hmm, so in that case their value is incorrect (but doesn't matter,
> > we ignore it)?
> > 
> 
> Correct, they're placed there due to exit redirection, but in these inverted
> testcases where we've peeled the vector iteration you can't ever skip the
> epilogue.  So they are guaranteed not to be used.
> 
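
A minimal sketch of the restart behaviour discussed above, assuming a
hypothetical fixed vectorization factor VF of 4 and a hand-written scalar
model rather than anything the vectorizer actually emits.  test4_scalar is
the testcase loop quoted in this thread; test4_model, init_arrays and
model_matches are illustrative names only.  The model assumes the break
condition is evaluated for all lanes before any store of that iteration, so
on an early break the scalar epilogue restarts from the induction value at
the start of the vector iteration (the role i_14 plays above):

```c
#include <assert.h>
#include <string.h>

#define N 1024
#define VF 4 /* hypothetical vectorization factor, for illustration only */

static unsigned vect_a[N];
static unsigned vect_b[N];

/* Scalar reference: the loop of test4 as quoted in the thread.  */
static unsigned
test4_scalar (unsigned x)
{
  unsigned ret = 0;
  for (int i = 0; i < N; i++)
    {
      vect_b[i] = x + i;
      if (vect_a[i] > x)
        return vect_a[i];
      vect_a[i] = x;
      ret = vect_a[i] + vect_b[i];
    }
  return ret;
}

/* Hand-written model of a vector loop followed by a scalar epilogue.  */
static unsigned
test4_model (unsigned x)
{
  unsigned ret = 0;
  int i = 0; /* value at the start of each vector iteration (like i_14) */

  while (i + VF <= N)
    {
      /* Assumption of this model: check every lane before any store.  */
      int early_exit = 0;
      for (int lane = 0; lane < VF; lane++)
        early_exit |= vect_a[i + lane] > x;
      if (early_exit)
        break; /* i still names lane 0 of the interrupted iteration */

      for (int lane = 0; lane < VF; lane++)
        {
          vect_b[i + lane] = x + i + lane;
          vect_a[i + lane] = x;
          ret = vect_a[i + lane] + vect_b[i + lane];
        }
      i += VF; /* value after the iteration (like i_11) */
    }

  /* Scalar epilogue: redo the interrupted iteration starting at lane 0.  */
  for (; i < N; i++)
    {
      vect_b[i] = x + i;
      if (vect_a[i] > x)
        return vect_a[i];
      vect_a[i] = x;
      ret = vect_a[i] + vect_b[i];
    }
  return ret;
}

/* Fill the arrays so that the break fires at index BREAK_AT, or never when
   BREAK_AT >= N.  Assumes x + 1 does not wrap.  */
static void
init_arrays (unsigned x, int break_at)
{
  assert (break_at >= 0);
  for (int i = 0; i < N; i++)
    {
      vect_a[i] = (i == break_at) ? x + 1 : 0;
      vect_b[i] = 0;
    }
}

/* Return 1 if the model and the reference agree on result and memory.  */
static int
model_matches (unsigned x, int break_at)
{
  static unsigned ref_a[N], ref_b[N];

  init_arrays (x, break_at);
  unsigned ref = test4_scalar (x);
  memcpy (ref_a, vect_a, sizeof vect_a);
  memcpy (ref_b, vect_b, sizeof vect_b);

  init_arrays (x, break_at);
  unsigned got = test4_model (x);
  return got == ref
         && memcmp (ref_a, vect_a, sizeof vect_a) == 0
         && memcmp (ref_b, vect_b, sizeof vect_b) == 0;
}
```

For example, model_matches (5, 6) places the break in the middle of the
second vector iteration and model_matches (5, N) never breaks; under the
assumptions above both calls should return 1.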
> > > > > +		gimple_stmt_iterator exit_gsi;
> > > > > +		tree new_tree
> > > > > +		  = vectorizable_live_operation_1 (loop_vinfo, stmt_info,
> > > > > +						   exit_e, vectype, ncopies,
> > > > > +						   slp_node, bitsize,
> > > > > +						   tmp_bitstart, tmp_vec_lhs,
> > > > > +						   lhs_type, restart_loop,
> > > > > +						   &exit_gsi);
> > > > > +
> > > > > +		/* Use the empty block on the exit to materialize the new stmts
> > > > > +		   so we can use update the PHI here.  */
> > > > > +		if (gimple_phi_num_args (use_stmt) == 1)
> > > > > +		  {
> > > > > +		    auto gsi = gsi_for_stmt (use_stmt);
> > > > > +		    remove_phi_node (&gsi, false);
> > > > > +		    tree lhs_phi = gimple_phi_result (use_stmt);
> > > > > +		    gimple *copy = gimple_build_assign (lhs_phi, new_tree);
> > > > > +		    gsi_insert_before (&exit_gsi, copy, GSI_SAME_STMT);
> > > > > +		  }
> > > > > +		else
> > > > > +		  SET_PHI_ARG_DEF (use_stmt, dest_e->dest_idx, new_tree);
> > > >
> > > > if the else case works, why not use it always?
> > >
> > > Because it doesn't work for the main exit.  The early exits have an intermediate block
> > > that is used to generate the statements on, so for them we are fine updating the
> > > use in place.
> > >
> > > The main exits don't, and so the existing trick the vectorizer uses is to materialize
> > > the statements in the same block and then dissolve the PHI node.  However you
> > > can't do that for the early exit because the PHI node isn't singular.
> > 
> > But if the PHI has a single arg you can replace that?  By making a
> > copy stmt from it don't you break LC SSA?
> > 
> 
> Yeah, what the existing code is sneakily doing is this:
> 
> It has to vectorize
> 
> x = PHI <y>
> y gets vectorized as z but
> 
> x = PHI <z>
> z = ...
> 
> would be invalid.  Since it doesn't have a predecessor node to place stuff in,
> it'll do
> 
> z = ...
> x = z
> 
> and remove the PHI.  The PHI was only placed there for vectorization so it's not needed
> after this point.  This is also why the code passes around a gimple_seq, since
> it needs to make sure it gets the order right when inserting statements.
> 
> Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
> 
> Ok for master?

OK.

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* tree-vect-loop.cc (vectorizable_live_operation,
> 	vectorizable_live_operation_1): Support early exits.
> 	(can_vectorize_live_stmts): Call vectorizable_live_operation for non-live
> 	inductions or reductions.
> 	(find_connected_edge, vect_get_vect_def): New.
> 	(vect_create_epilog_for_reduction): Support reductions in early break.
> 	* tree-vect-stmts.cc (perm_mask_for_reverse): Expose.
> 	(vect_stmt_relevant_p): Mark all inductions when early break as being
> 	live.
> 	* tree-vectorizer.h (perm_mask_for_reverse): Expose.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index f38cc47551488525b15c2be758cac8291dbefb3a..4e48217a31e59318c2ea8e5ab63b06ba19840cbd 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -3346,6 +3346,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
>  	    bb_before_epilog->count = single_pred_edge (bb_before_epilog)->count ();
>  	  bb_before_epilog = loop_preheader_edge (epilog)->src;
>  	}
> +
>        /* If loop is peeled for non-zero constant times, now niters refers to
>  	 orig_niters - prolog_peeling, it won't overflow even the orig_niters
>  	 overflows.  */
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index df5e1d28fac2ce35e71decdec0d8e31fb75557f5..2f922b42f6d567dfd5da9b276b1c9d37bc681876 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -5831,6 +5831,34 @@ vect_create_partial_epilog (tree vec_def, tree vectype, code_helper code,
>    return new_temp;
>  }
>  
> +/* Retrieves the definining statement to be used for a reduction.
> +   For MAIN_EXIT_P we use the current VEC_STMTs and otherwise we look at
> +   the reduction definitions.  */
> +
> +tree
> +vect_get_vect_def (stmt_vec_info reduc_info, slp_tree slp_node,
> +		   slp_instance slp_node_instance, bool main_exit_p, unsigned i,
> +		   vec <gimple *> &vec_stmts)
> +{
> +  tree def;
> +
> +  if (slp_node)
> +    {
> +      if (!main_exit_p)
> +        slp_node = slp_node_instance->reduc_phis;
> +      def = vect_get_slp_vect_def (slp_node, i);
> +    }
> +  else
> +    {
> +      if (!main_exit_p)
> +	reduc_info = STMT_VINFO_REDUC_DEF (vect_orig_stmt (reduc_info));
> +      vec_stmts = STMT_VINFO_VEC_STMTS (reduc_info);
> +      def = gimple_get_lhs (vec_stmts[0]);
> +    }
> +
> +  return def;
> +}
> +
>  /* Function vect_create_epilog_for_reduction
>  
>     Create code at the loop-epilog to finalize the result of a reduction
> @@ -5842,6 +5870,8 @@ vect_create_partial_epilog (tree vec_def, tree vectype, code_helper code,
>     SLP_NODE_INSTANCE is the SLP node instance containing SLP_NODE
>     REDUC_INDEX says which rhs operand of the STMT_INFO is the reduction phi
>       (counting from 0)
> +   LOOP_EXIT is the edge to update in the merge block.  In the case of a single
> +     exit this edge is always the main loop exit.
>  
>     This function:
>     1. Completes the reduction def-use cycles.
> @@ -5882,7 +5912,8 @@ static void
>  vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
>  				  stmt_vec_info stmt_info,
>  				  slp_tree slp_node,
> -				  slp_instance slp_node_instance)
> +				  slp_instance slp_node_instance,
> +				  edge loop_exit)
>  {
>    stmt_vec_info reduc_info = info_for_reduction (loop_vinfo, stmt_info);
>    gcc_assert (reduc_info->is_reduc_info);
> @@ -5891,6 +5922,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
>       loop-closed PHI of the inner loop which we remember as
>       def for the reduction PHI generation.  */
>    bool double_reduc = false;
> +  bool main_exit_p = LOOP_VINFO_IV_EXIT (loop_vinfo) == loop_exit;
>    stmt_vec_info rdef_info = stmt_info;
>    if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_double_reduction_def)
>      {
> @@ -6053,7 +6085,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
>        /* Create an induction variable.  */
>        gimple_stmt_iterator incr_gsi;
>        bool insert_after;
> -      standard_iv_increment_position (loop, &incr_gsi, &insert_after);
> +      vect_iv_increment_position (loop_exit, &incr_gsi, &insert_after);
>        create_iv (series_vect, PLUS_EXPR, vec_step, NULL_TREE, loop, &incr_gsi,
>  		 insert_after, &indx_before_incr, &indx_after_incr);
>  
> @@ -6132,23 +6164,23 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
>           Store them in NEW_PHIS.  */
>    if (double_reduc)
>      loop = outer_loop;
> -  exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> +  /* We need to reduce values in all exits.  */
> +  exit_bb = loop_exit->dest;
>    exit_gsi = gsi_after_labels (exit_bb);
>    reduc_inputs.create (slp_node ? vec_num : ncopies);
> +  vec <gimple *> vec_stmts;
>    for (unsigned i = 0; i < vec_num; i++)
>      {
>        gimple_seq stmts = NULL;
> -      if (slp_node)
> -	def = vect_get_slp_vect_def (slp_node, i);
> -      else
> -	def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[0]);
> +      def = vect_get_vect_def (rdef_info, slp_node, slp_node_instance,
> +			       main_exit_p, i, vec_stmts);
>        for (j = 0; j < ncopies; j++)
>  	{
>  	  tree new_def = copy_ssa_name (def);
>  	  phi = create_phi_node (new_def, exit_bb);
>  	  if (j)
> -	    def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[j]);
> -	  SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)->dest_idx, def);
> +	    def = gimple_get_lhs (vec_stmts[j]);
> +	  SET_PHI_ARG_DEF (phi, loop_exit->dest_idx, def);
>  	  new_def = gimple_convert (&stmts, vectype, new_def);
>  	  reduc_inputs.quick_push (new_def);
>  	}
> @@ -10481,17 +10513,18 @@ vectorizable_induction (loop_vec_info loop_vinfo,
>    return true;
>  }
>  
> -
>  /* Function vectorizable_live_operation_1.
> +
>     helper function for vectorizable_live_operation.  */
> +
>  tree
>  vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
> -			       stmt_vec_info stmt_info, edge exit_e,
> +			       stmt_vec_info stmt_info, basic_block exit_bb,
>  			       tree vectype, int ncopies, slp_tree slp_node,
>  			       tree bitsize, tree bitstart, tree vec_lhs,
> -			       tree lhs_type, gimple_stmt_iterator *exit_gsi)
> +			       tree lhs_type, bool restart_loop,
> +			       gimple_stmt_iterator *exit_gsi)
>  {
> -  basic_block exit_bb = exit_e->dest;
>    gcc_assert (single_pred_p (exit_bb) || LOOP_VINFO_EARLY_BREAKS (loop_vinfo));
>  
>    tree vec_lhs_phi = copy_ssa_name (vec_lhs);
> @@ -10504,7 +10537,9 @@ vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
>    if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
>      {
>        /* Emit:
> +
>  	 SCALAR_RES = VEC_EXTRACT <VEC_LHS, LEN + BIAS - 1>
> +
>  	 where VEC_LHS is the vectorized live-out result and MASK is
>  	 the loop mask for the final iteration.  */
>        gcc_assert (ncopies == 1 && !slp_node);
> @@ -10513,15 +10548,18 @@ vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
>        tree len = vect_get_loop_len (loop_vinfo, &gsi,
>  				    &LOOP_VINFO_LENS (loop_vinfo),
>  				    1, vectype, 0, 0);
> +
>        /* BIAS - 1.  */
>        signed char biasval = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
>        tree bias_minus_one
>  	= int_const_binop (MINUS_EXPR,
>  			   build_int_cst (TREE_TYPE (len), biasval),
>  			   build_one_cst (TREE_TYPE (len)));
> +
>        /* LAST_INDEX = LEN + (BIAS - 1).  */
>        tree last_index = gimple_build (&stmts, PLUS_EXPR, TREE_TYPE (len),
>  				     len, bias_minus_one);
> +
>        /* This needs to implement extraction of the first index, but not sure
>  	 how the LEN stuff works.  At the moment we shouldn't get here since
>  	 there's no LEN support for early breaks.  But guard this so there's
> @@ -10532,13 +10570,16 @@ vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
>        tree scalar_res
>  	= gimple_build (&stmts, CFN_VEC_EXTRACT, TREE_TYPE (vectype),
>  			vec_lhs_phi, last_index);
> +
>        /* Convert the extracted vector element to the scalar type.  */
>        new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
>      }
>    else if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
>      {
>        /* Emit:
> +
>  	 SCALAR_RES = EXTRACT_LAST <VEC_LHS, MASK>
> +
>  	 where VEC_LHS is the vectorized live-out result and MASK is
>  	 the loop mask for the final iteration.  */
>        gcc_assert (!slp_node);
> @@ -10548,10 +10589,38 @@ vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
>        tree mask = vect_get_loop_mask (loop_vinfo, &gsi,
>  				      &LOOP_VINFO_MASKS (loop_vinfo),
>  				      1, vectype, 0);
> +      tree scalar_res;
> +
> +      /* For an inverted control flow with early breaks we want EXTRACT_FIRST
> +	 instead of EXTRACT_LAST.  Emulate by reversing the vector and mask. */
> +      if (restart_loop && LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> +	{
> +	  /* First create the permuted mask.  */
> +	  tree perm_mask = perm_mask_for_reverse (TREE_TYPE (mask));
> +	  tree perm_dest = copy_ssa_name (mask);
> +	  gimple *perm_stmt
> +		= gimple_build_assign (perm_dest, VEC_PERM_EXPR, mask,
> +				       mask, perm_mask);
> +	  vect_finish_stmt_generation (loop_vinfo, stmt_info, perm_stmt,
> +				       &gsi);
> +	  mask = perm_dest;
> +
> +	  /* Then permute the vector contents.  */
> +	  tree perm_elem = perm_mask_for_reverse (vectype);
> +	  perm_dest = copy_ssa_name (vec_lhs_phi);
> +	  perm_stmt
> +		= gimple_build_assign (perm_dest, VEC_PERM_EXPR, vec_lhs_phi,
> +				       vec_lhs_phi, perm_elem);
> +	  vect_finish_stmt_generation (loop_vinfo, stmt_info, perm_stmt,
> +				       &gsi);
> +	  vec_lhs_phi = perm_dest;
> +	}
>  
>        gimple_seq_add_seq (&stmts, tem);
> -       tree scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type,
> -				       mask, vec_lhs_phi);
> +
> +      scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type,
> +				 mask, vec_lhs_phi);
> +
>        /* Convert the extracted vector element to the scalar type.  */
>        new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
>      }
> @@ -10564,12 +10633,26 @@ vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
>        new_tree = force_gimple_operand (fold_convert (lhs_type, new_tree),
>  				       &stmts, true, NULL_TREE);
>      }
> +
>    *exit_gsi = gsi_after_labels (exit_bb);
>    if (stmts)
>      gsi_insert_seq_before (exit_gsi, stmts, GSI_SAME_STMT);
> +
>    return new_tree;
>  }
>  
> +/* Find the edge that's the final one in the path from SRC to DEST and
> +   return it.  This edge must exist in at most one forwarder edge between.  */
> +
> +static edge
> +find_connected_edge (edge src, basic_block dest)
> +{
> +   if (src->dest == dest)
> +     return src;
> +
> +  return find_edge (src->dest, dest);
> +}
> +
>  /* Function vectorizable_live_operation.
>  
>     STMT_INFO computes a value that is used outside the loop.  Check if
> @@ -10590,11 +10673,13 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
>    poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
>    int ncopies;
>    gimple *use_stmt;
> +  use_operand_p use_p;
>    auto_vec<tree> vec_oprnds;
>    int vec_entry = 0;
>    poly_uint64 vec_index = 0;
>  
> -  gcc_assert (STMT_VINFO_LIVE_P (stmt_info));
> +  gcc_assert (STMT_VINFO_LIVE_P (stmt_info)
> +	      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo));
>  
>    /* If a stmt of a reduction is live, vectorize it via
>       vect_create_epilog_for_reduction.  vectorizable_reduction assessed
> @@ -10619,8 +10704,25 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
>        if (STMT_VINFO_REDUC_TYPE (reduc_info) == FOLD_LEFT_REDUCTION
>  	  || STMT_VINFO_REDUC_TYPE (reduc_info) == EXTRACT_LAST_REDUCTION)
>  	return true;
> +
>        vect_create_epilog_for_reduction (loop_vinfo, stmt_info, slp_node,
> -					slp_node_instance);
> +					slp_node_instance,
> +					LOOP_VINFO_IV_EXIT (loop_vinfo));
> +
> +      /* If early break we only have to materialize the reduction on the merge
> +	 block, but we have to find an alternate exit first.  */
> +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> +	{
> +	  for (auto exit : get_loop_exit_edges (LOOP_VINFO_LOOP (loop_vinfo)))
> +	    if (exit != LOOP_VINFO_IV_EXIT (loop_vinfo))
> +	      {
> +		vect_create_epilog_for_reduction (loop_vinfo, stmt_info,
> +						  slp_node, slp_node_instance,
> +						  exit);
> +		break;
> +	      }
> +	}
> +
>        return true;
>      }
>  
> @@ -10772,37 +10874,62 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
>  	   lhs' = new_tree;  */
>  
>        class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> -      basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> -      gcc_assert (single_pred_p (exit_bb));
> -
> -      tree vec_lhs_phi = copy_ssa_name (vec_lhs);
> -      gimple *phi = create_phi_node (vec_lhs_phi, exit_bb);
> -      SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)->dest_idx, vec_lhs);
> -
> -      gimple_stmt_iterator exit_gsi;
> -      tree new_tree
> -	= vectorizable_live_operation_1 (loop_vinfo, stmt_info,
> -					 LOOP_VINFO_IV_EXIT (loop_vinfo),
> -					 vectype, ncopies, slp_node, bitsize,
> -					 bitstart, vec_lhs, lhs_type,
> -					 &exit_gsi);
> -
> -      /* Remove existing phis that copy from lhs and create copies
> -	 from new_tree.  */
> -      gimple_stmt_iterator gsi;
> -      for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi);)
> -	{
> -	  gimple *phi = gsi_stmt (gsi);
> -	  if ((gimple_phi_arg_def (phi, 0) == lhs))
> +      /* Check if we have a loop where the chosen exit is not the main exit,
> +	 in these cases for an early break we restart the iteration the vector code
> +	 did.  For the live values we want the value at the start of the iteration
> +	 rather than at the end.  */
> +      edge main_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> +      bool restart_loop = LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo);
> +      FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
> +	if (!is_gimple_debug (use_stmt)
> +	    && !flow_bb_inside_loop_p (loop, gimple_bb (use_stmt)))
> +	  FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
>  	    {
> -	      remove_phi_node (&gsi, false);
> -	      tree lhs_phi = gimple_phi_result (phi);
> -	      gimple *copy = gimple_build_assign (lhs_phi, new_tree);
> -	      gsi_insert_before (&exit_gsi, copy, GSI_SAME_STMT);
> -	    }
> -	  else
> -	    gsi_next (&gsi);
> -	}
> +	      edge e = gimple_phi_arg_edge (as_a <gphi *> (use_stmt),
> +					   phi_arg_index_from_use (use_p));
> +	      bool main_exit_edge = e == main_e
> +				    || find_connected_edge (main_e, e->src);
> +
> +	      /* Early exits have an merge block, we want the merge block itself
> +		 so use ->src.  For main exit the merge block is the
> +		 destination.  */
> +	      basic_block dest = main_exit_edge ? main_e->dest : e->src;
> +	      gimple *tmp_vec_stmt = vec_stmt;
> +	      tree tmp_vec_lhs = vec_lhs;
> +	      tree tmp_bitstart = bitstart;
> +
> +	      /* For early exit where the exit is not in the BB that leads
> +		 to the latch then we're restarting the iteration in the
> +		 scalar loop.  So get the first live value.  */
> +	      restart_loop = restart_loop || !main_exit_edge;
> +	      if (restart_loop
> +		  && STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def)
> +		{
> +		  tmp_vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0];
> +		  tmp_vec_lhs = gimple_get_lhs (tmp_vec_stmt);
> +		  tmp_bitstart = build_zero_cst (TREE_TYPE (bitstart));
> +		}
> +
> +	      gimple_stmt_iterator exit_gsi;
> +	      tree new_tree
> +		= vectorizable_live_operation_1 (loop_vinfo, stmt_info,
> +						 dest, vectype, ncopies,
> +						 slp_node, bitsize,
> +						 tmp_bitstart, tmp_vec_lhs,
> +						 lhs_type, restart_loop,
> +						 &exit_gsi);
> +
> +	      if (gimple_phi_num_args (use_stmt) == 1)
> +		{
> +		  auto gsi = gsi_for_stmt (use_stmt);
> +		  remove_phi_node (&gsi, false);
> +		  tree lhs_phi = gimple_phi_result (use_stmt);
> +		  gimple *copy = gimple_build_assign (lhs_phi, new_tree);
> +		  gsi_insert_before (&exit_gsi, copy, GSI_SAME_STMT);
> +		}
> +	      else
> +		SET_PHI_ARG_DEF (use_stmt, e->dest_idx, new_tree);
> +	  }
>  
>        /* There a no further out-of-loop uses of lhs by LC-SSA construction.  */
>        FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index b3a09c0a804a38e17ef32b6ce13b98b077459fc7..582c5e678fad802d6e76300fe3c939b9f2978f17 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -342,6 +342,7 @@ is_simple_and_all_uses_invariant (stmt_vec_info stmt_info,
>     - it has uses outside the loop.
>     - it has vdefs (it alters memory).
>     - control stmts in the loop (except for the exit condition).
> +   - it is an induction and we have multiple exits.
>  
>     CHECKME: what other side effects would the vectorizer allow?  */
>  
> @@ -399,6 +400,19 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
>  	}
>      }
>  
> +  /* Check if it's an induction and multiple exits.  In this case there will be
> +     a usage later on after peeling which is needed for the alternate exit.  */
> +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> +      && STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def)
> +    {
> +      if (dump_enabled_p ())
> +	  dump_printf_loc (MSG_NOTE, vect_location,
> +			   "vec_stmt_relevant_p: induction forced for "
> +			   "early break.\n");
> +      *live_p = true;
> +
> +    }
> +
>    if (*live_p && *relevant == vect_unused_in_scope
>        && !is_simple_and_all_uses_invariant (stmt_info, loop_vinfo))
>      {
> @@ -1774,7 +1788,7 @@ compare_step_with_zero (vec_info *vinfo, stmt_vec_info stmt_info)
>  /* If the target supports a permute mask that reverses the elements in
>     a vector of type VECTYPE, return that mask, otherwise return null.  */
>  
> -static tree
> +tree
>  perm_mask_for_reverse (tree vectype)
>  {
>    poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
> @@ -12720,20 +12734,27 @@ can_vectorize_live_stmts (vec_info *vinfo, stmt_vec_info stmt_info,
>  			  bool vec_stmt_p,
>  			  stmt_vector_for_cost *cost_vec)
>  {
> +  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
>    if (slp_node)
>      {
>        stmt_vec_info slp_stmt_info;
>        unsigned int i;
>        FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (slp_node), i, slp_stmt_info)
>  	{
> -	  if (STMT_VINFO_LIVE_P (slp_stmt_info)
> +	  if ((STMT_VINFO_LIVE_P (slp_stmt_info)
> +	       || (loop_vinfo
> +		   && LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> +		   && STMT_VINFO_DEF_TYPE (slp_stmt_info)
> +			== vect_induction_def))
>  	      && !vectorizable_live_operation (vinfo, slp_stmt_info, slp_node,
>  					       slp_node_instance, i,
>  					       vec_stmt_p, cost_vec))
>  	    return false;
>  	}
>      }
> -  else if (STMT_VINFO_LIVE_P (stmt_info)
> +  else if ((STMT_VINFO_LIVE_P (stmt_info)
> +	    || (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> +		&& STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def))
>  	   && !vectorizable_live_operation (vinfo, stmt_info,
>  					    slp_node, slp_node_instance, -1,
>  					    vec_stmt_p, cost_vec))
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index 15c7f75b1f3c61ab469f1b1970dae9c6ac1a9f55..974f617d54a14c903894dd20d60098ca259c96f2 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -2248,6 +2248,7 @@ extern bool vect_is_simple_use (vec_info *, stmt_vec_info, slp_tree,
>  				enum vect_def_type *,
>  				tree *, stmt_vec_info * = NULL);
>  extern bool vect_maybe_update_slp_op_vectype (slp_tree, tree);
> +extern tree perm_mask_for_reverse (tree);
>  extern bool supportable_widening_operation (vec_info*, code_helper,
>  					    stmt_vec_info, tree, tree,
>  					    code_helper*, code_helper*,
> 
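
The EXTRACT_FIRST emulation in the fully-masked path of
vectorizable_live_operation_1 (reverse both the mask and the vector, then
apply EXTRACT_LAST) can be modelled in scalar C.  This is only a sketch: it
assumes a hypothetical fixed lane count (LANES) and uses a plain bitmask in
place of the loop mask.  extract_last, reverse_mask, reverse_vec,
extract_first_via_reverse and sample are illustrative names, not GCC
internals, and the FALLBACK value for an all-zero mask is a choice of this
model only:

```c
#include <assert.h>

#define LANES 8 /* hypothetical fixed lane count, for illustration only */

/* Example data for the usage note; lane i holds 10 + i.  */
static const int sample[LANES] = { 10, 11, 12, 13, 14, 15, 16, 17 };

/* Scalar model of EXTRACT_LAST: return the element of VEC in the highest
   lane whose bit is set in MASK, or FALLBACK if no lane is active.  */
static int
extract_last (unsigned mask, const int vec[LANES], int fallback)
{
  assert (vec);
  for (int lane = LANES - 1; lane >= 0; lane--)
    if (mask & (1u << lane))
      return vec[lane];
  return fallback;
}

/* Reverse the low LANES bits of MASK: lane i moves to LANES - 1 - i.  */
static unsigned
reverse_mask (unsigned mask)
{
  unsigned out = 0;
  for (int lane = 0; lane < LANES; lane++)
    if (mask & (1u << lane))
      out |= 1u << (LANES - 1 - lane);
  return out;
}

/* Reverse the elements of IN into OUT.  */
static void
reverse_vec (const int in[LANES], int out[LANES])
{
  for (int lane = 0; lane < LANES; lane++)
    out[lane] = in[LANES - 1 - lane];
}

/* EXTRACT_FIRST emulated as EXTRACT_LAST on the reversed mask and vector.  */
static int
extract_first_via_reverse (unsigned mask, const int vec[LANES], int fallback)
{
  int rev[LANES];
  reverse_vec (vec, rev);
  return extract_last (reverse_mask (mask), rev, fallback);
}
```

With lanes 1, 2 and 4 active (mask 0x16), extract_last (0x16u, sample, -1)
picks lane 4 and extract_first_via_reverse (0x16u, sample, -1) picks lane 1.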

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

2023-11-17 12:13                                     ` Richard Biener
2023-11-20 21:54                                       ` Tamar Christina
2023-11-24 10:18                                         ` Tamar Christina
2023-11-24 12:41                                           ` Richard Biener
2023-11-06  7:39 ` [PATCH 8/21]middle-end: update vectorizable_live_reduction with support for multiple exits and different exits Tamar Christina
2023-11-15  0:05   ` Tamar Christina
2023-11-15 13:41     ` Richard Biener
2023-11-15 14:26       ` Tamar Christina
2023-11-16 11:16         ` Richard Biener
2023-11-20 21:57           ` Tamar Christina
2023-11-24 10:20             ` Tamar Christina
2023-11-24 13:23               ` Richard Biener
2023-11-27 22:47                 ` Tamar Christina
2023-11-29 13:28                   ` Richard Biener
2023-11-29 21:22                     ` Tamar Christina
2023-11-30 13:23                       ` Richard Biener
2023-12-06  4:21                         ` Tamar Christina
2023-12-06  9:33                           ` Richard Biener [this message]
2023-11-06  7:39 ` [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code Tamar Christina
2023-11-27 22:49   ` Tamar Christina
2023-11-29 13:50     ` Richard Biener
2023-12-06  4:37       ` Tamar Christina
2023-12-06  9:37         ` Richard Biener
2023-12-08  8:58           ` Tamar Christina
2023-12-08 10:28             ` Richard Biener
2023-12-08 13:45               ` Tamar Christina
2023-12-08 13:59                 ` Richard Biener
2023-12-08 15:01                   ` Tamar Christina
2023-12-11  7:09                   ` Tamar Christina
2023-12-11  9:36                     ` Richard Biener
2023-12-11 23:12                       ` Tamar Christina
2023-12-12 10:10                         ` Richard Biener
2023-12-12 10:27                           ` Tamar Christina
2023-12-12 10:59                           ` Richard Sandiford
2023-12-12 11:30                             ` Richard Biener
2023-12-13 14:13                               ` Tamar Christina
2023-12-14 13:12                                 ` Richard Biener
2023-12-14 18:44                                   ` Tamar Christina
2023-11-06  7:39 ` [PATCH 10/21]middle-end: implement relevancy analysis support for control flow Tamar Christina
2023-11-27 22:49   ` Tamar Christina
2023-11-29 14:47     ` Richard Biener
2023-12-06  4:10       ` Tamar Christina
2023-12-06  9:44         ` Richard Biener
2023-11-06  7:40 ` [PATCH 11/21]middle-end: wire through peeling changes and dominator updates after guard edge split Tamar Christina
2023-11-06  7:40 ` [PATCH 12/21]middle-end: Add remaining changes to peeling and vectorizer to support early breaks Tamar Christina
2023-11-27 22:48   ` Tamar Christina
2023-12-06  8:31   ` Richard Biener
2023-12-06  9:10     ` Tamar Christina
2023-12-06  9:27       ` Richard Biener
2023-11-06  7:40 ` [PATCH 13/21]middle-end: Update loop form analysis to support early break Tamar Christina
2023-11-27 22:48   ` Tamar Christina
2023-12-06  4:00     ` Tamar Christina
2023-12-06  8:18   ` Richard Biener
2023-12-06  8:52     ` Tamar Christina
2023-12-06  9:15       ` Richard Biener
2023-12-06  9:29         ` Tamar Christina
2023-11-06  7:41 ` [PATCH 14/21]middle-end: Change loop analysis from looking at at number of BB to actual cfg Tamar Christina
2023-11-06 14:44   ` Richard Biener
2023-11-06  7:41 ` [PATCH 15/21]middle-end: [RFC] conditionally support forcing final edge for debugging Tamar Christina
2023-12-09 10:38   ` Richard Sandiford
2023-12-11  7:38     ` Richard Biener
2023-12-11  8:49       ` Tamar Christina
2023-12-11  9:00         ` Richard Biener
2023-11-06  7:41 ` [PATCH 16/21]middle-end testsuite: un-xfail TSVC loops that check for exit control flow vectorization Tamar Christina
2023-11-06  7:41 ` [PATCH 17/21]AArch64: Add implementation for vector cbranch for Advanced SIMD Tamar Christina
2023-11-28 16:37   ` Richard Sandiford
2023-11-28 17:55     ` Richard Sandiford
2023-12-06 16:25       ` Tamar Christina
2023-12-07  0:56         ` Richard Sandiford
2023-12-14 18:40           ` Tamar Christina
2023-12-14 19:34             ` Richard Sandiford
2023-11-06  7:42 ` [PATCH 18/21]AArch64: Add optimization for vector != cbranch fed into compare with 0 " Tamar Christina
2023-11-06  7:42 ` [PATCH 19/21]AArch64: Add optimization for vector cbranch combining SVE and " Tamar Christina
2023-11-06  7:42 ` [PATCH 20/21]Arm: Add Advanced SIMD cbranch implementation Tamar Christina
2023-11-27 12:48   ` Kyrylo Tkachov
2023-11-06  7:43 ` [PATCH 21/21]Arm: Add MVE " Tamar Christina
2023-11-27 12:47   ` Kyrylo Tkachov
2023-11-06 14:25 ` [PATCH v6 0/21]middle-end: Support early break/return auto-vectorization Richard Biener
2023-11-06 15:17   ` Tamar Christina
2023-11-07  9:42     ` Richard Biener
2023-11-07 10:47       ` Tamar Christina
2023-11-07 13:58         ` Richard Biener
2023-11-27 18:30           ` Richard Sandiford
2023-11-28  8:11             ` Richard Biener
