From: Richard Biener <rguenther@suse.de>
To: Tamar Christina <Tamar.Christina@arm.com>
Cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
nd <nd@arm.com>, "jlaw@ventanamicro.com" <jlaw@ventanamicro.com>
Subject: RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code
Date: Wed, 6 Dec 2023 10:37:11 +0100 (CET)
Message-ID: <5r3p7378-q309-ooqo-7o76-q9r567ns1890@fhfr.qr>
In-Reply-To: <VI1PR08MB53254186A2A0585B263FF72AFF84A@VI1PR08MB5325.eurprd08.prod.outlook.com>
On Wed, 6 Dec 2023, Tamar Christina wrote:
> > > > +
> > > > + tree truth_type = truth_type_for (vectype_op); machine_mode mode =
> > > > + TYPE_MODE (truth_type); int ncopies;
> > > > +
> >
> > more line break issues ... (also below, check yourself)
> >
> > shouldn't STMT_VINFO_VECTYPE already match truth_type here? If not
> > it looks to be set wrongly (or shouldn't be set at all)
> >
>
> Fixed.  I now leverage the existing vect_recog_bool_pattern to update the
> types if needed, and determine the initial type in
> vect_get_vector_types_for_stmt.
>
> > > > + if (slp_node)
> > > > + ncopies = 1;
> > > > + else
> > > > + ncopies = vect_get_num_copies (loop_vinfo, truth_type);
> > > > +
> > > > + vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo); bool
> > > > + masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
> > > > +
> >
> > what about with_len?
>
> Should be easy to add, but I don't know how it works.
>
> >
> > > > + /* Analyze only. */
> > > > + if (!vec_stmt)
> > > > + {
> > > > + if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
> > > > + {
> > > > + if (dump_enabled_p ())
> > > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > > + "can't vectorize early exit because the "
> > > > + "target doesn't support flag setting vector "
> > > > + "comparisons.\n");
> > > > + return false;
> > > > + }
> > > > +
> > > > + if (!expand_vec_cmp_expr_p (vectype_op, truth_type, NE_EXPR))
> >
> > Why NE_EXPR? This looks wrong. Or vectype_op is wrong if you're
> > emitting
> >
> > mask = op0 CMP op1;
> > if (mask != 0)
> >
> > I think you need to check for CMP, not NE_EXPR.
>
> Well, CMP is checked by vectorizable_comparison_1, but I realized this
> check was not checking what I wanted, and the cbranch requirements
> already cover it.  So I removed it.
>
> >
> > > > + {
> > > > + if (dump_enabled_p ())
> > > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > > + "can't vectorize early exit because the "
> > > > + "target does not support boolean vector "
> > > > + "comparisons for type %T.\n", truth_type);
> > > > + return false;
> > > > + }
> > > > +
> > > > + if (ncopies > 1
> > > > + && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing)
> > > > + {
> > > > + if (dump_enabled_p ())
> > > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > > + "can't vectorize early exit because the "
> > > > + "target does not support boolean vector OR for "
> > > > + "type %T.\n", truth_type);
> > > > + return false;
> > > > + }
> > > > +
> > > > + if (!vectorizable_comparison_1 (vinfo, truth_type, stmt_info, code, gsi,
> > > > + vec_stmt, slp_node, cost_vec))
> > > > + return false;
> >
> > I suppose vectorizable_comparison_1 will check this again, so the above
> > is redundant?
> >
>
> The IOR?  No, vectorizable_comparison_1 doesn't reduce, so it may not
> check it, depending on the condition.
>
> > > > + /* Determine if we need to reduce the final value. */
> > > > + if (stmts.length () > 1)
> > > > + {
> > > > + /* We build the reductions in a way to maintain as much parallelism as
> > > > + possible. */
> > > > + auto_vec<tree> workset (stmts.length ());
> > > > + workset.splice (stmts);
> > > > + while (workset.length () > 1)
> > > > + {
> > > > + new_temp = make_temp_ssa_name (truth_type, NULL, "vexit_reduc");
> > > > + tree arg0 = workset.pop ();
> > > > + tree arg1 = workset.pop ();
> > > > + new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0, arg1);
> > > > + vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
> > > > + &cond_gsi);
> > > > + if (slp_node)
> > > > + slp_node->push_vec_def (new_stmt);
> > > > + else
> > > > + STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
> > > > + workset.quick_insert (0, new_temp);
> >
> > Reduction epilogue handling has similar code to reduce a set of vectors
> > to a single one with an operation. I think we want to share that code.
> >
>
> I've taken a look, but that code isn't suitable here since the two have
> different constraints.  I don't require an in-order reduction, since for
> the comparison all we care about is whether any bit in any lane is set
> or not.  This means:
>
> 1. we can reduce using a fast operation like IOR.
> 2. we can reduce with as much parallelism as possible.
>
> The comparison is now on the critical path for the loop, unlike live
> reductions, which are always at the end, so using the live reduction code
> resulted in a slowdown since it creates a longer dependency chain.
OK.
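For the archive: the scheme described above amounts to a pairwise
reduction tree.  A standalone C++ sketch of the idea (illustrative only,
not vectorizer code; the names are made up):

  /* Reduce N masks with IOR using the same worklist discipline as the
     patch: pop two values off the back, push the combined value to the
     front.  IOR is associative and commutative, so any pairing order
     is valid, and this keeps the dependency chain roughly log2(N)
     deep rather than the N-1 an in-order chain would give.  */
  #include <cstdint>
  #include <deque>
  #include <vector>

  std::uint64_t
  reduce_any_lane_set (const std::vector<std::uint64_t> &masks)
  {
    std::deque<std::uint64_t> workset (masks.begin (), masks.end ());
    while (workset.size () > 1)
      {
        std::uint64_t arg0 = workset.back (); workset.pop_back ();
        std::uint64_t arg1 = workset.back (); workset.pop_back ();
        workset.push_front (arg0 | arg1);
      }
    return workset.front ();  /* assumes at least one input mask */
  }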
> > > > + }
> > > > + }
> > > > + else
> > > > + new_temp = stmts[0];
> > > > +
> > > > + gcc_assert (new_temp);
> > > > +
> > > > + tree cond = new_temp;
> > > > + if (masked_loop_p)
> > > > + {
> > > > + tree mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, truth_type, 0);
> > > > + cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
> > > > + &cond_gsi);
> >
> > I don't think this is correct when 'stmts' had more than one vector?
> >
>
> It is, because even for VLA, since we only support counted loops, partial
> vectors are disabled.  And it looks like --param vect-partial-vector-usage=1
> cannot force them on.
--param vect-partial-vector-usage=2 would, no?
> In principle I suppose I could mask the individual stmts; that should handle
> the future case when this is relaxed to support non-fixed-length buffers?
Well, it looks wrong - either put in an assert that we start with a
single stmt, or assert !masked_loop_p instead?  Better to ICE than to
generate wrong code.
That said, I think you need to apply the masking on the original
stmts[], before reducing them, no?
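Concretely, the intent would be (a standalone sketch of the semantics,
not vectorizer code):

  /* With partial vectors each vector copy i has its own loop mask;
     AND it into that copy's comparison result *before* the IOR
     reduction, so that inactive lanes can never trigger the early
     exit.  Masking only the final reduced value would apply the mask
     of copy 0 to lanes that came from other copies.  Assumes
     loop_masks has (at least) one mask per entry in stmts.  */
  #include <cstddef>
  #include <cstdint>
  #include <vector>

  std::uint64_t
  reduce_masked (const std::vector<std::uint64_t> &stmts,
                 const std::vector<std::uint64_t> &loop_masks)
  {
    std::uint64_t any = 0;
    for (std::size_t i = 0; i < stmts.size (); ++i)
      any |= stmts[i] & loop_masks[i];
    return any;
  }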
Thanks,
Richard.
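PS: for readers of the archive, the shape of loop this series makes
vectorizable is e.g. (an illustrative example, not taken from the
testsuite):

  #include <cstddef>

  /* Early-exit search loop: the exit condition a[i] == key becomes a
     vector compare whose mask is IOR-reduced across copies and fed to
     a flag-setting vector compare and branch (cbranch).  */
  long
  find_first (const int *a, std::size_t n, int key)
  {
    for (std::size_t i = 0; i < n; ++i)
      if (a[i] == key)
        return (long) i;
    return -1;
  }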
> Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> * tree-vect-patterns.cc (vect_init_pattern_stmt): Support gconds.
> (check_bool_pattern, adjust_bool_pattern, adjust_bool_stmts,
> vect_recog_bool_pattern): Support gconds type analysis.
> * tree-vect-stmts.cc (vectorizable_comparison_1): Support stmts without
> lhs.
> (vectorizable_early_exit): New.
> (vect_analyze_stmt, vect_transform_stmt): Use it.
> (vect_is_simple_use, vect_get_vector_types_for_stmt): Support gcond.
>
> --- inline copy of patch ---
>
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index 7debe7f0731673cd1bf25cd39d55e23990a73d0e..c6cedf4fe7c1f1e1126ce166a059a4b2a2b49cbd 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -132,6 +132,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple *pattern_stmt,
> if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
> {
> gcc_assert (!vectype
> + || is_a <gcond *> (pattern_stmt)
> || (VECTOR_BOOLEAN_TYPE_P (vectype)
> == vect_use_mask_type_p (orig_stmt_info)));
> STMT_VINFO_VECTYPE (pattern_stmt_info) = vectype;
> @@ -5210,19 +5211,27 @@ vect_recog_mixed_size_cond_pattern (vec_info *vinfo,
> true if bool VAR can and should be optimized that way. Assume it shouldn't
> in case it's a result of a comparison which can be directly vectorized into
> a vector comparison. Fills in STMTS with all stmts visited during the
> - walk. */
> + walk.  If COND is non-NULL, a gcond is inspected instead of a normal COND.  */
>
> static bool
> -check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
> +check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts,
> + gcond *cond)
> {
> tree rhs1;
> enum tree_code rhs_code;
> + gassign *def_stmt = NULL;
>
> stmt_vec_info def_stmt_info = vect_get_internal_def (vinfo, var);
> - if (!def_stmt_info)
> + if (!def_stmt_info && !cond)
> return false;
> + else if (!def_stmt_info)
> + /* If we're a gcond we won't be codegen-ing the statements and are only
> + interested in whether the types match.  In that case we can accept loop
> + invariant values.  */
> + def_stmt = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (var));
> + else
> + def_stmt = dyn_cast <gassign *> (def_stmt_info->stmt);
>
> - gassign *def_stmt = dyn_cast <gassign *> (def_stmt_info->stmt);
> if (!def_stmt)
> return false;
>
> @@ -5234,27 +5243,28 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
> switch (rhs_code)
> {
> case SSA_NAME:
> - if (! check_bool_pattern (rhs1, vinfo, stmts))
> + if (! check_bool_pattern (rhs1, vinfo, stmts, cond))
> return false;
> break;
>
> CASE_CONVERT:
> if (!VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (rhs1)))
> return false;
> - if (! check_bool_pattern (rhs1, vinfo, stmts))
> + if (! check_bool_pattern (rhs1, vinfo, stmts, cond))
> return false;
> break;
>
> case BIT_NOT_EXPR:
> - if (! check_bool_pattern (rhs1, vinfo, stmts))
> + if (! check_bool_pattern (rhs1, vinfo, stmts, cond))
> return false;
> break;
>
> case BIT_AND_EXPR:
> case BIT_IOR_EXPR:
> case BIT_XOR_EXPR:
> - if (! check_bool_pattern (rhs1, vinfo, stmts)
> - || ! check_bool_pattern (gimple_assign_rhs2 (def_stmt), vinfo, stmts))
> + if (! check_bool_pattern (rhs1, vinfo, stmts, cond)
> + || ! check_bool_pattern (gimple_assign_rhs2 (def_stmt), vinfo, stmts,
> + cond))
> return false;
> break;
>
> @@ -5275,6 +5285,7 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
> tree mask_type = get_mask_type_for_scalar_type (vinfo,
> TREE_TYPE (rhs1));
> if (mask_type
> + && !cond
> && expand_vec_cmp_expr_p (comp_vectype, mask_type, rhs_code))
> return false;
>
> @@ -5324,11 +5335,13 @@ adjust_bool_pattern_cast (vec_info *vinfo,
> VAR is an SSA_NAME that should be transformed from bool to a wider integer
> type, OUT_TYPE is the desired final integer type of the whole pattern.
> STMT_INFO is the info of the pattern root and is where pattern stmts should
> - be associated with. DEFS is a map of pattern defs. */
> + be associated with. DEFS is a map of pattern defs. If TYPE_ONLY then don't
> + create new pattern statements and instead only fill LAST_STMT and DEFS. */
>
> static void
> adjust_bool_pattern (vec_info *vinfo, tree var, tree out_type,
> - stmt_vec_info stmt_info, hash_map <tree, tree> &defs)
> + stmt_vec_info stmt_info, hash_map <tree, tree> &defs,
> + gimple *&last_stmt, bool type_only)
> {
> gimple *stmt = SSA_NAME_DEF_STMT (var);
> enum tree_code rhs_code, def_rhs_code;
> @@ -5492,8 +5505,10 @@ adjust_bool_pattern (vec_info *vinfo, tree var, tree out_type,
> }
>
> gimple_set_location (pattern_stmt, loc);
> - append_pattern_def_seq (vinfo, stmt_info, pattern_stmt,
> - get_vectype_for_scalar_type (vinfo, itype));
> + if (!type_only)
> + append_pattern_def_seq (vinfo, stmt_info, pattern_stmt,
> + get_vectype_for_scalar_type (vinfo, itype));
> + last_stmt = pattern_stmt;
> defs.put (var, gimple_assign_lhs (pattern_stmt));
> }
>
> @@ -5509,11 +5524,14 @@ sort_after_uid (const void *p1, const void *p2)
>
> /* Create pattern stmts for all stmts participating in the bool pattern
> specified by BOOL_STMT_SET and its root STMT_INFO with the desired type
> - OUT_TYPE. Return the def of the pattern root. */
> + OUT_TYPE. Return the def of the pattern root. If TYPE_ONLY the new
> + statements are not emitted as pattern statements and the tree returned is
> + only useful for type queries. */
>
> static tree
> adjust_bool_stmts (vec_info *vinfo, hash_set <gimple *> &bool_stmt_set,
> - tree out_type, stmt_vec_info stmt_info)
> + tree out_type, stmt_vec_info stmt_info,
> + bool type_only = false)
> {
> /* Gather original stmts in the bool pattern in their order of appearance
> in the IL. */
> @@ -5523,16 +5541,16 @@ adjust_bool_stmts (vec_info *vinfo, hash_set <gimple *> &bool_stmt_set,
> bool_stmts.quick_push (*i);
> bool_stmts.qsort (sort_after_uid);
>
> + gimple *last_stmt = NULL;
> +
> /* Now process them in that order, producing pattern stmts. */
> hash_map <tree, tree> defs;
> for (unsigned i = 0; i < bool_stmts.length (); ++i)
> adjust_bool_pattern (vinfo, gimple_assign_lhs (bool_stmts[i]),
> - out_type, stmt_info, defs);
> + out_type, stmt_info, defs, last_stmt, type_only);
>
> /* Pop the last pattern seq stmt and install it as pattern root for STMT. */
> - gimple *pattern_stmt
> - = gimple_seq_last_stmt (STMT_VINFO_PATTERN_DEF_SEQ (stmt_info));
> - return gimple_assign_lhs (pattern_stmt);
> + return gimple_assign_lhs (last_stmt);
> }
>
> /* Return the proper type for converting bool VAR into
> @@ -5608,13 +5626,22 @@ vect_recog_bool_pattern (vec_info *vinfo,
> enum tree_code rhs_code;
> tree var, lhs, rhs, vectype;
> gimple *pattern_stmt;
> -
> - if (!is_gimple_assign (last_stmt))
> + gcond* cond = NULL;
> + if (!is_gimple_assign (last_stmt)
> + && !(cond = dyn_cast <gcond *> (last_stmt)))
> return NULL;
>
> - var = gimple_assign_rhs1 (last_stmt);
> - lhs = gimple_assign_lhs (last_stmt);
> - rhs_code = gimple_assign_rhs_code (last_stmt);
> + if (is_gimple_assign (last_stmt))
> + {
> + var = gimple_assign_rhs1 (last_stmt);
> + lhs = gimple_assign_lhs (last_stmt);
> + rhs_code = gimple_assign_rhs_code (last_stmt);
> + }
> + else
> + {
> + lhs = var = gimple_cond_lhs (last_stmt);
> + rhs_code = gimple_cond_code (last_stmt);
> + }
>
> if (rhs_code == VIEW_CONVERT_EXPR)
> var = TREE_OPERAND (var, 0);
> @@ -5632,7 +5659,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
> return NULL;
> vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs));
>
> - if (check_bool_pattern (var, vinfo, bool_stmts))
> + if (check_bool_pattern (var, vinfo, bool_stmts, cond))
> {
> rhs = adjust_bool_stmts (vinfo, bool_stmts,
> TREE_TYPE (lhs), stmt_vinfo);
> @@ -5680,7 +5707,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
>
> return pattern_stmt;
> }
> - else if (rhs_code == COND_EXPR
> + else if ((rhs_code == COND_EXPR || cond)
> && TREE_CODE (var) == SSA_NAME)
> {
> vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs));
> @@ -5700,18 +5727,31 @@ vect_recog_bool_pattern (vec_info *vinfo,
> if (get_vectype_for_scalar_type (vinfo, type) == NULL_TREE)
> return NULL;
>
> - if (check_bool_pattern (var, vinfo, bool_stmts))
> + if (check_bool_pattern (var, vinfo, bool_stmts, cond))
> var = adjust_bool_stmts (vinfo, bool_stmts, type, stmt_vinfo);
> else if (integer_type_for_mask (var, vinfo))
> return NULL;
>
> - lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
> - pattern_stmt
> - = gimple_build_assign (lhs, COND_EXPR,
> - build2 (NE_EXPR, boolean_type_node,
> - var, build_int_cst (TREE_TYPE (var), 0)),
> - gimple_assign_rhs2 (last_stmt),
> - gimple_assign_rhs3 (last_stmt));
> + if (!cond)
> + {
> + lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
> + pattern_stmt
> + = gimple_build_assign (lhs, COND_EXPR,
> + build2 (NE_EXPR, boolean_type_node, var,
> + build_int_cst (TREE_TYPE (var), 0)),
> + gimple_assign_rhs2 (last_stmt),
> + gimple_assign_rhs3 (last_stmt));
> + }
> + else
> + {
> + pattern_stmt
> + = gimple_build_cond (gimple_cond_code (cond), gimple_cond_lhs (cond),
> + gimple_cond_rhs (cond),
> + gimple_cond_true_label (cond),
> + gimple_cond_false_label (cond));
> + vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (var));
> + vectype = truth_type_for (vectype);
> + }
> *type_out = vectype;
> vect_pattern_detected ("vect_recog_bool_pattern", last_stmt);
>
> @@ -5725,7 +5765,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
> if (!vectype || !VECTOR_MODE_P (TYPE_MODE (vectype)))
> return NULL;
>
> - if (check_bool_pattern (var, vinfo, bool_stmts))
> + if (check_bool_pattern (var, vinfo, bool_stmts, cond))
> rhs = adjust_bool_stmts (vinfo, bool_stmts,
> TREE_TYPE (vectype), stmt_vinfo);
> else
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 582c5e678fad802d6e76300fe3c939b9f2978f17..d801b72a149ebe6aa4d1f2942324b042d07be530 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -12489,7 +12489,7 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
> vec<tree> vec_oprnds0 = vNULL;
> vec<tree> vec_oprnds1 = vNULL;
> tree mask_type;
> - tree mask;
> + tree mask = NULL_TREE;
>
> if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
> return false;
> @@ -12629,8 +12629,9 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
> /* Transform. */
>
> /* Handle def. */
> - lhs = gimple_assign_lhs (STMT_VINFO_STMT (stmt_info));
> - mask = vect_create_destination_var (lhs, mask_type);
> + lhs = gimple_get_lhs (STMT_VINFO_STMT (stmt_info));
> + if (lhs)
> + mask = vect_create_destination_var (lhs, mask_type);
>
> vect_get_vec_defs (vinfo, stmt_info, slp_node, ncopies,
> rhs1, &vec_oprnds0, vectype,
> @@ -12644,7 +12645,10 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
> gimple *new_stmt;
> vec_rhs2 = vec_oprnds1[i];
>
> - new_temp = make_ssa_name (mask);
> + if (lhs)
> + new_temp = make_ssa_name (mask);
> + else
> + new_temp = make_temp_ssa_name (mask_type, NULL, "cmp");
> if (bitop1 == NOP_EXPR)
> {
> new_stmt = gimple_build_assign (new_temp, code,
> @@ -12723,6 +12727,176 @@ vectorizable_comparison (vec_info *vinfo,
> return true;
> }
>
> +/* Check to see if the current early break given in STMT_INFO is valid for
> + vectorization. */
> +
> +static bool
> +vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
> + gimple_stmt_iterator *gsi, gimple **vec_stmt,
> + slp_tree slp_node, stmt_vector_for_cost *cost_vec)
> +{
> + loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
> + if (!loop_vinfo
> + || !is_a <gcond *> (STMT_VINFO_STMT (stmt_info)))
> + return false;
> +
> + if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_condition_def)
> + return false;
> +
> + if (!STMT_VINFO_RELEVANT_P (stmt_info))
> + return false;
> +
> + auto code = gimple_cond_code (STMT_VINFO_STMT (stmt_info));
> + tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> + gcc_assert (vectype);
> +
> + tree vectype_op0 = NULL_TREE;
> + slp_tree slp_op0;
> + tree op0;
> + enum vect_def_type dt0;
> + if (!vect_is_simple_use (vinfo, stmt_info, slp_node, 0, &op0, &slp_op0, &dt0,
> + &vectype_op0))
> + {
> + if (dump_enabled_p ())
> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> + "use not simple.\n");
> + return false;
> + }
> +
> + machine_mode mode = TYPE_MODE (vectype);
> + int ncopies;
> +
> + if (slp_node)
> + ncopies = 1;
> + else
> + ncopies = vect_get_num_copies (loop_vinfo, vectype);
> +
> + vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> + bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
> +
> + /* Analyze only. */
> + if (!vec_stmt)
> + {
> + if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
> + {
> + if (dump_enabled_p ())
> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> + "can't vectorize early exit because the "
> + "target doesn't support flag setting vector "
> + "comparisons.\n");
> + return false;
> + }
> +
> + if (ncopies > 1
> + && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing)
> + {
> + if (dump_enabled_p ())
> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> + "can't vectorize early exit because the "
> + "target does not support boolean vector OR for "
> + "type %T.\n", vectype);
> + return false;
> + }
> +
> + if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
> + vec_stmt, slp_node, cost_vec))
> + return false;
> +
> + if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
> + {
> + if (direct_internal_fn_supported_p (IFN_VCOND_MASK_LEN, vectype,
> + OPTIMIZE_FOR_SPEED))
> + return false;
> + else
> + vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype, NULL);
> + }
> +
> +
> + return true;
> + }
> +
> + /* Transform.  */
> +
> + tree new_temp = NULL_TREE;
> + gimple *new_stmt = NULL;
> +
> + if (dump_enabled_p ())
> + dump_printf_loc (MSG_NOTE, vect_location, "transform early-exit.\n");
> +
> + if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
> + vec_stmt, slp_node, cost_vec))
> + gcc_unreachable ();
> +
> + gimple *stmt = STMT_VINFO_STMT (stmt_info);
> + basic_block cond_bb = gimple_bb (stmt);
> + gimple_stmt_iterator cond_gsi = gsi_last_bb (cond_bb);
> +
> + auto_vec<tree> stmts;
> +
> + if (slp_node)
> + stmts.safe_splice (SLP_TREE_VEC_DEFS (slp_node));
> + else
> + {
> + auto vec_stmts = STMT_VINFO_VEC_STMTS (stmt_info);
> + stmts.reserve_exact (vec_stmts.length ());
> + for (auto stmt : vec_stmts)
> + stmts.quick_push (gimple_assign_lhs (stmt));
> + }
> +
> + /* Determine if we need to reduce the final value. */
> + if (stmts.length () > 1)
> + {
> + /* We build the reductions in a way to maintain as much parallelism as
> + possible. */
> + auto_vec<tree> workset (stmts.length ());
> + workset.splice (stmts);
> + while (workset.length () > 1)
> + {
> + new_temp = make_temp_ssa_name (vectype, NULL, "vexit_reduc");
> + tree arg0 = workset.pop ();
> + tree arg1 = workset.pop ();
> + new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0, arg1);
> + vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
> + &cond_gsi);
> + workset.quick_insert (0, new_temp);
> + }
> + }
> + else
> + new_temp = stmts[0];
> +
> + gcc_assert (new_temp);
> +
> + tree cond = new_temp;
> + /* If we have multiple statements after reduction we should check all the
> + lanes and treat it as a full vector. */
> + if (masked_loop_p)
> + {
> + tree mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies,
> + vectype, 0);
> + cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
> + &cond_gsi);
> + }
> +
> + /* Now build the new conditional. Pattern gimple_conds get dropped during
> + codegen so we must replace the original insn. */
> + stmt = STMT_VINFO_STMT (vect_orig_stmt (stmt_info));
> + gcond *cond_stmt = as_a <gcond *>(stmt);
> + gimple_cond_set_condition (cond_stmt, NE_EXPR, cond,
> + build_zero_cst (vectype));
> + update_stmt (stmt);
> +
> + if (slp_node)
> + SLP_TREE_VEC_DEFS (slp_node).truncate (0);
> + else
> + STMT_VINFO_VEC_STMTS (stmt_info).truncate (0);
> +
> +
> + if (!slp_node)
> + *vec_stmt = stmt;
> +
> + return true;
> +}
> +
> /* If SLP_NODE is nonnull, return true if vectorizable_live_operation
> can handle all live statements in the node. Otherwise return true
> if STMT_INFO is not live or if vectorizable_live_operation can handle it.
> @@ -12949,7 +13123,9 @@ vect_analyze_stmt (vec_info *vinfo,
> || vectorizable_lc_phi (as_a <loop_vec_info> (vinfo),
> stmt_info, NULL, node)
> || vectorizable_recurr (as_a <loop_vec_info> (vinfo),
> - stmt_info, NULL, node, cost_vec));
> + stmt_info, NULL, node, cost_vec)
> + || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
> + cost_vec));
> else
> {
> if (bb_vinfo)
> @@ -12972,7 +13148,10 @@ vect_analyze_stmt (vec_info *vinfo,
> NULL, NULL, node, cost_vec)
> || vectorizable_comparison (vinfo, stmt_info, NULL, NULL, node,
> cost_vec)
> - || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec));
> + || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec)
> + || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
> + cost_vec));
> +
> }
>
> if (node)
> @@ -13131,6 +13310,12 @@ vect_transform_stmt (vec_info *vinfo,
> gcc_assert (done);
> break;
>
> + case loop_exit_ctrl_vec_info_type:
> + done = vectorizable_early_exit (vinfo, stmt_info, gsi, &vec_stmt,
> + slp_node, NULL);
> + gcc_assert (done);
> + break;
> +
> default:
> if (!STMT_VINFO_LIVE_P (stmt_info))
> {
> @@ -14321,10 +14506,19 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
> }
> else
> {
> + gcond *cond = NULL;
> if (data_reference *dr = STMT_VINFO_DATA_REF (stmt_info))
> scalar_type = TREE_TYPE (DR_REF (dr));
> else if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
> scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
> + else if ((cond = dyn_cast <gcond *> (stmt)))
> + {
> + /* We can't convert the scalar type to boolean yet, since booleans have a
> + single bit precision and we need the vector boolean to be a
> + representation of the integer mask. So set the correct integer type and
> + convert to boolean vector once we have a vectype. */
> + scalar_type = TREE_TYPE (gimple_cond_lhs (cond));
> + }
> else
> scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
>
> @@ -14339,12 +14533,18 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
> "get vectype for scalar type: %T\n", scalar_type);
> }
> vectype = get_vectype_for_scalar_type (vinfo, scalar_type, group_size);
> +
> if (!vectype)
> return opt_result::failure_at (stmt,
> "not vectorized:"
> " unsupported data-type %T\n",
> scalar_type);
>
> + /* If we were a gcond, convert the resulting type to a vector boolean type now
> + that we have the correct integer mask type. */
> + if (cond)
> + vectype = truth_type_for (vectype);
> +
> if (dump_enabled_p ())
> dump_printf_loc (MSG_NOTE, vect_location, "vectype: %T\n", vectype);
> }
>
--
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)