Re: [PATCH 1/5] Do single-lane SLP discovery for reductions

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* Re: [PATCH 1/5] Do single-lane SLP discovery for reductions
@ 2024-05-27 12:45 Richard Biener
  2024-05-29  9:08 ` Richard Sandiford
  0 siblings, 1 reply; 3+ messages in thread
From: Richard Biener @ 2024-05-27 12:45 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.sandiford, tamar.christina

On Fri, 24 May 2024, Richard Biener wrote:

> This is the second merge proposed from the SLP vectorizer branch.
> I have again managed without adding and using --param vect-single-lane-slp
> but instead this provides always enabled functionality.
> 
> This makes us use SLP reductions (a group of reductions) for the
> case where the group size is one.  This basically means we try
> to use SLP for all reductions.
> 
> I've kept the series close to changes how they are on the branch
> but in the end I'll squash it, having separate commits for review
> eventually helps identifying common issues we will run into.  In
> particular we lack full SLP support for several reduction kinds
> and the branch has more enabling patches than in this series.
> For example 4/5 makes sure we use shifts and direct opcode
> reductions in the reduction epilog for SLP reductions but doesn't
> bother to try covering the general case but enables it only
> for the single-element group case to avoid regressions
> in gcc.dg/vect/reduc-{mul,or}_[12].c testcases.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu, I've also
> successfully built SPEC CPU 2017.  This posting should trigger
> arm & riscv pre-checkin CI.
> 
> There's one ICE in gcc.target/i386/pr51235.c I discovered late
> that I will investigate and address after the weekend.

I've fixed this now.

On aarch64 and arm there's

FAIL: gcc.dg/vect/slp-reduc-3.c scan-tree-dump-times vect "VEC_PERM_EXPR" 
0

which is a testism, I _think_ due to a bogus vect_load_lanes check
in that line.  The code is as expected not using a SLP reduction of
two lanes due to the widen-sum pattern used.  It might be that we
somehow fail to use load-lanes when vectorizing the load with SLP
which means that for SLP reductions we fail to consider
load-lanes as override.  I think we should leave this FAIL, we need to
work to get load-lanes vectorization from SLP anyway.  To fix this
the load-permutation followup I have in the works will be necessary.

I also see

FAIL: gcc.target/aarch64/sve/dot_1.c scan-assembler-times \\twhilelo\\t 8
FAIL: gcc.target/aarch64/sve/reduc_4.c scan-assembler-not \\tfadd\\t
FAIL: gcc.target/aarch64/sve/sad_1.c scan-assembler-times 
\\tudot\\tz[0-9]+\\.s, z[0-9]+\\.b, z[0-9]+\\.b\\n 2

but scan-assemblers are not my favorite.  For example dot_1.c has
twice as many whilelo, but I'm not sure what goes wrong.

There are quite some regressions reported for RISC-V, I looked at the
ICEs and fixed them but I did not investigate any of the assembly
scanning FAILs.

I'll re-spin the series with the fixes tomorrow.
If anybody wants to point out something I should investigate please
speak up.

Thanks,
Richard.

> This change should be more straight-forward than the previous one,
> still comments are of course welcome.  After pushed I will followup
> with changes to enable single-lane SLP reductions for various
> COND_EXPR reductions as well as double-reduction support and
> in-order reduction support (also all restricted to single-lane
> for the moment).
>
> Thanks,
> Richard.
> 
> --
> 
> The following performs single-lane SLP discovery for reductions.
> This exposes a latent issue with reduction SLP in outer loop
> vectorization and makes gcc.dg/vect/vect-outer-4[fgkl].c FAIL
> execution.
> 
> 	* tree-vect-slp.cc (vect_build_slp_tree_2): Only multi-lane
> 	discoveries are reduction chains and need special backedge
> 	treatment.
> 	(vect_analyze_slp): Fall back to single-lane SLP discovery
> 	for reductions.  Make sure to try single-lane SLP reduction
> 	for all reductions as fallback.
> ---
>  gcc/tree-vect-slp.cc | 71 +++++++++++++++++++++++++++++++++-----------
>  1 file changed, 54 insertions(+), 17 deletions(-)
> 
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index c7ed520b629..73cc69d85ce 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -1907,7 +1907,8 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
>  	    /* Reduction chain backedge defs are filled manually.
>  	       ???  Need a better way to identify a SLP reduction chain PHI.
>  	       Or a better overall way to SLP match those.  */
> -	    if (all_same && def_type == vect_reduction_def)
> +	    if (stmts.length () > 1
> +		&& all_same && def_type == vect_reduction_def)
>  	      skip_args[loop_latch_edge (loop)->dest_idx] = true;
>  	  }
>  	else if (def_type != vect_internal_def)
> @@ -3905,9 +3906,10 @@ vect_analyze_slp (vec_info *vinfo, unsigned max_tree_size)
>  	  }
>  
>        /* Find SLP sequences starting from groups of reductions.  */
> -      if (loop_vinfo->reductions.length () > 1)
> +      if (loop_vinfo->reductions.length () > 0)
>  	{
> -	  /* Collect reduction statements.  */
> +	  /* Collect reduction statements we can combine into
> +	     a SLP reduction.  */
>  	  vec<stmt_vec_info> scalar_stmts;
>  	  scalar_stmts.create (loop_vinfo->reductions.length ());
>  	  for (auto next_info : loop_vinfo->reductions)
> @@ -3920,25 +3922,60 @@ vect_analyze_slp (vec_info *vinfo, unsigned max_tree_size)
>  		     reduction path.  In that case we'd have to reverse
>  		     engineer that conversion stmt following the chain using
>  		     reduc_idx and from the PHI using reduc_def.  */
> -		  && STMT_VINFO_DEF_TYPE (next_info) == vect_reduction_def
> -		  /* Do not discover SLP reductions for lane-reducing ops, that
> -		     will fail later.  */
> -		  && (!(g = dyn_cast <gassign *> (STMT_VINFO_STMT (next_info)))
> +		  && STMT_VINFO_DEF_TYPE (next_info) == vect_reduction_def)
> +		{
> +		  /* Do not discover SLP reductions combining lane-reducing
> +		     ops, that will fail later.  */
> +		  if (!(g = dyn_cast <gassign *> (STMT_VINFO_STMT (next_info)))
>  		      || (gimple_assign_rhs_code (g) != DOT_PROD_EXPR
>  			  && gimple_assign_rhs_code (g) != WIDEN_SUM_EXPR
> -			  && gimple_assign_rhs_code (g) != SAD_EXPR)))
> -		scalar_stmts.quick_push (next_info);
> +			  && gimple_assign_rhs_code (g) != SAD_EXPR))
> +		    scalar_stmts.quick_push (next_info);
> +		  else
> +		    {
> +		      /* Do SLP discovery for single-lane reductions.  */
> +		      vec<stmt_vec_info> stmts;
> +		      vec<stmt_vec_info> roots = vNULL;
> +		      vec<tree> remain = vNULL;
> +		      stmts.create (1);
> +		      stmts.quick_push (next_info);
> +		      vect_build_slp_instance (vinfo,
> +					       slp_inst_kind_reduc_group,
> +					       stmts, roots, remain,
> +					       max_tree_size, &limit,
> +					       bst_map, NULL);
> +		    }
> +		}
>  	    }
> -	  if (scalar_stmts.length () > 1)
> +	  /* Save for re-processing on failure.  */
> +	  vec<stmt_vec_info> saved_stmts = scalar_stmts.copy ();
> +	  vec<stmt_vec_info> roots = vNULL;
> +	  vec<tree> remain = vNULL;
> +	  if (scalar_stmts.length () <= 1
> +	      || !vect_build_slp_instance (loop_vinfo,
> +					   slp_inst_kind_reduc_group,
> +					   scalar_stmts, roots, remain,
> +					   max_tree_size, &limit, bst_map,
> +					   NULL))
>  	    {
> -	      vec<stmt_vec_info> roots = vNULL;
> -	      vec<tree> remain = vNULL;
> -	      vect_build_slp_instance (loop_vinfo, slp_inst_kind_reduc_group,
> -				       scalar_stmts, roots, remain,
> -				       max_tree_size, &limit, bst_map, NULL);
> +	      if (scalar_stmts.length () <= 1)
> +		scalar_stmts.release ();
> +	      /* Do SLP discovery for single-lane reductions.  */
> +	      for (auto stmt_info : saved_stmts)
> +		{
> +		  vec<stmt_vec_info> stmts;
> +		  vec<stmt_vec_info> roots = vNULL;
> +		  vec<tree> remain = vNULL;
> +		  stmts.create (1);
> +		  stmts.quick_push (vect_stmt_to_vectorize (stmt_info));
> +		  vect_build_slp_instance (vinfo,
> +					   slp_inst_kind_reduc_group,
> +					   stmts, roots, remain,
> +					   max_tree_size, &limit,
> +					   bst_map, NULL);
> +		}
> +	      saved_stmts.release ();
>  	    }
> -	  else
> -	    scalar_stmts.release ();
>  	}
>      }
>  
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH 1/5] Do single-lane SLP discovery for reductions
  2024-05-27 12:45 [PATCH 1/5] Do single-lane SLP discovery for reductions Richard Biener
@ 2024-05-29  9:08 ` Richard Sandiford
  0 siblings, 0 replies; 3+ messages in thread
From: Richard Sandiford @ 2024-05-29  9:08 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, tamar.christina

Richard Biener <rguenther@suse.de> writes:
> On Fri, 24 May 2024, Richard Biener wrote:
>
>> This is the second merge proposed from the SLP vectorizer branch.
>> I have again managed without adding and using --param vect-single-lane-slp
>> but instead this provides always enabled functionality.
>> 
>> This makes us use SLP reductions (a group of reductions) for the
>> case where the group size is one.  This basically means we try
>> to use SLP for all reductions.
>> 
>> I've kept the series close to changes how they are on the branch
>> but in the end I'll squash it, having separate commits for review
>> eventually helps identifying common issues we will run into.  In
>> particular we lack full SLP support for several reduction kinds
>> and the branch has more enabling patches than in this series.
>> For example 4/5 makes sure we use shifts and direct opcode
>> reductions in the reduction epilog for SLP reductions but doesn't
>> bother to try covering the general case but enables it only
>> for the single-element group case to avoid regressions
>> in gcc.dg/vect/reduc-{mul,or}_[12].c testcases.
>> 
>> Bootstrapped and tested on x86_64-unknown-linux-gnu, I've also
>> successfully built SPEC CPU 2017.  This posting should trigger
>> arm & riscv pre-checkin CI.
>> 
>> There's one ICE in gcc.target/i386/pr51235.c I discovered late
>> that I will investigate and address after the weekend.
>
> I've fixed this now.
>
> On aarch64 and arm there's
>
> FAIL: gcc.dg/vect/slp-reduc-3.c scan-tree-dump-times vect "VEC_PERM_EXPR" 
> 0
>
> which is a testism, I _think_ due to a bogus vect_load_lanes check
> in that line.  The code is as expected not using a SLP reduction of
> two lanes due to the widen-sum pattern used.  It might be that we
> somehow fail to use load-lanes when vectorizing the load with SLP
> which means that for SLP reductions we fail to consider
> load-lanes as override.  I think we should leave this FAIL, we need to
> work to get load-lanes vectorization from SLP anyway.  To fix this
> the load-permutation followup I have in the works will be necessary.

Sounds good to me FWIW.

> I also see
>
> FAIL: gcc.target/aarch64/sve/dot_1.c scan-assembler-times \\twhilelo\\t 8
> FAIL: gcc.target/aarch64/sve/reduc_4.c scan-assembler-not \\tfadd\\t
> FAIL: gcc.target/aarch64/sve/sad_1.c scan-assembler-times 
> \\tudot\\tz[0-9]+\\.s, z[0-9]+\\.b, z[0-9]+\\.b\\n 2
>
> but scan-assemblers are not my favorite.  For example dot_1.c has
> twice as many whilelo, but I'm not sure what goes wrong.
>
> There are quite some regressions reported for RISC-V, I looked at the
> ICEs and fixed them but I did not investigate any of the assembly
> scanning FAILs.
>
> I'll re-spin the series with the fixes tomorrow.
> If anybody wants to point out something I should investigate please
> speak up.

Thanks for checking the aarch64 results.  I'll look at the three SVE
failures once the patch is in.  Many of the tests are supposed to ensure
that we generate correct code for a given set of choices.  Sometimes
it's necessary to update the flags to retain the same of choices,
e.g. due to costing changes or general vectoriser improvements.

That is, the point of these tests isn't necessarily to make sure that we
get the "best" SVE code for the source -- especially since there isn't
really an abstract, objective "best" that applies to all targets.
The tests are instead reognising that we have mulitple techniques for
doing some things, and are trying to make sure that each of those
techniques works individually.

Realise that kind of test isn't popular with everyone.  The quid
pro quo is that we (AArch64 folks) get to look at the tests when
failures show up :)

Richard

>
> Thanks,
> Richard.
>
>> This change should be more straight-forward than the previous one,
>> still comments are of course welcome.  After pushed I will followup
>> with changes to enable single-lane SLP reductions for various
>> COND_EXPR reductions as well as double-reduction support and
>> in-order reduction support (also all restricted to single-lane
>> for the moment).
>>
>> Thanks,
>> Richard.
>> 
>> --
>> 
>> The following performs single-lane SLP discovery for reductions.
>> This exposes a latent issue with reduction SLP in outer loop
>> vectorization and makes gcc.dg/vect/vect-outer-4[fgkl].c FAIL
>> execution.
>> 
>> 	* tree-vect-slp.cc (vect_build_slp_tree_2): Only multi-lane
>> 	discoveries are reduction chains and need special backedge
>> 	treatment.
>> 	(vect_analyze_slp): Fall back to single-lane SLP discovery
>> 	for reductions.  Make sure to try single-lane SLP reduction
>> 	for all reductions as fallback.
>> ---
>>  gcc/tree-vect-slp.cc | 71 +++++++++++++++++++++++++++++++++-----------
>>  1 file changed, 54 insertions(+), 17 deletions(-)
>> 
>> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
>> index c7ed520b629..73cc69d85ce 100644
>> --- a/gcc/tree-vect-slp.cc
>> +++ b/gcc/tree-vect-slp.cc
>> @@ -1907,7 +1907,8 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
>>  	    /* Reduction chain backedge defs are filled manually.
>>  	       ???  Need a better way to identify a SLP reduction chain PHI.
>>  	       Or a better overall way to SLP match those.  */
>> -	    if (all_same && def_type == vect_reduction_def)
>> +	    if (stmts.length () > 1
>> +		&& all_same && def_type == vect_reduction_def)
>>  	      skip_args[loop_latch_edge (loop)->dest_idx] = true;
>>  	  }
>>  	else if (def_type != vect_internal_def)
>> @@ -3905,9 +3906,10 @@ vect_analyze_slp (vec_info *vinfo, unsigned max_tree_size)
>>  	  }
>>  
>>        /* Find SLP sequences starting from groups of reductions.  */
>> -      if (loop_vinfo->reductions.length () > 1)
>> +      if (loop_vinfo->reductions.length () > 0)
>>  	{
>> -	  /* Collect reduction statements.  */
>> +	  /* Collect reduction statements we can combine into
>> +	     a SLP reduction.  */
>>  	  vec<stmt_vec_info> scalar_stmts;
>>  	  scalar_stmts.create (loop_vinfo->reductions.length ());
>>  	  for (auto next_info : loop_vinfo->reductions)
>> @@ -3920,25 +3922,60 @@ vect_analyze_slp (vec_info *vinfo, unsigned max_tree_size)
>>  		     reduction path.  In that case we'd have to reverse
>>  		     engineer that conversion stmt following the chain using
>>  		     reduc_idx and from the PHI using reduc_def.  */
>> -		  && STMT_VINFO_DEF_TYPE (next_info) == vect_reduction_def
>> -		  /* Do not discover SLP reductions for lane-reducing ops, that
>> -		     will fail later.  */
>> -		  && (!(g = dyn_cast <gassign *> (STMT_VINFO_STMT (next_info)))
>> +		  && STMT_VINFO_DEF_TYPE (next_info) == vect_reduction_def)
>> +		{
>> +		  /* Do not discover SLP reductions combining lane-reducing
>> +		     ops, that will fail later.  */
>> +		  if (!(g = dyn_cast <gassign *> (STMT_VINFO_STMT (next_info)))
>>  		      || (gimple_assign_rhs_code (g) != DOT_PROD_EXPR
>>  			  && gimple_assign_rhs_code (g) != WIDEN_SUM_EXPR
>> -			  && gimple_assign_rhs_code (g) != SAD_EXPR)))
>> -		scalar_stmts.quick_push (next_info);
>> +			  && gimple_assign_rhs_code (g) != SAD_EXPR))
>> +		    scalar_stmts.quick_push (next_info);
>> +		  else
>> +		    {
>> +		      /* Do SLP discovery for single-lane reductions.  */
>> +		      vec<stmt_vec_info> stmts;
>> +		      vec<stmt_vec_info> roots = vNULL;
>> +		      vec<tree> remain = vNULL;
>> +		      stmts.create (1);
>> +		      stmts.quick_push (next_info);
>> +		      vect_build_slp_instance (vinfo,
>> +					       slp_inst_kind_reduc_group,
>> +					       stmts, roots, remain,
>> +					       max_tree_size, &limit,
>> +					       bst_map, NULL);
>> +		    }
>> +		}
>>  	    }
>> -	  if (scalar_stmts.length () > 1)
>> +	  /* Save for re-processing on failure.  */
>> +	  vec<stmt_vec_info> saved_stmts = scalar_stmts.copy ();
>> +	  vec<stmt_vec_info> roots = vNULL;
>> +	  vec<tree> remain = vNULL;
>> +	  if (scalar_stmts.length () <= 1
>> +	      || !vect_build_slp_instance (loop_vinfo,
>> +					   slp_inst_kind_reduc_group,
>> +					   scalar_stmts, roots, remain,
>> +					   max_tree_size, &limit, bst_map,
>> +					   NULL))
>>  	    {
>> -	      vec<stmt_vec_info> roots = vNULL;
>> -	      vec<tree> remain = vNULL;
>> -	      vect_build_slp_instance (loop_vinfo, slp_inst_kind_reduc_group,
>> -				       scalar_stmts, roots, remain,
>> -				       max_tree_size, &limit, bst_map, NULL);
>> +	      if (scalar_stmts.length () <= 1)
>> +		scalar_stmts.release ();
>> +	      /* Do SLP discovery for single-lane reductions.  */
>> +	      for (auto stmt_info : saved_stmts)
>> +		{
>> +		  vec<stmt_vec_info> stmts;
>> +		  vec<stmt_vec_info> roots = vNULL;
>> +		  vec<tree> remain = vNULL;
>> +		  stmts.create (1);
>> +		  stmts.quick_push (vect_stmt_to_vectorize (stmt_info));
>> +		  vect_build_slp_instance (vinfo,
>> +					   slp_inst_kind_reduc_group,
>> +					   stmts, roots, remain,
>> +					   max_tree_size, &limit,
>> +					   bst_map, NULL);
>> +		}
>> +	      saved_stmts.release ();
>>  	    }
>> -	  else
>> -	    scalar_stmts.release ();
>>  	}
>>      }
>>  
>> 

^ permalink raw reply	[flat|nested] 3+ messages in thread

* [PATCH 1/5] Do single-lane SLP discovery for reductions
@ 2024-05-24 13:42 Richard Biener
  0 siblings, 0 replies; 3+ messages in thread
From: Richard Biener @ 2024-05-24 13:42 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.sandiford, tamar.christina

This is the second merge proposed from the SLP vectorizer branch.
I have again managed without adding and using --param vect-single-lane-slp
but instead this provides always enabled functionality.

This makes us use SLP reductions (a group of reductions) for the
case where the group size is one.  This basically means we try
to use SLP for all reductions.

I've kept the series close to changes how they are on the branch
but in the end I'll squash it, having separate commits for review
eventually helps identifying common issues we will run into.  In
particular we lack full SLP support for several reduction kinds
and the branch has more enabling patches than in this series.
For example 4/5 makes sure we use shifts and direct opcode
reductions in the reduction epilog for SLP reductions but doesn't
bother to try covering the general case but enables it only
for the single-element group case to avoid regressions
in gcc.dg/vect/reduc-{mul,or}_[12].c testcases.

Bootstrapped and tested on x86_64-unknown-linux-gnu, I've also
successfully built SPEC CPU 2017.  This posting should trigger
arm & riscv pre-checkin CI.

There's one ICE in gcc.target/i386/pr51235.c I discovered late
that I will investigate and address after the weekend.

This change should be more straight-forward than the previous one,
still comments are of course welcome.  After pushed I will followup
with changes to enable single-lane SLP reductions for various
COND_EXPR reductions as well as double-reduction support and
in-order reduction support (also all restricted to single-lane
for the moment).

Thanks,
Richard.

--

The following performs single-lane SLP discovery for reductions.
This exposes a latent issue with reduction SLP in outer loop
vectorization and makes gcc.dg/vect/vect-outer-4[fgkl].c FAIL
execution.

	* tree-vect-slp.cc (vect_build_slp_tree_2): Only multi-lane
	discoveries are reduction chains and need special backedge
	treatment.
	(vect_analyze_slp): Fall back to single-lane SLP discovery
	for reductions.  Make sure to try single-lane SLP reduction
	for all reductions as fallback.
---
 gcc/tree-vect-slp.cc | 71 +++++++++++++++++++++++++++++++++-----------
 1 file changed, 54 insertions(+), 17 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index c7ed520b629..73cc69d85ce 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -1907,7 +1907,8 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
 	    /* Reduction chain backedge defs are filled manually.
 	       ???  Need a better way to identify a SLP reduction chain PHI.
 	       Or a better overall way to SLP match those.  */
-	    if (all_same && def_type == vect_reduction_def)
+	    if (stmts.length () > 1
+		&& all_same && def_type == vect_reduction_def)
 	      skip_args[loop_latch_edge (loop)->dest_idx] = true;
 	  }
 	else if (def_type != vect_internal_def)
@@ -3905,9 +3906,10 @@ vect_analyze_slp (vec_info *vinfo, unsigned max_tree_size)
 	  }
 
       /* Find SLP sequences starting from groups of reductions.  */
-      if (loop_vinfo->reductions.length () > 1)
+      if (loop_vinfo->reductions.length () > 0)
 	{
-	  /* Collect reduction statements.  */
+	  /* Collect reduction statements we can combine into
+	     a SLP reduction.  */
 	  vec<stmt_vec_info> scalar_stmts;
 	  scalar_stmts.create (loop_vinfo->reductions.length ());
 	  for (auto next_info : loop_vinfo->reductions)
@@ -3920,25 +3922,60 @@ vect_analyze_slp (vec_info *vinfo, unsigned max_tree_size)
 		     reduction path.  In that case we'd have to reverse
 		     engineer that conversion stmt following the chain using
 		     reduc_idx and from the PHI using reduc_def.  */
-		  && STMT_VINFO_DEF_TYPE (next_info) == vect_reduction_def
-		  /* Do not discover SLP reductions for lane-reducing ops, that
-		     will fail later.  */
-		  && (!(g = dyn_cast <gassign *> (STMT_VINFO_STMT (next_info)))
+		  && STMT_VINFO_DEF_TYPE (next_info) == vect_reduction_def)
+		{
+		  /* Do not discover SLP reductions combining lane-reducing
+		     ops, that will fail later.  */
+		  if (!(g = dyn_cast <gassign *> (STMT_VINFO_STMT (next_info)))
 		      || (gimple_assign_rhs_code (g) != DOT_PROD_EXPR
 			  && gimple_assign_rhs_code (g) != WIDEN_SUM_EXPR
-			  && gimple_assign_rhs_code (g) != SAD_EXPR)))
-		scalar_stmts.quick_push (next_info);
+			  && gimple_assign_rhs_code (g) != SAD_EXPR))
+		    scalar_stmts.quick_push (next_info);
+		  else
+		    {
+		      /* Do SLP discovery for single-lane reductions.  */
+		      vec<stmt_vec_info> stmts;
+		      vec<stmt_vec_info> roots = vNULL;
+		      vec<tree> remain = vNULL;
+		      stmts.create (1);
+		      stmts.quick_push (next_info);
+		      vect_build_slp_instance (vinfo,
+					       slp_inst_kind_reduc_group,
+					       stmts, roots, remain,
+					       max_tree_size, &limit,
+					       bst_map, NULL);
+		    }
+		}
 	    }
-	  if (scalar_stmts.length () > 1)
+	  /* Save for re-processing on failure.  */
+	  vec<stmt_vec_info> saved_stmts = scalar_stmts.copy ();
+	  vec<stmt_vec_info> roots = vNULL;
+	  vec<tree> remain = vNULL;
+	  if (scalar_stmts.length () <= 1
+	      || !vect_build_slp_instance (loop_vinfo,
+					   slp_inst_kind_reduc_group,
+					   scalar_stmts, roots, remain,
+					   max_tree_size, &limit, bst_map,
+					   NULL))
 	    {
-	      vec<stmt_vec_info> roots = vNULL;
-	      vec<tree> remain = vNULL;
-	      vect_build_slp_instance (loop_vinfo, slp_inst_kind_reduc_group,
-				       scalar_stmts, roots, remain,
-				       max_tree_size, &limit, bst_map, NULL);
+	      if (scalar_stmts.length () <= 1)
+		scalar_stmts.release ();
+	      /* Do SLP discovery for single-lane reductions.  */
+	      for (auto stmt_info : saved_stmts)
+		{
+		  vec<stmt_vec_info> stmts;
+		  vec<stmt_vec_info> roots = vNULL;
+		  vec<tree> remain = vNULL;
+		  stmts.create (1);
+		  stmts.quick_push (vect_stmt_to_vectorize (stmt_info));
+		  vect_build_slp_instance (vinfo,
+					   slp_inst_kind_reduc_group,
+					   stmts, roots, remain,
+					   max_tree_size, &limit,
+					   bst_map, NULL);
+		}
+	      saved_stmts.release ();
 	    }
-	  else
-	    scalar_stmts.release ();
 	}
     }
 
-- 
2.35.3


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2024-05-29  9:08 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-05-27 12:45 [PATCH 1/5] Do single-lane SLP discovery for reductions Richard Biener
2024-05-29  9:08 ` Richard Sandiford
  -- strict thread matches above, loose matches on Subject: below --
2024-05-24 13:42 Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).