public inbox for gcc-patches@gcc.gnu.org
* Re: [PATCH] tree-optimization/104010 - fix SLP scalar costing with patterns
@ 2022-04-13 13:46 Richard Biener
  0 siblings, 0 replies; 7+ messages in thread
From: Richard Biener @ 2022-04-13 13:46 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.sandiford, richard.earnshaw

On Wed, 13 Apr 2022, Richard Biener wrote:

> When doing BB vectorization the scalar cost compute is derailed
> by patterns, causing lanes to be considered live and thus not
> costed on the scalar side.  For the testcase in PR104010 this
> prevents vectorization which was done by GCC 11.  PR103941
> shows similar cases of missed optimizations that are fixed by
> this patch.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
> 
> I'm only considering this now because PR104010 is identified
> as a regression on arm - Richards, what do you think?  I do think
> this will enable vectorization of more stuff now which might
> be good or bad - who knows, but at least it needs to involve
> patterns.
> 
> Thanks,
> Richard.
> 
> 2022-04-13  Richard Biener  <rguenther@suse.de>
> 
> 	PR tree-optimization/104010
> 	PR tree-optimization/103941
> 	* tree-vect-slp.cc (vect_bb_slp_scalar_cost): When
> 	we run into stmts in patterns continue walking those
> 	for uses outside of the vectorized region instead of
> 	marking the lane live.
> 
> 	* gcc.target/i386/pr103941-1.c: New testcase.
> 	* gcc.target/i386/pr103941-2.c: Likewise.
> ---
>  gcc/testsuite/gcc.target/i386/pr103941-1.c | 14 +++++++
>  gcc/testsuite/gcc.target/i386/pr103941-2.c | 12 ++++++
>  gcc/tree-vect-slp.cc                       | 47 ++++++++++++++++------
>  3 files changed, 61 insertions(+), 12 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr103941-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr103941-2.c
> 
> diff --git a/gcc/testsuite/gcc.target/i386/pr103941-1.c b/gcc/testsuite/gcc.target/i386/pr103941-1.c
> new file mode 100644
> index 00000000000..524fdd0b4b1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr103941-1.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -msse2" } */
> +
> +unsigned char ur[16], ua[16], ub[16];
> +
> +void avgu_v2qi (void)
> +{
> +  int i;
> +
> +  for (i = 0; i < 2; i++)
> +    ur[i] = (ua[i] + ub[i] + 1) >> 1;
> +}
> +
> +/* { dg-final { scan-assembler "pavgb" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr103941-2.c b/gcc/testsuite/gcc.target/i386/pr103941-2.c
> new file mode 100644
> index 00000000000..972a32be997
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr103941-2.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -msse2" } */
> +
> +void foo (int *c, float *x, float *y)
> +{
> +  c[0] = x[0] < y[0];
> +  c[1] = x[1] < y[1];
> +  c[2] = x[2] < y[2];
> +  c[3] = x[3] < y[3];
> +}
> +
> +/* { dg-final { scan-assembler "cmpltps" } } */
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index 4ac2b70303c..c7687065374 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -5185,22 +5185,45 @@ vect_bb_slp_scalar_cost (vec_info *vinfo,
>  	 the scalar cost.  */
>        if (!STMT_VINFO_LIVE_P (stmt_info))
>  	{
> -	  FOR_EACH_PHI_OR_STMT_DEF (def_p, orig_stmt, op_iter, SSA_OP_DEF)
> +	  auto_vec<gimple *, 8> worklist;
> +	  hash_set<gimple *> *worklist_visited = NULL;
> +	  worklist.quick_push (orig_stmt);
> +	  do
>  	    {
> -	      imm_use_iterator use_iter;
> -	      gimple *use_stmt;
> -	      FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, DEF_FROM_PTR (def_p))
> -		if (!is_gimple_debug (use_stmt))
> -		  {
> -		    stmt_vec_info use_stmt_info = vinfo->lookup_stmt (use_stmt);
> -		    if (!use_stmt_info
> -			|| !vectorized_scalar_stmts.contains (use_stmt_info))
> +	      gimple *work_stmt = worklist.pop ();
> +	      FOR_EACH_PHI_OR_STMT_DEF (def_p, work_stmt, op_iter, SSA_OP_DEF)
> +		{
> +		  imm_use_iterator use_iter;
> +		  gimple *use_stmt;
> +		  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter,
> +					 DEF_FROM_PTR (def_p))
> +		    if (!is_gimple_debug (use_stmt))
>  		      {
> -			(*life)[i] = true;
> -			break;
> +			stmt_vec_info use_stmt_info
> +			  = vinfo->lookup_stmt (use_stmt);
> +			if (!use_stmt_info
> +			    || !vectorized_scalar_stmts.contains (use_stmt_info))
> +			  {
> +			    if (STMT_VINFO_IN_PATTERN_P (use_stmt_info))

Building SPEC 2017 reveals a missing "use_stmt_info &&" check here;
consider that fixed.
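
That is, the guard at the start of the pattern branch would read
(sketching the fixed condition, not quoting the committed hunk):

  /* use_stmt_info can be NULL when the use is outside the region
     covered by the vec_info, so check it before the pattern flag.  */
  if (use_stmt_info
      && STMT_VINFO_IN_PATTERN_P (use_stmt_info))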

Richard.


* Re: [PATCH] tree-optimization/104010 - fix SLP scalar costing with patterns
  2022-04-14 13:55     ` Richard Sandiford
@ 2022-04-19  6:59       ` Richard Biener
  0 siblings, 0 replies; 7+ messages in thread
From: Richard Biener @ 2022-04-19  6:59 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc-patches, richard.earnshaw

On Thu, 14 Apr 2022, Richard Sandiford wrote:

> Richard Biener <rguenther@suse.de> writes:
> > On Thu, 14 Apr 2022, Richard Sandiford wrote:
> >
> >> Richard Biener <rguenther@suse.de> writes:
> >> > When doing BB vectorization the scalar cost compute is derailed
> >> > by patterns, causing lanes to be considered live and thus not
> >> > costed on the scalar side.  For the testcase in PR104010 this
> >> > prevents vectorization which was done by GCC 11.  PR103941
> >> > shows similar cases of missed optimizations that are fixed by
> >> > this patch.
> >> >
> >> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> >> >
> >> > I'm only considering this now because PR104010 is identified
> >> > as a regression on arm - Richards, what do you think?  I do think
> >> > this will enable vectorization of more stuff now which might
> >> > be good or bad - who knows, but at least it needs to involve
> >> > patterns.
> >> >
> >> > Thanks,
> >> > Richard.
> >> >
> >> > 2022-04-13  Richard Biener  <rguenther@suse.de>
> >> >
> >> > 	PR tree-optimization/104010
> >> > 	PR tree-optimization/103941
> >> > 	* tree-vect-slp.cc (vect_bb_slp_scalar_cost): When
> >> > 	we run into stmts in patterns continue walking those
> >> > 	for uses outside of the vectorized region instead of
> >> > 	marking the lane live.
> >> >
> >> > 	* gcc.target/i386/pr103941-1.c: New testcase.
> >> > 	* gcc.target/i386/pr103941-2.c: Likewise.
> >> > ---
> >> >  gcc/testsuite/gcc.target/i386/pr103941-1.c | 14 +++++++
> >> >  gcc/testsuite/gcc.target/i386/pr103941-2.c | 12 ++++++
> >> >  gcc/tree-vect-slp.cc                       | 47 ++++++++++++++++------
> >> >  3 files changed, 61 insertions(+), 12 deletions(-)
> >> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr103941-1.c
> >> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr103941-2.c
> >> >
> >> > diff --git a/gcc/testsuite/gcc.target/i386/pr103941-1.c b/gcc/testsuite/gcc.target/i386/pr103941-1.c
> >> > new file mode 100644
> >> > index 00000000000..524fdd0b4b1
> >> > --- /dev/null
> >> > +++ b/gcc/testsuite/gcc.target/i386/pr103941-1.c
> >> > @@ -0,0 +1,14 @@
> >> > +/* { dg-do compile } */
> >> > +/* { dg-options "-O2 -msse2" } */
> >> > +
> >> > +unsigned char ur[16], ua[16], ub[16];
> >> > +
> >> > +void avgu_v2qi (void)
> >> > +{
> >> > +  int i;
> >> > +
> >> > +  for (i = 0; i < 2; i++)
> >> > +    ur[i] = (ua[i] + ub[i] + 1) >> 1;
> >> > +}
> >> > +
> >> > +/* { dg-final { scan-assembler "pavgb" } } */
> >> > diff --git a/gcc/testsuite/gcc.target/i386/pr103941-2.c b/gcc/testsuite/gcc.target/i386/pr103941-2.c
> >> > new file mode 100644
> >> > index 00000000000..972a32be997
> >> > --- /dev/null
> >> > +++ b/gcc/testsuite/gcc.target/i386/pr103941-2.c
> >> > @@ -0,0 +1,12 @@
> >> > +/* { dg-do compile } */
> >> > +/* { dg-options "-O2 -msse2" } */
> >> > +
> >> > +void foo (int *c, float *x, float *y)
> >> > +{
> >> > +  c[0] = x[0] < y[0];
> >> > +  c[1] = x[1] < y[1];
> >> > +  c[2] = x[2] < y[2];
> >> > +  c[3] = x[3] < y[3];
> >> > +}
> >> > +
> >> > +/* { dg-final { scan-assembler "cmpltps" } } */
> >> > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> >> > index 4ac2b70303c..c7687065374 100644
> >> > --- a/gcc/tree-vect-slp.cc
> >> > +++ b/gcc/tree-vect-slp.cc
> >> > @@ -5185,22 +5185,45 @@ vect_bb_slp_scalar_cost (vec_info *vinfo,
> >> >  	 the scalar cost.  */
> >> >        if (!STMT_VINFO_LIVE_P (stmt_info))
> >> >  	{
> >> > -	  FOR_EACH_PHI_OR_STMT_DEF (def_p, orig_stmt, op_iter, SSA_OP_DEF)
> >> > +	  auto_vec<gimple *, 8> worklist;
> >> > +	  hash_set<gimple *> *worklist_visited = NULL;
> >> > +	  worklist.quick_push (orig_stmt);
> >> > +	  do
> >> >  	    {
> >> > -	      imm_use_iterator use_iter;
> >> > -	      gimple *use_stmt;
> >> > -	      FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, DEF_FROM_PTR (def_p))
> >> > -		if (!is_gimple_debug (use_stmt))
> >> > -		  {
> >> > -		    stmt_vec_info use_stmt_info = vinfo->lookup_stmt (use_stmt);
> >> > -		    if (!use_stmt_info
> >> > -			|| !vectorized_scalar_stmts.contains (use_stmt_info))
> >> > +	      gimple *work_stmt = worklist.pop ();
> >> > +	      FOR_EACH_PHI_OR_STMT_DEF (def_p, work_stmt, op_iter, SSA_OP_DEF)
> >> > +		{
> >> > +		  imm_use_iterator use_iter;
> >> > +		  gimple *use_stmt;
> >> > +		  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter,
> >> > +					 DEF_FROM_PTR (def_p))
> >> > +		    if (!is_gimple_debug (use_stmt))
> >> >  		      {
> >> > -			(*life)[i] = true;
> >> > -			break;
> >> > +			stmt_vec_info use_stmt_info
> >> > +			  = vinfo->lookup_stmt (use_stmt);
> >> > +			if (!use_stmt_info
> >> > +			    || !vectorized_scalar_stmts.contains (use_stmt_info))
> >> > +			  {
> >> > +			    if (STMT_VINFO_IN_PATTERN_P (use_stmt_info))
> >> > +			      {
> >> 
> >> I guess I should walk through the testcase and figure it out for myself,
> >> but: I assume vectorized_scalar_stmts exists because not every statement
> >> we've considered vectorising has made the cut.  Isn't that also true
> >> of (original) scalar statements that would have been vectorised using
> >> patterns?
> >> 
> >> Does vectorized_scalar_stmts record original statements or statements to
> >> vectorise?  From its name I'd have assumed original statements, in which
> >> case I wouldn't have expected IN_PATTERN_P to need special handling.
> >
> > vectorized_scalar_stmts records statements to vectorize, and yes, that's
> > done because PURE_SLP isn't accurate when we promoted SLP nodes to
> > extern because they failed to vectorize.
> 
> Ah, OK.
> 
> > But it's true that in case a pattern is composed from multiple scalar
> > stmts we only cost the scalar root of the pattern.  As I said in the PR
> > the whole thing is up for another rewrite - we're basically trying to
> > compute the "scalar stmt cover" of the vectorized stmts and then
> > try to exclude those scalar stmts in the cover that have uses outside
> > of it.  Currently we split this problem in an odd way that makes
> > doing it correctly difficult - I hope there's a better way but I
> > haven't yet found the time to do it.  Ideally we'd also merge it with
> > the copy that computes live lanes to be code generated as element
> > extracts.  And IIRC there's another variant somewhere, I think with
> > the SLP graph partitioning.
> >
> > If you have good ideas or want to give it a stab be welcome ;)
> 
> No, no good ideas.  I just wasn't sure why, if vectorized_scalar_stmts
> records statements to vectorize, we couldn't just use
> vect_stmt_to_vectorize (use_stmt_info).  But maybe that isn't
> reliable for SLP.

It could do that but I don't think that would change anything here
since we still have no good way to get at all scalar stmts the pattern
is composed of (and we don't record this set).

I've tried the "obvious" approach, figuring out the scalar leaves of the
SLP tree by an SLP DFS walk and then simply following the scalar SSA
use-def chain until I hit those, but that's prone to derailing into
address computation.  So it seems the "true" solution will involve
mimicking vect_get_and_check_slp_defs for this use-def chain walk at
least.  Or, rather than trying to re-do this at costing time, make sure
to record enough info when building the SLP tree (and when doing
pattern recog).

> Anyway, you know this code much better than me :-), so I've certainly
> no objections to the patch for GCC 12.  We've done some aarch64 costing
> work recently, but I don't remember a case in which we relied on or
> worked around the specific behaviour being patched here.

OK, so for GCC 12 I think it's this patch or leave it unfixed.  I'm going
to ponder for a few hours, maybe throw a dice.

Thanks,
Richard.


* Re: [PATCH] tree-optimization/104010 - fix SLP scalar costing with patterns
  2022-04-14 12:51   ` Richard Biener
@ 2022-04-14 13:55     ` Richard Sandiford
  2022-04-19  6:59       ` Richard Biener
  0 siblings, 1 reply; 7+ messages in thread
From: Richard Sandiford @ 2022-04-14 13:55 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, richard.earnshaw

Richard Biener <rguenther@suse.de> writes:
> On Thu, 14 Apr 2022, Richard Sandiford wrote:
>
>> Richard Biener <rguenther@suse.de> writes:
>> > When doing BB vectorization the scalar cost compute is derailed
>> > by patterns, causing lanes to be considered live and thus not
>> > costed on the scalar side.  For the testcase in PR104010 this
>> > prevents vectorization which was done by GCC 11.  PR103941
>> > shows similar cases of missed optimizations that are fixed by
>> > this patch.
>> >
>> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
>> >
>> > I'm only considering this now because PR104010 is identified
>> > as a regression on arm - Richards, what do you think?  I do think
>> > this will enable vectorization of more stuff now which might
>> > be good or bad - who knows, but at least it needs to involve
>> > patterns.
>> >
>> > Thanks,
>> > Richard.
>> >
>> > 2022-04-13  Richard Biener  <rguenther@suse.de>
>> >
>> > 	PR tree-optimization/104010
>> > 	PR tree-optimization/103941
>> > 	* tree-vect-slp.cc (vect_bb_slp_scalar_cost): When
>> > 	we run into stmts in patterns continue walking those
>> > 	for uses outside of the vectorized region instead of
>> > 	marking the lane live.
>> >
>> > 	* gcc.target/i386/pr103941-1.c: New testcase.
>> > 	* gcc.target/i386/pr103941-2.c: Likewise.
>> > ---
>> >  gcc/testsuite/gcc.target/i386/pr103941-1.c | 14 +++++++
>> >  gcc/testsuite/gcc.target/i386/pr103941-2.c | 12 ++++++
>> >  gcc/tree-vect-slp.cc                       | 47 ++++++++++++++++------
>> >  3 files changed, 61 insertions(+), 12 deletions(-)
>> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr103941-1.c
>> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr103941-2.c
>> >
>> > diff --git a/gcc/testsuite/gcc.target/i386/pr103941-1.c b/gcc/testsuite/gcc.target/i386/pr103941-1.c
>> > new file mode 100644
>> > index 00000000000..524fdd0b4b1
>> > --- /dev/null
>> > +++ b/gcc/testsuite/gcc.target/i386/pr103941-1.c
>> > @@ -0,0 +1,14 @@
>> > +/* { dg-do compile } */
>> > +/* { dg-options "-O2 -msse2" } */
>> > +
>> > +unsigned char ur[16], ua[16], ub[16];
>> > +
>> > +void avgu_v2qi (void)
>> > +{
>> > +  int i;
>> > +
>> > +  for (i = 0; i < 2; i++)
>> > +    ur[i] = (ua[i] + ub[i] + 1) >> 1;
>> > +}
>> > +
>> > +/* { dg-final { scan-assembler "pavgb" } } */
>> > diff --git a/gcc/testsuite/gcc.target/i386/pr103941-2.c b/gcc/testsuite/gcc.target/i386/pr103941-2.c
>> > new file mode 100644
>> > index 00000000000..972a32be997
>> > --- /dev/null
>> > +++ b/gcc/testsuite/gcc.target/i386/pr103941-2.c
>> > @@ -0,0 +1,12 @@
>> > +/* { dg-do compile } */
>> > +/* { dg-options "-O2 -msse2" } */
>> > +
>> > +void foo (int *c, float *x, float *y)
>> > +{
>> > +  c[0] = x[0] < y[0];
>> > +  c[1] = x[1] < y[1];
>> > +  c[2] = x[2] < y[2];
>> > +  c[3] = x[3] < y[3];
>> > +}
>> > +
>> > +/* { dg-final { scan-assembler "cmpltps" } } */
>> > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
>> > index 4ac2b70303c..c7687065374 100644
>> > --- a/gcc/tree-vect-slp.cc
>> > +++ b/gcc/tree-vect-slp.cc
>> > @@ -5185,22 +5185,45 @@ vect_bb_slp_scalar_cost (vec_info *vinfo,
>> >  	 the scalar cost.  */
>> >        if (!STMT_VINFO_LIVE_P (stmt_info))
>> >  	{
>> > -	  FOR_EACH_PHI_OR_STMT_DEF (def_p, orig_stmt, op_iter, SSA_OP_DEF)
>> > +	  auto_vec<gimple *, 8> worklist;
>> > +	  hash_set<gimple *> *worklist_visited = NULL;
>> > +	  worklist.quick_push (orig_stmt);
>> > +	  do
>> >  	    {
>> > -	      imm_use_iterator use_iter;
>> > -	      gimple *use_stmt;
>> > -	      FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, DEF_FROM_PTR (def_p))
>> > -		if (!is_gimple_debug (use_stmt))
>> > -		  {
>> > -		    stmt_vec_info use_stmt_info = vinfo->lookup_stmt (use_stmt);
>> > -		    if (!use_stmt_info
>> > -			|| !vectorized_scalar_stmts.contains (use_stmt_info))
>> > +	      gimple *work_stmt = worklist.pop ();
>> > +	      FOR_EACH_PHI_OR_STMT_DEF (def_p, work_stmt, op_iter, SSA_OP_DEF)
>> > +		{
>> > +		  imm_use_iterator use_iter;
>> > +		  gimple *use_stmt;
>> > +		  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter,
>> > +					 DEF_FROM_PTR (def_p))
>> > +		    if (!is_gimple_debug (use_stmt))
>> >  		      {
>> > -			(*life)[i] = true;
>> > -			break;
>> > +			stmt_vec_info use_stmt_info
>> > +			  = vinfo->lookup_stmt (use_stmt);
>> > +			if (!use_stmt_info
>> > +			    || !vectorized_scalar_stmts.contains (use_stmt_info))
>> > +			  {
>> > +			    if (STMT_VINFO_IN_PATTERN_P (use_stmt_info))
>> > +			      {
>> 
>> I guess I should walk through the testcase and figure it out for myself,
>> but: I assume vectorized_scalar_stmts exists because not every statement
>> we've considered vectorising has made the cut.  Isn't that also true
>> of (original) scalar statements that would have been vectorised using
>> patterns?
>> 
>> Does vectorized_scalar_stmts record original statements or statements to
>> vectorise?  From its name I'd have assumed original statements, in which
>> case I wouldn't have expected IN_PATTERN_P to need special handling.
>
> vectorized_scalar_stmts records statements to vectorize, and yes, that's
> done because PURE_SLP isn't accurate when we promoted SLP nodes to
> extern because they failed to vectorize.

Ah, OK.

> But it's true that in case a pattern is composed from multiple scalar
> stmts we only cost the scalar root of the pattern.  As I said in the PR
> the whole thing is up for another rewrite - we're basically trying to
> compute the "scalar stmt cover" of the vectorized stmts and then
> try to exclude those scalar stmts in the cover that have uses outside
> of it.  Currently we split this problem in an odd way that makes
> doing it correctly difficult - I hope there's a better way but I
> haven't yet found the time to do it.  Ideally we'd also merge it with
> the copy that computes live lanes to be code generated as element
> extracts.  And IIRC there's another variant somewhere, I think with
> the SLP graph partitioning.
>
> If you have good ideas or want to give it a stab be welcome ;)

No, no good ideas.  I just wasn't sure why, if vectorized_scalar_stmts
records statements to vectorize, we couldn't just use
vect_stmt_to_vectorize (use_stmt_info).  But maybe that isn't
reliable for SLP.
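
For concreteness, that alternative would be something along the lines
of (an untested sketch, not what the patch does; whether it is reliable
for SLP is exactly the caveat above):

  stmt_vec_info use_stmt_info = vinfo->lookup_stmt (use_stmt);
  if (!use_stmt_info
      || !vectorized_scalar_stmts.contains
	    (vect_stmt_to_vectorize (use_stmt_info)))
    {
      (*life)[i] = true;
      break;
    }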

Anyway, you know this code much better than me :-), so I've certainly
no objections to the patch for GCC 12.  We've done some aarch64 costing
work recently, but I don't remember a case in which we relied on or
worked around the specific behaviour being patched here.

Thanks,
Richard


* Re: [PATCH] tree-optimization/104010 - fix SLP scalar costing with patterns
  2022-04-14 12:39 ` Richard Sandiford
@ 2022-04-14 12:51   ` Richard Biener
  2022-04-14 13:55     ` Richard Sandiford
  0 siblings, 1 reply; 7+ messages in thread
From: Richard Biener @ 2022-04-14 12:51 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc-patches, richard.earnshaw

On Thu, 14 Apr 2022, Richard Sandiford wrote:

> Richard Biener <rguenther@suse.de> writes:
> > When doing BB vectorization the scalar cost compute is derailed
> > by patterns, causing lanes to be considered live and thus not
> > costed on the scalar side.  For the testcase in PR104010 this
> > prevents vectorization which was done by GCC 11.  PR103941
> > shows similar cases of missed optimizations that are fixed by
> > this patch.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> >
> > I'm only considering this now because PR104010 is identified
> > as a regression on arm - Richards, what do you think?  I do think
> > this will enable vectorization of more stuff now which might
> > be good or bad - who knows, but at least it needs to involve
> > patterns.
> >
> > Thanks,
> > Richard.
> >
> > 2022-04-13  Richard Biener  <rguenther@suse.de>
> >
> > 	PR tree-optimization/104010
> > 	PR tree-optimization/103941
> > 	* tree-vect-slp.cc (vect_bb_slp_scalar_cost): When
> > 	we run into stmts in patterns continue walking those
> > 	for uses outside of the vectorized region instead of
> > 	marking the lane live.
> >
> > 	* gcc.target/i386/pr103941-1.c: New testcase.
> > 	* gcc.target/i386/pr103941-2.c: Likewise.
> > ---
> >  gcc/testsuite/gcc.target/i386/pr103941-1.c | 14 +++++++
> >  gcc/testsuite/gcc.target/i386/pr103941-2.c | 12 ++++++
> >  gcc/tree-vect-slp.cc                       | 47 ++++++++++++++++------
> >  3 files changed, 61 insertions(+), 12 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr103941-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr103941-2.c
> >
> > diff --git a/gcc/testsuite/gcc.target/i386/pr103941-1.c b/gcc/testsuite/gcc.target/i386/pr103941-1.c
> > new file mode 100644
> > index 00000000000..524fdd0b4b1
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr103941-1.c
> > @@ -0,0 +1,14 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -msse2" } */
> > +
> > +unsigned char ur[16], ua[16], ub[16];
> > +
> > +void avgu_v2qi (void)
> > +{
> > +  int i;
> > +
> > +  for (i = 0; i < 2; i++)
> > +    ur[i] = (ua[i] + ub[i] + 1) >> 1;
> > +}
> > +
> > +/* { dg-final { scan-assembler "pavgb" } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/pr103941-2.c b/gcc/testsuite/gcc.target/i386/pr103941-2.c
> > new file mode 100644
> > index 00000000000..972a32be997
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr103941-2.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -msse2" } */
> > +
> > +void foo (int *c, float *x, float *y)
> > +{
> > +  c[0] = x[0] < y[0];
> > +  c[1] = x[1] < y[1];
> > +  c[2] = x[2] < y[2];
> > +  c[3] = x[3] < y[3];
> > +}
> > +
> > +/* { dg-final { scan-assembler "cmpltps" } } */
> > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> > index 4ac2b70303c..c7687065374 100644
> > --- a/gcc/tree-vect-slp.cc
> > +++ b/gcc/tree-vect-slp.cc
> > @@ -5185,22 +5185,45 @@ vect_bb_slp_scalar_cost (vec_info *vinfo,
> >  	 the scalar cost.  */
> >        if (!STMT_VINFO_LIVE_P (stmt_info))
> >  	{
> > -	  FOR_EACH_PHI_OR_STMT_DEF (def_p, orig_stmt, op_iter, SSA_OP_DEF)
> > +	  auto_vec<gimple *, 8> worklist;
> > +	  hash_set<gimple *> *worklist_visited = NULL;
> > +	  worklist.quick_push (orig_stmt);
> > +	  do
> >  	    {
> > -	      imm_use_iterator use_iter;
> > -	      gimple *use_stmt;
> > -	      FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, DEF_FROM_PTR (def_p))
> > -		if (!is_gimple_debug (use_stmt))
> > -		  {
> > -		    stmt_vec_info use_stmt_info = vinfo->lookup_stmt (use_stmt);
> > -		    if (!use_stmt_info
> > -			|| !vectorized_scalar_stmts.contains (use_stmt_info))
> > +	      gimple *work_stmt = worklist.pop ();
> > +	      FOR_EACH_PHI_OR_STMT_DEF (def_p, work_stmt, op_iter, SSA_OP_DEF)
> > +		{
> > +		  imm_use_iterator use_iter;
> > +		  gimple *use_stmt;
> > +		  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter,
> > +					 DEF_FROM_PTR (def_p))
> > +		    if (!is_gimple_debug (use_stmt))
> >  		      {
> > -			(*life)[i] = true;
> > -			break;
> > +			stmt_vec_info use_stmt_info
> > +			  = vinfo->lookup_stmt (use_stmt);
> > +			if (!use_stmt_info
> > +			    || !vectorized_scalar_stmts.contains (use_stmt_info))
> > +			  {
> > +			    if (STMT_VINFO_IN_PATTERN_P (use_stmt_info))
> > +			      {
> 
> I guess I should walk through the testcase and figure it out for myself,
> but: I assume vectorized_scalar_stmts exists because not every statement
> we've considered vectorising has made the cut.  Isn't that also true
> of (original) scalar statements that would have been vectorised using
> patterns?
> 
> Does vectorized_scalar_stmts record original statements or statements to
> vectorise?  From its name I'd have assumed original statements, in which
> case I wouldn't have expected IN_PATTERN_P to need special handling.

vectorized_scalar_stmts records statements to vectorize, and yes, that's
done because PURE_SLP isn't accurate when we promoted SLP nodes to
extern because they failed to vectorize.

But it's true that in case a pattern is composed from multiple scalar
stmts we only cost the scalar root of the pattern.  As I said in the PR
the whole thing is up for another rewrite - we're basically trying to
compute the "scalar stmt cover" of the vectorized stmts and then
try to exclude those scalar stmts in the cover that have uses outside
of it.  Currently we split this problem in an odd way that makes
doing it correctly difficult - I hope there's a better way but I
haven't yet found the time to do it.  Ideally we'd also merge it with
the copy that computes live lanes to be code generated as element
extracts.  And IIRC there's another variant somewhere, I think with
the SLP graph partitioning.
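
To make that concrete with the pr103941-1.c testcase above (glossing
over the exact promotions the vectorizer sees, and with purely
illustrative names), one lane is roughly

  a  = ua[i];
  b  = ub[i];
  t1 = a + b;      /* participates in the average pattern */
  t2 = t1 + 1;     /* participates in the average pattern */
  r  = t2 >> 1;    /* scalar root of the pattern */
  ur[i] = r;

and the whole group is replaced by a single averaging pattern stmt for
vectorization, so on the scalar side only the root is costed.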

If you have good ideas or want to give it a stab be welcome ;)

> I know these are likely to be dumb questions, sorry. :-)

There's no such thing as a dumb question, as we teach our children ;)

Anyway, I _think_ the patch is incrementally improving things (and
fixing the particular regression), but with late cost changes there's
always the risk that we'll tip over to the bad side on some important
benchmark ...

Richard.


* Re: [PATCH] tree-optimization/104010 - fix SLP scalar costing with patterns
  2022-04-13 12:57 Richard Biener
  2022-04-13 13:09 ` Christophe Lyon
@ 2022-04-14 12:39 ` Richard Sandiford
  2022-04-14 12:51   ` Richard Biener
  1 sibling, 1 reply; 7+ messages in thread
From: Richard Sandiford @ 2022-04-14 12:39 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, richard.earnshaw

Richard Biener <rguenther@suse.de> writes:
> When doing BB vectorization the scalar cost compute is derailed
> by patterns, causing lanes to be considered live and thus not
> costed on the scalar side.  For the testcase in PR104010 this
> prevents vectorization which was done by GCC 11.  PR103941
> shows similar cases of missed optimizations that are fixed by
> this patch.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>
> I'm only considering this now because PR104010 is identified
> as a regression on arm - Richards, what do you think?  I do think
> this will enable vectorization of more stuff now which might
> be good or bad - who knows, but at least it needs to involve
> patterns.
>
> Thanks,
> Richard.
>
> 2022-04-13  Richard Biener  <rguenther@suse.de>
>
> 	PR tree-optimization/104010
> 	PR tree-optimization/103941
> 	* tree-vect-slp.cc (vect_bb_slp_scalar_cost): When
> 	we run into stmts in patterns continue walking those
> 	for uses outside of the vectorized region instead of
> 	marking the lane live.
>
> 	* gcc.target/i386/pr103941-1.c: New testcase.
> 	* gcc.target/i386/pr103941-2.c: Likewise.
> ---
>  gcc/testsuite/gcc.target/i386/pr103941-1.c | 14 +++++++
>  gcc/testsuite/gcc.target/i386/pr103941-2.c | 12 ++++++
>  gcc/tree-vect-slp.cc                       | 47 ++++++++++++++++------
>  3 files changed, 61 insertions(+), 12 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr103941-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr103941-2.c
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr103941-1.c b/gcc/testsuite/gcc.target/i386/pr103941-1.c
> new file mode 100644
> index 00000000000..524fdd0b4b1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr103941-1.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -msse2" } */
> +
> +unsigned char ur[16], ua[16], ub[16];
> +
> +void avgu_v2qi (void)
> +{
> +  int i;
> +
> +  for (i = 0; i < 2; i++)
> +    ur[i] = (ua[i] + ub[i] + 1) >> 1;
> +}
> +
> +/* { dg-final { scan-assembler "pavgb" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr103941-2.c b/gcc/testsuite/gcc.target/i386/pr103941-2.c
> new file mode 100644
> index 00000000000..972a32be997
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr103941-2.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -msse2" } */
> +
> +void foo (int *c, float *x, float *y)
> +{
> +  c[0] = x[0] < y[0];
> +  c[1] = x[1] < y[1];
> +  c[2] = x[2] < y[2];
> +  c[3] = x[3] < y[3];
> +}
> +
> +/* { dg-final { scan-assembler "cmpltps" } } */
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index 4ac2b70303c..c7687065374 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -5185,22 +5185,45 @@ vect_bb_slp_scalar_cost (vec_info *vinfo,
>  	 the scalar cost.  */
>        if (!STMT_VINFO_LIVE_P (stmt_info))
>  	{
> -	  FOR_EACH_PHI_OR_STMT_DEF (def_p, orig_stmt, op_iter, SSA_OP_DEF)
> +	  auto_vec<gimple *, 8> worklist;
> +	  hash_set<gimple *> *worklist_visited = NULL;
> +	  worklist.quick_push (orig_stmt);
> +	  do
>  	    {
> -	      imm_use_iterator use_iter;
> -	      gimple *use_stmt;
> -	      FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, DEF_FROM_PTR (def_p))
> -		if (!is_gimple_debug (use_stmt))
> -		  {
> -		    stmt_vec_info use_stmt_info = vinfo->lookup_stmt (use_stmt);
> -		    if (!use_stmt_info
> -			|| !vectorized_scalar_stmts.contains (use_stmt_info))
> +	      gimple *work_stmt = worklist.pop ();
> +	      FOR_EACH_PHI_OR_STMT_DEF (def_p, work_stmt, op_iter, SSA_OP_DEF)
> +		{
> +		  imm_use_iterator use_iter;
> +		  gimple *use_stmt;
> +		  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter,
> +					 DEF_FROM_PTR (def_p))
> +		    if (!is_gimple_debug (use_stmt))
>  		      {
> -			(*life)[i] = true;
> -			break;
> +			stmt_vec_info use_stmt_info
> +			  = vinfo->lookup_stmt (use_stmt);
> +			if (!use_stmt_info
> +			    || !vectorized_scalar_stmts.contains (use_stmt_info))
> +			  {
> +			    if (STMT_VINFO_IN_PATTERN_P (use_stmt_info))
> +			      {

I guess I should walk through the testcase and figure it out for myself,
but: I assume vectorized_scalar_stmts exists because not every statement
we've considered vectorising has made the cut.  Isn't that also true
of (original) scalar statements that would have been vectorised using
patterns?

Does vectorized_scalar_stmts record original statements or statements to
vectorise?  From its name I'd have assumed original statements, in which
case I wouldn't have expected IN_PATTERN_P to need special handling.

I know these are likely to be dumb questions, sorry. :-)

Richard

> +				/* For stmts participating in patterns we have
> +				   to check its uses recursively.  */
> +				if (!worklist_visited)
> +				  worklist_visited = new hash_set<gimple *> ();
> +				if (!worklist_visited->add (use_stmt))
> +				  worklist.safe_push (use_stmt);
> +				continue;
> +			      }
> +			    (*life)[i] = true;
> +			    goto next_lane;
> +			  }
>  		      }
> -		  }
> +		}
>  	    }
> +	  while (!worklist.is_empty ());
> +next_lane:
> +	  if (worklist_visited)
> +	    delete worklist_visited;
>  	  if ((*life)[i])
>  	    continue;
>  	}


* Re: [PATCH] tree-optimization/104010 - fix SLP scalar costing with patterns
  2022-04-13 12:57 Richard Biener
@ 2022-04-13 13:09 ` Christophe Lyon
  2022-04-14 12:39 ` Richard Sandiford
  1 sibling, 0 replies; 7+ messages in thread
From: Christophe Lyon @ 2022-04-13 13:09 UTC (permalink / raw)
  To: Richard Biener, gcc-patches; +Cc: richard.sandiford, richard.earnshaw



On 4/13/22 14:57, Richard Biener via Gcc-patches wrote:
> When doing BB vectorization the scalar cost compute is derailed
> by patterns, causing lanes to be considered live and thus not
> costed on the scalar side.  For the testcase in PR104010 this
> prevents vectorization which was done by GCC 11.  PR103941
> shows similar cases of missed optimizations that are fixed by
> this patch.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
> 
> I'm only considering this now because PR104010 is identified
> as a regression on arm - Richards, what do you think?  I do think
> this will enable vectorization of more stuff now which might
> be good or bad - who knows, but at least it needs to involve
> patterns.
> 
> Thanks,
> Richard.
> 
> 2022-04-13  Richard Biener  <rguenther@suse.de>
> 
> 	PR tree-optimization/104010
> 	PR tree-optimization/103941
> 	* tree-vect-slp.cc (vect_bb_slp_scalar_cost): When
> 	we run into stmts in patterns continue walking those
> 	for uses outside of the vectorized region instead of
> 	marking the lane live.

Thanks!
I confirm that gcc.target/arm/simd/neon-vcmp.c fails without your patch
and passes with it.  Sorry for the delay.

Christophe

> 
> 	* gcc.target/i386/pr103941-1.c: New testcase.
> 	* gcc.target/i386/pr103941-2.c: Likewise.
> ---
>   gcc/testsuite/gcc.target/i386/pr103941-1.c | 14 +++++++
>   gcc/testsuite/gcc.target/i386/pr103941-2.c | 12 ++++++
>   gcc/tree-vect-slp.cc                       | 47 ++++++++++++++++------
>   3 files changed, 61 insertions(+), 12 deletions(-)
>   create mode 100644 gcc/testsuite/gcc.target/i386/pr103941-1.c
>   create mode 100644 gcc/testsuite/gcc.target/i386/pr103941-2.c
> 
> diff --git a/gcc/testsuite/gcc.target/i386/pr103941-1.c b/gcc/testsuite/gcc.target/i386/pr103941-1.c
> new file mode 100644
> index 00000000000..524fdd0b4b1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr103941-1.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -msse2" } */
> +
> +unsigned char ur[16], ua[16], ub[16];
> +
> +void avgu_v2qi (void)
> +{
> +  int i;
> +
> +  for (i = 0; i < 2; i++)
> +    ur[i] = (ua[i] + ub[i] + 1) >> 1;
> +}
> +
> +/* { dg-final { scan-assembler "pavgb" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr103941-2.c b/gcc/testsuite/gcc.target/i386/pr103941-2.c
> new file mode 100644
> index 00000000000..972a32be997
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr103941-2.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -msse2" } */
> +
> +void foo (int *c, float *x, float *y)
> +{
> +  c[0] = x[0] < y[0];
> +  c[1] = x[1] < y[1];
> +  c[2] = x[2] < y[2];
> +  c[3] = x[3] < y[3];
> +}
> +
> +/* { dg-final { scan-assembler "cmpltps" } } */
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index 4ac2b70303c..c7687065374 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -5185,22 +5185,45 @@ vect_bb_slp_scalar_cost (vec_info *vinfo,
>   	 the scalar cost.  */
>         if (!STMT_VINFO_LIVE_P (stmt_info))
>   	{
> -	  FOR_EACH_PHI_OR_STMT_DEF (def_p, orig_stmt, op_iter, SSA_OP_DEF)
> +	  auto_vec<gimple *, 8> worklist;
> +	  hash_set<gimple *> *worklist_visited = NULL;
> +	  worklist.quick_push (orig_stmt);
> +	  do
>   	    {
> -	      imm_use_iterator use_iter;
> -	      gimple *use_stmt;
> -	      FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, DEF_FROM_PTR (def_p))
> -		if (!is_gimple_debug (use_stmt))
> -		  {
> -		    stmt_vec_info use_stmt_info = vinfo->lookup_stmt (use_stmt);
> -		    if (!use_stmt_info
> -			|| !vectorized_scalar_stmts.contains (use_stmt_info))
> +	      gimple *work_stmt = worklist.pop ();
> +	      FOR_EACH_PHI_OR_STMT_DEF (def_p, work_stmt, op_iter, SSA_OP_DEF)
> +		{
> +		  imm_use_iterator use_iter;
> +		  gimple *use_stmt;
> +		  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter,
> +					 DEF_FROM_PTR (def_p))
> +		    if (!is_gimple_debug (use_stmt))
>   		      {
> -			(*life)[i] = true;
> -			break;
> +			stmt_vec_info use_stmt_info
> +			  = vinfo->lookup_stmt (use_stmt);
> +			if (!use_stmt_info
> +			    || !vectorized_scalar_stmts.contains (use_stmt_info))
> +			  {
> +			    if (STMT_VINFO_IN_PATTERN_P (use_stmt_info))
> +			      {
> +				/* For stmts participating in patterns we have
> +				   to check its uses recursively.  */
> +				if (!worklist_visited)
> +				  worklist_visited = new hash_set<gimple *> ();
> +				if (!worklist_visited->add (use_stmt))
> +				  worklist.safe_push (use_stmt);
> +				continue;
> +			      }
> +			    (*life)[i] = true;
> +			    goto next_lane;
> +			  }
>   		      }
> -		  }
> +		}
>   	    }
> +	  while (!worklist.is_empty ());
> +next_lane:
> +	  if (worklist_visited)
> +	    delete worklist_visited;
>   	  if ((*life)[i])
>   	    continue;
>   	}


* [PATCH] tree-optimization/104010 - fix SLP scalar costing with patterns
@ 2022-04-13 12:57 Richard Biener
  2022-04-13 13:09 ` Christophe Lyon
  2022-04-14 12:39 ` Richard Sandiford
  0 siblings, 2 replies; 7+ messages in thread
From: Richard Biener @ 2022-04-13 12:57 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.sandiford, richard.earnshaw

When doing BB vectorization the scalar cost compute is derailed
by patterns, causing lanes to be considered live and thus not
costed on the scalar side.  For the testcase in PR104010 this
prevents vectorization which was done by GCC 11.  PR103941
shows similar cases of missed optimizations that are fixed by
this patch.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

I'm only considering this now because PR104010 is identified
as a regression on arm - Richards, what do you think?  I do think
this will enable vectorization of more stuff now which might
be good or bad - who knows, but at least it needs to involve
patterns.

Thanks,
Richard.

2022-04-13  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/104010
	PR tree-optimization/103941
	* tree-vect-slp.cc (vect_bb_slp_scalar_cost): When
	we run into stmts in patterns continue walking those
	for uses outside of the vectorized region instead of
	marking the lane live.

	* gcc.target/i386/pr103941-1.c: New testcase.
	* gcc.target/i386/pr103941-2.c: Likewise.
---
 gcc/testsuite/gcc.target/i386/pr103941-1.c | 14 +++++++
 gcc/testsuite/gcc.target/i386/pr103941-2.c | 12 ++++++
 gcc/tree-vect-slp.cc                       | 47 ++++++++++++++++------
 3 files changed, 61 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr103941-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr103941-2.c

diff --git a/gcc/testsuite/gcc.target/i386/pr103941-1.c b/gcc/testsuite/gcc.target/i386/pr103941-1.c
new file mode 100644
index 00000000000..524fdd0b4b1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr103941-1.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2" } */
+
+unsigned char ur[16], ua[16], ub[16];
+
+void avgu_v2qi (void)
+{
+  int i;
+
+  for (i = 0; i < 2; i++)
+    ur[i] = (ua[i] + ub[i] + 1) >> 1;
+}
+
+/* { dg-final { scan-assembler "pavgb" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr103941-2.c b/gcc/testsuite/gcc.target/i386/pr103941-2.c
new file mode 100644
index 00000000000..972a32be997
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr103941-2.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2" } */
+
+void foo (int *c, float *x, float *y)
+{
+  c[0] = x[0] < y[0];
+  c[1] = x[1] < y[1];
+  c[2] = x[2] < y[2];
+  c[3] = x[3] < y[3];
+}
+
+/* { dg-final { scan-assembler "cmpltps" } } */
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 4ac2b70303c..c7687065374 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -5185,22 +5185,45 @@ vect_bb_slp_scalar_cost (vec_info *vinfo,
 	 the scalar cost.  */
       if (!STMT_VINFO_LIVE_P (stmt_info))
 	{
-	  FOR_EACH_PHI_OR_STMT_DEF (def_p, orig_stmt, op_iter, SSA_OP_DEF)
+	  auto_vec<gimple *, 8> worklist;
+	  hash_set<gimple *> *worklist_visited = NULL;
+	  worklist.quick_push (orig_stmt);
+	  do
 	    {
-	      imm_use_iterator use_iter;
-	      gimple *use_stmt;
-	      FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, DEF_FROM_PTR (def_p))
-		if (!is_gimple_debug (use_stmt))
-		  {
-		    stmt_vec_info use_stmt_info = vinfo->lookup_stmt (use_stmt);
-		    if (!use_stmt_info
-			|| !vectorized_scalar_stmts.contains (use_stmt_info))
+	      gimple *work_stmt = worklist.pop ();
+	      FOR_EACH_PHI_OR_STMT_DEF (def_p, work_stmt, op_iter, SSA_OP_DEF)
+		{
+		  imm_use_iterator use_iter;
+		  gimple *use_stmt;
+		  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter,
+					 DEF_FROM_PTR (def_p))
+		    if (!is_gimple_debug (use_stmt))
 		      {
-			(*life)[i] = true;
-			break;
+			stmt_vec_info use_stmt_info
+			  = vinfo->lookup_stmt (use_stmt);
+			if (!use_stmt_info
+			    || !vectorized_scalar_stmts.contains (use_stmt_info))
+			  {
+			    if (STMT_VINFO_IN_PATTERN_P (use_stmt_info))
+			      {
+				/* For stmts participating in patterns we have
+				   to check its uses recursively.  */
+				if (!worklist_visited)
+				  worklist_visited = new hash_set<gimple *> ();
+				if (!worklist_visited->add (use_stmt))
+				  worklist.safe_push (use_stmt);
+				continue;
+			      }
+			    (*life)[i] = true;
+			    goto next_lane;
+			  }
 		      }
-		  }
+		}
 	    }
+	  while (!worklist.is_empty ());
+next_lane:
+	  if (worklist_visited)
+	    delete worklist_visited;
 	  if ((*life)[i])
 	    continue;
 	}
-- 
2.34.1


end of thread, other threads:[~2022-04-19  6:59 UTC | newest]

Thread overview: 7+ messages
2022-04-13 13:46 [PATCH] tree-optimization/104010 - fix SLP scalar costing with patterns Richard Biener
  -- strict thread matches above, loose matches on Subject: below --
2022-04-13 12:57 Richard Biener
2022-04-13 13:09 ` Christophe Lyon
2022-04-14 12:39 ` Richard Sandiford
2022-04-14 12:51   ` Richard Biener
2022-04-14 13:55     ` Richard Sandiford
2022-04-19  6:59       ` Richard Biener
