From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <richard.sandiford@arm.com>
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
 by sourceware.org (Postfix) with ESMTP id 613F2385734E
 for <gcc-patches@gcc.gnu.org>; Thu, 14 Apr 2022 12:39:29 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 613F2385734E
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
 by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 07DB6139F;
 Thu, 14 Apr 2022 05:39:27 -0700 (PDT)
Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.88])
 by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 69D073F70D;
 Thu, 14 Apr 2022 05:39:26 -0700 (PDT)
From: Richard Sandiford <richard.sandiford@arm.com>
To: Richard Biener <rguenther@suse.de>
Mail-Followup-To: Richard Biener <rguenther@suse.de>, gcc-patches@gcc.gnu.org,
 richard.earnshaw@arm.com, richard.sandiford@arm.com
Cc: gcc-patches@gcc.gnu.org,  richard.earnshaw@arm.com
Subject: Re: [PATCH] tree-optimization/104010 - fix SLP scalar costing with
 patterns
References: <20220413125716.5CA0F13AB8@imap2.suse-dmz.suse.de>
Date: Thu, 14 Apr 2022 13:39:25 +0100
In-Reply-To: <20220413125716.5CA0F13AB8@imap2.suse-dmz.suse.de> (Richard
 Biener's message of "Wed, 13 Apr 2022 14:57:15 +0200 (CEST)")
Message-ID: <mpt4k2von42.fsf@arm.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-Spam-Status: No, score=-12.3 required=5.0 tests=BAYES_00, GIT_PATCH_0,
 KAM_DMARC_STATUS, KAM_SHORT, SPF_HELO_NONE, SPF_PASS, TXREP,
 T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Thu, 14 Apr 2022 12:39:32 -0000

Richard Biener <rguenther@suse.de> writes:
> When doing BB vectorization the scalar cost compute is derailed
> by patterns, causing lanes to be considered live and thus not
> costed on the scalar side.  For the testcase in PR104010 this
> prevents vectorization which was done by GCC 11.  PR103941
> shows similar cases of missed optimizations that are fixed by
> this patch.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>
> I'm only considering this now because PR104010 is identified
> as regression on arm - Richards, what do you think?  I do think
> this will enable vectorization of more stuff now which might
> be good or bad - who knowns, but at least it needs to involve
> patterns.
>
> Thanks,
> Richard.
>
> 2022-04-13  Richard Biener  <rguenther@suse.de>
>
> 	PR tree-optimization/104010
> 	PR tree-optimization/103941
> 	* tree-vect-slp.cc (vect_bb_slp_scalar_cost): When
> 	we run into stmts in patterns continue walking those
> 	for uses outside of the vectorized region instead of
> 	marking the lane live.
>
> 	* gcc.target/i386/pr103941-1.c: New testcase.
> 	* gcc.target/i386/pr103941-2.c: Likewise.
> ---
>  gcc/testsuite/gcc.target/i386/pr103941-1.c | 14 +++++++
>  gcc/testsuite/gcc.target/i386/pr103941-2.c | 12 ++++++
>  gcc/tree-vect-slp.cc                       | 47 ++++++++++++++++------
>  3 files changed, 61 insertions(+), 12 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr103941-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr103941-2.c
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr103941-1.c b/gcc/testsuite/gcc.target/i386/pr103941-1.c
> new file mode 100644
> index 00000000000..524fdd0b4b1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr103941-1.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -msse2" } */
> +
> +unsigned char ur[16], ua[16], ub[16];
> +
> +void avgu_v2qi (void)
> +{
> +  int i;
> +
> +  for (i = 0; i < 2; i++)
> +    ur[i] = (ua[i] + ub[i] + 1) >> 1;
> +}
> +
> +/* { dg-final { scan-assembler "pavgb" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr103941-2.c b/gcc/testsuite/gcc.target/i386/pr103941-2.c
> new file mode 100644
> index 00000000000..972a32be997
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr103941-2.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -msse2" } */
> +
> +void foo (int *c, float *x, float *y)
> +{
> +  c[0] = x[0] < y[0];
> +  c[1] = x[1] < y[1];
> +  c[2] = x[2] < y[2];
> +  c[3] = x[3] < y[3];
> +}
> +
> +/* { dg-final { scan-assembler "cmpltps" } } */
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index 4ac2b70303c..c7687065374 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -5185,22 +5185,45 @@ vect_bb_slp_scalar_cost (vec_info *vinfo,
>  	 the scalar cost.  */
>        if (!STMT_VINFO_LIVE_P (stmt_info))
>  	{
> -	  FOR_EACH_PHI_OR_STMT_DEF (def_p, orig_stmt, op_iter, SSA_OP_DEF)
> +	  auto_vec<gimple *, 8> worklist;
> +	  hash_set<gimple *> *worklist_visited = NULL;
> +	  worklist.quick_push (orig_stmt);
> +	  do
>  	    {
> -	      imm_use_iterator use_iter;
> -	      gimple *use_stmt;
> -	      FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, DEF_FROM_PTR (def_p))
> -		if (!is_gimple_debug (use_stmt))
> -		  {
> -		    stmt_vec_info use_stmt_info = vinfo->lookup_stmt (use_stmt);
> -		    if (!use_stmt_info
> -			|| !vectorized_scalar_stmts.contains (use_stmt_info))
> +	      gimple *work_stmt = worklist.pop ();
> +	      FOR_EACH_PHI_OR_STMT_DEF (def_p, work_stmt, op_iter, SSA_OP_DEF)
> +		{
> +		  imm_use_iterator use_iter;
> +		  gimple *use_stmt;
> +		  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter,
> +					 DEF_FROM_PTR (def_p))
> +		    if (!is_gimple_debug (use_stmt))
>  		      {
> -			(*life)[i] = true;
> -			break;
> +			stmt_vec_info use_stmt_info
> +			  = vinfo->lookup_stmt (use_stmt);
> +			if (!use_stmt_info
> +			    || !vectorized_scalar_stmts.contains (use_stmt_info))
> +			  {
> +			    if (STMT_VINFO_IN_PATTERN_P (use_stmt_info))
> +			      {

I guess I should walk through the testcase and figure it out for myself,
but: I assume vectorized_scalar_stmts exists because not every statement
we've considered vectorising has made the cut.  Isn't that also true
of (original) scalar statements that would have been vectorised using
patterns?

Does vectorized_scalar_stmts record original statements or statements to
vectorise?  From its name I'd have assumed original statements, in which
case I wouldn't have expected IN_PATTERN_P to need special handling.

I know these are likely to be dumb questions, sorry. :-)

Richard

> +				/* For stmts participating in patterns we have
> +				   to check its uses recursively.  */
> +				if (!worklist_visited)
> +				  worklist_visited = new hash_set<gimple *> ();
> +				if (!worklist_visited->add (use_stmt))
> +				  worklist.safe_push (use_stmt);
> +				continue;
> +			      }
> +			    (*life)[i] = true;
> +			    goto next_lane;
> +			  }
>  		      }
> -		  }
> +		}
>  	    }
> +	  while (!worklist.is_empty ());
> +next_lane:
> +	  if (worklist_visited)
> +	    delete worklist_visited;
>  	  if ((*life)[i])
>  	    continue;
>  	}