From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id 613F2385734E for ; Thu, 14 Apr 2022 12:39:29 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 613F2385734E Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 07DB6139F; Thu, 14 Apr 2022 05:39:27 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.88]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 69D073F70D; Thu, 14 Apr 2022 05:39:26 -0700 (PDT) From: Richard Sandiford To: Richard Biener Mail-Followup-To: Richard Biener , gcc-patches@gcc.gnu.org, richard.earnshaw@arm.com, richard.sandiford@arm.com Cc: gcc-patches@gcc.gnu.org, richard.earnshaw@arm.com Subject: Re: [PATCH] tree-optimization/104010 - fix SLP scalar costing with patterns References: <20220413125716.5CA0F13AB8@imap2.suse-dmz.suse.de> Date: Thu, 14 Apr 2022 13:39:25 +0100 In-Reply-To: <20220413125716.5CA0F13AB8@imap2.suse-dmz.suse.de> (Richard Biener's message of "Wed, 13 Apr 2022 14:57:15 +0200 (CEST)") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Status: No, score=-12.3 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, KAM_SHORT, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 14 Apr 2022 12:39:32 -0000 Richard Biener writes: > When doing BB vectorization the scalar cost compute is derailed > by patterns, causing lanes to be considered live and thus not > costed on the scalar side. For the testcase in PR104010 this > prevents vectorization which was done by GCC 11. PR103941 > shows similar cases of missed optimizations that are fixed by > this patch. > > Bootstrapped and tested on x86_64-unknown-linux-gnu. > > I'm only considering this now because PR104010 is identified > as regression on arm - Richards, what do you think? I do think > this will enable vectorization of more stuff now which might > be good or bad - who knowns, but at least it needs to involve > patterns. > > Thanks, > Richard. > > 2022-04-13 Richard Biener > > PR tree-optimization/104010 > PR tree-optimization/103941 > * tree-vect-slp.cc (vect_bb_slp_scalar_cost): When > we run into stmts in patterns continue walking those > for uses outside of the vectorized region instead of > marking the lane live. > > * gcc.target/i386/pr103941-1.c: New testcase. > * gcc.target/i386/pr103941-2.c: Likewise. > --- > gcc/testsuite/gcc.target/i386/pr103941-1.c | 14 +++++++ > gcc/testsuite/gcc.target/i386/pr103941-2.c | 12 ++++++ > gcc/tree-vect-slp.cc | 47 ++++++++++++++++------ > 3 files changed, 61 insertions(+), 12 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/i386/pr103941-1.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr103941-2.c > > diff --git a/gcc/testsuite/gcc.target/i386/pr103941-1.c b/gcc/testsuite/gcc.target/i386/pr103941-1.c > new file mode 100644 > index 00000000000..524fdd0b4b1 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr103941-1.c > @@ -0,0 +1,14 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -msse2" } */ > + > +unsigned char ur[16], ua[16], ub[16]; > + > +void avgu_v2qi (void) > +{ > + int i; > + > + for (i = 0; i < 2; i++) > + ur[i] = (ua[i] + ub[i] + 1) >> 1; > +} > + > +/* { dg-final { scan-assembler "pavgb" } } */ > diff --git a/gcc/testsuite/gcc.target/i386/pr103941-2.c b/gcc/testsuite/gcc.target/i386/pr103941-2.c > new file mode 100644 > index 00000000000..972a32be997 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr103941-2.c > @@ -0,0 +1,12 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -msse2" } */ > + > +void foo (int *c, float *x, float *y) > +{ > + c[0] = x[0] < y[0]; > + c[1] = x[1] < y[1]; > + c[2] = x[2] < y[2]; > + c[3] = x[3] < y[3]; > +} > + > +/* { dg-final { scan-assembler "cmpltps" } } */ > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc > index 4ac2b70303c..c7687065374 100644 > --- a/gcc/tree-vect-slp.cc > +++ b/gcc/tree-vect-slp.cc > @@ -5185,22 +5185,45 @@ vect_bb_slp_scalar_cost (vec_info *vinfo, > the scalar cost. */ > if (!STMT_VINFO_LIVE_P (stmt_info)) > { > - FOR_EACH_PHI_OR_STMT_DEF (def_p, orig_stmt, op_iter, SSA_OP_DEF) > + auto_vec worklist; > + hash_set *worklist_visited = NULL; > + worklist.quick_push (orig_stmt); > + do > { > - imm_use_iterator use_iter; > - gimple *use_stmt; > - FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, DEF_FROM_PTR (def_p)) > - if (!is_gimple_debug (use_stmt)) > - { > - stmt_vec_info use_stmt_info = vinfo->lookup_stmt (use_stmt); > - if (!use_stmt_info > - || !vectorized_scalar_stmts.contains (use_stmt_info)) > + gimple *work_stmt = worklist.pop (); > + FOR_EACH_PHI_OR_STMT_DEF (def_p, work_stmt, op_iter, SSA_OP_DEF) > + { > + imm_use_iterator use_iter; > + gimple *use_stmt; > + FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, > + DEF_FROM_PTR (def_p)) > + if (!is_gimple_debug (use_stmt)) > { > - (*life)[i] = true; > - break; > + stmt_vec_info use_stmt_info > + = vinfo->lookup_stmt (use_stmt); > + if (!use_stmt_info > + || !vectorized_scalar_stmts.contains (use_stmt_info)) > + { > + if (STMT_VINFO_IN_PATTERN_P (use_stmt_info)) > + { I guess I should walk through the testcase and figure it out for myself, but: I assume vectorized_scalar_stmts exists because not every statement we've considered vectorising has made the cut. Isn't that also true of (original) scalar statements that would have been vectorised using patterns? Does vectorized_scalar_stmts record original statements or statements to vectorise? From its name I'd have assumed original statements, in which case I wouldn't have expected IN_PATTERN_P to need special handling. I know these are likely to be dumb questions, sorry. :-) Richard > + /* For stmts participating in patterns we have > + to check its uses recursively. */ > + if (!worklist_visited) > + worklist_visited = new hash_set (); > + if (!worklist_visited->add (use_stmt)) > + worklist.safe_push (use_stmt); > + continue; > + } > + (*life)[i] = true; > + goto next_lane; > + } > } > - } > + } > } > + while (!worklist.is_empty ()); > +next_lane: > + if (worklist_visited) > + delete worklist_visited; > if ((*life)[i]) > continue; > }