From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
	by sourceware.org (Postfix) with ESMTP id 370213858D37
	for ; Tue, 23 May 2023 16:54:00 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 370213858D37
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
	by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id CDF2ED75;
	Tue, 23 May 2023 09:54:44 -0700 (PDT)
Received: from localhost (e121540-lin.manchester.arm.com [10.32.110.72])
	by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 560B33F67D;
	Tue, 23 May 2023 09:53:59 -0700 (PDT)
From: Richard Sandiford
To: Richard Biener
Mail-Followup-To: Richard Biener ,gcc-patches@gcc.gnu.org, richard.sandiford@arm.com
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] tree-optimization/109747 - SLP cost of CTORs
References: <20230523151803.3428B13A10@imap2.suse-dmz.suse.de>
Date: Tue, 23 May 2023 17:53:58 +0100
In-Reply-To: <20230523151803.3428B13A10@imap2.suse-dmz.suse.de> (Richard Biener's message of "Tue, 23 May 2023 17:18:02 +0200 (CEST)")
Message-ID:
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-Spam-Status: No, score=-28.8 required=5.0 tests=BAYES_00,GIT_PATCH_0,KAM_DMARC_NONE,KAM_DMARC_STATUS,KAM_LAZY_DOMAIN_SECURITY,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id:

Richard Biener writes:
> The x86 backend looks at the SLP node passed to the add_stmt_cost
> hook when costing vec_construct, looking for elements that require
> a move from a GPR to a vector register and cost that.
> But since
> vect_prologue_cost_for_slp decomposes the cost for an external
> SLP node into individual pieces this cost gets applied N times
> without a chance for the backend to know it's just dealing with
> a part of the SLP node.  Just looking at a part is also not perfect
> since the GPR to XMM move cost applies only once per distinct
> element so handling the whole SLP node one more correctly reflects
> cost (albeit without considering other external SLP nodes).
>
> The following addresses the issue by passing down the SLP node
> only for one piece and nullptr for the rest.  The x86 backend
> is currently the only one looking at it.
>
> In the future the cost of external elements is something to deal
> with globally but that would require the full SLP tree be available
> to costing.
>
> It's difficult to write a testcase, at the tipping point not
> vectorizing is better so I'll followup with x86 specific adjustments
> and will see to add a testcase later.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>
> Richard, we talked about this issue two weeks ago and I was looking
> for a solution that would be OK for backporting if the need arises.
> The following is what I could come up with that retains the whole
> SLP-node wide "CSE" of the element move cost.  Is that OK until
> we come up with a better plan for trunk at some point?

Yeah, seems like a neat workaround to me FWIW.

Thanks,
Richard

>
> Thanks,
> Richard.
>
>         PR tree-optimization/109747
>         * tree-vect-slp.cc (vect_prologue_cost_for_slp): Pass down
>         the SLP node only once to the cost hook.
> ---
>  gcc/tree-vect-slp.cc | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index e5c9d7e766e..a6f277c5e21 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -6069,6 +6069,7 @@ vect_prologue_cost_for_slp (slp_tree node,
>          }
>        /* ??? We're just tracking whether vectors in a single node are the same.
>           Ideally we'd do something more global.  */
> +      bool passed = false;
>        for (unsigned int start : starts)
>          {
>            vect_cost_for_stmt kind;
> @@ -6078,7 +6079,15 @@ vect_prologue_cost_for_slp (slp_tree node,
>              kind = scalar_to_vec;
>            else
>              kind = vec_construct;
> -          record_stmt_cost (cost_vec, 1, kind, node, vectype, 0, vect_prologue);
> +          /* The target cost hook has no idea which part of the SLP node
> +             we are costing so avoid passing it down more than once.  Pass
> +             it to the first vec_construct or scalar_to_vec part since for those
> +             the x86 backend tries to account for GPR to XMM register moves.  */
> +          record_stmt_cost (cost_vec, 1, kind,
> +                            (kind != vector_load && !passed) ? node : nullptr,
> +                            vectype, 0, vect_prologue);
> +          if (kind != vector_load)
> +            passed = true;
>          }
>      }
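
For anyone following the thread, the effect of the workaround can be
sketched with a toy model.  This is not GCC code: Kind, Node, hook_cost
and prologue_cost are hypothetical names, the base cost of 1 per piece is
made up, and the idea that the hook charges exactly one move per distinct
element of the whole node is a simplifying assumption based only on the
description quoted above.

```cpp
// Toy model only; none of these names exist in GCC.
#include <cassert>
#include <set>
#include <vector>

enum class Kind { vector_load, scalar_to_vec, vec_construct };

// Stand-in for an external SLP node: the scalar elements it is built from.
struct Node {
  std::vector<int> elements;
};

// Stand-in for a target cost hook.  When it is handed the node it charges
// one GPR-to-vector move per distinct element of the *whole* node, because
// it cannot tell which piece of the node is being costed (assumption).
int hook_cost(Kind kind, const Node *node) {
  int cost = 1;  // assumed base cost of the piece itself
  if (node && kind != Kind::vector_load) {
    std::set<int> distinct(node->elements.begin(), node->elements.end());
    cost += static_cast<int>(distinct.size());
  }
  return cost;
}

// Cost every piece of the node.  With pass_once set, only the first piece
// that is not a vector_load receives the node; the rest receive nullptr,
// mirroring the `passed` flag in the quoted patch.
int prologue_cost(const Node &node, const std::vector<Kind> &pieces,
                  bool pass_once) {
  int total = 0;
  bool passed = false;
  for (Kind kind : pieces) {
    bool give_node = !pass_once || (kind != Kind::vector_load && !passed);
    total += hook_cost(kind, give_node ? &node : nullptr);
    if (kind != Kind::vector_load)
      passed = true;
  }
  return total;
}
```

With a node of eight elements of which four are distinct, costed as two
vec_construct pieces, this model charges the four moves twice when the
node is passed every time (total 10) and only once when it is passed to
the first piece alone (total 6).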