From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from gate.crashing.org (gate.crashing.org [63.228.1.57]) by sourceware.org (Postfix) with ESMTP id 4E67D3858D3C for ; Tue, 10 May 2022 22:36:17 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 4E67D3858D3C Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=kernel.crashing.org Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=kernel.crashing.org Received: from gate.crashing.org (localhost.localdomain [127.0.0.1]) by gate.crashing.org (8.14.1/8.14.1) with ESMTP id 24AMZGBQ006755; Tue, 10 May 2022 17:35:16 -0500 Received: (from segher@localhost) by gate.crashing.org (8.14.1/8.14.1/Submit) id 24AMZGli006753; Tue, 10 May 2022 17:35:16 -0500 X-Authentication-Warning: gate.crashing.org: segher set sender to segher@kernel.crashing.org using -f Date: Tue, 10 May 2022 17:35:16 -0500 From: Segher Boessenkool To: Peter Bergner Cc: David Edelsohn , GCC Patches Subject: Re: rs6000: Prefer assigning the MMA vector operands to altivec registers [PR105556] Message-ID: <20220510223516.GP25951@gate.crashing.org> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.3i X-Spam-Status: No, score=-9.2 required=5.0 tests=BAYES_00, GIT_PATCH_0, JMQ_SPF_NEUTRAL, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 10 May 2022 22:36:19 -0000 Hi! On Tue, May 10, 2022 at 03:47:40PM -0500, Peter Bergner wrote: > This patch addresses an issue when compiling the MMA optimized DGEMM kernel If you want to use this same message as commit message, you shouldn't say "this patch". Also, try not to use lines more than 72 positions wide (which handily is also a good maximum length for email messages, that way it can be quoted a few times without wrapping). > in OpenBLAS. The MMA code uses all 8 accumulators, which overlap all vs0-vs31 > vector registers. Current trunk assigns one of the normal vector inputs to > one of the MMA instructions, which forces us to spill one of the accumulators > to memory, leading to poor performance. The solution here is to replace the > "wa" constraints for the vector input operands in the MMA instruction patterns > with "v,?d" so that we disparage using vs0-vs31 and prefer using the altivec > registers vs32-vs63 instead, which fixes the dgemm performance issue. And I assume generated code still looks at least as good otherwise? > This passed bootstrap and regtesting with no regressions on powerpc64le-linux. > Ok for trunk and after a few days of burn-in to the GCC12 release branch? > > Technically, the same issue exists in GCC11 and GCC10, but the RA > assignment is OK with the current code, so unless/until we have a > test case that exhibits the issue, I'm only asking for a backport to > GCC12 which does show the performance problem. So, you put everything that shouldn't be in the commit message at the end of the mail, easy to delete when applying the patch. Good good :-) > gcc/ > PR target/105556 > * config/rs6000/mma.md (mma_, mma_, mma_, mma_, > mma_, mma_, mma_, mma_, > mma_, mma_, mma_, mma_, > mma_, mma_): Replace "wa" constraint with "v,?d". > diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md > index 907c9d6d516..9c9920870e4 100644 > --- a/gcc/config/rs6000/mma.md > +++ b/gcc/config/rs6000/mma.md > @@ -490,50 +490,50 @@ (define_insn "mma_xxsetaccz" > [(set_attr "type" "mma")]) > > (define_insn "mma_" > - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d") > - (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "wa") > - (match_operand:V16QI 2 "vsx_register_operand" "wa")] > + [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d,&d,&d") > + (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,v,?d,?d") > + (match_operand:V16QI 2 "vsx_register_operand" "v,?d,v,?d")] This is more involved than just replacing one constrait with two. You shoould say that in the changelog (and in your message). Out of interest, did you try using v,?wa (so just two alternatives, not four)? Or did you think it wouldresult in measurably worse code? Or did you decide it is not such bad backend code size explosion after all :-) Okay for trunk with a slightly better changelog. Thanks! Segher