From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <segher@kernel.crashing.org>
Received: from gate.crashing.org (gate.crashing.org [63.228.1.57])
 by sourceware.org (Postfix) with ESMTP id 4E67D3858D3C
 for <gcc-patches@gcc.gnu.org>; Tue, 10 May 2022 22:36:17 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 4E67D3858D3C
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none)
 header.from=kernel.crashing.org
Authentication-Results: sourceware.org;
 spf=pass smtp.mailfrom=kernel.crashing.org
Received: from gate.crashing.org (localhost.localdomain [127.0.0.1])
 by gate.crashing.org (8.14.1/8.14.1) with ESMTP id 24AMZGBQ006755;
 Tue, 10 May 2022 17:35:16 -0500
Received: (from segher@localhost)
 by gate.crashing.org (8.14.1/8.14.1/Submit) id 24AMZGli006753;
 Tue, 10 May 2022 17:35:16 -0500
X-Authentication-Warning: gate.crashing.org: segher set sender to
 segher@kernel.crashing.org using -f
Date: Tue, 10 May 2022 17:35:16 -0500
From: Segher Boessenkool <segher@kernel.crashing.org>
To: Peter Bergner <bergner@linux.ibm.com>
Cc: David Edelsohn <dje.gcc@gmail.com>, GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: rs6000: Prefer assigning the MMA vector operands to altivec
 registers [PR105556]
Message-ID: <20220510223516.GP25951@gate.crashing.org>
References: <e7cd06f2-3aca-d4b6-deec-9a67e752e0b9@linux.ibm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <e7cd06f2-3aca-d4b6-deec-9a67e752e0b9@linux.ibm.com>
User-Agent: Mutt/1.4.2.3i
X-Spam-Status: No, score=-9.2 required=5.0 tests=BAYES_00, GIT_PATCH_0,
 JMQ_SPF_NEUTRAL, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS, TXREP,
 T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Tue, 10 May 2022 22:36:19 -0000

Hi!

On Tue, May 10, 2022 at 03:47:40PM -0500, Peter Bergner wrote:
> This patch addresses an issue when compiling the MMA optimized DGEMM kernel

If you want to use this same message as commit message, you shouldn't
say "this patch".  Also, try not to use lines more than 72 positions
wide (which handily is also a good maximum length for email messages,
that way it can be quoted a few times without wrapping).

> in OpenBLAS.  The MMA code uses all 8 accumulators, which overlap all vs0-vs31
> vector registers.  Current trunk assigns one of the normal vector inputs to
> one of the MMA instructions, which forces us to spill one of the accumulators
> to memory, leading to poor performance.  The solution here is to replace the
> "wa" constraints for the vector input operands in the MMA instruction patterns
> with "v,?d" so that we disparage using vs0-vs31 and prefer using the altivec
> registers vs32-vs63 instead, which fixes the dgemm performance issue.

And I assume generated code still looks at least as good otherwise?

> This passed bootstrap and regtesting with no regressions on powerpc64le-linux.
> Ok for trunk and after a few days of burn-in to the GCC12 release branch?
> 
> Technically, the same issue exists in GCC11 and GCC10, but the RA
> assignment is OK with the current code, so unless/until we have a
> test case that exhibits the issue, I'm only asking for a backport to
> GCC12 which does show the performance problem.

So, you put everything that shouldn't be in the commit message at the
end of the mail, easy to delete when applying the patch.  Good good :-)

> gcc/
> 	PR target/105556
> 	* config/rs6000/mma.md (mma_<vv>, mma_<avv>, mma_<pv>, mma_<apv>,
> 	mma_<vvi4i4i8>, mma_<avvi4i4i8>, mma_<vvi4i4i2>, mma_<avvi4i4i2>,
> 	mma_<vvi4i4>, mma_<avvi4i4>, mma_<pvi4i2>, mma_<apvi4i2>,
> 	mma_<vvi4i4i4>, mma_<avvi4i4i4>): Replace "wa" constraint with "v,?d".

> diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
> index 907c9d6d516..9c9920870e4 100644
> --- a/gcc/config/rs6000/mma.md
> +++ b/gcc/config/rs6000/mma.md
> @@ -490,50 +490,50 @@ (define_insn "mma_xxsetaccz"
>    [(set_attr "type" "mma")])
>  
>  (define_insn "mma_<vv>"
> -  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d")
> -	(unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "wa")
> -		    (match_operand:V16QI 2 "vsx_register_operand" "wa")]
> +  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d,&d,&d")
> +	(unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,v,?d,?d")
> +		    (match_operand:V16QI 2 "vsx_register_operand" "v,?d,v,?d")]

This is more involved than just replacing one constrait with two.  You
shoould say that in the changelog (and in your message).

Out of interest, did you try using v,?wa (so just two alternatives, not
four)?  Or did you think it wouldresult in  measurably worse code?  Or
did you decide it is not such bad backend code size explosion after
all :-)

Okay for trunk with a slightly better changelog.  Thanks!


Segher