From: Richard Sandiford
To: Kyrylo Tkachov
Cc: "gcc-patches@gcc.gnu.org"
Subject: Re: [PATCH] aarch64: PR target/109406 Add support for SVE2 unpredicated MUL
Date: Mon, 24 Apr 2023 10:05:27 +0100
In-Reply-To: (Kyrylo Tkachov's message of "Mon, 24 Apr 2023 08:56:40 +0000")

Kyrylo Tkachov writes:
> Hi all,
>
> SVE2 supports an unpredicated vector integer MUL form that we can emit from our SVE expanders
> without using up a predicate register. This patch does so.
> As the SVE MUL expansion currently is templated away through a code iterator I did not split it
> off just for this case but instead special-cased it in the define_expand. It seemed somewhat less
> invasive than the alternatives but I could split it off more explicitly if others want to.
> The div-by-bitmask_1.c testcase is adjusted to expect this new MUL form.
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
>
> Ok for trunk?
> Thanks,
> Kyrill
>
> gcc/ChangeLog:
>
>         PR target/109406
>         * config/aarch64/aarch64-sve.md (3): Handle TARGET_SVE2 MUL
>         case.
>         * config/aarch64/aarch64-sve2.md (*aarch64_mul_unpredicated_): New
>         pattern.
>
> gcc/testsuite/ChangeLog:
>
>         PR target/109406
>         * gcc.target/aarch64/sve2/div-by-bitmask_1.c: Adjust for unpredicated SVE2
>         MUL.
>         * gcc.target/aarch64/sve2/unpred_mul_1.c: New test.

LGTM.

Thanks,
Richard

> diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
> index b11b55f7ac718db199920b61bf3e4b4881c69660..4b4c02c90fec6ce1ff15a8b2a5df348224a307b7 100644
> --- a/gcc/config/aarch64/aarch64-sve.md
> +++ b/gcc/config/aarch64/aarch64-sve.md
> @@ -3657,6 +3657,15 @@ (define_expand "3"
>            UNSPEC_PRED_X))]
>    "TARGET_SVE"
>    {
> +    /* SVE2 supports the MUL (vectors, unpredicated) form. Emit the simple
> +       pattern for it here rather than splitting off the MULT expander
> +       separately. */
> +    if (TARGET_SVE2 && == MULT)
> +      {
> +        emit_move_insn (operands[0], gen_rtx_MULT (mode,
> +                                                   operands[1], operands[2]));
> +        DONE;
> +      }
>      operands[3] = aarch64_ptrue_reg (mode);
>    }
>  )
> diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-sve2.md
> index 2346f9f835d26f5b87afd47cdc9e44f9f47604ed..da8a424dd57fc5482cb20ba417d4141148ac61b6 100644
> --- a/gcc/config/aarch64/aarch64-sve2.md
> +++ b/gcc/config/aarch64/aarch64-sve2.md
> @@ -189,7 +189,7 @@ (define_insn "@aarch64_scatter_stnt_"
>  ;; -------------------------------------------------------------------------
>  ;; ---- [INT] Multiplication
>  ;; -------------------------------------------------------------------------
> -;; Includes the lane forms of:
> +;; Includes the lane and unpredicated forms of:
>  ;; - MUL
>  ;; -------------------------------------------------------------------------
>
> @@ -205,6 +205,21 @@ (define_insn "@aarch64_mul_lane_"
>    "mul\t%0., %1., %2.[%3]"
>  )
>
> +;; The 2nd and 3rd alternatives are valid for just TARGET_SVE as well but
> +;; we include them here to allow matching simpler, unpredicated RTL.
> +(define_insn "*aarch64_mul_unpredicated_"
> +  [(set (match_operand:SVE_I 0 "register_operand" "=w,w,?&w")
> +        (mult:SVE_I
> +          (match_operand:SVE_I 1 "register_operand" "w,0,w")
> +          (match_operand:SVE_I 2 "aarch64_sve_vsm_operand" "w,vsm,vsm")))]
> +  "TARGET_SVE2"
> +  "@
> +   mul\t%0., %1., %2.
> +   mul\t%0., %0., #%2
> +   movprfx\t%0, %1\;mul\t%0., %0., #%2"
> +  [(set_attr "movprfx" "*,*,yes")]
> +)
> +
>  ;; -------------------------------------------------------------------------
>  ;; ---- [INT] Scaled high-part multiplication
>  ;; -------------------------------------------------------------------------
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/div-by-bitmask_1.c b/gcc/testsuite/gcc.target/aarch64/sve2/div-by-bitmask_1.c
> index e6f5098c30f4e2eb8ed1af153c0bb0d204cda6d9..1e546a93906962ba2469ddb3bf2ee9c0166dbae1 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve2/div-by-bitmask_1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve2/div-by-bitmask_1.c
> @@ -7,7 +7,7 @@
>  /*
>  ** draw_bitmap1:
>  ** ...
> -** mul z[0-9]+.h, p[0-9]+/m, z[0-9]+.h, z[0-9]+.h
> +** mul z[0-9]+.h, z[0-9]+.h, z[0-9]+.h
>  ** addhnb z[0-9]+.b, z[0-9]+.h, z[0-9]+.h
>  ** addhnb z[0-9]+.b, z[0-9]+.h, z[0-9]+.h
>  ** ...
> @@ -27,7 +27,7 @@ void draw_bitmap2(uint8_t* restrict pixel, uint8_t level, int n)
>  /*
>  ** draw_bitmap3:
>  ** ...
> -** mul z[0-9]+.s, p[0-9]+/m, z[0-9]+.s, z[0-9]+.s
> +** mul z[0-9]+.s, z[0-9]+.s, z[0-9]+.s
>  ** addhnb z[0-9]+.h, z[0-9]+.s, z[0-9]+.s
>  ** addhnb z[0-9]+.h, z[0-9]+.s, z[0-9]+.s
>  ** ...
> @@ -41,7 +41,7 @@ void draw_bitmap3(uint16_t* restrict pixel, uint16_t level, int n)
>  /*
>  ** draw_bitmap4:
>  ** ...
> -** mul z[0-9]+.d, p[0-9]+/m, z[0-9]+.d, z[0-9]+.d
> +** mul z[0-9]+.d, z[0-9]+.d, z[0-9]+.d
>  ** addhnb z[0-9]+.s, z[0-9]+.d, z[0-9]+.d
>  ** addhnb z[0-9]+.s, z[0-9]+.d, z[0-9]+.d
>  ** ...
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/unpred_mul_1.c b/gcc/testsuite/gcc.target/aarch64/sve2/unpred_mul_1.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..aaf0ce49c99447439146a1e17ed0533231e141c2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve2/unpred_mul_1.c
> @@ -0,0 +1,29 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ftree-vectorize" } */
> +
> +#include >
> +
> +#define N 1024
> +
> +#define TYPE(N) int##N##_t
> +
> +#define TEMPLATE(SIZE) \
> +void __attribute__ ((noinline, noclone)) \
> +f_##SIZE##_##OP \
> +  (TYPE(SIZE) *restrict a, TYPE(SIZE) *restrict b, \
> +   TYPE(SIZE) *restrict c) \
> +{ \
> +  for (int i = 0; i < N; i++) \
> +    a[i] = b[i] * c[i]; \
> +}
> +
> +TEMPLATE (8);
> +TEMPLATE (16);
> +TEMPLATE (32);
> +TEMPLATE (64);
> +
> +/* { dg-final { scan-assembler-times {\tmul\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d} 1 } } */
> +/* { dg-final { scan-assembler-times {\tmul\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s} 1 } } */
> +/* { dg-final { scan-assembler-times {\tmul\tz[0-9]+\.h, z[0-9]+\.h, z[0-9]+\.h} 1 } } */
> +/* { dg-final { scan-assembler-times {\tmul\tz[0-9]+\.b, z[0-9]+\.b, z[0-9]+\.b} 1 } } */
> +
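
An aside for readers of the thread, not part of the patch: the scan-assembler-times patterns in unpred_mul_1.c only accept the three-register form, so a predicated "mul zN.T, pN/m, ..." line would not be counted. A minimal sketch of that distinction follows. It assumes POSIX extended regexes are a close enough stand-in for the Tcl regexes DejaGnu uses; the helper name matches_unpredicated_mul and the sample assembly lines in the note below are hypothetical.

```c
#include <regex.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: returns true if asm_line contains an unpredicated
   SVE2 MUL (vectors) for the given element suffix ('b', 'h', 's' or 'd').
   The pattern mirrors the ones in unpred_mul_1.c, with a literal tab in
   place of the Tcl "\t" escape.  */
bool
matches_unpredicated_mul (const char *asm_line, char suffix)
{
  char pattern[64];
  regex_t re;
  bool found;

  /* Build "\tmul\tz[0-9]+\.T, z[0-9]+\.T, z[0-9]+\.T" for suffix T.  */
  snprintf (pattern, sizeof pattern,
            "\tmul\tz[0-9]+\\.%c, z[0-9]+\\.%c, z[0-9]+\\.%c",
            suffix, suffix, suffix);

  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return false;

  /* regexec returns 0 when the pattern matches somewhere in the line.  */
  found = regexec (&re, asm_line, 0, NULL, 0) == 0;
  regfree (&re);
  return found;
}
```

With this helper, a line such as "\tmul\tz0.s, z1.s, z2.s" is accepted for suffix 's', while the predicated form "\tmul\tz0.s, p0/m, z0.s, z1.s" is rejected because the second operand is a predicate register rather than a vector register.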