From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id D8479385734E for ; Tue, 2 May 2023 12:02:31 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org D8479385734E Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id A7964C14; Tue, 2 May 2023 05:03:15 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.110.72]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 047813F64C; Tue, 2 May 2023 05:02:30 -0700 (PDT) From: Richard Sandiford To: Prathamesh Kulkarni Mail-Followup-To: Prathamesh Kulkarni ,gcc Patches , richard.sandiford@arm.com Cc: gcc Patches Subject: Re: [aarch64] Code-gen for vector initialization involving constants References: Date: Tue, 02 May 2023 13:02:29 +0100 In-Reply-To: (Prathamesh Kulkarni's message of "Tue, 2 May 2023 15:52:29 +0530") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Status: No, score=-30.3 required=5.0 tests=BAYES_00,GIT_PATCH_0,KAM_DMARC_NONE,KAM_DMARC_STATUS,KAM_LAZY_DOMAIN_SECURITY,KAM_SHORT,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Prathamesh Kulkarni writes: > On Tue, 2 May 2023 at 14:56, Richard Sandiford > wrote: >> > [aarch64] Improve code-gen for vector initialization with single constant element. >> > >> > gcc/ChangeLog: >> > * config/aarch64/aarc64.cc (aarch64_expand_vector_init): Tweak condition >> > if (n_var == n_elts && n_elts <= 16) to allow a single constant, >> > and if maxv == 1, use constant element for duplicating into register. >> > >> > gcc/testsuite/ChangeLog: >> > * gcc.target/aarch64/vec-init-single-const.c: New test. >> > >> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc >> > index 2b0de7ca038..f46750133a6 100644 >> > --- a/gcc/config/aarch64/aarch64.cc >> > +++ b/gcc/config/aarch64/aarch64.cc >> > @@ -22167,7 +22167,7 @@ aarch64_expand_vector_init (rtx target, rtx vals) >> > and matches[X][1] with the count of duplicate elements (if X is the >> > earliest element which has duplicates). */ >> > >> > - if (n_var == n_elts && n_elts <= 16) >> > + if ((n_var >= n_elts - 1) && n_elts <= 16) >> > { >> > int matches[16][2] = {0}; >> > for (int i = 0; i < n_elts; i++) >> > @@ -22227,6 +22227,18 @@ aarch64_expand_vector_init (rtx target, rtx vals) >> > vector register. For big-endian we want that position to hold >> > the last element of VALS. */ >> > maxelement = BYTES_BIG_ENDIAN ? n_elts - 1 : 0; >> > + >> > + /* If we have a single constant element, use that for duplicating >> > + instead. */ >> > + if (n_var == n_elts - 1) >> > + for (int i = 0; i < n_elts; i++) >> > + if (CONST_INT_P (XVECEXP (vals, 0, i)) >> > + || CONST_DOUBLE_P (XVECEXP (vals, 0, i))) >> > + { >> > + maxelement = i; >> > + break; >> > + } >> > + >> > rtx x = force_reg (inner_mode, XVECEXP (vals, 0, maxelement)); >> > aarch64_emit_move (target, lowpart_subreg (mode, x, inner_mode)); >> >> We don't want to force the constant into a register though. > OK right, sorry. > With the attached patch, for the following test-case: > int64x2_t f_s64(int64_t x) > { > return (int64x2_t) { x, 1 }; > } > > it loads constant from memory (same code-gen as without patch). > f_s64: > adrp x1, .LC0 > ldr q0, [x1, #:lo12:.LC0] > ins v0.d[0], x0 > ret > > Does the patch look OK ? > > Thanks, > Prathamesh > [...] > [aarch64] Improve code-gen for vector initialization with single constant element. > > gcc/ChangeLog: > * config/aarch64/aarc64.cc (aarch64_expand_vector_init): Tweak condition > if (n_var == n_elts && n_elts <= 16) to allow a single constant, > and if maxv == 1, use constant element for duplicating into register. > > gcc/testsuite/ChangeLog: > * gcc.target/aarch64/vec-init-single-const.c: New test. > > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc > index 2b0de7ca038..97309ddec4f 100644 > --- a/gcc/config/aarch64/aarch64.cc > +++ b/gcc/config/aarch64/aarch64.cc > @@ -22167,7 +22167,7 @@ aarch64_expand_vector_init (rtx target, rtx vals) > and matches[X][1] with the count of duplicate elements (if X is the > earliest element which has duplicates). */ > > - if (n_var == n_elts && n_elts <= 16) > + if ((n_var >= n_elts - 1) && n_elts <= 16) No need for the extra brackets. > { > int matches[16][2] = {0}; > for (int i = 0; i < n_elts; i++) > @@ -22227,8 +22227,26 @@ aarch64_expand_vector_init (rtx target, rtx vals) > vector register. For big-endian we want that position to hold > the last element of VALS. */ > maxelement = BYTES_BIG_ENDIAN ? n_elts - 1 : 0; > - rtx x = force_reg (inner_mode, XVECEXP (vals, 0, maxelement)); > - aarch64_emit_move (target, lowpart_subreg (mode, x, inner_mode)); > + > + /* If we have a single constant element, use that for duplicating > + instead. */ > + if (n_var == n_elts - 1) > + for (int i = 0; i < n_elts; i++) > + if (CONST_INT_P (XVECEXP (vals, 0, i)) > + || CONST_DOUBLE_P (XVECEXP (vals, 0, i))) > + { > + maxelement = i; > + break; > + } > + > + rtx maxval = XVECEXP (vals, 0, maxelement); > + if (!(CONST_INT_P (maxval) || CONST_DOUBLE_P (maxval))) > + { > + rtx x = force_reg (inner_mode, XVECEXP (vals, 0, maxelement)); > + aarch64_emit_move (target, lowpart_subreg (mode, x, inner_mode)); > + } > + else > + aarch64_emit_move (target, gen_vec_duplicate (mode, maxval)); > } > else > { This seems a bit convoluted. It might be easier to record whether we see a CONST_INT_P or a CONST_DOUBLE_P during the previous loop, and if so what the constant is. Then handle that case first, as a separate arm of the "if". > diff --git a/gcc/testsuite/gcc.target/aarch64/vec-init-single-const.c b/gcc/testsuite/gcc.target/aarch64/vec-init-single-const.c > new file mode 100644 > index 00000000000..682fd43439a > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/vec-init-single-const.c > @@ -0,0 +1,66 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > +/* { dg-final { check-function-bodies "**" "" "" } } */ > + > +#include > + > +/* > +** f_s8: > +** ... > +** dup v[0-9]+\.16b, w[0-9]+ > +** movi v[0-9]+\.8b, 0x1 > +** ins v[0-9]+\.b\[15\], v[0-9]+\.b\[0\] > +** ... > +** ret Like with the divide-and-conquer patch, there's nothing that requires the first two instructions to be in that order. What is the second ... hiding? What sequences do we actually generate? BTW, remember to say how patches were tested :-) Thanks, Richard > +*/ > + > +int8x16_t f_s8(int8_t x) > +{ > + return (int8x16_t) { x, x, x, x, x, x, x, x, > + x, x, x, x, x, x, x, 1 }; > +} > + > +/* > +** f_s16: > +** ... > +** dup v[0-9]+\.8h, w[0-9]+ > +** movi v[0-9]+\.4h, 0x1 > +** ins v[0-9]+\.h\[7\], v[0-9]+\.h\[0\] > +** ... > +** ret > +*/ > + > +int16x8_t f_s16(int16_t x) > +{ > + return (int16x8_t) { x, x, x, x, x, x, x, 1 }; > +} > + > +/* > +** f_s32: > +** ... > +** movi v[0-9]\.2s, 0x1 > +** dup v[0-9]\.4s, w[0-9]+ > +** ins v[0-9]+\.s\[3\], v[0-9]+\.s\[0\] > +** ... > +** ret > +*/ > + > +int32x4_t f_s32(int32_t x) > +{ > + return (int32x4_t) { x, x, x, 1 }; > +} > + > +/* > +** f_s64: > +** ... > +** adrp x[0-9]+, .LC[0-9]+ > +** ldr q[0-9]+, \[x[0-9]+, #:lo12:.LC[0-9]+\] > +** ins v[0-9]+\.d\[0\], x[0-9]+ > +** ... > +** ret > +*/ > + > +int64x2_t f_s64(int64_t x) > +{ > + return (int64x2_t) { x, 1 }; > +}