From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id 66A1C389043C for ; Wed, 21 Apr 2021 12:40:15 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 66A1C389043C Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 1084511D4; Wed, 21 Apr 2021 05:40:15 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.126]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 571D13F694; Wed, 21 Apr 2021 05:40:14 -0700 (PDT) From: Richard Sandiford To: Victor Do Nascimento via Gcc-patches Mail-Followup-To: Victor Do Nascimento via Gcc-patches , Victor Do Nascimento , nd@arm.com, richard.earnshaw@arm.com, richard.sandiford@arm.com Cc: Victor Do Nascimento , nd@arm.com, richard.earnshaw@arm.com Subject: Re: [PATCH][AArch64] Leveraging the use of STP instruction for vec_duplicate References: Date: Wed, 21 Apr 2021 13:40:13 +0100 In-Reply-To: (Victor Do Nascimento via Gcc-patches's message of "Wed, 20 Jan 2021 14:06:22 +0000") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Status: No, score=-12.6 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 21 Apr 2021 12:40:26 -0000 Victor Do Nascimento via Gcc-patches writes: > The backend pattern for storing a pair of identical values in 32 and 64-bit modes with the machine instruction STP was missing, and multiple instructions were needed to reproduce this behavior as a result of failed RTL pattern match in combine pass. > > For the test case : > > typedef long long v2di __attribute__((vector_size (16))); > typedef int v2si __attribute__((vector_size (8))); > > void > foo (v2di *x, long long a) > { > v2di tmp = {a, a}; > *x = tmp; > } > > void > foo2 (v2si *x, int a) > { > v2si tmp = {a, a}; > *x = tmp; > } > > at -O2 on aarch64 gives: > > foo: > stp x1, x1, [x0] > ret > foo2: > stp w1, w1, [x0] > ret > > instead of: > > foo: > dup v0.2d, x1 > str q0, [x0] > ret > foo2: > dup v0.2s, w1 > str d0, [x0] > ret > > In preparation for the next stage 1 phase of development, added new RTL template, unittest and checked for regressions on bootstrapped aarch64-none-linux-gnu. > > gcc/ChangeLog > > 2021-02-04 victor Do Nascimento > > * config/aarch64/aarch64-simd.md: Implement RTX pattern for > mapping 'vec_duplicate' RTX onto 'STP' ASM insn. > * config/aarch64/iterators.md: Implement ldpstp_vel_sz iterator > to map STP/LDP vector element mode to correct suffix in > attribute type definition of aarch64_simd_stp pattern. A more typical changelog entry would be: * config/aarch64/iterators.md (ldpstp_vel_sz): New mode attribute. * config/aarch64/aarch64-simd.md (aarch64_simd_stp): New pattern. > gcc/testsuite/ChangeLog > > 2021-02-04 Victor Do Nascimento > > * gcc.target/stp_vec-dup_32_64-1.c: Added test. > > Regards, > Victor > > --- > gcc/config/aarch64/aarch64-simd.md | 10 +++++++++ > gcc/config/aarch64/iterators.md | 3 +++ > .../gcc.target/aarch64/stp_vec_dup_32_64-1.c | 22 +++++++++++++++++++ > 3 files changed, 35 insertions(+) > create mode 100644 gcc/testsuite/gcc.target/aarch64/stp_vec_dup_32_64-1.c > > diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md > index 71aa77dd010..3d53bab0018 100644 > --- a/gcc/config/aarch64/aarch64-simd.md > +++ b/gcc/config/aarch64/aarch64-simd.md > @@ -205,6 +205,16 @@ > [(set_attr "type" "neon_stp")] > ) > > +(define_insn "aarch64_simd_stp" > + [(set (match_operand:VP_2E 0 "aarch64_mem_pair_operand" "=Ump,Ump") > + (vec_duplicate:VP_2E (match_operand: 1 "register_operand" "w,r")))] Formatting nit: should just be one tab here. I would have just made that change locally and committed, but I think there's a problem: aarch64_mem_pair_operand and Ump are geared for pairs of full-vector stores, rather than for pairs of elements. This means that (for example) the V2SI range will be [-256,255] * 8 rather than the expected [-256,255] * 4. I think we need to use aarch64_mem_pair_lanes_operand and Umn instead, as for store_pair_lanes. In addition: /* If we are dealing with ADDR_QUERY_LDP_STP_N that means the incoming mode corresponds to the actual size of the memory being loaded/stored and the mode of the corresponding addressing mode is half of that. */ if (type == ADDR_QUERY_LDP_STP_N && known_eq (GET_MODE_SIZE (mode), 16)) mode = DFmode; only handles 128-bit vectors, whereas here we need it to handle 64-bit vectors too. It would be good to test the limits, e.g.: > diff --git a/gcc/testsuite/gcc.target/aarch64/stp_vec_dup_32_64-1.c b/gcc/testsuite/gcc.target/aarch64/stp_vec_dup_32_64-1.c > new file mode 100644 > index 00000000000..a37c903dfd4 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/stp_vec_dup_32_64-1.c > @@ -0,1 +1,22 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > + > +typedef long long v2di __attribute__((vector_size (16))); > +typedef int v2si __attribute__((vector_size (8))); > + > +void > +foo (v2di *x, long long a) > +{ > + v2di tmp = {a, a}; > + *x = tmp; > +} > + > +void > +foo2 (v2si *x, int a) > +{ > + v2si tmp = {a, a}; > + *x = tmp; > +} We could have additional tests for: x[-129] = tmp; // out of range x[-128] = tmp; // in range x[127] = tmp; // in range x[128] = tmp; // out of range Thanks, Richard > + > +/* { dg-final { scan-assembler-times "stp\t" 2 } } */ > +/* { dg-final { scan-assembler-not "dup\t" } } */