From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <richard.sandiford@arm.com>
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
 by sourceware.org (Postfix) with ESMTP id 66A1C389043C
 for <gcc-patches@gcc.gnu.org>; Wed, 21 Apr 2021 12:40:15 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 66A1C389043C
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
 by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 1084511D4;
 Wed, 21 Apr 2021 05:40:15 -0700 (PDT)
Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.126])
 by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 571D13F694;
 Wed, 21 Apr 2021 05:40:14 -0700 (PDT)
From: Richard Sandiford <richard.sandiford@arm.com>
To: Victor Do Nascimento via Gcc-patches <gcc-patches@gcc.gnu.org>
Mail-Followup-To: Victor Do Nascimento via Gcc-patches
 <gcc-patches@gcc.gnu.org>, Victor Do Nascimento <victor.donascimento@arm.com>,
 nd@arm.com, richard.earnshaw@arm.com, richard.sandiford@arm.com
Cc: Victor Do Nascimento <victor.donascimento@arm.com>, nd@arm.com,
 richard.earnshaw@arm.com
Subject: Re: [PATCH][AArch64] Leveraging the use of STP instruction for
 vec_duplicate
References: <yw8jo8fg1nk6.fsf@arm.com>
Date: Wed, 21 Apr 2021 13:40:13 +0100
In-Reply-To: <yw8jo8fg1nk6.fsf@arm.com> (Victor Do Nascimento via
 Gcc-patches's message of "Wed, 20 Jan 2021 14:06:22 +0000")
Message-ID: <mptbla7retu.fsf@arm.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-Spam-Status: No, score=-12.6 required=5.0 tests=BAYES_00, GIT_PATCH_0,
 KAM_DMARC_STATUS, SPF_HELO_NONE, SPF_PASS,
 TXREP autolearn=ham autolearn_force=no version=3.4.2
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Wed, 21 Apr 2021 12:40:26 -0000

Victor Do Nascimento via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> The backend pattern for storing a pair of identical values in 32 and 64-bit modes with the machine instruction STP was missing, and multiple instructions were needed to reproduce this behavior as a result of failed RTL pattern match in combine pass.
>
> For the test case :
>
> typedef long long v2di __attribute__((vector_size (16)));
> typedef int v2si __attribute__((vector_size (8)));
>
> void
> foo (v2di *x, long long a)
> {
> 	v2di tmp = {a, a};
> 	*x = tmp;
> }
>
> void
> foo2 (v2si *x, int a)
> {
> 	v2si tmp = {a, a};
> 	*x = tmp;
> }
>
> at -O2 on aarch64 gives:
>
> foo:
> 	stp x1, x1, [x0]
> 	ret
> foo2:
> 	stp w1, w1, [x0]
> 	ret
>
> instead of:
>
> foo:
> 	dup     v0.2d, x1
> 	str     q0, [x0]
> 	ret
> foo2:
> 	dup     v0.2s, w1
> 	str     d0, [x0]
> 	ret
>
> In preparation for the next stage 1  phase of development, added new RTL template, unittest and checked for regressions on bootstrapped aarch64-none-linux-gnu.
>
> gcc/ChangeLog
>
> 2021-02-04 victor Do Nascimento <victor.donascimento@arm.com>
>
> 	* config/aarch64/aarch64-simd.md: Implement RTX pattern for
> 	mapping 'vec_duplicate' RTX onto 'STP' ASM insn.
> 	* config/aarch64/iterators.md: Implement ldpstp_vel_sz iterator
> 	to map STP/LDP vector element mode to correct suffix in
> 	attribute type definition of aarch64_simd_stp<mode> pattern.

A more typical changelog entry would be:

	* config/aarch64/iterators.md (ldpstp_vel_sz): New mode attribute.
	* config/aarch64/aarch64-simd.md (aarch64_simd_stp<mode>): New pattern.

> gcc/testsuite/ChangeLog
>
> 2021-02-04 Victor Do Nascimento <victor.donascimento@arm.com>
>
> 	* gcc.target/stp_vec-dup_32_64-1.c: Added test.
>
> Regards,
> Victor
>
> ---
>  gcc/config/aarch64/aarch64-simd.md            | 10 +++++++++
>  gcc/config/aarch64/iterators.md               |  3 +++
>  .../gcc.target/aarch64/stp_vec_dup_32_64-1.c  | 22 +++++++++++++++++++
>  3 files changed, 35 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/stp_vec_dup_32_64-1.c
>
> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> index 71aa77dd010..3d53bab0018 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -205,6 +205,16 @@
>    [(set_attr "type" "neon_stp")]
>  )
>  
> +(define_insn "aarch64_simd_stp<mode>"
> +  [(set (match_operand:VP_2E 0 "aarch64_mem_pair_operand" "=Ump,Ump")
> +		(vec_duplicate:VP_2E (match_operand:<VEL> 1 "register_operand" "w,r")))]

Formatting nit: should just be one tab here.

I would have just made that change locally and committed, but I think
there's a problem: aarch64_mem_pair_operand and Ump are geared for pairs
of full-vector stores, rather than for pairs of elements.  This means that
(for example) the V2SI range will be [-256,255] * 8 rather than the expected
[-256,255] * 4.

I think we need to use aarch64_mem_pair_lanes_operand and Umn instead,
as for store_pair_lanes<mode>.  In addition:

  /* If we are dealing with ADDR_QUERY_LDP_STP_N that means the incoming mode
     corresponds to the actual size of the memory being loaded/stored and the
     mode of the corresponding addressing mode is half of that.  */
  if (type == ADDR_QUERY_LDP_STP_N
      && known_eq (GET_MODE_SIZE (mode), 16))
    mode = DFmode;

only handles 128-bit vectors, whereas here we need it to handle 64-bit
vectors too.

It would be good to test the limits, e.g.:

> diff --git a/gcc/testsuite/gcc.target/aarch64/stp_vec_dup_32_64-1.c b/gcc/testsuite/gcc.target/aarch64/stp_vec_dup_32_64-1.c
> new file mode 100644
> index 00000000000..a37c903dfd4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/stp_vec_dup_32_64-1.c
> @@ -0,1 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +typedef long long v2di __attribute__((vector_size (16)));
> +typedef int v2si __attribute__((vector_size (8)));
> +
> +void
> +foo (v2di *x, long long a)
> +{
> +  v2di tmp = {a, a};
> +  *x = tmp;
> +}
> +
> +void
> +foo2 (v2si *x, int a)
> +{
> +  v2si tmp = {a, a};
> +  *x = tmp;
> +}

We could have additional tests for:

x[-129] = tmp; // out of range
x[-128] = tmp; // in range
x[127] = tmp; // in range
x[128] = tmp; // out of range

Thanks,
Richard

> +
> +/* { dg-final { scan-assembler-times "stp\t" 2 } } */
> +/* { dg-final { scan-assembler-not "dup\t" } } */