From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-410370-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 79787 invoked by alias); 16 Oct 2015 12:58:45 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 79771 invoked by uid 89); 16 Oct 2015 12:58:43 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00,SPF_PASS autolearn=ham version=3.3.2
X-HELO: eu-smtp-delivery-143.mimecast.com
Received: from eu-smtp-delivery-143.mimecast.com (HELO eu-smtp-delivery-143.mimecast.com) (207.82.80.143) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 16 Oct 2015 12:58:41 +0000
Received: from cam-owa1.Emea.Arm.com (fw-tnat.cambridge.arm.com [217.140.96.140]) by eu-smtp-1.mimecast.com with ESMTP id uk-mta-10-_6SvObwTRM-mmVhuAEUMvQ-1; Fri, 16 Oct 2015 13:58:36 +0100
Received: from [10.2.207.50] ([10.1.2.79]) by cam-owa1.Emea.Arm.com with Microsoft SMTPSVC(6.0.3790.3959);	 Fri, 16 Oct 2015 13:58:36 +0100
Message-ID: <5620F47B.9010107@arm.com>
Date: Fri, 16 Oct 2015 12:59:00 -0000
From: Kyrill Tkachov <kyrylo.tkachov@arm.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0
MIME-Version: 1.0
To: GCC Patches <gcc-patches@gcc.gnu.org>
CC: Marcus Shawcroft <marcus.shawcroft@arm.com>,  Richard Earnshaw <Richard.Earnshaw@arm.com>, James Greenhalgh <james.greenhalgh@arm.com>
Subject: [PATCH][AArch64] Add support for 64-bit vector-mode ldp/stp
X-MC-Unique: _6SvObwTRM-mmVhuAEUMvQ-1
Content-Type: multipart/mixed; boundary="------------040901010004080408070700"
X-IsSubscribed: yes
X-SW-Source: 2015-10/txt/msg01591.txt.bz2

This is a multi-part message in MIME format.
--------------040901010004080408070700
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: quoted-printable
Content-length: 1160

Hi all,

We already support load/store-pair operations on the D-registers when they =
contain an FP value, but the peepholes/sched-fusion machinery that
do all the hard work currently ignore 64-bit vector modes.

This patch adds support for fusing loads/stores of 64-bit vector operands i=
nto ldp and stp instructions.
I've seen this trigger a few times in SPEC2006. Not too many times, but the=
 times it did trigger the code seemed objectively better
i.e. long sequences of ldr and str instructions essentially halved in size.

Bootstrapped and tested on aarch64-none-linux-gnu.

Ok for trunk?

Thanks,
Kyrill

2015-10-16  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * config/aarch64/aarch64.c (aarch64_mode_valid_for_sched_fusion_p):
     New function.
     (fusion_load_store): Use it.
     * config/aarch64/aarch64-ldpstp.md: Add new peephole2s for
     ldp and stp in VD modes.
     * config/aarch64/aarch64-simd.md (load_pair<mode>, VD): New pattern.
     (store_pair<mode>, VD): Likewise.

2015-10-16  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * gcc.target/aarch64/stp_vec_64_1.c: New test.
     * gcc.target/aarch64/ldp_vec_64_1.c: New test.

--------------040901010004080408070700
Content-Type: text/x-patch; name=aarch64-ldp-stp-64.patch
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment;
 filename="aarch64-ldp-stp-64.patch"
Content-length: 5887

commit b5f4a5b87a7315fb8a4d88da3e4c4afc52d16052
Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Date:   Tue Oct 6 12:08:24 2015 +0100

    [AArch64] Add support for 64-bit vector-mode ldp/stp
diff --git a/gcc/config/aarch64/aarch64-ldpstp.md b/gcc/config/aarch64/aarc=
h64-ldpstp.md
index 8d6d882..458829c 100644
--- a/gcc/config/aarch64/aarch64-ldpstp.md
+++ b/gcc/config/aarch64/aarch64-ldpstp.md
@@ -98,6 +98,47 @@ (define_peephole2
     }
 })
=20
+(define_peephole2
+  [(set (match_operand:VD 0 "register_operand" "")
+	(match_operand:VD 1 "aarch64_mem_pair_operand" ""))
+   (set (match_operand:VD 2 "register_operand" "")
+	(match_operand:VD 3 "memory_operand" ""))]
+  "aarch64_operands_ok_for_ldpstp (operands, true, <MODE>mode)"
+  [(parallel [(set (match_dup 0) (match_dup 1))
+	      (set (match_dup 2) (match_dup 3))])]
+{
+  rtx base, offset_1, offset_2;
+
+  extract_base_offset_in_addr (operands[1], &base, &offset_1);
+  extract_base_offset_in_addr (operands[3], &base, &offset_2);
+  if (INTVAL (offset_1) > INTVAL (offset_2))
+    {
+      std::swap (operands[0], operands[2]);
+      std::swap (operands[1], operands[3]);
+    }
+})
+
+(define_peephole2
+  [(set (match_operand:VD 0 "aarch64_mem_pair_operand" "")
+	(match_operand:VD 1 "register_operand" ""))
+   (set (match_operand:VD 2 "memory_operand" "")
+	(match_operand:VD 3 "register_operand" ""))]
+  "TARGET_SIMD && aarch64_operands_ok_for_ldpstp (operands, false, <MODE>m=
ode)"
+  [(parallel [(set (match_dup 0) (match_dup 1))
+	      (set (match_dup 2) (match_dup 3))])]
+{
+  rtx base, offset_1, offset_2;
+
+  extract_base_offset_in_addr (operands[0], &base, &offset_1);
+  extract_base_offset_in_addr (operands[2], &base, &offset_2);
+  if (INTVAL (offset_1) > INTVAL (offset_2))
+    {
+      std::swap (operands[0], operands[2]);
+      std::swap (operands[1], operands[3]);
+    }
+})
+
+
 ;; Handle sign/zero extended consecutive load/store.
=20
 (define_peephole2
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch6=
4-simd.md
index 6a2ab61..bf051c3 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -153,6 +153,34 @@ (define_insn "*aarch64_simd_mov<mode>"
    (set_attr "length" "4,4,4,8,8,8,4")]
 )
=20
+(define_insn "load_pair<mode>"
+  [(set (match_operand:VD 0 "register_operand" "=3Dw")
+	(match_operand:VD 1 "aarch64_mem_pair_operand" "Ump"))
+   (set (match_operand:VD 2 "register_operand" "=3Dw")
+	(match_operand:VD 3 "memory_operand" "m"))]
+  "TARGET_SIMD
+   && rtx_equal_p (XEXP (operands[3], 0),
+		   plus_constant (Pmode,
+				  XEXP (operands[1], 0),
+				  GET_MODE_SIZE (<MODE>mode)))"
+  "ldp\\t%d0, %d2, %1"
+  [(set_attr "type" "neon_ldp")]
+)
+
+(define_insn "store_pair<mode>"
+  [(set (match_operand:VD 0 "aarch64_mem_pair_operand" "=3DUmp")
+	(match_operand:VD 1 "register_operand" "w"))
+   (set (match_operand:VD 2 "memory_operand" "=3Dm")
+	(match_operand:VD 3 "register_operand" "w"))]
+  "TARGET_SIMD
+   && rtx_equal_p (XEXP (operands[2], 0),
+		   plus_constant (Pmode,
+				  XEXP (operands[0], 0),
+				  GET_MODE_SIZE (<MODE>mode)))"
+  "stp\\t%d1, %d3, %0"
+  [(set_attr "type" "neon_stp")]
+)
+
 (define_split
   [(set (match_operand:VQ 0 "register_operand" "")
       (match_operand:VQ 1 "register_operand" ""))]
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index d7d05b8..7682417 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -3491,6 +3491,18 @@ offset_12bit_unsigned_scaled_p (machine_mode mode, H=
OST_WIDE_INT offset)
 	  && offset % GET_MODE_SIZE (mode) =3D=3D 0);
 }
=20
+/* Return true if MODE is one of the modes for which we
+   support LDP/STP operations.  */
+
+static bool
+aarch64_mode_valid_for_sched_fusion_p (machine_mode mode)
+{
+  return mode =3D=3D SImode || mode =3D=3D DImode
+	 || mode =3D=3D SFmode || mode =3D=3D DFmode
+	 || (aarch64_vector_mode_supported_p (mode)
+	     && GET_MODE_SIZE (mode) =3D=3D 8);
+}
+
 /* Return true if X is a valid address for machine mode MODE.  If it is,
    fill in INFO appropriately.  STRICT_P is true if REG_OK_STRICT is in
    effect.  OUTER_CODE is PARALLEL for a load/store pair.  */
@@ -12863,8 +12875,9 @@ fusion_load_store (rtx_insn *insn, rtx *base, rtx *=
offset)
   src =3D SET_SRC (x);
   dest =3D SET_DEST (x);
=20
-  if (GET_MODE (dest) !=3D SImode && GET_MODE (dest) !=3D DImode
-      && GET_MODE (dest) !=3D SFmode && GET_MODE (dest) !=3D DFmode)
+  machine_mode dest_mode =3D GET_MODE (dest);
+
+  if (!aarch64_mode_valid_for_sched_fusion_p (dest_mode))
     return SCHED_FUSION_NONE;
=20
   if (GET_CODE (src) =3D=3D SIGN_EXTEND)
diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_vec_64_1.c b/gcc/testsuit=
e/gcc.target/aarch64/ldp_vec_64_1.c
new file mode 100644
index 0000000..62213f3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ldp_vec_64_1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast" } */
+
+typedef int int32x2_t __attribute__ ((__vector_size__ ((8))));
+
+void
+foo (int32x2_t *foo, int32x2_t *bar)
+{
+  int i =3D 0;
+  int32x2_t val =3D { 3, 2 };
+
+  for (i =3D 0; i < 1024; i+=3D2)
+    foo[i] =3D bar[i] + bar[i + 1];
+}
+
+/* { dg-final { scan-assembler "ldp\td\[0-9\]+, d\[0-9\]" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/stp_vec_64_1.c b/gcc/testsuit=
e/gcc.target/aarch64/stp_vec_64_1.c
new file mode 100644
index 0000000..11e757a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/stp_vec_64_1.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast" } */
+
+
+typedef int int32x2_t __attribute__ ((__vector_size__ ((8))));
+
+void
+bar (int32x2_t *foo)
+{
+  int i =3D 0;
+  int32x2_t val =3D { 3, 2 };
+
+  for (i =3D 0; i < 256; i+=3D2)
+    {
+      foo[i] =3D val;
+      foo[i+1] =3D val;
+    }
+}
+
+/* { dg-final { scan-assembler "stp\td\[0-9\]+, d\[0-9\]" } } */

--------------040901010004080408070700--