From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (qmail 79787 invoked by alias); 16 Oct 2015 12:58:45 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 79771 invoked by uid 89); 16 Oct 2015 12:58:43 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00,SPF_PASS autolearn=ham version=3.3.2
X-HELO: eu-smtp-delivery-143.mimecast.com
Received: from eu-smtp-delivery-143.mimecast.com (HELO eu-smtp-delivery-143.mimecast.com) (207.82.80.143)
 by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 16 Oct 2015 12:58:41 +0000
Received: from cam-owa1.Emea.Arm.com (fw-tnat.cambridge.arm.com [217.140.96.140])
 by eu-smtp-1.mimecast.com with ESMTP id uk-mta-10-_6SvObwTRM-mmVhuAEUMvQ-1; Fri, 16 Oct 2015 13:58:36 +0100
Received: from [10.2.207.50] ([10.1.2.79]) by cam-owa1.Emea.Arm.com
 with Microsoft SMTPSVC(6.0.3790.3959); Fri, 16 Oct 2015 13:58:36 +0100
Message-ID: <5620F47B.9010107@arm.com>
Date: Fri, 16 Oct 2015 12:59:00 -0000
From: Kyrill Tkachov
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0
MIME-Version: 1.0
To: GCC Patches
CC: Marcus Shawcroft, Richard Earnshaw, James Greenhalgh
Subject: [PATCH][AArch64] Add support for 64-bit vector-mode ldp/stp
X-MC-Unique: _6SvObwTRM-mmVhuAEUMvQ-1
Content-Type: multipart/mixed; boundary="------------040901010004080408070700"
X-IsSubscribed: yes
X-SW-Source: 2015-10/txt/msg01591.txt.bz2

This is a multi-part message in MIME format.
--------------040901010004080408070700
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: quoted-printable
Content-length: 1160

Hi all,

We already support load/store-pair operations on the D-registers when they
contain an FP value, but the peepholes/sched-fusion machinery that does all
the hard work currently ignores 64-bit vector modes.

This patch adds support for fusing loads/stores of 64-bit vector operands
into ldp and stp instructions.
I've seen this trigger a few times in SPEC2006. Not too many times, but the
times it did trigger, the code seemed objectively better, i.e. long sequences
of ldr and str instructions essentially halved in size.

Bootstrapped and tested on aarch64-none-linux-gnu.

Ok for trunk?

Thanks,
Kyrill

2015-10-16  Kyrylo Tkachov

    * config/aarch64/aarch64.c (aarch64_mode_valid_for_sched_fusion_p):
    New function.
    (fusion_load_store): Use it.
    * config/aarch64/aarch64-ldpstp.md: Add new peephole2s for
    ldp and stp in VD modes.
    * config/aarch64/aarch64-simd.md (load_pair<mode>, VD): New pattern.
    (store_pair<mode>, VD): Likewise.

2015-10-16  Kyrylo Tkachov

    * gcc.target/aarch64/stp_vec_64_1.c: New test.
    * gcc.target/aarch64/ldp_vec_64_1.c: New test.
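
For readers who want the pairing rule in isolation, here is a minimal C sketch of the idea, not the GCC implementation. It assumes a simplified model in which each 64-bit access is just a base register number plus a constant byte offset; the names struct access, ACCESS_SIZE and try_pair are hypothetical and exist only for this illustration. Two accesses are treated as fusable when they share a base and their offsets differ by exactly the 8-byte access size, and they are reordered so the lower address comes first. The immediate-range limits of real ldp/stp encodings are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one 64-bit (8-byte) memory access: a base
   register number plus a constant byte offset.  This is not GCC's RTL.  */
struct access
{
  int base_reg;
  int64_t offset;
};

/* Size in bytes of the accesses modelled here (one D-register).  */
enum { ACCESS_SIZE = 8 };

/* Return true if *A and *B could be merged into one pair access.
   On success the two are swapped if needed so that *A is the lower
   address.  On failure neither argument is modified.
   Assumption: offsets are small enough that LO + ACCESS_SIZE cannot
   overflow.  */
bool
try_pair (struct access *a, struct access *b)
{
  assert (a != NULL && b != NULL);

  /* Both accesses must be relative to the same base register.  */
  if (a->base_reg != b->base_reg)
    return false;

  int64_t lo = a->offset < b->offset ? a->offset : b->offset;
  int64_t hi = a->offset < b->offset ? b->offset : a->offset;

  /* The higher address must sit exactly one access size above the
     lower one, i.e. the two accesses are adjacent in memory.  */
  if (hi != lo + ACCESS_SIZE)
    return false;

  /* Put the lower address first, as a pair instruction expects.  */
  if (a->offset > b->offset)
    {
      struct access tmp = *a;
      *a = *b;
      *b = tmp;
    }

  return true;
}
```

Calling try_pair on accesses at offsets 16 and 8 from the same base succeeds and leaves them ordered 8 then 16; accesses from different bases, or 16 bytes apart, are rejected unchanged.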
--------------040901010004080408070700
Content-Type: text/x-patch;
 name=aarch64-ldp-stp-64.patch
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment;
 filename="aarch64-ldp-stp-64.patch"
Content-length: 5887

commit b5f4a5b87a7315fb8a4d88da3e4c4afc52d16052
Author: Kyrylo Tkachov
Date:   Tue Oct 6 12:08:24 2015 +0100

    [AArch64] Add support for 64-bit vector-mode ldp/stp

diff --git a/gcc/config/aarch64/aarch64-ldpstp.md b/gcc/config/aarch64/aarch64-ldpstp.md
index 8d6d882..458829c 100644
--- a/gcc/config/aarch64/aarch64-ldpstp.md
+++ b/gcc/config/aarch64/aarch64-ldpstp.md
@@ -98,6 +98,47 @@ (define_peephole2
     }
 })
 
+(define_peephole2
+  [(set (match_operand:VD 0 "register_operand" "")
+        (match_operand:VD 1 "aarch64_mem_pair_operand" ""))
+   (set (match_operand:VD 2 "register_operand" "")
+        (match_operand:VD 3 "memory_operand" ""))]
+  "aarch64_operands_ok_for_ldpstp (operands, true, <MODE>mode)"
+  [(parallel [(set (match_dup 0) (match_dup 1))
+              (set (match_dup 2) (match_dup 3))])]
+{
+  rtx base, offset_1, offset_2;
+
+  extract_base_offset_in_addr (operands[1], &base, &offset_1);
+  extract_base_offset_in_addr (operands[3], &base, &offset_2);
+  if (INTVAL (offset_1) > INTVAL (offset_2))
+    {
+      std::swap (operands[0], operands[2]);
+      std::swap (operands[1], operands[3]);
+    }
+})
+
+(define_peephole2
+  [(set (match_operand:VD 0 "aarch64_mem_pair_operand" "")
+        (match_operand:VD 1 "register_operand" ""))
+   (set (match_operand:VD 2 "memory_operand" "")
+        (match_operand:VD 3 "register_operand" ""))]
+  "TARGET_SIMD && aarch64_operands_ok_for_ldpstp (operands, false, <MODE>mode)"
+  [(parallel [(set (match_dup 0) (match_dup 1))
+              (set (match_dup 2) (match_dup 3))])]
+{
+  rtx base, offset_1, offset_2;
+
+  extract_base_offset_in_addr (operands[0], &base, &offset_1);
+  extract_base_offset_in_addr (operands[2], &base, &offset_2);
+  if (INTVAL (offset_1) > INTVAL (offset_2))
+    {
+      std::swap (operands[0], operands[2]);
+      std::swap (operands[1], operands[3]);
+    }
+})
+
+
 ;; Handle sign/zero extended consecutive load/store.
 
 (define_peephole2
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 6a2ab61..bf051c3 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -153,6 +153,34 @@ (define_insn "*aarch64_simd_mov<mode>"
    (set_attr "length" "4,4,4,8,8,8,4")]
 )
 
+(define_insn "load_pair<mode>"
+  [(set (match_operand:VD 0 "register_operand" "=w")
+        (match_operand:VD 1 "aarch64_mem_pair_operand" "Ump"))
+   (set (match_operand:VD 2 "register_operand" "=w")
+        (match_operand:VD 3 "memory_operand" "m"))]
+  "TARGET_SIMD
+   && rtx_equal_p (XEXP (operands[3], 0),
+                   plus_constant (Pmode,
+                                  XEXP (operands[1], 0),
+                                  GET_MODE_SIZE (<MODE>mode)))"
+  "ldp\\t%d0, %d2, %1"
+  [(set_attr "type" "neon_ldp")]
+)
+
+(define_insn "store_pair<mode>"
+  [(set (match_operand:VD 0 "aarch64_mem_pair_operand" "=Ump")
+        (match_operand:VD 1 "register_operand" "w"))
+   (set (match_operand:VD 2 "memory_operand" "=m")
+        (match_operand:VD 3 "register_operand" "w"))]
+  "TARGET_SIMD
+   && rtx_equal_p (XEXP (operands[2], 0),
+                   plus_constant (Pmode,
+                                  XEXP (operands[0], 0),
+                                  GET_MODE_SIZE (<MODE>mode)))"
+  "stp\\t%d1, %d3, %0"
+  [(set_attr "type" "neon_stp")]
+)
+
 (define_split
   [(set (match_operand:VQ 0 "register_operand" "")
         (match_operand:VQ 1 "register_operand" ""))]
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index d7d05b8..7682417 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -3491,6 +3491,18 @@ offset_12bit_unsigned_scaled_p (machine_mode mode, HOST_WIDE_INT offset)
           && offset % GET_MODE_SIZE (mode) == 0);
 }
 
+/* Return true if MODE is one of the modes for which we
+   support LDP/STP operations.  */
+
+static bool
+aarch64_mode_valid_for_sched_fusion_p (machine_mode mode)
+{
+  return mode == SImode || mode == DImode
+         || mode == SFmode || mode == DFmode
+         || (aarch64_vector_mode_supported_p (mode)
+             && GET_MODE_SIZE (mode) == 8);
+}
+
 /* Return true if X is a valid address for machine mode MODE.  If it is,
    fill in INFO appropriately.  STRICT_P is true if REG_OK_STRICT is in
    effect.  OUTER_CODE is PARALLEL for a load/store pair.  */
@@ -12863,8 +12875,9 @@ fusion_load_store (rtx_insn *insn, rtx *base, rtx *offset)
   src = SET_SRC (x);
   dest = SET_DEST (x);
 
-  if (GET_MODE (dest) != SImode && GET_MODE (dest) != DImode
-      && GET_MODE (dest) != SFmode && GET_MODE (dest) != DFmode)
+  machine_mode dest_mode = GET_MODE (dest);
+
+  if (!aarch64_mode_valid_for_sched_fusion_p (dest_mode))
     return SCHED_FUSION_NONE;
 
   if (GET_CODE (src) == SIGN_EXTEND)
diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_vec_64_1.c b/gcc/testsuite/gcc.target/aarch64/ldp_vec_64_1.c
new file mode 100644
index 0000000..62213f3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ldp_vec_64_1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast" } */
+
+typedef int int32x2_t __attribute__ ((__vector_size__ ((8))));
+
+void
+foo (int32x2_t *foo, int32x2_t *bar)
+{
+  int i = 0;
+  int32x2_t val = { 3, 2 };
+
+  for (i = 0; i < 1024; i+=2)
+    foo[i] = bar[i] + bar[i + 1];
+}
+
+/* { dg-final { scan-assembler "ldp\td\[0-9\]+, d\[0-9\]" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/stp_vec_64_1.c b/gcc/testsuite/gcc.target/aarch64/stp_vec_64_1.c
new file mode 100644
index 0000000..11e757a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/stp_vec_64_1.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast" } */
+
+
+typedef int int32x2_t __attribute__ ((__vector_size__ ((8))));
+
+void
+bar (int32x2_t *foo)
+{
+  int i = 0;
+  int32x2_t val = { 3, 2 };
+
+  for (i = 0; i < 256; i+=2)
+    {
+      foo[i] = val;
+      foo[i+1] = val;
+    }
+}
+
+/* { dg-final { scan-assembler "stp\td\[0-9\]+, d\[0-9\]" } } */

--------------040901010004080408070700--