From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 104713 invoked by alias); 7 Dec 2018 17:34:23 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 104095 invoked by uid 89); 7 Dec 2018 17:34:22 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-26.9 required=5.0 tests=BAYES_00,GIT_PATCH_0,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,SPF_PASS autolearn=ham version=3.3.2 spammy=clones, masked X-HELO: foss.arm.com Received: from usa-sjc-mx-foss1.foss.arm.com (HELO foss.arm.com) (217.140.101.70) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 07 Dec 2018 17:34:19 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id B3A34EBD; Fri, 7 Dec 2018 09:34:17 -0800 (PST) Received: from localhost (unknown [10.32.99.101]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 149893F5AF; Fri, 7 Dec 2018 09:34:16 -0800 (PST) From: Richard Sandiford To: Steve Ellcey Mail-Followup-To: Steve Ellcey ,gcc-patches , richard.sandiford@arm.com Cc: gcc-patches Subject: Re: [Patch 1/4][Aarch64] v2: Implement Aarch64 SIMD ABI References: <1541699539.12016.6.camel@cavium.com> Date: Fri, 07 Dec 2018 17:34:00 -0000 In-Reply-To: <1541699539.12016.6.camel@cavium.com> (Steve Ellcey's message of "Thu, 8 Nov 2018 17:52:20 +0000") Message-ID: <87in05utpk.fsf@arm.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-SW-Source: 2018-12/txt/msg00493.txt.bz2 Sorry for the slow review. Steve Ellcey writes: > @@ -1470,6 +1479,45 @@ aarch64_hard_regno_mode_ok (unsigned regno, machine_mode mode) > return false; > } > > +/* Return true if this is a definition of a vectorized simd function. */ > + > +static bool > +aarch64_simd_decl_p (tree fndecl) > +{ > + tree fntype; > + > + if (fndecl == NULL) > + return false; > + fntype = TREE_TYPE (fndecl); > + if (fntype == NULL) > + return false; > + > + /* All functions with the aarch64_vector_pcs attribute use the simd ABI. */ > + if (lookup_attribute ("aarch64_vector_pcs", TYPE_ATTRIBUTES (fntype)) != NULL) > + return true; > + /* Functions without the aarch64_vector_pcs or simd attribute never use the > + simd ABI. */ > + if (lookup_attribute ("simd", DECL_ATTRIBUTES (fndecl)) == NULL) > + return false; > + /* Functions with the simd attribute can generate three versions of a > + function, a masked vector function, an unmasked vector function, > + and a scalar version. Only the vector versions use the simd ABI. */ > + return (VECTOR_TYPE_P (TREE_TYPE (fntype))); Is this enough? E.g.: void __attribute__ ((simd)) f (int *x) { *x = 1; } generates SIMD clones but doesn't have a vector return type. I'm not an expert on this stuff, but it looks like: struct cgraph_node *node = cgraph_node::get (fndecl); return node && node->simdclone; might work. But in some ways it would be cleaner to add the aarch64_vector_pcs attribute for SIMD clones, e.g. via a new hook, so that the function type is "correct". It might be more efficient to save aarch64_simd_decl_p in cfun->machine. > @@ -4863,6 +4949,7 @@ aarch64_process_components (sbitmap components, bool prologue_p) > mergeable with the current one into a pair. */ > if (!satisfies_constraint_Ump (mem) > || GP_REGNUM_P (regno) != GP_REGNUM_P (regno2) > + || (aarch64_simd_decl_p (cfun->decl) && (FP_REGNUM_P (regno))) Formatting nit: redundant brackets around FP_REGNUM_P (regno). > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md > index 82af4d4..44261ee 100644 > --- a/gcc/config/aarch64/aarch64.md > +++ b/gcc/config/aarch64/aarch64.md > @@ -724,7 +724,13 @@ > "" > ) > > -(define_insn "simple_return" > +(define_expand "simple_return" > + [(simple_return)] > + "aarch64_use_simple_return_insn_p ()" > + "" > +) > + > +(define_insn "*simple_return" > [(simple_return)] > "" > "ret" Can't you just change the condition on the existing define_insn, without turning it in a define_expand? Worth a comment explaining why if not. > @@ -1487,6 +1538,23 @@ > [(set_attr "type" "neon_store1_2reg")] > ) > > +(define_insn "storewb_pair_" > + [(parallel > + [(set (match_operand:P 0 "register_operand" "=&k") > + (plus:P (match_operand:P 1 "register_operand" "0") > + (match_operand:P 4 "aarch64_mem_pair_offset" "n"))) > + (set (mem:TX (plus:P (match_dup 0) > + (match_dup 4))) Should be indented under the (match_dup 0). > + (match_operand:TX 2 "register_operand" "w")) > + (set (mem:TX (plus:P (match_dup 0) > + (match_operand:P 5 "const_int_operand" "n"))) > + (match_operand:TX 3 "register_operand" "w"))])] Think this last part should be: (set (mem:TX (plus:P (plus:P (match_dup 0) (match_dup 4)) (match_operand:P 5 "const_int_operand" "n"))) (match_operand:TX 3 "register_operand" "w"))])] > + "TARGET_SIMD && > + INTVAL (operands[5]) == INTVAL (operands[4]) + GET_MODE_SIZE (mode)" && should be on the second line (which makes the line long enough to need breaking). > diff --git a/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-2.c b/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-2.c > index e69de29..bf6e64a 100644 > --- a/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-2.c > +++ b/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-2.c > @@ -0,0 +1,33 @@ > +/* { dg-do compile } */ > + > +void > +f (void) > +{ > + /* Clobber all fp/simd regs and verify that the correct ones are saved > + and restored in the prologue and epilogue of a SIMD function. */ > + __asm__ __volatile__ ("" ::: "q0", "q1", "q2", "q3"); > + __asm__ __volatile__ ("" ::: "q4", "q5", "q6", "q7"); > + __asm__ __volatile__ ("" ::: "q8", "q9", "q10", "q11"); > + __asm__ __volatile__ ("" ::: "q12", "q13", "q14", "q15"); > + __asm__ __volatile__ ("" ::: "q16", "q17", "q18", "q19"); > + __asm__ __volatile__ ("" ::: "q20", "q21", "q22", "q23"); > + __asm__ __volatile__ ("" ::: "q24", "q25", "q26", "q27"); > + __asm__ __volatile__ ("" ::: "q28", "q29", "q30", "q31"); > +} > + > +/* { dg-final { scan-assembler {\sstp\td8, d9} } } */ > +/* { dg-final { scan-assembler {\sstp\td10, d11} } } */ > +/* { dg-final { scan-assembler {\sstp\td12, d13} } } */ > +/* { dg-final { scan-assembler {\sstp\td14, d15} } } */ > +/* { dg-final { scan-assembler {\sldp\td8, d9} } } */ > +/* { dg-final { scan-assembler {\sldp\td10, d11} } } */ > +/* { dg-final { scan-assembler {\sldp\td12, d13} } } */ > +/* { dg-final { scan-assembler {\sldp\td14, d15} } } */ > +/* { dg-final { scan-assembler-not {\sstp\tq} } } */ > +/* { dg-final { scan-assembler-not {\sldp\tq} } } */ > +/* { dg-final { scan-assembler-not {\sstp\tq[01234567]} } } */ > +/* { dg-final { scan-assembler-not {\sldp\tq[01234567]} } } */ > +/* { dg-final { scan-assembler-not {\sstp\tq1[6789]} } } */ > +/* { dg-final { scan-assembler-not {\sldp\tq1[6789]} } } */ > +/* { dg-final { scan-assembler-not {\sstr\t} } } */ > +/* { dg-final { scan-assembler-not {\sldr\t} } } */ Comment doesn't match code: this is testing a normal PCS function. > diff --git a/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-5.c b/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-5.c > index e69de29..7d639a5e 100644 > --- a/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-5.c > +++ b/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-5.c > @@ -0,0 +1,22 @@ > +/* { dg-do compile } */ > + > +void __attribute__ ((aarch64_vector_pcs)) > +f (void) > +{ > + /* Clobber some fp/simd regs and verify that only those are saved > + and restored in the prologue and epilogue of a SIMD function. */ > + __asm__ __volatile__ ("" ::: "q8", "q9", "q10", "q11"); > +} > + > +/* { dg-final { scan-assembler {\sstp\tq8, q9} } } */ > +/* { dg-final { scan-assembler {\sstp\tq10, q11} } } */ > +/* { dg-final { scan-assembler-not {\sstp\tq[034567]} } } */ > +/* { dg-final { scan-assembler-not {\sldp\tq[034567]} } } */ > +/* { dg-final { scan-assembler-not {\sstp\tq1[23456789]} } } */ > +/* { dg-final { scan-assembler-not {\sldp\tq1[23456789]} } } */ > +/* { dg-final { scan-assembler-not {\sstp\tq2[456789]} } } */ > +/* { dg-final { scan-assembler-not {\sldp\tq2[456789]} } } */ > +/* { dg-final { scan-assembler-not {\sstp\td} } } */ > +/* { dg-final { scan-assembler-not {\sldp\td} } } */ > +/* { dg-final { scan-assembler-not {\sstr\t} } } */ > +/* { dg-final { scan-assembler-not {\sldr\t} } } */ This is a nice test, but I think it would also be good to have versions that don't clobber full register pairs. E.g. one without q9 and another without q10 would test individual STR Qs. LGTM otherwise. Thanks, Richard