Subject: Re: [AArch64, PATCH] Improve Neon store of zero
From: Jackson Woodruff
To: James Greenhalgh
Cc: Wilco Dijkstra, GCC Patches, richard.sandiford@linaro.org, nd,
 Richard Earnshaw
Date: Wed, 13 Sep 2017 16:35:00 -0000
Message-ID: <10b0413b-c929-f697-24cc-72fc336f5016@foss.arm.com>
In-Reply-To: <20170912162806.GB33912@arm.com>

Hi,

I have addressed the issues you raised below.  Is the amended patch OK
for trunk?

Thanks,

Jackson.

On 09/12/2017 05:28 PM, James Greenhalgh wrote:
> On Wed, Sep 06, 2017 at 10:02:52AM +0100, Jackson Woodruff wrote:
>> Hi all,
>>
>> I've attached a new patch that addresses some of the issues raised with
>> my original patch.
>>
>> On 08/23/2017 03:35 PM, Wilco Dijkstra wrote:
>>> Richard Sandiford wrote:
>>>>
>>>> Sorry for only noticing now, but the call to aarch64_legitimate_address_p
>>>> is asking whether the MEM itself is a legitimate LDP/STP address.  Also,
>>>> it might be better to pass false for strict_p, since this can be called
>>>> before RA.  So maybe:
>>>>
>>>>   if (GET_CODE (operands[0]) == MEM
>>>>       && !(aarch64_simd_imm_zero (operands[1], <MODE>mode)
>>>>            && aarch64_mem_pair_operand (operands[0], <MODE>mode)))
>>
>> There were also some issues with the choice of mode for the call to
>> aarch64_mem_pair_operand.
>>
>> For a 128-bit wide mode, we want to check `aarch64_mem_pair_operand
>> (operands[0], DImode)`, since DImode is what the stp will use.
>>
>> For a 64-bit wide mode, we don't need that check because a normal
>> `str` can be issued.
>>
>> I've updated the condition accordingly.
>>
>>>
>>> Is there any reason for doing this check at all (or at least this early
>>> during expand)?
>>
>> Not doing this check means that the zero is forced into a register, so
>> we then carry around a bit more RTL and rely on combine to merge things.
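
To make the two cases concrete (purely illustrative -- the function
names below are made up, and the real test is vect_str_zero.c in the
attached patch): a 128-bit store of zero can use a pair store of the
zero register, while a 64-bit store of zero only needs a single str.

#include <arm_neon.h>

/* 128-bit store of zero: expected to become "stp xzr, xzr, [...]"
   rather than materialising zero in a SIMD register first.  */
void
store_zero_q (uint32x4_t *p)
{
  uint32x4_t zero = { 0, 0, 0, 0 };
  p[0] = zero;
}

/* 64-bit store of zero: a single "str xzr, [...]" suffices, so no
   pair-address check is needed for this width.  */
void
store_zero_d (float32x2_t *p)
{
  float32x2_t zero = { 0.0f, 0.0f };
  p[0] = zero;
}
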
>>
>>>
>>> There is a similar issue with this part:
>>>
>>> (define_insn "*aarch64_simd_mov<mode>"
>>>   [(set (match_operand:VQ 0 "nonimmediate_operand"
>>> -		"=w, m,  w, ?r, ?w, ?r, w")
>>> +		"=w, Ump, m,  w, ?r, ?w, ?r, w")
>>>
>>> The Ump causes the instruction to always split off the address offset.  Ump
>>> cannot be used in patterns that are generated before register allocation as it
>>> also calls aarch64_legitimate_address_p with strict_p set to true.
>>
>> I've changed the constraint to a new constraint 'Umq', which acts the
>> same as Ump but calls aarch64_legitimate_address_p with strict_p set to
>> false and uses DImode as the mode to pass.
>
> This looks mostly OK to me, but this conditional:
>
>> +  if (GET_CODE (operands[0]) == MEM
>> +      && !(aarch64_simd_imm_zero (operands[1], <MODE>mode)
>> +	   && ((GET_MODE_SIZE (<MODE>mode) == 16
>> +		&& aarch64_mem_pair_operand (operands[0], DImode))
>> +	       || GET_MODE_SIZE (<MODE>mode) == 8)))
>
> has grown a bit too big in such a general pattern to live without a comment
> explaining what is going on.
>
>> +(define_memory_constraint "Umq"
>> +  "@internal
>> +   A memory address which uses a base register with an offset small enough for
>> +   a load/store pair operation in DI mode."
>> +  (and (match_code "mem")
>> +       (match_test "aarch64_legitimate_address_p (DImode, XEXP (op, 0),
>> +						   PARALLEL, 0)")))
>
> And here you want 'false' rather than '0'.
>
> I'll happily merge the patch with those changes, please send an update.
>
> Thanks,
> James
>
>
>>
>> ChangeLog:
>>
>> gcc/
>>
>> 2017-08-29  Jackson Woodruff
>>
>> 	* config/aarch64/constraints.md (Umq): New constraint.
>> 	* config/aarch64/aarch64-simd.md (*aarch64_simd_mov<mode>):
>> 	Change to use Umq.
>> 	(mov<mode>): Update condition.
>>
>> gcc/testsuite
>>
>> 2017-08-29  Jackson Woodruff
>>
>> 	* gcc.target/aarch64/simd/vect_str_zero.c:
>> 	Update testcase.
>

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index f3e084f8778d70c82823b92fa80ff96021ad26db..c20e513f59a35f3410eae3eb0fdc2fc86352a9fc 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -23,10 +23,17 @@
 	(match_operand:VALL_F16 1 "general_operand" ""))]
   "TARGET_SIMD"
   "
-    if (GET_CODE (operands[0]) == MEM
-	&& !(aarch64_simd_imm_zero (operands[1], <MODE>mode)
-	     && aarch64_legitimate_address_p (<MODE>mode, operands[0],
-					      PARALLEL, 1)))
+    /* Force the operand into a register if it is not an
+       immediate whose use can be replaced with xzr.
+       If the mode is 16 bytes wide, then we will be doing
+       an stp in DI mode, so we check the validity of that.
+       If the mode is 8 bytes wide, then we will be doing a
+       normal str, so the check need not apply.  */
+    if (GET_CODE (operands[0]) == MEM
+	&& !(aarch64_simd_imm_zero (operands[1], <MODE>mode)
+	     && ((GET_MODE_SIZE (<MODE>mode) == 16
+		  && aarch64_mem_pair_operand (operands[0], DImode))
+		 || GET_MODE_SIZE (<MODE>mode) == 8)))
       operands[1] = force_reg (<MODE>mode, operands[1]);
   "
 )
@@ -126,7 +133,7 @@
 (define_insn "*aarch64_simd_mov<mode>"
   [(set (match_operand:VQ 0 "nonimmediate_operand"
-		"=w, Ump, m,  w, ?r, ?w, ?r, w")
+		"=w, Umq, m,  w, ?r, ?w, ?r, w")
 	(match_operand:VQ 1 "general_operand"
 		"m,  Dz, w,  w,  w,  r, r, Dn"))]
   "TARGET_SIMD

diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md
index 9ce3d4efaf31a301dfb7c1772a6b685fb2cbd2ee..3649fb48a33454c208a6b81e051fdd316c495710 100644
--- a/gcc/config/aarch64/constraints.md
+++ b/gcc/config/aarch64/constraints.md
@@ -156,6 +156,14 @@
   (and (match_code "mem")
        (match_test "REG_P (XEXP (op, 0))")))
 
+(define_memory_constraint "Umq"
+  "@internal
+   A memory address which uses a base register with an offset small enough for
+   a load/store pair operation in DI mode."
+  (and (match_code "mem")
+       (match_test "aarch64_legitimate_address_p (DImode, XEXP (op, 0),
+						   PARALLEL, false)")))
+
 (define_memory_constraint "Ump"
   "@internal
   A memory address suitable for a load/store pair operation."

diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vect_str_zero.c b/gcc/testsuite/gcc.target/aarch64/simd/vect_str_zero.c
index 07198de109432b530745cc540790303ae0245efb..00cbf20a0b293e71ed713f0c08d89d8a525fa785 100644
--- a/gcc/testsuite/gcc.target/aarch64/simd/vect_str_zero.c
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vect_str_zero.c
@@ -7,7 +7,7 @@ void
 f (uint32x4_t *p)
 {
   uint32x4_t x = { 0, 0, 0, 0};
-  p[1] = x;
+  p[4] = x;
 
   /* { dg-final { scan-assembler "stp\txzr, xzr," } } */
 }
@@ -16,7 +16,9 @@ void
 g (float32x2_t *p)
 {
   float32x2_t x = {0.0, 0.0};
-  p[0] = x;
+  p[400] = x;
 
   /* { dg-final { scan-assembler "str\txzr, " } } */
 }
+
+/* { dg-final { scan-assembler-not "add\tx\[0-9\]\+, x0, \[0-9\]+" } } */
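
For completeness, one case the new check deliberately rejects: if the
offset of a 128-bit store of zero is outside the range that
aarch64_mem_pair_operand accepts, the expander should simply fall back
to forcing the zero into a register, as it does today.  A small
illustrative sketch (made-up function name, not part of the patch or
the testsuite):

#include <arm_neon.h>

/* Illustrative only: an offset of 16000 bytes is outside the signed,
   scaled immediate range of stp, so the address check fails and the
   zero vector is expected to be forced into a register rather than
   stored with "stp xzr, xzr".  */
void
far_store (uint32x4_t *p)
{
  uint32x4_t zero = { 0, 0, 0, 0 };
  p[1000] = zero;
}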