From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 103021 invoked by alias); 26 Jan 2018 14:00:04 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 102806 invoked by uid 89); 26 Jan 2018 14:00:03 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-10.9 required=5.0 tests=BAYES_00,GIT_PATCH_2,GIT_PATCH_3,KAM_LAZY_DOMAIN_SECURITY,T_RP_MATCHES_RCVD autolearn=ham version=3.3.2 spammy= X-HELO: foss.arm.com Received: from foss.arm.com (HELO foss.arm.com) (217.140.101.70) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 26 Jan 2018 14:00:00 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id DF6A91529; Fri, 26 Jan 2018 05:59:58 -0800 (PST) Received: from [10.2.207.77] (e100706-lin.cambridge.arm.com [10.2.207.77]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id DF23F3F53D; Fri, 26 Jan 2018 05:59:57 -0800 (PST) Message-ID: <5A6B345C.9040105@foss.arm.com> Date: Fri, 26 Jan 2018 14:22:00 -0000 From: Kyrill Tkachov User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0 MIME-Version: 1.0 To: gcc-patches@gcc.gnu.org, richard.earnshaw@arm.com, james.greenhalgh@arm.com, marcus.shawcroft@arm.com, richard.sandiford@linaro.org Subject: Re: [AArch64] Fix sve/extract_[12].c for big-endian SVE References: <87d11wpx0r.fsf@linaro.org> In-Reply-To: <87d11wpx0r.fsf@linaro.org> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-SW-Source: 2018-01/txt/msg02179.txt.bz2 Hi Richard, On 26/01/18 13:31, Richard Sandiford wrote: > sve/extract_[12].c were relying on the target-independent optimisation > that removes a redundant vec_select, so that we don't end up with > things like: > > dup v0.4s, v0.4s[0] > ...use s0... > > But that optimisation rightly doesn't trigger for big-endian targets, > because GCC expects lane 0 to be in the high part of the register > rather than the low part. > > SVE breaks this assumption -- see the comment at the head of > aarch64-sve.md for details -- so the optimisation is valid for > both endiannesses. Long term, we probably need some kind of target > hook to make GCC aware of this. > > But there's another problem with the current extract pattern: it doesn't > tell the register allocator how cheap an extraction of lane 0 is with > tied registers. It seems better to split the lane 0 case out into > its own pattern and use tied operands for the FPR<-SIMD case, > so that using different registers has the cost of an extra reload. > I think we want this for both endiannesses, regardless of the hook > described above. > > Also, the gen_lowpart in this pattern fails for aarch64_be due to > TARGET_CAN_CHANGE_MODE_CLASS restrictions, so the patch uses gen_rtx_REG > instead. We're only creating this rtl in order to print it, so there's > no need for anything fancier. > > Tested on aarch64_be-elf and aarch64-linux-gnu. OK to install? > > Richard > > > 2018-01-26 Richard Sandiford > > gcc/ > * config/aarch64/aarch64-sve.md (*vec_extract_0): New > pattern. > (*vec_extract_v128): Require a nonzero lane number. > Use gen_rtx_REG rather than gen_lowpart. > > Index: gcc/config/aarch64/aarch64-sve.md > =================================================================== > --- gcc/config/aarch64/aarch64-sve.md 2018-01-13 18:01:51.232735405 +0000 > +++ gcc/config/aarch64/aarch64-sve.md 2018-01-26 13:26:50.176756711 +0000 > @@ -484,18 +484,52 @@ (define_expand "vec_extract" > } > ) > > +;; Extract element zero. This is a special case because we want to force > +;; the registers to be the same for the second alternative, and then > +;; split the instruction into nothing after RA. > +(define_insn_and_split "*vec_extract_0" > + [(set (match_operand: 0 "aarch64_simd_nonimmediate_operand" "=r, w, Utv") > + (vec_select: > + (match_operand:SVE_ALL 1 "register_operand" "w, 0, w") > + (parallel [(const_int 0)])))] > + "TARGET_SVE" > + { > + operands[1] = gen_rtx_REG (mode, REGNO (operands[1])); > + switch (which_alternative) > + { > + case 0: > + return "umov\\t%0, %1.[0]"; > + case 1: > + return "#"; > + case 2: > + return "st1\\t{%1.}[0], %0"; > + default: > + gcc_unreachable (); > + } > + } > + "&& reload_completed > + && REG_P (operands[1]) > + && REGNO (operands[0]) == REGNO (operands[1])" Is it guaranteed that operands[0] will be a REG rtx here? The aarch64_simd_nonimmediate_operand predicate and Utv constraint might allow a MEM, which would cause an error in an RTL-checking build. If I'm not mistaken GCC may attempt the split even when matching alternative 2. Kyrill > + [(const_int 0)] > + { > + emit_note (NOTE_INSN_DELETED); > + DONE; > + } > + [(set_attr "type" "neon_to_gp_q, untyped, neon_store1_one_lane_q")] > +) > + > ;; Extract an element from the Advanced SIMD portion of the register. > ;; We don't just reuse the aarch64-simd.md pattern because we don't > -;; want any chnage in lane number on big-endian targets. > +;; want any change in lane number on big-endian targets. > (define_insn "*vec_extract_v128" > [(set (match_operand: 0 "aarch64_simd_nonimmediate_operand" "=r, w, Utv") > (vec_select: > (match_operand:SVE_ALL 1 "register_operand" "w, w, w") > (parallel [(match_operand:SI 2 "const_int_operand")])))] > "TARGET_SVE > - && IN_RANGE (INTVAL (operands[2]) * GET_MODE_SIZE (mode), 0, 15)" > + && IN_RANGE (INTVAL (operands[2]) * GET_MODE_SIZE (mode), 1, 15)" > { > - operands[1] = gen_lowpart (mode, operands[1]); > + operands[1] = gen_rtx_REG (mode, REGNO (operands[1])); > switch (which_alternative) > { > case 0: