Subject: Re: [PATCH AArch64] Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address
To: "Bin.Cheng"
Cc: James Greenhalgh, Bin Cheng, gcc-patches List
From: Richard Earnshaw
References: <000001d12119$49548570$dbfd9050$@arm.com> <20151117100800.GA6727@arm.com>
 <564F5ABF.2020302@foss.arm.com> <5654343A.2080609@foss.arm.com>
 <56543963.3070704@foss.arm.com> <565D7580.6030303@foss.arm.com>
Message-ID: <566018C7.4000107@foss.arm.com>
Date: Thu, 03 Dec 2015 10:26:00 -0000
X-SW-Source: 2015-12/txt/msg00410.txt.bz2

On 03/12/15 05:26, Bin.Cheng wrote:
> On Tue, Dec 1, 2015 at 6:25 PM, Richard Earnshaw wrote:
>> On 01/12/15 03:19, Bin.Cheng wrote:
>>> On Tue, Nov 24, 2015 at 6:18 PM, Richard Earnshaw wrote:
>>>> On 24/11/15 09:56, Richard Earnshaw wrote:
>>>>> On 24/11/15 02:51, Bin.Cheng wrote:
>>>>>>>> The aarch64 problem is that we don't define the addptrm3 pattern, and we
>>>>>>>>>> don't have a direct insn pattern describing "x + y << z".  According
>>>>>>>>>> to the gcc internals manual:
>>>>>>>>>>
>>>>>>>>>> ‘addptrm3’
>>>>>>>>>> Like addm3 but is guaranteed to only be used for address calculations.
>>>>>>>>>> The expanded code is not allowed to clobber the condition code.  It
>>>>>>>>>> only needs to be defined if addm3 sets the condition code.
>>>>>>>>
>>>>>>>> addm3 on aarch64 does not set the condition codes, so by this rule we
>>>>>>>> shouldn't need to define this pattern.
>>>>>> Hi Richard,
>>>>>> I think that rule has a prerequisite: the backend needs to support
>>>>>> register-shifted addition in its addm3 pattern.
>>>>>
>>>>> addm3 is a named pattern and its format is well defined.  It does not
>>>>> take a shifted operand and never has.
>>>>>
>>>>>> Apparently for AArch64,
>>>>>> addm3 only supports "reg+reg" or "reg+imm".  Also it is not really true
>>>>>> that we "do not set the condition codes", because both
>>>>>> "adds_shift_imm_*" and "adds_mul_imm_*" do set the condition flags.
>>>>>
>>>>> You appear to be confusing named patterns (used by expand) with
>>>>> recognizers.
>>>>> Anyway, we have
>>>>>
>>>>> (define_insn "*add_<shift>_<mode>"
>>>>>   [(set (match_operand:GPI 0 "register_operand" "=r")
>>>>>         (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" "r")
>>>>>                               (match_operand:QI 2
>>>>>                                "aarch64_shift_imm_<mode>" "n"))
>>>>>                   (match_operand:GPI 3 "register_operand" "r")))]
>>>>>
>>>>> which is a non-flag-setting add with a shifted operand.
>>>>>
>>>>>> Either way I think it is another backend issue, so do you approve
>>>>>> that I commit this patch now?
>>>>>
>>>>> Not yet.  I think there's something fundamental amiss here.
>>>>>
>>>>> BTW, it looks to me as though addptrm3 should have exactly the same
>>>>> operand rules as addm3 (the documentation reads "like addm3"), so a
>>>>> shifted operand shouldn't be supported there either.  If that isn't the
>>>>> case then that should be clearly called out in the documentation.
>>>>>
>>>>> R.
>>>>>
>>>>
>>>> PS.
>>>>
>>>> I presume you are aware of the canonicalization rules for add?  That is,
>>>> for a shift-and-add operation, the shift operand must appear first.  I.e.
>>>>
>>>>   (plus (shift (op, op)), op)
>>>>
>>>> not
>>>>
>>>>   (plus (op, (shift (op, op))))
>>>
>>> Hi Richard,
>>> Thanks for the comments.  I realized that the not-recognized insn
>>> issue is because the original patch built non-canonical expressions.
>>> When reloading an address expression, LRA generates a non-canonical
>>> register-scaled insn, which can't be recognized by the aarch64 backend.
>>>
>>> Here is the updated patch using the canonical-form pattern; it passes
>>> bootstrap and regression testing.  Well, the ivo failure still exists,
>>> but it is analyzed in the original message.
>>>
>>> Is this patch OK?
>>>
>>> As for Jiong's concern about the additional extension instruction, I
>>> think this only applies to atomic load/store instructions.  For
>>> general load/store, AArch64 supports zext/sext in the register-scaled
>>> addressing mode, so the additional instruction can be forward propagated
>>> into the memory reference.  The problem for atomic load/store is that
>>> AArch64 only supports the direct register addressing mode.  After LRA
>>> reloads the address expression out of the memory reference, there is no
>>> combine/fwprop optimizer to merge the instructions.  The problem is that
>>> atomic_store's predicate doesn't match its constraint.  The predicate
>>> used for atomic_store is memory_operand, while all other atomic patterns
>>> use aarch64_sync_memory_operand.  I think this might be a typo.  With
>>> this change, expand will not generate an addressing mode requiring a
>>> reload anymore.  I will test another patch fixing this.
>>>
>>> Thanks,
>>> bin
>>
>> Some comments inline.
>>
>>>>
>>>> R.
>>>>
>>>> aarch64_legitimize_addr-20151128.txt
>>>>
>>>>
>>>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>>>> index 3fe2f0f..5b3e3c4 100644
>>>> --- a/gcc/config/aarch64/aarch64.c
>>>> +++ b/gcc/config/aarch64/aarch64.c
>>>> @@ -4757,13 +4757,65 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x */, machine_mode mode)
>>>>       We try to pick as large a range for the offset as possible to
>>>>       maximize the chance of a CSE.  However, for aligned addresses
>>>>       we limit the range to 4k so that structures with different sized
>>>> -     elements are likely to use the same base.  */
>>>> +     elements are likely to use the same base.  We need to be careful
>>>> +     not split CONST for some forms address expressions, otherwise it
>>
>> not to split a CONST for some forms of address expression,
>>
>>>> +     will generate sub-optimal code.  */
>>>>
>>>>    if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1)))
>>>>      {
>>>>        HOST_WIDE_INT offset = INTVAL (XEXP (x, 1));
>>>>        HOST_WIDE_INT base_offset;
>>>>
>>>> +      if (GET_CODE (XEXP (x, 0)) == PLUS)
>>>> +        {
>>>> +          rtx op0 = XEXP (XEXP (x, 0), 0);
>>>> +          rtx op1 = XEXP (XEXP (x, 0), 1);
>>>> +
>>>> +          /* For addr expression in the form like "r1 + r2 + 0x3ffc".
>>>> +             Since the offset is within range supported by addressing
>>>> +             mode "reg+offset", we don't split the const and legalize
>>>> +             it into below insn and expr sequence:
>>>> +               r3 = r1 + r2;
>>>> +               "r3 + 0x3ffc".  */
>>
>> I think this comment would read better as
>>
>>   /* Address expressions of the form Ra + Rb + CONST.
>>
>>      If CONST is within the range supported by the addressing
>>      mode "reg+offset", do not split CONST and use the
>>      sequence
>>        Rt = Ra + Rb
>>        addr = Rt + CONST.  */
>>
>>>> +          if (REG_P (op0) && REG_P (op1))
>>>> +            {
>>>> +              machine_mode addr_mode = GET_MODE (x);
>>>> +              rtx base = gen_reg_rtx (addr_mode);
>>>> +              rtx addr = plus_constant (addr_mode, base, offset);
>>>> +
>>>> +              if (aarch64_legitimate_address_hook_p (mode, addr, false))
>>>> +                {
>>>> +                  emit_insn (gen_adddi3 (base, op0, op1));
>>>> +                  return addr;
>>>> +                }
>>>> +            }
>>>> +          /* For addr expression in the form like "r1 + r2<<2 + 0x3ffc".
>>>> +             Like above, we don't split the const and legalize it into
>>>> +             below insn and expr sequence:
>>
>> Similarly.
>>
>>>> +               r3 = 0x3ffc;
>>>> +               r4 = r1 + r3;
>>>> +               "r4 + r2<<2".  */
>>
>> Why don't we generate
>>
>>   r3 = r1 + r2 << 2
>>   r4 = r3 + 0x3ffc
>>
>> utilizing the shift-and-add instructions?
>
> All other comments are addressed in the attached new patch.
> As for this question, Wilco also asked it on the internal channel before.
> The main idea is to depend on GIMPLE IVOPTs/SLSR to find CSE
> opportunities for the scaled plus sub-expression.  The scaled index is
> most likely a loop iv, so I would like to split the const plus out of the
> memory reference so that it can be identified/hoisted as a loop invariant.
> This is more important when the base is sfp related.
>

Ah, yes.  The SFP problem.

Since at least two people have queried this, it's clearly non-obvious
enough to require an explanation in a comment.

OK with that change and a suitable changelog entry.

R.

> Thanks,
> bin
>
>
> aarch64_legitimize_addr-20151203.txt
>
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 3fe2f0f..248937e 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -4757,13 +4757,67 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x */, machine_mode mode)
>       We try to pick as large a range for the offset as possible to
>       maximize the chance of a CSE.  However, for aligned addresses
>       we limit the range to 4k so that structures with different sized
> -     elements are likely to use the same base.  */
> +     elements are likely to use the same base.  We need to be careful
> +     not to split a CONST for some forms of address expression, otherwise
> +     it will generate sub-optimal code.  */
>
>    if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1)))
>      {
>        HOST_WIDE_INT offset = INTVAL (XEXP (x, 1));
>        HOST_WIDE_INT base_offset;
>
> +      if (GET_CODE (XEXP (x, 0)) == PLUS)
> +        {
> +          rtx op0 = XEXP (XEXP (x, 0), 0);
> +          rtx op1 = XEXP (XEXP (x, 0), 1);
> +
> +          /* Address expressions of the form Ra + Rb + CONST.
> +
> +             If CONST is within the range supported by the addressing
> +             mode "reg+offset", do not split CONST and use the
> +             sequence
> +               Rt = Ra + Rb;
> +               addr = Rt + CONST.  */
> +          if (REG_P (op0) && REG_P (op1))
> +            {
> +              machine_mode addr_mode = GET_MODE (x);
> +              rtx base = gen_reg_rtx (addr_mode);
> +              rtx addr = plus_constant (addr_mode, base, offset);
> +
> +              if (aarch64_legitimate_address_hook_p (mode, addr, false))
> +                {
> +                  emit_insn (gen_adddi3 (base, op0, op1));
> +                  return addr;
> +                }
> +            }
> +          /* Address expressions of the form Ra + Rb<<SCALE + CONST.
> +
> +             If Reg + Rb<<SCALE is a valid address expression, do not
> +             split CONST and use the sequence
> +               Rc = CONST;
> +               Rt = Ra + Rc;
> +               addr = Rt + Rb<<SCALE.  */
> +          else if (REG_P (op0) || REG_P (op1))
> +            {
> +              machine_mode addr_mode = GET_MODE (x);
> +              rtx base = gen_reg_rtx (addr_mode);
> +
> +              /* Switch to make sure that register is in op0.  */
> +              if (REG_P (op1))
> +                std::swap (op0, op1);
> +
> +              rtx addr = gen_rtx_PLUS (addr_mode, op1, base);
> +
> +              if (aarch64_legitimate_address_hook_p (mode, addr, false))
> +                {
> +                  base = force_operand (plus_constant (addr_mode,
> +                                                       op0, offset),
> +                                        NULL_RTX);
> +                  return gen_rtx_PLUS (addr_mode, op1, base);
> +                }
> +            }
> +        }
> +
>    /* Does it look like we'll need a load/store-pair operation?  */
>    if (GET_MODE_SIZE (mode) > 16
>        || mode == TImode)
>
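
For reference, a minimal sketch of the canonicalization point discussed
above: the shifted operand must be the first operand of the PLUS, or
recognizers such as *add_<shift>_<mode> will not match and the insn is
left unrecognized.  This fragment is illustrative only and not part of
either patch; it assumes GCC's usual internal RTL constructors, and the
helper name and its parameters are invented for the example.

  /* Build a canonical shift-and-add address term.  */
  #include "config.h"
  #include "system.h"
  #include "coretypes.h"
  #include "rtl.h"

  static rtx
  build_scaled_plus (machine_mode mode, rtx rt, rtx rb, int scale_log2)
  {
    /* The scaled index: (ashift rb scale_log2).  */
    rtx shifted = gen_rtx_ASHIFT (mode, rb, GEN_INT (scale_log2));

    /* Canonical form: (plus (ashift rb scale) rt).  The reversed operand
       order, (plus rt (ashift rb scale)), is non-canonical.  */
    return gen_rtx_PLUS (mode, shifted, rt);
  }

With the operands the other way round, recog rejects the insn, which is
the "not-recognized insn" failure described earlier in the thread.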