Subject: Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address
From: Richard Earnshaw
To: "Bin.Cheng"
Cc: James Greenhalgh, Bin Cheng, gcc-patches List
Date: Tue, 01 Dec 2015 10:25:00 -0000
Message-ID: <565D7580.6030303@foss.arm.com>
References: <000001d12119$49548570$dbfd9050$@arm.com> <20151117100800.GA6727@arm.com> <564F5ABF.2020302@foss.arm.com> <5654343A.2080609@foss.arm.com> <56543963.3070704@foss.arm.com>

On 01/12/15 03:19, Bin.Cheng wrote:
> On Tue, Nov 24, 2015 at 6:18 PM, Richard Earnshaw wrote:
>> On 24/11/15 09:56, Richard Earnshaw wrote:
>>> On 24/11/15 02:51, Bin.Cheng wrote:
>>>>>> The aarch64's problem is we don't define the addptr3 pattern, and we
>>>>>>>> don't have a direct insn pattern describing "x + y << z".  According
>>>>>>>> to the GCC internals manual:
>>>>>>>>
>>>>>>>> ‘addptrm3’
>>>>>>>> Like addm3 but is guaranteed to only be used for address calculations.
>>>>>>>> The expanded code is not allowed to clobber the condition code.  It
>>>>>>>> only needs to be defined if addm3 sets the condition code.
>>>>>>
>>>>>> addm3 on aarch64 does not set the condition codes, so by this rule we
>>>>>> shouldn't need to define this pattern.
>>>> Hi Richard,
>>>> I think that rule has a prerequisite: the backend needs to support
>>>> register-shifted addition in the addm3 pattern.
>>>
>>> addm3 is a named pattern and its format is well defined.  It does not
>>> take a shifted operand and never has.
>>>
>>>> Apparently for AArch64,
>>>> addm3 only supports "reg+reg" or "reg+imm".  Also it is not really true
>>>> that we "do not set the condition codes", because both
>>>> "adds_shift_imm_*" and "adds_mul_imm_*" do set the condition flags.
>>>
>>> You appear to be confusing named patterns (used by expand) with
>>> recognizers.  Anyway, we have
>>>
>>> (define_insn "*add_<shift>_<mode>"
>>>   [(set (match_operand:GPI 0 "register_operand" "=r")
>>>         (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" "r")
>>>                               (match_operand:QI 2 "aarch64_shift_imm_<mode>" "n"))
>>>                   (match_operand:GPI 3 "register_operand" "r")))]
>>>
>>> which is a non-flag-setting add with a shifted operand.
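As a rough illustration of the shape that recognizer accepts (a sketch only;
the register numbers and shift amount here are arbitrary), the RTX it matches
is a PLUS whose first operand is the ASHIFT:

  /* Sketch: build "x1 << 2" added to "x3", the shape matched by the
     pattern quoted above.  Register numbers and the shift amount are
     arbitrary illustration values.  */
  rtx shifted = gen_rtx_ASHIFT (DImode, gen_rtx_REG (DImode, 1), GEN_INT (2));
  rtx sum = gen_rtx_PLUS (DImode, shifted, gen_rtx_REG (DImode, 3));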
>>>
>>>> Either way I think it is another backend issue, so do you approve that
>>>> I commit this patch now?
>>>
>>> Not yet.  I think there's something fundamental amiss here.
>>>
>>> BTW, it looks to me as though addptr3 should have exactly the same
>>> operand rules as add3 (documentation reads "like add3"), so a shifted
>>> operand shouldn't be supported there either.  If that isn't the case
>>> then that should be clearly called out in the documentation.
>>>
>>> R.
>>>
>>
>> PS.
>>
>> I presume you are aware of the canonicalization rules for add?  That is,
>> for a shift-and-add operation, the shift operand must appear first.  Ie.
>>
>> (plus (shift (op, op)), op)
>>
>> not
>>
>> (plus (op, (shift (op, op))))
>
> Hi Richard,
> Thanks for the comments.  I realized that the not-recognized insn issue
> is because the original patch built non-canonical expressions.  When
> reloading address expressions, LRA generates non-canonical
> register-scaled insns, which can't be recognized by the aarch64 backend.
>
> Here is the updated patch using the canonical form pattern; it passes
> bootstrap and regression test.  Well, the ivo failure still exists, but
> it is analyzed in the original message.
>
> Is this patch OK?
>
> As for Jiong's concern about the additional extension instruction, I
> think this only stands for atomic load/store instructions.  For general
> loads/stores, AArch64 supports zext/sext in register-scaled addressing
> modes, so the additional instruction can be forward-propagated into the
> memory reference.  The problem for atomic loads/stores is that AArch64
> only supports a direct register addressing mode there.  After LRA
> reloads the address expression out of the memory reference, there is no
> combine/fwprop optimizer to merge the instructions back.  The problem
> is that atomic_store's predicate doesn't match its constraint: the
> predicate used for atomic_store is memory_operand, while all other
> atomic patterns use aarch64_sync_memory_operand.  I think this might be
> a typo.  With this change, expand will not generate addressing modes
> requiring a reload anymore.  I will test another patch fixing this.
>
> Thanks,
> bin

Some comments inline.

>>
>> R.
>>
>> aarch64_legitimize_addr-20151128.txt
>>
>>
>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index 3fe2f0f..5b3e3c4 100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -4757,13 +4757,65 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x */, machine_mode mode)
>>       We try to pick as large a range for the offset as possible to
>>       maximize the chance of a CSE.  However, for aligned addresses
>>       we limit the range to 4k so that structures with different sized
>> -     elements are likely to use the same base.  */
>> +     elements are likely to use the same base.  We need to be careful
>> +     not split CONST for some forms address expressions, otherwise it

not to split a CONST for some forms of address expression,

>> +     will generate sub-optimal code.  */
>>
>>    if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1)))
>>      {
>>        HOST_WIDE_INT offset = INTVAL (XEXP (x, 1));
>>        HOST_WIDE_INT base_offset;
>>
>> +      if (GET_CODE (XEXP (x, 0)) == PLUS)
>> +        {
>> +          rtx op0 = XEXP (XEXP (x, 0), 0);
>> +          rtx op1 = XEXP (XEXP (x, 0), 1);
>> +
>> +          /* For addr expression in the form like "r1 + r2 + 0x3ffc".
>> +             Since the offset is within range supported by addressing
>> +             mode "reg+offset", we don't split the const and legalize
>> +             it into below insn and expr sequence:
>> +               r3 = r1 + r2;
>> +               "r3 + 0x3ffc".  */

I think this comment would read better as

  /* Address expressions of the form Ra + Rb + CONST.

     If CONST is within the range supported by the addressing mode
     "reg+offset", do not split CONST and use the sequence
       Rt = Ra + Rb
       addr = Rt + CONST.  */

>> +          if (REG_P (op0) && REG_P (op1))
>> +            {
>> +              machine_mode addr_mode = GET_MODE (x);
>> +              rtx base = gen_reg_rtx (addr_mode);
>> +              rtx addr = plus_constant (addr_mode, base, offset);
>> +
>> +              if (aarch64_legitimate_address_hook_p (mode, addr, false))
>> +                {
>> +                  emit_insn (gen_adddi3 (base, op0, op1));
>> +                  return addr;
>> +                }
>> +            }
>> +          /* For addr expression in the form like "r1 + r2<<2 + 0x3ffc".
>> +             Live above, we don't split the const and legalize it into
>> +             below insn and expr sequence:

Similarly.

>> +               r3 = 0x3ffc;
>> +               r4 = r1 + r3;
>> +               "r4 + r2<<2".  */

Why don't we generate

  r3 = r1 + r2 << 2
  r4 = r3 + 0x3ffc

utilizing the shift-and-add instructions?  (See the sketch after the
quoted patch below.)

>> +          else if (REG_P (op0) || REG_P (op1))
>> +            {
>> +              machine_mode addr_mode = GET_MODE (x);
>> +              rtx base = gen_reg_rtx (addr_mode);
>> +
>> +              /* Switch to make sure that register is in op0.  */
>> +              if (REG_P (op1))
>> +                std::swap (op0, op1);
>> +
>> +              rtx addr = gen_rtx_fmt_ee (PLUS, addr_mode, op1, base);

Use gen_rtx_PLUS.

>> +
>> +              if (aarch64_legitimate_address_hook_p (mode, addr, false))
>> +                {
>> +                  base = force_operand (plus_constant (addr_mode,
>> +                                                       op0, offset),
>> +                                        NULL_RTX);
>> +                  return gen_rtx_fmt_ee (PLUS, addr_mode, op1, base);

Likewise.

>> +                }
>> +            }
>> +        }
>> +
>>    /* Does it look like we'll need a load/store-pair operation?  */
>>    if (GET_MODE_SIZE (mode) > 16
>>        || mode == TImode)

R.
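As a rough sketch of the shift-and-add shape asked about above (reusing the
op0/op1/offset/addr_mode names from the patch and assuming op1 holds the
scaled term after the std::swap; an illustration, not the committed code):

  /* Force the whole scaled sum into a register first:
       r3 = r1 + (r2 << 2)
     and only then add the constant, so the reg+offset addressing mode
     (or an add-immediate) can absorb it:
       r4 = r3 + 0x3ffc  */
  rtx sum = force_operand (gen_rtx_PLUS (addr_mode, op1, op0), NULL_RTX);
  return plus_constant (addr_mode, sum, offset);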