Subject: Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address
From: "Bin.Cheng"
To: Richard Earnshaw
Cc: James Greenhalgh, Bin Cheng, gcc-patches List
Date: Fri, 04 Dec 2015 03:18:00 -0000

On Thu, Dec 3, 2015 at 6:26 PM, Richard Earnshaw wrote:
> On 03/12/15 05:26, Bin.Cheng wrote:
>> On Tue, Dec 1, 2015 at 6:25 PM, Richard Earnshaw
>> wrote:
>>> On 01/12/15 03:19, Bin.Cheng wrote:
>>>> On Tue, Nov 24, 2015 at 6:18 PM, Richard Earnshaw
>>>> wrote:
>>>>> On 24/11/15 09:56, Richard Earnshaw wrote:
>>>>>> On 24/11/15 02:51, Bin.Cheng wrote:
>>>>>>>>> The problem for aarch64 is that we don't define an addptrm3 pattern, and we don't
>>>>>>>>>>> have a direct insn pattern describing "x + y << z".  According to
>>>>>>>>>>> the gcc internals:
>>>>>>>>>>>
>>>>>>>>>>> ‘addptrm3’
>>>>>>>>>>> Like addm3 but is guaranteed to only be used for address calculations.
>>>>>>>>>>> The expanded code is not allowed to clobber the condition code.  It
>>>>>>>>>>> only needs to be defined if addm3 sets the condition code.
>>>>>>>>>
>>>>>>>>> addm3 on aarch64 does not set the condition codes, so by this rule we
>>>>>>>>> shouldn't need to define this pattern.
>>>>>>> Hi Richard,
>>>>>>> I think that rule has a prerequisite: the backend needs to support
>>>>>>> register-shifted addition in the addm3 pattern.
>>>>>>
>>>>>> addm3 is a named pattern and its format is well defined.  It does not
>>>>>> take a shifted operand and never has.
>>>>>>
>>>>>>> Apparently for AArch64,
>>>>>>> addm3 only supports "reg+reg" or "reg+imm".  Also it is not really true
>>>>>>> that we "do not set the condition codes", because both
>>>>>>> "adds_shift_imm_*" and "adds_mul_imm_*" do set the condition flags.
>>>>>>
>>>>>> You appear to be confusing named patterns (used by expand) with
>>>>>> recognizers.
>>>>>> Anyway, we have
>>>>>>
>>>>>> (define_insn "*add_<shift>_<mode>"
>>>>>>   [(set (match_operand:GPI 0 "register_operand" "=r")
>>>>>>         (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" "r")
>>>>>>                               (match_operand:QI 2
>>>>>>                                 "aarch64_shift_imm_<mode>" "n"))
>>>>>>                   (match_operand:GPI 3 "register_operand" "r")))]
>>>>>>
>>>>>> which is a non-flag-setting add with a shifted operand.
>>>>>>
>>>>>>> Either way I think it is another backend issue, so do you approve that
>>>>>>> I commit this patch now?
>>>>>>
>>>>>> Not yet.  I think there's something fundamental amiss here.
>>>>>>
>>>>>> BTW, it looks to me as though addptrm3 should have exactly the same
>>>>>> operand rules as addm3 (the documentation reads "like addm3"), so a
>>>>>> shifted operand shouldn't be supported there either.  If that isn't the
>>>>>> case then that should be clearly called out in the documentation.
>>>>>>
>>>>>> R.
>>>>>>
>>>>>
>>>>> PS.
>>>>>
>>>>> I presume you are aware of the canonicalization rules for add?  That is,
>>>>> for a shift-and-add operation, the shift operand must appear first, i.e.
>>>>>
>>>>> (plus (shift (op, op)), op)
>>>>>
>>>>> not
>>>>>
>>>>> (plus (op, (shift (op, op))
>>>>
>>>> Hi Richard,
>>>> Thanks for the comments.  I realized that the not-recognized-insn
>>>> issue is because the original patch built non-canonical expressions.
>>>> When reloading an address expression, LRA generates a non-canonical
>>>> register-scaled insn, which can't be recognized by the aarch64 backend.
>>>>
>>>> Here is the updated patch using the canonical form of the pattern; it
>>>> passes bootstrap and regression testing.  Well, the ivo failure still
>>>> exists, but it was analyzed in the original message.
>>>>
>>>> Is this patch OK?
>>>>
>>>> As for Jiong's concern about the additional extension instruction, I
>>>> think this only applies to atomic load/store instructions.  For
>>>> general loads and stores, AArch64 supports zext/sext in the
>>>> register-scaled addressing mode, so the additional instruction can be
>>>> forward-propagated into the memory reference.  The problem for atomic
>>>> load/store is that AArch64 only supports the direct register addressing
>>>> mode.  After LRA reloads the address expression out of the memory
>>>> reference, there is no combine/fwprop optimizer to merge the
>>>> instructions.  The underlying problem is that atomic_store's
>>>> predicate doesn't match its constraint.  The predicate used for
>>>> atomic_store is memory_operand, while all other atomic patterns
>>>> use aarch64_sync_memory_operand.  I think this might be a typo.  With
>>>> this change, expand will not generate an addressing mode requiring a
>>>> reload anymore.  I will test another patch fixing this.
>>>>
>>>> Thanks,
>>>> bin
>>>
>>> Some comments inline.
>>>
>>>>>
>>>>> R.
>>>>>
>>>>> aarch64_legitimize_addr-20151128.txt
>>>>>
>>>>>
>>>>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>>>>> index 3fe2f0f..5b3e3c4 100644
>>>>> --- a/gcc/config/aarch64/aarch64.c
>>>>> +++ b/gcc/config/aarch64/aarch64.c
>>>>> @@ -4757,13 +4757,65 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x */, machine_mode mode)
>>>>>      We try to pick as large a range for the offset as possible to
>>>>>      maximize the chance of a CSE.  However, for aligned addresses
>>>>>      we limit the range to 4k so that structures with different sized
>>>>> -     elements are likely to use the same base.  */
>>>>> +     elements are likely to use the same base.  We need to be careful
>>>>> +     not split CONST for some forms address expressions, otherwise it
>>>
>>> not to split a CONST for some forms of address expression,
>>>
>>>>> +     will generate sub-optimal code.  */
>>>>>
>>>>>    if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1)))
>>>>>      {
>>>>>        HOST_WIDE_INT offset = INTVAL (XEXP (x, 1));
>>>>>        HOST_WIDE_INT base_offset;
>>>>>
>>>>> +      if (GET_CODE (XEXP (x, 0)) == PLUS)
>>>>> +        {
>>>>> +          rtx op0 = XEXP (XEXP (x, 0), 0);
>>>>> +          rtx op1 = XEXP (XEXP (x, 0), 1);
>>>>> +
>>>>> +          /* For addr expression in the form like "r1 + r2 + 0x3ffc".
>>>>> +             Since the offset is within range supported by addressing
>>>>> +             mode "reg+offset", we don't split the const and legalize
>>>>> +             it into below insn and expr sequence:
>>>>> +               r3 = r1 + r2;
>>>>> +               "r3 + 0x3ffc".  */
>>>
>>> I think this comment would read better as
>>>
>>>   /* Address expressions of the form Ra + Rb + CONST.
>>>
>>>      If CONST is within the range supported by the addressing
>>>      mode "reg+offset", do not split CONST and use the
>>>      sequence
>>>        Rt = Ra + Rb
>>>        addr = Rt + CONST.  */
>>>
>>>>> +          if (REG_P (op0) && REG_P (op1))
>>>>> +            {
>>>>> +              machine_mode addr_mode = GET_MODE (x);
>>>>> +              rtx base = gen_reg_rtx (addr_mode);
>>>>> +              rtx addr = plus_constant (addr_mode, base, offset);
>>>>> +
>>>>> +              if (aarch64_legitimate_address_hook_p (mode, addr, false))
>>>>> +                {
>>>>> +                  emit_insn (gen_adddi3 (base, op0, op1));
>>>>> +                  return addr;
>>>>> +                }
>>>>> +            }
>>>>> +          /* For addr expression in the form like "r1 + r2<<2 + 0x3ffc".
>>>>> +             Live above, we don't split the const and legalize it into
>>>>> +             below insn and expr sequence:
>>>
>>> Similarly.
>>>
>>>>> +               r3 = 0x3ffc;
>>>>> +               r4 = r1 + r3;
>>>>> +               "r4 + r2<<2".  */
>>>
>>> Why don't we generate
>>>
>>>   r3 = r1 + r2 << 2
>>>   r4 = r3 + 0x3ffc
>>>
>>> utilizing the shift-and-add instructions?
>>
>> All other comments are addressed in the attached new patch.
>> As for this question, Wilco also asked about it on an internal channel
>> before.  The main idea is to rely on GIMPLE IVO/SLSR to find CSE
>> opportunities for the scaled plus sub-expression.  The scaled index is
>> most likely a loop iv, so I would like to split the constant plus out of
>> the memory reference so that it can be identified/hoisted as a loop
>> invariant.  This is more important when the base is sfp-related.
>>
>
> Ah, yes.  The SFP problem.
>
> Since at least two people have queried this, it's clearly non-obvious
> enough to require an explanation in a comment.
>
> OK with that change and a suitable changelog entry.

Given your review comments, I am going to apply the attached patch along
with the ChangeLog entry below.

Thanks,
bin

2015-12-04  Bin Cheng
	    Jiong Wang

	* config/aarch64/aarch64.c (aarch64_legitimize_address): legitimize
	address expressions like Ra + Rb + CONST and Ra + Rb<<SCALE + CONST.
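[Archive note: the 0x3ffc constant used in the examples above is exactly the
largest offset a 4-byte load/store can keep in the "reg+offset" form.  The
stand-alone C sketch below is not GCC code and is not part of the patch; the
helper name and the size-scaled 12-bit limit are illustrative assumptions
about the 32-bit LDR/STR immediate form, used only to model the keep-vs-split
decision the patch comments describe.]

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative model of the decision discussed above: for an address of the
   form base + index + OFFSET, decide whether OFFSET fits the "reg+offset"
   addressing form (unsigned 12-bit immediate, scaled by the access size)
   and can therefore stay with the memory reference instead of being split
   into a separate add.  */
static bool offset_fits_reg_offset_form (int64_t offset, int access_size)
{
  return offset >= 0
	 && offset < 4096 * (int64_t) access_size
	 && offset % access_size == 0;
}

int main (void)
{
  /* r1 + r2 + 0x3ffc with a 4-byte access: keep the constant, i.e. emit
     r3 = r1 + r2 and address the memory as [r3 + 0x3ffc].  */
  printf ("0x3ffc:  %s\n",
	  offset_fits_reg_offset_form (0x3ffc, 4) ? "keep offset" : "split const");

  /* r1 + r2 + 0x13ffc with a 4-byte access: out of range, so the constant
     must be split out before the address is formed.  */
  printf ("0x13ffc: %s\n",
	  offset_fits_reg_offset_form (0x13ffc, 4) ? "keep offset" : "split const");
  return 0;
}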