From: "Bin.Cheng"
To: Richard Earnshaw
Cc: James Greenhalgh, Bin Cheng, gcc-patches List
Date: Tue, 24 Nov 2015 03:23:00 -0000
Subject: Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address
In-Reply-To: <564F5ABF.2020302@foss.arm.com>
References: <000001d12119$49548570$dbfd9050$@arm.com> <20151117100800.GA6727@arm.com> <564F5ABF.2020302@foss.arm.com>

On Sat, Nov 21, 2015 at 1:39 AM, Richard Earnshaw wrote:
> On 20/11/15 08:31, Bin.Cheng wrote:
>> On Thu, Nov 19, 2015 at 10:32 AM, Bin.Cheng wrote:
>>> On Tue, Nov 17, 2015 at 6:08 PM, James Greenhalgh wrote:
>>>> On Tue, Nov 17, 2015 at 05:21:01PM +0800, Bin Cheng wrote:
>>>>> Hi,
>>>>> GIMPLE IVO needs to call the backend interface to calculate costs for
>>>>> address expressions like below:
>>>>>   FORM1: "r73 + r74 + 16380"
>>>>>   FORM2: "r73 << 2 + r74 + 16380"
>>>>>
>>>>> They are invalid address expressions on AArch64, so they will be
>>>>> legitimized by aarch64_legitimize_address.  Below is what we get from
>>>>> that function:
>>>>>
>>>>> For FORM1, the address expression is legitimized into the insn
>>>>> sequence and rtx below:
>>>>>   r84:DI=r73:DI+r74:DI
>>>>>   r85:DI=r84:DI+0x3000
>>>>>   r83:DI=r85:DI
>>>>>   "r83 + 4092"
>>>>>
>>>>> For FORM2, the address expression is legitimized into the insn
>>>>> sequence and rtx below:
>>>>>   r108:DI=r73:DI<<0x2
>>>>>   r109:DI=r108:DI+r74:DI
>>>>>   r110:DI=r109:DI+0x3000
>>>>>   r107:DI=r110:DI
>>>>>   "r107 + 4092"
>>>>>
>>>>> So the costs computed are 12/16 respectively.  The high cost prevents
>>>>> IVO from choosing the right candidates.  Besides cost computation, I
>>>>> also think the legitimization is bad in terms of code generation.
>>>>> The root cause in aarch64_legitimize_address is described by its
>>>>> comment:
>>>>>   /* Try to split X+CONST into Y=X+(CONST & ~mask), Y+(CONST&mask),
>>>>>      where mask is selected by alignment and size of the offset.
>>>>>      We try to pick as large a range for the offset as possible to
>>>>>      maximize the chance of a CSE.  However, for aligned addresses
>>>>>      we limit the range to 4k so that structures with different sized
>>>>>      elements are likely to use the same base.  */
>>>>> I think the split of CONST is intended for REG+CONST where the const
>>>>> offset is not in the range of AArch64's addressing modes.
>>>>> Unfortunately, it doesn't explicitly handle/reject "REG+REG+CONST"
>>>>> and "REG+REG<<SCALE+CONST" when the CONST is in the range of the
>>>>> addressing modes.  As a result, these two cases fall through this
>>>>> logic, resulting in sub-optimal code.
>>>>>
>>>>> It's obvious we can do the below legitimization instead:
>>>>> FORM1:
>>>>>   r83:DI=r73:DI+r74:DI
>>>>>   "r83 + 16380"
>>>>> FORM2:
>>>>>   r107:DI=0x3ffc
>>>>>   r106:DI=r74:DI+r107:DI
>>>>>     REG_EQUAL r74:DI+0x3ffc
>>>>>   "r106 + r73 << 2"
>>>>>
>>>>> This patch handles these two cases as described.
>>>>
>>>> Thanks for the description, it made the patch very easy to review.  I
>>>> only have a style comment.
>>>>
>>>>> Bootstrapped & tested on AArch64 along with the other patch.  Is it OK?
>>>>>
>>>>> 2015-11-04  Bin Cheng
>>>>>             Jiong Wang
>>>>>
>>>>>     * config/aarch64/aarch64.c (aarch64_legitimize_address): Handle
>>>>>     address expressions like REG+REG+CONST and REG+NON_REG+CONST.
>>>>
>>>>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>>>>> index 5c8604f..47875ac 100644
>>>>> --- a/gcc/config/aarch64/aarch64.c
>>>>> +++ b/gcc/config/aarch64/aarch64.c
>>>>> @@ -4710,6 +4710,51 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x */, machine_mode mode)
>>>>>     {
>>>>>       HOST_WIDE_INT offset = INTVAL (XEXP (x, 1));
>>>>>       HOST_WIDE_INT base_offset;
>>>>> +      rtx op0 = XEXP (x,0);
>>>>> +
>>>>> +      if (GET_CODE (op0) == PLUS)
>>>>> +	{
>>>>> +	  rtx op0_ = XEXP (op0, 0);
>>>>> +	  rtx op1_ = XEXP (op0, 1);
>>>>
>>>> I don't see this trailing _ on a variable name in many places in the
>>>> source tree (mostly in the Go frontend), and certainly not in the
>>>> aarch64 backend.  Can we pick a different name for op0_ and op1_?
>>>>
>>>>> +
>>>>> +	  /* RTX pattern in the form of (PLUS (PLUS REG, REG), CONST) will
>>>>> +	     reach here, the 'CONST' may be valid in which case we should
>>>>> +	     not split.  */
>>>>> +	  if (REG_P (op0_) && REG_P (op1_))
>>>>> +	    {
>>>>> +	      machine_mode addr_mode = GET_MODE (op0);
>>>>> +	      rtx addr = gen_reg_rtx (addr_mode);
>>>>> +
>>>>> +	      rtx ret = plus_constant (addr_mode, addr, offset);
>>>>> +	      if (aarch64_legitimate_address_hook_p (mode, ret, false))
>>>>> +		{
>>>>> +		  emit_insn (gen_adddi3 (addr, op0_, op1_));
>>>>> +		  return ret;
>>>>> +		}
>>>>> +	    }
>>>>> +	  /* RTX pattern in the form of (PLUS (PLUS REG, NON_REG), CONST)
>>>>> +	     will reach here.  If (PLUS REG, NON_REG) is valid addr expr,
>>>>> +	     we split it into Y=REG+CONST, Y+NON_REG.  */
>>>>> +	  else if (REG_P (op0_) || REG_P (op1_))
>>>>> +	    {
>>>>> +	      machine_mode addr_mode = GET_MODE (op0);
>>>>> +	      rtx addr = gen_reg_rtx (addr_mode);
>>>>> +
>>>>> +	      /* Switch to make sure that register is in op0_.  */
>>>>> +	      if (REG_P (op1_))
>>>>> +		std::swap (op0_, op1_);
>>>>> +
>>>>> +	      rtx ret = gen_rtx_fmt_ee (PLUS, addr_mode, addr, op1_);
>>>>> +	      if (aarch64_legitimate_address_hook_p (mode, ret, false))
>>>>> +		{
>>>>> +		  addr = force_operand (plus_constant (addr_mode,
>>>>> +						       op0_, offset),
>>>>> +					NULL_RTX);
>>>>> +		  ret = gen_rtx_fmt_ee (PLUS, addr_mode, addr, op1_);
>>>>> +		  return ret;
>>>>> +		}
>>>>
>>>> The logic here is a bit hairy to follow: you construct a PLUS RTX to
>>>> check aarch64_legitimate_address_hook_p, then construct a different
>>>> PLUS RTX to use as the return value.  This can probably be clarified
>>>> by choosing a name other than ret for the temporary address
>>>> expression you construct.
>>>>
>>>> It would also be good to take some of your detailed description and
>>>> write that here.  Certainly I found the explicit examples in the
>>>> cover letter easier to follow than:
>>>>
>>>>> +	  /* RTX pattern in the form of (PLUS (PLUS REG, NON_REG), CONST)
>>>>> +	     will reach here.  If (PLUS REG, NON_REG) is valid addr expr,
>>>>> +	     we split it into Y=REG+CONST, Y+NON_REG.  */
>>>>
>>>> Otherwise this patch is OK.
>>> Thanks for reviewing, here is the updated patch.
>>
>> Hmm, I retested the patch on aarch64 and found it caused two
>> additional failures.
>>
>> FAIL: gcc.target/aarch64/ldp_vec_64_1.c scan-assembler ldp\td[0-9]+, d[0-9]
>> This is caused by a different ivopt decision because of this patch's
>> cost change.  As far as IVO can tell, the new decision is better than
>> the old one, and so is the IVOPTed dump.  I could fix this by changing
>> how this patch legitimizes the address "r1 + r2 + offset".  In this
>> patch, it's legitimized into "r3 = r1 + r2; [r3 + offset]"; we could
>> change it into "r3 = offset; r4 = r1 + r3; [r4 + r2]".  This new form
>> is better because r4 is possibly a loop invariant, but its cost is
>> higher.  I tend to keep this patch the way it is since I don't know
>> how the changed cost affects performance data.  We may need to live
>> with this failure for a while.
>>
>> FAIL: gcc.dg/atomic/stdatomic-vm.c -O1 (internal compiler error)
>> This I think is a potential bug in the aarch64 backend.  GCC can
>> generate "[r1 + r2 << 3] = unspec..." with this patch; for this test,
>> LRA needs to make a reload for the address expression by computing
>> "r1 + r2 << 3" outside of the memory reference.  In function
>> emit_add3_insn, it first checks have_addptr3_insn/gen_addptr3_insn,
>> then the add3 pattern.  The code is as below:
>>
>>   if (have_addptr3_insn (x, y, z))
>>     {
>>       rtx_insn *insn = gen_addptr3_insn (x, y, z);
>>
>>       /* If the target provides an "addptr" pattern it hopefully does
>>	  so for a reason.  So falling back to the normal add would be
>>	  a bug.  */
>>       lra_assert (insn != NULL_RTX);
>>       emit_insn (insn);
>>       return insn;
>>     }
>>
>>   rtx_insn *insn
>>     = emit_insn (gen_rtx_SET (x, gen_rtx_PLUS (GET_MODE (y), y, z)));
>>   if (recog_memoized (insn) < 0)
>>     {
>>       delete_insns_since (last);
>>       insn = NULL;
>>     }
>>   return insn;
>>
>> AArch64's problem is that we don't define the addptr3 pattern, and we
>> don't have a direct insn pattern describing "x + y << z".  According
>> to the GCC internals manual:
>>
>> ‘addptrm3’
>>   Like addm3 but is guaranteed to only be used for address
>>   calculations.  The expanded code is not allowed to clobber the
>>   condition code.  It only needs to be defined if addm3 sets the
>>   condition code.
>
> addm3 on aarch64 does not set the condition codes, so by this rule we
> shouldn't need to define this pattern.

Hi Richard,
I think that rule has a prerequisite: the backend needs to support
register-shifted addition in its addm3 pattern.  Apparently for
AArch64, addm3 only supports "reg+reg" or "reg+imm".  Also, it is not
quite true that we "do not set the condition codes", because both
"adds_shift_imm_*" and "adds_mul_imm_*" do set the condition flags.
Either way I think it is another backend issue, so do you approve of
me committing this patch now?

Thanks,
bin