Subject: Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address
From: "Bin.Cheng"
To: James Greenhalgh
Cc: Bin Cheng, gcc-patches List
Date: Fri, 20 Nov 2015 08:31:00 -0000

On Thu, Nov 19, 2015 at 10:32 AM, Bin.Cheng wrote:
> On Tue, Nov 17, 2015 at 6:08 PM, James Greenhalgh wrote:
>> On Tue, Nov 17, 2015 at 05:21:01PM +0800, Bin Cheng wrote:
>>> Hi,
>>> GIMPLE IVO needs to call the backend interface to calculate costs
>>> for address expressions like below:
>>> FORM1: "r73 + r74 + 16380"
>>> FORM2: "r73 << 2 + r74 + 16380"
>>>
>>> These are invalid address expressions on AArch64, so they will be
>>> legitimized by aarch64_legitimize_address.  Below is what we get
>>> from that function:
>>>
>>> For FORM1, the address expression is legitimized into the insn
>>> sequence and rtx below:
>>> r84:DI=r73:DI+r74:DI
>>> r85:DI=r84:DI+0x3000
>>> r83:DI=r85:DI
>>> "r83 + 4092"
>>>
>>> For FORM2, the address expression is legitimized into the insn
>>> sequence and rtx below:
>>> r108:DI=r73:DI<<0x2
>>> r109:DI=r108:DI+r74:DI
>>> r110:DI=r109:DI+0x3000
>>> r107:DI=r110:DI
>>> "r107 + 4092"
>>>
>>> So the computed costs are 12 and 16 respectively.  The high costs
>>> prevent IVO from choosing the right candidates.  Besides cost
>>> computation, I also think the legitimization is bad in terms of
>>> code generation.  The root cause in aarch64_legitimize_address is
>>> described by its comment:
>>> /* Try to split X+CONST into Y=X+(CONST & ~mask), Y+(CONST&mask),
>>>    where mask is selected by alignment and size of the offset.
>>>    We try to pick as large a range for the offset as possible to
>>>    maximize the chance of a CSE.  However, for aligned addresses
>>>    we limit the range to 4k so that structures with different sized
>>>    elements are likely to use the same base.  */
>>> I think the split of CONST is intended for REG+CONST, where the
>>> const offset is not in the range of AArch64's addressing modes.
>>> Unfortunately, it doesn't explicitly handle/reject "REG+REG+CONST"
>>> and "REG+REG<<SCALE+CONST" when the CONST is in the range of the
>>> addressing modes.  As a result, these two cases fall through this
>>> logic, resulting in sub-optimal code.
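>>>
>>> For reference, the split that produces the 0x3000/4092 pair above
>>> works roughly as below (a sketch only; in the real code the mask
>>> is computed from the alignment and size of the offset, the 4k
>>> value here is assumed from the comment):
>>>
>>>   HOST_WIDE_INT off = 16380;        /* 0x3ffc */
>>>   HOST_WIDE_INT mask = 0xfff;       /* assumed 4k range */
>>>   HOST_WIDE_INT high = off & ~mask; /* 0x3000, folded into new base */
>>>   HOST_WIDE_INT low = off & mask;   /* 4092, kept in the address */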
>>>
>>> It's obvious we can do the legitimization below instead:
>>> FORM1:
>>> r83:DI=r73:DI+r74:DI
>>> "r83 + 16380"
>>> FORM2:
>>> r107:DI=0x3ffc
>>> r106:DI=r74:DI+r107:DI
>>>   REG_EQUAL r74:DI+0x3ffc
>>> "r106 + r73 << 2"
>>>
>>> This patch handles these two cases as described.
>>
>> Thanks for the description, it made the patch very easy to review.
>> I only have a style comment.
>>
>>> Bootstrapped & tested on AArch64 along with the other patch.  Is it OK?
>>>
>>> 2015-11-04  Bin Cheng
>>>             Jiong Wang
>>>
>>>     * config/aarch64/aarch64.c (aarch64_legitimize_address): Handle
>>>     address expressions like REG+REG+CONST and REG+NON_REG+CONST.
>>
>>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>>> index 5c8604f..47875ac 100644
>>> --- a/gcc/config/aarch64/aarch64.c
>>> +++ b/gcc/config/aarch64/aarch64.c
>>> @@ -4710,6 +4710,51 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x */, machine_mode mode)
>>>      {
>>>        HOST_WIDE_INT offset = INTVAL (XEXP (x, 1));
>>>        HOST_WIDE_INT base_offset;
>>> +      rtx op0 = XEXP (x,0);
>>> +
>>> +      if (GET_CODE (op0) == PLUS)
>>> +        {
>>> +          rtx op0_ = XEXP (op0, 0);
>>> +          rtx op1_ = XEXP (op0, 1);
>>
>> I don't see this trailing _ on a variable name in many places in the
>> source tree (mostly in the Go frontend), and certainly not in the
>> aarch64 backend.  Can we pick a different name for op0_ and op1_?
>>
>>> +
>>> +          /* RTX pattern in the form of (PLUS (PLUS REG, REG), CONST)
>>> +             will reach here, the 'CONST' may be valid, in which case
>>> +             we should not split.  */
>>> +          if (REG_P (op0_) && REG_P (op1_))
>>> +            {
>>> +              machine_mode addr_mode = GET_MODE (op0);
>>> +              rtx addr = gen_reg_rtx (addr_mode);
>>> +
>>> +              rtx ret = plus_constant (addr_mode, addr, offset);
>>> +              if (aarch64_legitimate_address_hook_p (mode, ret, false))
>>> +                {
>>> +                  emit_insn (gen_adddi3 (addr, op0_, op1_));
>>> +                  return ret;
>>> +                }
>>> +            }
>>> +          /* RTX pattern in the form of (PLUS (PLUS REG, NON_REG), CONST)
>>> +             will reach here.  If (PLUS REG, NON_REG) is a valid addr
>>> +             expr, we split it into Y=REG+CONST, Y+NON_REG.  */
>>> +          else if (REG_P (op0_) || REG_P (op1_))
>>> +            {
>>> +              machine_mode addr_mode = GET_MODE (op0);
>>> +              rtx addr = gen_reg_rtx (addr_mode);
>>> +
>>> +              /* Switch to make sure that register is in op0_.  */
>>> +              if (REG_P (op1_))
>>> +                std::swap (op0_, op1_);
>>> +
>>> +              rtx ret = gen_rtx_fmt_ee (PLUS, addr_mode, addr, op1_);
>>> +              if (aarch64_legitimate_address_hook_p (mode, ret, false))
>>> +                {
>>> +                  addr = force_operand (plus_constant (addr_mode,
>>> +                                                       op0_, offset),
>>> +                                        NULL_RTX);
>>> +                  ret = gen_rtx_fmt_ee (PLUS, addr_mode, addr, op1_);
>>> +                  return ret;
>>> +                }
>>
>> The logic here is a bit hairy to follow: you construct a PLUS RTX to
>> check aarch64_legitimate_address_hook_p, then construct a different
>> PLUS RTX to use as the return value.  This can probably be clarified
>> by choosing a name other than ret for the temporary address
>> expression you construct.
>>
>> It would also be good to take some of your detailed description and
>> write that here.  Certainly I found the explicit examples in the
>> cover letter easier to follow than:
>>
>>> +          /* RTX pattern in the form of (PLUS (PLUS REG, NON_REG), CONST)
>>> +             will reach here.  If (PLUS REG, NON_REG) is a valid addr
>>> +             expr, we split it into Y=REG+CONST, Y+NON_REG.  */
>>
>> Otherwise this patch is OK.
> Thanks for reviewing, here is the updated patch.
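> The hairy hunk now takes roughly the shape below (a sketch only,
> with the temporaries renamed as suggested; the names are
> illustrative and the attached patch is authoritative):
>
>   rtx reg_op = XEXP (op0, 0);
>   rtx nonreg_op = XEXP (op0, 1);
>
>   /* Canonicalize so that the REG is in reg_op.  */
>   if (REG_P (nonreg_op))
>     std::swap (reg_op, nonreg_op);
>
>   /* Probe with a fresh base: only split if "base + NON_REG" is a
>      valid address expression.  */
>   machine_mode addr_mode = GET_MODE (op0);
>   rtx base = gen_reg_rtx (addr_mode);
>   rtx addr_expr = gen_rtx_PLUS (addr_mode, base, nonreg_op);
>   if (aarch64_legitimate_address_hook_p (mode, addr_expr, false))
>     {
>       /* base = REG + CONST; the address becomes "base + NON_REG".  */
>       base = force_operand (plus_constant (addr_mode, reg_op, offset),
>                             NULL_RTX);
>       return gen_rtx_PLUS (addr_mode, base, nonreg_op);
>     }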
Hmm, I retested the patch on aarch64 and found it caused two
additional failures.

FAIL: gcc.target/aarch64/ldp_vec_64_1.c scan-assembler ldp\td[0-9]+, d[0-9]

This is caused by a different ivopt decision resulting from this
patch's cost change.  As far as IVO can tell, the new decision is
better than the old one, and so is the IVOPTed dump.  I could fix
this by changing how this patch legitimizes the address
"r1 + r2 + offset".  In this patch it's legitimized into
"r3 = r1 + r2; [r3 + offset]"; we could change that into
"r3 = offset; r4 = r1 + r3; [r4 + r2]" (see the sketch in the PS
below).  The new form is better because r4 is possibly a loop
invariant, but its cost is higher.  I tend to keep this patch the
way it is, since I don't know how the changed cost affects
performance data.  We may need to live with this failure for a
while.

FAIL: gcc.dg/atomic/stdatomic-vm.c -O1 (internal compiler error)

This one, I think, is a potential bug in the aarch64 backend.  With
this patch, GCC can generate "[r1 + r2 << 3] = unspec..." for this
test, and LRA then needs to reload the address expression by
computing "r1 + r2 << 3" outside of the memory reference.  Function
emit_add3_insn first checks have_addptr3_insn/gen_addptr3_insn, then
falls back to the add3 pattern.  The code is as below:

  if (have_addptr3_insn (x, y, z))
    {
      rtx_insn *insn = gen_addptr3_insn (x, y, z);

      /* If the target provides an "addptr" pattern it hopefully does
         for a reason.  So falling back to the normal add would be a
         bug.  */
      lra_assert (insn != NULL_RTX);
      emit_insn (insn);
      return insn;
    }

  rtx_insn *insn
    = emit_insn (gen_rtx_SET (x, gen_rtx_PLUS (GET_MODE (y), y, z)));
  if (recog_memoized (insn) < 0)
    {
      delete_insns_since (last);
      insn = NULL;
    }
  return insn;

The problem for aarch64 is that we define neither the addptr3
pattern nor an insn pattern directly describing "x + y << z".
According to the GCC internals manual:

'addptrm3'
Like addm3 but is guaranteed to only be used for address
calculations.  The expanded code is not allowed to clobber the
condition code.  It only needs to be defined if addm3 sets the
condition code.  If adds used for address calculations and normal
adds are not compatible it is required to expand a distinct pattern
(e.g. using an unspec).  The pattern is used by LRA to emit address
calculations.  addm3 is used if addptrm3 is not defined.

It seems to me aarch64 needs to support "add (shifted register)" in
its add3 pattern, or needs to define the addptr3 pattern.  I tend to
apply this patch to expose the bug, but I will let the aarch64
experts give some comments before that.

Thanks,
bin
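PS: For reference, the alternative legitimization of
"r1 + r2 + offset" mentioned for the first failure would look
roughly like below (a sketch only, reusing the helpers already used
by the patch; this is not what the posted patch does, and DImode is
assumed as in the patch's gen_adddi3 call):

  /* Split "r1 + r2 + offset" into "r3 = offset; r4 = r1 + r3" and
     return "r4 + r2", so r4 may be hoisted as a loop invariant.  */
  machine_mode addr_mode = GET_MODE (op0);
  rtx off_reg = force_reg (addr_mode, GEN_INT (offset)); /* r3 = offset */
  rtx base = gen_reg_rtx (addr_mode);
  emit_insn (gen_adddi3 (base, XEXP (op0, 0), off_reg)); /* r4 = r1 + r3 */
  return gen_rtx_PLUS (addr_mode, base, XEXP (op0, 1));  /* "r4 + r2" */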