From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: (qmail 23813 invoked by alias); 3 Dec 2015 05:26:28 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: 
List-Archive: 
List-Post: 
List-Help: 
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 23796 invoked by uid 89); 3 Dec 2015 05:26:25 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-2.5 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.2
X-HELO: mail-vk0-f49.google.com
Received: from mail-vk0-f49.google.com (HELO mail-vk0-f49.google.com) (209.85.213.49) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256 encrypted) ESMTPS; Thu, 03 Dec 2015 05:26:23 +0000
Received: by vkbs1 with SMTP id s1so38939153vkb.1 for ; Wed, 02 Dec 2015 21:26:21 -0800 (PST)
MIME-Version: 1.0
X-Received: by 10.31.134.3 with SMTP id i3mr4893455vkd.14.1449120380859; Wed, 02 Dec 2015 21:26:20 -0800 (PST)
Received: by 10.103.45.207 with HTTP; Wed, 2 Dec 2015 21:26:20 -0800 (PST)
In-Reply-To: <565D7580.6030303@foss.arm.com>
References: <000001d12119$49548570$dbfd9050$@arm.com> <20151117100800.GA6727@arm.com> <564F5ABF.2020302@foss.arm.com> <5654343A.2080609@foss.arm.com> <56543963.3070704@foss.arm.com> <565D7580.6030303@foss.arm.com>
Date: Thu, 03 Dec 2015 05:26:00 -0000
Message-ID: 
Subject: Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address
From: "Bin.Cheng" 
To: Richard Earnshaw 
Cc: James Greenhalgh , Bin Cheng , gcc-patches List 
Content-Type: multipart/mixed; boundary=001a11441afc8a48280525f7a106
X-IsSubscribed: yes
X-SW-Source: 2015-12/txt/msg00390.txt.bz2

--001a11441afc8a48280525f7a106
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Content-length: 7405

On Tue, Dec 1, 2015 at 6:25 PM, Richard Earnshaw wrote:
> On 01/12/15 03:19, Bin.Cheng wrote:
>> On Tue, Nov 24, 2015 at 6:18 PM, Richard Earnshaw
>> wrote:
>>> On 24/11/15 09:56, Richard Earnshaw wrote:
>>>> On 24/11/15 02:51, Bin.Cheng wrote:
>>>>>>> The aarch64's problem is we don't define addptr3 pattern, and we don't
>>>>>>>>> have direct insn pattern describing the "x + y << z". According to
>>>>>>>>> gcc internal:
>>>>>>>>>
>>>>>>>>> ‘addptrm3’
>>>>>>>>> Like addm3 but is guaranteed to only be used for address calculations.
>>>>>>>>> The expanded code is not allowed to clobber the condition code. It
>>>>>>>>> only needs to be defined if addm3 sets the condition code.
>>>>>>>
>>>>>>> addm3 on aarch64 does not set the condition codes, so by this rule we
>>>>>>> shouldn't need to define this pattern.
>>>>> Hi Richard,
>>>>> I think that rule has a prerequisite that backend needs to support
>>>>> register shifted addition in addm3 pattern.
>>>>
>>>> addm3 is a named pattern and its format is well defined. It does not
>>>> take a shifted operand and never has.
>>>>
>>>>> Apparently for AArch64,
>>>>> addm3 only supports "reg+reg" or "reg+imm". Also we don't really
>>>>> "does not set the condition codes" actually, because both
>>>>> "adds_shift_imm_*" and "adds_mul_imm_*" do set the condition flags.
>>>>
>>>> You appear to be confusing named patterns (used by expand) with
>>>> recognizers.
>>>> Anyway, we have
>>>>
>>>> (define_insn "*add_<shift>_<mode>"
>>>>   [(set (match_operand:GPI 0 "register_operand" "=r")
>>>>         (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" "r")
>>>>                               (match_operand:QI 2
>>>>                                "aarch64_shift_imm_<mode>" "n"))
>>>>                   (match_operand:GPI 3 "register_operand" "r")))]
>>>>
>>>> Which is a non-flag setting add with shifted operand.
>>>>
>>>>> Either way I think it is another backend issue, so do you approve that
>>>>> I commit this patch now?
>>>>
>>>> Not yet. I think there's something fundamental amiss here.
>>>>
>>>> BTW, it looks to me as though addptr3 should have exactly the same
>>>> operand rules as add3 (documentation reads "like add3"), so a
>>>> shifted operand shouldn't be supported there either. If that isn't the
>>>> case then that should be clearly called out in the documentation.
>>>>
>>>> R.
>>>>
>>>
>>> PS.
>>>
>>> I presume you are aware of the canonicalization rules for add? That is,
>>> for a shift-and-add operation, the shift operand must appear first. Ie.
>>>
>>> (plus (shift (op, op)), op)
>>>
>>> not
>>>
>>> (plus (op, (shift (op, op))
>>
>> Hi Richard,
>> Thanks for the comments. I realized that the not-recognized insn
>> issue is because the original patch built non-canonical expressions.
>> When reloading address expression, LRA generates non-canonical
>> register scaled insn, which can't be recognized by the aarch64 backend.
>>
>> Here is the updated patch using canonical form pattern, it passes
>> bootstrap and regression test. Well, the ivo failure still exists,
>> but it is analyzed in the original message.
>>
>> Is this patch OK?
>>
>> As for Jiong's concern about the additional extension instruction, I
>> think this only stands for atomic load store instructions. For
>> general load store, AArch64 supports zext/sext in register scaling
>> addressing mode, the additional instruction can be forward propagated
>> into memory reference. The problem for atomic load store is AArch64
>> only supports direct register addressing mode. After LRA reloads
>> address expression out of memory reference, there is no combine/fwprop
>> optimizer to merge instructions. The problem is atomic_store's
>> predicate doesn't match its constraint. The predicate used for
>> atomic_store is memory_operand, while all other atomic patterns
>> use aarch64_sync_memory_operand. I think this might be a typo. With
>> this change, expand will not generate addressing mode requiring reload
>> anymore. I will test another patch fixing this.
>>
>> Thanks,
>> bin
>
> Some comments inline.
>
>>>
>>> R.
>>>
>>> aarch64_legitimize_addr-20151128.txt
>>>
>>>
>>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>>> index 3fe2f0f..5b3e3c4 100644
>>> --- a/gcc/config/aarch64/aarch64.c
>>> +++ b/gcc/config/aarch64/aarch64.c
>>> @@ -4757,13 +4757,65 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x */, machine_mode mode)
>>>      We try to pick as large a range for the offset as possible to
>>>      maximize the chance of a CSE. However, for aligned addresses
>>>      we limit the range to 4k so that structures with different sized
>>> -     elements are likely to use the same base. */
>>> +     elements are likely to use the same base. We need to be careful
>>> +     not split CONST for some forms address expressions, otherwise it
>
> not to split a CONST for some forms of address expression,
>
>>> +     will generate sub-optimal code. */
>>>
>>>   if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1)))
>>>     {
>>>       HOST_WIDE_INT offset = INTVAL (XEXP (x, 1));
>>>       HOST_WIDE_INT base_offset;
>>>
>>> +      if (GET_CODE (XEXP (x, 0)) == PLUS)
>>> +        {
>>> +          rtx op0 = XEXP (XEXP (x, 0), 0);
>>> +          rtx op1 = XEXP (XEXP (x, 0), 1);
>>> +
>>> +          /* For addr expression in the form like "r1 + r2 + 0x3ffc".
>>> +             Since the offset is within range supported by addressing
>>> +             mode "reg+offset", we don't split the const and legalize
>>> +             it into below insn and expr sequence:
>>> +               r3 = r1 + r2;
>>> +               "r3 + 0x3ffc". */
>
> I think this comment would read better as
>
> /* Address expressions of the form Ra + Rb + CONST.
>
>    If CONST is within the range supported by the addressing
>    mode "reg+offset", do not split CONST and use the
>    sequence
>      Rt = Ra + Rb
>      addr = Rt + CONST. */
>
>>> +          if (REG_P (op0) && REG_P (op1))
>>> +            {
>>> +              machine_mode addr_mode = GET_MODE (x);
>>> +              rtx base = gen_reg_rtx (addr_mode);
>>> +              rtx addr = plus_constant (addr_mode, base, offset);
>>> +
>>> +              if (aarch64_legitimate_address_hook_p (mode, addr, false))
>>> +                {
>>> +                  emit_insn (gen_adddi3 (base, op0, op1));
>>> +                  return addr;
>>> +                }
>>> +            }
>>> +          /* For addr expression in the form like "r1 + r2<<2 + 0x3ffc".
>>> +             Live above, we don't split the const and legalize it into
>>> +             below insn and expr sequence:
>
> Similarly.
>
>>> +               r3 = 0x3ffc;
>>> +               r4 = r1 + r3;
>>> +               "r4 + r2<<2". */
>
> Why don't we generate
>
> r3 = r1 + r2 << 2
> r4 = r3 + 0x3ffc
>
> utilizing the shift-and-add instructions?

All other comments are addressed in the attached new patch. As for this
question, Wilco also asked it on an internal channel before. The main
idea is to rely on GIMPLE IVOPTs/SLSR to find CSE opportunities in the
scaled plus sub-expression. The scaled index is most likely a loop IV,
so I would like to split the constant plus out of the memory reference
so that it can be identified/hoisted as a loop invariant. This is more
important when the base is sfp related.
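
To illustrate the kind of case this reasoning targets, here is a small,
hypothetical C example (not taken from the patch or its testsuite; the
function and array names are made up). The scaled index is the loop IV
and the constant part of the address comes from a large stack object, so
keeping the constant out of the memory reference leaves a loop-invariant
"base + offset" that can be hoisted, while the register-scaled part stays
in the addressing mode:

/* Hypothetical illustration only.  buf lives on the stack, so the store
   address is (frame base) + i*4 + 4092.  Splitting the "+ 4092" out of
   the memory reference exposes a loop-invariant add that can be hoisted
   out of the loop, while the varying "i << 2" remains in the
   register-scaled addressing mode.  */
extern void use (int *);

void
foo (int n)
{
  int buf[2048];

  for (int i = 0; i < n; i++)
    buf[1023 + i] = i;

  use (buf);
}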
Thanks, bin --001a11441afc8a48280525f7a106 Content-Type: text/plain; charset=US-ASCII; name="aarch64_legitimize_addr-20151203.txt" Content-Disposition: attachment; filename="aarch64_legitimize_addr-20151203.txt" Content-Transfer-Encoding: base64 X-Attachment-Id: f_ihpss3xx0 Content-length: 3425 ZGlmZiAtLWdpdCBhL2djYy9jb25maWcvYWFyY2g2NC9hYXJjaDY0LmMgYi9n Y2MvY29uZmlnL2FhcmNoNjQvYWFyY2g2NC5jCmluZGV4IDNmZTJmMGYuLjI0 ODkzN2UgMTAwNjQ0Ci0tLSBhL2djYy9jb25maWcvYWFyY2g2NC9hYXJjaDY0 LmMKKysrIGIvZ2NjL2NvbmZpZy9hYXJjaDY0L2FhcmNoNjQuYwpAQCAtNDc1 NywxMyArNDc1Nyw2NyBAQCBhYXJjaDY0X2xlZ2l0aW1pemVfYWRkcmVzcyAo cnR4IHgsIHJ0eCAvKiBvcmlnX3ggICovLCBtYWNoaW5lX21vZGUgbW9kZSkK ICAgICAgV2UgdHJ5IHRvIHBpY2sgYXMgbGFyZ2UgYSByYW5nZSBmb3IgdGhl IG9mZnNldCBhcyBwb3NzaWJsZSB0bwogICAgICBtYXhpbWl6ZSB0aGUgY2hh bmNlIG9mIGEgQ1NFLiAgSG93ZXZlciwgZm9yIGFsaWduZWQgYWRkcmVzc2Vz CiAgICAgIHdlIGxpbWl0IHRoZSByYW5nZSB0byA0ayBzbyB0aGF0IHN0cnVj dHVyZXMgd2l0aCBkaWZmZXJlbnQgc2l6ZWQKLSAgICAgZWxlbWVudHMgYXJl IGxpa2VseSB0byB1c2UgdGhlIHNhbWUgYmFzZS4gICovCisgICAgIGVsZW1l bnRzIGFyZSBsaWtlbHkgdG8gdXNlIHRoZSBzYW1lIGJhc2UuICBXZSBuZWVk IHRvIGJlIGNhcmVmdWwKKyAgICAgbm90IHRvIHNwbGl0IGEgQ09OU1QgZm9y IHNvbWUgZm9ybXMgb2YgYWRkcmVzcyBleHByZXNzaW9uLCBvdGhlcndpc2UK KyAgICAgaXQgd2lsbCBnZW5lcmF0ZSBzdWItb3B0aW1hbCBjb2RlLiAgKi8K IAogICBpZiAoR0VUX0NPREUgKHgpID09IFBMVVMgJiYgQ09OU1RfSU5UX1Ag KFhFWFAgKHgsIDEpKSkKICAgICB7CiAgICAgICBIT1NUX1dJREVfSU5UIG9m ZnNldCA9IElOVFZBTCAoWEVYUCAoeCwgMSkpOwogICAgICAgSE9TVF9XSURF X0lOVCBiYXNlX29mZnNldDsKIAorICAgICAgaWYgKEdFVF9DT0RFIChYRVhQ ICh4LCAwKSkgPT0gUExVUykKKwl7CisJICBydHggb3AwID0gWEVYUCAoWEVY UCAoeCwgMCksIDApOworCSAgcnR4IG9wMSA9IFhFWFAgKFhFWFAgKHgsIDAp LCAxKTsKKworCSAgLyogQWRkcmVzcyBleHByZXNzaW9ucyBvZiB0aGUgZm9y bSBSYSArIFJiICsgQ09OU1QuCisKKwkgICAgIElmIENPTlNUIGlzIHdpdGhp biB0aGUgcmFuZ2Ugc3VwcG9ydGVkIGJ5IHRoZSBhZGRyZXNzaW5nCisJICAg ICBtb2RlICJyZWcrb2Zmc2V0IiwgZG8gbm90IHNwbGl0IENPTlNUIGFuZCB1 c2UgdGhlCisJICAgICBzZXF1ZW5jZQorCSAgICAgICBSdCA9IFJhICsgUmI7 CisJICAgICAgIGFkZHIgPSBSdCArIENPTlNULiAgKi8KKwkgIGlmIChSRUdf UCAob3AwKSAmJiBSRUdfUCAob3AxKSkKKwkgICAgeworCSAgICAgIG1hY2hp bmVfbW9kZSBhZGRyX21vZGUgPSBHRVRfTU9ERSAoeCk7CisJICAgICAgcnR4 IGJhc2UgPSBnZW5fcmVnX3J0eCAoYWRkcl9tb2RlKTsKKwkgICAgICBydHgg YWRkciA9IHBsdXNfY29uc3RhbnQgKGFkZHJfbW9kZSwgYmFzZSwgb2Zmc2V0 KTsKKworCSAgICAgIGlmIChhYXJjaDY0X2xlZ2l0aW1hdGVfYWRkcmVzc19o b29rX3AgKG1vZGUsIGFkZHIsIGZhbHNlKSkKKwkJeworCQkgIGVtaXRfaW5z biAoZ2VuX2FkZGRpMyAoYmFzZSwgb3AwLCBvcDEpKTsKKwkJICByZXR1cm4g YWRkcjsKKwkJfQorCSAgICB9CisJICAvKiBBZGRyZXNzIGV4cHJlc3Npb25z IG9mIHRoZSBmb3JtIFJhICsgUmI8PFNDQUxFICsgQ09OU1QuCisKKwkgICAg IElmIFJlZyArIFJiPDxTQ0FMRSBpcyBhIHZhbGlkIGFkZHJlc3MgZXhwcmVz c2lvbiwgZG8gbm90CisJICAgICBzcGxpdCBDT05TVCBhbmQgdXNlIHRoZSBz ZXF1ZW5jZQorCSAgICAgICBSYyA9IENPTlNUOworCSAgICAgICBSdCA9IFJh ICsgUmM7CisJICAgICAgIGFkZHIgPSBSdCArIFJiPDxTQ0FMRS4gICovCisJ ICBlbHNlIGlmIChSRUdfUCAob3AwKSB8fCBSRUdfUCAob3AxKSkKKwkgICAg eworCSAgICAgIG1hY2hpbmVfbW9kZSBhZGRyX21vZGUgPSBHRVRfTU9ERSAo eCk7CisJICAgICAgcnR4IGJhc2UgPSBnZW5fcmVnX3J0eCAoYWRkcl9tb2Rl KTsKKworCSAgICAgIC8qIFN3aXRjaCB0byBtYWtlIHN1cmUgdGhhdCByZWdp c3RlciBpcyBpbiBvcDAuICAqLworCSAgICAgIGlmIChSRUdfUCAob3AxKSkK KwkJc3RkOjpzd2FwIChvcDAsIG9wMSk7CisKKwkgICAgICBydHggYWRkciA9 IGdlbl9ydHhfUExVUyAoYWRkcl9tb2RlLCBvcDEsIGJhc2UpOworCisJICAg ICAgaWYgKGFhcmNoNjRfbGVnaXRpbWF0ZV9hZGRyZXNzX2hvb2tfcCAobW9k ZSwgYWRkciwgZmFsc2UpKQorCQl7CisJCSAgYmFzZSA9IGZvcmNlX29wZXJh bmQgKHBsdXNfY29uc3RhbnQgKGFkZHJfbW9kZSwKKwkJCQkJCSAgICAgICBv cDAsIG9mZnNldCksCisJCQkJCU5VTExfUlRYKTsKKwkJICByZXR1cm4gZ2Vu X3J0eF9QTFVTIChhZGRyX21vZGUsIG9wMSwgYmFzZSk7CisJCX0KKwkgICAg 
fQorCX0KKwogICAgICAgLyogRG9lcyBpdCBsb29rIGxpa2Ugd2UnbGwgbmVl ZCBhIGxvYWQvc3RvcmUtcGFpciBvcGVyYXRpb24/ICAqLwogICAgICAgaWYg KEdFVF9NT0RFX1NJWkUgKG1vZGUpID4gMTYKIAkgIHx8IG1vZGUgPT0gVElt b2RlKQo= --001a11441afc8a48280525f7a106--