From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: (qmail 23813 invoked by alias); 3 Dec 2015 05:26:28 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: 
List-Archive: 
List-Post: 
List-Help: 
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 23796 invoked by uid 89); 3 Dec 2015 05:26:25 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-2.5 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.2
X-HELO: mail-vk0-f49.google.com
Received: from mail-vk0-f49.google.com (HELO mail-vk0-f49.google.com) (209.85.213.49) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256 encrypted) ESMTPS; Thu, 03 Dec 2015 05:26:23 +0000
Received: by vkbs1 with SMTP id s1so38939153vkb.1 for ; Wed, 02 Dec 2015 21:26:21 -0800 (PST)
MIME-Version: 1.0
X-Received: by 10.31.134.3 with SMTP id i3mr4893455vkd.14.1449120380859; Wed, 02 Dec 2015 21:26:20 -0800 (PST)
Received: by 10.103.45.207 with HTTP; Wed, 2 Dec 2015 21:26:20 -0800 (PST)
In-Reply-To: <565D7580.6030303@foss.arm.com>
References: <000001d12119$49548570$dbfd9050$@arm.com> <20151117100800.GA6727@arm.com> <564F5ABF.2020302@foss.arm.com> <5654343A.2080609@foss.arm.com> <56543963.3070704@foss.arm.com> <565D7580.6030303@foss.arm.com>
Date: Thu, 03 Dec 2015 05:26:00 -0000
Message-ID: 
Subject: Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address
From: "Bin.Cheng" 
To: Richard Earnshaw 
Cc: James Greenhalgh , Bin Cheng , gcc-patches List 
Content-Type: multipart/mixed; boundary=001a11441afc8a48280525f7a106
X-IsSubscribed: yes
X-SW-Source: 2015-12/txt/msg00390.txt.bz2

--001a11441afc8a48280525f7a106
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Content-length: 7405

On Tue, Dec 1, 2015 at 6:25 PM, Richard Earnshaw wrote:
> On 01/12/15 03:19, Bin.Cheng wrote:
>> On Tue, Nov 24, 2015 at 6:18 PM, Richard Earnshaw
>> wrote:
>>> On 24/11/15 09:56, Richard Earnshaw wrote:
>>>> On 24/11/15 02:51, Bin.Cheng wrote:
>>>>>>> The aarch64's problem is we don't define addptr3 pattern, and we don't
>>>>>>>>> have direct insn pattern describing the "x + y << z". According to
>>>>>>>>> gcc internal:
>>>>>>>>>
>>>>>>>>> ‘addptrm3’
>>>>>>>>> Like addm3 but is guaranteed to only be used for address calculations.
>>>>>>>>> The expanded code is not allowed to clobber the condition code. It
>>>>>>>>> only needs to be defined if addm3 sets the condition code.
>>>>>>>
>>>>>>> addm3 on aarch64 does not set the condition codes, so by this rule we
>>>>>>> shouldn't need to define this pattern.
>>>>> Hi Richard,
>>>>> I think that rule has a prerequisite that backend needs to support
>>>>> register shifted addition in addm3 pattern.
>>>>
>>>> addm3 is a named pattern and its format is well defined. It does not
>>>> take a shifted operand and never has.
>>>>
>>>>> Apparently for AArch64,
>>>>> addm3 only supports "reg+reg" or "reg+imm". Also we don't really
>>>>> "does not set the condition codes" actually, because both
>>>>> "adds_shift_imm_*" and "adds_mul_imm_*" do set the condition flags.
>>>>
>>>> You appear to be confusing named patterns (used by expand) with
>>>> recognizers.
>>>> Anyway, we have
>>>>
>>>> (define_insn "*add_<shift>_<mode>"
>>>>   [(set (match_operand:GPI 0 "register_operand" "=r")
>>>>         (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" "r")
>>>>                               (match_operand:QI 2
>>>>                                "aarch64_shift_imm_<mode>" "n"))
>>>>                   (match_operand:GPI 3 "register_operand" "r")))]
>>>>
>>>> Which is a non-flag setting add with shifted operand.
>>>>
>>>>> Either way I think it is another backend issue, so do you approve that
>>>>> I commit this patch now?
>>>>
>>>> Not yet. I think there's something fundamental amiss here.
>>>>
>>>> BTW, it looks to me as though addptr3 should have exactly the same
>>>> operand rules as add3 (documentation reads "like add3"), so a
>>>> shifted operand shouldn't be supported there either. If that isn't the
>>>> case then that should be clearly called out in the documentation.
>>>>
>>>> R.
>>>>
>>>
>>> PS.
>>>
>>> I presume you are aware of the canonicalization rules for add? That is,
>>> for a shift-and-add operation, the shift operand must appear first. Ie.
>>>
>>> (plus (shift (op, op)), op)
>>>
>>> not
>>>
>>> (plus (op, (shift (op, op))
>>
>> Hi Richard,
>> Thanks for the comments. I realized that the not-recognized insn
>> issue is because the original patch built non-canonical expressions.
>> When reloading address expression, LRA generates non-canonical
>> register scaled insn, which can't be recognized by the aarch64 backend.
>>
>> Here is the updated patch using canonical form pattern, it passes
>> bootstrap and regression test. Well, the ivo failure still exists,
>> but it is analyzed in the original message.
>>
>> Is this patch OK?
>>
>> As for Jiong's concern about the additional extension instruction, I
>> think this only stands for atomic load store instructions. For
>> general load store, AArch64 supports zext/sext in register scaling
>> addressing mode, the additional instruction can be forward propagated
>> into memory reference. The problem for atomic load store is AArch64
>> only supports direct register addressing mode. After LRA reloads
>> address expression out of memory reference, there is no combine/fwprop
>> optimizer to merge instructions. The problem is atomic_store's
>> predicate doesn't match its constraint. The predicate used for
>> atomic_store is memory_operand, while all other atomic patterns
>> use aarch64_sync_memory_operand. I think this might be a typo. With
>> this change, expand will not generate addressing mode requiring reload
>> anymore. I will test another patch fixing this.
>>
>> Thanks,
>> bin
>
> Some comments inline.
>
>>>
>>> R.
>>>
>>> aarch64_legitimize_addr-20151128.txt
>>>
>>>
>>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>>> index 3fe2f0f..5b3e3c4 100644
>>> --- a/gcc/config/aarch64/aarch64.c
>>> +++ b/gcc/config/aarch64/aarch64.c
>>> @@ -4757,13 +4757,65 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x */, machine_mode mode)
>>>      We try to pick as large a range for the offset as possible to
>>>      maximize the chance of a CSE. However, for aligned addresses
>>>      we limit the range to 4k so that structures with different sized
>>> -     elements are likely to use the same base. */
>>> +     elements are likely to use the same base. We need to be careful
>>> +     not split CONST for some forms address expressions, otherwise it
>
> not to split a CONST for some forms of address expression,
>
>>> +     will generate sub-optimal code. */
>>>
>>>   if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1)))
>>>     {
>>>       HOST_WIDE_INT offset = INTVAL (XEXP (x, 1));
>>>       HOST_WIDE_INT base_offset;
>>>
>>> +      if (GET_CODE (XEXP (x, 0)) == PLUS)
>>> +        {
>>> +          rtx op0 = XEXP (XEXP (x, 0), 0);
>>> +          rtx op1 = XEXP (XEXP (x, 0), 1);
>>> +
>>> +          /* For addr expression in the form like "r1 + r2 + 0x3ffc".
>>> +             Since the offset is within range supported by addressing
>>> +             mode "reg+offset", we don't split the const and legalize
>>> +             it into below insn and expr sequence:
>>> +               r3 = r1 + r2;
>>> +               "r3 + 0x3ffc". */
>
> I think this comment would read better as
>
> /* Address expressions of the form Ra + Rb + CONST.
>
>    If CONST is within the range supported by the addressing
>    mode "reg+offset", do not split CONST and use the
>    sequence
>      Rt = Ra + Rb
>      addr = Rt + CONST. */
>
>>> +          if (REG_P (op0) && REG_P (op1))
>>> +            {
>>> +              machine_mode addr_mode = GET_MODE (x);
>>> +              rtx base = gen_reg_rtx (addr_mode);
>>> +              rtx addr = plus_constant (addr_mode, base, offset);
>>> +
>>> +              if (aarch64_legitimate_address_hook_p (mode, addr, false))
>>> +                {
>>> +                  emit_insn (gen_adddi3 (base, op0, op1));
>>> +                  return addr;
>>> +                }
>>> +            }
>>> +          /* For addr expression in the form like "r1 + r2<<2 + 0x3ffc".
>>> +             Live above, we don't split the const and legalize it into
>>> +             below insn and expr sequence:
>
> Similarly.
>
>>> +               r3 = 0x3ffc;
>>> +               r4 = r1 + r3;
>>> +               "r4 + r2<<2". */
>
> Why don't we generate
>
> r3 = r1 + r2 << 2
> r4 = r3 + 0x3ffc
>
> utilizing the shift-and-add instructions?

All other comments are addressed in the attached new patch. As for this
question, Wilco also asked it on an internal channel before. The main
idea is to rely on GIMPLE IVOPTs/SLSR to find CSE opportunities in the
scaled plus sub-expression. The scaled index is most likely a loop IV,
so I would like to split the constant plus out of the memory reference
so that it can be identified/hoisted as a loop invariant. This is more
important when the base is sfp related.
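
To illustrate the kind of case this reasoning targets, here is a small,
hypothetical C example (not taken from the patch or its testsuite; the
function and array names are made up). The scaled index is the loop IV
and the constant part of the address comes from a large stack object, so
keeping the constant out of the memory reference leaves a loop-invariant
"base + offset" that can be hoisted, while the register-scaled part stays
in the addressing mode:

/* Hypothetical illustration only.  buf lives on the stack, so the store
   address is (frame base) + i*4 + 4092.  Splitting the "+ 4092" out of
   the memory reference exposes a loop-invariant add that can be hoisted
   out of the loop, while the varying "i << 2" remains in the
   register-scaled addressing mode.  */
extern void use (int *);

void
foo (int n)
{
  int buf[2048];

  for (int i = 0; i < n; i++)
    buf[1023 + i] = i;

  use (buf);
}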
Thanks, bin --001a11441afc8a48280525f7a106 Content-Type: text/plain; charset=US-ASCII; name="aarch64_legitimize_addr-20151203.txt" Content-Disposition: attachment; filename="aarch64_legitimize_addr-20151203.txt" Content-Transfer-Encoding: base64 X-Attachment-Id: f_ihpss3xx0 Content-length: 3425 ZGlmZiAtLWdpdCBhL2djYy9jb25maWcvYWFyY2g2NC9hYXJjaDY0LmMgYi9n Y2MvY29uZmlnL2FhcmNoNjQvYWFyY2g2NC5jCmluZGV4IDNmZTJmMGYuLjI0 ODkzN2UgMTAwNjQ0Ci0tLSBhL2djYy9jb25maWcvYWFyY2g2NC9hYXJjaDY0 LmMKKysrIGIvZ2NjL2NvbmZpZy9hYXJjaDY0L2FhcmNoNjQuYwpAQCAtNDc1 NywxMyArNDc1Nyw2NyBAQCBhYXJjaDY0X2xlZ2l0aW1pemVfYWRkcmVzcyAo cnR4IHgsIHJ0eCAvKiBvcmlnX3ggICovLCBtYWNoaW5lX21vZGUgbW9kZSkK ICAgICAgV2UgdHJ5IHRvIHBpY2sgYXMgbGFyZ2UgYSByYW5nZSBmb3IgdGhl IG9mZnNldCBhcyBwb3NzaWJsZSB0bwogICAgICBtYXhpbWl6ZSB0aGUgY2hh bmNlIG9mIGEgQ1NFLiAgSG93ZXZlciwgZm9yIGFsaWduZWQgYWRkcmVzc2Vz CiAgICAgIHdlIGxpbWl0IHRoZSByYW5nZSB0byA0ayBzbyB0aGF0IHN0cnVj dHVyZXMgd2l0aCBkaWZmZXJlbnQgc2l6ZWQKLSAgICAgZWxlbWVudHMgYXJl IGxpa2VseSB0byB1c2UgdGhlIHNhbWUgYmFzZS4gICovCisgICAgIGVsZW1l bnRzIGFyZSBsaWtlbHkgdG8gdXNlIHRoZSBzYW1lIGJhc2UuICBXZSBuZWVk IHRvIGJlIGNhcmVmdWwKKyAgICAgbm90IHRvIHNwbGl0IGEgQ09OU1QgZm9y IHNvbWUgZm9ybXMgb2YgYWRkcmVzcyBleHByZXNzaW9uLCBvdGhlcndpc2UK KyAgICAgaXQgd2lsbCBnZW5lcmF0ZSBzdWItb3B0aW1hbCBjb2RlLiAgKi8K IAogICBpZiAoR0VUX0NPREUgKHgpID09IFBMVVMgJiYgQ09OU1RfSU5UX1Ag KFhFWFAgKHgsIDEpKSkKICAgICB7CiAgICAgICBIT1NUX1dJREVfSU5UIG9m ZnNldCA9IElOVFZBTCAoWEVYUCAoeCwgMSkpOwogICAgICAgSE9TVF9XSURF X0lOVCBiYXNlX29mZnNldDsKIAorICAgICAgaWYgKEdFVF9DT0RFIChYRVhQ ICh4LCAwKSkgPT0gUExVUykKKwl7CisJICBydHggb3AwID0gWEVYUCAoWEVY UCAoeCwgMCksIDApOworCSAgcnR4IG9wMSA9IFhFWFAgKFhFWFAgKHgsIDAp LCAxKTsKKworCSAgLyogQWRkcmVzcyBleHByZXNzaW9ucyBvZiB0aGUgZm9y bSBSYSArIFJiICsgQ09OU1QuCisKKwkgICAgIElmIENPTlNUIGlzIHdpdGhp biB0aGUgcmFuZ2Ugc3VwcG9ydGVkIGJ5IHRoZSBhZGRyZXNzaW5nCisJICAg ICBtb2RlICJyZWcrb2Zmc2V0IiwgZG8gbm90IHNwbGl0IENPTlNUIGFuZCB1 c2UgdGhlCisJICAgICBzZXF1ZW5jZQorCSAgICAgICBSdCA9IFJhICsgUmI7 CisJICAgICAgIGFkZHIgPSBSdCArIENPTlNULiAgKi8KKwkgIGlmIChSRUdf UCAob3AwKSAmJiBSRUdfUCAob3AxKSkKKwkgICAgeworCSAgICAgIG1hY2hp bmVfbW9kZSBhZGRyX21vZGUgPSBHRVRfTU9ERSAoeCk7CisJICAgICAgcnR4 IGJhc2UgPSBnZW5fcmVnX3J0eCAoYWRkcl9tb2RlKTsKKwkgICAgICBydHgg YWRkciA9IHBsdXNfY29uc3RhbnQgKGFkZHJfbW9kZSwgYmFzZSwgb2Zmc2V0 KTsKKworCSAgICAgIGlmIChhYXJjaDY0X2xlZ2l0aW1hdGVfYWRkcmVzc19o b29rX3AgKG1vZGUsIGFkZHIsIGZhbHNlKSkKKwkJeworCQkgIGVtaXRfaW5z biAoZ2VuX2FkZGRpMyAoYmFzZSwgb3AwLCBvcDEpKTsKKwkJICByZXR1cm4g YWRkcjsKKwkJfQorCSAgICB9CisJICAvKiBBZGRyZXNzIGV4cHJlc3Npb25z IG9mIHRoZSBmb3JtIFJhICsgUmI8PFNDQUxFICsgQ09OU1QuCisKKwkgICAg IElmIFJlZyArIFJiPDxTQ0FMRSBpcyBhIHZhbGlkIGFkZHJlc3MgZXhwcmVz c2lvbiwgZG8gbm90CisJICAgICBzcGxpdCBDT05TVCBhbmQgdXNlIHRoZSBz ZXF1ZW5jZQorCSAgICAgICBSYyA9IENPTlNUOworCSAgICAgICBSdCA9IFJh ICsgUmM7CisJICAgICAgIGFkZHIgPSBSdCArIFJiPDxTQ0FMRS4gICovCisJ ICBlbHNlIGlmIChSRUdfUCAob3AwKSB8fCBSRUdfUCAob3AxKSkKKwkgICAg eworCSAgICAgIG1hY2hpbmVfbW9kZSBhZGRyX21vZGUgPSBHRVRfTU9ERSAo eCk7CisJICAgICAgcnR4IGJhc2UgPSBnZW5fcmVnX3J0eCAoYWRkcl9tb2Rl KTsKKworCSAgICAgIC8qIFN3aXRjaCB0byBtYWtlIHN1cmUgdGhhdCByZWdp c3RlciBpcyBpbiBvcDAuICAqLworCSAgICAgIGlmIChSRUdfUCAob3AxKSkK KwkJc3RkOjpzd2FwIChvcDAsIG9wMSk7CisKKwkgICAgICBydHggYWRkciA9 IGdlbl9ydHhfUExVUyAoYWRkcl9tb2RlLCBvcDEsIGJhc2UpOworCisJICAg ICAgaWYgKGFhcmNoNjRfbGVnaXRpbWF0ZV9hZGRyZXNzX2hvb2tfcCAobW9k ZSwgYWRkciwgZmFsc2UpKQorCQl7CisJCSAgYmFzZSA9IGZvcmNlX29wZXJh bmQgKHBsdXNfY29uc3RhbnQgKGFkZHJfbW9kZSwKKwkJCQkJCSAgICAgICBv cDAsIG9mZnNldCksCisJCQkJCU5VTExfUlRYKTsKKwkJICByZXR1cm4gZ2Vu X3J0eF9QTFVTIChhZGRyX21vZGUsIG9wMSwgYmFzZSk7CisJCX0KKwkgICAg 
fQorCX0KKwogICAgICAgLyogRG9lcyBpdCBsb29rIGxpa2Ugd2UnbGwgbmVl ZCBhIGxvYWQvc3RvcmUtcGFpciBvcGVyYXRpb24/ICAqLwogICAgICAgaWYg KEdFVF9NT0RFX1NJWkUgKG1vZGUpID4gMTYKIAkgIHx8IG1vZGUgPT0gVElt b2RlKQo= --001a11441afc8a48280525f7a106--