Subject: Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address
From: "Bin.Cheng"
To: Richard Earnshaw
Cc: James Greenhalgh, Bin Cheng, gcc-patches List
Date: Fri, 04 Dec 2015 03:18:00 -0000

On Thu, Dec 3, 2015 at 6:26 PM, Richard Earnshaw wrote:
> On 03/12/15 05:26, Bin.Cheng wrote:
>> On Tue, Dec 1, 2015 at 6:25 PM, Richard Earnshaw
>> wrote:
>>> On 01/12/15 03:19, Bin.Cheng wrote:
>>>> On Tue, Nov 24, 2015 at 6:18 PM, Richard Earnshaw
>>>> wrote:
>>>>> On 24/11/15 09:56, Richard Earnshaw wrote:
>>>>>> On 24/11/15 02:51, Bin.Cheng wrote:
>>>>>>>>> The problem for aarch64 is that we don't define an addptrm3 pattern, and we don't
>>>>>>>>>>> have a direct insn pattern describing "x + y << z".  According to
>>>>>>>>>>> the gcc internals:
>>>>>>>>>>>
>>>>>>>>>>> ‘addptrm3’
>>>>>>>>>>> Like addm3 but is guaranteed to only be used for address calculations.
>>>>>>>>>>> The expanded code is not allowed to clobber the condition code.  It
>>>>>>>>>>> only needs to be defined if addm3 sets the condition code.
>>>>>>>>>
>>>>>>>>> addm3 on aarch64 does not set the condition codes, so by this rule we
>>>>>>>>> shouldn't need to define this pattern.
>>>>>>> Hi Richard,
>>>>>>> I think that rule has a prerequisite: the backend needs to support
>>>>>>> register-shifted addition in the addm3 pattern.
>>>>>>
>>>>>> addm3 is a named pattern and its format is well defined.  It does not
>>>>>> take a shifted operand and never has.
>>>>>>
>>>>>>> Apparently for AArch64,
>>>>>>> addm3 only supports "reg+reg" or "reg+imm".  Also it is not really true
>>>>>>> that we "do not set the condition codes", because both
>>>>>>> "adds_shift_imm_*" and "adds_mul_imm_*" do set the condition flags.
>>>>>>
>>>>>> You appear to be confusing named patterns (used by expand) with
>>>>>> recognizers.
>>>>>> Anyway, we have
>>>>>>
>>>>>> (define_insn "*add_<shift>_<mode>"
>>>>>>   [(set (match_operand:GPI 0 "register_operand" "=r")
>>>>>>         (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" "r")
>>>>>>                               (match_operand:QI 2
>>>>>>                                 "aarch64_shift_imm_<mode>" "n"))
>>>>>>                   (match_operand:GPI 3 "register_operand" "r")))]
>>>>>>
>>>>>> which is a non-flag-setting add with a shifted operand.
>>>>>>
>>>>>>> Either way I think it is another backend issue, so do you approve that
>>>>>>> I commit this patch now?
>>>>>>
>>>>>> Not yet.  I think there's something fundamental amiss here.
>>>>>>
>>>>>> BTW, it looks to me as though addptrm3 should have exactly the same
>>>>>> operand rules as addm3 (the documentation reads "like addm3"), so a
>>>>>> shifted operand shouldn't be supported there either.  If that isn't the
>>>>>> case then that should be clearly called out in the documentation.
>>>>>>
>>>>>> R.
>>>>>>
>>>>>
>>>>> PS.
>>>>>
>>>>> I presume you are aware of the canonicalization rules for add?  That is,
>>>>> for a shift-and-add operation, the shift operand must appear first, i.e.
>>>>>
>>>>> (plus (shift (op, op)), op)
>>>>>
>>>>> not
>>>>>
>>>>> (plus (op, (shift (op, op))
>>>>
>>>> Hi Richard,
>>>> Thanks for the comments.  I realized that the not-recognized-insn
>>>> issue is because the original patch built non-canonical expressions.
>>>> When reloading an address expression, LRA generates a non-canonical
>>>> register-scaled insn, which can't be recognized by the aarch64 backend.
>>>>
>>>> Here is the updated patch using the canonical form of the pattern; it
>>>> passes bootstrap and regression testing.  Well, the ivo failure still
>>>> exists, but it was analyzed in the original message.
>>>>
>>>> Is this patch OK?
>>>>
>>>> As for Jiong's concern about the additional extension instruction, I
>>>> think this only applies to atomic load/store instructions.  For
>>>> general loads and stores, AArch64 supports zext/sext in the
>>>> register-scaled addressing mode, so the additional instruction can be
>>>> forward-propagated into the memory reference.  The problem for atomic
>>>> load/store is that AArch64 only supports the direct register addressing
>>>> mode.  After LRA reloads the address expression out of the memory
>>>> reference, there is no combine/fwprop optimizer to merge the
>>>> instructions.  The underlying problem is that atomic_store's
>>>> predicate doesn't match its constraint.  The predicate used for
>>>> atomic_store is memory_operand, while all other atomic patterns
>>>> use aarch64_sync_memory_operand.  I think this might be a typo.  With
>>>> this change, expand will not generate an addressing mode requiring a
>>>> reload anymore.  I will test another patch fixing this.
>>>>
>>>> Thanks,
>>>> bin
>>>
>>> Some comments inline.
>>>
>>>>>
>>>>> R.
>>>>>
>>>>> aarch64_legitimize_addr-20151128.txt
>>>>>
>>>>>
>>>>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>>>>> index 3fe2f0f..5b3e3c4 100644
>>>>> --- a/gcc/config/aarch64/aarch64.c
>>>>> +++ b/gcc/config/aarch64/aarch64.c
>>>>> @@ -4757,13 +4757,65 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x */, machine_mode mode)
>>>>>      We try to pick as large a range for the offset as possible to
>>>>>      maximize the chance of a CSE.  However, for aligned addresses
>>>>>      we limit the range to 4k so that structures with different sized
>>>>> -     elements are likely to use the same base.  */
>>>>> +     elements are likely to use the same base.  We need to be careful
>>>>> +     not split CONST for some forms address expressions, otherwise it
>>>
>>> not to split a CONST for some forms of address expression,
>>>
>>>>> +     will generate sub-optimal code.  */
>>>>>
>>>>>    if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1)))
>>>>>      {
>>>>>        HOST_WIDE_INT offset = INTVAL (XEXP (x, 1));
>>>>>        HOST_WIDE_INT base_offset;
>>>>>
>>>>> +      if (GET_CODE (XEXP (x, 0)) == PLUS)
>>>>> +        {
>>>>> +          rtx op0 = XEXP (XEXP (x, 0), 0);
>>>>> +          rtx op1 = XEXP (XEXP (x, 0), 1);
>>>>> +
>>>>> +          /* For addr expression in the form like "r1 + r2 + 0x3ffc".
>>>>> +             Since the offset is within range supported by addressing
>>>>> +             mode "reg+offset", we don't split the const and legalize
>>>>> +             it into below insn and expr sequence:
>>>>> +               r3 = r1 + r2;
>>>>> +               "r3 + 0x3ffc".  */
>>>
>>> I think this comment would read better as
>>>
>>>   /* Address expressions of the form Ra + Rb + CONST.
>>>
>>>      If CONST is within the range supported by the addressing
>>>      mode "reg+offset", do not split CONST and use the
>>>      sequence
>>>        Rt = Ra + Rb
>>>        addr = Rt + CONST.  */
>>>
>>>>> +          if (REG_P (op0) && REG_P (op1))
>>>>> +            {
>>>>> +              machine_mode addr_mode = GET_MODE (x);
>>>>> +              rtx base = gen_reg_rtx (addr_mode);
>>>>> +              rtx addr = plus_constant (addr_mode, base, offset);
>>>>> +
>>>>> +              if (aarch64_legitimate_address_hook_p (mode, addr, false))
>>>>> +                {
>>>>> +                  emit_insn (gen_adddi3 (base, op0, op1));
>>>>> +                  return addr;
>>>>> +                }
>>>>> +            }
>>>>> +          /* For addr expression in the form like "r1 + r2<<2 + 0x3ffc".
>>>>> +             Live above, we don't split the const and legalize it into
>>>>> +             below insn and expr sequence:
>>>
>>> Similarly.
>>>
>>>>> +               r3 = 0x3ffc;
>>>>> +               r4 = r1 + r3;
>>>>> +               "r4 + r2<<2".  */
>>>
>>> Why don't we generate
>>>
>>>   r3 = r1 + r2 << 2
>>>   r4 = r3 + 0x3ffc
>>>
>>> utilizing the shift-and-add instructions?
>>
>> All other comments are addressed in the attached new patch.
>> As for this question, Wilco also asked about it on an internal channel
>> before.  The main idea is to rely on GIMPLE IVO/SLSR to find CSE
>> opportunities for the scaled plus sub-expression.  The scaled index is
>> most likely a loop iv, so I would like to split the constant plus out of
>> the memory reference so that it can be identified/hoisted as a loop
>> invariant.  This is more important when the base is sfp-related.
>>
>
> Ah, yes.  The SFP problem.
>
> Since at least two people have queried this, it's clearly non-obvious
> enough to require an explanation in a comment.
>
> OK with that change and a suitable changelog entry.

Given your review comments, I am going to apply the attached patch along
with the ChangeLog entry below.

Thanks,
bin

2015-12-04  Bin Cheng
	    Jiong Wang

	* config/aarch64/aarch64.c (aarch64_legitimize_address): legitimize
	address expressions like Ra + Rb + CONST and Ra + Rb<<SCALE + CONST.
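[Archive note: the 0x3ffc constant used in the examples above is exactly the
largest offset a 4-byte load/store can keep in the "reg+offset" form.  The
stand-alone C sketch below is not GCC code and is not part of the patch; the
helper name and the size-scaled 12-bit limit are illustrative assumptions
about the 32-bit LDR/STR immediate form, used only to model the keep-vs-split
decision the patch comments describe.]

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative model of the decision discussed above: for an address of the
   form base + index + OFFSET, decide whether OFFSET fits the "reg+offset"
   addressing form (unsigned 12-bit immediate, scaled by the access size)
   and can therefore stay with the memory reference instead of being split
   into a separate add.  */
static bool offset_fits_reg_offset_form (int64_t offset, int access_size)
{
  return offset >= 0
	 && offset < 4096 * (int64_t) access_size
	 && offset % access_size == 0;
}

int main (void)
{
  /* r1 + r2 + 0x3ffc with a 4-byte access: keep the constant, i.e. emit
     r3 = r1 + r2 and address the memory as [r3 + 0x3ffc].  */
  printf ("0x3ffc:  %s\n",
	  offset_fits_reg_offset_form (0x3ffc, 4) ? "keep offset" : "split const");

  /* r1 + r2 + 0x13ffc with a 4-byte access: out of range, so the constant
     must be split out before the address is formed.  */
  printf ("0x13ffc: %s\n",
	  offset_fits_reg_offset_form (0x13ffc, 4) ? "keep offset" : "split const");
  return 0;
}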