From: "Bin.Cheng"
To: Richard Earnshaw
Cc: James Greenhalgh, Bin Cheng, gcc-patches List
Date: Tue, 24 Nov 2015 03:23:00 -0000
Subject: Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address
In-Reply-To: <564F5ABF.2020302@foss.arm.com>
References: <000001d12119$49548570$dbfd9050$@arm.com> <20151117100800.GA6727@arm.com> <564F5ABF.2020302@foss.arm.com>

On Sat, Nov 21, 2015 at 1:39 AM, Richard Earnshaw wrote:
> On 20/11/15 08:31, Bin.Cheng wrote:
>> On Thu, Nov 19, 2015 at 10:32 AM, Bin.Cheng wrote:
>>> On Tue, Nov 17, 2015 at 6:08 PM, James Greenhalgh wrote:
>>>> On Tue, Nov 17, 2015 at 05:21:01PM +0800, Bin Cheng wrote:
>>>>> Hi,
>>>>> GIMPLE IVO needs to call the backend interface to calculate costs for
>>>>> address expressions like below:
>>>>>   FORM1: "r73 + r74 + 16380"
>>>>>   FORM2: "r73 << 2 + r74 + 16380"
>>>>>
>>>>> They are invalid address expressions on AArch64, so they will be
>>>>> legitimized by aarch64_legitimize_address.  Below is what we get from
>>>>> that function:
>>>>>
>>>>> For FORM1, the address expression is legitimized into the insn
>>>>> sequence and rtx below:
>>>>>   r84:DI=r73:DI+r74:DI
>>>>>   r85:DI=r84:DI+0x3000
>>>>>   r83:DI=r85:DI
>>>>>   "r83 + 4092"
>>>>>
>>>>> For FORM2, the address expression is legitimized into the insn
>>>>> sequence and rtx below:
>>>>>   r108:DI=r73:DI<<0x2
>>>>>   r109:DI=r108:DI+r74:DI
>>>>>   r110:DI=r109:DI+0x3000
>>>>>   r107:DI=r110:DI
>>>>>   "r107 + 4092"
>>>>>
>>>>> So the costs computed are 12/16 respectively.  The high cost prevents
>>>>> IVO from choosing the right candidates.  Besides cost computation, I
>>>>> also think the legitimization is bad in terms of code generation.
>>>>> The root cause in aarch64_legitimize_address is described by its
>>>>> comment:
>>>>>   /* Try to split X+CONST into Y=X+(CONST & ~mask), Y+(CONST&mask),
>>>>>      where mask is selected by alignment and size of the offset.
>>>>>      We try to pick as large a range for the offset as possible to
>>>>>      maximize the chance of a CSE.  However, for aligned addresses
>>>>>      we limit the range to 4k so that structures with different sized
>>>>>      elements are likely to use the same base.  */
>>>>> I think the split of CONST is intended for REG+CONST where the const
>>>>> offset is not in the range of AArch64's addressing modes.
>>>>> Unfortunately, it doesn't explicitly handle/reject "REG+REG+CONST"
>>>>> and "REG+REG<<SCALE+CONST" when the CONST is in the range of the
>>>>> addressing modes.  As a result, these two cases fall through this
>>>>> logic, resulting in sub-optimal code.
>>>>>
>>>>> It's obvious we can do the below legitimization instead:
>>>>> FORM1:
>>>>>   r83:DI=r73:DI+r74:DI
>>>>>   "r83 + 16380"
>>>>> FORM2:
>>>>>   r107:DI=0x3ffc
>>>>>   r106:DI=r74:DI+r107:DI
>>>>>     REG_EQUAL r74:DI+0x3ffc
>>>>>   "r106 + r73 << 2"
>>>>>
>>>>> This patch handles these two cases as described.
>>>>
>>>> Thanks for the description, it made the patch very easy to review.  I
>>>> only have a style comment.
>>>>
>>>>> Bootstrapped & tested on AArch64 along with the other patch.  Is it OK?
>>>>>
>>>>> 2015-11-04  Bin Cheng
>>>>>             Jiong Wang
>>>>>
>>>>>     * config/aarch64/aarch64.c (aarch64_legitimize_address): Handle
>>>>>     address expressions like REG+REG+CONST and REG+NON_REG+CONST.
>>>>
>>>>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>>>>> index 5c8604f..47875ac 100644
>>>>> --- a/gcc/config/aarch64/aarch64.c
>>>>> +++ b/gcc/config/aarch64/aarch64.c
>>>>> @@ -4710,6 +4710,51 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x */, machine_mode mode)
>>>>>     {
>>>>>       HOST_WIDE_INT offset = INTVAL (XEXP (x, 1));
>>>>>       HOST_WIDE_INT base_offset;
>>>>> +      rtx op0 = XEXP (x,0);
>>>>> +
>>>>> +      if (GET_CODE (op0) == PLUS)
>>>>> +	{
>>>>> +	  rtx op0_ = XEXP (op0, 0);
>>>>> +	  rtx op1_ = XEXP (op0, 1);
>>>>
>>>> I don't see this trailing _ on a variable name in many places in the
>>>> source tree (mostly in the Go frontend), and certainly not in the
>>>> aarch64 backend.  Can we pick a different name for op0_ and op1_?
>>>>
>>>>> +
>>>>> +	  /* RTX pattern in the form of (PLUS (PLUS REG, REG), CONST) will
>>>>> +	     reach here, the 'CONST' may be valid in which case we should
>>>>> +	     not split.  */
>>>>> +	  if (REG_P (op0_) && REG_P (op1_))
>>>>> +	    {
>>>>> +	      machine_mode addr_mode = GET_MODE (op0);
>>>>> +	      rtx addr = gen_reg_rtx (addr_mode);
>>>>> +
>>>>> +	      rtx ret = plus_constant (addr_mode, addr, offset);
>>>>> +	      if (aarch64_legitimate_address_hook_p (mode, ret, false))
>>>>> +		{
>>>>> +		  emit_insn (gen_adddi3 (addr, op0_, op1_));
>>>>> +		  return ret;
>>>>> +		}
>>>>> +	    }
>>>>> +	  /* RTX pattern in the form of (PLUS (PLUS REG, NON_REG), CONST)
>>>>> +	     will reach here.  If (PLUS REG, NON_REG) is valid addr expr,
>>>>> +	     we split it into Y=REG+CONST, Y+NON_REG.  */
>>>>> +	  else if (REG_P (op0_) || REG_P (op1_))
>>>>> +	    {
>>>>> +	      machine_mode addr_mode = GET_MODE (op0);
>>>>> +	      rtx addr = gen_reg_rtx (addr_mode);
>>>>> +
>>>>> +	      /* Switch to make sure that register is in op0_.  */
>>>>> +	      if (REG_P (op1_))
>>>>> +		std::swap (op0_, op1_);
>>>>> +
>>>>> +	      rtx ret = gen_rtx_fmt_ee (PLUS, addr_mode, addr, op1_);
>>>>> +	      if (aarch64_legitimate_address_hook_p (mode, ret, false))
>>>>> +		{
>>>>> +		  addr = force_operand (plus_constant (addr_mode,
>>>>> +						       op0_, offset),
>>>>> +					NULL_RTX);
>>>>> +		  ret = gen_rtx_fmt_ee (PLUS, addr_mode, addr, op1_);
>>>>> +		  return ret;
>>>>> +		}
>>>>
>>>> The logic here is a bit hairy to follow: you construct a PLUS RTX to
>>>> check aarch64_legitimate_address_hook_p, then construct a different
>>>> PLUS RTX to use as the return value.  This can probably be clarified
>>>> by choosing a name other than ret for the temporary address
>>>> expression you construct.
>>>>
>>>> It would also be good to take some of your detailed description and
>>>> write that here.  Certainly I found the explicit examples in the
>>>> cover letter easier to follow than:
>>>>
>>>>> +	  /* RTX pattern in the form of (PLUS (PLUS REG, NON_REG), CONST)
>>>>> +	     will reach here.  If (PLUS REG, NON_REG) is valid addr expr,
>>>>> +	     we split it into Y=REG+CONST, Y+NON_REG.  */
>>>>
>>>> Otherwise this patch is OK.
>>> Thanks for reviewing, here is the updated patch.
>>
>> Hmm, I retested the patch on aarch64 and found it caused two
>> additional failures.
>>
>> FAIL: gcc.target/aarch64/ldp_vec_64_1.c scan-assembler ldp\td[0-9]+, d[0-9]
>> This is caused by a different ivopt decision because of this patch's
>> cost change.  As far as IVO can tell, the new decision is better than
>> the old one, and so is the IVOPTed dump.  I could fix this by changing
>> how this patch legitimizes the address "r1 + r2 + offset".  In this
>> patch, it's legitimized into "r3 = r1 + r2; [r3 + offset]"; we could
>> change it into "r3 = offset; r4 = r1 + r3; [r4 + r2]".  This new form
>> is better because r4 is possibly a loop invariant, but its cost is
>> higher.  I tend to keep this patch the way it is since I don't know
>> how the changed cost affects performance data.  We may need to live
>> with this failure for a while.
>>
>> FAIL: gcc.dg/atomic/stdatomic-vm.c -O1 (internal compiler error)
>> This I think is a potential bug in the aarch64 backend.  GCC can
>> generate "[r1 + r2 << 3] = unspec..." with this patch; for this test,
>> LRA needs to make a reload for the address expression by computing
>> "r1 + r2 << 3" outside of the memory reference.  In function
>> emit_add3_insn, it first checks have_addptr3_insn/gen_addptr3_insn,
>> then the add3 pattern.  The code is as below:
>>
>>   if (have_addptr3_insn (x, y, z))
>>     {
>>       rtx_insn *insn = gen_addptr3_insn (x, y, z);
>>
>>       /* If the target provides an "addptr" pattern it hopefully does
>>	  so for a reason.  So falling back to the normal add would be
>>	  a bug.  */
>>       lra_assert (insn != NULL_RTX);
>>       emit_insn (insn);
>>       return insn;
>>     }
>>
>>   rtx_insn *insn
>>     = emit_insn (gen_rtx_SET (x, gen_rtx_PLUS (GET_MODE (y), y, z)));
>>   if (recog_memoized (insn) < 0)
>>     {
>>       delete_insns_since (last);
>>       insn = NULL;
>>     }
>>   return insn;
>>
>> AArch64's problem is that we don't define the addptr3 pattern, and we
>> don't have a direct insn pattern describing "x + y << z".  According
>> to the GCC internals manual:
>>
>> ‘addptrm3’
>>   Like addm3 but is guaranteed to only be used for address
>>   calculations.  The expanded code is not allowed to clobber the
>>   condition code.  It only needs to be defined if addm3 sets the
>>   condition code.
>
> addm3 on aarch64 does not set the condition codes, so by this rule we
> shouldn't need to define this pattern.

Hi Richard,
I think that rule has a prerequisite: the backend needs to support
register-shifted addition in its addm3 pattern.  Apparently for
AArch64, addm3 only supports "reg+reg" or "reg+imm".  Also, it is not
quite true that we "do not set the condition codes", because both
"adds_shift_imm_*" and "adds_mul_imm_*" do set the condition flags.
Either way I think it is another backend issue, so do you approve of
me committing this patch now?

Thanks,
bin