Subject: Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address
From: "Bin.Cheng"
To: James Greenhalgh
Cc: Bin Cheng, gcc-patches List
Date: Fri, 20 Nov 2015 08:31:00 -0000

On Thu, Nov 19, 2015 at 10:32 AM, Bin.Cheng wrote:
> On Tue, Nov 17, 2015 at 6:08 PM, James Greenhalgh wrote:
>> On Tue, Nov 17, 2015 at 05:21:01PM +0800, Bin Cheng wrote:
>>> Hi,
>>> GIMPLE IVO needs to call the backend interface to calculate costs
>>> for address expressions like below:
>>> FORM1: "r73 + r74 + 16380"
>>> FORM2: "r73 << 2 + r74 + 16380"
>>>
>>> These are invalid address expressions on AArch64, so they will be
>>> legitimized by aarch64_legitimize_address.  Below is what we get
>>> from that function:
>>>
>>> For FORM1, the address expression is legitimized into the insn
>>> sequence and rtx below:
>>> r84:DI=r73:DI+r74:DI
>>> r85:DI=r84:DI+0x3000
>>> r83:DI=r85:DI
>>> "r83 + 4092"
>>>
>>> For FORM2, the address expression is legitimized into the insn
>>> sequence and rtx below:
>>> r108:DI=r73:DI<<0x2
>>> r109:DI=r108:DI+r74:DI
>>> r110:DI=r109:DI+0x3000
>>> r107:DI=r110:DI
>>> "r107 + 4092"
>>>
>>> So the computed costs are 12 and 16 respectively.  The high costs
>>> prevent IVO from choosing the right candidates.  Besides cost
>>> computation, I also think the legitimization is bad in terms of
>>> code generation.  The root cause in aarch64_legitimize_address is
>>> described by its comment:
>>> /* Try to split X+CONST into Y=X+(CONST & ~mask), Y+(CONST&mask),
>>>    where mask is selected by alignment and size of the offset.
>>>    We try to pick as large a range for the offset as possible to
>>>    maximize the chance of a CSE.  However, for aligned addresses
>>>    we limit the range to 4k so that structures with different sized
>>>    elements are likely to use the same base.  */
>>> I think the split of CONST is intended for REG+CONST, where the
>>> const offset is not in the range of AArch64's addressing modes.
>>> Unfortunately, it doesn't explicitly handle/reject "REG+REG+CONST"
>>> and "REG+REG<<SCALE+CONST" when the CONST is in the range of the
>>> addressing modes.  As a result, these two cases fall through this
>>> logic, resulting in sub-optimal code.
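>>>
>>> For reference, the split that produces the 0x3000/4092 pair above
>>> works roughly as below (a sketch only; in the real code the mask
>>> is computed from the alignment and size of the offset, the 4k
>>> value here is assumed from the comment):
>>>
>>>   HOST_WIDE_INT off = 16380;        /* 0x3ffc */
>>>   HOST_WIDE_INT mask = 0xfff;       /* assumed 4k range */
>>>   HOST_WIDE_INT high = off & ~mask; /* 0x3000, folded into new base */
>>>   HOST_WIDE_INT low = off & mask;   /* 4092, kept in the address */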
>>>
>>> It's obvious we can do the legitimization below instead:
>>> FORM1:
>>> r83:DI=r73:DI+r74:DI
>>> "r83 + 16380"
>>> FORM2:
>>> r107:DI=0x3ffc
>>> r106:DI=r74:DI+r107:DI
>>>   REG_EQUAL r74:DI+0x3ffc
>>> "r106 + r73 << 2"
>>>
>>> This patch handles these two cases as described.
>>
>> Thanks for the description, it made the patch very easy to review.
>> I only have a style comment.
>>
>>> Bootstrapped & tested on AArch64 along with the other patch.  Is it OK?
>>>
>>> 2015-11-04  Bin Cheng
>>>             Jiong Wang
>>>
>>>     * config/aarch64/aarch64.c (aarch64_legitimize_address): Handle
>>>     address expressions like REG+REG+CONST and REG+NON_REG+CONST.
>>
>>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>>> index 5c8604f..47875ac 100644
>>> --- a/gcc/config/aarch64/aarch64.c
>>> +++ b/gcc/config/aarch64/aarch64.c
>>> @@ -4710,6 +4710,51 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x */, machine_mode mode)
>>>      {
>>>        HOST_WIDE_INT offset = INTVAL (XEXP (x, 1));
>>>        HOST_WIDE_INT base_offset;
>>> +      rtx op0 = XEXP (x,0);
>>> +
>>> +      if (GET_CODE (op0) == PLUS)
>>> +        {
>>> +          rtx op0_ = XEXP (op0, 0);
>>> +          rtx op1_ = XEXP (op0, 1);
>>
>> I don't see this trailing _ on a variable name in many places in the
>> source tree (mostly in the Go frontend), and certainly not in the
>> aarch64 backend.  Can we pick a different name for op0_ and op1_?
>>
>>> +
>>> +          /* RTX pattern in the form of (PLUS (PLUS REG, REG), CONST)
>>> +             will reach here, the 'CONST' may be valid, in which case
>>> +             we should not split.  */
>>> +          if (REG_P (op0_) && REG_P (op1_))
>>> +            {
>>> +              machine_mode addr_mode = GET_MODE (op0);
>>> +              rtx addr = gen_reg_rtx (addr_mode);
>>> +
>>> +              rtx ret = plus_constant (addr_mode, addr, offset);
>>> +              if (aarch64_legitimate_address_hook_p (mode, ret, false))
>>> +                {
>>> +                  emit_insn (gen_adddi3 (addr, op0_, op1_));
>>> +                  return ret;
>>> +                }
>>> +            }
>>> +          /* RTX pattern in the form of (PLUS (PLUS REG, NON_REG), CONST)
>>> +             will reach here.  If (PLUS REG, NON_REG) is a valid addr
>>> +             expr, we split it into Y=REG+CONST, Y+NON_REG.  */
>>> +          else if (REG_P (op0_) || REG_P (op1_))
>>> +            {
>>> +              machine_mode addr_mode = GET_MODE (op0);
>>> +              rtx addr = gen_reg_rtx (addr_mode);
>>> +
>>> +              /* Switch to make sure that register is in op0_.  */
>>> +              if (REG_P (op1_))
>>> +                std::swap (op0_, op1_);
>>> +
>>> +              rtx ret = gen_rtx_fmt_ee (PLUS, addr_mode, addr, op1_);
>>> +              if (aarch64_legitimate_address_hook_p (mode, ret, false))
>>> +                {
>>> +                  addr = force_operand (plus_constant (addr_mode,
>>> +                                                       op0_, offset),
>>> +                                        NULL_RTX);
>>> +                  ret = gen_rtx_fmt_ee (PLUS, addr_mode, addr, op1_);
>>> +                  return ret;
>>> +                }
>>
>> The logic here is a bit hairy to follow: you construct a PLUS RTX to
>> check aarch64_legitimate_address_hook_p, then construct a different
>> PLUS RTX to use as the return value.  This can probably be clarified
>> by choosing a name other than ret for the temporary address
>> expression you construct.
>>
>> It would also be good to take some of your detailed description and
>> write that here.  Certainly I found the explicit examples in the
>> cover letter easier to follow than:
>>
>>> +          /* RTX pattern in the form of (PLUS (PLUS REG, NON_REG), CONST)
>>> +             will reach here.  If (PLUS REG, NON_REG) is a valid addr
>>> +             expr, we split it into Y=REG+CONST, Y+NON_REG.  */
>>
>> Otherwise this patch is OK.
> Thanks for reviewing, here is the updated patch.
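> The hairy hunk now takes roughly the shape below (a sketch only,
> with the temporaries renamed as suggested; the names are
> illustrative and the attached patch is authoritative):
>
>   rtx reg_op = XEXP (op0, 0);
>   rtx nonreg_op = XEXP (op0, 1);
>
>   /* Canonicalize so that the REG is in reg_op.  */
>   if (REG_P (nonreg_op))
>     std::swap (reg_op, nonreg_op);
>
>   /* Probe with a fresh base: only split if "base + NON_REG" is a
>      valid address expression.  */
>   machine_mode addr_mode = GET_MODE (op0);
>   rtx base = gen_reg_rtx (addr_mode);
>   rtx addr_expr = gen_rtx_PLUS (addr_mode, base, nonreg_op);
>   if (aarch64_legitimate_address_hook_p (mode, addr_expr, false))
>     {
>       /* base = REG + CONST; the address becomes "base + NON_REG".  */
>       base = force_operand (plus_constant (addr_mode, reg_op, offset),
>                             NULL_RTX);
>       return gen_rtx_PLUS (addr_mode, base, nonreg_op);
>     }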
Hmm, I retested the patch on aarch64 and found it caused two
additional failures.

FAIL: gcc.target/aarch64/ldp_vec_64_1.c scan-assembler ldp\td[0-9]+, d[0-9]

This is caused by a different ivopt decision resulting from this
patch's cost change.  As far as IVO can tell, the new decision is
better than the old one, and so is the IVOPTed dump.  I could fix
this by changing how this patch legitimizes the address
"r1 + r2 + offset".  In this patch it's legitimized into
"r3 = r1 + r2; [r3 + offset]"; we could change that into
"r3 = offset; r4 = r1 + r3; [r4 + r2]" (see the sketch in the PS
below).  The new form is better because r4 is possibly a loop
invariant, but its cost is higher.  I tend to keep this patch the
way it is, since I don't know how the changed cost affects
performance data.  We may need to live with this failure for a
while.

FAIL: gcc.dg/atomic/stdatomic-vm.c -O1 (internal compiler error)

This one, I think, is a potential bug in the aarch64 backend.  With
this patch, GCC can generate "[r1 + r2 << 3] = unspec..." for this
test, and LRA then needs to reload the address expression by
computing "r1 + r2 << 3" outside of the memory reference.  Function
emit_add3_insn first checks have_addptr3_insn/gen_addptr3_insn, then
falls back to the add3 pattern.  The code is as below:

  if (have_addptr3_insn (x, y, z))
    {
      rtx_insn *insn = gen_addptr3_insn (x, y, z);

      /* If the target provides an "addptr" pattern it hopefully does
         for a reason.  So falling back to the normal add would be a
         bug.  */
      lra_assert (insn != NULL_RTX);
      emit_insn (insn);
      return insn;
    }

  rtx_insn *insn
    = emit_insn (gen_rtx_SET (x, gen_rtx_PLUS (GET_MODE (y), y, z)));
  if (recog_memoized (insn) < 0)
    {
      delete_insns_since (last);
      insn = NULL;
    }
  return insn;

The problem for aarch64 is that we define neither the addptr3
pattern nor an insn pattern directly describing "x + y << z".
According to the GCC internals manual:

'addptrm3'
Like addm3 but is guaranteed to only be used for address
calculations.  The expanded code is not allowed to clobber the
condition code.  It only needs to be defined if addm3 sets the
condition code.  If adds used for address calculations and normal
adds are not compatible it is required to expand a distinct pattern
(e.g. using an unspec).  The pattern is used by LRA to emit address
calculations.  addm3 is used if addptrm3 is not defined.

It seems to me aarch64 needs to support "add (shifted register)" in
its add3 pattern, or needs to define the addptr3 pattern.  I tend to
apply this patch to expose the bug, but I will let the aarch64
experts give some comments before that.

Thanks,
bin
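PS: For reference, the alternative legitimization of
"r1 + r2 + offset" mentioned for the first failure would look
roughly like below (a sketch only, reusing the helpers already used
by the patch; this is not what the posted patch does, and DImode is
assumed as in the patch's gen_adddi3 call):

  /* Split "r1 + r2 + offset" into "r3 = offset; r4 = r1 + r3" and
     return "r4 + r2", so r4 may be hoisted as a loop invariant.  */
  machine_mode addr_mode = GET_MODE (op0);
  rtx off_reg = force_reg (addr_mode, GEN_INT (offset)); /* r3 = offset */
  rtx base = gen_reg_rtx (addr_mode);
  emit_insn (gen_adddi3 (base, XEXP (op0, 0), off_reg)); /* r4 = r1 + r3 */
  return gen_rtx_PLUS (addr_mode, base, XEXP (op0, 1));  /* "r4 + r2" */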