Subject: Re: [PATCH AArch64] Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address
To: "Bin.Cheng"
Cc: James Greenhalgh, Bin Cheng, gcc-patches List
From: Richard Earnshaw
References: <000001d12119$49548570$dbfd9050$@arm.com> <20151117100800.GA6727@arm.com>
 <564F5ABF.2020302@foss.arm.com> <5654343A.2080609@foss.arm.com>
 <56543963.3070704@foss.arm.com> <565D7580.6030303@foss.arm.com>
Message-ID: <566018C7.4000107@foss.arm.com>
Date: Thu, 03 Dec 2015 10:26:00 -0000
X-SW-Source: 2015-12/txt/msg00410.txt.bz2

On 03/12/15 05:26, Bin.Cheng wrote:
> On Tue, Dec 1, 2015 at 6:25 PM, Richard Earnshaw wrote:
>> On 01/12/15 03:19, Bin.Cheng wrote:
>>> On Tue, Nov 24, 2015 at 6:18 PM, Richard Earnshaw wrote:
>>>> On 24/11/15 09:56, Richard Earnshaw wrote:
>>>>> On 24/11/15 02:51, Bin.Cheng wrote:
>>>>>>>> The aarch64 problem is that we don't define the addptrm3 pattern, and we
>>>>>>>>>> don't have a direct insn pattern describing "x + y << z".  According
>>>>>>>>>> to the gcc internals manual:
>>>>>>>>>>
>>>>>>>>>> ‘addptrm3’
>>>>>>>>>> Like addm3 but is guaranteed to only be used for address calculations.
>>>>>>>>>> The expanded code is not allowed to clobber the condition code.  It
>>>>>>>>>> only needs to be defined if addm3 sets the condition code.
>>>>>>>>
>>>>>>>> addm3 on aarch64 does not set the condition codes, so by this rule we
>>>>>>>> shouldn't need to define this pattern.
>>>>>> Hi Richard,
>>>>>> I think that rule has a prerequisite: the backend needs to support
>>>>>> register-shifted addition in its addm3 pattern.
>>>>>
>>>>> addm3 is a named pattern and its format is well defined.  It does not
>>>>> take a shifted operand and never has.
>>>>>
>>>>>> Apparently for AArch64,
>>>>>> addm3 only supports "reg+reg" or "reg+imm".  Also it is not really true
>>>>>> that we "do not set the condition codes", because both
>>>>>> "adds_shift_imm_*" and "adds_mul_imm_*" do set the condition flags.
>>>>>
>>>>> You appear to be confusing named patterns (used by expand) with
>>>>> recognizers.
>>>>> Anyway, we have
>>>>>
>>>>> (define_insn "*add_<shift>_<mode>"
>>>>>   [(set (match_operand:GPI 0 "register_operand" "=r")
>>>>>         (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" "r")
>>>>>                               (match_operand:QI 2
>>>>>                                "aarch64_shift_imm_<mode>" "n"))
>>>>>                   (match_operand:GPI 3 "register_operand" "r")))]
>>>>>
>>>>> which is a non-flag-setting add with a shifted operand.
>>>>>
>>>>>> Either way I think it is another backend issue, so do you approve
>>>>>> that I commit this patch now?
>>>>>
>>>>> Not yet.  I think there's something fundamental amiss here.
>>>>>
>>>>> BTW, it looks to me as though addptrm3 should have exactly the same
>>>>> operand rules as addm3 (the documentation reads "like addm3"), so a
>>>>> shifted operand shouldn't be supported there either.  If that isn't the
>>>>> case then that should be clearly called out in the documentation.
>>>>>
>>>>> R.
>>>>>
>>>>
>>>> PS.
>>>>
>>>> I presume you are aware of the canonicalization rules for add?  That is,
>>>> for a shift-and-add operation, the shift operand must appear first.  I.e.
>>>>
>>>>   (plus (shift (op, op)), op)
>>>>
>>>> not
>>>>
>>>>   (plus (op, (shift (op, op))))
>>>
>>> Hi Richard,
>>> Thanks for the comments.  I realized that the not-recognized insn
>>> issue is because the original patch built non-canonical expressions.
>>> When reloading an address expression, LRA generates a non-canonical
>>> register-scaled insn, which can't be recognized by the aarch64 backend.
>>>
>>> Here is the updated patch using the canonical-form pattern; it passes
>>> bootstrap and regression testing.  Well, the ivo failure still exists,
>>> but it is analyzed in the original message.
>>>
>>> Is this patch OK?
>>>
>>> As for Jiong's concern about the additional extension instruction, I
>>> think this only applies to atomic load/store instructions.  For
>>> general load/store, AArch64 supports zext/sext in the register-scaled
>>> addressing mode, so the additional instruction can be forward propagated
>>> into the memory reference.  The problem for atomic load/store is that
>>> AArch64 only supports the direct register addressing mode.  After LRA
>>> reloads the address expression out of the memory reference, there is no
>>> combine/fwprop optimizer to merge the instructions.  The problem is that
>>> atomic_store's predicate doesn't match its constraint.  The predicate
>>> used for atomic_store is memory_operand, while all other atomic patterns
>>> use aarch64_sync_memory_operand.  I think this might be a typo.  With
>>> this change, expand will not generate an addressing mode requiring a
>>> reload anymore.  I will test another patch fixing this.
>>>
>>> Thanks,
>>> bin
>>
>> Some comments inline.
>>
>>>>
>>>> R.
>>>>
>>>> aarch64_legitimize_addr-20151128.txt
>>>>
>>>>
>>>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>>>> index 3fe2f0f..5b3e3c4 100644
>>>> --- a/gcc/config/aarch64/aarch64.c
>>>> +++ b/gcc/config/aarch64/aarch64.c
>>>> @@ -4757,13 +4757,65 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x */, machine_mode mode)
>>>>       We try to pick as large a range for the offset as possible to
>>>>       maximize the chance of a CSE.  However, for aligned addresses
>>>>       we limit the range to 4k so that structures with different sized
>>>> -     elements are likely to use the same base.  */
>>>> +     elements are likely to use the same base.  We need to be careful
>>>> +     not split CONST for some forms address expressions, otherwise it
>>
>> not to split a CONST for some forms of address expression,
>>
>>>> +     will generate sub-optimal code.  */
>>>>
>>>>    if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1)))
>>>>      {
>>>>        HOST_WIDE_INT offset = INTVAL (XEXP (x, 1));
>>>>        HOST_WIDE_INT base_offset;
>>>>
>>>> +      if (GET_CODE (XEXP (x, 0)) == PLUS)
>>>> +        {
>>>> +          rtx op0 = XEXP (XEXP (x, 0), 0);
>>>> +          rtx op1 = XEXP (XEXP (x, 0), 1);
>>>> +
>>>> +          /* For addr expression in the form like "r1 + r2 + 0x3ffc".
>>>> +             Since the offset is within range supported by addressing
>>>> +             mode "reg+offset", we don't split the const and legalize
>>>> +             it into below insn and expr sequence:
>>>> +               r3 = r1 + r2;
>>>> +               "r3 + 0x3ffc".  */
>>
>> I think this comment would read better as
>>
>>   /* Address expressions of the form Ra + Rb + CONST.
>>
>>      If CONST is within the range supported by the addressing
>>      mode "reg+offset", do not split CONST and use the
>>      sequence
>>        Rt = Ra + Rb
>>        addr = Rt + CONST.  */
>>
>>>> +          if (REG_P (op0) && REG_P (op1))
>>>> +            {
>>>> +              machine_mode addr_mode = GET_MODE (x);
>>>> +              rtx base = gen_reg_rtx (addr_mode);
>>>> +              rtx addr = plus_constant (addr_mode, base, offset);
>>>> +
>>>> +              if (aarch64_legitimate_address_hook_p (mode, addr, false))
>>>> +                {
>>>> +                  emit_insn (gen_adddi3 (base, op0, op1));
>>>> +                  return addr;
>>>> +                }
>>>> +            }
>>>> +          /* For addr expression in the form like "r1 + r2<<2 + 0x3ffc".
>>>> +             Like above, we don't split the const and legalize it into
>>>> +             below insn and expr sequence:
>>
>> Similarly.
>>
>>>> +               r3 = 0x3ffc;
>>>> +               r4 = r1 + r3;
>>>> +               "r4 + r2<<2".  */
>>
>> Why don't we generate
>>
>>   r3 = r1 + r2 << 2
>>   r4 = r3 + 0x3ffc
>>
>> utilizing the shift-and-add instructions?
>
> All other comments are addressed in the attached new patch.
> As for this question, Wilco also asked it on the internal channel before.
> The main idea is to depend on GIMPLE IVOPTs/SLSR to find CSE
> opportunities for the scaled plus sub-expression.  The scaled index is
> most likely a loop iv, so I would like to split the const plus out of the
> memory reference so that it can be identified/hoisted as a loop invariant.
> This is more important when the base is sfp related.
>

Ah, yes.  The SFP problem.

Since at least two people have queried this, it's clearly non-obvious
enough to require an explanation in a comment.

OK with that change and a suitable changelog entry.

R.

> Thanks,
> bin
>
>
> aarch64_legitimize_addr-20151203.txt
>
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 3fe2f0f..248937e 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -4757,13 +4757,67 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x */, machine_mode mode)
>       We try to pick as large a range for the offset as possible to
>       maximize the chance of a CSE.  However, for aligned addresses
>       we limit the range to 4k so that structures with different sized
> -     elements are likely to use the same base.  */
> +     elements are likely to use the same base.  We need to be careful
> +     not to split a CONST for some forms of address expression, otherwise
> +     it will generate sub-optimal code.  */
>
>    if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1)))
>      {
>        HOST_WIDE_INT offset = INTVAL (XEXP (x, 1));
>        HOST_WIDE_INT base_offset;
>
> +      if (GET_CODE (XEXP (x, 0)) == PLUS)
> +        {
> +          rtx op0 = XEXP (XEXP (x, 0), 0);
> +          rtx op1 = XEXP (XEXP (x, 0), 1);
> +
> +          /* Address expressions of the form Ra + Rb + CONST.
> +
> +             If CONST is within the range supported by the addressing
> +             mode "reg+offset", do not split CONST and use the
> +             sequence
> +               Rt = Ra + Rb;
> +               addr = Rt + CONST.  */
> +          if (REG_P (op0) && REG_P (op1))
> +            {
> +              machine_mode addr_mode = GET_MODE (x);
> +              rtx base = gen_reg_rtx (addr_mode);
> +              rtx addr = plus_constant (addr_mode, base, offset);
> +
> +              if (aarch64_legitimate_address_hook_p (mode, addr, false))
> +                {
> +                  emit_insn (gen_adddi3 (base, op0, op1));
> +                  return addr;
> +                }
> +            }
> +          /* Address expressions of the form Ra + Rb<<SCALE + CONST.
> +
> +             If Reg + Rb<<SCALE is a valid address expression, do not
> +             split CONST and use the sequence
> +               Rc = CONST;
> +               Rt = Ra + Rc;
> +               addr = Rt + Rb<<SCALE.  */
> +          else if (REG_P (op0) || REG_P (op1))
> +            {
> +              machine_mode addr_mode = GET_MODE (x);
> +              rtx base = gen_reg_rtx (addr_mode);
> +
> +              /* Switch to make sure that register is in op0.  */
> +              if (REG_P (op1))
> +                std::swap (op0, op1);
> +
> +              rtx addr = gen_rtx_PLUS (addr_mode, op1, base);
> +
> +              if (aarch64_legitimate_address_hook_p (mode, addr, false))
> +                {
> +                  base = force_operand (plus_constant (addr_mode,
> +                                                       op0, offset),
> +                                        NULL_RTX);
> +                  return gen_rtx_PLUS (addr_mode, op1, base);
> +                }
> +            }
> +        }
> +
>    /* Does it look like we'll need a load/store-pair operation?  */
>    if (GET_MODE_SIZE (mode) > 16
>        || mode == TImode)
>
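
For reference, a minimal sketch of the canonicalization point discussed
above: the shifted operand must be the first operand of the PLUS, or
recognizers such as *add_<shift>_<mode> will not match and the insn is
left unrecognized.  This fragment is illustrative only and not part of
either patch; it assumes GCC's usual internal RTL constructors, and the
helper name and its parameters are invented for the example.

  /* Build a canonical shift-and-add address term.  */
  #include "config.h"
  #include "system.h"
  #include "coretypes.h"
  #include "rtl.h"

  static rtx
  build_scaled_plus (machine_mode mode, rtx rt, rtx rb, int scale_log2)
  {
    /* The scaled index: (ashift rb scale_log2).  */
    rtx shifted = gen_rtx_ASHIFT (mode, rb, GEN_INT (scale_log2));

    /* Canonical form: (plus (ashift rb scale) rt).  The reversed operand
       order, (plus rt (ashift rb scale)), is non-canonical.  */
    return gen_rtx_PLUS (mode, shifted, rt);
  }

With the operands the other way round, recog rejects the insn, which is
the "not-recognized insn" failure described earlier in the thread.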