public inbox for gcc-patches@gcc.gnu.org
* [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address
@ 2015-11-17  9:21 Bin Cheng
  2015-11-17 10:08 ` James Greenhalgh
  0 siblings, 1 reply; 18+ messages in thread
From: Bin Cheng @ 2015-11-17  9:21 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2286 bytes --]

Hi,
GIMPLE IVOPTS needs to call the backend interface to calculate costs for
address expressions like the ones below:
   FORM1: "r73 + r74 + 16380"
   FORM2: "r73 << 2 + r74 + 16380"

These are invalid address expressions on AArch64, so they will be legitimized
by aarch64_legitimize_address.  Below is what we get from that function:

For FORM1, the address expression is legitimized into the following insn
sequence and rtx:
   r84:DI=r73:DI+r74:DI
   r85:DI=r84:DI+0x3000
   r83:DI=r85:DI
   "r83 + 4092"

For FORM2, the address expression is legitimized into the following insn
sequence and rtx:
   r108:DI=r73:DI<<0x2
   r109:DI=r108:DI+r74:DI
   r110:DI=r109:DI+0x3000
   r107:DI=r110:DI
   "r107 + 4092"

So the costs computed are 12/16 respectively.  The high costs prevent IVOPTS
from choosing the right candidates.  Besides cost computation, I also think
the legitimization is bad in terms of code generation.
The root cause in aarch64_legitimize_address can be described by its
comment:
   /* Try to split X+CONST into Y=X+(CONST & ~mask), Y+(CONST&mask),
      where mask is selected by alignment and size of the offset.
      We try to pick as large a range for the offset as possible to
      maximize the chance of a CSE.  However, for aligned addresses
      we limit the range to 4k so that structures with different sized
      elements are likely to use the same base.  */
I think the split of CONST is intended for REG+CONST where the const offset
is not in the range of AArch64's addressing modes.  Unfortunately, it doesn't
explicitly handle/reject "REG+REG+CONST" and "REG+REG<<SCALE+CONST" when the
CONST is in the range of the addressing modes.  As a result, these two cases
fall through this logic, resulting in sub-optimal code.
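
As a rough sketch (my reading of that comment, not the actual aarch64.c
code), the existing path splits an aligned offset at a 4k boundary, which
is exactly what produces the 0x3000/4092 pair shown above:

   /* Hypothetical helpers, for illustration only: 16380 (0x3ffc) splits
      into 0x3000 (folded into a new base) and 4092 (kept in the address).  */
   long split_high (long offset) { return offset & ~0xfffL; }  /* 0x3000 */
   long split_low  (long offset) { return offset &  0xfffL; }  /*  4092  */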

It's obvious that we can instead do the legitimization below:
FORM1:
   r83:DI=r73:DI+r74:DI
   "r83 + 16380"
FORM2:
   r107:DI=0x3ffc
   r106:DI=r74:DI+r107:DI
      REG_EQUAL r74:DI+0x3ffc
   "r106 + r73 << 2"

This patch handles these two cases as described.
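
For reference, here is a minimal C loop of my own (an illustration, not
taken from the original report) showing the kind of code for which IVOPTS
ends up costing these two address forms:

   /* b[i + 16380] gives FORM1 (base + index + 16380); a[i + 4095] gives
      FORM2 (base + (index << 2) + 16380), the elements being 4 bytes.  */
   long
   sum (int *a, unsigned char *b, long n)
   {
     long s = 0;
     for (long i = 0; i < n; i++)
       s += a[i + 4095] + b[i + 16380];
     return s;
   }
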
Bootstrapped & tested on AArch64 along with the other patch.  Is it OK?

2015-11-04  Bin Cheng  <bin.cheng@arm.com>
	    Jiong Wang  <jiong.wang@arm.com>

	* config/aarch64/aarch64.c (aarch64_legitimize_address): Handle
	address expressions like REG+REG+CONST and REG+NON_REG+CONST.

[-- Attachment #2: aarch64_legitimize_addr-20151104.txt --]
[-- Type: text/plain, Size: 1944 bytes --]

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 5c8604f..47875ac 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4710,6 +4710,51 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x  */, machine_mode mode)
     {
       HOST_WIDE_INT offset = INTVAL (XEXP (x, 1));
       HOST_WIDE_INT base_offset;
+      rtx op0 = XEXP (x,0);
+
+      if (GET_CODE (op0) == PLUS)
+	{
+	  rtx op0_ = XEXP (op0, 0);
+	  rtx op1_ = XEXP (op0, 1);
+
+	  /* RTX pattern in the form of (PLUS (PLUS REG, REG), CONST) will
+	     reach here, the 'CONST' may be valid in which case we should
+	     not split.  */
+	  if (REG_P (op0_) && REG_P (op1_))
+	    {
+	      machine_mode addr_mode = GET_MODE (op0);
+	      rtx addr = gen_reg_rtx (addr_mode);
+
+	      rtx ret = plus_constant (addr_mode, addr, offset);
+	      if (aarch64_legitimate_address_hook_p (mode, ret, false))
+		{
+		  emit_insn (gen_adddi3 (addr, op0_, op1_));
+		  return ret;
+		}
+	    }
+	  /* RTX pattern in the form of (PLUS (PLUS REG, NON_REG), CONST)
+	     will reach here.  If (PLUS REG, NON_REG) is valid addr expr,
+	     we split it into Y=REG+CONST, Y+NON_REG.  */
+	  else if (REG_P (op0_) || REG_P (op1_))
+	    {
+	      machine_mode addr_mode = GET_MODE (op0);
+	      rtx addr = gen_reg_rtx (addr_mode);
+
+	      /* Switch to make sure that register is in op0_.  */
+	      if (REG_P (op1_))
+		std::swap (op0_, op1_);
+
+	      rtx ret = gen_rtx_fmt_ee (PLUS, addr_mode, addr, op1_);
+	      if (aarch64_legitimate_address_hook_p (mode, ret, false))
+		{
+		  addr = force_operand (plus_constant (addr_mode,
+						       op0_, offset),
+					NULL_RTX);
+		  ret = gen_rtx_fmt_ee (PLUS, addr_mode, addr, op1_);
+		  return ret;
+		}
+	    }
+	}
 
       /* Does it look like we'll need a load/store-pair operation?  */
       if (GET_MODE_SIZE (mode) > 16


* Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address
  2015-11-17  9:21 [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address Bin Cheng
@ 2015-11-17 10:08 ` James Greenhalgh
  2015-11-19  2:32   ` Bin.Cheng
  0 siblings, 1 reply; 18+ messages in thread
From: James Greenhalgh @ 2015-11-17 10:08 UTC (permalink / raw)
  To: Bin Cheng; +Cc: gcc-patches

On Tue, Nov 17, 2015 at 05:21:01PM +0800, Bin Cheng wrote:
> Hi,
> GIMPLE IVO needs to call backend interface to calculate costs for addr
> expressions like below:
>    FORM1: "r73 + r74 + 16380"
>    FORM2: "r73 << 2 + r74 + 16380"
> 
> They are invalid address expression on AArch64, so will be legitimized by
> aarch64_legitimize_address.  Below are what we got from that function:
> 
> For FORM1, the address expression is legitimized into below insn sequence
> and rtx:
>    r84:DI=r73:DI+r74:DI
>    r85:DI=r84:DI+0x3000
>    r83:DI=r85:DI
>    "r83 + 4092"
> 
> For FORM2, the address expression is legitimized into below insn sequence
> and rtx:
>    r108:DI=r73:DI<<0x2
>    r109:DI=r108:DI+r74:DI
>    r110:DI=r109:DI+0x3000
>    r107:DI=r110:DI
>    "r107 + 4092"
> 
> So the costs computed are 12/16 respectively.  The high cost prevents IVO
> from choosing right candidates.  Besides cost computation, I also think the
> legitmization is bad in terms of code generation.
> The root cause in aarch64_legitimize_address can be described by it's
> comment:
>    /* Try to split X+CONST into Y=X+(CONST & ~mask), Y+(CONST&mask),
>       where mask is selected by alignment and size of the offset.
>       We try to pick as large a range for the offset as possible to
>       maximize the chance of a CSE.  However, for aligned addresses
>       we limit the range to 4k so that structures with different sized
>       elements are likely to use the same base.  */
> I think the split of CONST is intended for REG+CONST where the const offset
> is not in the range of AArch64's addressing modes.  Unfortunately, it
> doesn't explicitly handle/reject "REG+REG+CONST" and "REG+REG<<SCALE+CONST"
> when the CONST are in the range of addressing modes.  As a result, these two
> cases fallthrough this logic, resulting in sub-optimal results.
> 
> It's obvious we can do below legitimization:
> FORM1:
>    r83:DI=r73:DI+r74:DI
>    "r83 + 16380"
> FORM2:
>    r107:DI=0x3ffc
>    r106:DI=r74:DI+r107:DI
>       REG_EQUAL r74:DI+0x3ffc
>    "r106 + r73 << 2"
> 
> This patch handles these two cases as described.

Thanks for the description, it made the patch very easy to review. I only
have a style comment.

> Bootstrap & test on AArch64 along with other patch.  Is it OK?
> 
> 2015-11-04  Bin Cheng  <bin.cheng@arm.com>
> 	    Jiong Wang  <jiong.wang@arm.com>
> 
> 	* config/aarch64/aarch64.c (aarch64_legitimize_address): Handle
> 	address expressions like REG+REG+CONST and REG+NON_REG+CONST.

> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 5c8604f..47875ac 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -4710,6 +4710,51 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x  */, machine_mode mode)
>      {
>        HOST_WIDE_INT offset = INTVAL (XEXP (x, 1));
>        HOST_WIDE_INT base_offset;
> +      rtx op0 = XEXP (x,0);
> +
> +      if (GET_CODE (op0) == PLUS)
> +	{
> +	  rtx op0_ = XEXP (op0, 0);
> +	  rtx op1_ = XEXP (op0, 1);

I don't see this trailing _ on a variable name in many places in the source
tree (mostly in the Go frontend), and certainly not in the aarch64 backend.
Can we pick a different name for op0_ and op1_?

> +
> +	  /* RTX pattern in the form of (PLUS (PLUS REG, REG), CONST) will
> +	     reach here, the 'CONST' may be valid in which case we should
> +	     not split.  */
> +	  if (REG_P (op0_) && REG_P (op1_))
> +	    {
> +	      machine_mode addr_mode = GET_MODE (op0);
> +	      rtx addr = gen_reg_rtx (addr_mode);
> +
> +	      rtx ret = plus_constant (addr_mode, addr, offset);
> +	      if (aarch64_legitimate_address_hook_p (mode, ret, false))
> +		{
> +		  emit_insn (gen_adddi3 (addr, op0_, op1_));
> +		  return ret;
> +		}
> +	    }
> +	  /* RTX pattern in the form of (PLUS (PLUS REG, NON_REG), CONST)
> +	     will reach here.  If (PLUS REG, NON_REG) is valid addr expr,
> +	     we split it into Y=REG+CONST, Y+NON_REG.  */
> +	  else if (REG_P (op0_) || REG_P (op1_))
> +	    {
> +	      machine_mode addr_mode = GET_MODE (op0);
> +	      rtx addr = gen_reg_rtx (addr_mode);
> +
> +	      /* Switch to make sure that register is in op0_.  */
> +	      if (REG_P (op1_))
> +		std::swap (op0_, op1_);
> +
> +	      rtx ret = gen_rtx_fmt_ee (PLUS, addr_mode, addr, op1_);
> +	      if (aarch64_legitimate_address_hook_p (mode, ret, false))
> +		{
> +		  addr = force_operand (plus_constant (addr_mode,
> +						       op0_, offset),
> +					NULL_RTX);
> +		  ret = gen_rtx_fmt_ee (PLUS, addr_mode, addr, op1_);
> +		  return ret;
> +		}

The logic here is a bit hairy to follow, you construct a PLUS RTX to check
aarch64_legitimate_address_hook_p, then construct a different PLUS RTX
to use as the return value. This can probably be clarified by choosing a
name other than ret for the temporary address expression you construct.

It would also be good to take some of your detailed description and write
that here. Certainly I found the explicit examples in the cover letter
easier to follow than:

> +	  /* RTX pattern in the form of (PLUS (PLUS REG, NON_REG), CONST)
> +	     will reach here.  If (PLUS REG, NON_REG) is valid addr expr,
> +	     we split it into Y=REG+CONST, Y+NON_REG.  */

Otherwise this patch is OK.

Thanks,
James


* Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address
  2015-11-17 10:08 ` James Greenhalgh
@ 2015-11-19  2:32   ` Bin.Cheng
  2015-11-20  8:31     ` Bin.Cheng
  0 siblings, 1 reply; 18+ messages in thread
From: Bin.Cheng @ 2015-11-19  2:32 UTC (permalink / raw)
  To: James Greenhalgh; +Cc: Bin Cheng, gcc-patches List

[-- Attachment #1: Type: text/plain, Size: 5849 bytes --]

On Tue, Nov 17, 2015 at 6:08 PM, James Greenhalgh
<james.greenhalgh@arm.com> wrote:
> On Tue, Nov 17, 2015 at 05:21:01PM +0800, Bin Cheng wrote:
>> Hi,
>> GIMPLE IVO needs to call backend interface to calculate costs for addr
>> expressions like below:
>>    FORM1: "r73 + r74 + 16380"
>>    FORM2: "r73 << 2 + r74 + 16380"
>>
>> They are invalid address expression on AArch64, so will be legitimized by
>> aarch64_legitimize_address.  Below are what we got from that function:
>>
>> For FORM1, the address expression is legitimized into below insn sequence
>> and rtx:
>>    r84:DI=r73:DI+r74:DI
>>    r85:DI=r84:DI+0x3000
>>    r83:DI=r85:DI
>>    "r83 + 4092"
>>
>> For FORM2, the address expression is legitimized into below insn sequence
>> and rtx:
>>    r108:DI=r73:DI<<0x2
>>    r109:DI=r108:DI+r74:DI
>>    r110:DI=r109:DI+0x3000
>>    r107:DI=r110:DI
>>    "r107 + 4092"
>>
>> So the costs computed are 12/16 respectively.  The high cost prevents IVO
>> from choosing right candidates.  Besides cost computation, I also think the
>> legitmization is bad in terms of code generation.
>> The root cause in aarch64_legitimize_address can be described by it's
>> comment:
>>    /* Try to split X+CONST into Y=X+(CONST & ~mask), Y+(CONST&mask),
>>       where mask is selected by alignment and size of the offset.
>>       We try to pick as large a range for the offset as possible to
>>       maximize the chance of a CSE.  However, for aligned addresses
>>       we limit the range to 4k so that structures with different sized
>>       elements are likely to use the same base.  */
>> I think the split of CONST is intended for REG+CONST where the const offset
>> is not in the range of AArch64's addressing modes.  Unfortunately, it
>> doesn't explicitly handle/reject "REG+REG+CONST" and "REG+REG<<SCALE+CONST"
>> when the CONST are in the range of addressing modes.  As a result, these two
>> cases fallthrough this logic, resulting in sub-optimal results.
>>
>> It's obvious we can do below legitimization:
>> FORM1:
>>    r83:DI=r73:DI+r74:DI
>>    "r83 + 16380"
>> FORM2:
>>    r107:DI=0x3ffc
>>    r106:DI=r74:DI+r107:DI
>>       REG_EQUAL r74:DI+0x3ffc
>>    "r106 + r73 << 2"
>>
>> This patch handles these two cases as described.
>
> Thanks for the description, it made the patch very easy to review. I only
> have a style comment.
>
>> Bootstrap & test on AArch64 along with other patch.  Is it OK?
>>
>> 2015-11-04  Bin Cheng  <bin.cheng@arm.com>
>>           Jiong Wang  <jiong.wang@arm.com>
>>
>>       * config/aarch64/aarch64.c (aarch64_legitimize_address): Handle
>>       address expressions like REG+REG+CONST and REG+NON_REG+CONST.
>
>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index 5c8604f..47875ac 100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -4710,6 +4710,51 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x  */, machine_mode mode)
>>      {
>>        HOST_WIDE_INT offset = INTVAL (XEXP (x, 1));
>>        HOST_WIDE_INT base_offset;
>> +      rtx op0 = XEXP (x,0);
>> +
>> +      if (GET_CODE (op0) == PLUS)
>> +     {
>> +       rtx op0_ = XEXP (op0, 0);
>> +       rtx op1_ = XEXP (op0, 1);
>
> I don't see this trailing _ on a variable name in many places in the source
> tree (mostly in the Go frontend), and certainly not in the aarch64 backend.
> Can we pick a different name for op0_ and op1_?
>
>> +
>> +       /* RTX pattern in the form of (PLUS (PLUS REG, REG), CONST) will
>> +          reach here, the 'CONST' may be valid in which case we should
>> +          not split.  */
>> +       if (REG_P (op0_) && REG_P (op1_))
>> +         {
>> +           machine_mode addr_mode = GET_MODE (op0);
>> +           rtx addr = gen_reg_rtx (addr_mode);
>> +
>> +           rtx ret = plus_constant (addr_mode, addr, offset);
>> +           if (aarch64_legitimate_address_hook_p (mode, ret, false))
>> +             {
>> +               emit_insn (gen_adddi3 (addr, op0_, op1_));
>> +               return ret;
>> +             }
>> +         }
>> +       /* RTX pattern in the form of (PLUS (PLUS REG, NON_REG), CONST)
>> +          will reach here.  If (PLUS REG, NON_REG) is valid addr expr,
>> +          we split it into Y=REG+CONST, Y+NON_REG.  */
>> +       else if (REG_P (op0_) || REG_P (op1_))
>> +         {
>> +           machine_mode addr_mode = GET_MODE (op0);
>> +           rtx addr = gen_reg_rtx (addr_mode);
>> +
>> +           /* Switch to make sure that register is in op0_.  */
>> +           if (REG_P (op1_))
>> +             std::swap (op0_, op1_);
>> +
>> +           rtx ret = gen_rtx_fmt_ee (PLUS, addr_mode, addr, op1_);
>> +           if (aarch64_legitimate_address_hook_p (mode, ret, false))
>> +             {
>> +               addr = force_operand (plus_constant (addr_mode,
>> +                                                    op0_, offset),
>> +                                     NULL_RTX);
>> +               ret = gen_rtx_fmt_ee (PLUS, addr_mode, addr, op1_);
>> +               return ret;
>> +             }
>
> The logic here is a bit hairy to follow, you construct a PLUS RTX to check
> aarch64_legitimate_address_hook_p, then construct a different PLUS RTX
> to use as the return value. This can probably be clarified by choosing a
> name other than ret for the temporary address expression you construct.
>
> It would also be good to take some of your detailed description and write
> that here. Certainly I found the explicit examples in the cover letter
> easier to follow than:
>
>> +       /* RTX pattern in the form of (PLUS (PLUS REG, NON_REG), CONST)
>> +          will reach here.  If (PLUS REG, NON_REG) is valid addr expr,
>> +          we split it into Y=REG+CONST, Y+NON_REG.  */
>
> Otherwise this patch is OK.
Thanks for reviewing, here is the updated patch.

Thanks,
bin

[-- Attachment #2: aarch64_legitimize_addr-20151105.txt --]
[-- Type: text/plain, Size: 2564 bytes --]

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 5c8604f..64bc6a4 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4704,13 +4704,65 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x  */, machine_mode mode)
      We try to pick as large a range for the offset as possible to
      maximize the chance of a CSE.  However, for aligned addresses
      we limit the range to 4k so that structures with different sized
-     elements are likely to use the same base.  */
+     elements are likely to use the same base.  We need to be careful
+     not to split the CONST for some forms of address expression,
+     otherwise it will generate sub-optimal code.  */
 
   if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1)))
     {
       HOST_WIDE_INT offset = INTVAL (XEXP (x, 1));
       HOST_WIDE_INT base_offset;
 
+      if (GET_CODE (XEXP (x, 0)) == PLUS)
+	{
+	  rtx op0 = XEXP (XEXP (x, 0), 0);
+	  rtx op1 = XEXP (XEXP (x, 0), 1);
+
+	  /* For an addr expression of the form "r1 + r2 + 0x3ffc":
+	     since the offset is within the range supported by the
+	     "reg+offset" addressing mode, we don't split the const;
+	     we legitimize it into the insn and expr sequence below:
+	       r3 = r1 + r2;
+	       "r3 + 0x3ffc".  */
+	  if (REG_P (op0) && REG_P (op1))
+	    {
+	      machine_mode addr_mode = GET_MODE (x);
+	      rtx base = gen_reg_rtx (addr_mode);
+	      rtx addr = plus_constant (addr_mode, base, offset);
+
+	      if (aarch64_legitimate_address_hook_p (mode, addr, false))
+		{
+		  emit_insn (gen_adddi3 (base, op0, op1));
+		  return addr;
+		}
+	    }
+	  /* For an addr expression of the form "r1 + r2<<2 + 0x3ffc":
+	     like the case above, we don't split the const; we
+	     legitimize it into the insn and expr sequence below:
+	       r3 = 0x3ffc;
+	       r4 = r1 + r3;
+	       "r4 + r2<<2".  */
+	  else if (REG_P (op0) || REG_P (op1))
+	    {
+	      machine_mode addr_mode = GET_MODE (x);
+	      rtx base = gen_reg_rtx (addr_mode);
+
+	      /* Switch to make sure that register is in op0.  */
+	      if (REG_P (op1))
+		std::swap (op0, op1);
+
+	      rtx addr = gen_rtx_fmt_ee (PLUS, addr_mode, base, op1);
+
+	      if (aarch64_legitimate_address_hook_p (mode, addr, false))
+		{
+		  base = force_operand (plus_constant (addr_mode,
+						       op0, offset),
+					NULL_RTX);
+		  return gen_rtx_fmt_ee (PLUS, addr_mode, base, op1);
+		}
+	    }
+	}
+
       /* Does it look like we'll need a load/store-pair operation?  */
       if (GET_MODE_SIZE (mode) > 16
 	  || mode == TImode)


* Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address
  2015-11-19  2:32   ` Bin.Cheng
@ 2015-11-20  8:31     ` Bin.Cheng
  2015-11-20 17:39       ` Richard Earnshaw
  0 siblings, 1 reply; 18+ messages in thread
From: Bin.Cheng @ 2015-11-20  8:31 UTC (permalink / raw)
  To: James Greenhalgh; +Cc: Bin Cheng, gcc-patches List

On Thu, Nov 19, 2015 at 10:32 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Tue, Nov 17, 2015 at 6:08 PM, James Greenhalgh
> <james.greenhalgh@arm.com> wrote:
>> On Tue, Nov 17, 2015 at 05:21:01PM +0800, Bin Cheng wrote:
>>> Hi,
>>> GIMPLE IVO needs to call backend interface to calculate costs for addr
>>> expressions like below:
>>>    FORM1: "r73 + r74 + 16380"
>>>    FORM2: "r73 << 2 + r74 + 16380"
>>>
>>> They are invalid address expression on AArch64, so will be legitimized by
>>> aarch64_legitimize_address.  Below are what we got from that function:
>>>
>>> For FORM1, the address expression is legitimized into below insn sequence
>>> and rtx:
>>>    r84:DI=r73:DI+r74:DI
>>>    r85:DI=r84:DI+0x3000
>>>    r83:DI=r85:DI
>>>    "r83 + 4092"
>>>
>>> For FORM2, the address expression is legitimized into below insn sequence
>>> and rtx:
>>>    r108:DI=r73:DI<<0x2
>>>    r109:DI=r108:DI+r74:DI
>>>    r110:DI=r109:DI+0x3000
>>>    r107:DI=r110:DI
>>>    "r107 + 4092"
>>>
>>> So the costs computed are 12/16 respectively.  The high cost prevents IVO
>>> from choosing right candidates.  Besides cost computation, I also think the
>>> legitmization is bad in terms of code generation.
>>> The root cause in aarch64_legitimize_address can be described by it's
>>> comment:
>>>    /* Try to split X+CONST into Y=X+(CONST & ~mask), Y+(CONST&mask),
>>>       where mask is selected by alignment and size of the offset.
>>>       We try to pick as large a range for the offset as possible to
>>>       maximize the chance of a CSE.  However, for aligned addresses
>>>       we limit the range to 4k so that structures with different sized
>>>       elements are likely to use the same base.  */
>>> I think the split of CONST is intended for REG+CONST where the const offset
>>> is not in the range of AArch64's addressing modes.  Unfortunately, it
>>> doesn't explicitly handle/reject "REG+REG+CONST" and "REG+REG<<SCALE+CONST"
>>> when the CONST are in the range of addressing modes.  As a result, these two
>>> cases fallthrough this logic, resulting in sub-optimal results.
>>>
>>> It's obvious we can do below legitimization:
>>> FORM1:
>>>    r83:DI=r73:DI+r74:DI
>>>    "r83 + 16380"
>>> FORM2:
>>>    r107:DI=0x3ffc
>>>    r106:DI=r74:DI+r107:DI
>>>       REG_EQUAL r74:DI+0x3ffc
>>>    "r106 + r73 << 2"
>>>
>>> This patch handles these two cases as described.
>>
>> Thanks for the description, it made the patch very easy to review. I only
>> have a style comment.
>>
>>> Bootstrap & test on AArch64 along with other patch.  Is it OK?
>>>
>>> 2015-11-04  Bin Cheng  <bin.cheng@arm.com>
>>>           Jiong Wang  <jiong.wang@arm.com>
>>>
>>>       * config/aarch64/aarch64.c (aarch64_legitimize_address): Handle
>>>       address expressions like REG+REG+CONST and REG+NON_REG+CONST.
>>
>>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>>> index 5c8604f..47875ac 100644
>>> --- a/gcc/config/aarch64/aarch64.c
>>> +++ b/gcc/config/aarch64/aarch64.c
>>> @@ -4710,6 +4710,51 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x  */, machine_mode mode)
>>>      {
>>>        HOST_WIDE_INT offset = INTVAL (XEXP (x, 1));
>>>        HOST_WIDE_INT base_offset;
>>> +      rtx op0 = XEXP (x,0);
>>> +
>>> +      if (GET_CODE (op0) == PLUS)
>>> +     {
>>> +       rtx op0_ = XEXP (op0, 0);
>>> +       rtx op1_ = XEXP (op0, 1);
>>
>> I don't see this trailing _ on a variable name in many places in the source
>> tree (mostly in the Go frontend), and certainly not in the aarch64 backend.
>> Can we pick a different name for op0_ and op1_?
>>
>>> +
>>> +       /* RTX pattern in the form of (PLUS (PLUS REG, REG), CONST) will
>>> +          reach here, the 'CONST' may be valid in which case we should
>>> +          not split.  */
>>> +       if (REG_P (op0_) && REG_P (op1_))
>>> +         {
>>> +           machine_mode addr_mode = GET_MODE (op0);
>>> +           rtx addr = gen_reg_rtx (addr_mode);
>>> +
>>> +           rtx ret = plus_constant (addr_mode, addr, offset);
>>> +           if (aarch64_legitimate_address_hook_p (mode, ret, false))
>>> +             {
>>> +               emit_insn (gen_adddi3 (addr, op0_, op1_));
>>> +               return ret;
>>> +             }
>>> +         }
>>> +       /* RTX pattern in the form of (PLUS (PLUS REG, NON_REG), CONST)
>>> +          will reach here.  If (PLUS REG, NON_REG) is valid addr expr,
>>> +          we split it into Y=REG+CONST, Y+NON_REG.  */
>>> +       else if (REG_P (op0_) || REG_P (op1_))
>>> +         {
>>> +           machine_mode addr_mode = GET_MODE (op0);
>>> +           rtx addr = gen_reg_rtx (addr_mode);
>>> +
>>> +           /* Switch to make sure that register is in op0_.  */
>>> +           if (REG_P (op1_))
>>> +             std::swap (op0_, op1_);
>>> +
>>> +           rtx ret = gen_rtx_fmt_ee (PLUS, addr_mode, addr, op1_);
>>> +           if (aarch64_legitimate_address_hook_p (mode, ret, false))
>>> +             {
>>> +               addr = force_operand (plus_constant (addr_mode,
>>> +                                                    op0_, offset),
>>> +                                     NULL_RTX);
>>> +               ret = gen_rtx_fmt_ee (PLUS, addr_mode, addr, op1_);
>>> +               return ret;
>>> +             }
>>
>> The logic here is a bit hairy to follow, you construct a PLUS RTX to check
>> aarch64_legitimate_address_hook_p, then construct a different PLUS RTX
>> to use as the return value. This can probably be clarified by choosing a
>> name other than ret for the temporary address expression you construct.
>>
>> It would also be good to take some of your detailed description and write
>> that here. Certainly I found the explicit examples in the cover letter
>> easier to follow than:
>>
>>> +       /* RTX pattern in the form of (PLUS (PLUS REG, NON_REG), CONST)
>>> +          will reach here.  If (PLUS REG, NON_REG) is valid addr expr,
>>> +          we split it into Y=REG+CONST, Y+NON_REG.  */
>>
>> Otherwise this patch is OK.
> Thanks for reviewing, here is the updated patch.

Hmm, I retested the patch on aarch64 and found it caused two
additional failures.

FAIL: gcc.target/aarch64/ldp_vec_64_1.c scan-assembler ldp\td[0-9]+, d[0-9]
This is caused by a different IVOPTS decision resulting from this patch's
cost change.  As far as IVOPTS can tell, the new decision is better than
the old one, and so is the IVOPTed dump.  I can fix this by changing how
this patch legitimizes the address "r1 + r2 + offset".  In this patch it is
legitimized into "r3 = r1 + r2; [r3 + offset]"; we could change that into
"r3 = offset; r4 = r1 + r3; [r4 + r2]".  The new form is better because r4
is possibly a loop invariant, but its cost is higher.  I tend to keep this
patch the way it is since I don't know how the changed cost affects
performance data.  We may need to live with this failure for a while.
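
To illustrate the trade-off with a hypothetical example of mine (not the
failing testcase): when the base is loop-invariant and only the index part
changes, the second form lets "base + offset" be hoisted out of the loop,
at the price of the extra constant move:

   /* Sketch only: here "a + 16376" is invariant and can be hoisted, while
      "a + i*8" would have to be recomputed on every iteration.  */
   void
   scale (long *a, long n)
   {
     for (long i = 0; i < n; i++)
       a[i + 2047] *= 2;   /* address is a + i*8 + 16376 */
   }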

FAIL: gcc.dg/atomic/stdatomic-vm.c   -O1  (internal compiler error)
This, I think, is a potential bug in the aarch64 backend.  With this patch
GCC can generate "[r1 + r2 << 3] = unspec..." for this test, and LRA then
needs to reload the address expression by computing "r1+r2<<3" outside of
the memory reference.  In function emit_add3_insn, it first checks
have_addptr3_insn/gen_addptr3_insn, then the add3 pattern.  The code is as
below:

  if (have_addptr3_insn (x, y, z))
    {
      rtx_insn *insn = gen_addptr3_insn (x, y, z);

      /* If the target provides an "addptr" pattern it hopefully does
         for a reason.  So falling back to the normal add would be
         a bug.  */
      lra_assert (insn != NULL_RTX);
      emit_insn (insn);
      return insn;
    }

  rtx_insn* insn
    = emit_insn (gen_rtx_SET (x, gen_rtx_PLUS (GET_MODE (y), y, z)));
  if (recog_memoized (insn) < 0)
    {
      delete_insns_since (last);
      insn = NULL;
    }
  return insn;

The aarch64 problem is that we don't define an addptr3 pattern, and we
don't have a direct insn pattern describing "x + y << z".  According to
the GCC internals documentation:

‘addptrm3’
Like addm3 but is guaranteed to only be used for address calculations.
The expanded code is not allowed to clobber the condition code. It
only needs to be defined if addm3 sets the condition code. If adds
used for address calculations and normal adds are not compatible it is
required to expand a distinct pattern (e.g. using an unspec). The
pattern is used by LRA to emit address calculations. addm3 is used if
addptrm3 is not defined.

It seems to me that aarch64 needs to support "add (shifted register)" in
the add3 pattern, or needs to support the addptr3 pattern.  I tend to apply
this patch to expose the bug, but I will let the aarch64 experts comment
before doing that.

Thanks,
bin


* Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address
  2015-11-20  8:31     ` Bin.Cheng
@ 2015-11-20 17:39       ` Richard Earnshaw
  2015-11-24  3:23         ` Bin.Cheng
  0 siblings, 1 reply; 18+ messages in thread
From: Richard Earnshaw @ 2015-11-20 17:39 UTC (permalink / raw)
  To: Bin.Cheng, James Greenhalgh; +Cc: Bin Cheng, gcc-patches List

On 20/11/15 08:31, Bin.Cheng wrote:
> On Thu, Nov 19, 2015 at 10:32 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>> On Tue, Nov 17, 2015 at 6:08 PM, James Greenhalgh
>> <james.greenhalgh@arm.com> wrote:
>>> On Tue, Nov 17, 2015 at 05:21:01PM +0800, Bin Cheng wrote:
>>>> Hi,
>>>> GIMPLE IVO needs to call backend interface to calculate costs for addr
>>>> expressions like below:
>>>>    FORM1: "r73 + r74 + 16380"
>>>>    FORM2: "r73 << 2 + r74 + 16380"
>>>>
>>>> They are invalid address expression on AArch64, so will be legitimized by
>>>> aarch64_legitimize_address.  Below are what we got from that function:
>>>>
>>>> For FORM1, the address expression is legitimized into below insn sequence
>>>> and rtx:
>>>>    r84:DI=r73:DI+r74:DI
>>>>    r85:DI=r84:DI+0x3000
>>>>    r83:DI=r85:DI
>>>>    "r83 + 4092"
>>>>
>>>> For FORM2, the address expression is legitimized into below insn sequence
>>>> and rtx:
>>>>    r108:DI=r73:DI<<0x2
>>>>    r109:DI=r108:DI+r74:DI
>>>>    r110:DI=r109:DI+0x3000
>>>>    r107:DI=r110:DI
>>>>    "r107 + 4092"
>>>>
>>>> So the costs computed are 12/16 respectively.  The high cost prevents IVO
>>>> from choosing right candidates.  Besides cost computation, I also think the
>>>> legitmization is bad in terms of code generation.
>>>> The root cause in aarch64_legitimize_address can be described by it's
>>>> comment:
>>>>    /* Try to split X+CONST into Y=X+(CONST & ~mask), Y+(CONST&mask),
>>>>       where mask is selected by alignment and size of the offset.
>>>>       We try to pick as large a range for the offset as possible to
>>>>       maximize the chance of a CSE.  However, for aligned addresses
>>>>       we limit the range to 4k so that structures with different sized
>>>>       elements are likely to use the same base.  */
>>>> I think the split of CONST is intended for REG+CONST where the const offset
>>>> is not in the range of AArch64's addressing modes.  Unfortunately, it
>>>> doesn't explicitly handle/reject "REG+REG+CONST" and "REG+REG<<SCALE+CONST"
>>>> when the CONST are in the range of addressing modes.  As a result, these two
>>>> cases fallthrough this logic, resulting in sub-optimal results.
>>>>
>>>> It's obvious we can do below legitimization:
>>>> FORM1:
>>>>    r83:DI=r73:DI+r74:DI
>>>>    "r83 + 16380"
>>>> FORM2:
>>>>    r107:DI=0x3ffc
>>>>    r106:DI=r74:DI+r107:DI
>>>>       REG_EQUAL r74:DI+0x3ffc
>>>>    "r106 + r73 << 2"
>>>>
>>>> This patch handles these two cases as described.
>>>
>>> Thanks for the description, it made the patch very easy to review. I only
>>> have a style comment.
>>>
>>>> Bootstrap & test on AArch64 along with other patch.  Is it OK?
>>>>
>>>> 2015-11-04  Bin Cheng  <bin.cheng@arm.com>
>>>>           Jiong Wang  <jiong.wang@arm.com>
>>>>
>>>>       * config/aarch64/aarch64.c (aarch64_legitimize_address): Handle
>>>>       address expressions like REG+REG+CONST and REG+NON_REG+CONST.
>>>
>>>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>>>> index 5c8604f..47875ac 100644
>>>> --- a/gcc/config/aarch64/aarch64.c
>>>> +++ b/gcc/config/aarch64/aarch64.c
>>>> @@ -4710,6 +4710,51 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x  */, machine_mode mode)
>>>>      {
>>>>        HOST_WIDE_INT offset = INTVAL (XEXP (x, 1));
>>>>        HOST_WIDE_INT base_offset;
>>>> +      rtx op0 = XEXP (x,0);
>>>> +
>>>> +      if (GET_CODE (op0) == PLUS)
>>>> +     {
>>>> +       rtx op0_ = XEXP (op0, 0);
>>>> +       rtx op1_ = XEXP (op0, 1);
>>>
>>> I don't see this trailing _ on a variable name in many places in the source
>>> tree (mostly in the Go frontend), and certainly not in the aarch64 backend.
>>> Can we pick a different name for op0_ and op1_?
>>>
>>>> +
>>>> +       /* RTX pattern in the form of (PLUS (PLUS REG, REG), CONST) will
>>>> +          reach here, the 'CONST' may be valid in which case we should
>>>> +          not split.  */
>>>> +       if (REG_P (op0_) && REG_P (op1_))
>>>> +         {
>>>> +           machine_mode addr_mode = GET_MODE (op0);
>>>> +           rtx addr = gen_reg_rtx (addr_mode);
>>>> +
>>>> +           rtx ret = plus_constant (addr_mode, addr, offset);
>>>> +           if (aarch64_legitimate_address_hook_p (mode, ret, false))
>>>> +             {
>>>> +               emit_insn (gen_adddi3 (addr, op0_, op1_));
>>>> +               return ret;
>>>> +             }
>>>> +         }
>>>> +       /* RTX pattern in the form of (PLUS (PLUS REG, NON_REG), CONST)
>>>> +          will reach here.  If (PLUS REG, NON_REG) is valid addr expr,
>>>> +          we split it into Y=REG+CONST, Y+NON_REG.  */
>>>> +       else if (REG_P (op0_) || REG_P (op1_))
>>>> +         {
>>>> +           machine_mode addr_mode = GET_MODE (op0);
>>>> +           rtx addr = gen_reg_rtx (addr_mode);
>>>> +
>>>> +           /* Switch to make sure that register is in op0_.  */
>>>> +           if (REG_P (op1_))
>>>> +             std::swap (op0_, op1_);
>>>> +
>>>> +           rtx ret = gen_rtx_fmt_ee (PLUS, addr_mode, addr, op1_);
>>>> +           if (aarch64_legitimate_address_hook_p (mode, ret, false))
>>>> +             {
>>>> +               addr = force_operand (plus_constant (addr_mode,
>>>> +                                                    op0_, offset),
>>>> +                                     NULL_RTX);
>>>> +               ret = gen_rtx_fmt_ee (PLUS, addr_mode, addr, op1_);
>>>> +               return ret;
>>>> +             }
>>>
>>> The logic here is a bit hairy to follow, you construct a PLUS RTX to check
>>> aarch64_legitimate_address_hook_p, then construct a different PLUS RTX
>>> to use as the return value. This can probably be clarified by choosing a
>>> name other than ret for the temporary address expression you construct.
>>>
>>> It would also be good to take some of your detailed description and write
>>> that here. Certainly I found the explicit examples in the cover letter
>>> easier to follow than:
>>>
>>>> +       /* RTX pattern in the form of (PLUS (PLUS REG, NON_REG), CONST)
>>>> +          will reach here.  If (PLUS REG, NON_REG) is valid addr expr,
>>>> +          we split it into Y=REG+CONST, Y+NON_REG.  */
>>>
>>> Otherwise this patch is OK.
>> Thanks for reviewing, here is the updated patch.
> 
> Hmm, I retested the patch on aarch64 and found it caused two
> additional failures.
> 
> FAIL: gcc.target/aarch64/ldp_vec_64_1.c scan-assembler ldp\td[0-9]+, d[0-9]
> This is caused by different ivopt decision because of this patch's
> cost change.  As far as IVO can tell, the new decision is better than
> the old one.  So is the IVOPTed dump.  I can fix this by changing how
> this patch legitimize address "r1 + r2 + offset".  In this patch, it's
> legitimized into "r3 = r1 + r2; [r3 + offset]"; we could change it
> into "r3 = offset; r4 = r1 + r3; [r4 + r2]".  This new form is better
> because possibly r4 is a loop invariant, but the cost is higher.  I
> tend to keep this patch the way it is since I don't know how the
> changed cost affects performance data.  We may need to live with this
> failure for a while.
> 
> FAIL: gcc.dg/atomic/stdatomic-vm.c   -O1  (internal compiler error)
> This I think is a potential bug in aarch64 backend.  GCC could
> generate "[r1 + r2 << 3] = unspec..." with this patch, for this test,
> LRA needs to make a reload for the address expression by doing
> "r1+r2<<3" outside of memory reference.  In function emit_add3_insn,
> it firstly checks have_addptr3_insn/gen_addptr3_insn, then the add3
> pattern.  The code is as below:
> 
>   if (have_addptr3_insn (x, y, z))
>     {
>       rtx_insn *insn = gen_addptr3_insn (x, y, z);
> 
>       /* If the target provides an "addptr" pattern it hopefully does
>      for a reason.  So falling back to the normal add would be
>      a bug.  */
>       lra_assert (insn != NULL_RTX);
>       emit_insn (insn);
>       return insn;
>     }
> 
>   rtx_insn* insn = emit_insn (gen_rtx_SET (x, gen_rtx_PLUS (GET_MODE
> (y), y, z)));
>   if (recog_memoized (insn) < 0)
>     {
>       delete_insns_since (last);
>       insn = NULL;
>     }
>   return insn;
> 
> The aarch64's problem is we don't define addptr3 pattern, and we don't
> have direct insn pattern describing the "x + y << z".  According to
> gcc internal:
> 
> ‘addptrm3’
> Like addm3 but is guaranteed to only be used for address calculations.
> The expanded code is not allowed to clobber the condition code. It
> only needs to be defined if addm3 sets the condition code.

addm3 on aarch64 does not set the condition codes, so by this rule we
shouldn't need to define this pattern.

R.

> If adds
> used for address calculations and normal adds are not compatible it is
> required to expand a distinct pattern (e.g. using an unspec). The
> pattern is used by LRA to emit address calculations. addm3 is used if
> addptrm3 is not defined.
> 
> Seems to me aarch64 needs to support "add(shifted register)" in add3
> pattern, or needs to support addptr3 pattern.  I tend to apply this
> patch to expose the bug, but I will let aarch64 experts give some
> comments before that.
> 
> Thanks,
> bin
> 


* Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address
  2015-11-20 17:39       ` Richard Earnshaw
@ 2015-11-24  3:23         ` Bin.Cheng
  2015-11-24  9:59           ` Richard Earnshaw
  0 siblings, 1 reply; 18+ messages in thread
From: Bin.Cheng @ 2015-11-24  3:23 UTC (permalink / raw)
  To: Richard Earnshaw; +Cc: James Greenhalgh, Bin Cheng, gcc-patches List

On Sat, Nov 21, 2015 at 1:39 AM, Richard Earnshaw
<Richard.Earnshaw@foss.arm.com> wrote:
> On 20/11/15 08:31, Bin.Cheng wrote:
>> On Thu, Nov 19, 2015 at 10:32 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>>> On Tue, Nov 17, 2015 at 6:08 PM, James Greenhalgh
>>> <james.greenhalgh@arm.com> wrote:
>>>> On Tue, Nov 17, 2015 at 05:21:01PM +0800, Bin Cheng wrote:
>>>>> Hi,
>>>>> GIMPLE IVO needs to call backend interface to calculate costs for addr
>>>>> expressions like below:
>>>>>    FORM1: "r73 + r74 + 16380"
>>>>>    FORM2: "r73 << 2 + r74 + 16380"
>>>>>
>>>>> They are invalid address expression on AArch64, so will be legitimized by
>>>>> aarch64_legitimize_address.  Below are what we got from that function:
>>>>>
>>>>> For FORM1, the address expression is legitimized into below insn sequence
>>>>> and rtx:
>>>>>    r84:DI=r73:DI+r74:DI
>>>>>    r85:DI=r84:DI+0x3000
>>>>>    r83:DI=r85:DI
>>>>>    "r83 + 4092"
>>>>>
>>>>> For FORM2, the address expression is legitimized into below insn sequence
>>>>> and rtx:
>>>>>    r108:DI=r73:DI<<0x2
>>>>>    r109:DI=r108:DI+r74:DI
>>>>>    r110:DI=r109:DI+0x3000
>>>>>    r107:DI=r110:DI
>>>>>    "r107 + 4092"
>>>>>
>>>>> So the costs computed are 12/16 respectively.  The high cost prevents IVO
>>>>> from choosing right candidates.  Besides cost computation, I also think the
>>>>> legitmization is bad in terms of code generation.
>>>>> The root cause in aarch64_legitimize_address can be described by it's
>>>>> comment:
>>>>>    /* Try to split X+CONST into Y=X+(CONST & ~mask), Y+(CONST&mask),
>>>>>       where mask is selected by alignment and size of the offset.
>>>>>       We try to pick as large a range for the offset as possible to
>>>>>       maximize the chance of a CSE.  However, for aligned addresses
>>>>>       we limit the range to 4k so that structures with different sized
>>>>>       elements are likely to use the same base.  */
>>>>> I think the split of CONST is intended for REG+CONST where the const offset
>>>>> is not in the range of AArch64's addressing modes.  Unfortunately, it
>>>>> doesn't explicitly handle/reject "REG+REG+CONST" and "REG+REG<<SCALE+CONST"
>>>>> when the CONST are in the range of addressing modes.  As a result, these two
>>>>> cases fallthrough this logic, resulting in sub-optimal results.
>>>>>
>>>>> It's obvious we can do below legitimization:
>>>>> FORM1:
>>>>>    r83:DI=r73:DI+r74:DI
>>>>>    "r83 + 16380"
>>>>> FORM2:
>>>>>    r107:DI=0x3ffc
>>>>>    r106:DI=r74:DI+r107:DI
>>>>>       REG_EQUAL r74:DI+0x3ffc
>>>>>    "r106 + r73 << 2"
>>>>>
>>>>> This patch handles these two cases as described.
>>>>
>>>> Thanks for the description, it made the patch very easy to review. I only
>>>> have a style comment.
>>>>
>>>>> Bootstrap & test on AArch64 along with other patch.  Is it OK?
>>>>>
>>>>> 2015-11-04  Bin Cheng  <bin.cheng@arm.com>
>>>>>           Jiong Wang  <jiong.wang@arm.com>
>>>>>
>>>>>       * config/aarch64/aarch64.c (aarch64_legitimize_address): Handle
>>>>>       address expressions like REG+REG+CONST and REG+NON_REG+CONST.
>>>>
>>>>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>>>>> index 5c8604f..47875ac 100644
>>>>> --- a/gcc/config/aarch64/aarch64.c
>>>>> +++ b/gcc/config/aarch64/aarch64.c
>>>>> @@ -4710,6 +4710,51 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x  */, machine_mode mode)
>>>>>      {
>>>>>        HOST_WIDE_INT offset = INTVAL (XEXP (x, 1));
>>>>>        HOST_WIDE_INT base_offset;
>>>>> +      rtx op0 = XEXP (x,0);
>>>>> +
>>>>> +      if (GET_CODE (op0) == PLUS)
>>>>> +     {
>>>>> +       rtx op0_ = XEXP (op0, 0);
>>>>> +       rtx op1_ = XEXP (op0, 1);
>>>>
>>>> I don't see this trailing _ on a variable name in many places in the source
>>>> tree (mostly in the Go frontend), and certainly not in the aarch64 backend.
>>>> Can we pick a different name for op0_ and op1_?
>>>>
>>>>> +
>>>>> +       /* RTX pattern in the form of (PLUS (PLUS REG, REG), CONST) will
>>>>> +          reach here, the 'CONST' may be valid in which case we should
>>>>> +          not split.  */
>>>>> +       if (REG_P (op0_) && REG_P (op1_))
>>>>> +         {
>>>>> +           machine_mode addr_mode = GET_MODE (op0);
>>>>> +           rtx addr = gen_reg_rtx (addr_mode);
>>>>> +
>>>>> +           rtx ret = plus_constant (addr_mode, addr, offset);
>>>>> +           if (aarch64_legitimate_address_hook_p (mode, ret, false))
>>>>> +             {
>>>>> +               emit_insn (gen_adddi3 (addr, op0_, op1_));
>>>>> +               return ret;
>>>>> +             }
>>>>> +         }
>>>>> +       /* RTX pattern in the form of (PLUS (PLUS REG, NON_REG), CONST)
>>>>> +          will reach here.  If (PLUS REG, NON_REG) is valid addr expr,
>>>>> +          we split it into Y=REG+CONST, Y+NON_REG.  */
>>>>> +       else if (REG_P (op0_) || REG_P (op1_))
>>>>> +         {
>>>>> +           machine_mode addr_mode = GET_MODE (op0);
>>>>> +           rtx addr = gen_reg_rtx (addr_mode);
>>>>> +
>>>>> +           /* Switch to make sure that register is in op0_.  */
>>>>> +           if (REG_P (op1_))
>>>>> +             std::swap (op0_, op1_);
>>>>> +
>>>>> +           rtx ret = gen_rtx_fmt_ee (PLUS, addr_mode, addr, op1_);
>>>>> +           if (aarch64_legitimate_address_hook_p (mode, ret, false))
>>>>> +             {
>>>>> +               addr = force_operand (plus_constant (addr_mode,
>>>>> +                                                    op0_, offset),
>>>>> +                                     NULL_RTX);
>>>>> +               ret = gen_rtx_fmt_ee (PLUS, addr_mode, addr, op1_);
>>>>> +               return ret;
>>>>> +             }
>>>>
>>>> The logic here is a bit hairy to follow, you construct a PLUS RTX to check
>>>> aarch64_legitimate_address_hook_p, then construct a different PLUS RTX
>>>> to use as the return value. This can probably be clarified by choosing a
>>>> name other than ret for the temporary address expression you construct.
>>>>
>>>> It would also be good to take some of your detailed description and write
>>>> that here. Certainly I found the explicit examples in the cover letter
>>>> easier to follow than:
>>>>
>>>>> +       /* RTX pattern in the form of (PLUS (PLUS REG, NON_REG), CONST)
>>>>> +          will reach here.  If (PLUS REG, NON_REG) is valid addr expr,
>>>>> +          we split it into Y=REG+CONST, Y+NON_REG.  */
>>>>
>>>> Otherwise this patch is OK.
>>> Thanks for reviewing, here is the updated patch.
>>
>> Hmm, I retested the patch on aarch64 and found it caused two
>> additional failures.
>>
>> FAIL: gcc.target/aarch64/ldp_vec_64_1.c scan-assembler ldp\td[0-9]+, d[0-9]
>> This is caused by different ivopt decision because of this patch's
>> cost change.  As far as IVO can tell, the new decision is better than
>> the old one.  So is the IVOPTed dump.  I can fix this by changing how
>> this patch legitimize address "r1 + r2 + offset".  In this patch, it's
>> legitimized into "r3 = r1 + r2; [r3 + offset]"; we could change it
>> into "r3 = offset; r4 = r1 + r3; [r4 + r2]".  This new form is better
>> because possibly r4 is a loop invariant, but the cost is higher.  I
>> tend to keep this patch the way it is since I don't know how the
>> changed cost affects performance data.  We may need to live with this
>> failure for a while.
>>
>> FAIL: gcc.dg/atomic/stdatomic-vm.c   -O1  (internal compiler error)
>> This I think is a potential bug in aarch64 backend.  GCC could
>> generate "[r1 + r2 << 3] = unspec..." with this patch, for this test,
>> LRA needs to make a reload for the address expression by doing
>> "r1+r2<<3" outside of memory reference.  In function emit_add3_insn,
>> it firstly checks have_addptr3_insn/gen_addptr3_insn, then the add3
>> pattern.  The code is as below:
>>
>>   if (have_addptr3_insn (x, y, z))
>>     {
>>       rtx_insn *insn = gen_addptr3_insn (x, y, z);
>>
>>       /* If the target provides an "addptr" pattern it hopefully does
>>      for a reason.  So falling back to the normal add would be
>>      a bug.  */
>>       lra_assert (insn != NULL_RTX);
>>       emit_insn (insn);
>>       return insn;
>>     }
>>
>>   rtx_insn* insn = emit_insn (gen_rtx_SET (x, gen_rtx_PLUS (GET_MODE
>> (y), y, z)));
>>   if (recog_memoized (insn) < 0)
>>     {
>>       delete_insns_since (last);
>>       insn = NULL;
>>     }
>>   return insn;
>>
>> The aarch64's problem is we don't define addptr3 pattern, and we don't
>> have direct insn pattern describing the "x + y << z".  According to
>> gcc internal:
>>
>> ‘addptrm3’
>> Like addm3 but is guaranteed to only be used for address calculations.
>> The expanded code is not allowed to clobber the condition code. It
>> only needs to be defined if addm3 sets the condition code.
>
> addm3 on aarch64 does not set the condition codes, so by this rule we
> shouldn't need to define this pattern.

Hi Richard,
I think that rule has a prerequisite: the backend needs to support
register-shifted addition in the addm3 pattern.  Apparently for AArch64,
addm3 only supports "reg+reg" or "reg+imm".  Also, it is not really true
that it "does not set the condition codes", because both
"adds_shift_imm_*" and "adds_mul_imm_*" do set the condition flags.
Either way I think it is another backend issue, so do you approve of me
committing this patch now?

Thanks,
bin


* Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address
  2015-11-24  3:23         ` Bin.Cheng
@ 2015-11-24  9:59           ` Richard Earnshaw
  2015-11-24 10:21             ` Richard Earnshaw
  2015-11-25  4:53             ` Bin.Cheng
  0 siblings, 2 replies; 18+ messages in thread
From: Richard Earnshaw @ 2015-11-24  9:59 UTC (permalink / raw)
  To: Bin.Cheng; +Cc: James Greenhalgh, Bin Cheng, gcc-patches List

On 24/11/15 02:51, Bin.Cheng wrote:
>>> The aarch64's problem is we don't define addptr3 pattern, and we don't
>>> >> have direct insn pattern describing the "x + y << z".  According to
>>> >> gcc internal:
>>> >>
>>> >> ‘addptrm3’
>>> >> Like addm3 but is guaranteed to only be used for address calculations.
>>> >> The expanded code is not allowed to clobber the condition code. It
>>> >> only needs to be defined if addm3 sets the condition code.
>> >
>> > addm3 on aarch64 does not set the condition codes, so by this rule we
>> > shouldn't need to define this pattern.
> Hi Richard,
> I think that rule has a prerequisite that backend needs to support
> register shifted addition in addm3 pattern.  

addm3 is a named pattern and its format is well defined.  It does not
take a shifted operand and never has.

> Apparently for AArch64,
> addm3 only supports "reg+reg" or "reg+imm".  Also we don't really
> "does not set the condition codes" actually, because both
> "adds_shift_imm_*" and "adds_mul_imm_*" do set the condition flags.

You appear to be confusing named patterns (used by expand) with
recognizers.  Anyway, we have

(define_insn "*add_<shift>_<mode>"
  [(set (match_operand:GPI 0 "register_operand" "=r")
        (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" "r")
                              (match_operand:QI 2
                                 "aarch64_shift_imm_<mode>" "n"))
                  (match_operand:GPI 3 "register_operand" "r")))]

That is a non-flag-setting add with a shifted operand.

> Either way I think it is another backend issue, so do you approve that
> I commit this patch now?

Not yet.  I think there's something fundamental amiss here.

BTW, it looks to me as though addptr<m>3 should have exactly the same
operand rules as add<m>3 (documentation reads "like add<m>3"), so a
shifted operand shouldn't be supported there either.  If that isn't the
case then that should be clearly called out in the documentation.

R.


* Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address
  2015-11-24  9:59           ` Richard Earnshaw
@ 2015-11-24 10:21             ` Richard Earnshaw
  2015-11-24 13:13               ` Jiong Wang
  2015-12-01  3:19               ` Bin.Cheng
  2015-11-25  4:53             ` Bin.Cheng
  1 sibling, 2 replies; 18+ messages in thread
From: Richard Earnshaw @ 2015-11-24 10:21 UTC (permalink / raw)
  To: Bin.Cheng; +Cc: James Greenhalgh, Bin Cheng, gcc-patches List

On 24/11/15 09:56, Richard Earnshaw wrote:
> On 24/11/15 02:51, Bin.Cheng wrote:
>>>> The aarch64's problem is we don't define addptr3 pattern, and we don't
>>>>>> have direct insn pattern describing the "x + y << z".  According to
>>>>>> gcc internal:
>>>>>>
>>>>>> ‘addptrm3’
>>>>>> Like addm3 but is guaranteed to only be used for address calculations.
>>>>>> The expanded code is not allowed to clobber the condition code. It
>>>>>> only needs to be defined if addm3 sets the condition code.
>>>>
>>>> addm3 on aarch64 does not set the condition codes, so by this rule we
>>>> shouldn't need to define this pattern.
>> Hi Richard,
>> I think that rule has a prerequisite that backend needs to support
>> register shifted addition in addm3 pattern.  
> 
> addm3 is a named pattern and its format is well defined.  It does not
> take a shifted operand and never has.
> 
>> Apparently for AArch64,
>> addm3 only supports "reg+reg" or "reg+imm".  Also we don't really
>> "does not set the condition codes" actually, because both
>> "adds_shift_imm_*" and "adds_mul_imm_*" do set the condition flags.
> 
> You appear to be confusing named patterns (used by expand) with
> recognizers.  Anyway, we have
> 
> (define_insn "*add_<shift>_<mode>"
>   [(set (match_operand:GPI 0 "register_operand" "=r")
>         (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" "r")
>                               (match_operand:QI 2
> "aarch64_shift_imm_<mode>" "n"))
>                   (match_operand:GPI 3 "register_operand" "r")))]
> 
> Which is a non-flag setting add with shifted operand.
> 
>> Either way I think it is another backend issue, so do you approve that
>> I commit this patch now?
> 
> Not yet.  I think there's something fundamental amiss here.
> 
> BTW, it looks to me as though addptr<m>3 should have exactly the same
> operand rules as add<m>3 (documentation reads "like add<m>3"), so a
> shifted operand shouldn't be supported there either.  If that isn't the
> case then that should be clearly called out in the documentation.
> 
> R.
> 

PS.

I presume you are aware of the canonicalization rules for add?  That is,
for a shift-and-add operation, the shift operand must appear first.  Ie.

(plus (shift (op, op)), op)

not

(plus (op, (shift (op, op))

R.


* Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address
  2015-11-24 10:21             ` Richard Earnshaw
@ 2015-11-24 13:13               ` Jiong Wang
  2015-11-24 13:29                 ` Richard Earnshaw
  2015-12-01  3:19               ` Bin.Cheng
  1 sibling, 1 reply; 18+ messages in thread
From: Jiong Wang @ 2015-11-24 13:13 UTC (permalink / raw)
  To: Richard Earnshaw, Bin.Cheng; +Cc: James Greenhalgh, Bin Cheng, gcc-patches List



On 24/11/15 10:18, Richard Earnshaw wrote:
> I presume you are aware of the canonicalization rules for add?  That is,
> for a shift-and-add operation, the shift operand must appear first.  Ie.
>
> (plus (shift (op, op)), op)
>
> not
>
> (plus (op, (shift (op, op))
>
> R.

It looks to me that it's not optimal to generate an invalid mem address in
the first place, for example (mem (plus reg, (mult reg, imm))) or even the
simple (mem (plus (plus r, r), imm)).  Those complex rtxes inside are hidden
by the permissive memory_operand predicate and are only exposed during
reload by the stricter constraints, so reload needs to do extra work.  If we
exposed those complex rtxes earlier, some earlier RTL pass might find more
optimization opportunities, for example combine.

The following simple modification fixes the ICE and generates the best
sequence to me:

-                 return gen_rtx_fmt_ee (PLUS, addr_mode, base, op1);
+                 addr = gen_rtx_fmt_ee (PLUS, addr_mode, op1, base);
+                 emit_insn (gen_rtx_SET (base, addr));
+                 return base;

        add     x1, x29, 48
        add     x1, x1, x0, sxtw 3
        stlr    x19, [x1]

instead of

        add     x1, x29, 64
        add     x0, x1, x0, sxtw 3
        sub     x0, x0, #16
        stlr    x19, [x0]

or

        sxtw    x0, w0
        add     x1, x29, 48
        add     x1, x1, x0, sxtw 3
        stlr    x19, [x1]






* Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address
  2015-11-24 13:13               ` Jiong Wang
@ 2015-11-24 13:29                 ` Richard Earnshaw
  2015-11-24 14:39                   ` Jiong Wang
  0 siblings, 1 reply; 18+ messages in thread
From: Richard Earnshaw @ 2015-11-24 13:29 UTC (permalink / raw)
  To: Jiong Wang, Bin.Cheng; +Cc: James Greenhalgh, Bin Cheng, gcc-patches List

On 24/11/15 13:06, Jiong Wang wrote:
> 
> 
> On 24/11/15 10:18, Richard Earnshaw wrote:
>> I presume you are aware of the canonicalization rules for add?  That is,
>> for a shift-and-add operation, the shift operand must appear first.  Ie.
>>
>> (plus (shift (op, op)), op)
>>
>> not
>>
>> (plus (op, (shift (op, op))
>>
>> R.
> 
> Looks to me it's not optimal to generate invalid mem addr, for example
> (mem (plus reg, (mult reg, imm))) or even the simple (mem (plus (plus r,
> r), imm),
> in the first place. Those complex rtx inside is hidden by the permissive
> memory_operand predication, and only exposed during reload by stricter
> constraints, then reload need to extra work. If we expose those complex rtx
> earlier then some earlier rtl pass may find more optimization
> opportunities, for
> example combine.
> 
> The following simple modification fix the ICE and generates best
> sequences to me:
> 
> -                 return gen_rtx_fmt_ee (PLUS, addr_mode, base, op1);
> +                 addr = gen_rtx_fmt_ee (PLUS, addr_mode, op1, base);
> +                 emit_insn (gen_rtx_SET (base, addr));
> +                 return base;
> 

That wouldn't be right either if op1 could be a const_int.

R.

>   67         add     x1, x29, 48
>   68         add     x1, x1, x0, sxtw 3
>   69         stlr    x19, [x1]
> 
> instead of
> 
>   67         add     x1, x29, 64
>   68         add     x0, x1, x0, sxtw 3
>   69         sub     x0, x0, #16
>   70         stlr    x19, [x0]
> 
> or
> 
>   67         sxtw    x0, w0
>   68         add     x1, x29, 48
>   69         add     x1, x1, x0, sxtw 3
>   70         stlr    x19, [x1]
> 
> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address
  2015-11-24 13:29                 ` Richard Earnshaw
@ 2015-11-24 14:39                   ` Jiong Wang
  2015-11-24 14:55                     ` Richard Earnshaw
  0 siblings, 1 reply; 18+ messages in thread
From: Jiong Wang @ 2015-11-24 14:39 UTC (permalink / raw)
  To: Richard Earnshaw, Bin.Cheng; +Cc: James Greenhalgh, Bin Cheng, gcc-patches List



On 24/11/15 13:23, Richard Earnshaw wrote:
> On 24/11/15 13:06, Jiong Wang wrote:
>>
>> On 24/11/15 10:18, Richard Earnshaw wrote:
>>> I presume you are aware of the canonicalization rules for add?  That is,
>>> for a shift-and-add operation, the shift operand must appear first.  Ie.
>>>
>>> (plus (shift (op, op)), op)
>>>
>>> not
>>>
>>> (plus (op, (shift (op, op))
>>>
>>> R.
>> Looks to me it's not optimal to generate invalid mem addr, for example
>> (mem (plus reg, (mult reg, imm))) or even the simple (mem (plus (plus r,
>> r), imm),
>> in the first place. Those complex rtx inside is hidden by the permissive
>> memory_operand predication, and only exposed during reload by stricter
>> constraints, then reload need to extra work. If we expose those complex rtx
>> earlier then some earlier rtl pass may find more optimization
>> opportunities, for
>> example combine.
>>
>> The following simple modification fix the ICE and generates best
>> sequences to me:
>>
>> -                 return gen_rtx_fmt_ee (PLUS, addr_mode, base, op1);
>> +                 addr = gen_rtx_fmt_ee (PLUS, addr_mode, op1, base);
>> +                 emit_insn (gen_rtx_SET (base, addr));
>> +                 return base;
>>
> That wouldn't be right either if op1 could be a const_int.

Indeed, though it would seem strange to me if op1 could be a const_int
here, given the early if checks.  If it were, then the incoming address
rtx would be something like the following, with the two const_ints not
folded:

          +
         / \
        /   \
       +    const_int
      / \
     /   \
    Reg const_int

Or we could build the same rtx that the earlier
aarch64_legitimate_address_hook_p check sees; that hook will return false
if op1 is a const_int:

-             rtx addr = gen_rtx_fmt_ee (PLUS, addr_mode, base, op1);
+             rtx addr = gen_rtx_fmt_ee (PLUS, addr_mode, op1, base);


>
> R.
>
>>    67         add     x1, x29, 48
>>    68         add     x1, x1, x0, sxtw 3
>>    69         stlr    x19, [x1]
>>
>> instead of
>>
>>    67         add     x1, x29, 64
>>    68         add     x0, x1, x0, sxtw 3
>>    69         sub     x0, x0, #16
>>    70         stlr    x19, [x0]
>>
>> or
>>
>>    67         sxtw    x0, w0
>>    68         add     x1, x29, 48
>>    69         add     x1, x1, x0, sxtw 3
>>    70         stlr    x19, [x1]
>>
>>
>>
>>
>>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address
  2015-11-24 14:39                   ` Jiong Wang
@ 2015-11-24 14:55                     ` Richard Earnshaw
  0 siblings, 0 replies; 18+ messages in thread
From: Richard Earnshaw @ 2015-11-24 14:55 UTC (permalink / raw)
  To: Jiong Wang, Bin.Cheng; +Cc: James Greenhalgh, Bin Cheng, gcc-patches List

On 24/11/15 14:36, Jiong Wang wrote:
> 
> 
> On 24/11/15 13:23, Richard Earnshaw wrote:
>> On 24/11/15 13:06, Jiong Wang wrote:
>>>
>>> On 24/11/15 10:18, Richard Earnshaw wrote:
>>>> I presume you are aware of the canonicalization rules for add?  That
>>>> is,
>>>> for a shift-and-add operation, the shift operand must appear first. 
>>>> Ie.
>>>>
>>>> (plus (shift (op, op)), op)
>>>>
>>>> not
>>>>
>>>> (plus (op, (shift (op, op))
>>>>
>>>> R.
>>> Looks to me it's not optimal to generate invalid mem addr, for example
>>> (mem (plus reg, (mult reg, imm))) or even the simple (mem (plus (plus r,
>>> r), imm),
>>> in the first place. Those complex rtx inside is hidden by the permissive
>>> memory_operand predication, and only exposed during reload by stricter
>>> constraints, then reload need to extra work. If we expose those
>>> complex rtx
>>> earlier then some earlier rtl pass may find more optimization
>>> opportunities, for
>>> example combine.
>>>
>>> The following simple modification fix the ICE and generates best
>>> sequences to me:
>>>
>>> -                 return gen_rtx_fmt_ee (PLUS, addr_mode, base, op1);
>>> +                 addr = gen_rtx_fmt_ee (PLUS, addr_mode, op1, base);
>>> +                 emit_insn (gen_rtx_SET (base, addr));
>>> +                 return base;
>>>
>> That wouldn't be right either if op1 could be a const_int.
> 
> Indeed, though it would be strange to me if op1 would be const_int here,
> given
> those early if check, if it is, then the incoming address rtx would be
> something
> like the following without two const_int folded.
> 
>          +
>         / \
>        /   \
>       +    const_int
>      / \
>     /   \
>    Reg const_int
> 
> or we could sync the rtx checked in the early
> aarch64_legitimate_address_hook_p,
> it will return false if op1 be const_int.
> 
> -             rtx addr = gen_rtx_fmt_ee (PLUS, addr_mode, base, op1);
> +             rtx addr = gen_rtx_fmt_ee (PLUS, addr_mode, op1, base);
> 
> 

The safest thing is to insert a call to swap_commutative_operands_p and
then switch the order over if that returns true.
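
I.e. something like this (just a sketch; base, op1 and addr_mode as in
the hunk being discussed):

    rtx sum0 = base, sum1 = op1;
    /* Put the operands into canonical order before forming the PLUS.  */
    if (swap_commutative_operands_p (sum0, sum1))
      std::swap (sum0, sum1);
    rtx addr = gen_rtx_PLUS (addr_mode, sum0, sum1);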

R.

>>
>> R.
>>
>>>    67         add     x1, x29, 48
>>>    68         add     x1, x1, x0, sxtw 3
>>>    69         stlr    x19, [x1]
>>>
>>> instead of
>>>
>>>    67         add     x1, x29, 64
>>>    68         add     x0, x1, x0, sxtw 3
>>>    69         sub     x0, x0, #16
>>>    70         stlr    x19, [x0]
>>>
>>> or
>>>
>>>    67         sxtw    x0, w0
>>>    68         add     x1, x29, 48
>>>    69         add     x1, x1, x0, sxtw 3
>>>    70         stlr    x19, [x1]
>>>
>>>
>>>
>>>
>>>
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address
  2015-11-24  9:59           ` Richard Earnshaw
  2015-11-24 10:21             ` Richard Earnshaw
@ 2015-11-25  4:53             ` Bin.Cheng
  1 sibling, 0 replies; 18+ messages in thread
From: Bin.Cheng @ 2015-11-25  4:53 UTC (permalink / raw)
  To: Richard Earnshaw
  Cc: James Greenhalgh, Bin Cheng, gcc-patches List, Vladimir Makarov,
	Andreas.Krebbel

On Tue, Nov 24, 2015 at 5:56 PM, Richard Earnshaw
<Richard.Earnshaw@foss.arm.com> wrote:
> On 24/11/15 02:51, Bin.Cheng wrote:
>>>> The aarch64's problem is we don't define addptr3 pattern, and we don't
>>>> >> have direct insn pattern describing the "x + y << z".  According to
>>>> >> gcc internal:
>>>> >>
>>>> >> ‘addptrm3’
>>>> >> Like addm3 but is guaranteed to only be used for address calculations.
>>>> >> The expanded code is not allowed to clobber the condition code. It
>>>> >> only needs to be defined if addm3 sets the condition code.
>>> >
>>> > addm3 on aarch64 does not set the condition codes, so by this rule we
>>> > shouldn't need to define this pattern.
>> Hi Richard,
>> I think that rule has a prerequisite that backend needs to support
>> register shifted addition in addm3 pattern.
>
> addm3 is a named pattern and its format is well defined.  It does not
> take a shifted operand and never has.
>
>> Apparently for AArch64,
>> addm3 only supports "reg+reg" or "reg+imm".  Also we don't really
>> "does not set the condition codes" actually, because both
>> "adds_shift_imm_*" and "adds_mul_imm_*" do set the condition flags.
>
> You appear to be confusing named patterns (used by expand) with
> recognizers.  Anyway, we have
I understand the difference between expanders and recognizers.  I missed
the non-flag-setting insn pattern you point out below; that is what I
meant by us not supporting "does not set the condition codes", with
respect to either expansion or recognition.  Anyway, I was wrong and we
do have such nameless patterns.

>
> (define_insn "*add_<shift>_<mode>"
>   [(set (match_operand:GPI 0 "register_operand" "=r")
>         (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" "r")
>                               (match_operand:QI 2
> "aarch64_shift_imm_<mode>" "n"))
>                   (match_operand:GPI 3 "register_operand" "r")))]
>
> Which is a non-flag setting add with shifted operand.
>
>> Either way I think it is another backend issue, so do you approve that
>> I commit this patch now?
>
> Not yet.  I think there's something fundamental amiss here.
>
> BTW, it looks to me as though addptr<m>3 should have exactly the same
> operand rules as add<m>3 (documentation reads "like add<m>3"), so a
> shifted operand shouldn't be supported there either.  If that isn't the
> case then that should be clearly called out in the documentation.
The direct cause of this ICE is that LRA generates the insn sequence
below when reloading "x = y + z*const", where the mult part is actually a
register scale operation:
    r1 = z*const
    x = y + r1
It generates the mult directly, but aarch64 has no pattern describing a
mult by a constant; we only have a pattern for the corresponding shift
insn.  As a result, the constant mult insn is not recognized.  This could
be fixed by using force_operand to generate valid mult or shift
instructions -- is that feasible in LRA?
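
Roughly what I have in mind (only a sketch; x, y, z are hypothetical
pseudo rtxes and scale the constant):

    /* Let force_operand expand the multiply instead of emitting the
       unrecognizable "r1 = z * scale" directly; for a power-of-two
       scale this ends up as a shift insn on aarch64.  */
    rtx r1 = force_operand (gen_rtx_MULT (Pmode, z, GEN_INT (scale)),
                            NULL_RTX);
    emit_insn (gen_add3_insn (x, y, r1));
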
For this case, we could do a better reload if we supported a scaled add
in the addptr pattern.  After some archaeological work, I found that
addptr was added for s390, which supports addressing modes like
"base + index + disp" and their variants.  Maybe that's why the pattern's
documentation doesn't explicitly describe the scaled operation.  IMHO,
this pattern should be target dependent and able to handle whatever
addressing modes each target supports.  A side proof is in the code and
comments of function lra_emit_add, which explicitly handles and describes
scaled-add address expressions.

CCing Vlad and Andreas since they are the original authors and may help here.

Thanks,
bin

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address
  2015-11-24 10:21             ` Richard Earnshaw
  2015-11-24 13:13               ` Jiong Wang
@ 2015-12-01  3:19               ` Bin.Cheng
  2015-12-01 10:25                 ` Richard Earnshaw
  1 sibling, 1 reply; 18+ messages in thread
From: Bin.Cheng @ 2015-12-01  3:19 UTC (permalink / raw)
  To: Richard Earnshaw; +Cc: James Greenhalgh, Bin Cheng, gcc-patches List

[-- Attachment #1: Type: text/plain, Size: 3873 bytes --]

On Tue, Nov 24, 2015 at 6:18 PM, Richard Earnshaw
<Richard.Earnshaw@foss.arm.com> wrote:
> On 24/11/15 09:56, Richard Earnshaw wrote:
>> On 24/11/15 02:51, Bin.Cheng wrote:
>>>>> The aarch64's problem is we don't define addptr3 pattern, and we don't
>>>>>>> have direct insn pattern describing the "x + y << z".  According to
>>>>>>> gcc internal:
>>>>>>>
>>>>>>> ‘addptrm3’
>>>>>>> Like addm3 but is guaranteed to only be used for address calculations.
>>>>>>> The expanded code is not allowed to clobber the condition code. It
>>>>>>> only needs to be defined if addm3 sets the condition code.
>>>>>
>>>>> addm3 on aarch64 does not set the condition codes, so by this rule we
>>>>> shouldn't need to define this pattern.
>>> Hi Richard,
>>> I think that rule has a prerequisite that backend needs to support
>>> register shifted addition in addm3 pattern.
>>
>> addm3 is a named pattern and its format is well defined.  It does not
>> take a shifted operand and never has.
>>
>>> Apparently for AArch64,
>>> addm3 only supports "reg+reg" or "reg+imm".  Also we don't really
>>> "does not set the condition codes" actually, because both
>>> "adds_shift_imm_*" and "adds_mul_imm_*" do set the condition flags.
>>
>> You appear to be confusing named patterns (used by expand) with
>> recognizers.  Anyway, we have
>>
>> (define_insn "*add_<shift>_<mode>"
>>   [(set (match_operand:GPI 0 "register_operand" "=r")
>>         (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" "r")
>>                               (match_operand:QI 2
>> "aarch64_shift_imm_<mode>" "n"))
>>                   (match_operand:GPI 3 "register_operand" "r")))]
>>
>> Which is a non-flag setting add with shifted operand.
>>
>>> Either way I think it is another backend issue, so do you approve that
>>> I commit this patch now?
>>
>> Not yet.  I think there's something fundamental amiss here.
>>
>> BTW, it looks to me as though addptr<m>3 should have exactly the same
>> operand rules as add<m>3 (documentation reads "like add<m>3"), so a
>> shifted operand shouldn't be supported there either.  If that isn't the
>> case then that should be clearly called out in the documentation.
>>
>> R.
>>
>
> PS.
>
> I presume you are aware of the canonicalization rules for add?  That is,
> for a shift-and-add operation, the shift operand must appear first.  Ie.
>
> (plus (shift (op, op)), op)
>
> not
>
> (plus (op, (shift (op, op))

Hi Richard,
Thanks for the comments.  I realized that the not-recognized-insn issue
arises because the original patch built non-canonical expressions: when
reloading the address expression, LRA generates a non-canonical
register-scaled insn, which can't be recognized by the aarch64 backend.
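
I.e. what ended up being fed to recog was of the non-canonical form
(register names made up for illustration)

    (plus:DI (reg:DI Ra) (ashift:DI (reg:DI Rb) (const_int 2)))

while the backend add patterns only match the canonical

    (plus:DI (ashift:DI (reg:DI Rb) (const_int 2)) (reg:DI Ra))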

Here is the updated patch using the canonical-form pattern; it passes
bootstrap and regression testing.  The IVO failure still exists, but it
is analyzed in the original message.

Is this patch OK?

As for Jiong's concern about the additional extension instruction, I
think that only applies to atomic load/store instructions.  For general
loads and stores, AArch64 supports zext/sext in the register-scaling
addressing mode, so the additional instruction can be forward-propagated
into the memory reference.  The problem for atomic loads and stores is
that AArch64 only supports the direct register addressing mode, and after
LRA reloads the address expression out of the memory reference there is
no combine/fwprop pass left to merge the instructions.  The underlying
problem is that atomic_store's predicate doesn't match its constraint:
the predicate used for atomic_store<mode> is memory_operand, while all
other atomic patterns use aarch64_sync_memory_operand.  I think this
might be a typo.  With that changed, expand will not generate an
addressing mode requiring a reload anymore.  I will test another patch
fixing this.
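
The change I plan to test is essentially just the predicate of operand 0
(quoting the pattern from memory, the rest elided):

 (define_insn "atomic_store<mode>"
-  [(set (match_operand:ALLI 0 "memory_operand" ...)
+  [(set (match_operand:ALLI 0 "aarch64_sync_memory_operand" ...)
     ...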

Thanks,
bin
>
> R.

[-- Attachment #2: aarch64_legitimize_addr-20151128.txt --]
[-- Type: text/plain, Size: 2564 bytes --]

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 3fe2f0f..5b3e3c4 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4757,13 +4757,65 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x  */, machine_mode mode)
      We try to pick as large a range for the offset as possible to
      maximize the chance of a CSE.  However, for aligned addresses
      we limit the range to 4k so that structures with different sized
-     elements are likely to use the same base.  */
+     elements are likely to use the same base.  We need to be careful
+     not split CONST for some forms address expressions, otherwise it
+     will generate sub-optimal code.  */
 
   if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1)))
     {
       HOST_WIDE_INT offset = INTVAL (XEXP (x, 1));
       HOST_WIDE_INT base_offset;
 
+      if (GET_CODE (XEXP (x, 0)) == PLUS)
+	{
+	  rtx op0 = XEXP (XEXP (x, 0), 0);
+	  rtx op1 = XEXP (XEXP (x, 0), 1);
+
+	  /* For addr expression in the form like "r1 + r2 + 0x3ffc".
+	     Since the offset is within range supported by addressing
+	     mode "reg+offset", we don't split the const and legalize
+	     it into below insn and expr sequence:
+	       r3 = r1 + r2;
+	       "r3 + 0x3ffc".  */
+	  if (REG_P (op0) && REG_P (op1))
+	    {
+	      machine_mode addr_mode = GET_MODE (x);
+	      rtx base = gen_reg_rtx (addr_mode);
+	      rtx addr = plus_constant (addr_mode, base, offset);
+
+	      if (aarch64_legitimate_address_hook_p (mode, addr, false))
+		{
+		  emit_insn (gen_adddi3 (base, op0, op1));
+		  return addr;
+		}
+	    }
+	  /* For addr expression in the form like "r1 + r2<<2 + 0x3ffc".
+	     Live above, we don't split the const and legalize it into
+	     below insn and expr sequence:
+	       r3 = 0x3ffc;
+	       r4 = r1 + r3;
+	       "r4 + r2<<2".  */
+	  else if (REG_P (op0) || REG_P (op1))
+	    {
+	      machine_mode addr_mode = GET_MODE (x);
+	      rtx base = gen_reg_rtx (addr_mode);
+
+	      /* Switch to make sure that register is in op0.  */
+	      if (REG_P (op1))
+		std::swap (op0, op1);
+
+	      rtx addr = gen_rtx_fmt_ee (PLUS, addr_mode, op1, base);
+
+	      if (aarch64_legitimate_address_hook_p (mode, addr, false))
+		{
+		  base = force_operand (plus_constant (addr_mode,
+						       op0, offset),
+					NULL_RTX);
+		  return gen_rtx_fmt_ee (PLUS, addr_mode, op1, base);
+		}
+	    }
+	}
+
       /* Does it look like we'll need a load/store-pair operation?  */
       if (GET_MODE_SIZE (mode) > 16
 	  || mode == TImode)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address
  2015-12-01  3:19               ` Bin.Cheng
@ 2015-12-01 10:25                 ` Richard Earnshaw
  2015-12-03  5:26                   ` Bin.Cheng
  0 siblings, 1 reply; 18+ messages in thread
From: Richard Earnshaw @ 2015-12-01 10:25 UTC (permalink / raw)
  To: Bin.Cheng; +Cc: James Greenhalgh, Bin Cheng, gcc-patches List

On 01/12/15 03:19, Bin.Cheng wrote:
> On Tue, Nov 24, 2015 at 6:18 PM, Richard Earnshaw
> <Richard.Earnshaw@foss.arm.com> wrote:
>> On 24/11/15 09:56, Richard Earnshaw wrote:
>>> On 24/11/15 02:51, Bin.Cheng wrote:
>>>>>> The aarch64's problem is we don't define addptr3 pattern, and we don't
>>>>>>>> have direct insn pattern describing the "x + y << z".  According to
>>>>>>>> gcc internal:
>>>>>>>>
>>>>>>>> ‘addptrm3’
>>>>>>>> Like addm3 but is guaranteed to only be used for address calculations.
>>>>>>>> The expanded code is not allowed to clobber the condition code. It
>>>>>>>> only needs to be defined if addm3 sets the condition code.
>>>>>>
>>>>>> addm3 on aarch64 does not set the condition codes, so by this rule we
>>>>>> shouldn't need to define this pattern.
>>>> Hi Richard,
>>>> I think that rule has a prerequisite that backend needs to support
>>>> register shifted addition in addm3 pattern.
>>>
>>> addm3 is a named pattern and its format is well defined.  It does not
>>> take a shifted operand and never has.
>>>
>>>> Apparently for AArch64,
>>>> addm3 only supports "reg+reg" or "reg+imm".  Also we don't really
>>>> "does not set the condition codes" actually, because both
>>>> "adds_shift_imm_*" and "adds_mul_imm_*" do set the condition flags.
>>>
>>> You appear to be confusing named patterns (used by expand) with
>>> recognizers.  Anyway, we have
>>>
>>> (define_insn "*add_<shift>_<mode>"
>>>   [(set (match_operand:GPI 0 "register_operand" "=r")
>>>         (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" "r")
>>>                               (match_operand:QI 2
>>> "aarch64_shift_imm_<mode>" "n"))
>>>                   (match_operand:GPI 3 "register_operand" "r")))]
>>>
>>> Which is a non-flag setting add with shifted operand.
>>>
>>>> Either way I think it is another backend issue, so do you approve that
>>>> I commit this patch now?
>>>
>>> Not yet.  I think there's something fundamental amiss here.
>>>
>>> BTW, it looks to me as though addptr<m>3 should have exactly the same
>>> operand rules as add<m>3 (documentation reads "like add<m>3"), so a
>>> shifted operand shouldn't be supported there either.  If that isn't the
>>> case then that should be clearly called out in the documentation.
>>>
>>> R.
>>>
>>
>> PS.
>>
>> I presume you are aware of the canonicalization rules for add?  That is,
>> for a shift-and-add operation, the shift operand must appear first.  Ie.
>>
>> (plus (shift (op, op)), op)
>>
>> not
>>
>> (plus (op, (shift (op, op))
> 
> Hi Richard,
> Thanks for the comments.  I realized that the not-recognized insn
> issue is because the original patch build non-canonical expressions.
> When reloading address expression, LRA generates non-canonical
> register scaled insn, which can't be recognized by aarch64 backend.
> 
> Here is the updated patch using canonical form pattern,  it passes
> bootstrap and regression test.  Well, the ivo failure still exists,
> but it analyzed in the original message.
> 
> Is this patch OK?
> 
> As for Jiong's concern about the additional extension instruction, I
> think this only stands for atmoic load store instructions.  For
> general load store, AArch64 supports zext/sext in register scaling
> addressing mode, the additional instruction can be forward propagated
> into memory reference.  The problem for atomic load store is AArch64
> only supports direct register addressing mode.  After LRA reloads
> address expression out of memory reference, there is no combine/fwprop
> optimizer to merge instructions.  The problem is atomic_store's
> predicate doesn't match its constraint.   The predicate used for
> atomic_store<mode> is memory_operand, while all other atomic patterns
> use aarch64_sync_memory_operand.  I think this might be a typo.  With
> this change, expand will not generate addressing mode requiring reload
> anymore.  I will test another patch fixing this.
> 
> Thanks,
> bin

Some comments inline.

>>
>> R.
>>
>> aarch64_legitimize_addr-20151128.txt
>>
>>
>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index 3fe2f0f..5b3e3c4 100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -4757,13 +4757,65 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x  */, machine_mode mode)
>>       We try to pick as large a range for the offset as possible to
>>       maximize the chance of a CSE.  However, for aligned addresses
>>       we limit the range to 4k so that structures with different sized
>> -     elements are likely to use the same base.  */
>> +     elements are likely to use the same base.  We need to be careful
>> +     not split CONST for some forms address expressions, otherwise it

not to split a CONST for some forms of address expression,

>> +     will generate sub-optimal code.  */
>>  
>>    if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1)))
>>      {
>>        HOST_WIDE_INT offset = INTVAL (XEXP (x, 1));
>>        HOST_WIDE_INT base_offset;
>>  
>> +      if (GET_CODE (XEXP (x, 0)) == PLUS)
>> +	{
>> +	  rtx op0 = XEXP (XEXP (x, 0), 0);
>> +	  rtx op1 = XEXP (XEXP (x, 0), 1);
>> +
>> +	  /* For addr expression in the form like "r1 + r2 + 0x3ffc".
>> +	     Since the offset is within range supported by addressing
>> +	     mode "reg+offset", we don't split the const and legalize
>> +	     it into below insn and expr sequence:
>> +	       r3 = r1 + r2;
>> +	       "r3 + 0x3ffc".  */

I think this comment would read better as

	/* Address expressions of the form Ra + Rb + CONST.

	   If CONST is within the range supported by the addressing
	   mode "reg+offset", do not split CONST and use the
	   sequence
		Rt = Ra + Rb
		addr = Rt + CONST.  */

>> +	  if (REG_P (op0) && REG_P (op1))
>> +	    {
>> +	      machine_mode addr_mode = GET_MODE (x);
>> +	      rtx base = gen_reg_rtx (addr_mode);
>> +	      rtx addr = plus_constant (addr_mode, base, offset);
>> +
>> +	      if (aarch64_legitimate_address_hook_p (mode, addr, false))
>> +		{
>> +		  emit_insn (gen_adddi3 (base, op0, op1));
>> +		  return addr;
>> +		}
>> +	    }
>> +	  /* For addr expression in the form like "r1 + r2<<2 + 0x3ffc".
>> +	     Live above, we don't split the const and legalize it into
>> +	     below insn and expr sequence:

Similarly.
>> +	       r3 = 0x3ffc;
>> +	       r4 = r1 + r3;
>> +	       "r4 + r2<<2".  */

Why don't we generate

  r3 = r1 + r2 << 2
  r4 = r3 + 0x3ffc

utilizing the shift-and-add instructions?

>> +	  else if (REG_P (op0) || REG_P (op1))
>> +	    {
>> +	      machine_mode addr_mode = GET_MODE (x);
>> +	      rtx base = gen_reg_rtx (addr_mode);
>> +
>> +	      /* Switch to make sure that register is in op0.  */
>> +	      if (REG_P (op1))
>> +		std::swap (op0, op1);
>> +
>> +	      rtx addr = gen_rtx_fmt_ee (PLUS, addr_mode, op1, base);
Use gen_rtx_PLUS.

>> +
>> +	      if (aarch64_legitimate_address_hook_p (mode, addr, false))
>> +		{
>> +		  base = force_operand (plus_constant (addr_mode,
>> +						       op0, offset),
>> +					NULL_RTX);
>> +		  return gen_rtx_fmt_ee (PLUS, addr_mode, op1, base);
Likewise.

>> +		}
>> +	    }
>> +	}
>> +
>>        /* Does it look like we'll need a load/store-pair operation?  */
>>        if (GET_MODE_SIZE (mode) > 16
>>  	  || mode == TImode)

R.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address
  2015-12-01 10:25                 ` Richard Earnshaw
@ 2015-12-03  5:26                   ` Bin.Cheng
  2015-12-03 10:26                     ` Richard Earnshaw
  0 siblings, 1 reply; 18+ messages in thread
From: Bin.Cheng @ 2015-12-03  5:26 UTC (permalink / raw)
  To: Richard Earnshaw; +Cc: James Greenhalgh, Bin Cheng, gcc-patches List

[-- Attachment #1: Type: text/plain, Size: 7529 bytes --]

On Tue, Dec 1, 2015 at 6:25 PM, Richard Earnshaw
<Richard.Earnshaw@foss.arm.com> wrote:
> On 01/12/15 03:19, Bin.Cheng wrote:
>> On Tue, Nov 24, 2015 at 6:18 PM, Richard Earnshaw
>> <Richard.Earnshaw@foss.arm.com> wrote:
>>> On 24/11/15 09:56, Richard Earnshaw wrote:
>>>> On 24/11/15 02:51, Bin.Cheng wrote:
>>>>>>> The aarch64's problem is we don't define addptr3 pattern, and we don't
>>>>>>>>> have direct insn pattern describing the "x + y << z".  According to
>>>>>>>>> gcc internal:
>>>>>>>>>
>>>>>>>>> ‘addptrm3’
>>>>>>>>> Like addm3 but is guaranteed to only be used for address calculations.
>>>>>>>>> The expanded code is not allowed to clobber the condition code. It
>>>>>>>>> only needs to be defined if addm3 sets the condition code.
>>>>>>>
>>>>>>> addm3 on aarch64 does not set the condition codes, so by this rule we
>>>>>>> shouldn't need to define this pattern.
>>>>> Hi Richard,
>>>>> I think that rule has a prerequisite that backend needs to support
>>>>> register shifted addition in addm3 pattern.
>>>>
>>>> addm3 is a named pattern and its format is well defined.  It does not
>>>> take a shifted operand and never has.
>>>>
>>>>> Apparently for AArch64,
>>>>> addm3 only supports "reg+reg" or "reg+imm".  Also we don't really
>>>>> "does not set the condition codes" actually, because both
>>>>> "adds_shift_imm_*" and "adds_mul_imm_*" do set the condition flags.
>>>>
>>>> You appear to be confusing named patterns (used by expand) with
>>>> recognizers.  Anyway, we have
>>>>
>>>> (define_insn "*add_<shift>_<mode>"
>>>>   [(set (match_operand:GPI 0 "register_operand" "=r")
>>>>         (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" "r")
>>>>                               (match_operand:QI 2
>>>> "aarch64_shift_imm_<mode>" "n"))
>>>>                   (match_operand:GPI 3 "register_operand" "r")))]
>>>>
>>>> Which is a non-flag setting add with shifted operand.
>>>>
>>>>> Either way I think it is another backend issue, so do you approve that
>>>>> I commit this patch now?
>>>>
>>>> Not yet.  I think there's something fundamental amiss here.
>>>>
>>>> BTW, it looks to me as though addptr<m>3 should have exactly the same
>>>> operand rules as add<m>3 (documentation reads "like add<m>3"), so a
>>>> shifted operand shouldn't be supported there either.  If that isn't the
>>>> case then that should be clearly called out in the documentation.
>>>>
>>>> R.
>>>>
>>>
>>> PS.
>>>
>>> I presume you are aware of the canonicalization rules for add?  That is,
>>> for a shift-and-add operation, the shift operand must appear first.  Ie.
>>>
>>> (plus (shift (op, op)), op)
>>>
>>> not
>>>
>>> (plus (op, (shift (op, op))
>>
>> Hi Richard,
>> Thanks for the comments.  I realized that the not-recognized insn
>> issue is because the original patch build non-canonical expressions.
>> When reloading address expression, LRA generates non-canonical
>> register scaled insn, which can't be recognized by aarch64 backend.
>>
>> Here is the updated patch using canonical form pattern,  it passes
>> bootstrap and regression test.  Well, the ivo failure still exists,
>> but it analyzed in the original message.
>>
>> Is this patch OK?
>>
>> As for Jiong's concern about the additional extension instruction, I
>> think this only stands for atmoic load store instructions.  For
>> general load store, AArch64 supports zext/sext in register scaling
>> addressing mode, the additional instruction can be forward propagated
>> into memory reference.  The problem for atomic load store is AArch64
>> only supports direct register addressing mode.  After LRA reloads
>> address expression out of memory reference, there is no combine/fwprop
>> optimizer to merge instructions.  The problem is atomic_store's
>> predicate doesn't match its constraint.   The predicate used for
>> atomic_store<mode> is memory_operand, while all other atomic patterns
>> use aarch64_sync_memory_operand.  I think this might be a typo.  With
>> this change, expand will not generate addressing mode requiring reload
>> anymore.  I will test another patch fixing this.
>>
>> Thanks,
>> bin
>
> Some comments inline.
>
>>>
>>> R.
>>>
>>> aarch64_legitimize_addr-20151128.txt
>>>
>>>
>>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>>> index 3fe2f0f..5b3e3c4 100644
>>> --- a/gcc/config/aarch64/aarch64.c
>>> +++ b/gcc/config/aarch64/aarch64.c
>>> @@ -4757,13 +4757,65 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x  */, machine_mode mode)
>>>       We try to pick as large a range for the offset as possible to
>>>       maximize the chance of a CSE.  However, for aligned addresses
>>>       we limit the range to 4k so that structures with different sized
>>> -     elements are likely to use the same base.  */
>>> +     elements are likely to use the same base.  We need to be careful
>>> +     not split CONST for some forms address expressions, otherwise it
>
> not to split a CONST for some forms of address expression,
>
>>> +     will generate sub-optimal code.  */
>>>
>>>    if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1)))
>>>      {
>>>        HOST_WIDE_INT offset = INTVAL (XEXP (x, 1));
>>>        HOST_WIDE_INT base_offset;
>>>
>>> +      if (GET_CODE (XEXP (x, 0)) == PLUS)
>>> +    {
>>> +      rtx op0 = XEXP (XEXP (x, 0), 0);
>>> +      rtx op1 = XEXP (XEXP (x, 0), 1);
>>> +
>>> +      /* For addr expression in the form like "r1 + r2 + 0x3ffc".
>>> +         Since the offset is within range supported by addressing
>>> +         mode "reg+offset", we don't split the const and legalize
>>> +         it into below insn and expr sequence:
>>> +           r3 = r1 + r2;
>>> +           "r3 + 0x3ffc".  */
>
> I think this comment would read better as
>
>         /* Address expressions of the form Ra + Rb + CONST.
>
>            If CONST is within the range supported by the addressing
>            mode "reg+offset", do not split CONST and use the
>            sequence
>                 Rt = Ra + Rb
>                 addr = Rt + CONST.  */
>
>>> +      if (REG_P (op0) && REG_P (op1))
>>> +        {
>>> +          machine_mode addr_mode = GET_MODE (x);
>>> +          rtx base = gen_reg_rtx (addr_mode);
>>> +          rtx addr = plus_constant (addr_mode, base, offset);
>>> +
>>> +          if (aarch64_legitimate_address_hook_p (mode, addr, false))
>>> +            {
>>> +              emit_insn (gen_adddi3 (base, op0, op1));
>>> +              return addr;
>>> +            }
>>> +        }
>>> +      /* For addr expression in the form like "r1 + r2<<2 + 0x3ffc".
>>> +         Live above, we don't split the const and legalize it into
>>> +         below insn and expr sequence:
>
> Similarly.
>>> +           r3 = 0x3ffc;
>>> +           r4 = r1 + r3;
>>> +           "r4 + r2<<2".  */
>
> Why don't we generate
>
>   r3 = r1 + r2 << 2
>   r4 = r3 + 0x3ffc
>
> utilizing the shift-and-add instructions?

All other comments are addressed in the attached new patch.
As for this question, Wilco also asked it on an internal channel before.
The main idea is to rely on the GIMPLE IVO/SLSR passes to find CSE
opportunities involving the scaled sub-expression.  The scaled index is
most likely a loop iv, so I would rather split the constant plus out of
the memory reference so that it can be identified and hoisted as a loop
invariant.  This matters even more when the base is sfp-related.
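
As a purely illustrative example, consider a stack array indexed by the
loop iv:

    int
    foo (int n)
    {
      int buf[8192] = { 0 }, s = 0;
      for (int i = 0; i < n; i++)
        s += buf[i + 4095];    /* address is sfp + i*4 + 16380 */
      return s;
    }

Splitting the constant out of the memory reference leaves "sfp + 16380"
as a separate computation that is loop invariant and can be hoisted,
while only the scaled iv remains in the address inside the loop.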

Thanks,
bin

[-- Attachment #2: aarch64_legitimize_addr-20151203.txt --]
[-- Type: text/plain, Size: 2525 bytes --]

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 3fe2f0f..248937e 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4757,13 +4757,67 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x  */, machine_mode mode)
      We try to pick as large a range for the offset as possible to
      maximize the chance of a CSE.  However, for aligned addresses
      we limit the range to 4k so that structures with different sized
-     elements are likely to use the same base.  */
+     elements are likely to use the same base.  We need to be careful
+     not to split a CONST for some forms of address expression, otherwise
+     it will generate sub-optimal code.  */
 
   if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1)))
     {
       HOST_WIDE_INT offset = INTVAL (XEXP (x, 1));
       HOST_WIDE_INT base_offset;
 
+      if (GET_CODE (XEXP (x, 0)) == PLUS)
+	{
+	  rtx op0 = XEXP (XEXP (x, 0), 0);
+	  rtx op1 = XEXP (XEXP (x, 0), 1);
+
+	  /* Address expressions of the form Ra + Rb + CONST.
+
+	     If CONST is within the range supported by the addressing
+	     mode "reg+offset", do not split CONST and use the
+	     sequence
+	       Rt = Ra + Rb;
+	       addr = Rt + CONST.  */
+	  if (REG_P (op0) && REG_P (op1))
+	    {
+	      machine_mode addr_mode = GET_MODE (x);
+	      rtx base = gen_reg_rtx (addr_mode);
+	      rtx addr = plus_constant (addr_mode, base, offset);
+
+	      if (aarch64_legitimate_address_hook_p (mode, addr, false))
+		{
+		  emit_insn (gen_adddi3 (base, op0, op1));
+		  return addr;
+		}
+	    }
+	  /* Address expressions of the form Ra + Rb<<SCALE + CONST.
+
+	     If Reg + Rb<<SCALE is a valid address expression, do not
+	     split CONST and use the sequence
+	       Rc = CONST;
+	       Rt = Ra + Rc;
+	       addr = Rt + Rb<<SCALE.  */
+	  else if (REG_P (op0) || REG_P (op1))
+	    {
+	      machine_mode addr_mode = GET_MODE (x);
+	      rtx base = gen_reg_rtx (addr_mode);
+
+	      /* Switch to make sure that register is in op0.  */
+	      if (REG_P (op1))
+		std::swap (op0, op1);
+
+	      rtx addr = gen_rtx_PLUS (addr_mode, op1, base);
+
+	      if (aarch64_legitimate_address_hook_p (mode, addr, false))
+		{
+		  base = force_operand (plus_constant (addr_mode,
+						       op0, offset),
+					NULL_RTX);
+		  return gen_rtx_PLUS (addr_mode, op1, base);
+		}
+	    }
+	}
+
       /* Does it look like we'll need a load/store-pair operation?  */
       if (GET_MODE_SIZE (mode) > 16
 	  || mode == TImode)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address
  2015-12-03  5:26                   ` Bin.Cheng
@ 2015-12-03 10:26                     ` Richard Earnshaw
  2015-12-04  3:18                       ` Bin.Cheng
  0 siblings, 1 reply; 18+ messages in thread
From: Richard Earnshaw @ 2015-12-03 10:26 UTC (permalink / raw)
  To: Bin.Cheng; +Cc: James Greenhalgh, Bin Cheng, gcc-patches List

On 03/12/15 05:26, Bin.Cheng wrote:
> On Tue, Dec 1, 2015 at 6:25 PM, Richard Earnshaw
> <Richard.Earnshaw@foss.arm.com> wrote:
>> On 01/12/15 03:19, Bin.Cheng wrote:
>>> On Tue, Nov 24, 2015 at 6:18 PM, Richard Earnshaw
>>> <Richard.Earnshaw@foss.arm.com> wrote:
>>>> On 24/11/15 09:56, Richard Earnshaw wrote:
>>>>> On 24/11/15 02:51, Bin.Cheng wrote:
>>>>>>>> The aarch64's problem is we don't define addptr3 pattern, and we don't
>>>>>>>>>> have direct insn pattern describing the "x + y << z".  According to
>>>>>>>>>> gcc internal:
>>>>>>>>>>
>>>>>>>>>> ‘addptrm3’
>>>>>>>>>> Like addm3 but is guaranteed to only be used for address calculations.
>>>>>>>>>> The expanded code is not allowed to clobber the condition code. It
>>>>>>>>>> only needs to be defined if addm3 sets the condition code.
>>>>>>>>
>>>>>>>> addm3 on aarch64 does not set the condition codes, so by this rule we
>>>>>>>> shouldn't need to define this pattern.
>>>>>> Hi Richard,
>>>>>> I think that rule has a prerequisite that backend needs to support
>>>>>> register shifted addition in addm3 pattern.
>>>>>
>>>>> addm3 is a named pattern and its format is well defined.  It does not
>>>>> take a shifted operand and never has.
>>>>>
>>>>>> Apparently for AArch64,
>>>>>> addm3 only supports "reg+reg" or "reg+imm".  Also we don't really
>>>>>> "does not set the condition codes" actually, because both
>>>>>> "adds_shift_imm_*" and "adds_mul_imm_*" do set the condition flags.
>>>>>
>>>>> You appear to be confusing named patterns (used by expand) with
>>>>> recognizers.  Anyway, we have
>>>>>
>>>>> (define_insn "*add_<shift>_<mode>"
>>>>>   [(set (match_operand:GPI 0 "register_operand" "=r")
>>>>>         (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" "r")
>>>>>                               (match_operand:QI 2
>>>>> "aarch64_shift_imm_<mode>" "n"))
>>>>>                   (match_operand:GPI 3 "register_operand" "r")))]
>>>>>
>>>>> Which is a non-flag setting add with shifted operand.
>>>>>
>>>>>> Either way I think it is another backend issue, so do you approve that
>>>>>> I commit this patch now?
>>>>>
>>>>> Not yet.  I think there's something fundamental amiss here.
>>>>>
>>>>> BTW, it looks to me as though addptr<m>3 should have exactly the same
>>>>> operand rules as add<m>3 (documentation reads "like add<m>3"), so a
>>>>> shifted operand shouldn't be supported there either.  If that isn't the
>>>>> case then that should be clearly called out in the documentation.
>>>>>
>>>>> R.
>>>>>
>>>>
>>>> PS.
>>>>
>>>> I presume you are aware of the canonicalization rules for add?  That is,
>>>> for a shift-and-add operation, the shift operand must appear first.  Ie.
>>>>
>>>> (plus (shift (op, op)), op)
>>>>
>>>> not
>>>>
>>>> (plus (op, (shift (op, op))
>>>
>>> Hi Richard,
>>> Thanks for the comments.  I realized that the not-recognized insn
>>> issue is because the original patch build non-canonical expressions.
>>> When reloading address expression, LRA generates non-canonical
>>> register scaled insn, which can't be recognized by aarch64 backend.
>>>
>>> Here is the updated patch using canonical form pattern,  it passes
>>> bootstrap and regression test.  Well, the ivo failure still exists,
>>> but it analyzed in the original message.
>>>
>>> Is this patch OK?
>>>
>>> As for Jiong's concern about the additional extension instruction, I
>>> think this only stands for atmoic load store instructions.  For
>>> general load store, AArch64 supports zext/sext in register scaling
>>> addressing mode, the additional instruction can be forward propagated
>>> into memory reference.  The problem for atomic load store is AArch64
>>> only supports direct register addressing mode.  After LRA reloads
>>> address expression out of memory reference, there is no combine/fwprop
>>> optimizer to merge instructions.  The problem is atomic_store's
>>> predicate doesn't match its constraint.   The predicate used for
>>> atomic_store<mode> is memory_operand, while all other atomic patterns
>>> use aarch64_sync_memory_operand.  I think this might be a typo.  With
>>> this change, expand will not generate addressing mode requiring reload
>>> anymore.  I will test another patch fixing this.
>>>
>>> Thanks,
>>> bin
>>
>> Some comments inline.
>>
>>>>
>>>> R.
>>>>
>>>> aarch64_legitimize_addr-20151128.txt
>>>>
>>>>
>>>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>>>> index 3fe2f0f..5b3e3c4 100644
>>>> --- a/gcc/config/aarch64/aarch64.c
>>>> +++ b/gcc/config/aarch64/aarch64.c
>>>> @@ -4757,13 +4757,65 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x  */, machine_mode mode)
>>>>       We try to pick as large a range for the offset as possible to
>>>>       maximize the chance of a CSE.  However, for aligned addresses
>>>>       we limit the range to 4k so that structures with different sized
>>>> -     elements are likely to use the same base.  */
>>>> +     elements are likely to use the same base.  We need to be careful
>>>> +     not split CONST for some forms address expressions, otherwise it
>>
>> not to split a CONST for some forms of address expression,
>>
>>>> +     will generate sub-optimal code.  */
>>>>
>>>>    if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1)))
>>>>      {
>>>>        HOST_WIDE_INT offset = INTVAL (XEXP (x, 1));
>>>>        HOST_WIDE_INT base_offset;
>>>>
>>>> +      if (GET_CODE (XEXP (x, 0)) == PLUS)
>>>> +    {
>>>> +      rtx op0 = XEXP (XEXP (x, 0), 0);
>>>> +      rtx op1 = XEXP (XEXP (x, 0), 1);
>>>> +
>>>> +      /* For addr expression in the form like "r1 + r2 + 0x3ffc".
>>>> +         Since the offset is within range supported by addressing
>>>> +         mode "reg+offset", we don't split the const and legalize
>>>> +         it into below insn and expr sequence:
>>>> +           r3 = r1 + r2;
>>>> +           "r3 + 0x3ffc".  */
>>
>> I think this comment would read better as
>>
>>         /* Address expressions of the form Ra + Rb + CONST.
>>
>>            If CONST is within the range supported by the addressing
>>            mode "reg+offset", do not split CONST and use the
>>            sequence
>>                 Rt = Ra + Rb
>>                 addr = Rt + CONST.  */
>>
>>>> +      if (REG_P (op0) && REG_P (op1))
>>>> +        {
>>>> +          machine_mode addr_mode = GET_MODE (x);
>>>> +          rtx base = gen_reg_rtx (addr_mode);
>>>> +          rtx addr = plus_constant (addr_mode, base, offset);
>>>> +
>>>> +          if (aarch64_legitimate_address_hook_p (mode, addr, false))
>>>> +            {
>>>> +              emit_insn (gen_adddi3 (base, op0, op1));
>>>> +              return addr;
>>>> +            }
>>>> +        }
>>>> +      /* For addr expression in the form like "r1 + r2<<2 + 0x3ffc".
>>>> +         Live above, we don't split the const and legalize it into
>>>> +         below insn and expr sequence:
>>
>> Similarly.
>>>> +           r3 = 0x3ffc;
>>>> +           r4 = r1 + r3;
>>>> +           "r4 + r2<<2".  */
>>
>> Why don't we generate
>>
>>   r3 = r1 + r2 << 2
>>   r4 = r3 + 0x3ffc
>>
>> utilizing the shift-and-add instructions?
> 
> All other comments are addressed in the attached new patch.
> As for this question, Wilco also asked it on internal channel before.
> The main idea is to depend on GIMPLE IVO/SLSR to find CSE
> opportunities of the scaled plus sub expr.  The scaled index is most
> likely loop iv, so I would like to split const plus out of memory
> reference so that it can be identified/hoisted as loop invariant.
> This is more important when base is sfp related.
> 

Ah, yes.  The SFP problem.

Since at least two people have queried this, it's clearly non-obvious
enough to require an explanation in the comment.

OK with that change and a suitable changelog entry.

R.

> Thanks,
> bin
> 
> 
> aarch64_legitimize_addr-20151203.txt
> 
> 
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 3fe2f0f..248937e 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -4757,13 +4757,67 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x  */, machine_mode mode)
>       We try to pick as large a range for the offset as possible to
>       maximize the chance of a CSE.  However, for aligned addresses
>       we limit the range to 4k so that structures with different sized
> -     elements are likely to use the same base.  */
> +     elements are likely to use the same base.  We need to be careful
> +     not to split a CONST for some forms of address expression, otherwise
> +     it will generate sub-optimal code.  */
>  
>    if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1)))
>      {
>        HOST_WIDE_INT offset = INTVAL (XEXP (x, 1));
>        HOST_WIDE_INT base_offset;
>  
> +      if (GET_CODE (XEXP (x, 0)) == PLUS)
> +	{
> +	  rtx op0 = XEXP (XEXP (x, 0), 0);
> +	  rtx op1 = XEXP (XEXP (x, 0), 1);
> +
> +	  /* Address expressions of the form Ra + Rb + CONST.
> +
> +	     If CONST is within the range supported by the addressing
> +	     mode "reg+offset", do not split CONST and use the
> +	     sequence
> +	       Rt = Ra + Rb;
> +	       addr = Rt + CONST.  */
> +	  if (REG_P (op0) && REG_P (op1))
> +	    {
> +	      machine_mode addr_mode = GET_MODE (x);
> +	      rtx base = gen_reg_rtx (addr_mode);
> +	      rtx addr = plus_constant (addr_mode, base, offset);
> +
> +	      if (aarch64_legitimate_address_hook_p (mode, addr, false))
> +		{
> +		  emit_insn (gen_adddi3 (base, op0, op1));
> +		  return addr;
> +		}
> +	    }
> +	  /* Address expressions of the form Ra + Rb<<SCALE + CONST.
> +
> +	     If Reg + Rb<<SCALE is a valid address expression, do not
> +	     split CONST and use the sequence
> +	       Rc = CONST;
> +	       Rt = Ra + Rc;
> +	       addr = Rt + Rb<<SCALE.  */
> +	  else if (REG_P (op0) || REG_P (op1))
> +	    {
> +	      machine_mode addr_mode = GET_MODE (x);
> +	      rtx base = gen_reg_rtx (addr_mode);
> +
> +	      /* Switch to make sure that register is in op0.  */
> +	      if (REG_P (op1))
> +		std::swap (op0, op1);
> +
> +	      rtx addr = gen_rtx_PLUS (addr_mode, op1, base);
> +
> +	      if (aarch64_legitimate_address_hook_p (mode, addr, false))
> +		{
> +		  base = force_operand (plus_constant (addr_mode,
> +						       op0, offset),
> +					NULL_RTX);
> +		  return gen_rtx_PLUS (addr_mode, op1, base);
> +		}
> +	    }
> +	}
> +
>        /* Does it look like we'll need a load/store-pair operation?  */
>        if (GET_MODE_SIZE (mode) > 16
>  	  || mode == TImode)
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address
  2015-12-03 10:26                     ` Richard Earnshaw
@ 2015-12-04  3:18                       ` Bin.Cheng
  0 siblings, 0 replies; 18+ messages in thread
From: Bin.Cheng @ 2015-12-04  3:18 UTC (permalink / raw)
  To: Richard Earnshaw; +Cc: James Greenhalgh, Bin Cheng, gcc-patches List

[-- Attachment #1: Type: text/plain, Size: 8588 bytes --]

On Thu, Dec 3, 2015 at 6:26 PM, Richard Earnshaw
<Richard.Earnshaw@foss.arm.com> wrote:
> On 03/12/15 05:26, Bin.Cheng wrote:
>> On Tue, Dec 1, 2015 at 6:25 PM, Richard Earnshaw
>> <Richard.Earnshaw@foss.arm.com> wrote:
>>> On 01/12/15 03:19, Bin.Cheng wrote:
>>>> On Tue, Nov 24, 2015 at 6:18 PM, Richard Earnshaw
>>>> <Richard.Earnshaw@foss.arm.com> wrote:
>>>>> On 24/11/15 09:56, Richard Earnshaw wrote:
>>>>>> On 24/11/15 02:51, Bin.Cheng wrote:
>>>>>>>>> The aarch64's problem is we don't define addptr3 pattern, and we don't
>>>>>>>>>>> have direct insn pattern describing the "x + y << z".  According to
>>>>>>>>>>> gcc internal:
>>>>>>>>>>>
>>>>>>>>>>> ‘addptrm3’
>>>>>>>>>>> Like addm3 but is guaranteed to only be used for address calculations.
>>>>>>>>>>> The expanded code is not allowed to clobber the condition code. It
>>>>>>>>>>> only needs to be defined if addm3 sets the condition code.
>>>>>>>>>
>>>>>>>>> addm3 on aarch64 does not set the condition codes, so by this rule we
>>>>>>>>> shouldn't need to define this pattern.
>>>>>>> Hi Richard,
>>>>>>> I think that rule has a prerequisite that backend needs to support
>>>>>>> register shifted addition in addm3 pattern.
>>>>>>
>>>>>> addm3 is a named pattern and its format is well defined.  It does not
>>>>>> take a shifted operand and never has.
>>>>>>
>>>>>>> Apparently for AArch64,
>>>>>>> addm3 only supports "reg+reg" or "reg+imm".  Also we don't really
>>>>>>> "does not set the condition codes" actually, because both
>>>>>>> "adds_shift_imm_*" and "adds_mul_imm_*" do set the condition flags.
>>>>>>
>>>>>> You appear to be confusing named patterns (used by expand) with
>>>>>> recognizers.  Anyway, we have
>>>>>>
>>>>>> (define_insn "*add_<shift>_<mode>"
>>>>>>   [(set (match_operand:GPI 0 "register_operand" "=r")
>>>>>>         (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" "r")
>>>>>>                               (match_operand:QI 2
>>>>>> "aarch64_shift_imm_<mode>" "n"))
>>>>>>                   (match_operand:GPI 3 "register_operand" "r")))]
>>>>>>
>>>>>> Which is a non-flag setting add with shifted operand.
>>>>>>
>>>>>>> Either way I think it is another backend issue, so do you approve that
>>>>>>> I commit this patch now?
>>>>>>
>>>>>> Not yet.  I think there's something fundamental amiss here.
>>>>>>
>>>>>> BTW, it looks to me as though addptr<m>3 should have exactly the same
>>>>>> operand rules as add<m>3 (documentation reads "like add<m>3"), so a
>>>>>> shifted operand shouldn't be supported there either.  If that isn't the
>>>>>> case then that should be clearly called out in the documentation.
>>>>>>
>>>>>> R.
>>>>>>
>>>>>
>>>>> PS.
>>>>>
>>>>> I presume you are aware of the canonicalization rules for add?  That is,
>>>>> for a shift-and-add operation, the shift operand must appear first.  Ie.
>>>>>
>>>>> (plus (shift (op, op)), op)
>>>>>
>>>>> not
>>>>>
>>>>> (plus (op, (shift (op, op))
>>>>
>>>> Hi Richard,
>>>> Thanks for the comments.  I realized that the not-recognized insn
>>>> issue is because the original patch build non-canonical expressions.
>>>> When reloading address expression, LRA generates non-canonical
>>>> register scaled insn, which can't be recognized by aarch64 backend.
>>>>
>>>> Here is the updated patch using canonical form pattern,  it passes
>>>> bootstrap and regression test.  Well, the ivo failure still exists,
>>>> but it analyzed in the original message.
>>>>
>>>> Is this patch OK?
>>>>
>>>> As for Jiong's concern about the additional extension instruction, I
>>>> think this only stands for atmoic load store instructions.  For
>>>> general load store, AArch64 supports zext/sext in register scaling
>>>> addressing mode, the additional instruction can be forward propagated
>>>> into memory reference.  The problem for atomic load store is AArch64
>>>> only supports direct register addressing mode.  After LRA reloads
>>>> address expression out of memory reference, there is no combine/fwprop
>>>> optimizer to merge instructions.  The problem is atomic_store's
>>>> predicate doesn't match its constraint.   The predicate used for
>>>> atomic_store<mode> is memory_operand, while all other atomic patterns
>>>> use aarch64_sync_memory_operand.  I think this might be a typo.  With
>>>> this change, expand will not generate addressing mode requiring reload
>>>> anymore.  I will test another patch fixing this.
>>>>
>>>> Thanks,
>>>> bin
>>>
>>> Some comments inline.
>>>
>>>>>
>>>>> R.
>>>>>
>>>>> aarch64_legitimize_addr-20151128.txt
>>>>>
>>>>>
>>>>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>>>>> index 3fe2f0f..5b3e3c4 100644
>>>>> --- a/gcc/config/aarch64/aarch64.c
>>>>> +++ b/gcc/config/aarch64/aarch64.c
>>>>> @@ -4757,13 +4757,65 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x  */, machine_mode mode)
>>>>>       We try to pick as large a range for the offset as possible to
>>>>>       maximize the chance of a CSE.  However, for aligned addresses
>>>>>       we limit the range to 4k so that structures with different sized
>>>>> -     elements are likely to use the same base.  */
>>>>> +     elements are likely to use the same base.  We need to be careful
>>>>> +     not split CONST for some forms address expressions, otherwise it
>>>
>>> not to split a CONST for some forms of address expression,
>>>
>>>>> +     will generate sub-optimal code.  */
>>>>>
>>>>>    if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1)))
>>>>>      {
>>>>>        HOST_WIDE_INT offset = INTVAL (XEXP (x, 1));
>>>>>        HOST_WIDE_INT base_offset;
>>>>>
>>>>> +      if (GET_CODE (XEXP (x, 0)) == PLUS)
>>>>> +    {
>>>>> +      rtx op0 = XEXP (XEXP (x, 0), 0);
>>>>> +      rtx op1 = XEXP (XEXP (x, 0), 1);
>>>>> +
>>>>> +      /* For addr expression in the form like "r1 + r2 + 0x3ffc".
>>>>> +         Since the offset is within range supported by addressing
>>>>> +         mode "reg+offset", we don't split the const and legalize
>>>>> +         it into below insn and expr sequence:
>>>>> +           r3 = r1 + r2;
>>>>> +           "r3 + 0x3ffc".  */
>>>
>>> I think this comment would read better as
>>>
>>>         /* Address expressions of the form Ra + Rb + CONST.
>>>
>>>            If CONST is within the range supported by the addressing
>>>            mode "reg+offset", do not split CONST and use the
>>>            sequence
>>>                 Rt = Ra + Rb
>>>                 addr = Rt + CONST.  */
>>>
>>>>> +      if (REG_P (op0) && REG_P (op1))
>>>>> +        {
>>>>> +          machine_mode addr_mode = GET_MODE (x);
>>>>> +          rtx base = gen_reg_rtx (addr_mode);
>>>>> +          rtx addr = plus_constant (addr_mode, base, offset);
>>>>> +
>>>>> +          if (aarch64_legitimate_address_hook_p (mode, addr, false))
>>>>> +            {
>>>>> +              emit_insn (gen_adddi3 (base, op0, op1));
>>>>> +              return addr;
>>>>> +            }
>>>>> +        }
>>>>> +      /* For addr expression in the form like "r1 + r2<<2 + 0x3ffc".
>>>>> +         Live above, we don't split the const and legalize it into
>>>>> +         below insn and expr sequence:
>>>
>>> Similarly.
>>>>> +           r3 = 0x3ffc;
>>>>> +           r4 = r1 + r3;
>>>>> +           "r4 + r2<<2".  */
>>>
>>> Why don't we generate
>>>
>>>   r3 = r1 + r2 << 2
>>>   r4 = r3 + 0x3ffc
>>>
>>> utilizing the shift-and-add instructions?
>>
>> All other comments are addressed in the attached new patch.
>> As for this question, Wilco also asked it on internal channel before.
>> The main idea is to depend on GIMPLE IVO/SLSR to find CSE
>> opportunities of the scaled plus sub expr.  The scaled index is most
>> likely loop iv, so I would like to split const plus out of memory
>> reference so that it can be identified/hoisted as loop invariant.
>> This is more important when base is sfp related.
>>
>
> Ah, yes.  The SFP problem.
>
> Since at least two people have queried this, it's clearly non-obvious
> enough to require explanation in comment.
>
> OK with that change and a suitable changelog entry.

Given your review comments, I am going to apply the attached patch
along with the ChangeLog entry below.

Thanks,
bin


2015-12-04  Bin Cheng  <bin.cheng@arm.com>
        Jiong Wang  <jiong.wang@arm.com>

    * config/aarch64/aarch64.c (aarch64_legitimize_address): Legitimize
    address expressions like Ra + Rb + CONST and Ra + Rb<<SCALE + CONST.

[-- Attachment #2: aarch64_legitimize_addr-20151204.txt --]
[-- Type: text/plain, Size: 2943 bytes --]

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 3fe2f0f..7cfdda1 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4757,13 +4757,75 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x  */, machine_mode mode)
      We try to pick as large a range for the offset as possible to
      maximize the chance of a CSE.  However, for aligned addresses
      we limit the range to 4k so that structures with different sized
-     elements are likely to use the same base.  */
+     elements are likely to use the same base.  We need to be careful
+     not to split a CONST for some forms of address expression, otherwise
+     it will generate sub-optimal code.  */
 
   if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1)))
     {
       HOST_WIDE_INT offset = INTVAL (XEXP (x, 1));
       HOST_WIDE_INT base_offset;
 
+      if (GET_CODE (XEXP (x, 0)) == PLUS)
+	{
+	  rtx op0 = XEXP (XEXP (x, 0), 0);
+	  rtx op1 = XEXP (XEXP (x, 0), 1);
+
+	  /* Address expressions of the form Ra + Rb + CONST.
+
+	     If CONST is within the range supported by the addressing
+	     mode "reg+offset", do not split CONST and use the
+	     sequence
+	       Rt = Ra + Rb;
+	       addr = Rt + CONST.  */
+	  if (REG_P (op0) && REG_P (op1))
+	    {
+	      machine_mode addr_mode = GET_MODE (x);
+	      rtx base = gen_reg_rtx (addr_mode);
+	      rtx addr = plus_constant (addr_mode, base, offset);
+
+	      if (aarch64_legitimate_address_hook_p (mode, addr, false))
+		{
+		  emit_insn (gen_adddi3 (base, op0, op1));
+		  return addr;
+		}
+	    }
+	  /* Address expressions of the form Ra + Rb<<SCALE + CONST.
+
+	     If Reg + Rb<<SCALE is a valid address expression, do not
+	     split CONST and use the sequence
+	       Rc = CONST;
+	       Rt = Ra + Rc;
+	       addr = Rt + Rb<<SCALE.
+
+	     Here we split CONST out of memory reference because:
+	       a) We depend on GIMPLE optimizers to pick up common sub
+		  expression involving the scaling operation.
+	       b) The index Rb is likely a loop iv, it's better to split
+		  the CONST so that computation of new base Rt is a loop
+		  invariant and can be moved out of loop.  This is more
+		  important when the original base Ra is sfp related.  */
+	  else if (REG_P (op0) || REG_P (op1))
+	    {
+	      machine_mode addr_mode = GET_MODE (x);
+	      rtx base = gen_reg_rtx (addr_mode);
+
+	      /* Switch to make sure that register is in op0.  */
+	      if (REG_P (op1))
+		std::swap (op0, op1);
+
+	      rtx addr = gen_rtx_PLUS (addr_mode, op1, base);
+
+	      if (aarch64_legitimate_address_hook_p (mode, addr, false))
+		{
+		  base = force_operand (plus_constant (addr_mode,
+						       op0, offset),
+					NULL_RTX);
+		  return gen_rtx_PLUS (addr_mode, op1, base);
+		}
+	    }
+	}
+
       /* Does it look like we'll need a load/store-pair operation?  */
       if (GET_MODE_SIZE (mode) > 16
 	  || mode == TImode)

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2015-12-04  3:18 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-17  9:21 [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address Bin Cheng
2015-11-17 10:08 ` James Greenhalgh
2015-11-19  2:32   ` Bin.Cheng
2015-11-20  8:31     ` Bin.Cheng
2015-11-20 17:39       ` Richard Earnshaw
2015-11-24  3:23         ` Bin.Cheng
2015-11-24  9:59           ` Richard Earnshaw
2015-11-24 10:21             ` Richard Earnshaw
2015-11-24 13:13               ` Jiong Wang
2015-11-24 13:29                 ` Richard Earnshaw
2015-11-24 14:39                   ` Jiong Wang
2015-11-24 14:55                     ` Richard Earnshaw
2015-12-01  3:19               ` Bin.Cheng
2015-12-01 10:25                 ` Richard Earnshaw
2015-12-03  5:26                   ` Bin.Cheng
2015-12-03 10:26                     ` Richard Earnshaw
2015-12-04  3:18                       ` Bin.Cheng
2015-11-25  4:53             ` Bin.Cheng

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).