[PATCH PR69052]Check if loop inv can be propagated into mem ref with additional addr expr canonicalization

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH PR69052]Check if loop inv can be propagated into mem ref with additional addr expr canonicalization
@ 2016-02-09 11:08 Bin Cheng
  2016-02-11  7:14 ` Jeff Law
  0 siblings, 1 reply; 12+ messages in thread
From: Bin Cheng @ 2016-02-09 11:08 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd

[-- Attachment #1: Type: text/plain, Size: 1877 bytes --]

Hi,
When counting cost for loop inv, GCC checks if a loop inv can be propagated into its use site (a memory reference).  If it cannot be propagated, we increase its cost so that it's expensive enough to be hoisted out of loop.  Currently we simply replace loop inv register in the use site with its definition expression, then call validate_changes to check if the result insn is valid.  This is weak because validate_changes doesn't take canonicalization into consideration.  Given below example:

  Loop inv def:  
   69: r149:SI=r87:SI+const(unspec[`xxxx'] 1)
      REG_DEAD r87:SI
  Loop inv use:
   70: r150:SI=[r90:SI*0x4+r149:SI]
      REG_DEAD r149:SI

The address expression after propagation is "r90 * 0x4 + (r87 + const(unspec[`xxxx']))".  Function validate_changes simply returns false to it.  As a matter of fact, the propagation is feasible if we canonicalize address expression into the form like "(r90 * 0x4 + r87) + const(unspec[`xxxx'])".

This patch fixes the problem by canonicalizing address expression and verifying if the new addr is valid.  The canonicalization follows GCC insn canonicalization rules.  The test case from bugzilla PR is also included.
As for the canonicalize_address interface, there is another canonicalize_address in fwprop.c which only changes shift into mult.  I think it would be good to factor out a common RTL interface in GCC, but that's stage1 work.

Bootstrap and test on x86_64 and AArch64.  Is it OK?

Thanks,
bin

2016-02-09  Bin Cheng  <bin.cheng@arm.com>

	PR tree-optimization/69052
	* loop-invariant.c (canonicalize_address): New function.
	(inv_can_prop_to_addr_use): Check validity of address expression
	which is canonicalized by above function.

gcc/testsuite/ChangeLog
2016-02-09  Bin Cheng  <bin.cheng@arm.com>

	PR tree-optimization/69052
	* gcc.target/i386/pr69052.c: New test.

[-- Attachment #2: pr69052-20160204.txt --]
[-- Type: text/plain, Size: 4775 bytes --]

diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
index 707f044..157e273 100644
--- a/gcc/loop-invariant.c
+++ b/gcc/loop-invariant.c
@@ -754,6 +754,74 @@ create_new_invariant (struct def *def, rtx_insn *insn, bitmap depends_on,
   return inv;
 }
 
+/* Returns a canonical version address for X.  It identifies
+   addr expr in the form of A + B + C.  Following instruction
+   canonicalization rules, MULT operand is moved to the front,
+   CONST operand is moved to the end; also PLUS operators are
+   chained to the left.  */
+
+static rtx
+canonicalize_address (rtx x)
+{
+  rtx op0, op1, op2;
+  machine_mode mode = GET_MODE (x);
+  enum rtx_code code = GET_CODE (x);
+
+  if (code != PLUS)
+    return x;
+
+  /* Extract operands from A + B (+ C).  */
+  if (GET_CODE (XEXP (x, 0)) == PLUS)
+    {
+      op0 = XEXP (XEXP (x, 0), 0);
+      op1 = XEXP (XEXP (x, 0), 1);
+      op2 = XEXP (x, 1);
+    }
+  else if (GET_CODE (XEXP (x, 1)) == PLUS)
+    {
+      op0 = XEXP (x, 0);
+      op1 = XEXP (XEXP (x, 1), 0);
+      op2 = XEXP (XEXP (x, 1), 1);
+    }
+  else
+    {
+      op0 = XEXP (x, 0);
+      op1 = XEXP (x, 1);
+      op2 = NULL_RTX;
+    }
+
+  /* Move MULT operand to the front.  */
+  if (!REG_P (op1) && !CONST_INT_P (op1))
+    std::swap (op0, op1);
+
+  /* Move CONST operand to the end.  */
+  if (CONST_INT_P (op0))
+    std::swap (op0, op1);
+
+  if (op2 != NULL && CONST_INT_P (op1))
+    {
+      /* Try to simplify CONST1 + CONST2 into one operand.  */
+      if (CONST_INT_P (op2))
+	{
+	  rtx x = simplify_binary_operation (PLUS, mode, op1, op2);
+
+	  if (x != NULL_RTX && CONST_INT_P (x))
+	    {
+	      op1 = x;
+	      op2 = NULL_RTX;
+	    }
+	}
+      else
+	std::swap (op1, op2);
+    }
+  /* Chain PLUS operators to the left.  */
+  op0 = simplify_gen_binary (PLUS, mode, op0, op1);
+  if (op2 == NULL_RTX)
+    return op0;
+  else
+    return simplify_gen_binary (PLUS, mode, op0, op2);
+}
+
 /* Given invariant DEF and its address USE, check if the corresponding
    invariant expr can be propagated into the use or not.  */
 
@@ -761,7 +829,7 @@ static bool
 inv_can_prop_to_addr_use (struct def *def, df_ref use)
 {
   struct invariant *inv;
-  rtx *pos = DF_REF_REAL_LOC (use), def_set;
+  rtx *pos = DF_REF_REAL_LOC (use), def_set, use_set;
   rtx_insn *use_insn = DF_REF_INSN (use);
   rtx_insn *def_insn;
   bool ok;
@@ -778,6 +846,29 @@ inv_can_prop_to_addr_use (struct def *def, df_ref use)
 
   validate_unshare_change (use_insn, pos, SET_SRC (def_set), true);
   ok = verify_changes (0);
+  /* Try harder with canonicalization in address expression.  */
+  if (!ok && (use_set = single_set (use_insn)) != NULL_RTX)
+    {
+      rtx src, dest, mem = NULL_RTX;
+
+      src = SET_SRC (use_set);
+      dest = SET_DEST (use_set);
+      if (MEM_P (src))
+	mem = src;
+      else if (MEM_P (dest))
+	mem = dest;
+
+      if (mem != NULL_RTX
+	  && !memory_address_addr_space_p (GET_MODE (mem),
+					   XEXP (mem, 0),
+					   MEM_ADDR_SPACE (mem)))
+	{
+	  rtx addr = canonicalize_address (copy_rtx (XEXP (mem, 0)));
+	  if (memory_address_addr_space_p (GET_MODE (mem),
+					   addr, MEM_ADDR_SPACE (mem)))
+	    ok = true;
+	}
+    }
   cancel_changes (0);
   return ok;
 }
diff --git a/gcc/testsuite/gcc.target/i386/pr69052.c b/gcc/testsuite/gcc.target/i386/pr69052.c
new file mode 100644
index 0000000..6f491e9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr69052.c
@@ -0,0 +1,54 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target pie } */
+/* { dg-options "-O2 -fPIE -pie" } */
+
+int look_nbits[256], loop_sym[256];
+const int ind[] = {
+  0,  1,  8, 16,  9,  2,  3, 10, 17, 24, 32, 25, 18, 11,  4,  5,
+ 12, 19, 26, 33, 40, 48, 41, 34, 27, 20, 13,  6,  7, 14, 21, 28,
+ 35, 42, 49, 56, 57, 50, 43, 36, 29, 22, 15, 23, 30, 37, 44, 51,
+ 58, 59, 52, 45, 38, 31, 39, 46, 53, 60, 61, 54, 47, 55, 62, 63
+};
+int out[256];
+extern void bar (int *, int *);
+void foo (int *l1, int *l2, int *v, int *v1, int *m1, int i)
+{
+  int L = i + 1, b = 20;
+  int result, k;
+
+  for (k = 1; k < 64; k++)
+    {
+      int look = (((L >> (b - 8))) & ((1 << 8) - 1));
+      int nb = l1[look];
+      int code;
+      int r;
+
+      if (nb)
+	{
+	  b -= nb;
+	  result = l2[look];
+	}
+      else
+	{
+	  nb = 9;
+	  code = (((L >> (b -= nb))) & ((1 << nb) - 1));
+	  result = v[(code + v1[nb])];
+	}
+      r = result >> 4;
+      result &= 15;
+      if (result)
+	{
+	  k += r;
+	  r = (((L >> (b -= result))) & ((1 << result) - 1));
+	  if (r < (1 << (result - 1)))
+	    result = r + (((-1) << result) + 1);
+	  else
+	    result = r;
+
+	  out[ind[k]] = result;
+	}
+      bar (&L, &b);
+    }
+}
+
+/* { dg-final { scan-assembler-not "leal\[ \t\]ind@GOTOFF\\(%\[^,\]*\\), %" { target ia32 } } } */

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH PR69052]Check if loop inv can be propagated into mem ref with additional addr expr canonicalization
  2016-02-09 11:08 [PATCH PR69052]Check if loop inv can be propagated into mem ref with additional addr expr canonicalization Bin Cheng
@ 2016-02-11  7:14 ` Jeff Law
  2016-02-11 17:59   ` Bin.Cheng
  0 siblings, 1 reply; 12+ messages in thread
From: Jeff Law @ 2016-02-11  7:14 UTC (permalink / raw)
  To: Bin Cheng, gcc-patches; +Cc: nd

On 02/09/2016 04:08 AM, Bin Cheng wrote:
> Hi,
> When counting cost for loop inv, GCC checks if a loop inv can be propagated into its use site (a memory reference).  If it cannot be propagated, we increase its cost so that it's expensive enough to be hoisted out of loop.  Currently we simply replace loop inv register in the use site with its definition expression, then call validate_changes to check if the result insn is valid.  This is weak because validate_changes doesn't take canonicalization into consideration.  Given below example:
>
>    Loop inv def:
>     69: r149:SI=r87:SI+const(unspec[`xxxx'] 1)
>        REG_DEAD r87:SI
>    Loop inv use:
>     70: r150:SI=[r90:SI*0x4+r149:SI]
>        REG_DEAD r149:SI
>
> The address expression after propagation is "r90 * 0x4 + (r87 + const(unspec[`xxxx']))".  Function validate_changes simply returns false to it.  As a matter of fact, the propagation is feasible if we canonicalize address expression into the form like "(r90 * 0x4 + r87) + const(unspec[`xxxx'])".
>
> This patch fixes the problem by canonicalizing address expression and verifying if the new addr is valid.  The canonicalization follows GCC insn canonicalization rules.  The test case from bugzilla PR is also included.
> As for the canonicalize_address interface, there is another canonicalize_address in fwprop.c which only changes shift into mult.  I think it would be good to factor out a common RTL interface in GCC, but that's stage1 work.

Also note there's bits in combine that will canonicalize appropriate 
shifts into mults.  Clearly there's a need for some generalized routines 
to take a fairly generic address and perform canonicalizations and 
simplifications on it.

> Bootstrap and test on x86_64 and AArch64.  Is it OK?
>
> Thanks,
> bin
>
> 2016-02-09  Bin Cheng<bin.cheng@arm.com>
>
> 	PR tree-optimization/69052
> 	* loop-invariant.c (canonicalize_address): New function.
> 	(inv_can_prop_to_addr_use): Check validity of address expression
> 	which is canonicalized by above function.
>
> gcc/testsuite/ChangeLog
> 2016-02-09  Bin Cheng<bin.cheng@arm.com>
>
> 	PR tree-optimization/69052
> 	* gcc.target/i386/pr69052.c: New test.
>
>
> pr69052-20160204.txt
>
>
> diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
> index 707f044..157e273 100644
> --- a/gcc/loop-invariant.c
> +++ b/gcc/loop-invariant.c
> @@ -754,6 +754,74 @@ create_new_invariant (struct def *def, rtx_insn *insn, bitmap depends_on,
>     return inv;
>   }
>
> +/* Returns a canonical version address for X.  It identifies
> +   addr expr in the form of A + B + C.  Following instruction
> +   canonicalization rules, MULT operand is moved to the front,
> +   CONST operand is moved to the end; also PLUS operators are
> +   chained to the left.  */
> +
> +static rtx
> +canonicalize_address (rtx x)
> +{
> +  rtx op0, op1, op2;
> +  machine_mode mode = GET_MODE (x);
> +  enum rtx_code code = GET_CODE (x);
> +
> +  if (code != PLUS)
> +    return x;
> +
> +  /* Extract operands from A + B (+ C).  */
> +  if (GET_CODE (XEXP (x, 0)) == PLUS)
> +    {
> +      op0 = XEXP (XEXP (x, 0), 0);
> +      op1 = XEXP (XEXP (x, 0), 1);
> +      op2 = XEXP (x, 1);
> +    }
> +  else if (GET_CODE (XEXP (x, 1)) == PLUS)
> +    {
> +      op0 = XEXP (x, 0);
> +      op1 = XEXP (XEXP (x, 1), 0);
> +      op2 = XEXP (XEXP (x, 1), 1);
> +    }
> +  else
> +    {
> +      op0 = XEXP (x, 0);
> +      op1 = XEXP (x, 1);
> +      op2 = NULL_RTX;
> +    }
> +
> +  /* Move MULT operand to the front.  */
> +  if (!REG_P (op1) && !CONST_INT_P (op1))
> +    std::swap (op0, op1);
This feels a bit hack-ish in the sense that you already know the form of 
the RTL you're expecting and just assume that you'll be given something 
of that form, but no more complex.

ISTM you're better off walking the whole rtx, recording the tidbits as 
you go into a vec.  If you see something unexpected during that walk, 
you punt canonicalization of the whole expression.

You then sort the vec.  You want to move things like MULT to the start 
and all the constants to the end I think.

You then do simplifications, particularly on the constants, but there 
may be something useful to do with MULT terms as well.  You could also 
arrange to rewrite ASHIFTs into MULTs at this stage.

Then you generate a new equivalent expression from the simplified 
operands in the vec.

You might look at tree-ssa-reassoc for ideas on implementation details.

Initially just use it in the LICM code, but I think given that kind of 
structure it'd be generally useful to replace bits of combine and fwprop

If your contention is that only a few forms really matter, then I'd like 
to see those forms spelled out better in the comment and some kind of 
checking that we have reasonable incoming RTL.


> +
> +  /* Move CONST operand to the end.  */
> +  if (CONST_INT_P (op0))
> +    std::swap (op0, op1);
You might want to check CONSTANT_P here.  Maybe it doesn't matter in 
practice, but things like (plus (plus (symbol-ref) (const_int) const_int))

That also gives you a fighting chance at extending this to handle 
HIGH/LO_SUM which are going to appear on the RISCy targets.

So I think the concept of making sure we're passing canonical RTL to the 
verification step is good, I'm a bit concerned about the implementation 
of the canonicalization step.

Jeff

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH PR69052]Check if loop inv can be propagated into mem ref with additional addr expr canonicalization
  2016-02-11  7:14 ` Jeff Law
@ 2016-02-11 17:59   ` Bin.Cheng
  2016-02-11 23:26     ` Jeff Law
  0 siblings, 1 reply; 12+ messages in thread
From: Bin.Cheng @ 2016-02-11 17:59 UTC (permalink / raw)
  To: Jeff Law; +Cc: Bin Cheng, gcc-patches, nd

On Thu, Feb 11, 2016 at 7:14 AM, Jeff Law <law@redhat.com> wrote:
> On 02/09/2016 04:08 AM, Bin Cheng wrote:
>>
>> Hi,
>> When counting cost for loop inv, GCC checks if a loop inv can be
>> propagated into its use site (a memory reference).  If it cannot be
>> propagated, we increase its cost so that it's expensive enough to be hoisted
>> out of loop.  Currently we simply replace loop inv register in the use site
>> with its definition expression, then call validate_changes to check if the
>> result insn is valid.  This is weak because validate_changes doesn't take
>> canonicalization into consideration.  Given below example:
>>
>>    Loop inv def:
>>     69: r149:SI=r87:SI+const(unspec[`xxxx'] 1)
>>        REG_DEAD r87:SI
>>    Loop inv use:
>>     70: r150:SI=[r90:SI*0x4+r149:SI]
>>        REG_DEAD r149:SI
>>
>> The address expression after propagation is "r90 * 0x4 + (r87 +
>> const(unspec[`xxxx']))".  Function validate_changes simply returns false to
>> it.  As a matter of fact, the propagation is feasible if we canonicalize
>> address expression into the form like "(r90 * 0x4 + r87) +
>> const(unspec[`xxxx'])".
>>
>> This patch fixes the problem by canonicalizing address expression and
>> verifying if the new addr is valid.  The canonicalization follows GCC insn
>> canonicalization rules.  The test case from bugzilla PR is also included.
>> As for the canonicalize_address interface, there is another
>> canonicalize_address in fwprop.c which only changes shift into mult.  I
>> think it would be good to factor out a common RTL interface in GCC, but
>> that's stage1 work.
>
>
> Also note there's bits in combine that will canonicalize appropriate shifts
> into mults.  Clearly there's a need for some generalized routines to take a
> fairly generic address and perform canonicalizations and simplifications on
> it.
>
>> Bootstrap and test on x86_64 and AArch64.  Is it OK?
>>
>> Thanks,
>> bin
>>
>> 2016-02-09  Bin Cheng<bin.cheng@arm.com>
>>
>>         PR tree-optimization/69052
>>         * loop-invariant.c (canonicalize_address): New function.
>>         (inv_can_prop_to_addr_use): Check validity of address expression
>>         which is canonicalized by above function.
>>
>> gcc/testsuite/ChangeLog
>> 2016-02-09  Bin Cheng<bin.cheng@arm.com>
>>
>>         PR tree-optimization/69052
>>         * gcc.target/i386/pr69052.c: New test.
>>
>>
>> pr69052-20160204.txt
>>
>>
>> diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
>> index 707f044..157e273 100644
>> --- a/gcc/loop-invariant.c
>> +++ b/gcc/loop-invariant.c
>> @@ -754,6 +754,74 @@ create_new_invariant (struct def *def, rtx_insn
>> *insn, bitmap depends_on,
>>     return inv;
>>   }
>>
>> +/* Returns a canonical version address for X.  It identifies
>> +   addr expr in the form of A + B + C.  Following instruction
>> +   canonicalization rules, MULT operand is moved to the front,
>> +   CONST operand is moved to the end; also PLUS operators are
>> +   chained to the left.  */
>> +
>> +static rtx
>> +canonicalize_address (rtx x)
>> +{
>> +  rtx op0, op1, op2;
>> +  machine_mode mode = GET_MODE (x);
>> +  enum rtx_code code = GET_CODE (x);
>> +
>> +  if (code != PLUS)
>> +    return x;
>> +
>> +  /* Extract operands from A + B (+ C).  */
>> +  if (GET_CODE (XEXP (x, 0)) == PLUS)
>> +    {
>> +      op0 = XEXP (XEXP (x, 0), 0);
>> +      op1 = XEXP (XEXP (x, 0), 1);
>> +      op2 = XEXP (x, 1);
>> +    }
>> +  else if (GET_CODE (XEXP (x, 1)) == PLUS)
>> +    {
>> +      op0 = XEXP (x, 0);
>> +      op1 = XEXP (XEXP (x, 1), 0);
>> +      op2 = XEXP (XEXP (x, 1), 1);
>> +    }
>> +  else
>> +    {
>> +      op0 = XEXP (x, 0);
>> +      op1 = XEXP (x, 1);
>> +      op2 = NULL_RTX;
>> +    }
>> +
>> +  /* Move MULT operand to the front.  */
>> +  if (!REG_P (op1) && !CONST_INT_P (op1))
>> +    std::swap (op0, op1);
>
> This feels a bit hack-ish in the sense that you already know the form of the
> RTL you're expecting and just assume that you'll be given something of that
> form, but no more complex.
>
> ISTM you're better off walking the whole rtx, recording the tidbits as you
> go into a vec.  If you see something unexpected during that walk, you punt
> canonicalization of the whole expression.
>
> You then sort the vec.  You want to move things like MULT to the start and
> all the constants to the end I think.
>
> You then do simplifications, particularly on the constants, but there may be
> something useful to do with MULT terms as well.  You could also arrange to
> rewrite ASHIFTs into MULTs at this stage.
>
> Then you generate a new equivalent expression from the simplified operands
> in the vec.
>
> You might look at tree-ssa-reassoc for ideas on implementation details.
>
> Initially just use it in the LICM code, but I think given that kind of
> structure it'd be generally useful to replace bits of combine and fwprop
>
> If your contention is that only a few forms really matter, then I'd like to
> see those forms spelled out better in the comment and some kind of checking
> that we have reasonable incoming RTL.
>
>
>> +
>> +  /* Move CONST operand to the end.  */
>> +  if (CONST_INT_P (op0))
>> +    std::swap (op0, op1);
>
> You might want to check CONSTANT_P here.  Maybe it doesn't matter in
> practice, but things like (plus (plus (symbol-ref) (const_int) const_int))
>
> That also gives you a fighting chance at extending this to handle
> HIGH/LO_SUM which are going to appear on the RISCy targets.
>
> So I think the concept of making sure we're passing canonical RTL to the
> verification step is good, I'm a bit concerned about the implementation of
> the canonicalization step.
Hi Jeff,
Thanks for detailed review.  I also think a generic canonical
interface for RTL is much better.  I will give it a try.  But with
high chance it's a next stage1 stuff.

Thanks,
bin
>
> Jeff

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH PR69052]Check if loop inv can be propagated into mem ref with additional addr expr canonicalization
  2016-02-11 17:59   ` Bin.Cheng
@ 2016-02-11 23:26     ` Jeff Law
  2016-02-16 18:43       ` Bin Cheng
  0 siblings, 1 reply; 12+ messages in thread
From: Jeff Law @ 2016-02-11 23:26 UTC (permalink / raw)
  To: Bin.Cheng; +Cc: Bin Cheng, gcc-patches, nd

On 02/11/2016 10:59 AM, Bin.Cheng wrote:

> Hi Jeff,
> Thanks for detailed review.  I also think a generic canonical
> interface for RTL is much better.  I will give it a try.  But with
> high chance it's a next stage1 stuff.
That is, of course, fine.  However, if you do get something ready, I'd 
support using it within LICM for gcc-6, then using it in other places 
for gcc-7.

Jeff

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH PR69052]Check if loop inv can be propagated into mem ref with additional addr expr canonicalization
  2016-02-11 23:26     ` Jeff Law
@ 2016-02-16 18:43       ` Bin Cheng
  2016-02-19 22:24         ` Jeff Law
  0 siblings, 1 reply; 12+ messages in thread
From: Bin Cheng @ 2016-02-16 18:43 UTC (permalink / raw)
  To: Jeff Law, Bin.Cheng; +Cc: gcc-patches, nd

[-- Attachment #1: Type: text/plain, Size: 1996 bytes --]

________________________________________
From: Jeff Law <law@redhat.com>
Sent: 11 February 2016 23:26
To: Bin.Cheng
Cc: Bin Cheng; gcc-patches@gcc.gnu.org; nd
Subject: Re: [PATCH PR69052]Check if loop inv can be propagated into mem ref with additional addr expr canonicalization

>> On 02/11/2016 10:59 AM, Bin.Cheng wrote:

>> Hi Jeff,
>> Thanks for detailed review.  I also think a generic canonical
>> interface for RTL is much better.  I will give it a try.  But with
>> high chance it's a next stage1 stuff.
> That is, of course, fine.  However, if you do get something ready, I'd
> support using it within LICM for gcc-6, then using it in other places
> for gcc-7.
Hi,
This is the updated version patch.  It fixes the problem by introducing a generic address canonicalization interface.  This new interface canonicalizes address expression in following steps:
     1) Rewrite ASHIFT into MULT recursively.
     2) Divide address into sub expressions with PLUS as the separator.
     3) Sort sub expressions according to precedence defined for communative operations.
     4) Simplify CONST_INT_P sub expressions.
     5) Create new canonicalized address and return.

According to review comments, this interface is now restricted in LCIM, and will probably be expanded to other passes like fwprop and combine after entering GCC7.
Bootstrap and test on x86_64 and AArch64.  Is it OK?

Thanks,
bin

2016-02-15  Bin Cheng  <bin.cheng@arm.com>

	PR tree-optimization/69052
	* loop-invariant.c (canonicalize_address_mult): New function.
	(MAX_CANON_ADDR_PARTS): New macro.
	(collect_address_parts): New function.
	(compare_address_parts, canonicalize_address): New functions.
	(inv_can_prop_to_addr_use): Check validity of address expression
	which is canonicalized by above canonicalize_address.

gcc/testsuite/ChangeLog
2016-02-15  Bin Cheng  <bin.cheng@arm.com>

	PR tree-optimization/69052
	* gcc.target/i386/pr69052.c: New test.

> 
> Jeff


[-- Attachment #2: pr69052-20160216.txt --]
[-- Type: text/plain, Size: 7028 bytes --]

diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
index 707f044..af528d0 100644
--- a/gcc/loop-invariant.c
+++ b/gcc/loop-invariant.c
@@ -754,6 +754,147 @@ create_new_invariant (struct def *def, rtx_insn *insn, bitmap depends_on,
   return inv;
 }
 
+/* Return a canonical version of X for the address, from the point of view,
+   that all multiplications are represented as MULT instead of the multiply
+   by a power of 2 being represented as ASHIFT.
+
+   This function is a duplication of canonicalize_address from fwprop.c.
+   Callers should prepare a copy of X because this function may modify it
+   in place.  */
+
+static void
+canonicalize_address_mult (rtx x)
+{
+  for (;;)
+    switch (GET_CODE (x))
+      {
+      case ASHIFT:
+	if (CONST_INT_P (XEXP (x, 1))
+	    && INTVAL (XEXP (x, 1)) < GET_MODE_BITSIZE (GET_MODE (x))
+	    && INTVAL (XEXP (x, 1)) >= 0)
+	  {
+	    HOST_WIDE_INT shift = INTVAL (XEXP (x, 1));
+	    PUT_CODE (x, MULT);
+	    XEXP (x, 1) = gen_int_mode ((HOST_WIDE_INT) 1 << shift,
+					GET_MODE (x));
+	  }
+
+	x = XEXP (x, 0);
+	break;
+
+      case PLUS:
+	if (GET_CODE (XEXP (x, 0)) == PLUS
+	    || GET_CODE (XEXP (x, 0)) == ASHIFT
+	    || GET_CODE (XEXP (x, 0)) == CONST)
+	  canonicalize_address_mult (XEXP (x, 0));
+
+	x = XEXP (x, 1);
+	break;
+
+      case CONST:
+	x = XEXP (x, 0);
+	break;
+
+      default:
+	return;
+      }
+}
+
+/* Maximum number of sub expressions in address.  We set it to
+   a small integer since it's unlikely to have a complicated
+   address expression.  */
+
+#define MAX_CANON_ADDR_PARTS (5)
+
+/* Collect sub expressions in address X with PLUS as the seperator.
+   Sub expressions are stored in vector ADDR_PARTS.  */
+
+static void
+collect_address_parts (rtx x, vec<rtx> *addr_parts)
+{
+  for (;;)
+    if (GET_CODE (x) == PLUS)
+      {
+	collect_address_parts (XEXP (x, 0), addr_parts);
+	x = XEXP (x, 1);
+      }
+    else
+      {
+	addr_parts->safe_push (x);
+	break;
+      }
+}
+
+/* Compare function for sorting sub expressions X and Y based on
+   precedence defined for communitive operations.  */
+
+static int
+compare_address_parts (const void *x, const void *y)
+{
+  const rtx *rx = (const rtx *)x;
+  const rtx *ry = (const rtx *)y;
+  int px = commutative_operand_precedence (*rx);
+  int py = commutative_operand_precedence (*ry);
+
+  return (py - px);
+}
+
+/* Return a canonical version address for X by following steps:
+     1) Rewrite ASHIFT into MULT recursively.
+     2) Divide address into sub expressions with PLUS as the
+	separator.
+     3) Sort sub expressions according to precedence defined
+	for communative operations.
+     4) Simplify CONST_INT_P sub expressions.
+     5) Create new canonicalized address and return.
+   Callers should prepare a copy of X because this function may
+   modify it in place.  */
+
+static rtx
+canonicalize_address (rtx x)
+{
+  rtx res;
+  unsigned int i, j;
+  machine_mode mode = GET_MODE (x);
+  auto_vec<rtx, MAX_CANON_ADDR_PARTS> addr_parts;
+
+  /* Rewrite ASHIFT into MULT.  */
+  canonicalize_address_mult (x);
+  /* Divide address into sub expressions.  */
+  collect_address_parts (x, &addr_parts);
+  /* Unlikely to have very complicated address.  */
+  if (addr_parts.length () < 2
+      || addr_parts.length () > MAX_CANON_ADDR_PARTS)
+    return x;
+
+  /* Sort sub expressions according to canonicalization precedence.  */
+  addr_parts.qsort (compare_address_parts);
+
+  /* Simplify all constant int summary if possible.  */
+  for (i = 0; i < addr_parts.length (); i++)
+    if (CONST_INT_P (addr_parts[i]))
+      break;
+
+  for (j = i + 1; j < addr_parts.length (); j++)
+    {
+      gcc_assert (CONST_INT_P (addr_parts[j]));
+      addr_parts[i] = simplify_gen_binary (PLUS, mode,
+					   addr_parts[i],
+					   addr_parts[j]);
+    }
+
+  /* Chain PLUS operators to the left for !CONST_INT_P sub expressions.  */
+  res = addr_parts[0];
+  for (j = 1; j < i; j++)
+    res = simplify_gen_binary (PLUS, mode, res, addr_parts[j]);
+
+  /* Pickup the last CONST_INT_P sub expression.  */
+  if (i < addr_parts.length ())
+    res = simplify_gen_binary (PLUS, mode, res, addr_parts[i]);
+
+  return res;
+}
+
 /* Given invariant DEF and its address USE, check if the corresponding
    invariant expr can be propagated into the use or not.  */
 
@@ -761,7 +902,7 @@ static bool
 inv_can_prop_to_addr_use (struct def *def, df_ref use)
 {
   struct invariant *inv;
-  rtx *pos = DF_REF_REAL_LOC (use), def_set;
+  rtx *pos = DF_REF_REAL_LOC (use), def_set, use_set;
   rtx_insn *use_insn = DF_REF_INSN (use);
   rtx_insn *def_insn;
   bool ok;
@@ -778,6 +919,29 @@ inv_can_prop_to_addr_use (struct def *def, df_ref use)
 
   validate_unshare_change (use_insn, pos, SET_SRC (def_set), true);
   ok = verify_changes (0);
+  /* Try harder with canonicalization in address expression.  */
+  if (!ok && (use_set = single_set (use_insn)) != NULL_RTX)
+    {
+      rtx src, dest, mem = NULL_RTX;
+
+      src = SET_SRC (use_set);
+      dest = SET_DEST (use_set);
+      if (MEM_P (src))
+	mem = src;
+      else if (MEM_P (dest))
+	mem = dest;
+
+      if (mem != NULL_RTX
+	  && !memory_address_addr_space_p (GET_MODE (mem),
+					   XEXP (mem, 0),
+					   MEM_ADDR_SPACE (mem)))
+	{
+	  rtx addr = canonicalize_address (copy_rtx (XEXP (mem, 0)));
+	  if (memory_address_addr_space_p (GET_MODE (mem),
+					   addr, MEM_ADDR_SPACE (mem)))
+	    ok = true;
+	}
+    }
   cancel_changes (0);
   return ok;
 }
diff --git a/gcc/testsuite/gcc.target/i386/pr69052.c b/gcc/testsuite/gcc.target/i386/pr69052.c
new file mode 100644
index 0000000..6f491e9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr69052.c
@@ -0,0 +1,54 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target pie } */
+/* { dg-options "-O2 -fPIE -pie" } */
+
+int look_nbits[256], loop_sym[256];
+const int ind[] = {
+  0,  1,  8, 16,  9,  2,  3, 10, 17, 24, 32, 25, 18, 11,  4,  5,
+ 12, 19, 26, 33, 40, 48, 41, 34, 27, 20, 13,  6,  7, 14, 21, 28,
+ 35, 42, 49, 56, 57, 50, 43, 36, 29, 22, 15, 23, 30, 37, 44, 51,
+ 58, 59, 52, 45, 38, 31, 39, 46, 53, 60, 61, 54, 47, 55, 62, 63
+};
+int out[256];
+extern void bar (int *, int *);
+void foo (int *l1, int *l2, int *v, int *v1, int *m1, int i)
+{
+  int L = i + 1, b = 20;
+  int result, k;
+
+  for (k = 1; k < 64; k++)
+    {
+      int look = (((L >> (b - 8))) & ((1 << 8) - 1));
+      int nb = l1[look];
+      int code;
+      int r;
+
+      if (nb)
+	{
+	  b -= nb;
+	  result = l2[look];
+	}
+      else
+	{
+	  nb = 9;
+	  code = (((L >> (b -= nb))) & ((1 << nb) - 1));
+	  result = v[(code + v1[nb])];
+	}
+      r = result >> 4;
+      result &= 15;
+      if (result)
+	{
+	  k += r;
+	  r = (((L >> (b -= result))) & ((1 << result) - 1));
+	  if (r < (1 << (result - 1)))
+	    result = r + (((-1) << result) + 1);
+	  else
+	    result = r;
+
+	  out[ind[k]] = result;
+	}
+      bar (&L, &b);
+    }
+}
+
+/* { dg-final { scan-assembler-not "leal\[ \t\]ind@GOTOFF\\(%\[^,\]*\\), %" { target ia32 } } } */

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH PR69052]Check if loop inv can be propagated into mem ref with additional addr expr canonicalization
  2016-02-16 18:43       ` Bin Cheng
@ 2016-02-19 22:24         ` Jeff Law
  2016-02-22  9:22           ` Bin.Cheng
  2016-02-23 15:10           ` Bin.Cheng
  0 siblings, 2 replies; 12+ messages in thread
From: Jeff Law @ 2016-02-19 22:24 UTC (permalink / raw)
  To: Bin Cheng, Bin.Cheng; +Cc: gcc-patches, nd

On 02/16/2016 11:43 AM, Bin Cheng wrote:
> ________________________________________
> From: Jeff Law <law@redhat.com>
> Sent: 11 February 2016 23:26
> To: Bin.Cheng
> Cc: Bin Cheng; gcc-patches@gcc.gnu.org; nd
> Subject: Re: [PATCH PR69052]Check if loop inv can be propagated into mem ref with additional addr expr canonicalization
>
>>> On 02/11/2016 10:59 AM, Bin.Cheng wrote:
>
>>> Hi Jeff,
>>> Thanks for detailed review.  I also think a generic canonical
>>> interface for RTL is much better.  I will give it a try.  But with
>>> high chance it's a next stage1 stuff.
>> That is, of course, fine.  However, if you do get something ready, I'd
>> support using it within LICM for gcc-6, then using it in other places
>> for gcc-7.
> Hi,
> This is the updated version patch.  It fixes the problem by introducing a generic address canonicalization interface.  This new interface canonicalizes address expression in following steps:
>       1) Rewrite ASHIFT into MULT recursively.
>       2) Divide address into sub expressions with PLUS as the separator.
>       3) Sort sub expressions according to precedence defined for communative operations.
>       4) Simplify CONST_INT_P sub expressions.
>       5) Create new canonicalized address and return.
>
> According to review comments, this interface is now restricted in LCIM, and will probably be expanded to other passes like fwprop and combine after entering GCC7.
> Bootstrap and test on x86_64 and AArch64.  Is it OK?
>
> Thanks,
> bin
>
> 2016-02-15  Bin Cheng  <bin.cheng@arm.com>
>
> 	PR tree-optimization/69052
> 	* loop-invariant.c (canonicalize_address_mult): New function.
> 	(MAX_CANON_ADDR_PARTS): New macro.
> 	(collect_address_parts): New function.
> 	(compare_address_parts, canonicalize_address): New functions.
> 	(inv_can_prop_to_addr_use): Check validity of address expression
> 	which is canonicalized by above canonicalize_address.
>
> gcc/testsuite/ChangeLog
> 2016-02-15  Bin Cheng  <bin.cheng@arm.com>
>
> 	PR tree-optimization/69052
> 	* gcc.target/i386/pr69052.c: New test.
This is exactly what I was looking for from a design standpoint.

My only question is why didn't you use FOR_EACH_SUBRTX_VRA from 
rtl-iter.h to walk the RTX expressions in collect_address_parts and 
canonicalize_address_mult?

Jeff


>
>>
>> Jeff
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH PR69052]Check if loop inv can be propagated into mem ref with additional addr expr canonicalization
  2016-02-19 22:24         ` Jeff Law
@ 2016-02-22  9:22           ` Bin.Cheng
  2016-02-25  6:39             ` Jeff Law
  2016-02-23 15:10           ` Bin.Cheng
  1 sibling, 1 reply; 12+ messages in thread
From: Bin.Cheng @ 2016-02-22  9:22 UTC (permalink / raw)
  To: Jeff Law; +Cc: Bin Cheng, gcc-patches, nd

On Fri, Feb 19, 2016 at 10:24 PM, Jeff Law <law@redhat.com> wrote:
> On 02/16/2016 11:43 AM, Bin Cheng wrote:
>>
>> ________________________________________
>> From: Jeff Law <law@redhat.com>
>> Sent: 11 February 2016 23:26
>> To: Bin.Cheng
>> Cc: Bin Cheng; gcc-patches@gcc.gnu.org; nd
>> Subject: Re: [PATCH PR69052]Check if loop inv can be propagated into mem
>> ref with additional addr expr canonicalization
>>
>>>> On 02/11/2016 10:59 AM, Bin.Cheng wrote:
>>
>>
>>>> Hi Jeff,
>>>> Thanks for detailed review.  I also think a generic canonical
>>>> interface for RTL is much better.  I will give it a try.  But with
>>>> high chance it's a next stage1 stuff.
>>>
>>> That is, of course, fine.  However, if you do get something ready, I'd
>>> support using it within LICM for gcc-6, then using it in other places
>>> for gcc-7.
>>
>> Hi,
>> This is the updated version patch.  It fixes the problem by introducing a
>> generic address canonicalization interface.  This new interface
>> canonicalizes address expression in following steps:
>>       1) Rewrite ASHIFT into MULT recursively.
>>       2) Divide address into sub expressions with PLUS as the separator.
>>       3) Sort sub expressions according to precedence defined for
>> communative operations.
>>       4) Simplify CONST_INT_P sub expressions.
>>       5) Create new canonicalized address and return.
>>
>> According to review comments, this interface is now restricted in LCIM,
>> and will probably be expanded to other passes like fwprop and combine after
>> entering GCC7.
>> Bootstrap and test on x86_64 and AArch64.  Is it OK?
>>
>> Thanks,
>> bin
>>
>> 2016-02-15  Bin Cheng  <bin.cheng@arm.com>
>>
>>         PR tree-optimization/69052
>>         * loop-invariant.c (canonicalize_address_mult): New function.
>>         (MAX_CANON_ADDR_PARTS): New macro.
>>         (collect_address_parts): New function.
>>         (compare_address_parts, canonicalize_address): New functions.
>>         (inv_can_prop_to_addr_use): Check validity of address expression
>>         which is canonicalized by above canonicalize_address.
>>
>> gcc/testsuite/ChangeLog
>> 2016-02-15  Bin Cheng  <bin.cheng@arm.com>
>>
>>         PR tree-optimization/69052
>>         * gcc.target/i386/pr69052.c: New test.
>
> This is exactly what I was looking for from a design standpoint.
>
> My only question is why didn't you use FOR_EACH_SUBRTX_VRA from rtl-iter.h
> to walk the RTX expressions in collect_address_parts and
> canonicalize_address_mult?
Hi Jeff,
Nothing special, just I haven't used this before, also
canonicalize_address_mult is alphabetically copied from fwprop.c.  One
question is when rewriting SHIFT to MULT, we need to modify rtl
expression in place, does FOR_EACH_SUBRTX iterator support this?  If
yes, what is the behavior for modified sub-expression?

Thanks,
bin
>
> Jeff
>
>
>>
>>>
>>> Jeff
>>
>>
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH PR69052]Check if loop inv can be propagated into mem ref with additional addr expr canonicalization
  2016-02-19 22:24         ` Jeff Law
  2016-02-22  9:22           ` Bin.Cheng
@ 2016-02-23 15:10           ` Bin.Cheng
  1 sibling, 0 replies; 12+ messages in thread
From: Bin.Cheng @ 2016-02-23 15:10 UTC (permalink / raw)
  To: Jeff Law; +Cc: Bin Cheng, gcc-patches, nd

[-- Attachment #1: Type: text/plain, Size: 2729 bytes --]

On Fri, Feb 19, 2016 at 10:24 PM, Jeff Law <law@redhat.com> wrote:
> On 02/16/2016 11:43 AM, Bin Cheng wrote:
>>
>> ________________________________________
>> From: Jeff Law <law@redhat.com>
>> Sent: 11 February 2016 23:26
>> To: Bin.Cheng
>> Cc: Bin Cheng; gcc-patches@gcc.gnu.org; nd
>> Subject: Re: [PATCH PR69052]Check if loop inv can be propagated into mem
>> ref with additional addr expr canonicalization
>>
>>>> On 02/11/2016 10:59 AM, Bin.Cheng wrote:
>>
>>
>>>> Hi Jeff,
>>>> Thanks for detailed review.  I also think a generic canonical
>>>> interface for RTL is much better.  I will give it a try.  But with
>>>> high chance it's a next stage1 stuff.
>>>
>>> That is, of course, fine.  However, if you do get something ready, I'd
>>> support using it within LICM for gcc-6, then using it in other places
>>> for gcc-7.
>>
>> Hi,
>> This is the updated version patch.  It fixes the problem by introducing a
>> generic address canonicalization interface.  This new interface
>> canonicalizes address expression in following steps:
>>       1) Rewrite ASHIFT into MULT recursively.
>>       2) Divide address into sub expressions with PLUS as the separator.
>>       3) Sort sub expressions according to precedence defined for
>> communative operations.
>>       4) Simplify CONST_INT_P sub expressions.
>>       5) Create new canonicalized address and return.
>>
>> According to review comments, this interface is now restricted in LCIM,
>> and will probably be expanded to other passes like fwprop and combine after
>> entering GCC7.
>> Bootstrap and test on x86_64 and AArch64.  Is it OK?
>>
>> Thanks,
>> bin
>>
>> 2016-02-15  Bin Cheng  <bin.cheng@arm.com>
>>
>>         PR tree-optimization/69052
>>         * loop-invariant.c (canonicalize_address_mult): New function.
>>         (MAX_CANON_ADDR_PARTS): New macro.
>>         (collect_address_parts): New function.
>>         (compare_address_parts, canonicalize_address): New functions.
>>         (inv_can_prop_to_addr_use): Check validity of address expression
>>         which is canonicalized by above canonicalize_address.
>>
>> gcc/testsuite/ChangeLog
>> 2016-02-15  Bin Cheng  <bin.cheng@arm.com>
>>
>>         PR tree-optimization/69052
>>         * gcc.target/i386/pr69052.c: New test.
>
> This is exactly what I was looking for from a design standpoint.
>
> My only question is why didn't you use FOR_EACH_SUBRTX_VRA from rtl-iter.h
> to walk the RTX expressions in collect_address_parts and
> canonicalize_address_mult?
Hi Jeff,
Here comes the updated patch using FOR_EACH_SUBRTX_VAR in both
functions you mentioned.
Bootstrap and test on x86_64 and AArch64, is it OK?  The ChangeLog
isn't affected.

Thanks,
bin
>
> Jeff
>
>
>>
>>>
>>> Jeff
>>
>>
>

[-- Attachment #2: pr69052-20160222.txt --]
[-- Type: text/plain, Size: 7007 bytes --]

diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
index 707f044..dcbe932 100644
--- a/gcc/loop-invariant.c
+++ b/gcc/loop-invariant.c
@@ -52,6 +52,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "cfgloop.h"
 #include "expr.h"
 #include "params.h"
+#include "rtl-iter.h"
 #include "dumpfile.h"
 
 /* The data stored for the loop.  */
@@ -754,6 +755,130 @@ create_new_invariant (struct def *def, rtx_insn *insn, bitmap depends_on,
   return inv;
 }
 
+/* Return a canonical version of X for the address, from the point of view,
+   that all multiplications are represented as MULT instead of the multiply
+   by a power of 2 being represented as ASHIFT.
+
+   Callers should prepare a copy of X because this function may modify it
+   in place.  */
+
+static void
+canonicalize_address_mult (rtx x)
+{
+  subrtx_var_iterator::array_type array;
+  FOR_EACH_SUBRTX_VAR (iter, array, x, NONCONST)
+    {
+      rtx sub = *iter;
+
+      if (GET_CODE (sub) == ASHIFT
+	  && CONST_INT_P (XEXP (sub, 1))
+	  && INTVAL (XEXP (sub, 1)) < GET_MODE_BITSIZE (GET_MODE (sub))
+	  && INTVAL (XEXP (sub, 1)) >= 0)
+	{
+	  HOST_WIDE_INT shift = INTVAL (XEXP (sub, 1));
+	  PUT_CODE (sub, MULT);
+	  XEXP (sub, 1) = gen_int_mode ((HOST_WIDE_INT) 1 << shift,
+					GET_MODE (sub));
+	  iter.skip_subrtxes ();
+	}
+    }
+}
+
+/* Maximum number of sub expressions in address.  We set it to
+   a small integer since it's unlikely to have a complicated
+   address expression.  */
+
+#define MAX_CANON_ADDR_PARTS (5)
+
+/* Collect sub expressions in address X with PLUS as the seperator.
+   Sub expressions are stored in vector ADDR_PARTS.  */
+
+static void
+collect_address_parts (rtx x, vec<rtx> *addr_parts)
+{
+  subrtx_var_iterator::array_type array;
+  FOR_EACH_SUBRTX_VAR (iter, array, x, NONCONST)
+    {
+      rtx sub = *iter;
+
+      if (GET_CODE (sub) != PLUS)
+	{
+	  addr_parts->safe_push (sub);
+	  iter.skip_subrtxes ();
+	}
+    }
+}
+
+/* Compare function for sorting sub expressions X and Y based on
+   precedence defined for communitive operations.  */
+
+static int
+compare_address_parts (const void *x, const void *y)
+{
+  const rtx *rx = (const rtx *)x;
+  const rtx *ry = (const rtx *)y;
+  int px = commutative_operand_precedence (*rx);
+  int py = commutative_operand_precedence (*ry);
+
+  return (py - px);
+}
+
+/* Return a canonical version address for X by following steps:
+     1) Rewrite ASHIFT into MULT recursively.
+     2) Divide address into sub expressions with PLUS as the
+	separator.
+     3) Sort sub expressions according to precedence defined
+	for communative operations.
+     4) Simplify CONST_INT_P sub expressions.
+     5) Create new canonicalized address and return.
+   Callers should prepare a copy of X because this function may
+   modify it in place.  */
+
+static rtx
+canonicalize_address (rtx x)
+{
+  rtx res;
+  unsigned int i, j;
+  machine_mode mode = GET_MODE (x);
+  auto_vec<rtx, MAX_CANON_ADDR_PARTS> addr_parts;
+
+  /* Rewrite ASHIFT into MULT.  */
+  canonicalize_address_mult (x);
+  /* Divide address into sub expressions.  */
+  collect_address_parts (x, &addr_parts);
+  /* Unlikely to have very complicated address.  */
+  if (addr_parts.length () < 2
+      || addr_parts.length () > MAX_CANON_ADDR_PARTS)
+    return x;
+
+  /* Sort sub expressions according to canonicalization precedence.  */
+  addr_parts.qsort (compare_address_parts);
+
+  /* Simplify all constant int summary if possible.  */
+  for (i = 0; i < addr_parts.length (); i++)
+    if (CONST_INT_P (addr_parts[i]))
+      break;
+
+  for (j = i + 1; j < addr_parts.length (); j++)
+    {
+      gcc_assert (CONST_INT_P (addr_parts[j]));
+      addr_parts[i] = simplify_gen_binary (PLUS, mode,
+					   addr_parts[i],
+					   addr_parts[j]);
+    }
+
+  /* Chain PLUS operators to the left for !CONST_INT_P sub expressions.  */
+  res = addr_parts[0];
+  for (j = 1; j < i; j++)
+    res = simplify_gen_binary (PLUS, mode, res, addr_parts[j]);
+
+  /* Pickup the last CONST_INT_P sub expression.  */
+  if (i < addr_parts.length ())
+    res = simplify_gen_binary (PLUS, mode, res, addr_parts[i]);
+
+  return res;
+}
+
 /* Given invariant DEF and its address USE, check if the corresponding
    invariant expr can be propagated into the use or not.  */
 
@@ -761,7 +886,7 @@ static bool
 inv_can_prop_to_addr_use (struct def *def, df_ref use)
 {
   struct invariant *inv;
-  rtx *pos = DF_REF_REAL_LOC (use), def_set;
+  rtx *pos = DF_REF_REAL_LOC (use), def_set, use_set;
   rtx_insn *use_insn = DF_REF_INSN (use);
   rtx_insn *def_insn;
   bool ok;
@@ -778,6 +903,29 @@ inv_can_prop_to_addr_use (struct def *def, df_ref use)
 
   validate_unshare_change (use_insn, pos, SET_SRC (def_set), true);
   ok = verify_changes (0);
+  /* Try harder with canonicalization in address expression.  */
+  if (!ok && (use_set = single_set (use_insn)) != NULL_RTX)
+    {
+      rtx src, dest, mem = NULL_RTX;
+
+      src = SET_SRC (use_set);
+      dest = SET_DEST (use_set);
+      if (MEM_P (src))
+	mem = src;
+      else if (MEM_P (dest))
+	mem = dest;
+
+      if (mem != NULL_RTX
+	  && !memory_address_addr_space_p (GET_MODE (mem),
+					   XEXP (mem, 0),
+					   MEM_ADDR_SPACE (mem)))
+	{
+	  rtx addr = canonicalize_address (copy_rtx (XEXP (mem, 0)));
+	  if (memory_address_addr_space_p (GET_MODE (mem),
+					   addr, MEM_ADDR_SPACE (mem)))
+	    ok = true;
+	}
+    }
   cancel_changes (0);
   return ok;
 }
diff --git a/gcc/testsuite/gcc.target/i386/pr69052.c b/gcc/testsuite/gcc.target/i386/pr69052.c
new file mode 100644
index 0000000..6f491e9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr69052.c
@@ -0,0 +1,54 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target pie } */
+/* { dg-options "-O2 -fPIE -pie" } */
+
+int look_nbits[256], loop_sym[256];
+const int ind[] = {
+  0,  1,  8, 16,  9,  2,  3, 10, 17, 24, 32, 25, 18, 11,  4,  5,
+ 12, 19, 26, 33, 40, 48, 41, 34, 27, 20, 13,  6,  7, 14, 21, 28,
+ 35, 42, 49, 56, 57, 50, 43, 36, 29, 22, 15, 23, 30, 37, 44, 51,
+ 58, 59, 52, 45, 38, 31, 39, 46, 53, 60, 61, 54, 47, 55, 62, 63
+};
+int out[256];
+extern void bar (int *, int *);
+void foo (int *l1, int *l2, int *v, int *v1, int *m1, int i)
+{
+  int L = i + 1, b = 20;
+  int result, k;
+
+  for (k = 1; k < 64; k++)
+    {
+      int look = (((L >> (b - 8))) & ((1 << 8) - 1));
+      int nb = l1[look];
+      int code;
+      int r;
+
+      if (nb)
+	{
+	  b -= nb;
+	  result = l2[look];
+	}
+      else
+	{
+	  nb = 9;
+	  code = (((L >> (b -= nb))) & ((1 << nb) - 1));
+	  result = v[(code + v1[nb])];
+	}
+      r = result >> 4;
+      result &= 15;
+      if (result)
+	{
+	  k += r;
+	  r = (((L >> (b -= result))) & ((1 << result) - 1));
+	  if (r < (1 << (result - 1)))
+	    result = r + (((-1) << result) + 1);
+	  else
+	    result = r;
+
+	  out[ind[k]] = result;
+	}
+      bar (&L, &b);
+    }
+}
+
+/* { dg-final { scan-assembler-not "leal\[ \t\]ind@GOTOFF\\(%\[^,\]*\\), %" { target ia32 } } } */

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH PR69052]Check if loop inv can be propagated into mem ref with additional addr expr canonicalization
  2016-02-22  9:22           ` Bin.Cheng
@ 2016-02-25  6:39             ` Jeff Law
  2016-02-25  8:47               ` Bin.Cheng
  0 siblings, 1 reply; 12+ messages in thread
From: Jeff Law @ 2016-02-25  6:39 UTC (permalink / raw)
  To: Bin.Cheng; +Cc: Bin Cheng, gcc-patches, nd

On 02/22/2016 02:22 AM, Bin.Cheng wrote:

>> My only question is why didn't you use FOR_EACH_SUBRTX_VRA from rtl-iter.h
>> to walk the RTX expressions in collect_address_parts and
>> canonicalize_address_mult?
> Hi Jeff,
> Nothing special, just I haven't used this before, also
> canonicalize_address_mult is alphabetically copied from fwprop.c.  One
> question is when rewriting SHIFT to MULT, we need to modify rtl
> expression in place, does FOR_EACH_SUBRTX iterator support this?  If
> yes, what is the behavior for modified sub-expression?
Hmm.  The question of semantics when we change the underlying 
sub-expressions is an interesting one.

While I think we're OK in practice using the iterators, I think that's 
more of an accident than by design -- if MULT/ASHIFT had a different 
underlying RTL structure then I'm much less sure using the iterators 
would be safe.

Let's go with your original patch that didn't use the iterators.  Sorry 
for making you do the additional work/testing to build the iterator 
version.  But after pondering the issue you raised, I think your 
original patch is safer.


jeff

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH PR69052]Check if loop inv can be propagated into mem ref with additional addr expr canonicalization
  2016-02-25  6:39             ` Jeff Law
@ 2016-02-25  8:47               ` Bin.Cheng
  2016-03-01 17:08                 ` Bin.Cheng
  0 siblings, 1 reply; 12+ messages in thread
From: Bin.Cheng @ 2016-02-25  8:47 UTC (permalink / raw)
  To: Jeff Law; +Cc: Bin Cheng, gcc-patches, nd

On Thu, Feb 25, 2016 at 6:39 AM, Jeff Law <law@redhat.com> wrote:
> On 02/22/2016 02:22 AM, Bin.Cheng wrote:
>
>>> My only question is why didn't you use FOR_EACH_SUBRTX_VRA from
>>> rtl-iter.h
>>> to walk the RTX expressions in collect_address_parts and
>>> canonicalize_address_mult?
>>
>> Hi Jeff,
>> Nothing special, just I haven't used this before, also
>> canonicalize_address_mult is alphabetically copied from fwprop.c.  One
>> question is when rewriting SHIFT to MULT, we need to modify rtl
>> expression in place, does FOR_EACH_SUBRTX iterator support this?  If
>> yes, what is the behavior for modified sub-expression?
>
> Hmm.  The question of semantics when we change the underlying
> sub-expressions is an interesting one.
>
> While I think we're OK in practice using the iterators, I think that's more
> of an accident than by design -- if MULT/ASHIFT had a different underlying
> RTL structure then I'm much less sure using the iterators would be safe.
Hi Jeff,
Yes, I thought about this too and finally decided to skip sub rtxes in
modified MULT/ASHIFT expressions.  I think this will make the patch
with FOR_EACH_SUBRTX iterator safe.  Although it doesn't dive into
MULT/ASHIFT while the original one does, I don't think there is such
case in practice, after all we can't handle such complicated address
expression either.
>
> Let's go with your original patch that didn't use the iterators.  Sorry for
> making you do the additional work/testing to build the iterator version.
Not even a problem.
> But after pondering the issue you raised, I think your original patch is
> safer.
So in conclusion, I think both versions should be safe, the iterator
one is definitely cleaner.  It is your call which version I should
apply.

Thanks,
bin
>
>
> jeff

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH PR69052]Check if loop inv can be propagated into mem ref with additional addr expr canonicalization
  2016-02-25  8:47               ` Bin.Cheng
@ 2016-03-01 17:08                 ` Bin.Cheng
  2016-03-01 23:44                   ` Jeff Law
  0 siblings, 1 reply; 12+ messages in thread
From: Bin.Cheng @ 2016-03-01 17:08 UTC (permalink / raw)
  To: Jeff Law; +Cc: Bin Cheng, gcc-patches, nd

On Thu, Feb 25, 2016 at 8:47 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Thu, Feb 25, 2016 at 6:39 AM, Jeff Law <law@redhat.com> wrote:
>> On 02/22/2016 02:22 AM, Bin.Cheng wrote:
>>
>>>> My only question is why didn't you use FOR_EACH_SUBRTX_VRA from
>>>> rtl-iter.h
>>>> to walk the RTX expressions in collect_address_parts and
>>>> canonicalize_address_mult?
>>>
>>> Hi Jeff,
>>> Nothing special, just I haven't used this before, also
>>> canonicalize_address_mult is alphabetically copied from fwprop.c.  One
>>> question is when rewriting SHIFT to MULT, we need to modify rtl
>>> expression in place, does FOR_EACH_SUBRTX iterator support this?  If
>>> yes, what is the behavior for modified sub-expression?
>>
>> Hmm.  The question of semantics when we change the underlying
>> sub-expressions is an interesting one.
>>
>> While I think we're OK in practice using the iterators, I think that's more
>> of an accident than by design -- if MULT/ASHIFT had a different underlying
>> RTL structure then I'm much less sure using the iterators would be safe.
> Hi Jeff,
> Yes, I thought about this too and finally decided to skip sub rtxes in
> modified MULT/ASHIFT expressions.  I think this will make the patch
> with FOR_EACH_SUBRTX iterator safe.  Although it doesn't dive into
> MULT/ASHIFT while the original one does, I don't think there is such
> case in practice, after all we can't handle such complicated address
> expression either.
>>
>> Let's go with your original patch that didn't use the iterators.  Sorry for
>> making you do the additional work/testing to build the iterator version.
> Not even a problem.
>> But after pondering the issue you raised, I think your original patch is
>> safer.
> So in conclusion, I think both versions should be safe, the iterator
> one is definitely cleaner.  It is your call which version I should
> apply.
Hi Jeff,
I tend to apply the new patch with FOR_EACH_SUBRTX because it's
clearer.  Does it work for you?  Of course if you do think it's not
that safe I will fall back to the old one.

Thanks,
bin

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH PR69052]Check if loop inv can be propagated into mem ref with additional addr expr canonicalization
  2016-03-01 17:08                 ` Bin.Cheng
@ 2016-03-01 23:44                   ` Jeff Law
  0 siblings, 0 replies; 12+ messages in thread
From: Jeff Law @ 2016-03-01 23:44 UTC (permalink / raw)
  To: Bin.Cheng; +Cc: Bin Cheng, gcc-patches, nd

On 03/01/2016 10:08 AM, Bin.Cheng wrote:
> On Thu, Feb 25, 2016 at 8:47 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>> On Thu, Feb 25, 2016 at 6:39 AM, Jeff Law <law@redhat.com> wrote:
>>> On 02/22/2016 02:22 AM, Bin.Cheng wrote:
>>>
>>>>> My only question is why didn't you use FOR_EACH_SUBRTX_VRA from
>>>>> rtl-iter.h
>>>>> to walk the RTX expressions in collect_address_parts and
>>>>> canonicalize_address_mult?
>>>>
>>>> Hi Jeff,
>>>> Nothing special, just I haven't used this before, also
>>>> canonicalize_address_mult is alphabetically copied from fwprop.c.  One
>>>> question is when rewriting SHIFT to MULT, we need to modify rtl
>>>> expression in place, does FOR_EACH_SUBRTX iterator support this?  If
>>>> yes, what is the behavior for modified sub-expression?
>>>
>>> Hmm.  The question of semantics when we change the underlying
>>> sub-expressions is an interesting one.
>>>
>>> While I think we're OK in practice using the iterators, I think that's more
>>> of an accident than by design -- if MULT/ASHIFT had a different underlying
>>> RTL structure then I'm much less sure using the iterators would be safe.
>> Hi Jeff,
>> Yes, I thought about this too and finally decided to skip sub rtxes in
>> modified MULT/ASHIFT expressions.  I think this will make the patch
>> with FOR_EACH_SUBRTX iterator safe.  Although it doesn't dive into
>> MULT/ASHIFT while the original one does, I don't think there is such
>> case in practice, after all we can't handle such complicated address
>> expression either.
>>>
>>> Let's go with your original patch that didn't use the iterators.  Sorry for
>>> making you do the additional work/testing to build the iterator version.
>> Not even a problem.
>>> But after pondering the issue you raised, I think your original patch is
>>> safer.
>> So in conclusion, I think both versions should be safe, the iterator
>> one is definitely cleaner.  It is your call which version I should
>> apply.
> Hi Jeff,
> I tend to apply the new patch with FOR_EACH_SUBRTX because it's
> clearer.  Does it work for you?  Of course if you do think it's not
> that safe I will fall back to the old one.
Sorry, things are just busy here.

I'm reasonably comfortable with both as long as we're not diving inside 
the subexpressions in the ASHIFT/MULT case.  If you'd prefer the 
FOR_EACH_SUBRTX variant, that's OK with me.

jeff

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2016-03-01 23:44 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-09 11:08 [PATCH PR69052]Check if loop inv can be propagated into mem ref with additional addr expr canonicalization Bin Cheng
2016-02-11  7:14 ` Jeff Law
2016-02-11 17:59   ` Bin.Cheng
2016-02-11 23:26     ` Jeff Law
2016-02-16 18:43       ` Bin Cheng
2016-02-19 22:24         ` Jeff Law
2016-02-22  9:22           ` Bin.Cheng
2016-02-25  6:39             ` Jeff Law
2016-02-25  8:47               ` Bin.Cheng
2016-03-01 17:08                 ` Bin.Cheng
2016-03-01 23:44                   ` Jeff Law
2016-02-23 15:10           ` Bin.Cheng

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).