public inbox for gcc-patches@gcc.gnu.org
* [PATCH][GCC] Algorithmic optimization in match and simplify
@ 2015-08-28 17:29 Andre Vieira
  2015-08-28 18:13 ` Marc Glisse
  2015-09-01 16:50 ` Location of "dg-final" directives? (was Re: [PATCH][GCC] Algorithmic optimization in match and simplify) David Malcolm
  0 siblings, 2 replies; 17+ messages in thread
From: Andre Vieira @ 2015-08-28 17:29 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 1343 bytes --]

Two new algorithmic optimisations:
   1. (((X op0 C0) op1 C1) op2 C2)
     with op0 = {&, >>, <<}, op1 = {|,^}, op2 = {|,^} and op1 != op2
     zero_mask has 1's for all bits that are sure to be 0 in (X op0 C0)
     and 0's otherwise.
     if (op1 == '^') C1 &= ~C2 (Only changed if actually emitted)
     if ((C1 & ~zero_mask) == 0) then emit (X op0 C0) op2 (C1 op2 C2)
     if ((C2 & ~zero_mask) == 0) then emit (X op0 C0) op1 (C1 op2 C2)
   2. (X {|,^,&} C0) {<<,>>} C1 -> (X {<<,>>} C1) {|,^,&} (C0 {<<,>>} C1)
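
A worked instance of rule 1, using the constants from the baz test added
below (my own trace, not part of the submission): for
((X & 0xd123) ^ 0x6040) | 0xc031 we have zero_mask == ~0xd123; op1 is
'^', so C1 becomes 0x6040 & ~0xc031 == 0x2040; then
(C1 & ~zero_mask) == (0x2040 & 0xd123) == 0, so we emit
(X & 0xd123) | (0x2040 | 0xc031), i.e. (X & 0xd123) | 0xe071.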


This patch does two algorithmic optimisations that target patterns like:
(((x << 24) | 0x00FFFFFF) ^ 0xFF000000) and ((x ^ 0x40000002) >> 1) | 
0x80000000.

The transformation uses the knowledge of which bits are zero after 
operations like (X {&,<<,(unsigned)>>} C0) to combine constants, reducing 
the number of run-time operations.
The two examples above would be transformed into (X << 24) ^ 0xFFFFFFFF 
and (X >> 1) ^ 0xa0000001 respectively.
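
As a sanity check of mine (not part of the patch; the helper names are
made up), the two example rewrites can be spot-checked with a small C
program:

#include <assert.h>
#include <stdint.h>

/* Example 1: zero_mask for (x << 24) covers bits 0-23 and
   0x00FFFFFF & ~zero_mask == 0, so the constants fold to
   0x00FFFFFF ^ 0xFF000000 == 0xFFFFFFFF.  */
static uint32_t before1 (uint32_t x) { return ((x << 24) | 0x00FFFFFF) ^ 0xFF000000; }
static uint32_t after1 (uint32_t x) { return (x << 24) ^ 0xFFFFFFFF; }

/* Example 2: the second transformation first yields
   ((x >> 1) ^ 0x20000001) | 0x80000000, then the first folds the
   constants to 0xa0000001.  */
static uint32_t before2 (uint32_t x) { return ((x ^ 0x40000002) >> 1) | 0x80000000; }
static uint32_t after2 (uint32_t x) { return (x >> 1) ^ 0xA0000001; }

int main (void)
{
  uint64_t x;
  for (x = 0; x <= UINT32_MAX; x += 0x101) /* sampled sweep of the range */
    {
      assert (before1 ((uint32_t) x) == after1 ((uint32_t) x));
      assert (before2 ((uint32_t) x) == after2 ((uint32_t) x));
    }
  return 0;
}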


gcc/ChangeLog:

2015-08-03  Andre Vieira  <andre.simoesdiasvieira@arm.com>

   * match.pd: Added new patterns:
     (((X {&,<<,>>} C0) {|,^} C1) {^,|} C2)
     (X {|,^,&} C0) {<<,>>} C1 -> (X {<<,>>} C1) {|,^,&} (C0 {<<,>>} C1)

gcc/testsuite/ChangeLog:

2015-08-03  Andre Vieira  <andre.simoesdiasvieira@arm.com>

   * gcc.dg/tree-ssa/forwprop-33.c: New test.

[-- Attachment #2: 0001-algorithmic-optimization.patch --]
[-- Type: text/x-patch; name=0001-algorithmic-optimization.patch, Size: 4541 bytes --]

From 15f86df5b3561edf26ae79cedbe160fd46596fd9 Mon Sep 17 00:00:00 2001
From: Andre Simoes Dias Vieira <andsim01@arm.com>
Date: Wed, 26 Aug 2015 16:27:31 +0100
Subject: [PATCH] algorithmic optimization

---
 gcc/match.pd                                | 70 +++++++++++++++++++++++++++++
 gcc/testsuite/gcc.dg/tree-ssa/forwprop-33.c | 42 +++++++++++++++++
 2 files changed, 112 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/forwprop-33.c

diff --git a/gcc/match.pd b/gcc/match.pd
index eb0ba9d10a9b8ca66c23c56da0678477379daf80..3d9a8f52713e8dfb2189aad76bce709c924fa286 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -708,6 +708,76 @@ along with GCC; see the file COPYING3.  If not see
       && tree_nop_conversion_p (type, TREE_TYPE (@1)))
   (convert (bit_and (bit_not @1) @0))))
 
+/* (X bit_op C0) rshift C1 -> (X rshift C1) bit_op (C0 rshift C1) */
+(for bit_op (bit_ior bit_xor bit_and)
+(simplify
+ (rshift (bit_op:c @0 INTEGER_CST@1) INTEGER_CST@2)
+ (bit_op
+  (rshift @0 @2)
+  { wide_int_to_tree (type, wi::rshift (@1, @2, TYPE_SIGN (type))); })))
+
+/* (X bit_op C0) lshift C1 -> (X lshift C1) bit_op (C0 lshift C1) */
+(for bit_op (bit_ior bit_xor bit_and)
+(simplify
+ (lshift (bit_op:c @0 INTEGER_CST@1) INTEGER_CST@2)
+ (bit_op
+  (lshift @0 @2)
+  { wide_int_to_tree (type, wi::lshift (@1, @2)); })))
+
+
+/* (((X op0 C0) op1 C1) op2 C2)
+    with op0 = {&, >>, <<}, op1 = {|,^}, op2 = {|,^} and op1 != op2
+    zero_mask has 1's for all bits that are sure to be 0 in (X op0 C0)
+    and 0's otherwise.
+    if (op1 == '^') C1 &= ~C2;
+    if ((C1 & ~zero_mask) == 0) then emit (X op0 C0) op2 (C1 op2 C2)
+    if ((C2 & ~zero_mask) == 0) then emit (X op0 C0) op1 (C1 op2 C2)
+*/
+(for op0 (rshift rshift lshift lshift bit_and bit_and)
+ op1 (bit_ior bit_xor bit_ior bit_xor bit_ior bit_xor)
+ op2 (bit_xor bit_ior bit_xor bit_ior bit_xor bit_ior)
+(simplify
+ (op2:c
+  (op1:c
+   (op0 @0 INTEGER_CST@1) INTEGER_CST@2) INTEGER_CST@3)
+ (with
+  {
+    unsigned int prec = TYPE_PRECISION (type);
+    wide_int zero_mask = wi::zero (prec);
+    wide_int C0 = wide_int_storage (@1);
+    wide_int C1 = wide_int_storage (@2);
+    wide_int C2 = wide_int_storage (@3);
+    wide_int cst_emit;
+
+    if (op0 == BIT_AND_EXPR)
+      {
+	zero_mask = wide_int_storage (wi::bit_not (@1));
+      }
+    else if (op0 == LSHIFT_EXPR && wi::fits_uhwi_p (@1))
+      {
+	zero_mask = wide_int_storage (wi::mask (C0.to_uhwi (), false, prec));
+      }
+    else if (op0 == RSHIFT_EXPR && TYPE_UNSIGNED (type) && wi::fits_uhwi_p (@1))
+      {
+	unsigned HOST_WIDE_INT m = prec - C0.to_uhwi ();
+	zero_mask = wide_int_storage (wi::mask (m, true, prec));
+      }
+
+    if (op1 == BIT_XOR_EXPR)
+      {
+	C1 = wi::bit_and_not (C1,C2);
+	cst_emit = wi::bit_or (C1, C2);
+      }
+    else
+      {
+	cst_emit = wi::bit_xor (C1, C2);
+      }
+  }
+  (if ((C1 & ~zero_mask) == 0)
+   (op2 (op0 @0 @1) { wide_int_to_tree (type, cst_emit); })
+   (if ((C2 & ~zero_mask) == 0)
+    (op1 (op0 @0 @1) { wide_int_to_tree (type, cst_emit); }))))))
+
 /* Associate (p +p off1) +p off2 as (p +p (off1 + off2)).  */
 (simplify
   (pointer_plus (pointer_plus:s @0 @1) @3)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/forwprop-33.c b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-33.c
new file mode 100644
index 0000000000000000000000000000000000000000..984d8b37a01defe0e6852070a7dfa7ace5027c51
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-33.c
@@ -0,0 +1,42 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-forwprop1" } */
+
+unsigned short
+foo (unsigned short a)
+{
+  a ^= 0x4002;
+  a >>= 1;
+  a |= 0x8000; /* Simplify to ((a >> 1) ^ 0xa001).  */
+  return a;
+}
+
+unsigned short
+bar (unsigned short a)
+{
+  a |= 0x4002;
+  a <<= 1;
+  a ^= 0x0001; /* Simplify to ((a << 1) | 0x8005).  */
+  return a;
+}
+
+unsigned short
+baz (unsigned short a)
+{
+  a &= 0xd123;
+  a ^= 0x6040;
+  a |= 0xc031; /* Simplify to ((a & 0xd123) | 0xe071).  */
+  return a;
+}
+
+short
+qux (short a)
+{
+  a ^= 0x8002;
+  a >>= 1;
+  a |= 0x8000; /* Only move shift inward: (((a >> 1) ^ 0xc001) | 0x8000).  */
+  return a;
+}
+/* { dg-final { scan-tree-dump "\\^ 40961" "forwprop1" } } */
+/* { dg-final { scan-tree-dump "\\| 32773" "forwprop1" } } */
+/* { dg-final { scan-tree-dump "\\| 57457" "forwprop1" } } */
+/* { dg-final { scan-tree-dump "\\^ -16383" "forwprop1" } } */
-- 
1.9.1



* Re: [PATCH][GCC] Algorithmic optimization in match and simplify
  2015-08-28 17:29 [PATCH][GCC] Algorithmic optimization in match and simplify Andre Vieira
@ 2015-08-28 18:13 ` Marc Glisse
  2015-09-01 13:40   ` Andre Vieira
  2015-09-01 16:50 ` Location of "dg-final" directives? (was Re: [PATCH][GCC] Algorithmic optimization in match and simplify) David Malcolm
  1 sibling, 1 reply; 17+ messages in thread
From: Marc Glisse @ 2015-08-28 18:13 UTC (permalink / raw)
  To: Andre Vieira; +Cc: GCC Patches

(not a review, I haven't even read the whole patch)

On Fri, 28 Aug 2015, Andre Vieira wrote:

> 2015-08-03  Andre Vieira  <andre.simoesdiasvieira@arm.com>
>
>  * match.pd: Added new patterns:
>    ((X {&,<<,>>} C0) {|,^} C1) {^,|} C2)
>    (X {|,^,&} C0) {<<,>>} C1 -> (X {<<,>>} C1) {|,^,&} (C0 {<<,>>} C1)

+(for op0 (rshift rshift lshift lshift bit_and bit_and)
+ op1 (bit_ior bit_xor bit_ior bit_xor bit_ior bit_xor)
+ op2 (bit_xor bit_ior bit_xor bit_ior bit_xor bit_ior)

You can nest for-loops, it seems clearer as:
(for op0 (rshift lshift bit_and)
  (for op1 (bit_ior bit_xor)
       op2 (bit_xor bit_ior)

+(simplify
+ (op2:c
+  (op1:c
+   (op0 @0 INTEGER_CST@1) INTEGER_CST@2) INTEGER_CST@3)

I suspect you will want more :s (single_use) and less :c (canonicalization 
should put constants in second position).

+	C1 = wi::bit_and_not (C1,C2);

Space after ','.

Having wide_int_storage in many places is surprising; I can't find similar 
code anywhere else in gcc.


-- 
Marc Glisse


* Re: [PATCH][GCC] Algorithmic optimization in match and simplify
  2015-08-28 18:13 ` Marc Glisse
@ 2015-09-01 13:40   ` Andre Vieira
  2015-09-01 14:01     ` Richard Biener
  0 siblings, 1 reply; 17+ messages in thread
From: Andre Vieira @ 2015-09-01 13:40 UTC (permalink / raw)
  To: gcc-patches

Hi Marc,

On 28/08/15 19:07, Marc Glisse wrote:
> (not a review, I haven't even read the whole patch)
>
> On Fri, 28 Aug 2015, Andre Vieira wrote:
>
>> 2015-08-03  Andre Vieira  <andre.simoesdiasvieira@arm.com>
>>
>>   * match.pd: Added new patterns:
>>     ((X {&,<<,>>} C0) {|,^} C1) {^,|} C2)
>>     (X {|,^,&} C0) {<<,>>} C1 -> (X {<<,>>} C1) {|,^,&} (C0 {<<,>>} C1)
>
> +(for op0 (rshift rshift lshift lshift bit_and bit_and)
> + op1 (bit_ior bit_xor bit_ior bit_xor bit_ior bit_xor)
> + op2 (bit_xor bit_ior bit_xor bit_ior bit_xor bit_ior)
>
> You can nest for-loops, it seems clearer as:
> (for op0 (rshift lshift bit_and)
>    (for op1 (bit_ior bit_xor)
>         op2 (bit_xor bit_ior)

Will do, thank you for pointing it out.
>
> +(simplify
> + (op2:c
> +  (op1:c
> +   (op0 @0 INTEGER_CST@1) INTEGER_CST@2) INTEGER_CST@3)
>
> I suspect you will want more :s (single_use) and less :c (canonicalization
> should put constants in second position).
>
I can't find the definition of :s (single_use). The GCC internals 
documentation does point out that canonicalization puts constants in the 
second position; I didn't see that at first. Thank you for pointing it out.

> +	C1 = wi::bit_and_not (C1,C2);
>
> Space after ','.
>
Will do.

> Having wide_int_storage in many places is surprising, I can't find similar
> code anywhere else in gcc.
>
>

I tried looking for examples of something similar; I think I ended up 
using wide_int because I was able to convert easily to and from it, and 
it has the "mask" and "wide_int_to_tree" functions. I welcome any 
suggestions on what I should be using here for integer constant 
transformations and comparisons.

BR,
Andre


* Re: [PATCH][GCC] Algorithmic optimization in match and simplify
  2015-09-01 13:40   ` Andre Vieira
@ 2015-09-01 14:01     ` Richard Biener
  2015-09-03 11:13       ` [PATCH v2][GCC] " Andre Vieira
  0 siblings, 1 reply; 17+ messages in thread
From: Richard Biener @ 2015-09-01 14:01 UTC (permalink / raw)
  To: Andre Vieira; +Cc: GCC Patches

On Tue, Sep 1, 2015 at 3:40 PM, Andre Vieira
<Andre.SimoesDiasVieira@arm.com> wrote:
> Hi Marc,
>
> On 28/08/15 19:07, Marc Glisse wrote:
>>
>> (not a review, I haven't even read the whole patch)
>>
>> On Fri, 28 Aug 2015, Andre Vieira wrote:
>>
>>> 2015-08-03  Andre Vieira  <andre.simoesdiasvieira@arm.com>
>>>
>>>   * match.pd: Added new patterns:
>>>     ((X {&,<<,>>} C0) {|,^} C1) {^,|} C2)
>>>     (X {|,^,&} C0) {<<,>>} C1 -> (X {<<,>>} C1) {|,^,&} (C0 {<<,>>} C1)
>>
>>
>> +(for op0 (rshift rshift lshift lshift bit_and bit_and)
>> + op1 (bit_ior bit_xor bit_ior bit_xor bit_ior bit_xor)
>> + op2 (bit_xor bit_ior bit_xor bit_ior bit_xor bit_ior)
>>
>> You can nest for-loops, it seems clearer as:
>> (for op0 (rshift lshift bit_and)
>>    (for op1 (bit_ior bit_xor)
>>         op2 (bit_xor bit_ior)
>
>
> Will do, thank you for pointing it out.
>>
>>
>> +(simplify
>> + (op2:c
>> +  (op1:c
>> +   (op0 @0 INTEGER_CST@1) INTEGER_CST@2) INTEGER_CST@3)
>>
>> I suspect you will want more :s (single_use) and less :c (canonicalization
>> should put constants in second position).
>>
> I can't find the definition of :s (single_use).

Sorry for that - didn't get around to updating it yet :/  It restricts matching to
sub-expressions that have a single-use.  So

+  a &= 0xd123;
+  unsigned short tem = a ^ 0x6040;
+  a = tem | 0xc031; /* Simplify _not_ to ((a & 0xd123) | 0xe071).  */
... use of tem ...

we shouldn't do the simplification here because the expression
(a & 0xd123) ^ 0x6040 is kept live by 'tem'.

> GCC internals do point out
> that canonicalization does put constants in the second position, didnt see
> that first. Thank you for pointing it out.
>
>> +       C1 = wi::bit_and_not (C1,C2);
>>
>> Space after ','.
>>
> Will do.
>
>> Having wide_int_storage in many places is surprising, I can't find similar
>> code anywhere else in gcc.
>>
>>
>
> I tried looking for examples of something similar, I think I ended up using
> wide_int because I was able to convert easily to and from it and it has the
> "mask" and "wide_int_to_tree" functions. I welcome any suggestions on what I
> should be using here for integer constant transformations and comparisons.

Using wide-ints is fine, but you shouldn't need 'wide_int_storage'
constructors - those
are indeed odd.  Is it just for the initializers of wide-ints?

+    wide_int zero_mask = wi::zero (prec);
+    wide_int C0 = wide_int_storage (@1);
+    wide_int C1 = wide_int_storage (@2);
+    wide_int C2 = wide_int_storage (@3);
...
+       zero_mask = wide_int_storage (wi::mask (C0.to_uhwi (), false, prec));

tree_to_uhwi (@1) should do the trick as well

+       C1 = wi::bit_and_not (C1,C2);
+       cst_emit = wi::bit_or (C1, C2);

the ops should be replacable with @2 and @3, the store to C1 obviously not
(but you can use a tree temporary and use wide_int_to_tree here to avoid
the back-and-forth for the case where C1 is not assigned to).

Note that transforms only doing association are prone to endless recursion
in case some other pattern does the reverse op...

Richard.


> BR,
> Andre
>


* Location of "dg-final" directives? (was Re: [PATCH][GCC] Algorithmic optimization in match and simplify)
  2015-08-28 17:29 [PATCH][GCC] Algorithmic optimization in match and simplify Andre Vieira
  2015-08-28 18:13 ` Marc Glisse
@ 2015-09-01 16:50 ` David Malcolm
  2015-09-01 16:55   ` Marek Polacek
  1 sibling, 1 reply; 17+ messages in thread
From: David Malcolm @ 2015-09-01 16:50 UTC (permalink / raw)
  To: Andre Vieira; +Cc: GCC Patches

On Fri, 2015-08-28 at 17:56 +0100, Andre Vieira wrote:
[..snip..]
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/forwprop-33.c b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-33.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..984d8b37a01defe0e6852070a7dfa7ace5027c51
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-33.c
> @@ -0,0 +1,42 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -fdump-tree-forwprop1" } */
> +
> +unsigned short
> +foo (unsigned short a)
> +{
> +  a ^= 0x4002;
> +  a >>= 1;
> +  a |= 0x8000; /* Simplify to ((a >> 1) ^ 0xa001).  */
> +  return a;
> +}
> +
> +unsigned short
> +bar (unsigned short a)
> +{
> +  a |= 0x4002;
> +  a <<= 1;
> +  a ^= 0x0001; /* Simplify to ((a << 1) | 0x8005).  */
> +  return a;
> +}
> +
> +unsigned short
> +baz (unsigned short a)
> +{
> +  a &= 0xd123;
> +  a ^= 0x6040;
> +  a |= 0xc031; /* Simplify to ((a & 0xd123) | 0xe071).  */
> +  return a;
> +}
> +
> +short
> +qux (short a)
> +{
> +  a ^= 0x8002;
> +  a >>= 1;
> +  a |= 0x8000; /* Only move shift inward: (((a >> 1) ^ 0x4001 |) 0x8000).  */
> +  return a;
> +}
> +/* { dg-final { scan-tree-dump "\\^ 40961" "forwprop1" } } */
> +/* { dg-final { scan-tree-dump "\\| 32773" "forwprop1" } } */
> +/* { dg-final { scan-tree-dump "\\| 57457" "forwprop1" } } */
> +/* { dg-final { scan-tree-dump "\\^ -16383" "forwprop1" } } */
> -- 

I can't comment on the patch itself, but I noticed that in the testsuite
addition, you've gathered all the "dg-final" clauses at the end.

I think that this is consistent with existing practice in gcc, but
AFAIK, the dg-final clauses can appear anywhere in the file.

Would it make test cases more readable if we put the dg-final clauses
next to the code that they relate to?  Or, at least after the function
that they relate to?

For example:

  unsigned short
  foo (unsigned short a)
  {
    a ^= 0x4002;
    a >>= 1;
    a |= 0x8000; /* Simplify to ((a >> 1) ^ 0xa001).  */
    return a;
  }
  /* { dg-final { scan-tree-dump "\\^ 40961" "forwprop1" } } */

  unsigned short
  bar (unsigned short a)
  {
      /* snip */ /* Simplify to ((a << 1) | 0x8005).  */
  }
  /* { dg-final { scan-tree-dump "\\| 32773" "forwprop1" } } */

  ... etc ...

I think this may be more readable than the "group them all at the end of 
the file" approach, especially with testcases with many files and 
dg-final directives.

Hope this is constructive and not too "bikesheddy".
Dave


* Re: Location of "dg-final" directives? (was Re: [PATCH][GCC] Algorithmic optimization in match and simplify)
  2015-09-01 16:50 ` Location of "dg-final" directives? (was Re: [PATCH][GCC] Algorithmic optimization in match and simplify) David Malcolm
@ 2015-09-01 16:55   ` Marek Polacek
  2015-09-02 13:24     ` Andre Vieira
  0 siblings, 1 reply; 17+ messages in thread
From: Marek Polacek @ 2015-09-01 16:55 UTC (permalink / raw)
  To: David Malcolm; +Cc: Andre Vieira, GCC Patches

On Tue, Sep 01, 2015 at 12:50:27PM -0400, David Malcolm wrote:
> I can't comment on the patch itself, but I noticed that in the testsuite
> addition, you've gathered all the "dg-final" clauses at the end.
> 
> I think that this is consistent with existing practice in gcc, but
> AFAIK, the dg-final clauses can appear anywhere in the file.
> 
> Would it make test cases more readable if we put the dg-final clauses
> next to the code that they relate to?  Or, at least after the function
> that they relate to?
> 
> For example:
> 
>   unsigned short
>   foo (unsigned short a)
>   {
>     a ^= 0x4002;
>     a >>= 1;
>     a |= 0x8000; /* Simplify to ((a >> 1) ^ 0xa001).  */
>     return a;
>   }
>   /* { dg-final { scan-tree-dump "\\^ 40961" "forwprop1" } } */
> 
>   unsigned short
>   bar (unsigned short a)
>   {
>       /* snip */ /* Simplify to ((a << 1) | 0x8005).  */
>   }
>   /* { dg-final { scan-tree-dump "\\| 32773" "forwprop1" } } */
> 
>   ... etc ...
> 
> I think this may be more readable than the "group them all at the end of
> the file approach", especially with testcases with many files and
> dg-final directives.

Yeah, it's probably somewhat more readable.  Same for dg-output.  E.g. some
ubsan tests already use dg-outputs after functions they relate to.

	Marek


* Re: Location of "dg-final" directives? (was Re: [PATCH][GCC] Algorithmic optimization in match and simplify)
  2015-09-01 16:55   ` Marek Polacek
@ 2015-09-02 13:24     ` Andre Vieira
  0 siblings, 0 replies; 17+ messages in thread
From: Andre Vieira @ 2015-09-02 13:24 UTC (permalink / raw)
  To: Marek Polacek, David Malcolm; +Cc: GCC Patches



On 01/09/15 17:54, Marek Polacek wrote:
> On Tue, Sep 01, 2015 at 12:50:27PM -0400, David Malcolm wrote:
>> I can't comment on the patch itself, but I noticed that in the testsuite
>> addition, you've gathered all the "dg-final" clauses at the end.
>>
>> I think that this is consistent with existing practice in gcc, but
>> AFAIK, the dg-final clauses can appear anywhere in the file.
>>
>> Would it make test cases more readable if we put the dg-final clauses
>> next to the code that they relate to?  Or, at least after the function
>> that they relate to?
>>
>> For example:
>>
>>    unsigned short
>>    foo (unsigned short a)
>>    {
>>      a ^= 0x4002;
>>      a >>= 1;
>>      a |= 0x8000; /* Simplify to ((a >> 1) ^ 0xa001).  */
>>      return a;
>>    }
>>    /* { dg-final { scan-tree-dump "\\^ 40961" "forwprop1" } } */
>>
>>    unsigned short
>>    bar (unsigned short a)
>>    {
>>        /* snip */ /* Simplify to ((a << 1) | 0x8005).  */
>>    }
>>    /* { dg-final { scan-tree-dump "\\| 32773" "forwprop1" } } */
>>
>>    ... etc ...
>>
>> I think this may be more readable than the "group them all at the end of
>> the file approach", especially with testcases with many files and
>> dg-final directives.
>
> Yeah, it's probably somewhat more readable.  Same for dg-output.  E.g. some
> ubsan tests already use dg-outputs after functions they relate to.
>
> 	Marek
>

If no one objects, I'll make those changes too. Sounds reasonable to me.

Andre


* Re: [PATCH v2][GCC] Algorithmic optimization in match and simplify
  2015-09-01 14:01     ` Richard Biener
@ 2015-09-03 11:13       ` Andre Vieira
  2015-09-16 14:23         ` Andre Vieira
  2015-09-17  9:52         ` Richard Biener
  0 siblings, 2 replies; 17+ messages in thread
From: Andre Vieira @ 2015-09-03 11:13 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 5184 bytes --]

On 01/09/15 15:01, Richard Biener wrote:
> On Tue, Sep 1, 2015 at 3:40 PM, Andre Vieira
> <Andre.SimoesDiasVieira@arm.com> wrote:
>> Hi Marc,
>>
>> On 28/08/15 19:07, Marc Glisse wrote:
>>>
>>> (not a review, I haven't even read the whole patch)
>>>
>>> On Fri, 28 Aug 2015, Andre Vieira wrote:
>>>
>>>> 2015-08-03  Andre Vieira  <andre.simoesdiasvieira@arm.com>
>>>>
>>>>    * match.pd: Added new patterns:
>>>>      ((X {&,<<,>>} C0) {|,^} C1) {^,|} C2)
>>>>      (X {|,^,&} C0) {<<,>>} C1 -> (X {<<,>>} C1) {|,^,&} (C0 {<<,>>} C1)
>>>
>>>
>>> +(for op0 (rshift rshift lshift lshift bit_and bit_and)
>>> + op1 (bit_ior bit_xor bit_ior bit_xor bit_ior bit_xor)
>>> + op2 (bit_xor bit_ior bit_xor bit_ior bit_xor bit_ior)
>>>
>>> You can nest for-loops, it seems clearer as:
>>> (for op0 (rshift lshift bit_and)
>>>     (for op1 (bit_ior bit_xor)
>>>          op2 (bit_xor bit_ior)
>>
>>
>> Will do, thank you for pointing it out.
>>>
>>>
>>> +(simplify
>>> + (op2:c
>>> +  (op1:c
>>> +   (op0 @0 INTEGER_CST@1) INTEGER_CST@2) INTEGER_CST@3)
>>>
>>> I suspect you will want more :s (single_use) and less :c (canonicalization
>>> should put constants in second position).
>>>
>> I can't find the definition of :s (single_use).
>
> Sorry for that - didn't get along updating it yet :/  It restricts matching to
> sub-expressions that have a single-use.  So
>
> +  a &= 0xd123;
> +  unsigned short tem = a ^ 0x6040;
> +  a = tem | 0xc031; /* Simplify _not_ to ((a & 0xd123) | 0xe071).  */
> ... use of tem ...
>
> we shouldn't do the simplifcation here because the expression
> (a & 0x123) ^ 0x6040 is kept live by 'tem'.
>
>> GCC internals do point out
>> that canonicalization does put constants in the second position, didnt see
>> that first. Thank you for pointing it out.
>>
>>> +       C1 = wi::bit_and_not (C1,C2);
>>>
>>> Space after ','.
>>>
>> Will do.
>>
>>> Having wide_int_storage in many places is surprising, I can't find similar
>>> code anywhere else in gcc.
>>>
>>>
>>
>> I tried looking for examples of something similar, I think I ended up using
>> wide_int because I was able to convert easily to and from it and it has the
>> "mask" and "wide_int_to_tree" functions. I welcome any suggestions on what I
>> should be using here for integer constant transformations and comparisons.
>
> Using wide-ints is fine, but you shouldn't need 'wide_int_storage'
> constructors - those
> are indeed odd.  Is it just for the initializers of wide-ints?
>
> +    wide_int zero_mask = wi::zero (prec);
> +    wide_int C0 = wide_int_storage (@1);
> +    wide_int C1 = wide_int_storage (@2);
> +    wide_int C2 = wide_int_storage (@3);
> ...
> +       zero_mask = wide_int_storage (wi::mask (C0.to_uhwi (), false, prec));
>
> tree_to_uhwi (@1) should do the trick as well
>
> +       C1 = wi::bit_and_not (C1,C2);
> +       cst_emit = wi::bit_or (C1, C2);
>
> the ops should be replacable with @2 and @3, the store to C1 obviously not
> (but you can use a tree temporary and use wide_int_to_tree here to avoid
> the back-and-forth for the case where C1 is not assigned to).
>
> Note that transforms only doing association are prone to endless recursion
> in case some other pattern does the reverse op...
>
> Richard.
>
>
>> BR,
>> Andre
>>
>
Thank you for all the comments; see the reworked version:

Two new algorithmic optimisations:
   1. (((X op0 C0) op1 C1) op2 C2)
     with op0 = {&, >>, <<}, op1 = {|,^}, op2 = {|,^} and op1 != op2
     zero_mask has 1's for all bits that are sure to be 0 in (X op0 C0)
     and 0's otherwise.
     if (op1 == '^') C1 &= ~C2 (Only changed if actually emitted)
     if ((C1 & ~zero_mask) == 0) then emit (X op0 C0) op2 (C1 op2 C2)
     if ((C2 & ~zero_mask) == 0) then emit (X op0 C0) op1 (C1 op2 C2)
   2. (X {|,^,&} C0) {<<,>>} C1 -> (X {<<,>>} C1) {|,^,&} (C0 {<<,>>} C1)


This patch does two algorithmic optimisations that target patterns like:
(((x << 24) | 0x00FFFFFF) ^ 0xFF000000) and ((x ^ 0x40000002) >> 1) | 
0x80000000.

The transformation uses the knowledge of which bits are zero after 
operations like (X {&,<<,(unsigned)>>} C0) to combine constants, reducing 
the number of run-time operations.
The two examples above would be transformed into (X << 24) ^ 0xFFFFFFFF 
and (X >> 1) ^ 0xa0000001 respectively.

The second transformation enables more applications of the first. Also, 
some targets may benefit from delaying shift operations. I am aware that 
such an optimization, in combination with one or more optimizations that 
cause the reverse transformation, may lead to an infinite loop, though no 
such behavior has been detected during regression testing and 
bootstrapping on aarch64.
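
To illustrate how the second transformation feeds the first, here is a
worked trace of mine (not text from the patch) for the foo test case
below, in unsigned 16-bit arithmetic:

   ((a ^ 0x4002) >> 1) | 0x8000
     -> ((a >> 1) ^ 0x2001) | 0x8000   (second transformation)
     -> (a >> 1) ^ 0xa001              (first transformation: zero_mask
                                        == 0x8000, C2 & ~zero_mask == 0,
                                        C1 | C2 == 0xa001)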

gcc/ChangeLog:

2015-08-03  Andre Vieira  <andre.simoesdiasvieira@arm.com>

   * match.pd: Added new patterns:
     (((X {&,<<,>>} C0) {|,^} C1) {^,|} C2)
     (X {|,^,&} C0) {<<,>>} C1 -> (X {<<,>>} C1) {|,^,&} (C0 {<<,>>} C1)

gcc/testsuite/ChangeLog:

2015-08-03  Andre Vieira  <andre.simoesdiasvieira@arm.com>
             Hale Wang  <hale.wang@arm.com>

   * gcc.dg/tree-ssa/forwprop-33.c: New test.

[-- Attachment #2: 0001-algorithmic-optimization-v2.patch --]
[-- Type: text/x-patch; name=0001-algorithmic-optimization-v2.patch, Size: 4460 bytes --]

From 3d1d4d838fed9af45aea9fa99f8954585fee7c23 Mon Sep 17 00:00:00 2001
From: Andre Simoes Dias Vieira <andsim01@arm.com>
Date: Wed, 2 Sep 2015 16:47:38 +0100
Subject: [PATCH] algorithmic optimization v2

---
 gcc/match.pd                                | 70 +++++++++++++++++++++++++++++
 gcc/testsuite/gcc.dg/tree-ssa/forwprop-33.c | 42 +++++++++++++++++
 2 files changed, 112 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/forwprop-33.c

diff --git a/gcc/match.pd b/gcc/match.pd
index fb4b342d31d26a03bc756c538f6635f2acf6ddb2..6138591c0cef1814dcbd6313dedaa95a91700dc2 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -710,6 +710,76 @@ along with GCC; see the file COPYING3.  If not see
       && tree_nop_conversion_p (type, TREE_TYPE (@1)))
   (convert (bit_and (bit_not @1) @0))))
 
+/* (X bit_op C0) rshift C1 -> (X rshift C1) bit_op (C0 rshift C1) */
+(for bit_op (bit_ior bit_xor bit_and)
+(simplify
+ (rshift (bit_op:s @0 INTEGER_CST@1) INTEGER_CST@2)
+ (bit_op
+  (rshift @0 @2)
+  { wide_int_to_tree (type, wi::rshift (@1, @2, TYPE_SIGN (type))); })))
+
+/* (X bit_op C0) lshift C1 -> (X lshift C1) bit_op (C0 lshift C1) */
+(for bit_op (bit_ior bit_xor bit_and)
+(simplify
+ (lshift (bit_op:s @0 INTEGER_CST@1) INTEGER_CST@2)
+ (bit_op
+  (lshift @0 @2)
+  { wide_int_to_tree (type, wi::lshift (@1, @2)); })))
+
+
+/* (((X op0 C0) op1 C1) op2 C2)
+    with op0 = {&, >>, <<}, op1 = {|,^}, op2 = {|,^} and op1 != op2
+    zero_mask has 1's for all bits that are sure to be 0 in (X op0 C0)
+    and 0's otherwise.
+    if (op1 == '^') C1 &= ~C2;
+    if ((C1 & ~zero_mask) == 0) then emit (X op0 C0) op2 (C1 op2 C2)
+    if ((C2 & ~zero_mask) == 0) then emit (X op0 C0) op1 (C1 op2 C2)
+*/
+(for op0 (rshift lshift bit_and)
+ (for op1 (bit_ior bit_xor)
+      op2 (bit_xor bit_ior)
+(simplify
+ (op2
+  (op1:s
+   (op0:s @0 INTEGER_CST@1) INTEGER_CST@2) INTEGER_CST@3)
+ (if (!(op0 == RSHIFT_EXPR && !TYPE_UNSIGNED (type)) && wi::fits_uhwi_p (@1))
+  (with
+   {
+     unsigned int prec = TYPE_PRECISION (type);
+     wide_int zero_mask_not;
+     wide_int C1;
+     wide_int cst_emit;
+     if (op0 == BIT_AND_EXPR)
+       {
+	 zero_mask_not = @1;
+       }
+     else if (op0 == LSHIFT_EXPR)
+       {
+	 zero_mask_not = wi::bit_not (wi::mask (tree_to_uhwi (@1), false,
+						prec));
+       }
+     else if (op0 == RSHIFT_EXPR)
+       {
+	 unsigned HOST_WIDE_INT m = prec - tree_to_uhwi (@1);
+	 zero_mask_not = wi::bit_not (wi::mask (m, true, prec));
+       }
+
+     if (op1 == BIT_XOR_EXPR)
+       {
+	 C1 = wi::bit_and_not (@2, @3);
+	 cst_emit = wi::bit_or (C1, @3);
+       }
+     else
+       {
+	 C1 = @2;
+	 cst_emit = wi::bit_xor (@2, @3);
+       }
+   }
+   (if (wi::eq_p (wi::bit_and (C1, zero_mask_not), wi::zero (prec)))
+    (op2 (op0 @0 @1) { wide_int_to_tree (type, cst_emit); })
+    (if (wi::eq_p (wi::bit_and (@3, zero_mask_not), wi::zero (prec)))
+     (op1 (op0 @0 @1) { wide_int_to_tree (type, cst_emit); }))))))))
+
 /* Associate (p +p off1) +p off2 as (p +p (off1 + off2)).  */
 (simplify
   (pointer_plus (pointer_plus:s @0 @1) @3)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/forwprop-33.c b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-33.c
new file mode 100644
index 0000000000000000000000000000000000000000..c8940d62a7a9370e9d2b911badfc6d085f988304
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-33.c
@@ -0,0 +1,42 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-forwprop1" } */
+
+unsigned short
+foo (unsigned short a)
+{
+  a ^= 0x4002;
+  a >>= 1;
+  a |= 0x8000; /* Simplify to ((a >> 1) ^ 0xa001).  */
+  return a;
+}
+/* { dg-final { scan-tree-dump "\\^ 40961" "forwprop1" } } */
+
+unsigned short
+bar (unsigned short a)
+{
+  a |= 0x4002;
+  a <<= 1;
+  a ^= 0x0001; /* Simplify to ((a << 1) | 0x8005).  */
+  return a;
+}
+/* { dg-final { scan-tree-dump "\\| 32773" "forwprop1" } } */
+
+unsigned short
+baz (unsigned short a)
+{
+  a &= 0xd123;
+  a ^= 0x6040;
+  a |= 0xc031; /* Simplify to ((a & 0xd123) | 0xe071).  */
+  return a;
+}
+/* { dg-final { scan-tree-dump "\\| 57457" "forwprop1" } } */
+
+short
+qux (short a)
+{
+  a ^= 0x8002;
+  a >>= 1;
+  a |= 0x8000; /* Only move shift inward: (((a >> 1) ^ 0xc001) | 0x8000).  */
+  return a;
+}
+/* { dg-final { scan-tree-dump "\\^ -16383" "forwprop1" } } */
-- 
1.9.1



* Re: [PATCH v2][GCC] Algorithmic optimization in match and simplify
  2015-09-03 11:13       ` [PATCH v2][GCC] " Andre Vieira
@ 2015-09-16 14:23         ` Andre Vieira
  2015-09-17  9:52         ` Richard Biener
  1 sibling, 0 replies; 17+ messages in thread
From: Andre Vieira @ 2015-09-16 14:23 UTC (permalink / raw)
  To: gcc-patches

On 03/09/15 12:11, Andre Vieira wrote:
> On 01/09/15 15:01, Richard Biener wrote:
>> On Tue, Sep 1, 2015 at 3:40 PM, Andre Vieira
>> <Andre.SimoesDiasVieira@arm.com> wrote:
>>> Hi Marc,
>>>
>>> On 28/08/15 19:07, Marc Glisse wrote:
>>>>
>>>> (not a review, I haven't even read the whole patch)
>>>>
>>>> On Fri, 28 Aug 2015, Andre Vieira wrote:
>>>>
>>>>> 2015-08-03  Andre Vieira  <andre.simoesdiasvieira@arm.com>
>>>>>
>>>>>     * match.pd: Added new patterns:
>>>>>       ((X {&,<<,>>} C0) {|,^} C1) {^,|} C2)
>>>>>       (X {|,^,&} C0) {<<,>>} C1 -> (X {<<,>>} C1) {|,^,&} (C0 {<<,>>} C1)
>>>>
>>>>
>>>> +(for op0 (rshift rshift lshift lshift bit_and bit_and)
>>>> + op1 (bit_ior bit_xor bit_ior bit_xor bit_ior bit_xor)
>>>> + op2 (bit_xor bit_ior bit_xor bit_ior bit_xor bit_ior)
>>>>
>>>> You can nest for-loops, it seems clearer as:
>>>> (for op0 (rshift lshift bit_and)
>>>>      (for op1 (bit_ior bit_xor)
>>>>           op2 (bit_xor bit_ior)
>>>
>>>
>>> Will do, thank you for pointing it out.
>>>>
>>>>
>>>> +(simplify
>>>> + (op2:c
>>>> +  (op1:c
>>>> +   (op0 @0 INTEGER_CST@1) INTEGER_CST@2) INTEGER_CST@3)
>>>>
>>>> I suspect you will want more :s (single_use) and less :c (canonicalization
>>>> should put constants in second position).
>>>>
>>> I can't find the definition of :s (single_use).
>>
>> Sorry for that - didn't get along updating it yet :/  It restricts matching to
>> sub-expressions that have a single-use.  So
>>
>> +  a &= 0xd123;
>> +  unsigned short tem = a ^ 0x6040;
>> +  a = tem | 0xc031; /* Simplify _not_ to ((a & 0xd123) | 0xe071).  */
>> ... use of tem ...
>>
>> we shouldn't do the simplifcation here because the expression
>> (a & 0x123) ^ 0x6040 is kept live by 'tem'.
>>
>>> GCC internals do point out
>>> that canonicalization does put constants in the second position, didnt see
>>> that first. Thank you for pointing it out.
>>>
>>>> +       C1 = wi::bit_and_not (C1,C2);
>>>>
>>>> Space after ','.
>>>>
>>> Will do.
>>>
>>>> Having wide_int_storage in many places is surprising, I can't find similar
>>>> code anywhere else in gcc.
>>>>
>>>>
>>>
>>> I tried looking for examples of something similar, I think I ended up using
>>> wide_int because I was able to convert easily to and from it and it has the
>>> "mask" and "wide_int_to_tree" functions. I welcome any suggestions on what I
>>> should be using here for integer constant transformations and comparisons.
>>
>> Using wide-ints is fine, but you shouldn't need 'wide_int_storage'
>> constructors - those
>> are indeed odd.  Is it just for the initializers of wide-ints?
>>
>> +    wide_int zero_mask = wi::zero (prec);
>> +    wide_int C0 = wide_int_storage (@1);
>> +    wide_int C1 = wide_int_storage (@2);
>> +    wide_int C2 = wide_int_storage (@3);
>> ...
>> +       zero_mask = wide_int_storage (wi::mask (C0.to_uhwi (), false, prec));
>>
>> tree_to_uhwi (@1) should do the trick as well
>>
>> +       C1 = wi::bit_and_not (C1,C2);
>> +       cst_emit = wi::bit_or (C1, C2);
>>
>> the ops should be replacable with @2 and @3, the store to C1 obviously not
>> (but you can use a tree temporary and use wide_int_to_tree here to avoid
>> the back-and-forth for the case where C1 is not assigned to).
>>
>> Note that transforms only doing association are prone to endless recursion
>> in case some other pattern does the reverse op...
>>
>> Richard.
>>
>>
>>> BR,
>>> Andre
>>>
>>
> Thank you for all the comments, see reworked version:
>
> Two new algorithmic optimisations:
>     1.((X op0 C0) op1 C1) op2 C2)
>       with op0 = {&, >>, <<}, op1 = {|,^}, op2 = {|,^} and op1 != op2
>       zero_mask has 1's for all bits that are sure to be 0 in (X op0 C0)
>       and 0's otherwise.
>       if (op1 == '^') C1 &= ~C2 (Only changed if actually emitted)
>       if ((C1 & ~zero_mask) == 0) then emit (X op0 C0) op2 (C1 op2 C2)
>       if ((C2 & ~zero_mask) == 0) then emit (X op0 C0) op1 (C1 op2 C2)
>     2. (X {|,^,&} C0) {<<,>>} C1 -> (X {<<,>>} C1) {|,^,&} (C0 {<<,>>} C1)
>
>
> This patch does two algorithmic optimisations that target patterns like:
> (((x << 24) | 0x00FFFFFF) ^ 0xFF000000) and ((x ^ 0x40000002) >> 1) |
> 0x80000000.
>
> The transformation uses the knowledge of which bits are zero after
> operations like (X {&,<<,(unsigned)>>}) to combine constants, reducing
> run-time operations.
> The two examples above would be transformed into (X << 24) ^ 0xFFFFFFFF
> and (X >> 1) ^ 0xa0000001 respectively.
>
> The second transformation enables more applications of the first. Also
> some targets may benefit from delaying shift operations. I am aware that
> such an optimization, in combination with one or more optimizations that
> cause the reverse transformation, may lead to an infinite loop. Though
> such behavior has not been detected during regression testing and
> bootstrapping on aarch64.
>
> gcc/ChangeLog:
>
> 2015-08-03  Andre Vieira  <andre.simoesdiasvieira@arm.com>
>
>     * match.pd: Added new patterns:
>       ((X {&,<<,>>} C0) {|,^} C1) {^,|} C2)
>       (X {|,^,&} C0) {<<,>>} C1 -> (X {<<,>>} C1) {|,^,&} (C0 {<<,>>} C1)
>
> gcc/testsuite/ChangeLog:
>
> 2015-08-03  Andre Vieira  <andre.simoesdiasvieira@arm.com>
>               Hale Wang  <hale.wang@arm.com>
>
>     * gcc.dg/tree-ssa/forwprop-33.c: New test.
>

Ping.


* Re: [PATCH v2][GCC] Algorithmic optimization in match and simplify
  2015-09-03 11:13       ` [PATCH v2][GCC] " Andre Vieira
  2015-09-16 14:23         ` Andre Vieira
@ 2015-09-17  9:52         ` Richard Biener
  2015-09-25 11:44           ` Andre Vieira
  1 sibling, 1 reply; 17+ messages in thread
From: Richard Biener @ 2015-09-17  9:52 UTC (permalink / raw)
  To: Andre Vieira; +Cc: GCC Patches

On Thu, Sep 3, 2015 at 1:11 PM, Andre Vieira
<Andre.SimoesDiasVieira@arm.com> wrote:
> On 01/09/15 15:01, Richard Biener wrote:
>>
>> On Tue, Sep 1, 2015 at 3:40 PM, Andre Vieira
>> <Andre.SimoesDiasVieira@arm.com> wrote:
>>>
>>> Hi Marc,
>>>
>>> On 28/08/15 19:07, Marc Glisse wrote:
>>>>
>>>>
>>>> (not a review, I haven't even read the whole patch)
>>>>
>>>> On Fri, 28 Aug 2015, Andre Vieira wrote:
>>>>
>>>>> 2015-08-03  Andre Vieira  <andre.simoesdiasvieira@arm.com>
>>>>>
>>>>>    * match.pd: Added new patterns:
>>>>>      ((X {&,<<,>>} C0) {|,^} C1) {^,|} C2)
>>>>>      (X {|,^,&} C0) {<<,>>} C1 -> (X {<<,>>} C1) {|,^,&} (C0 {<<,>>}
>>>>> C1)
>>>>
>>>>
>>>>
>>>> +(for op0 (rshift rshift lshift lshift bit_and bit_and)
>>>> + op1 (bit_ior bit_xor bit_ior bit_xor bit_ior bit_xor)
>>>> + op2 (bit_xor bit_ior bit_xor bit_ior bit_xor bit_ior)
>>>>
>>>> You can nest for-loops, it seems clearer as:
>>>> (for op0 (rshift lshift bit_and)
>>>>     (for op1 (bit_ior bit_xor)
>>>>          op2 (bit_xor bit_ior)
>>>
>>>
>>>
>>> Will do, thank you for pointing it out.
>>>>
>>>>
>>>>
>>>> +(simplify
>>>> + (op2:c
>>>> +  (op1:c
>>>> +   (op0 @0 INTEGER_CST@1) INTEGER_CST@2) INTEGER_CST@3)
>>>>
>>>> I suspect you will want more :s (single_use) and less :c
>>>> (canonicalization
>>>> should put constants in second position).
>>>>
>>> I can't find the definition of :s (single_use).
>>
>>
>> Sorry for that - didn't get along updating it yet :/  It restricts
>> matching to
>> sub-expressions that have a single-use.  So
>>
>> +  a &= 0xd123;
>> +  unsigned short tem = a ^ 0x6040;
>> +  a = tem | 0xc031; /* Simplify _not_ to ((a & 0xd123) | 0xe071).  */
>> ... use of tem ...
>>
>> we shouldn't do the simplifcation here because the expression
>> (a & 0x123) ^ 0x6040 is kept live by 'tem'.
>>
>>> GCC internals do point out
>>> that canonicalization does put constants in the second position, didnt
>>> see
>>> that first. Thank you for pointing it out.
>>>
>>>> +       C1 = wi::bit_and_not (C1,C2);
>>>>
>>>> Space after ','.
>>>>
>>> Will do.
>>>
>>>> Having wide_int_storage in many places is surprising, I can't find
>>>> similar
>>>> code anywhere else in gcc.
>>>>
>>>>
>>>
>>> I tried looking for examples of something similar, I think I ended up
>>> using
>>> wide_int because I was able to convert easily to and from it and it has
>>> the
>>> "mask" and "wide_int_to_tree" functions. I welcome any suggestions on
>>> what I
>>> should be using here for integer constant transformations and
>>> comparisons.
>>
>>
>> Using wide-ints is fine, but you shouldn't need 'wide_int_storage'
>> constructors - those
>> are indeed odd.  Is it just for the initializers of wide-ints?
>>
>> +    wide_int zero_mask = wi::zero (prec);
>> +    wide_int C0 = wide_int_storage (@1);
>> +    wide_int C1 = wide_int_storage (@2);
>> +    wide_int C2 = wide_int_storage (@3);
>> ...
>> +       zero_mask = wide_int_storage (wi::mask (C0.to_uhwi (), false,
>> prec));
>>
>> tree_to_uhwi (@1) should do the trick as well
>>
>> +       C1 = wi::bit_and_not (C1,C2);
>> +       cst_emit = wi::bit_or (C1, C2);
>>
>> the ops should be replacable with @2 and @3, the store to C1 obviously not
>> (but you can use a tree temporary and use wide_int_to_tree here to avoid
>> the back-and-forth for the case where C1 is not assigned to).
>>
>> Note that transforms only doing association are prone to endless recursion
>> in case some other pattern does the reverse op...
>>
>> Richard.
>>
>>
>>> BR,
>>> Andre
>>>
>>
> Thank you for all the comments, see reworked version:
>
> Two new algorithmic optimisations:
>   1.((X op0 C0) op1 C1) op2 C2)
>     with op0 = {&, >>, <<}, op1 = {|,^}, op2 = {|,^} and op1 != op2
>     zero_mask has 1's for all bits that are sure to be 0 in (X op0 C0)
>     and 0's otherwise.
>     if (op1 == '^') C1 &= ~C2 (Only changed if actually emitted)
>     if ((C1 & ~zero_mask) == 0) then emit (X op0 C0) op2 (C1 op2 C2)
>     if ((C2 & ~zero_mask) == 0) then emit (X op0 C0) op1 (C1 op2 C2)
>   2. (X {|,^,&} C0) {<<,>>} C1 -> (X {<<,>>} C1) {|,^,&} (C0 {<<,>>} C1)
>
>
> This patch does two algorithmic optimisations that target patterns like:
> (((x << 24) | 0x00FFFFFF) ^ 0xFF000000) and ((x ^ 0x40000002) >> 1) |
> 0x80000000.
>
> The transformation uses the knowledge of which bits are zero after
> operations like (X {&,<<,(unsigned)>>}) to combine constants, reducing
> run-time operations.
> The two examples above would be transformed into (X << 24) ^ 0xFFFFFFFF and
> (X >> 1) ^ 0xa0000001 respectively.
>
> The second transformation enables more applications of the first. Also some
> targets may benefit from delaying shift operations. I am aware that such an
> optimization, in combination with one or more optimizations that cause the
> reverse transformation, may lead to an infinite loop. Though such behavior
> has not been detected during regression testing and bootstrapping on
> aarch64.

+/* (X bit_op C0) rshift C1 -> (X rshift C0) bit_op (C0 rshift C1) */
+(for bit_op (bit_ior bit_xor bit_and)
+(simplify
+ (rshift (bit_op:s @0 INTEGER_CST@1) INTEGER_CST@2)
+ (bit_op
+  (rshift @0 @2)
+  { wide_int_to_tree (type, wi::rshift (@1, @2, TYPE_SIGN (type))); })))
+
+/* (X bit_op C0) lshift C1 -> (X lshift C0) bit_op (C0 lshift C1) */
+(for bit_op (bit_ior bit_xor bit_and)
+(simplify
+ (lshift (bit_op:s @0 INTEGER_CST@1) INTEGER_CST@2)
+ (bit_op
+  (lshift @0 @2)
+  { wide_int_to_tree (type, wi::lshift (@1, @2)); })))

this may be one case where not using wide-ints makes sense, so that the
two patterns can be combined.  Thus,

(for shift (lshift rshift)
 (simplify
  (shift ...)
  (bit_op
   (shift @0 @2)
   (shift @1 @2))))

note that I'm worried we'd take on "invalid" ubsan here when the above
applies to

int foo (int i)
{
  return (i & 0x7fffffff) >> 3;
}
int main () { return foo (0x80000007); }

and i is negative.  That's because right-shift of negative values
invokes undefined
behavior.  IIRC in the middle-end we will not be taking advantage of
that but the
simplifications apply to GENERIC as well and thus may hit before ubsan
instrumentation triggers(?)  It would be nice if you could double-check that.

+ (if (!(op0 == RSHIFT_EXPR && !TYPE_UNSIGNED (type)) && wi::fits_uhwi_p (@1))

you only need fits_uhwi_p (@1) in the op0 != BIT_AND_EXPR case it
seems, so better
move it down to those cases.

+   (if (wi::eq_p (wi::bit_and (C1, zero_mask_not), wi::zero (prec)))

I think you can write

   (if (wi::bit_and (...) == 0)

or at least wi::eq_p (wi::bit_and (...), 0).

I wonder if we shouldn't improve the pattern by handling (X op0 C0)
transparently by using get_nonzero_bits (yeah, that's not exactly
zero_mask but its inverse AFAIK).
We'd wrap get_nonzero_bits in a helper that can handle GENERIC and your
&, >>, << cases (hmm, such a function must already exist somewhere...).  So
you'd reduce the pattern to

+ (for op1 (bit_ior bit_xor)
+      op2 (bit_xor bit_ior)
+(simplify
+ (op2
+  (op1:s @0 INTEGER_CST@2) INTEGER_CST@3))
   (with
    {
      wide_int zero_mask_not = get_nonzero_bits (@0);
...
    }

This would make use of value-range information determined by VRP for example.

note that with your pattern you'd want to capture (op0:s @0 INTEGER_CST@1),
e.g. via (op0@4 @0 INTEGER_CST@1), so you can re-use it in the replacement
like so:

+   (if (wi::eq_p (wi::bit_and (C1, zero_mask_not), wi::zero (prec)))
+    (op2 @4 { wide_int_to_tree (type, cst_emit); })
+    (if (wi::eq_p (wi::bit_and (@3, zero_mask_not), wi::zero (prec)))
+     (op1 @4 { wide_int_to_tree (type, cst_emit); }))))))))

the expression doesn't need a :s then obviously.

Thanks and sorry for the delay in reviewing this.
Richard.


> gcc/ChangeLog:
>
> 2015-08-03  Andre Vieira  <andre.simoesdiasvieira@arm.com>
>
>   * match.pd: Added new patterns:
>     ((X {&,<<,>>} C0) {|,^} C1) {^,|} C2)
>     (X {|,^,&} C0) {<<,>>} C1 -> (X {<<,>>} C1) {|,^,&} (C0 {<<,>>} C1)
>
> gcc/testsuite/ChangeLog:
>
> 2015-08-03  Andre Vieira  <andre.simoesdiasvieira@arm.com>
>             Hale Wang  <hale.wang@arm.com>
>
>   * gcc.dg/tree-ssa/forwprop-33.c: New test.


* Re: [PATCH v2][GCC] Algorithmic optimization in match and simplify
  2015-09-17  9:52         ` Richard Biener
@ 2015-09-25 11:44           ` Andre Vieira
  2015-09-25 12:22             ` Richard Biener
  0 siblings, 1 reply; 17+ messages in thread
From: Andre Vieira @ 2015-09-25 11:44 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

On 17/09/15 10:46, Richard Biener wrote:
> On Thu, Sep 3, 2015 at 1:11 PM, Andre Vieira
> <Andre.SimoesDiasVieira@arm.com> wrote:
>> On 01/09/15 15:01, Richard Biener wrote:
>>>
>>> On Tue, Sep 1, 2015 at 3:40 PM, Andre Vieira
>>> <Andre.SimoesDiasVieira@arm.com> wrote:
>>>>
>>>> Hi Marc,
>>>>
>>>> On 28/08/15 19:07, Marc Glisse wrote:
>>>>>
>>>>>
>>>>> (not a review, I haven't even read the whole patch)
>>>>>
>>>>> On Fri, 28 Aug 2015, Andre Vieira wrote:
>>>>>
>>>>>> 2015-08-03  Andre Vieira  <andre.simoesdiasvieira@arm.com>
>>>>>>
>>>>>>     * match.pd: Added new patterns:
>>>>>>       ((X {&,<<,>>} C0) {|,^} C1) {^,|} C2)
>>>>>>       (X {|,^,&} C0) {<<,>>} C1 -> (X {<<,>>} C1) {|,^,&} (C0 {<<,>>}
>>>>>> C1)
>>>>>
>>>>>
>>>>>
>>>>> +(for op0 (rshift rshift lshift lshift bit_and bit_and)
>>>>> + op1 (bit_ior bit_xor bit_ior bit_xor bit_ior bit_xor)
>>>>> + op2 (bit_xor bit_ior bit_xor bit_ior bit_xor bit_ior)
>>>>>
>>>>> You can nest for-loops, it seems clearer as:
>>>>> (for op0 (rshift lshift bit_and)
>>>>>      (for op1 (bit_ior bit_xor)
>>>>>           op2 (bit_xor bit_ior)
>>>>
>>>>
>>>>
>>>> Will do, thank you for pointing it out.
>>>>>
>>>>>
>>>>>
>>>>> +(simplify
>>>>> + (op2:c
>>>>> +  (op1:c
>>>>> +   (op0 @0 INTEGER_CST@1) INTEGER_CST@2) INTEGER_CST@3)
>>>>>
>>>>> I suspect you will want more :s (single_use) and less :c
>>>>> (canonicalization
>>>>> should put constants in second position).
>>>>>
>>>> I can't find the definition of :s (single_use).
>>>
>>>
>>> Sorry for that - didn't get along updating it yet :/  It restricts
>>> matching to
>>> sub-expressions that have a single-use.  So
>>>
>>> +  a &= 0xd123;
>>> +  unsigned short tem = a ^ 0x6040;
>>> +  a = tem | 0xc031; /* Simplify _not_ to ((a & 0xd123) | 0xe071).  */
>>> ... use of tem ...
>>>
>>> we shouldn't do the simplifcation here because the expression
>>> (a & 0x123) ^ 0x6040 is kept live by 'tem'.
>>>
>>>> GCC internals do point out
>>>> that canonicalization does put constants in the second position, didnt
>>>> see
>>>> that first. Thank you for pointing it out.
>>>>
>>>>> +       C1 = wi::bit_and_not (C1,C2);
>>>>>
>>>>> Space after ','.
>>>>>
>>>> Will do.
>>>>
>>>>> Having wide_int_storage in many places is surprising, I can't find
>>>>> similar
>>>>> code anywhere else in gcc.
>>>>>
>>>>>
>>>>
>>>> I tried looking for examples of something similar, I think I ended up
>>>> using
>>>> wide_int because I was able to convert easily to and from it and it has
>>>> the
>>>> "mask" and "wide_int_to_tree" functions. I welcome any suggestions on
>>>> what I
>>>> should be using here for integer constant transformations and
>>>> comparisons.
>>>
>>>
>>> Using wide-ints is fine, but you shouldn't need 'wide_int_storage'
>>> constructors - those
>>> are indeed odd.  Is it just for the initializers of wide-ints?
>>>
>>> +    wide_int zero_mask = wi::zero (prec);
>>> +    wide_int C0 = wide_int_storage (@1);
>>> +    wide_int C1 = wide_int_storage (@2);
>>> +    wide_int C2 = wide_int_storage (@3);
>>> ...
>>> +       zero_mask = wide_int_storage (wi::mask (C0.to_uhwi (), false,
>>> prec));
>>>
>>> tree_to_uhwi (@1) should do the trick as well
>>>
>>> +       C1 = wi::bit_and_not (C1,C2);
>>> +       cst_emit = wi::bit_or (C1, C2);
>>>
>>> the ops should be replacable with @2 and @3, the store to C1 obviously not
>>> (but you can use a tree temporary and use wide_int_to_tree here to avoid
>>> the back-and-forth for the case where C1 is not assigned to).
>>>
>>> Note that transforms only doing association are prone to endless recursion
>>> in case some other pattern does the reverse op...
>>>
>>> Richard.
>>>
>>>
>>>> BR,
>>>> Andre
>>>>
>>>
>> Thank you for all the comments, see reworked version:
>>
>> Two new algorithmic optimisations:
>>    1.((X op0 C0) op1 C1) op2 C2)
>>      with op0 = {&, >>, <<}, op1 = {|,^}, op2 = {|,^} and op1 != op2
>>      zero_mask has 1's for all bits that are sure to be 0 in (X op0 C0)
>>      and 0's otherwise.
>>      if (op1 == '^') C1 &= ~C2 (Only changed if actually emitted)
>>      if ((C1 & ~zero_mask) == 0) then emit (X op0 C0) op2 (C1 op2 C2)
>>      if ((C2 & ~zero_mask) == 0) then emit (X op0 C0) op1 (C1 op2 C2)
>>    2. (X {|,^,&} C0) {<<,>>} C1 -> (X {<<,>>} C1) {|,^,&} (C0 {<<,>>} C1)
>>
>>
>> This patch does two algorithmic optimisations that target patterns like:
>> (((x << 24) | 0x00FFFFFF) ^ 0xFF000000) and ((x ^ 0x40000002) >> 1) |
>> 0x80000000.
>>
>> The transformation uses the knowledge of which bits are zero after
>> operations like (X {&,<<,(unsigned)>>}) to combine constants, reducing
>> run-time operations.
>> The two examples above would be transformed into (X << 24) ^ 0xFFFFFFFF and
>> (X >> 1) ^ 0xa0000001 respectively.
>>
>> The second transformation enables more applications of the first. Also some
>> targets may benefit from delaying shift operations. I am aware that such an
>> optimization, in combination with one or more optimizations that cause the
>> reverse transformation, may lead to an infinite loop. Though such behavior
>> has not been detected during regression testing and bootstrapping on
>> aarch64.
>
> +/* (X bit_op C0) rshift C1 -> (X rshift C0) bit_op (C0 rshift C1) */
> +(for bit_op (bit_ior bit_xor bit_and)
> +(simplify
> + (rshift (bit_op:s @0 INTEGER_CST@1) INTEGER_CST@2)
> + (bit_op
> +  (rshift @0 @2)
> +  { wide_int_to_tree (type, wi::rshift (@1, @2, TYPE_SIGN (type))); })))
> +
> +/* (X bit_op C0) lshift C1 -> (X lshift C0) bit_op (C0 lshift C1) */
> +(for bit_op (bit_ior bit_xor bit_and)
> +(simplify
> + (lshift (bit_op:s @0 INTEGER_CST@1) INTEGER_CST@2)
> + (bit_op
> +  (lshift @0 @2)
> +  { wide_int_to_tree (type, wi::lshift (@1, @2)); })))

Will do. Good to see that my second transformation still picks up the 
(shift @1 @2) as a constant. I'm assuming there is some constant folding 
going on between simplify transformations?

>
> this may be one case where not using wide-ints to be able to combine the
> patterns makes sense.  Thus,
>
> (for shift (lshift rshift)
>   (simplify
>    (shift ...)
>    (bit_op
>     (shift @0 @2)
>     (shift @1 @2))))
>
> note that I'm worried we'd take on "invalid" ubsan here when the above
> applies to
>
> int foo (int i)
> {
>    return (i & 0x7fffffff) >> 3;
> }
> int main () { return foo (0x80000007); }
>
> and i is negative.  That's because right-shift of negative values
> invokes undefined
> behavior.  IIRC in the middle-end we will not be taking advantage of
> that but the
> simplifications apply to GENERIC as well and thus may hit before ubsan
> instrumentation triggers(?)  It would be nice if you could double-check that.

I was looking into this and I understand your worries; however, I found 
that for the particular case of shifts and bit_and there already is a 
simplify transformation that does the same, regardless of the sign.

/* Fold (X & C2) << C1 into (X << C1) & (C2 << C1)
    (X & C2) >> C1 into (X >> C1) & (C2 >> C1).  */
(for shift (lshift rshift)
  (simplify
   (shift (convert?:s (bit_and:s @0 INTEGER_CST@2)) INTEGER_CST@1)
   (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
   (with { tree mask = int_const_binop (shift, fold_convert (type, @2), @1); }
    (bit_and (shift (convert @0) @1) { mask; })))))

Now, I don't quite understand what you mean by ubsan instrumentation: 
will GCC introduce guards into code where it detects potential undefined 
behavior? Also, it might be worth noting that right shift of negative 
values is denoted as "implementation-defined" by the C standard. GCC, 
however, doesn't include it in its list of implementation-defined 
behavior, so I guess that is why you refer to it as undefined?

Should we perhaps disable transformations where we cannot guarantee 
that the right shift produced does not operate on negative values? I.e., 
prohibit this transformation for:
1) signed types and op1 == bit_xor
2) signed types and op1 == bit_and and C1 has the sign bit set.

Also, would it be useful, in cases where you have a signed shift and 
bit_and and C1 is not negative, to do the transformation but replace 
the signed shift by an unsigned shift? This would make sure we don't 
introduce undefined/implementation-defined behavior where there was none.
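
For example, taking Richard's foo above (foo_unsigned is a hypothetical
sketch of this idea, not code from the patch):

int foo (int i)
{
  return (i & 0x7fffffff) >> 3;  /* signed shift of a known non-negative value */
}

/* sketch: same result, but without shifting a possibly negative value */
int foo_unsigned (int i)
{
  return (int) (((unsigned int) i >> 3) & (0x7fffffff >> 3));
}

Since i & 0x7fffffff is non-negative and a logical shift distributes over
the bit_and, the two functions agree for every input, including negative i.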

This does, however, change the current behavior!

>
> + (if (!(op0 == RSHIFT_EXPR && !TYPE_UNSIGNED (type)) && wi::fits_uhwi_p (@1))
>
> you only need fits_uhwi_p (@1) in the op0 != BIT_AND_EXPR case it
> seems, so better
> move it down to those cases.

So I did at first, but I had the problem that I didn't know how to make 
the matching "fail" if this was not the case. For instance, if op0 is an 
lshift for which the constant doesn't fit uhwi, then it would fall 
through and never set the zero mask, potentially leading to a wrong 
transformation. Now, I'm not sure this can happen, since that would mean 
that either constant @2 or @3 would need to be all 1's, and that might 
already be caught by some other transformation, but it's wrong to rely 
on such behavior IMO. So for now I have changed it to:

(simplify
  (op2
   (op1:s
    (op0@4 @0 INTEGER_CST@1) INTEGER_CST@2) INTEGER_CST@3)
  (if (!(op0 == RSHIFT_EXPR && !TYPE_UNSIGNED (type)) &&
       (op0 == BIT_AND_EXPR || wi::fits_uhwi_p (@1)))


It would be cool to have a FAIL expression, usable in the with clauses, 
to make the pattern matching fail, a bit like the FAIL in the machine 
description language.

>
> +   (if (wi::eq_p (wi::bit_and (C1, zero_mask_not), wi::zero (prec)))
>
> I think you can write
>
>     (if (wi::bit_and (...) == 0)
>
> or at least wi:eq_p (wi::bit_and (...), 0).
>

wi::bit_and (...) == 0 seems to be doing the trick.

> I wonder if we shouldn't improve the pattern by handling (X op0 C0)
> transparently
> via using get_nonzero_bits (yeah, that's not exactly zero_mask but its
> inverse AFAIK).
> We'd wrap get_nonzero_bits in a helper that can handle GENERIC and your
> &, >>, << cases (hmm, such function must already exist somewhere...).  So you'd
> reduce the pattern to
>
> + (for op1 (bit_ior bit_xor)
> +      op2 (bit_xor bit_ior)
> +(simplify
> + (op2
> +  (op1:s @0 INTEGER_CST@2) INTEGER_CST@3))
>     (with
>      {
>        wide_int zero_mask_not = get_nonzero_bits (@0);
> ...
>      }
>
> This would make use of value-range information determined by VRP for example.

I'll go look for such a function.

>
> note that with your pattern you'd want to capture (op0:s @0 INTEGER_CST@1)
> like via (op0@4 @0 INTEGER_CST@1) so you can re-use it in the replacement
> like so:
>
> +   (if (wi::eq_p (wi::bit_and (C1, zero_mask_not), wi::zero (prec)))
> +    (op2 @4 { wide_int_to_tree (type, cst_emit); })
> +    (if (wi::eq_p (wi::bit_and (@3, zero_mask_not), wi::zero (prec)))
> +     (op1 @4 { wide_int_to_tree (type, cst_emit); }))))))))
>
> the expression doesn't need a :s then obviously.

Yeah makes sense.
>
> Thanks and sorry for the delay in reviewing this.
> Richard.
>

Thank you for all the comments!
>
>> gcc/ChangeLog:
>>
>> 2015-08-03  Andre Vieira  <andre.simoesdiasvieira@arm.com>
>>
>>    * match.pd: Added new patterns:
>>      ((X {&,<<,>>} C0) {|,^} C1) {^,|} C2)
>>      (X {|,^,&} C0) {<<,>>} C1 -> (X {<<,>>} C1) {|,^,&} (C0 {<<,>>} C1)
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2015-08-03  Andre Vieira  <andre.simoesdiasvieira@arm.com>
>>              Hale Wang  <hale.wang@arm.com>
>>
>>    * gcc.dg/tree-ssa/forwprop-33.c: New test.
>


* Re: [PATCH v2][GCC] Algorithmic optimization in match and simplify
  2015-09-25 11:44           ` Andre Vieira
@ 2015-09-25 12:22             ` Richard Biener
  2015-10-07  8:21               ` [PATCH V3][GCC] " Andre Vieira
  0 siblings, 1 reply; 17+ messages in thread
From: Richard Biener @ 2015-09-25 12:22 UTC (permalink / raw)
  To: Andre Vieira; +Cc: GCC Patches

On Fri, Sep 25, 2015 at 1:30 PM, Andre Vieira
<Andre.SimoesDiasVieira@arm.com> wrote:
> On 17/09/15 10:46, Richard Biener wrote:
>>
>> On Thu, Sep 3, 2015 at 1:11 PM, Andre Vieira
>> <Andre.SimoesDiasVieira@arm.com> wrote:
>>>
>>> On 01/09/15 15:01, Richard Biener wrote:
>>>>
>>>>
>>>> On Tue, Sep 1, 2015 at 3:40 PM, Andre Vieira
>>>> <Andre.SimoesDiasVieira@arm.com> wrote:
>>>>>
>>>>>
>>>>> Hi Marc,
>>>>>
>>>>> On 28/08/15 19:07, Marc Glisse wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> (not a review, I haven't even read the whole patch)
>>>>>>
>>>>>> On Fri, 28 Aug 2015, Andre Vieira wrote:
>>>>>>
>>>>>>> 2015-08-03  Andre Vieira  <andre.simoesdiasvieira@arm.com>
>>>>>>>
>>>>>>>     * match.pd: Added new patterns:
>>>>>>>       ((X {&,<<,>>} C0) {|,^} C1) {^,|} C2)
>>>>>>>       (X {|,^,&} C0) {<<,>>} C1 -> (X {<<,>>} C1) {|,^,&} (C0 {<<,>>}
>>>>>>> C1)
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> +(for op0 (rshift rshift lshift lshift bit_and bit_and)
>>>>>> + op1 (bit_ior bit_xor bit_ior bit_xor bit_ior bit_xor)
>>>>>> + op2 (bit_xor bit_ior bit_xor bit_ior bit_xor bit_ior)
>>>>>>
>>>>>> You can nest for-loops, it seems clearer as:
>>>>>> (for op0 (rshift lshift bit_and)
>>>>>>      (for op1 (bit_ior bit_xor)
>>>>>>           op2 (bit_xor bit_ior)
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Will do, thank you for pointing it out.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> +(simplify
>>>>>> + (op2:c
>>>>>> +  (op1:c
>>>>>> +   (op0 @0 INTEGER_CST@1) INTEGER_CST@2) INTEGER_CST@3)
>>>>>>
>>>>>> I suspect you will want more :s (single_use) and less :c
>>>>>> (canonicalization
>>>>>> should put constants in second position).
>>>>>>
>>>>> I can't find the definition of :s (single_use).
>>>>
>>>>
>>>>
>>>> Sorry for that - didn't get along updating it yet :/  It restricts
>>>> matching to
>>>> sub-expressions that have a single-use.  So
>>>>
>>>> +  a &= 0xd123;
>>>> +  unsigned short tem = a ^ 0x6040;
>>>> +  a = tem | 0xc031; /* Simplify _not_ to ((a & 0xd123) | 0xe071).  */
>>>> ... use of tem ...
>>>>
>>>> we shouldn't do the simplification here because the expression
>>>> (a & 0xd123) ^ 0x6040 is kept live by 'tem'.
>>>>
>>>>> GCC internals do point out
>>>>> that canonicalization does put constants in the second position, didn't
>>>>> see that first. Thank you for pointing it out.
>>>>>
>>>>>> +       C1 = wi::bit_and_not (C1,C2);
>>>>>>
>>>>>> Space after ','.
>>>>>>
>>>>> Will do.
>>>>>
>>>>>> Having wide_int_storage in many places is surprising, I can't find
>>>>>> similar
>>>>>> code anywhere else in gcc.
>>>>>>
>>>>>>
>>>>>
>>>>> I tried looking for examples of something similar, I think I ended up
>>>>> using
>>>>> wide_int because I was able to convert easily to and from it and it has
>>>>> the
>>>>> "mask" and "wide_int_to_tree" functions. I welcome any suggestions on
>>>>> what I
>>>>> should be using here for integer constant transformations and
>>>>> comparisons.
>>>>
>>>>
>>>>
>>>> Using wide-ints is fine, but you shouldn't need 'wide_int_storage'
>>>> constructors - those
>>>> are indeed odd.  Is it just for the initializers of wide-ints?
>>>>
>>>> +    wide_int zero_mask = wi::zero (prec);
>>>> +    wide_int C0 = wide_int_storage (@1);
>>>> +    wide_int C1 = wide_int_storage (@2);
>>>> +    wide_int C2 = wide_int_storage (@3);
>>>> ...
>>>> +       zero_mask = wide_int_storage (wi::mask (C0.to_uhwi (), false,
>>>> prec));
>>>>
>>>> tree_to_uhwi (@1) should do the trick as well
>>>>
>>>> +       C1 = wi::bit_and_not (C1,C2);
>>>> +       cst_emit = wi::bit_or (C1, C2);
>>>>
>>>> the ops should be replacable with @2 and @3, the store to C1 obviously
>>>> not
>>>> (but you can use a tree temporary and use wide_int_to_tree here to avoid
>>>> the back-and-forth for the case where C1 is not assigned to).
>>>>
>>>> Note that transforms only doing association are prone to endless
>>>> recursion
>>>> in case some other pattern does the reverse op...
>>>>
>>>> Richard.
>>>>
>>>>
>>>>> BR,
>>>>> Andre
>>>>>
>>>>
>>> Thank you for all the comments, see reworked version:
>>>
>>> Two new algorithmic optimisations:
>>>    1.((X op0 C0) op1 C1) op2 C2)
>>>      with op0 = {&, >>, <<}, op1 = {|,^}, op2 = {|,^} and op1 != op2
>>>      zero_mask has 1's for all bits that are sure to be 0 in (X op0 C0)
>>>      and 0's otherwise.
>>>      if (op1 == '^') C1 &= ~C2 (Only changed if actually emitted)
>>>      if ((C1 & ~zero_mask) == 0) then emit (X op0 C0) op2 (C1 op2 C2)
>>>      if ((C2 & ~zero_mask) == 0) then emit (X op0 C0) op1 (C1 op2 C2)
>>>    2. (X {|,^,&} C0) {<<,>>} C1 -> (X {<<,>>} C1) {|,^,&} (C0 {<<,>>} C1)
>>>
>>>
>>> This patch does two algorithmic optimisations that target patterns like:
>>> (((x << 24) | 0x00FFFFFF) ^ 0xFF000000) and ((x ^ 0x40000002) >> 1) |
>>> 0x80000000.
>>>
>>> The transformation uses the knowledge of which bits are zero after
>>> operations like (X {&,<<,(unsigned)>>}) to combine constants, reducing
>>> run-time operations.
>>> The two examples above would be transformed into (X << 24) ^ 0xFFFFFFFF
>>> and
>>> (X >> 1) ^ 0xa0000001 respectively.
>>>
>>> The second transformation enables more applications of the first. Also
>>> some
>>> targets may benefit from delaying shift operations. I am aware that such
>>> an
>>> optimization, in combination with one or more optimizations that cause
>>> the
>>> reverse transformation, may lead to an infinite loop. Though such
>>> behavior
>>> has not been detected during regression testing and bootstrapping on
>>> aarch64.
>>
>>
>> +/* (X bit_op C0) rshift C1 -> (X rshift C0) bit_op (C0 rshift C1) */
>> +(for bit_op (bit_ior bit_xor bit_and)
>> +(simplify
>> + (rshift (bit_op:s @0 INTEGER_CST@1) INTEGER_CST@2)
>> + (bit_op
>> +  (rshift @0 @2)
>> +  { wide_int_to_tree (type, wi::rshift (@1, @2, TYPE_SIGN (type))); })))
>> +
>> +/* (X bit_op C0) lshift C1 -> (X lshift C0) bit_op (C0 lshift C1) */
>> +(for bit_op (bit_ior bit_xor bit_and)
>> +(simplify
>> + (lshift (bit_op:s @0 INTEGER_CST@1) INTEGER_CST@2)
>> + (bit_op
>> +  (lshift @0 @2)
>> +  { wide_int_to_tree (type, wi::lshift (@1, @2)); })))
>
>
> Will do, good to see that my second transformation still picks up the shift
> @1 @2 as a constant. I'm assuming there is some constant folding going on
> between simplify transformations?

Yes.
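
For illustration, here is a hand-traced sketch (not compiler output) of
the merged form at work, assuming a 32-bit unsigned x:

   (x ^ 0x4002) >> 1
     -> (x >> 1) ^ (0x4002 >> 1)    /* second pattern, emits (shift @1 @2) */
     -> (x >> 1) ^ 0x2001           /* constant operands, folded right away */

so the result again has an INTEGER_CST operand for later patterns (such
as the first one) to match against.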

>>
>> this may be one case where not using wide-ints to be able to combine the
>> patterns makes sense.  Thus,
>>
>> (for shift (lshift rshift)
>>   (simplify
>>    (shift ...)
>>    (bit_op
>>     (shift @0 @2)
>>     (shift @1 @2))))
>>
>> note that I'm worried we'd take on "invalid" ubsan here when the above
>> applies to
>>
>> int foo (int i)
>> {
>>    return (i & 0x7fffffff) >> 3;
>> }
>> int main () { return foo (0x80000007); }
>>
>> and i is negative.  That's because right-shift of negative values
>> invokes undefined
>> behavior.  IIRC in the middle-end we will not be taking advantage of
>> that but the
>> simplifications apply to GENERIC as well and thus may hit before ubsan
>> instrumentation triggers(?)  It would be nice if you could double-check
>> that.
>
>
> I was looking into this and I understand your worries, though, I found out
> that for the particular case of shifts and bit_and there already is a
> simplify transformation that does the same, regardless of the sign.
>
> /* Fold (X & C2) << C1 into (X << C1) & (C2 << C1)
>    (X & C2) >> C1 into (X >> C1) & (C2 >> C1).  */
> (for shift (lshift rshift)
>  (simplify
>   (shift (convert?:s (bit_and:s @0 INTEGER_CST@2)) INTEGER_CST@1)
>   (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
>    (with { tree mask = int_const_binop (shift, fold_convert (type, @2), @1);
> }
>     (bit_and (shift (convert @0) @1) { mask; })))))

I see ... also an opportunity to merge this pattern with yours.

> Now, I don't quite understand what you mean by ubsan instrumentation, will
> GCC introduce guards into code where it detects potential undefined
> behavior?

Yes.

> Also, it might be worth noting that right shift of negative
> values is denoted as "implementation defined" by the C standard. GCC, however,
> doesn't include it in its list of implementation-defined behavior, so I guess
> that is why you refer to it as undefined?

Not sure, I thought it was undefined.  If it's implementation-defined,
then GCC needs to document its behavior.

> Should we perhaps disable transformations where we can not guarantee that
> the right shift produced is not one of negative values? I.e. prohibit this
> transformation for:
> 1) signed types and op1 == bit_xor
> 2) signed types and op1 == bit_and and C1 has sign bit set.
>
> Also, would it be useful, in cases where you have a signed shift and bit_and
> and C1 is not negative, to do the transformation but replace the signed
> shift by an unsigned shift? This is to make sure we don't introduce
> undefined/implementation-defined behavior where there was none.
>
> This does however change the current behavior!

Yeah, so unless somebody else chimes in let's consider this as followup only.

>>
>> + (if (!(op0 == RSHIFT_EXPR && !TYPE_UNSIGNED (type)) && wi::fits_uhwi_p
>> (@1))
>>
>> you only need fits_uhwi_p (@1) in the op0 != BIT_AND_EXPR case it
>> seems, so better
>> move it down to those cases.
>
>
> So I used to, but I had the problem that I didn't know how to make it "fail"
> the matching if this was not the case. For instance if op0 is a lshift for
> which the constant doesn't fit uhwi, then it would fall through and never
> set the zero mask, potentially leading to a wrong transformation. Now I'm
> not sure this can happen, since that would mean that either constant @2 or
> @3 needs to be all 1's and that might already be caught by some other
> transformation, but it's wrong to rely on such behavior IMO. So for now I
> have changed it to:
>
> (simplify
>  (op2
>   (op1:s
>    (op0@4 @0 INTEGER_CST@1) INTEGER_CST@2) INTEGER_CST@3)
>  (if (!(op0 == RSHIFT_EXPR && !TYPE_UNSIGNED (type)) &&
>       (op0 == BIT_AND_EXPR || wi::fits_uhwi_p (@1)))
>
>
> It would be cool to have a FAIL expression, usable in the with clauses, to
> make the pattern match fail a bit like the one in the machine description
> language.

I'll think about it.  Currently you'd need to add a 'bool fail' in the
with clause and initialize it, adding a (if (!fail) ...) after it.
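
A rough sketch of that workaround, with the actual tests elided as
comments (the V3 patch later in this thread spells out the full
version):

   (for inner_op (bit_ior bit_xor)
        outer_op (bit_xor bit_ior)
   (simplify
    (outer_op (inner_op:s @2 INTEGER_CST@0) INTEGER_CST@1)
    (with
     {
       bool fail = false;
       wide_int zero_mask_not;
       /* get_nonzero_bits only applies to SSA names, so give up by
          setting 'fail' for anything else.  */
       if (TREE_CODE (@2) == SSA_NAME)
         zero_mask_not = get_nonzero_bits (@2);
       else
         fail = true;
     }
     /* The (if (!fail) ...) guard makes the match fail whenever the
        with block could not compute its data.  */
     (if (!fail /* && ... the zero_mask tests ... */)
      /* ... emit the simplified expression ... */))))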

>>
>> +   (if (wi::eq_p (wi::bit_and (C1, zero_mask_not), wi::zero (prec)))
>>
>> I think you can write
>>
>>     (if (wi::bit_and (...) == 0)
>>
>> or at least wi::eq_p (wi::bit_and (...), 0).
>>
>
> wi::bit_and (...) == 0 seems to be doing the trick.
>
>> I wonder if we shouldn't improve the pattern by handling (X op0 C0)
>> transparently
>> via using get_nonzero_bits (yeah, that's not exactly zero_mask but its
>> inverse AFAIK).
>> We'd wrap get_nonzero_bits in a helper that can handle GENERIC and your
>> &, >>, << cases (hmm, such function must already exist somewhere...).  So
>> you'd
>> reduce the pattern to
>>
>> + (for op1 (bit_ior bit_xor)
>> +      op2 (bit_xor bit_ior)
>> +(simplify
>> + (op2
>> +  (op1:s @0 INTEGER_CST@2) INTEGER_CST@3))
>>     (with
>>      {
>>        wide_int zero_mask_not = get_nonzero_bits (@0);
>> ...
>>      }
>>
>> This would make use of value-range information determined by VRP for
>> example.
>
>
> I'll go look for such a function.
>
>>
>> note that with your pattern you'd want to capture (op0:s @0 INTEGER_CST@1)
>> like via (op0@4 @0 INTEGER_CST@1) so you can re-use it in the replacement
>> like so:
>>
>> +   (if (wi::eq_p (wi::bit_and (C1, zero_mask_not), wi::zero (prec)))
>> +    (op2 @4 { wide_int_to_tree (type, cst_emit); })
>> +    (if (wi::eq_p (wi::bit_and (@3, zero_mask_not), wi::zero (prec)))
>> +     (op1 @4 { wide_int_to_tree (type, cst_emit); }))))))))
>>
>> the expression doesn't need a :s then obviously.
>
>
> Yeah makes sense.
>>
>>
>> Thanks and sorry for the delay in reviewing this.
>> Richard.
>>
>
> Thank you for all the comments!

No problem!

>>
>>> gcc/ChangeLog:
>>>
>>> 2015-08-03  Andre Vieira  <andre.simoesdiasvieira@arm.com>
>>>
>>>    * match.pd: Added new patterns:
>>>      ((X {&,<<,>>} C0) {|,^} C1) {^,|} C2)
>>>      (X {|,^,&} C0) {<<,>>} C1 -> (X {<<,>>} C1) {|,^,&} (C0 {<<,>>} C1)
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> 2015-08-03  Andre Vieira  <andre.simoesdiasvieira@arm.com>
>>>              Hale Wang  <hale.wang@arm.com>
>>>
>>>    * gcc.dg/tree-ssa/forwprop-33.c: New test.
>>
>>
>


* [PATCH V3][GCC] Algorithmic optimization in match and simplify
  2015-09-25 12:22             ` Richard Biener
@ 2015-10-07  8:21               ` Andre Vieira
  2015-10-08 12:29                 ` Richard Biener
  0 siblings, 1 reply; 17+ messages in thread
From: Andre Vieira @ 2015-10-07  8:21 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 14775 bytes --]

On 25/09/15 12:42, Richard Biener wrote:
> [...]
Thanks again for the comments Richard!

A new algorithmic optimisation:

((X inner_op C0) outer_op C1)
With X being a tree for which value_range has reasoned certain bits to 
always be zero throughout its computed value range (we will call this 
the zero_mask), and with inner_op = {|,^}, outer_op = {|,^} and 
inner_op != outer_op:
if (inner_op == '^') C0 &= ~C1;
if ((C0 & ~zero_mask) == 0) then emit (X outer_op (C0 outer_op C1))
if ((C1 & ~zero_mask) == 0) then emit (X inner_op (C0 outer_op C1))

And extended '(X & C2) << C1 into (X << C1) & (C2 << C1)' and
'(X & C2) >> C1 into (X >> C1) & (C2 >> C1)' to also accept the bitwise 
or and xor operators:
'(X {&,^,|} C2) << C1 into (X << C1) {&,^,|} (C2 << C1)' and
'(X {&,^,|} C2) >> C1 into (X >> C1) {&,^,|} (C2 >> C1)'.
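
To make this concrete, here is a hand-worked trace of test1 from the new
testcase (an unsigned short, so 16 bits); the intermediate values below
are derived by hand from the rules above, not from a compiler dump:

   a ^= 0x4002;  a >>= 1;  a |= 0x8000;

   /* Second transformation: (a ^ 0x4002) >> 1 -> (a >> 1) ^ 0x2001.  */
   /* First transformation on ((a >> 1) ^ 0x2001) | 0x8000:
      X = a >> 1, so zero_mask = 0x8000, since a logical right shift by
      one always clears the top bit.
      inner_op is '^', so C0 &= ~C1: 0x2001 & ~0x8000 = 0x2001.
      C0 & ~zero_mask = 0x2001 & 0x7fff != 0: the first rule fails.
      C1 & ~zero_mask = 0x8000 & 0x7fff == 0: the second rule emits
      (a >> 1) ^ (0x2001 | 0x8000) = (a >> 1) ^ 0xa001.  */

which is the '^ 40961' that forwprop-33.c scans for.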

The second transformation enables more applications of the first. Also, 
some targets may benefit from delaying shift operations. I am aware that 
such an optimization, in combination with one or more optimizations that 
cause the reverse transformation, may lead to an infinite loop, though 
such behavior has not been detected during regression testing and 
bootstrapping on aarch64.
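
For the record, a reverse transformation of the sort that could
ping-pong with the second one would look roughly like this (purely
hypothetical; no such pattern is claimed to exist in match.pd):

   /* Hypothetical reverse pattern, undoing the shift delaying above:
      (X << C1) bit_op (C0 << C1)  ->  (X bit_op C0) << C1
      If both directions were present unconditionally, each rewrite
      would re-trigger the other, looping forever.  */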

gcc/ChangeLog:

2015-10-05 Andre Vieira <andre.simoesdiasvieira@arm.com>

* match.pd: Added a new pattern
((X inner_op C0) outer_op C1)
and expanded existing one
(X {|,^,&} C0) {<<,>>} C1 -> (X {<<,>>} C1) {|,^,&} (C0 {<<,>>} C1)

gcc/testsuite/ChangeLog:

2015-10-05 Andre Vieira <andre.simoesdiasvieira@arm.com>
Hale Wang <hale.wang@arm.com>

* gcc.dg/tree-ssa/forwprop-33.c: New test.

[-- Attachment #2: 0001-algorithmic-optimization-v3.patch --]
[-- Type: text/x-patch; name=0001-algorithmic-optimization-v3.patch, Size: 5237 bytes --]

From 551f8af905ad0879747f75b5eb8f86f1d1636961 Mon Sep 17 00:00:00 2001
From: Andre Simoes Dias Vieira <andsim01@arm.com>
Date: Wed, 30 Sep 2015 13:21:29 +0100
Subject: [PATCH] algorithmic optimization v3

---
 gcc/match.pd                                | 60 +++++++++++++++++++++---
 gcc/testsuite/gcc.dg/tree-ssa/forwprop-33.c | 71 +++++++++++++++++++++++++++++
 2 files changed, 124 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/forwprop-33.c

diff --git a/gcc/match.pd b/gcc/match.pd
index bd5c267f1f86093be10eab8c40cfbf94b71c6545..820c42901fad09b77e2f81f06b4e23d7abaa4079 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -710,6 +710,51 @@ along with GCC; see the file COPYING3.  If not see
       && tree_nop_conversion_p (type, TREE_TYPE (@1)))
   (convert (bit_and (bit_not @1) @0))))
 
+
+
+/* ((X inner_op C0) outer_op C1)
+   With X being a tree where value_range has reasoned certain bits to always be
+   zero throughout its computed value range,
+   inner_op = {|,^}, outer_op = {|,^} and inner_op != outer_op
+   where zero_mask has 1's for all bits that are sure to be 0 in
+   and 0's otherwise.
+   if (inner_op == '^') C0 &= ~C1;
+   if ((C0 & ~zero_mask) == 0) then emit (X outer_op (C0 outer_op C1)
+   if ((C1 & ~zero_mask) == 0) then emit (X inner_op (C0 outer_op C1)
+*/
+(for inner_op (bit_ior bit_xor)
+     outer_op (bit_xor bit_ior)
+(simplify
+ (outer_op
+  (inner_op:s @2 INTEGER_CST@0) INTEGER_CST@1)
+ (with
+  {
+    bool fail = false;
+    wide_int zero_mask_not;
+    wide_int C0;
+    wide_int cst_emit;
+
+    if (TREE_CODE (@2) == SSA_NAME)
+      zero_mask_not = get_nonzero_bits (@2);
+    else
+      fail = true;
+
+    if (inner_op == BIT_XOR_EXPR)
+      {
+	C0 = wi::bit_and_not (@0, @1);
+	cst_emit = wi::bit_or (C0, @1);
+      }
+    else
+      {
+	C0 = @0;
+	cst_emit = wi::bit_xor (@0, @1);
+      }
+  }
+  (if (!fail && wi::bit_and (C0, zero_mask_not) == 0)
+   (outer_op @2 { wide_int_to_tree (type, cst_emit); })
+   (if (!fail && wi::bit_and (@1, zero_mask_not) == 0)
+    (inner_op @2 { wide_int_to_tree (type, cst_emit); }))))))
+
 /* Associate (p +p off1) +p off2 as (p +p (off1 + off2)).  */
 (simplify
   (pointer_plus (pointer_plus:s @0 @1) @3)
@@ -1103,14 +1148,15 @@ along with GCC; see the file COPYING3.  If not see
 	     (bit_and (convert (shift:shift_type (convert @3) @1)) { newmaskt; })
 	     (bit_and @4 { newmaskt; })))))))))))))
 
-/* Fold (X & C2) << C1 into (X << C1) & (C2 << C1)
-   (X & C2) >> C1 into (X >> C1) & (C2 >> C1).  */
+/* Fold (X {&,^,|} C2) << C1 into (X << C1) {&,^,|} (C2 << C1)
+   (X {&,^,|} C2) >> C1 into (X >> C1) & (C2 >> C1).  */
 (for shift (lshift rshift)
- (simplify
-  (shift (convert?:s (bit_and:s @0 INTEGER_CST@2)) INTEGER_CST@1)
-  (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
-   (with { tree mask = int_const_binop (shift, fold_convert (type, @2), @1); }
-    (bit_and (shift (convert @0) @1) { mask; })))))
+ (for bit_op (bit_and bit_xor bit_ior)
+  (simplify
+   (shift (convert?:s (bit_op:s @0 INTEGER_CST@2)) INTEGER_CST@1)
+   (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
+    (with { tree mask = int_const_binop (shift, fold_convert (type, @2), @1); }
+     (bit_op (shift (convert @0) @1) { mask; }))))))
 
 
 /* Simplifications of conversions.  */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/forwprop-33.c b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-33.c
new file mode 100644
index 0000000000000000000000000000000000000000..bb4f8afd7aafbacf4805e6cd95e5beb3edaecf16
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-33.c
@@ -0,0 +1,71 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-forwprop3" } */
+
+unsigned short
+test1 (unsigned short a)
+{
+  a ^= 0x4002;
+  a >>= 1;
+  a |= 0x8000; /* Simplify to ((a >> 1) ^ 0xa001).  */
+  return a;
+}
+/* { dg-final { scan-tree-dump "\\^ 40961" "forwprop3" } } */
+
+unsigned short
+test2 (unsigned short a)
+{
+  a |= 0x4002;
+  a <<= 1;
+  a ^= 0x0001; /* Simplify to ((a << 1) | 0x8005).  */
+  return a;
+}
+/* { dg-final { scan-tree-dump "\\| 32773" "forwprop3" } } */
+
+unsigned short
+test3 (unsigned short a)
+{
+  a &= 0xd123;
+  a ^= 0x6040;
+  a |= 0xc031; /* Simplify to ((a & 0xd123) | 0xe071).  */
+  return a;
+}
+/* { dg-final { scan-tree-dump "\\| 57457" "forwprop3" } } */
+
+unsigned short
+test4 (unsigned short a)
+{
+  a ^= 0x8002;
+  a >>= 1;
+  a |= 0x8000;
+  return a;
+}
+/* { dg-final { scan-tree-dump "\\^ 49153" "forwprop3" } } */
+
+unsigned short
+test5 (unsigned short a)
+{
+  a ^= 0x8002;
+  a >>= 1;
+  a |= 0x8001; /* Only move shift inward: (((a >> 1) ^ 0x4001) | 0x8001).  */
+  return a;
+}
+/* { dg-final { scan-tree-dump "\\^ 16385" "forwprop3" } } */
+/* { dg-final { scan-tree-dump "\\| 32769" "forwprop3" } } */
+
+short
+test6 (short a)
+{
+  a &= 0x7fff;
+  a >>= 2;
+  return a;
+}
+/* { dg-final { scan-tree-dump "\\& 8191" "forwprop3" } } */
+
+short
+test7 (short a)
+{
+  a &= 0x8fff;
+  a >>= 2;
+  return a;
+}
+/* { dg-final { scan-tree-dump "\\& -7169" "forwprop3" } } */
-- 
1.9.1



* Re: [PATCH V3][GCC] Algorithmic optimization in match and simplify
  2015-10-07  8:21               ` [PATCH V3][GCC] " Andre Vieira
@ 2015-10-08 12:29                 ` Richard Biener
  2015-10-09 16:11                   ` James Greenhalgh
  0 siblings, 1 reply; 17+ messages in thread
From: Richard Biener @ 2015-10-08 12:29 UTC (permalink / raw)
  To: Andre Vieira; +Cc: GCC Patches

On Wed, Oct 7, 2015 at 10:21 AM, Andre Vieira
<Andre.SimoesDiasVieira@arm.com> wrote:
> On 25/09/15 12:42, Richard Biener wrote:
>> [...]
> Thanks again for the comments Richard!
>
> A new algorithmic optimisation:
>
> ((X inner_op C0) outer_op C1)
> With X being a tree where value_range has reasoned certain bits to always be
> zero throughout its computed value range, we will call this the zero_mask,
> and with inner_op = {|,^}, outer_op = {|,^} and inner_op != outer_op.
> if (inner_op == '^') C0 &= ~C1;
> if ((C0 & ~zero_mask) == 0) then emit (X outer_op (C0 outer_op C1))
> if ((C1 & ~zero_mask) == 0) then emit (X inner_op (C0 outer_op C1))
>
> And extended '(X & C2) << C1 into (X << C1) & (C2 << C1)' and
> '(X & C2) >> C1 into (X >> C1) & (C2 >> C1)' to also accept the bitwise or
> and xor operators:
> '(X {&,^,|} C2) << C1 into (X << C1) {&,^,|} (C2 << C1)' and
> '(X {&,^,|} C2) >> C1 into (X >> C1) & (C2 >> C1)'.
>
> The second transformation enables more applications of the first. Also some
> targets may benefit from delaying shift operations. I am aware that such an
> optimization, in combination with one or more optimizations that cause the
> reverse transformation, may lead to an infinite loop. Though such behavior
> has not been detected during regression testing and bootstrapping on
> aarch64.
>
> gcc/ChangeLog:
>
> 2015-10-05 Andre Vieira <andre.simoesdiasvieira@arm.com>
>
> * match.pd: Added a new pattern
> ((X inner_op C0) outer_op C1)
> and expanded existing one
> (X {|,^,&} C0) {<<,>>} C1 -> (X {<<,>>} C1) {|,^,&} (C0 {<<,>>} C1)
>
> gcc/testsuite/ChangeLog:
>
> 2015-10-05 Andre Vieira <andre.simoesdiasvieira@arm.com>
>
> Hale Wang <hale.wang@arm.com>
>
> * gcc.dg/tree-ssa/forwprop-33.c: New test.

Ok.

Thanks,
Richard.


* Re: [PATCH V3][GCC] Algorithmic optimization in match and simplify
  2015-10-08 12:29                 ` Richard Biener
@ 2015-10-09 16:11                   ` James Greenhalgh
  2015-10-15 13:50                     ` Christophe Lyon
  0 siblings, 1 reply; 17+ messages in thread
From: James Greenhalgh @ 2015-10-09 16:11 UTC (permalink / raw)
  To: Richard Biener; +Cc: Andre Simoes Dias Vieira, GCC Patches

On Thu, Oct 08, 2015 at 01:29:34PM +0100, Richard Biener wrote:
> > Thanks again for the comments Richard!
> >
> > A new algorithmic optimisation:
> >
> > ((X inner_op C0) outer_op C1)
> > With X being a tree where value_range has reasoned certain bits to always be
> > zero throughout its computed value range, we will call this the zero_mask,
> > and with inner_op = {|,^}, outer_op = {|,^} and inner_op != outer_op.
> > if (inner_op == '^') C0 &= ~C1;
> > if ((C0 & ~zero_mask) == 0) then emit (X outer_op (C0 outer_op C1))
> > if ((C1 & ~zero_mask) == 0) then emit (X inner_op (C0 outer_op C1))
> >
> > And extended '(X & C2) << C1 into (X << C1) & (C2 << C1)' and
> > '(X & C2) >> C1 into (X >> C1) & (C2 >> C1)' to also accept the bitwise or
> > and xor operators:
> > '(X {&,^,|} C2) << C1 into (X << C1) {&,^,|} (C2 << C1)' and
> > '(X {&,^,|} C2) >> C1 into (X >> C1) & (C2 >> C1)'.
> >
> > The second transformation enables more applications of the first. Also some
> > targets may benefit from delaying shift operations. I am aware that such an
> > optimization, in combination with one or more optimizations that cause the
> > reverse transformation, may lead to an infinite loop. Though such behavior
> > has not been detected during regression testing and bootstrapping on
> > aarch64.
> >
> > gcc/ChangeLog:
> >
> > 2015-10-05 Andre Vieira <andre.simoesdiasvieira@arm.com>
> >
> > * match.pd: Added a new pattern
> > ((X inner_op C0) outer_op C1)
> > and expanded existing one
> > (X {|,^,&} C0) {<<,>>} C1 -> (X {<<,>>} C1) {|,^,&} (C0 {<<,>>} C1)
> >
> > gcc/testsuite/ChangeLog:
> >
> > 2015-10-05 Andre Vieira <andre.simoesdiasvieira@arm.com>
> >
> > Hale Wang <hale.wang@arm.com>
> >
> > * gcc.dg/tree-ssa/forwprop-33.c: New test.
> 
> Ok.
> 
> Thanks,
> Richard.
> 

As Andre does not have commit rights, I've committed this on his behalf as
revision 228661. Please watch for any fallout over the weekend.

Andre, please check your ChangeLog format in future. In the end I
committed this:

gcc/ChangeLog

2015-10-09  Andre Vieira  <andre.simoesdiasvieira@arm.com>

	* match.pd: ((X inner_op C0) outer_op C1) New pattern.
	((X & C2) << C1): Expand to...
	(X {&,^,|} C2 << C1): ...This.
	((X & C2) >> C1): Expand to...
	(X {&,^,|} C2 >> C1): ...This.

gcc/testsuite/ChangeLog

2015-10-09  Andre Vieira  <andre.simoesdiasvieira@arm.com>
	    Hale Wang  <hale.wang@arm.com>

	* gcc.dg/tree-ssa/forwprop-33.c: New.

Thanks,
James


* Re: [PATCH V3][GCC] Algorithmic optimization in match and simplify
  2015-10-09 16:11                   ` James Greenhalgh
@ 2015-10-15 13:50                     ` Christophe Lyon
  2015-10-19 11:49                       ` Richard Biener
  0 siblings, 1 reply; 17+ messages in thread
From: Christophe Lyon @ 2015-10-15 13:50 UTC (permalink / raw)
  To: James Greenhalgh, Andre Simoes Dias Vieira; +Cc: GCC Patches

On 9 October 2015 at 18:11, James Greenhalgh <james.greenhalgh@arm.com> wrote:
> On Thu, Oct 08, 2015 at 01:29:34PM +0100, Richard Biener wrote:
>> [...]
>>
>> Ok.
>>
>> Thanks,
>> Richard.
>>
>
> As Andre does not have commit rights, I've committed this on his behalf as
> revision 228661. Please watch for any fallout over the weekend.
>

Since this commit I'm seeing:
FAIL: gcc.target/arm/xor-and.c scan-assembler orr
on most arm targets.

See: http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/228661/report-build-info.html

Since that's already a few days old, I suspect you are already aware of that?

Christophe.




* Re: [PATCH V3][GCC] Algorithmic optimization in match and simplify
  2015-10-15 13:50                     ` Christophe Lyon
@ 2015-10-19 11:49                       ` Richard Biener
  0 siblings, 0 replies; 17+ messages in thread
From: Richard Biener @ 2015-10-19 11:49 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: James Greenhalgh, Andre Simoes Dias Vieira, GCC Patches

On Thu, Oct 15, 2015 at 3:50 PM, Christophe Lyon
<christophe.lyon@linaro.org> wrote:
> On 9 October 2015 at 18:11, James Greenhalgh <james.greenhalgh@arm.com> wrote:
>> On Thu, Oct 08, 2015 at 01:29:34PM +0100, Richard Biener wrote:
>>> [...]
>>>
>>> Ok.
>>>
>>> Thanks,
>>> Richard.
>>>
>>
>> As Andre does not have commit rights, I've committed this on his behalf as
>> revision 228661. Please watch for any fallout over the weekend.
>>
>
> Since this commit I'm seeing:
> FAIL: gcc.target/arm/xor-and.c scan-assembler orr
> on most arm targets.
>
> See: http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/228661/report-build-info.html
>
> Since that's already a few days old, I suspect you are already aware of that?

Please file a bugreport.

Thanks,
Richard.



Thread overview: 17+ messages
2015-08-28 17:29 [PATCH][GCC] Algorithmic optimization in match and simplify Andre Vieira
2015-08-28 18:13 ` Marc Glisse
2015-09-01 13:40   ` Andre Vieira
2015-09-01 14:01     ` Richard Biener
2015-09-03 11:13       ` [PATCH v2][GCC] " Andre Vieira
2015-09-16 14:23         ` Andre Vieira
2015-09-17  9:52         ` Richard Biener
2015-09-25 11:44           ` Andre Vieira
2015-09-25 12:22             ` Richard Biener
2015-10-07  8:21               ` [PATCH V3][GCC] " Andre Vieira
2015-10-08 12:29                 ` Richard Biener
2015-10-09 16:11                   ` James Greenhalgh
2015-10-15 13:50                     ` Christophe Lyon
2015-10-19 11:49                       ` Richard Biener
2015-09-01 16:50 ` Location of "dg-final" directives? (was Re: [PATCH][GCC] Algorithmic optimization in match and simplify) David Malcolm
2015-09-01 16:55   ` Marek Polacek
2015-09-02 13:24     ` Andre Vieira
