[PATCH] Optimize (CST1 << A) == CST2 (PR tree-optimization/66299)

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH] Optimize (CST1 << A) == CST2 (PR tree-optimization/66299)
@ 2015-05-28 12:33 Marek Polacek
  2015-05-28 13:10 ` Jakub Jelinek
  2015-05-28 13:16 ` Richard Biener
  0 siblings, 2 replies; 14+ messages in thread
From: Marek Polacek @ 2015-05-28 12:33 UTC (permalink / raw)
  To: GCC Patches

This PR points out that we weren't able to optimize 1 << x == 2 to just
x == 1.  This is my attempt to fix that: if we see (CST1 << A) == CST2
and CST2 is a multiple of CST1, use log2 to get rid of the shift, but
only if the result of the shift is a natural number (including zero).

If CST2 is not a multiple of CST1, then the whole expression can be
discarded, but I'd like to do that as a follow-up.
(It would help if our current match.pd grammar allowed us to use "else",
any plans on doing that?)

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-05-28  Marek Polacek  <polacek@redhat.com>

	PR tree-optimization/66299
	* match.pd ((CST1 << A) == CST2 -> A == log2 (CST2 / CST1),
	(CST1 << A) != CST2 -> A != log2 (CST2 / CST1)): New
	patterns.

	* gcc.dg/pr66299-1.c: New test.
	* gcc.dg/pr66299-2.c: New test.

diff --git gcc/match.pd gcc/match.pd
index abd7851..5d07a70 100644
--- gcc/match.pd
+++ gcc/match.pd
@@ -676,6 +676,19 @@ along with GCC; see the file COPYING3.  If not see
   (cmp (bit_and (lshift integer_onep @0) integer_onep) integer_zerop)
   (icmp @0 { build_zero_cst (TREE_TYPE (@0)); })))
 
+/* (CST1 << A) == CST2 -> A == log2 (CST2 / CST1)
+   (CST1 << A) != CST2 -> A != log2 (CST2 / CST1)
+   if CST2 is a multiple of CST1.  */
+(for cmp (ne eq)
+ (simplify
+  (cmp (lshift@3 INTEGER_CST@0 @1) INTEGER_CST@2)
+  (if ((TREE_CODE (@3) != SSA_NAME || has_single_use (@3))
+       && wi::multiple_of_p (@2, @0, TYPE_SIGN (type)))
+   (with {
+    int shift = wi::exact_log2 (wi::div_trunc (@2, @0, TYPE_SIGN (type))); }
+   (if (shift != -1)
+    (cmp @1 { build_int_cst (TREE_TYPE (@1), shift); }))))))
+
 /* Simplifications of conversions.  */
 
 /* Basic strip-useless-type-conversions / strip_nops.  */
diff --git gcc/testsuite/gcc.dg/pr66299-1.c gcc/testsuite/gcc.dg/pr66299-1.c
index e69de29..9d41275 100644
--- gcc/testsuite/gcc.dg/pr66299-1.c
+++ gcc/testsuite/gcc.dg/pr66299-1.c
@@ -0,0 +1,83 @@
+/* PR tree-optimization/66299 */
+/* { dg-do run } */
+/* { dg-options "-fdump-tree-original" } */
+
+void
+test1 (int x)
+{
+  if ((0 << x) != 0
+      || (1 << x) != 2
+      || (2 << x) != 4
+      || (3 << x) != 6
+      || (4 << x) != 8
+      || (5 << x) != 10
+      || (6 << x) != 12
+      || (7 << x) != 14
+      || (8 << x) != 16
+      || (9 << x) != 18
+      || (10 << x) != 20)
+    __builtin_abort ();
+}
+
+void
+test2 (int x)
+{
+  if (!((0 << x) == 0
+        && (1 << x) == 4
+        && (2 << x) == 8
+        && (3 << x) == 12
+        && (4 << x) == 16
+        && (5 << x) == 20
+        && (6 << x) == 24
+        && (7 << x) == 28
+        && (8 << x) == 32
+        && (9 << x) == 36
+	&& (10 << x) == 40))
+    __builtin_abort ();
+}
+
+void
+test3 (unsigned int x)
+{
+  if ((0U << x) != 0U
+      || (1U << x) != 16U
+      || (2U << x) != 32U
+      || (3U << x) != 48U
+      || (4U << x) != 64U
+      || (5U << x) != 80U
+      || (6U << x) != 96U
+      || (7U << x) != 112U
+      || (8U << x) != 128U
+      || (9U << x) != 144U
+      || (10U << x) != 160U)
+    __builtin_abort ();
+}
+
+void
+test4 (unsigned int x)
+{
+  if (!((0U << x) == 0U
+	|| (1U << x) == 8U
+	|| (2U << x) == 16U
+	|| (3U << x) == 24U
+	|| (4U << x) == 32U
+	|| (5U << x) == 40U
+	|| (6U << x) == 48U
+	|| (7U << x) == 56U
+	|| (8U << x) == 64U
+	|| (9U << x) == 72U
+	|| (10U << x) == 80U))
+    __builtin_abort ();
+}
+
+int
+main (void)
+{
+  test1 (1);
+  test2 (2);
+  test3 (4U);
+  test4 (3U);
+}
+
+/* { dg-final { scan-tree-dump-not "<<" "original" } } */
+/* { dg-final { cleanup-tree-dump "original" } } */
diff --git gcc/testsuite/gcc.dg/pr66299-2.c gcc/testsuite/gcc.dg/pr66299-2.c
index e69de29..dde0549 100644
--- gcc/testsuite/gcc.dg/pr66299-2.c
+++ gcc/testsuite/gcc.dg/pr66299-2.c
@@ -0,0 +1,34 @@
+/* PR tree-optimization/66299 */
+/* { dg-do run } */
+/* { dg-options "-fdump-tree-optimized -O" } */
+
+void
+test1 (int x, unsigned u)
+{
+  if ((1U << x) != 64
+      || (2 << x) != u
+      || (x << x) != 384
+      || (3 << x) == 9
+      || (x << 14) != 98304U
+      || (1 << x) == 14
+      || (3 << 2) != 12)
+    __builtin_abort ();
+}
+
+void
+test2 (int x)
+{
+  unsigned int t = ((unsigned int) 1U << x);
+  if (t != 2U)
+    __builtin_abort ();
+}
+
+int
+main (void)
+{
+  test1 (6, 128U);
+  test2 (1);
+}
+
+/* { dg-final { scan-tree-dump-not "<<" "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */

	Marek

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Optimize (CST1 << A) == CST2 (PR tree-optimization/66299)
  2015-05-28 12:33 [PATCH] Optimize (CST1 << A) == CST2 (PR tree-optimization/66299) Marek Polacek
@ 2015-05-28 13:10 ` Jakub Jelinek
  2015-05-28 20:33   ` Marc Glisse
  2015-05-28 13:16 ` Richard Biener
  1 sibling, 1 reply; 14+ messages in thread
From: Jakub Jelinek @ 2015-05-28 13:10 UTC (permalink / raw)
  To: Marek Polacek; +Cc: GCC Patches

On Thu, May 28, 2015 at 02:15:45PM +0200, Marek Polacek wrote:
> This PR points out that we weren't able to optimize 1 << x == 2 to just
> x == 1.  This is my attempt to fix that: if we see (CST1 << A) == CST2
> and CST2 is a multiple of CST1, use log2 to get rid of the shift, but
> only if the result of the shift is a natural number (including zero).

Is CST2 a multiple of CST1 the best test though?
I mean say in
(0x8001U << x) == 0x20000U
0x20000U isn't a multiple of 0x8001U, yet there is only one
valid value of x for which it holds (17), so we could very well
optimize that to x == 17.
If popcount of the CST1 is 1, then multiple_of_p is supposedly sufficient
(have you checked if CST1 is negative that it still works?), for others
supposedly we could have a helper function that would just try
in a loop all shift counts from 0 to precision - 1, and note when
(CST1 << b) == CST2 - if for no b, then it should fold regardless of
has_single_use to false or true, if for exactly one shift count, then
use a comparison against that shift count, otherwise give up?
Supposedly (CST1 >> A) == CST2 can be handled similarly.

> If CST2 is not a multiple of CST1, then the whole expression can be
> discarded, but I'd like to do that as a follow-up.
> (It would help if our current match.pd grammar allowed us to use "else",
> any plans on doing that?)

	Jakub

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Optimize (CST1 << A) == CST2 (PR tree-optimization/66299)
  2015-05-28 12:33 [PATCH] Optimize (CST1 << A) == CST2 (PR tree-optimization/66299) Marek Polacek
  2015-05-28 13:10 ` Jakub Jelinek
@ 2015-05-28 13:16 ` Richard Biener
  1 sibling, 0 replies; 14+ messages in thread
From: Richard Biener @ 2015-05-28 13:16 UTC (permalink / raw)
  To: Marek Polacek; +Cc: GCC Patches

On Thu, May 28, 2015 at 2:15 PM, Marek Polacek <polacek@redhat.com> wrote:
> This PR points out that we weren't able to optimize 1 << x == 2 to just
> x == 1.  This is my attempt to fix that: if we see (CST1 << A) == CST2
> and CST2 is a multiple of CST1, use log2 to get rid of the shift, but
> only if the result of the shift is a natural number (including zero).
>
> If CST2 is not a multiple of CST1, then the whole expression can be
> discarded, but I'd like to do that as a follow-up.
> (It would help if our current match.pd grammar allowed us to use "else",
> any plans on doing that?)
>
> Bootstrapped/regtested on x86_64-linux, ok for trunk?
>
> 2015-05-28  Marek Polacek  <polacek@redhat.com>
>
>         PR tree-optimization/66299
>         * match.pd ((CST1 << A) == CST2 -> A == log2 (CST2 / CST1),
>         (CST1 << A) != CST2 -> A != log2 (CST2 / CST1)): New
>         patterns.
>
>         * gcc.dg/pr66299-1.c: New test.
>         * gcc.dg/pr66299-2.c: New test.
>
> diff --git gcc/match.pd gcc/match.pd
> index abd7851..5d07a70 100644
> --- gcc/match.pd
> +++ gcc/match.pd
> @@ -676,6 +676,19 @@ along with GCC; see the file COPYING3.  If not see
>    (cmp (bit_and (lshift integer_onep @0) integer_onep) integer_zerop)
>    (icmp @0 { build_zero_cst (TREE_TYPE (@0)); })))
>
> +/* (CST1 << A) == CST2 -> A == log2 (CST2 / CST1)
> +   (CST1 << A) != CST2 -> A != log2 (CST2 / CST1)
> +   if CST2 is a multiple of CST1.  */
> +(for cmp (ne eq)
> + (simplify
> +  (cmp (lshift@3 INTEGER_CST@0 @1) INTEGER_CST@2)
> +  (if ((TREE_CODE (@3) != SSA_NAME || has_single_use (@3))

I think we have the single_use (@3) helper now.  Not sure why you
restrict this here though - we are only creating new constants.

Ok with dropping the single-use check.

> +       && wi::multiple_of_p (@2, @0, TYPE_SIGN (type)))
> +   (with {
> +    int shift = wi::exact_log2 (wi::div_trunc (@2, @0, TYPE_SIGN (type))); }
> +   (if (shift != -1)
> +    (cmp @1 { build_int_cst (TREE_TYPE (@1), shift); }))))))

so with else you mean

        (if (shift != -1)
          ...
          (else-expr ...))

?  Sure that's possible.  Today you can write

       (if (shift != -1)
          ...)
       (if (shift == -1)
         ...)

or

       (if (shift != -1)
         ....)
       (else-expr)

which is equivalent if the if is the only one at the nesting level and
the then-expr
doesn't contain any further ones.  That is, it is equivalent to

     if () ...;
     else-expr;

thus the fall-thru

So (if ...) would get an optional else-expr, yes, that sounds useful.
I think we
already have some (if A ..) (if !A ..) in match.pd.

Thanks,
Richard.

> +
>  /* Simplifications of conversions.  */
>
>  /* Basic strip-useless-type-conversions / strip_nops.  */
> diff --git gcc/testsuite/gcc.dg/pr66299-1.c gcc/testsuite/gcc.dg/pr66299-1.c
> index e69de29..9d41275 100644
> --- gcc/testsuite/gcc.dg/pr66299-1.c
> +++ gcc/testsuite/gcc.dg/pr66299-1.c
> @@ -0,0 +1,83 @@
> +/* PR tree-optimization/66299 */
> +/* { dg-do run } */
> +/* { dg-options "-fdump-tree-original" } */
> +
> +void
> +test1 (int x)
> +{
> +  if ((0 << x) != 0
> +      || (1 << x) != 2
> +      || (2 << x) != 4
> +      || (3 << x) != 6
> +      || (4 << x) != 8
> +      || (5 << x) != 10
> +      || (6 << x) != 12
> +      || (7 << x) != 14
> +      || (8 << x) != 16
> +      || (9 << x) != 18
> +      || (10 << x) != 20)
> +    __builtin_abort ();
> +}
> +
> +void
> +test2 (int x)
> +{
> +  if (!((0 << x) == 0
> +        && (1 << x) == 4
> +        && (2 << x) == 8
> +        && (3 << x) == 12
> +        && (4 << x) == 16
> +        && (5 << x) == 20
> +        && (6 << x) == 24
> +        && (7 << x) == 28
> +        && (8 << x) == 32
> +        && (9 << x) == 36
> +       && (10 << x) == 40))
> +    __builtin_abort ();
> +}
> +
> +void
> +test3 (unsigned int x)
> +{
> +  if ((0U << x) != 0U
> +      || (1U << x) != 16U
> +      || (2U << x) != 32U
> +      || (3U << x) != 48U
> +      || (4U << x) != 64U
> +      || (5U << x) != 80U
> +      || (6U << x) != 96U
> +      || (7U << x) != 112U
> +      || (8U << x) != 128U
> +      || (9U << x) != 144U
> +      || (10U << x) != 160U)
> +    __builtin_abort ();
> +}
> +
> +void
> +test4 (unsigned int x)
> +{
> +  if (!((0U << x) == 0U
> +       || (1U << x) == 8U
> +       || (2U << x) == 16U
> +       || (3U << x) == 24U
> +       || (4U << x) == 32U
> +       || (5U << x) == 40U
> +       || (6U << x) == 48U
> +       || (7U << x) == 56U
> +       || (8U << x) == 64U
> +       || (9U << x) == 72U
> +       || (10U << x) == 80U))
> +    __builtin_abort ();
> +}
> +
> +int
> +main (void)
> +{
> +  test1 (1);
> +  test2 (2);
> +  test3 (4U);
> +  test4 (3U);
> +}
> +
> +/* { dg-final { scan-tree-dump-not "<<" "original" } } */
> +/* { dg-final { cleanup-tree-dump "original" } } */
> diff --git gcc/testsuite/gcc.dg/pr66299-2.c gcc/testsuite/gcc.dg/pr66299-2.c
> index e69de29..dde0549 100644
> --- gcc/testsuite/gcc.dg/pr66299-2.c
> +++ gcc/testsuite/gcc.dg/pr66299-2.c
> @@ -0,0 +1,34 @@
> +/* PR tree-optimization/66299 */
> +/* { dg-do run } */
> +/* { dg-options "-fdump-tree-optimized -O" } */
> +
> +void
> +test1 (int x, unsigned u)
> +{
> +  if ((1U << x) != 64
> +      || (2 << x) != u
> +      || (x << x) != 384
> +      || (3 << x) == 9
> +      || (x << 14) != 98304U
> +      || (1 << x) == 14
> +      || (3 << 2) != 12)
> +    __builtin_abort ();
> +}
> +
> +void
> +test2 (int x)
> +{
> +  unsigned int t = ((unsigned int) 1U << x);
> +  if (t != 2U)
> +    __builtin_abort ();
> +}
> +
> +int
> +main (void)
> +{
> +  test1 (6, 128U);
> +  test2 (1);
> +}
> +
> +/* { dg-final { scan-tree-dump-not "<<" "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
>
>         Marek

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Optimize (CST1 << A) == CST2 (PR tree-optimization/66299)
  2015-05-28 13:10 ` Jakub Jelinek
@ 2015-05-28 20:33   ` Marc Glisse
  2015-06-08 15:12     ` Marek Polacek
  0 siblings, 1 reply; 14+ messages in thread
From: Marc Glisse @ 2015-05-28 20:33 UTC (permalink / raw)
  To: Marek Polacek, Jakub Jelinek; +Cc: GCC Patches

On Thu, 28 May 2015, Marek Polacek wrote:

> This PR points out that we weren't able to optimize 1 << x == 2 to just
> x == 1.

Side note: if we are looking for extra patterns to simplify, llvm has an 
almost unlimited supply. Here are a few we don't seem to have (there are 
more where those came from), of course several need constraining / 
generalizing, it is just a list of hints I wrote for myself.

(A|B) & ~(A&B) -> A^B
(A | B) & ((~A) ^ B) -> (A & B)
(A & (~B)) | (A ^ B) -> (A ^ B)
((B | C) & A) | B -> B | (A & C)
A | ( A ^ B) -> A |  B
A | (~A ^ B) -> A | ~B
(A ^ B) & ((B ^ C) ^ A) -> (A ^ B) & ~C
(A ^ B) | ((B ^ C) ^ A) -> (A ^ B) | C
(A & B) | (A ^ B) -> (A | B)
A | ~(A ^ B) -> A | ~B
(A & B) | ((~A) ^ B) -> (~A ^ B)
~(~X & Y) -> (X | ~Y)
~(~X >>s Y) -> (X >>s Y)
(A & B)^(A | B) -> A ^ B
(A | ~B) ^ (~A | B) -> A ^ B
(A & ~B) ^ (~A & B) -> A ^ B
(A ^ C)^(A | B) -> ((~A) & B) ^ C
(A & B) ^ (A ^ B) -> (A | B)
(A & ~B) ^ (~A) -> ~(A & B)
(A&B)+(A^B) -> A|B
(A&B)+(A|B) -> A+B
(A|B)-(A^B) -> A&B
((X | Y) - X) -> (~X & Y)
fmax(x,NaN) -> x
fmax(a,fmax(a,b)) -> fmax(a,b)
(X+2) >u X -> x <u 256-2
(1 << X) <  30 -> X <= 4
((X & ~7) == 0) -> X < 8
2 * X < 5 -> X <= 2
((1 << x)&8) == 0 -> x != 3
((1 << x)&7) == 0 -> x > 2
Y - Z < X - Z -> Y < X
3 * X == 3 * Y -> X == Y
A >> 3 == B >> 3 -> (A ^ B) < 8
(float)int <= 4.4 -> int <= 4
x unle x -> x ord x


> +/* (CST1 << A) == CST2 -> A == log2 (CST2 / CST1)
> +   (CST1 << A) != CST2 -> A != log2 (CST2 / CST1)
> +   if CST2 is a multiple of CST1.  */
> +(for cmp (ne eq)
> + (simplify
> +  (cmp (lshift@3 INTEGER_CST@0 @1) INTEGER_CST@2)
> +  (if ((TREE_CODE (@3) != SSA_NAME || has_single_use (@3))
> +       && wi::multiple_of_p (@2, @0, TYPE_SIGN (type)))

Doesn't "type" refer to the result of the EQ_EXPR here?


On Thu, 28 May 2015, Jakub Jelinek wrote:

> Is CST2 a multiple of CST1 the best test though?
> I mean say in
> (0x8001U << x) == 0x20000U
> 0x20000U isn't a multiple of 0x8001U, yet there is only one
> valid value of x for which it holds (17), so we could very well
> optimize that to x == 17.
> If popcount of the CST1 is 1, then multiple_of_p is supposedly sufficient
> (have you checked if CST1 is negative that it still works?), for others
> supposedly we could have a helper function that would just try
> in a loop all shift counts from 0 to precision - 1, and note when
> (CST1 << b) == CST2 - if for no b, then it should fold regardless of
> has_single_use to false or true, if for exactly one shift count, then
> use a comparison against that shift count, otherwise give up?

ctz(CST2)-ctz(CST1) should provide a single candidate without looping. 
ctz(CST1) is also relevant when CST2==0.

-- 
Marc Glisse

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Optimize (CST1 << A) == CST2 (PR tree-optimization/66299)
  2015-05-28 20:33   ` Marc Glisse
@ 2015-06-08 15:12     ` Marek Polacek
  2015-06-08 17:14       ` Marc Glisse
  0 siblings, 1 reply; 14+ messages in thread
From: Marek Polacek @ 2015-06-08 15:12 UTC (permalink / raw)
  To: Marc Glisse; +Cc: Jakub Jelinek, GCC Patches, Richard Biener

On Thu, May 28, 2015 at 09:48:10PM +0200, Marc Glisse wrote:
> Side note: if we are looking for extra patterns to simplify, llvm has an
> almost unlimited supply. Here are a few we don't seem to have (there are
> more where those came from), of course several need constraining /
> generalizing, it is just a list of hints I wrote for myself.
> 
> (A|B) & ~(A&B) -> A^B
> (A | B) & ((~A) ^ B) -> (A & B)
> (A & (~B)) | (A ^ B) -> (A ^ B)
> ((B | C) & A) | B -> B | (A & C)
> A | ( A ^ B) -> A |  B
> A | (~A ^ B) -> A | ~B
> (A ^ B) & ((B ^ C) ^ A) -> (A ^ B) & ~C
> (A ^ B) | ((B ^ C) ^ A) -> (A ^ B) | C
> (A & B) | (A ^ B) -> (A | B)
> A | ~(A ^ B) -> A | ~B
> (A & B) | ((~A) ^ B) -> (~A ^ B)
> ~(~X & Y) -> (X | ~Y)
> ~(~X >>s Y) -> (X >>s Y)
> (A & B)^(A | B) -> A ^ B
> (A | ~B) ^ (~A | B) -> A ^ B
> (A & ~B) ^ (~A & B) -> A ^ B
> (A ^ C)^(A | B) -> ((~A) & B) ^ C
> (A & B) ^ (A ^ B) -> (A | B)
> (A & ~B) ^ (~A) -> ~(A & B)
> (A&B)+(A^B) -> A|B
> (A&B)+(A|B) -> A+B
> (A|B)-(A^B) -> A&B
> ((X | Y) - X) -> (~X & Y)
> fmax(x,NaN) -> x
> fmax(a,fmax(a,b)) -> fmax(a,b)
> (X+2) >u X -> x <u 256-2
> (1 << X) <  30 -> X <= 4
> ((X & ~7) == 0) -> X < 8
> 2 * X < 5 -> X <= 2
> ((1 << x)&8) == 0 -> x != 3
> ((1 << x)&7) == 0 -> x > 2
> Y - Z < X - Z -> Y < X
> 3 * X == 3 * Y -> X == Y
> A >> 3 == B >> 3 -> (A ^ B) < 8
> (float)int <= 4.4 -> int <= 4
> x unle x -> x ord x

Thanks for this list.  I'll look at implementing (some of) them.
 
> On Thu, 28 May 2015, Jakub Jelinek wrote:
> 
> >Is CST2 a multiple of CST1 the best test though?

Apparently not ;).

> >I mean say in
> >(0x8001U << x) == 0x20000U
> >0x20000U isn't a multiple of 0x8001U, yet there is only one
> >valid value of x for which it holds (17), so we could very well
> >optimize that to x == 17.

Yeah.

> >If popcount of the CST1 is 1, then multiple_of_p is supposedly sufficient
> >(have you checked if CST1 is negative that it still works?), for others
> >supposedly we could have a helper function that would just try
> >in a loop all shift counts from 0 to precision - 1, and note when
> >(CST1 << b) == CST2 - if for no b, then it should fold regardless of
> >has_single_use to false or true, if for exactly one shift count, then
> >use a comparison against that shift count, otherwise give up?
> 
> ctz(CST2)-ctz(CST1) should provide a single candidate without looping.
> ctz(CST1) is also relevant when CST2==0.

That seems to work well so the following patch is an attempt to do it
so.
If CST2 is non-zero, we compute a candidate and verify whether this
candidate works.  If so, we know there's exactly one so we should be
able to fold the shift into comparison.
I've tried even negative numbers and it seems to DTRT, but I'd certainly
appreciate if y'all could take a look at this.  Thanks.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-06-08  Marek Polacek  <polacek@redhat.com>

	PR tree-optimization/66299
	* match.pd ((CST1 << A) == CST2 -> A == ctz (CST2) - ctz (CST1)
	((CST1 << A) != CST2 -> A != ctz (CST2) - ctz (CST1)): New
	patterns.

	* gcc.dg/pr66299-1.c: New test.
	* gcc.dg/pr66299-2.c: New test.

diff --git gcc/match.pd gcc/match.pd
index abd7851..32e913c 100644
--- gcc/match.pd
+++ gcc/match.pd
@@ -676,6 +676,18 @@ along with GCC; see the file COPYING3.  If not see
   (cmp (bit_and (lshift integer_onep @0) integer_onep) integer_zerop)
   (icmp @0 { build_zero_cst (TREE_TYPE (@0)); })))
 
+/* (CST1 << A) == CST2 -> A == ctz (CST2) - ctz (CST1)
+   (CST1 << A) != CST2 -> A != ctz (CST2) - ctz (CST1)
+   if CST2 != 0.  */
+(for cmp (ne eq)
+ (simplify
+  (cmp (lshift INTEGER_CST@0 @1) INTEGER_CST@2)
+  (with {
+   unsigned int cand = wi::ctz (@2) - wi::ctz (@0); }
+  (if (!integer_zerop (@2)
+       && wi::eq_p (wi::lshift (@0, cand), @2))
+   (cmp @1 { build_int_cst (TREE_TYPE (@1), cand); })))))
+
 /* Simplifications of conversions.  */
 
 /* Basic strip-useless-type-conversions / strip_nops.  */
diff --git gcc/testsuite/gcc.dg/pr66299-1.c gcc/testsuite/gcc.dg/pr66299-1.c
index e69de29..e7b978d 100644
--- gcc/testsuite/gcc.dg/pr66299-1.c
+++ gcc/testsuite/gcc.dg/pr66299-1.c
@@ -0,0 +1,92 @@
+/* PR tree-optimization/66299 */
+/* { dg-do run } */
+/* { dg-options "-fdump-tree-original" } */
+
+void
+test1 (int x)
+{
+  if ((0 << x) != 0
+      || (1 << x) != 2
+      || (2 << x) != 4
+      || (3 << x) != 6
+      || (4 << x) != 8
+      || (5 << x) != 10
+      || (6 << x) != 12
+      || (7 << x) != 14
+      || (8 << x) != 16
+      || (9 << x) != 18
+      || (10 << x) != 20)
+    __builtin_abort ();
+}
+
+void
+test2 (int x)
+{
+  if (!((0 << x) == 0
+        && (1 << x) == 4
+        && (2 << x) == 8
+        && (3 << x) == 12
+        && (4 << x) == 16
+        && (5 << x) == 20
+        && (6 << x) == 24
+        && (7 << x) == 28
+        && (8 << x) == 32
+        && (9 << x) == 36
+	&& (10 << x) == 40))
+    __builtin_abort ();
+}
+
+void
+test3 (unsigned int x)
+{
+  if ((0U << x) != 0U
+      || (1U << x) != 16U
+      || (2U << x) != 32U
+      || (3U << x) != 48U
+      || (4U << x) != 64U
+      || (5U << x) != 80U
+      || (6U << x) != 96U
+      || (7U << x) != 112U
+      || (8U << x) != 128U
+      || (9U << x) != 144U
+      || (10U << x) != 160U)
+    __builtin_abort ();
+}
+
+void
+test4 (unsigned int x)
+{
+  if (!((0U << x) == 0U
+	|| (1U << x) == 8U
+	|| (2U << x) == 16U
+	|| (3U << x) == 24U
+	|| (4U << x) == 32U
+	|| (5U << x) == 40U
+	|| (6U << x) == 48U
+	|| (7U << x) == 56U
+	|| (8U << x) == 64U
+	|| (9U << x) == 72U
+	|| (10U << x) == 80U))
+    __builtin_abort ();
+}
+
+void
+test5 (int x)
+{
+  if ((0 << x) == 1
+      || (0 << x) != 0
+      || (0x8001U << x) != 0x20000U)
+    __builtin_abort ();
+}
+
+int
+main (void)
+{
+  test1 (1);
+  test2 (2);
+  test3 (4U);
+  test4 (3U);
+  test5 (17);
+}
+
+/* { dg-final { scan-tree-dump-not "<<" "original" } } */
diff --git gcc/testsuite/gcc.dg/pr66299-2.c gcc/testsuite/gcc.dg/pr66299-2.c
index e69de29..45e9218 100644
--- gcc/testsuite/gcc.dg/pr66299-2.c
+++ gcc/testsuite/gcc.dg/pr66299-2.c
@@ -0,0 +1,33 @@
+/* PR tree-optimization/66299 */
+/* { dg-do run } */
+/* { dg-options "-fdump-tree-optimized -O" } */
+
+void
+test1 (int x, unsigned u)
+{
+  if ((1U << x) != 64
+      || (2 << x) != u
+      || (x << x) != 384
+      || (3 << x) == 9
+      || (x << 14) != 98304U
+      || (1 << x) == 14
+      || (3 << 2) != 12)
+    __builtin_abort ();
+}
+
+void
+test2 (int x)
+{
+  unsigned int t = ((unsigned int) 1U << x);
+  if (t != 2U)
+    __builtin_abort ();
+}
+
+int
+main (void)
+{
+  test1 (6, 128U);
+  test2 (1);
+}
+
+/* { dg-final { scan-tree-dump-not "<<" "optimized" } } */

	Marek

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Optimize (CST1 << A) == CST2 (PR tree-optimization/66299)
  2015-06-08 15:12     ` Marek Polacek
@ 2015-06-08 17:14       ` Marc Glisse
  2015-06-09  7:56         ` Richard Biener
  0 siblings, 1 reply; 14+ messages in thread
From: Marc Glisse @ 2015-06-08 17:14 UTC (permalink / raw)
  To: Marek Polacek; +Cc: Jakub Jelinek, GCC Patches, Richard Biener

On Mon, 8 Jun 2015, Marek Polacek wrote:

>       PR tree-optimization/66299
>       * match.pd ((CST1 << A) == CST2 -> A == ctz (CST2) - ctz (CST1)
>       ((CST1 << A) != CST2 -> A != ctz (CST2) - ctz (CST1)): New

You are braver than I am, I would have abbreviated ctz (CST2) - ctz (CST1)
to CST3 in the ChangeLog ;-)

> +/* (CST1 << A) == CST2 -> A == ctz (CST2) - ctz (CST1)
> +   (CST1 << A) != CST2 -> A != ctz (CST2) - ctz (CST1)
> +   if CST2 != 0.  */
> +(for cmp (ne eq)
> + (simplify
> +  (cmp (lshift INTEGER_CST@0 @1) INTEGER_CST@2)
> +  (with {
> +   unsigned int cand = wi::ctz (@2) - wi::ctz (@0); }
> +  (if (!integer_zerop (@2)

You can probably use directly wi::ne_p (@2, 0) here. Shouldn't this be
indented one space more?

> +       && wi::eq_p (wi::lshift (@0, cand), @2))
> +   (cmp @1 { build_int_cst (TREE_TYPE (@1), cand); })))))

Making 'cand' signed, you could return 0 when cand<0, like (2<<x)==1. You
could also return 0 when the candidate turns out not to work: (3<<x)==4.

Tweaking it so that (6<<X)==0 becomes X>=31 for TYPE_OVERFLOW_WRAPS and
false for TYPE_OVERFLOW_UNDEFINED is probably more controversial.

FWIW, the patch looks good to me, thanks.

-- 
Marc Glisse

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Optimize (CST1 << A) == CST2 (PR tree-optimization/66299)
  2015-06-08 17:14       ` Marc Glisse
@ 2015-06-09  7:56         ` Richard Biener
  2015-06-09 11:46           ` Marc Glisse
  2015-06-09 13:46           ` Marek Polacek
  0 siblings, 2 replies; 14+ messages in thread
From: Richard Biener @ 2015-06-09  7:56 UTC (permalink / raw)
  To: Marc Glisse; +Cc: Marek Polacek, Jakub Jelinek, GCC Patches, Richard Biener

On Mon, Jun 8, 2015 at 7:03 PM, Marc Glisse <marc.glisse@inria.fr> wrote:
> On Mon, 8 Jun 2015, Marek Polacek wrote:
>
>>       PR tree-optimization/66299
>>       * match.pd ((CST1 << A) == CST2 -> A == ctz (CST2) - ctz (CST1)
>>       ((CST1 << A) != CST2 -> A != ctz (CST2) - ctz (CST1)): New
>
>
> You are braver than I am, I would have abbreviated ctz (CST2) - ctz (CST1)
> to CST3 in the ChangeLog ;-)
>
>> +/* (CST1 << A) == CST2 -> A == ctz (CST2) - ctz (CST1)
>> +   (CST1 << A) != CST2 -> A != ctz (CST2) - ctz (CST1)
>> +   if CST2 != 0.  */
>> +(for cmp (ne eq)
>> + (simplify
>> +  (cmp (lshift INTEGER_CST@0 @1) INTEGER_CST@2)
>> +  (with {
>> +   unsigned int cand = wi::ctz (@2) - wi::ctz (@0); }
>> +  (if (!integer_zerop (@2)
>
>
> You can probably use directly wi::ne_p (@2, 0) here. Shouldn't this be
> indented one space more?

Yes, one space more.  I suppose using integer_zerop might in theory
allow for handling vector shifts at some point ...?

>> +       && wi::eq_p (wi::lshift (@0, cand), @2))
>> +   (cmp @1 { build_int_cst (TREE_TYPE (@1), cand); })))))
>
>
> Making 'cand' signed, you could return 0 when cand<0, like (2<<x)==1. You
> could also return 0 when the candidate turns out not to work: (3<<x)==4.

Sounds like a good improvement.

> Tweaking it so that (6<<X)==0 becomes X>=31 for TYPE_OVERFLOW_WRAPS and
> false for TYPE_OVERFLOW_UNDEFINED is probably more controversial.

Hm, yes.  I think signed overflow != shift amount overflow, so testing
the overflow
macros for this isn't valid.

Otherwise the patch looks ok to me as well - mind doing the improvement above?

Thanks,
Richard.

> FWIW, the patch looks good to me, thanks.
>
> --
> Marc Glisse

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Optimize (CST1 << A) == CST2 (PR tree-optimization/66299)
  2015-06-09  7:56         ` Richard Biener
@ 2015-06-09 11:46           ` Marc Glisse
  2015-06-09 11:49             ` Richard Biener
  2015-06-09 13:46           ` Marek Polacek
  1 sibling, 1 reply; 14+ messages in thread
From: Marc Glisse @ 2015-06-09 11:46 UTC (permalink / raw)
  To: Richard Biener; +Cc: Marek Polacek, Jakub Jelinek, GCC Patches, Richard Biener

On Tue, 9 Jun 2015, Richard Biener wrote:

>> Tweaking it so that (6<<X)==0 becomes X>=31 for TYPE_OVERFLOW_WRAPS and
>> false for TYPE_OVERFLOW_UNDEFINED is probably more controversial.
>
> Hm, yes.  I think signed overflow != shift amount overflow, so testing 
> the overflow macros for this isn't valid.

Would it be ok to always turn it to X>=31 then? (the value 31 is 
conveniently already computed in 'cand')

-- 
Marc Glisse

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Optimize (CST1 << A) == CST2 (PR tree-optimization/66299)
  2015-06-09 11:46           ` Marc Glisse
@ 2015-06-09 11:49             ` Richard Biener
  2015-06-09 11:57               ` Richard Biener
  0 siblings, 1 reply; 14+ messages in thread
From: Richard Biener @ 2015-06-09 11:49 UTC (permalink / raw)
  To: Marc Glisse; +Cc: Richard Biener, Marek Polacek, Jakub Jelinek, GCC Patches

On Tue, 9 Jun 2015, Marc Glisse wrote:

> On Tue, 9 Jun 2015, Richard Biener wrote:
> 
> > > Tweaking it so that (6<<X)==0 becomes X>=31 for TYPE_OVERFLOW_WRAPS and
> > > false for TYPE_OVERFLOW_UNDEFINED is probably more controversial.
> > 
> > Hm, yes.  I think signed overflow != shift amount overflow, so testing the
> > overflow macros for this isn't valid.
> 
> Would it be ok to always turn it to X>=31 then? (the value 31 is conveniently
> already computed in 'cand')

I think so.

Richard.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Optimize (CST1 << A) == CST2 (PR tree-optimization/66299)
  2015-06-09 11:49             ` Richard Biener
@ 2015-06-09 11:57               ` Richard Biener
  2015-06-09 12:13                 ` Marc Glisse
  0 siblings, 1 reply; 14+ messages in thread
From: Richard Biener @ 2015-06-09 11:57 UTC (permalink / raw)
  To: Marc Glisse; +Cc: Richard Biener, Marek Polacek, Jakub Jelinek, GCC Patches

On Tue, 9 Jun 2015, Richard Biener wrote:

> On Tue, 9 Jun 2015, Marc Glisse wrote:
> 
> > On Tue, 9 Jun 2015, Richard Biener wrote:
> > 
> > > > Tweaking it so that (6<<X)==0 becomes X>=31 for TYPE_OVERFLOW_WRAPS and
> > > > false for TYPE_OVERFLOW_UNDEFINED is probably more controversial.
> > > 
> > > Hm, yes.  I think signed overflow != shift amount overflow, so testing the
> > > overflow macros for this isn't valid.
> > 
> > Would it be ok to always turn it to X>=31 then? (the value 31 is conveniently
> > already computed in 'cand')
> 
> I think so.

Or even ((unsigned)X - 31) < 1 (I probably got that wrong) to properly
say X>=29 && X<32, that is, preserve the implicit upper bound on X
we have because it is used in a shift.

Richard.

> Richard.
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu, Graham Norton, HRB 21284 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Optimize (CST1 << A) == CST2 (PR tree-optimization/66299)
  2015-06-09 11:57               ` Richard Biener
@ 2015-06-09 12:13                 ` Marc Glisse
  2015-06-09 12:22                   ` Richard Biener
  0 siblings, 1 reply; 14+ messages in thread
From: Marc Glisse @ 2015-06-09 12:13 UTC (permalink / raw)
  To: Richard Biener; +Cc: Richard Biener, Marek Polacek, Jakub Jelinek, GCC Patches

On Tue, 9 Jun 2015, Richard Biener wrote:

> On Tue, 9 Jun 2015, Richard Biener wrote:
>
>> On Tue, 9 Jun 2015, Marc Glisse wrote:
>>
>>> On Tue, 9 Jun 2015, Richard Biener wrote:
>>>
>>>>> Tweaking it so that (6<<X)==0 becomes X>=31 for TYPE_OVERFLOW_WRAPS and
>>>>> false for TYPE_OVERFLOW_UNDEFINED is probably more controversial.
>>>>
>>>> Hm, yes.  I think signed overflow != shift amount overflow, so testing the
>>>> overflow macros for this isn't valid.
>>>
>>> Would it be ok to always turn it to X>=31 then? (the value 31 is conveniently
>>> already computed in 'cand')
>>
>> I think so.
>
> Or even ((unsigned)X - 31) < 1 (I probably got that wrong) to properly
> say X>=29 && X<32, that is, preserve the implicit upper bound on X
> we have because it is used in a shift.

I don't understand in what sense this preserves the upper bound. I would 
understand storing a range for X (when it is an SSA_NAME, and it would 
require a lot of care not to propagate backwards too far), or more simply 
introducing if(X>=32) __builtin_unreachable();. But you seem to be talking 
about generating more complicated code so that if someone checks 
(6<<123)==0 it returns false?

-- 
Marc Glisse

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Optimize (CST1 << A) == CST2 (PR tree-optimization/66299)
  2015-06-09 12:13                 ` Marc Glisse
@ 2015-06-09 12:22                   ` Richard Biener
  0 siblings, 0 replies; 14+ messages in thread
From: Richard Biener @ 2015-06-09 12:22 UTC (permalink / raw)
  To: Marc Glisse; +Cc: Richard Biener, Marek Polacek, Jakub Jelinek, GCC Patches

On Tue, 9 Jun 2015, Marc Glisse wrote:

> On Tue, 9 Jun 2015, Richard Biener wrote:
> 
> > On Tue, 9 Jun 2015, Richard Biener wrote:
> > 
> > > On Tue, 9 Jun 2015, Marc Glisse wrote:
> > > 
> > > > On Tue, 9 Jun 2015, Richard Biener wrote:
> > > > 
> > > > > > Tweaking it so that (6<<X)==0 becomes X>=31 for TYPE_OVERFLOW_WRAPS
> > > > > > and
> > > > > > false for TYPE_OVERFLOW_UNDEFINED is probably more controversial.
> > > > > 
> > > > > Hm, yes.  I think signed overflow != shift amount overflow, so testing
> > > > > the
> > > > > overflow macros for this isn't valid.
> > > > 
> > > > Would it be ok to always turn it to X>=31 then? (the value 31 is
> > > > conveniently
> > > > already computed in 'cand')
> > > 
> > > I think so.
> > 
> > Or even ((unsigned)X - 31) < 1 (I probably got that wrong) to properly
> > say X>=29 && X<32, that is, preserve the implicit upper bound on X
> > we have because it is used in a shift.
> 
> I don't understand in what sense this preserves the upper bound. I would
> understand storing a range for X (when it is an SSA_NAME, and it would require
> a lot of care not to propagate backwards too far), or more simply introducing
> if(X>=32) __builtin_unreachable();. But you seem to be talking about
> generating more complicated code so that if someone checks (6<<123)==0 it
> returns false?

Well, I'm mixing simplifying the computation and preserving extra
info we got from the complex computation.  So yes, you are right.

Richard.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Optimize (CST1 << A) == CST2 (PR tree-optimization/66299)
  2015-06-09  7:56         ` Richard Biener
  2015-06-09 11:46           ` Marc Glisse
@ 2015-06-09 13:46           ` Marek Polacek
  2015-06-09 14:11             ` Richard Biener
  1 sibling, 1 reply; 14+ messages in thread
From: Marek Polacek @ 2015-06-09 13:46 UTC (permalink / raw)
  To: Richard Biener; +Cc: Marc Glisse, Jakub Jelinek, GCC Patches, Richard Biener

On Tue, Jun 09, 2015 at 09:53:21AM +0200, Richard Biener wrote:
> >> +/* (CST1 << A) == CST2 -> A == ctz (CST2) - ctz (CST1)
> >> +   (CST1 << A) != CST2 -> A != ctz (CST2) - ctz (CST1)
> >> +   if CST2 != 0.  */
> >> +(for cmp (ne eq)
> >> + (simplify
> >> +  (cmp (lshift INTEGER_CST@0 @1) INTEGER_CST@2)
> >> +  (with {
> >> +   unsigned int cand = wi::ctz (@2) - wi::ctz (@0); }
> >> +  (if (!integer_zerop (@2)
> >
> >
> > You can probably use directly wi::ne_p (@2, 0) here. Shouldn't this be
> > indented one space more?
> 
> Yes, one space more.  I suppose using integer_zerop might in theory
> allow for handling vector shifts at some point ...?
 
Fixed the formatting.  But I kept the integer_zerop.

> >> +       && wi::eq_p (wi::lshift (@0, cand), @2))
> >> +   (cmp @1 { build_int_cst (TREE_TYPE (@1), cand); })))))
> >
> >
> > Making 'cand' signed, you could return 0 when cand<0, like (2<<x)==1. You
> > could also return 0 when the candidate turns out not to work: (3<<x)==4.
> 
> Sounds like a good improvement.
 
Yeah, it makes sense to do that while I'm on this.  Done, and a new
test added.

> Otherwise the patch looks ok to me as well - mind doing the improvement above?

Thank you both.  How does this look now?

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-06-09  Marek Polacek  <polacek@redhat.com>

	PR tree-optimization/66299
	* match.pd ((CST1 << A) == CST2 -> A == ctz (CST2) - ctz (CST1)
	((CST1 << A) != CST2 -> A != ctz (CST2) - ctz (CST1)): New
	patterns.

	* gcc.dg/pr66299-1.c: New test.
	* gcc.dg/pr66299-2.c: New test.
	* gcc.dg/pr66299-3.c: New test.

diff --git gcc/match.pd gcc/match.pd
index abd7851..560b218 100644
--- gcc/match.pd
+++ gcc/match.pd
@@ -676,6 +676,21 @@ along with GCC; see the file COPYING3.  If not see
   (cmp (bit_and (lshift integer_onep @0) integer_onep) integer_zerop)
   (icmp @0 { build_zero_cst (TREE_TYPE (@0)); })))
 
+/* (CST1 << A) == CST2 -> A == ctz (CST2) - ctz (CST1)
+   (CST1 << A) != CST2 -> A != ctz (CST2) - ctz (CST1)
+   if CST2 != 0.  */
+(for cmp (ne eq)
+ (simplify
+  (cmp (lshift INTEGER_CST@0 @1) INTEGER_CST@2)
+  (with { int cand = wi::ctz (@2) - wi::ctz (@0); }
+   (if (cand < 0
+	|| (!integer_zerop (@2)
+	    && wi::ne_p (wi::lshift (@0, cand), @2)))
+    { constant_boolean_node (cmp == NE_EXPR, type); })
+   (if (!integer_zerop (@2)
+	&& wi::eq_p (wi::lshift (@0, cand), @2))
+    (cmp @1 { build_int_cst (TREE_TYPE (@1), cand); })))))
+
 /* Simplifications of conversions.  */
 
 /* Basic strip-useless-type-conversions / strip_nops.  */
diff --git gcc/testsuite/gcc.dg/pr66299-1.c gcc/testsuite/gcc.dg/pr66299-1.c
index e69de29..e75146b 100644
--- gcc/testsuite/gcc.dg/pr66299-1.c
+++ gcc/testsuite/gcc.dg/pr66299-1.c
@@ -0,0 +1,92 @@
+/* PR tree-optimization/66299 */
+/* { dg-do run } */
+/* { dg-options "-fdump-tree-original" } */
+
+void
+test1 (int x)
+{
+  if ((0 << x) != 0
+      || (1 << x) != 2
+      || (2 << x) != 4
+      || (3 << x) != 6
+      || (4 << x) != 8
+      || (5 << x) != 10
+      || (6 << x) != 12
+      || (7 << x) != 14
+      || (8 << x) != 16
+      || (9 << x) != 18
+      || (10 << x) != 20)
+    __builtin_abort ();
+}
+
+void
+test2 (int x)
+{
+  if (!((0 << x) == 0
+	&& (1 << x) == 4
+	&& (2 << x) == 8
+	&& (3 << x) == 12
+	&& (4 << x) == 16
+	&& (5 << x) == 20
+	&& (6 << x) == 24
+	&& (7 << x) == 28
+	&& (8 << x) == 32
+	&& (9 << x) == 36
+	&& (10 << x) == 40))
+    __builtin_abort ();
+}
+
+void
+test3 (unsigned int x)
+{
+  if ((0U << x) != 0U
+      || (1U << x) != 16U
+      || (2U << x) != 32U
+      || (3U << x) != 48U
+      || (4U << x) != 64U
+      || (5U << x) != 80U
+      || (6U << x) != 96U
+      || (7U << x) != 112U
+      || (8U << x) != 128U
+      || (9U << x) != 144U
+      || (10U << x) != 160U)
+    __builtin_abort ();
+}
+
+void
+test4 (unsigned int x)
+{
+  if (!((0U << x) == 0U
+	|| (1U << x) == 8U
+	|| (2U << x) == 16U
+	|| (3U << x) == 24U
+	|| (4U << x) == 32U
+	|| (5U << x) == 40U
+	|| (6U << x) == 48U
+	|| (7U << x) == 56U
+	|| (8U << x) == 64U
+	|| (9U << x) == 72U
+	|| (10U << x) == 80U))
+    __builtin_abort ();
+}
+
+void
+test5 (int x)
+{
+  if ((0 << x) == 1
+      || (0 << x) != 0
+      || (0x8001U << x) != 0x20000U)
+    __builtin_abort ();
+}
+
+int
+main (void)
+{
+  test1 (1);
+  test2 (2);
+  test3 (4U);
+  test4 (3U);
+  test5 (17);
+}
+
+/* { dg-final { scan-tree-dump-not "<<" "original" } } */
diff --git gcc/testsuite/gcc.dg/pr66299-2.c gcc/testsuite/gcc.dg/pr66299-2.c
index e69de29..45e9218 100644
--- gcc/testsuite/gcc.dg/pr66299-2.c
+++ gcc/testsuite/gcc.dg/pr66299-2.c
@@ -0,0 +1,33 @@
+/* PR tree-optimization/66299 */
+/* { dg-do run } */
+/* { dg-options "-fdump-tree-optimized -O" } */
+
+void
+test1 (int x, unsigned u)
+{
+  if ((1U << x) != 64
+      || (2 << x) != u
+      || (x << x) != 384
+      || (3 << x) == 9
+      || (x << 14) != 98304U
+      || (1 << x) == 14
+      || (3 << 2) != 12)
+    __builtin_abort ();
+}
+
+void
+test2 (int x)
+{
+  unsigned int t = ((unsigned int) 1U << x);
+  if (t != 2U)
+    __builtin_abort ();
+}
+
+int
+main (void)
+{
+  test1 (6, 128U);
+  test2 (1);
+}
+
+/* { dg-final { scan-tree-dump-not "<<" "optimized" } } */
diff --git gcc/testsuite/gcc.dg/pr66299-3.c gcc/testsuite/gcc.dg/pr66299-3.c
index e69de29..ffee049 100644
--- gcc/testsuite/gcc.dg/pr66299-3.c
+++ gcc/testsuite/gcc.dg/pr66299-3.c
@@ -0,0 +1,68 @@
+/* PR tree-optimization/66299 */
+/* { dg-do run } */
+/* { dg-options "-fdump-tree-original" } */
+
+void __attribute__ ((noinline, noclone))
+test1 (int x)
+{
+  if ((2 << x) == 1
+      || (8 << x) == 1
+      || (8 << x) == 2
+      || (3072 << x) == 3
+      || (294912 << x) == 9
+      || (45056 << x) == 11
+      || (2176 << x) == 17)
+    __builtin_abort ();
+}
+
+void __attribute__ ((noinline, noclone))
+test2 (int x)
+{
+  if ((2 << x) != 1
+      && (8 << x) != 1
+      && (8 << x) != 2
+      && (3072 << x) != 3
+      && (294912 << x) != 9
+      && (45056 << x) != 11
+      && (2176 << x) != 17)
+    ;
+  else
+    __builtin_abort ();
+}
+
+void __attribute__ ((noinline, noclone))
+test3 (int x)
+{
+  if ((3 << x) == 4
+      || (1 << x) == 12
+      || (40 << x) == 1024
+      || (2 << x) == 84
+      || (3 << x) == 16384
+      || (10 << x) == 6144)
+    __builtin_abort ();
+}
+
+void __attribute__ ((noinline, noclone))
+test4 (int x)
+{
+  if ((3 << x) != 4
+      && (1 << x) != 12
+      && (40 << x) != 1024
+      && (2 << x) != 84
+      && (3 << x) != 16384
+      && (10 << x) != 6144)
+    ;
+  else
+    __builtin_abort ();
+}
+
+int
+main (void)
+{
+  test1 (0);
+  test2 (1);
+  test3 (1);
+  test4 (2);
+}
+
+/* { dg-final { scan-tree-dump-not "(<<|==|!=)" "original" } } */

	Marek

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Optimize (CST1 << A) == CST2 (PR tree-optimization/66299)
  2015-06-09 13:46           ` Marek Polacek
@ 2015-06-09 14:11             ` Richard Biener
  0 siblings, 0 replies; 14+ messages in thread
From: Richard Biener @ 2015-06-09 14:11 UTC (permalink / raw)
  To: Marek Polacek; +Cc: Richard Biener, Marc Glisse, Jakub Jelinek, GCC Patches

On Tue, 9 Jun 2015, Marek Polacek wrote:

> On Tue, Jun 09, 2015 at 09:53:21AM +0200, Richard Biener wrote:
> > >> +/* (CST1 << A) == CST2 -> A == ctz (CST2) - ctz (CST1)
> > >> +   (CST1 << A) != CST2 -> A != ctz (CST2) - ctz (CST1)
> > >> +   if CST2 != 0.  */
> > >> +(for cmp (ne eq)
> > >> + (simplify
> > >> +  (cmp (lshift INTEGER_CST@0 @1) INTEGER_CST@2)
> > >> +  (with {
> > >> +   unsigned int cand = wi::ctz (@2) - wi::ctz (@0); }
> > >> +  (if (!integer_zerop (@2)
> > >
> > >
> > > You can probably use directly wi::ne_p (@2, 0) here. Shouldn't this be
> > > indented one space more?
> > 
> > Yes, one space more.  I suppose using integer_zerop might in theory
> > allow for handling vector shifts at some point ...?
>  
> Fixed the formatting.  But I kept the integer_zerop.
> 
> > >> +       && wi::eq_p (wi::lshift (@0, cand), @2))
> > >> +   (cmp @1 { build_int_cst (TREE_TYPE (@1), cand); })))))
> > >
> > >
> > > Making 'cand' signed, you could return 0 when cand<0, like (2<<x)==1. You
> > > could also return 0 when the candidate turns out not to work: (3<<x)==4.
> > 
> > Sounds like a good improvement.
>  
> Yeah, it makes sense to do that while I'm on this.  Done, and a new
> test added.
> 
> > Otherwise the patch looks ok to me as well - mind doing the improvement above?
> 
> Thank you both.  How does this look now?
> 
> Bootstrapped/regtested on x86_64-linux, ok for trunk?

Ok.

Thanks,
Richard.

> 2015-06-09  Marek Polacek  <polacek@redhat.com>
> 
> 	PR tree-optimization/66299
> 	* match.pd ((CST1 << A) == CST2 -> A == ctz (CST2) - ctz (CST1)
> 	((CST1 << A) != CST2 -> A != ctz (CST2) - ctz (CST1)): New
> 	patterns.
> 
> 	* gcc.dg/pr66299-1.c: New test.
> 	* gcc.dg/pr66299-2.c: New test.
> 	* gcc.dg/pr66299-3.c: New test.
> 
> diff --git gcc/match.pd gcc/match.pd
> index abd7851..560b218 100644
> --- gcc/match.pd
> +++ gcc/match.pd
> @@ -676,6 +676,21 @@ along with GCC; see the file COPYING3.  If not see
>    (cmp (bit_and (lshift integer_onep @0) integer_onep) integer_zerop)
>    (icmp @0 { build_zero_cst (TREE_TYPE (@0)); })))
>  
> +/* (CST1 << A) == CST2 -> A == ctz (CST2) - ctz (CST1)
> +   (CST1 << A) != CST2 -> A != ctz (CST2) - ctz (CST1)
> +   if CST2 != 0.  */
> +(for cmp (ne eq)
> + (simplify
> +  (cmp (lshift INTEGER_CST@0 @1) INTEGER_CST@2)
> +  (with { int cand = wi::ctz (@2) - wi::ctz (@0); }
> +   (if (cand < 0
> +	|| (!integer_zerop (@2)
> +	    && wi::ne_p (wi::lshift (@0, cand), @2)))
> +    { constant_boolean_node (cmp == NE_EXPR, type); })
> +   (if (!integer_zerop (@2)
> +	&& wi::eq_p (wi::lshift (@0, cand), @2))
> +    (cmp @1 { build_int_cst (TREE_TYPE (@1), cand); })))))
> +
>  /* Simplifications of conversions.  */
>  
>  /* Basic strip-useless-type-conversions / strip_nops.  */
> diff --git gcc/testsuite/gcc.dg/pr66299-1.c gcc/testsuite/gcc.dg/pr66299-1.c
> index e69de29..e75146b 100644
> --- gcc/testsuite/gcc.dg/pr66299-1.c
> +++ gcc/testsuite/gcc.dg/pr66299-1.c
> @@ -0,0 +1,92 @@
> +/* PR tree-optimization/66299 */
> +/* { dg-do run } */
> +/* { dg-options "-fdump-tree-original" } */
> +
> +void
> +test1 (int x)
> +{
> +  if ((0 << x) != 0
> +      || (1 << x) != 2
> +      || (2 << x) != 4
> +      || (3 << x) != 6
> +      || (4 << x) != 8
> +      || (5 << x) != 10
> +      || (6 << x) != 12
> +      || (7 << x) != 14
> +      || (8 << x) != 16
> +      || (9 << x) != 18
> +      || (10 << x) != 20)
> +    __builtin_abort ();
> +}
> +
> +void
> +test2 (int x)
> +{
> +  if (!((0 << x) == 0
> +	&& (1 << x) == 4
> +	&& (2 << x) == 8
> +	&& (3 << x) == 12
> +	&& (4 << x) == 16
> +	&& (5 << x) == 20
> +	&& (6 << x) == 24
> +	&& (7 << x) == 28
> +	&& (8 << x) == 32
> +	&& (9 << x) == 36
> +	&& (10 << x) == 40))
> +    __builtin_abort ();
> +}
> +
> +void
> +test3 (unsigned int x)
> +{
> +  if ((0U << x) != 0U
> +      || (1U << x) != 16U
> +      || (2U << x) != 32U
> +      || (3U << x) != 48U
> +      || (4U << x) != 64U
> +      || (5U << x) != 80U
> +      || (6U << x) != 96U
> +      || (7U << x) != 112U
> +      || (8U << x) != 128U
> +      || (9U << x) != 144U
> +      || (10U << x) != 160U)
> +    __builtin_abort ();
> +}
> +
> +void
> +test4 (unsigned int x)
> +{
> +  if (!((0U << x) == 0U
> +	|| (1U << x) == 8U
> +	|| (2U << x) == 16U
> +	|| (3U << x) == 24U
> +	|| (4U << x) == 32U
> +	|| (5U << x) == 40U
> +	|| (6U << x) == 48U
> +	|| (7U << x) == 56U
> +	|| (8U << x) == 64U
> +	|| (9U << x) == 72U
> +	|| (10U << x) == 80U))
> +    __builtin_abort ();
> +}
> +
> +void
> +test5 (int x)
> +{
> +  if ((0 << x) == 1
> +      || (0 << x) != 0
> +      || (0x8001U << x) != 0x20000U)
> +    __builtin_abort ();
> +}
> +
> +int
> +main (void)
> +{
> +  test1 (1);
> +  test2 (2);
> +  test3 (4U);
> +  test4 (3U);
> +  test5 (17);
> +}
> +
> +/* { dg-final { scan-tree-dump-not "<<" "original" } } */
> diff --git gcc/testsuite/gcc.dg/pr66299-2.c gcc/testsuite/gcc.dg/pr66299-2.c
> index e69de29..45e9218 100644
> --- gcc/testsuite/gcc.dg/pr66299-2.c
> +++ gcc/testsuite/gcc.dg/pr66299-2.c
> @@ -0,0 +1,33 @@
> +/* PR tree-optimization/66299 */
> +/* { dg-do run } */
> +/* { dg-options "-fdump-tree-optimized -O" } */
> +
> +void
> +test1 (int x, unsigned u)
> +{
> +  if ((1U << x) != 64
> +      || (2 << x) != u
> +      || (x << x) != 384
> +      || (3 << x) == 9
> +      || (x << 14) != 98304U
> +      || (1 << x) == 14
> +      || (3 << 2) != 12)
> +    __builtin_abort ();
> +}
> +
> +void
> +test2 (int x)
> +{
> +  unsigned int t = ((unsigned int) 1U << x);
> +  if (t != 2U)
> +    __builtin_abort ();
> +}
> +
> +int
> +main (void)
> +{
> +  test1 (6, 128U);
> +  test2 (1);
> +}
> +
> +/* { dg-final { scan-tree-dump-not "<<" "optimized" } } */
> diff --git gcc/testsuite/gcc.dg/pr66299-3.c gcc/testsuite/gcc.dg/pr66299-3.c
> index e69de29..ffee049 100644
> --- gcc/testsuite/gcc.dg/pr66299-3.c
> +++ gcc/testsuite/gcc.dg/pr66299-3.c
> @@ -0,0 +1,68 @@
> +/* PR tree-optimization/66299 */
> +/* { dg-do run } */
> +/* { dg-options "-fdump-tree-original" } */
> +
> +void __attribute__ ((noinline, noclone))
> +test1 (int x)
> +{
> +  if ((2 << x) == 1
> +      || (8 << x) == 1
> +      || (8 << x) == 2
> +      || (3072 << x) == 3
> +      || (294912 << x) == 9
> +      || (45056 << x) == 11
> +      || (2176 << x) == 17)
> +    __builtin_abort ();
> +}
> +
> +void __attribute__ ((noinline, noclone))
> +test2 (int x)
> +{
> +  if ((2 << x) != 1
> +      && (8 << x) != 1
> +      && (8 << x) != 2
> +      && (3072 << x) != 3
> +      && (294912 << x) != 9
> +      && (45056 << x) != 11
> +      && (2176 << x) != 17)
> +    ;
> +  else
> +    __builtin_abort ();
> +}
> +
> +void __attribute__ ((noinline, noclone))
> +test3 (int x)
> +{
> +  if ((3 << x) == 4
> +      || (1 << x) == 12
> +      || (40 << x) == 1024
> +      || (2 << x) == 84
> +      || (3 << x) == 16384
> +      || (10 << x) == 6144)
> +    __builtin_abort ();
> +}
> +
> +void __attribute__ ((noinline, noclone))
> +test4 (int x)
> +{
> +  if ((3 << x) != 4
> +      && (1 << x) != 12
> +      && (40 << x) != 1024
> +      && (2 << x) != 84
> +      && (3 << x) != 16384
> +      && (10 << x) != 6144)
> +    ;
> +  else
> +    __builtin_abort ();
> +}
> +
> +int
> +main (void)
> +{
> +  test1 (0);
> +  test2 (1);
> +  test3 (1);
> +  test4 (2);
> +}
> +
> +/* { dg-final { scan-tree-dump-not "(<<|==|!=)" "original" } } */
> 
> 	Marek
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu, Graham Norton, HRB 21284 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2015-06-09 14:06 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-28 12:33 [PATCH] Optimize (CST1 << A) == CST2 (PR tree-optimization/66299) Marek Polacek
2015-05-28 13:10 ` Jakub Jelinek
2015-05-28 20:33   ` Marc Glisse
2015-06-08 15:12     ` Marek Polacek
2015-06-08 17:14       ` Marc Glisse
2015-06-09  7:56         ` Richard Biener
2015-06-09 11:46           ` Marc Glisse
2015-06-09 11:49             ` Richard Biener
2015-06-09 11:57               ` Richard Biener
2015-06-09 12:13                 ` Marc Glisse
2015-06-09 12:22                   ` Richard Biener
2015-06-09 13:46           ` Marek Polacek
2015-06-09 14:11             ` Richard Biener
2015-05-28 13:16 ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).