RFC: Two minor optimization patterns

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* RFC: Two minor optimization patterns
@ 2015-01-13 22:47 Rasmus Villemoes
  2015-01-13 22:56 ` Andrew Pinski
  2015-01-14 13:14 ` RFC: Two minor optimization patterns Marc Glisse
  0 siblings, 2 replies; 24+ messages in thread
From: Rasmus Villemoes @ 2015-01-13 22:47 UTC (permalink / raw)
  To: gcc-patches

[My first attempt at submitting a patch for gcc, so please forgive me
if I'm not following the right protocol.]

Sometimes rounding a variable to the next even integer is written x += x
& 1. This usually means using an extra register (and hence at least an
extra mov instruction) compared to the equivalent x = (x + 1) & ~1. The
first pattern below tries to do this transformation.

While playing with various ways of rounding down, I noticed that gcc
already optimizes all of x-(x&3), x^(x&3) and x&~(x&3) to simply
x&~3. In fact, x&~(x&y) is rewritten as x&~y. However, the dual of this
is not handled, so I included the second pattern below.

I've tested the below in the sense that gcc compiles and that trivial
test cases get compiled as expected.

Rasmus

diff --git gcc/match.pd gcc/match.pd
index 81c4ee6..04a0bc4 100644
--- gcc/match.pd
+++ gcc/match.pd
@@ -262,6 +262,16 @@ along with GCC; see the file COPYING3.  If not see
  (abs tree_expr_nonnegative_p@0)
  @0)

+/* x + (x & 1) -> (x + 1) & ~1 */
+(simplify
+ (plus @0 (bit_and @0 integer_onep@1))
+ (bit_and (plus @0 @1) (bit_not @1)))
+
+/* x | ~(x | y) -> x | ~y */
+(simplify
+ (bit_ior @0 (bit_not (bit_ior @0 @1)))
+ (bit_ior @0 (bit_not @1)))
+

 /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST))
    when profitable.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: RFC: Two minor optimization patterns
  2015-01-13 22:47 RFC: Two minor optimization patterns Rasmus Villemoes
@ 2015-01-13 22:56 ` Andrew Pinski
  2015-01-14  9:52   ` Richard Biener
  2015-01-14 13:14 ` RFC: Two minor optimization patterns Marc Glisse
  1 sibling, 1 reply; 24+ messages in thread
From: Andrew Pinski @ 2015-01-13 22:56 UTC (permalink / raw)
  To: Rasmus Villemoes; +Cc: GCC Patches

On Tue, Jan 13, 2015 at 2:41 PM, Rasmus Villemoes <rv@rasmusvillemoes.dk> wrote:
> [My first attempt at submitting a patch for gcc, so please forgive me
> if I'm not following the right protocol.]

There are a few things missing.  For one, a testcase or two for the
added optimizations.

>
> Sometimes rounding a variable to the next even integer is written x += x
> & 1. This usually means using an extra register (and hence at least an
> extra mov instruction) compared to the equivalent x = (x + 1) & ~1. The
> first pattern below tries to do this transformation.
>
> While playing with various ways of rounding down, I noticed that gcc
> already optimizes all of x-(x&3), x^(x&3) and x&~(x&3) to simply
> x&~3. In fact, x&~(x&y) is rewritten as x&~y. However, the dual of this
> is not handled, so I included the second pattern below.
>
> I've tested the below in the sense that gcc compiles and that trivial
> test cases get compiled as expected.

The other thing you missed is a changelog entry for the change you did.
Also you mentioned you tested the patch below but did not mention
which target you tested it on and you should run the full GCC
testsuite.
https://gcc.gnu.org/contribute.html is a good page to start with how
to handle most of the items above.
https://gcc.gnu.org/wiki/HowToPrepareATestcase is a good page on how
to write the testcase for testing the added optimization.

Thanks,
Andrew Pinski

>
> Rasmus
>
>
>
> diff --git gcc/match.pd gcc/match.pd
> index 81c4ee6..04a0bc4 100644
> --- gcc/match.pd
> +++ gcc/match.pd
> @@ -262,6 +262,16 @@ along with GCC; see the file COPYING3.  If not see
>   (abs tree_expr_nonnegative_p@0)
>   @0)
>
> +/* x + (x & 1) -> (x + 1) & ~1 */
> +(simplify
> + (plus @0 (bit_and @0 integer_onep@1))
> + (bit_and (plus @0 @1) (bit_not @1)))
> +
> +/* x | ~(x | y) -> x | ~y */
> +(simplify
> + (bit_ior @0 (bit_not (bit_ior @0 @1)))
> + (bit_ior @0 (bit_not @1)))
> +
>
>  /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST))
>     when profitable.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: RFC: Two minor optimization patterns
  2015-01-13 22:56 ` Andrew Pinski
@ 2015-01-14  9:52   ` Richard Biener
  2015-01-14 12:45     ` Rasmus Villemoes
  0 siblings, 1 reply; 24+ messages in thread
From: Richard Biener @ 2015-01-14  9:52 UTC (permalink / raw)
  To: Andrew Pinski; +Cc: Rasmus Villemoes, GCC Patches

On Tue, Jan 13, 2015 at 11:47 PM, Andrew Pinski <pinskia@gmail.com> wrote:
> On Tue, Jan 13, 2015 at 2:41 PM, Rasmus Villemoes <rv@rasmusvillemoes.dk> wrote:
>> [My first attempt at submitting a patch for gcc, so please forgive me
>> if I'm not following the right protocol.]
>
> There are a few things missing.  For one, a testcase or two for the
> added optimizations.
>
>>
>> Sometimes rounding a variable to the next even integer is written x += x
>> & 1. This usually means using an extra register (and hence at least an
>> extra mov instruction) compared to the equivalent x = (x + 1) & ~1. The
>> first pattern below tries to do this transformation.
>>
>> While playing with various ways of rounding down, I noticed that gcc
>> already optimizes all of x-(x&3), x^(x&3) and x&~(x&3) to simply
>> x&~3.

Does it also handle x+(x&3)?  Where does it handle x - (x&3)?

That is, doesn't the pattern also work for constants other than 1?

Please put it before the abs simplifications after the last one handing
bit_and/bit_ior.

Thanks,
Richard.

> In fact, x&~(x&y) is rewritten as x&~y. However, the dual of this
>> is not handled, so I included the second pattern below.
>>
>> I've tested the below in the sense that gcc compiles and that trivial
>> test cases get compiled as expected.
>
> The other thing you missed is a changelog entry for the change you did.
> Also you mentioned you tested the patch below but did not mention
> which target you tested it on and you should run the full GCC
> testsuite.
> https://gcc.gnu.org/contribute.html is a good page to start with how
> to handle most of the items above.
> https://gcc.gnu.org/wiki/HowToPrepareATestcase is a good page on how
> to write the testcase for testing the added optimization.
>
> Thanks,
> Andrew Pinski
>
>>
>> Rasmus
>>
>>
>>
>> diff --git gcc/match.pd gcc/match.pd
>> index 81c4ee6..04a0bc4 100644
>> --- gcc/match.pd
>> +++ gcc/match.pd
>> @@ -262,6 +262,16 @@ along with GCC; see the file COPYING3.  If not see
>>   (abs tree_expr_nonnegative_p@0)
>>   @0)
>>
>> +/* x + (x & 1) -> (x + 1) & ~1 */
>> +(simplify
>> + (plus @0 (bit_and @0 integer_onep@1))
>> + (bit_and (plus @0 @1) (bit_not @1)))
>> +
>> +/* x | ~(x | y) -> x | ~y */
>> +(simplify
>> + (bit_ior @0 (bit_not (bit_ior @0 @1)))
>> + (bit_ior @0 (bit_not @1)))
>> +
>>
>>  /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST))
>>     when profitable.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: RFC: Two minor optimization patterns
  2015-01-14  9:52   ` Richard Biener
@ 2015-01-14 12:45     ` Rasmus Villemoes
  2015-01-14 14:01       ` Richard Biener
  0 siblings, 1 reply; 24+ messages in thread
From: Rasmus Villemoes @ 2015-01-14 12:45 UTC (permalink / raw)
  To: Richard Biener; +Cc: Andrew Pinski, GCC Patches

On Wed, Jan 14 2015, Richard Biener <richard.guenther@gmail.com> wrote:

> On Tue, Jan 13, 2015 at 11:47 PM, Andrew Pinski <pinskia@gmail.com> wrote:
>> On Tue, Jan 13, 2015 at 2:41 PM, Rasmus Villemoes <rv@rasmusvillemoes.dk> wrote:
>>> [My first attempt at submitting a patch for gcc, so please forgive me
>>> if I'm not following the right protocol.]
>>
>> There are a few things missing.  For one, a testcase or two for the
>> added optimizations.

I'll see what I can come up with. Thanks for the pointers.

>>> Sometimes rounding a variable to the next even integer is written x += x
>>> & 1. This usually means using an extra register (and hence at least an
>>> extra mov instruction) compared to the equivalent x = (x + 1) & ~1. The
>>> first pattern below tries to do this transformation.
>>>
>>> While playing with various ways of rounding down, I noticed that gcc
>>> already optimizes all of x-(x&3), x^(x&3) and x&~(x&3) to simply
>>> x&~3.
>
> Does it also handle x+(x&3)?

I'm not sure what 'it' refers to, and I'm also not sure how you think
x+(x&3) could be rewritten.

> Where does it handle x - (x&3)?

If by 'it' you mean gcc, I tried looking for a pattern matching this,
but couldn't find it, so I don't know where it is handled. I can just
see by running gcc-5.0 -fdump-tree-original -O2 -c opt.c that "x - (x &
3)" is rewritten as x & -4 (which is of course the same as x & ~3). Btw,
I now see that neither x&~(x&3) or x&~(x&y) are rewritten that early,
but objdump -d shows that the end result is the same.

> That is, doesn't the pattern also work for constants other than 1?

Here I assume that 'the pattern' refers to the first pattern, and the
answer is 'not immediately'. To round up a number to the next multiple
of 2^k we need to add the negative of that number modulo 2^k. It just so
happens that for k=1 we have x==-x for both possible values of x. So
with a little tweak, this does in fact lead to an optimization
opportunity, namely x + ((-x) & m) -> (x + m) & ~m whenever m is one
less than a power of 2. I don't know how to check for m satisfying this
in the match.pd language.

> Please put it before the abs simplifications after the last one handing
> bit_and/bit_ior.

OK, will do.

Rasmus

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: RFC: Two minor optimization patterns
  2015-01-13 22:47 RFC: Two minor optimization patterns Rasmus Villemoes
  2015-01-13 22:56 ` Andrew Pinski
@ 2015-01-14 13:14 ` Marc Glisse
  2015-01-14 13:58   ` Richard Biener
  1 sibling, 1 reply; 24+ messages in thread
From: Marc Glisse @ 2015-01-14 13:14 UTC (permalink / raw)
  To: Rasmus Villemoes; +Cc: gcc-patches

On Tue, 13 Jan 2015, Rasmus Villemoes wrote:

> diff --git gcc/match.pd gcc/match.pd
> index 81c4ee6..04a0bc4 100644
> --- gcc/match.pd
> +++ gcc/match.pd
> @@ -262,6 +262,16 @@ along with GCC; see the file COPYING3.  If not see
>  (abs tree_expr_nonnegative_p@0)
>  @0)
>
> +/* x + (x & 1) -> (x + 1) & ~1 */
> +(simplify
> + (plus @0 (bit_and @0 integer_onep@1))
> + (bit_and (plus @0 @1) (bit_not @1)))
> +
> +/* x | ~(x | y) -> x | ~y */
> +(simplify
> + (bit_ior @0 (bit_not (bit_ior @0 @1)))
> + (bit_ior @0 (bit_not @1)))

You may want to consider using has_single_use (see other patterns).

-- 
Marc Glisse

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: RFC: Two minor optimization patterns
  2015-01-14 13:14 ` RFC: Two minor optimization patterns Marc Glisse
@ 2015-01-14 13:58   ` Richard Biener
  2015-01-14 14:31     ` Marc Glisse
  0 siblings, 1 reply; 24+ messages in thread
From: Richard Biener @ 2015-01-14 13:58 UTC (permalink / raw)
  To: GCC Patches; +Cc: Rasmus Villemoes

On Wed, Jan 14, 2015 at 1:45 PM, Marc Glisse <marc.glisse@inria.fr> wrote:
> On Tue, 13 Jan 2015, Rasmus Villemoes wrote:
>
>> diff --git gcc/match.pd gcc/match.pd
>> index 81c4ee6..04a0bc4 100644
>> --- gcc/match.pd
>> +++ gcc/match.pd
>> @@ -262,6 +262,16 @@ along with GCC; see the file COPYING3.  If not see
>>  (abs tree_expr_nonnegative_p@0)
>>  @0)
>>
>> +/* x + (x & 1) -> (x + 1) & ~1 */
>> +(simplify
>> + (plus @0 (bit_and @0 integer_onep@1))
>> + (bit_and (plus @0 @1) (bit_not @1)))
>> +
>> +/* x | ~(x | y) -> x | ~y */
>> +(simplify
>> + (bit_ior @0 (bit_not (bit_ior @0 @1)))
>> + (bit_ior @0 (bit_not @1)))
>
>
> You may want to consider using has_single_use (see other patterns).

Just to clarify, you mean on the (x | y) sub-expression?

Richard.

> --
> Marc Glisse

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: RFC: Two minor optimization patterns
  2015-01-14 12:45     ` Rasmus Villemoes
@ 2015-01-14 14:01       ` Richard Biener
  2015-01-21 10:50         ` [PATCH 0/4] A few " Rasmus Villemoes
  0 siblings, 1 reply; 24+ messages in thread
From: Richard Biener @ 2015-01-14 14:01 UTC (permalink / raw)
  To: Rasmus Villemoes; +Cc: Andrew Pinski, GCC Patches

On Wed, Jan 14, 2015 at 1:23 PM, Rasmus Villemoes <rv@rasmusvillemoes.dk> wrote:
> On Wed, Jan 14 2015, Richard Biener <richard.guenther@gmail.com> wrote:
>
>> On Tue, Jan 13, 2015 at 11:47 PM, Andrew Pinski <pinskia@gmail.com> wrote:
>>> On Tue, Jan 13, 2015 at 2:41 PM, Rasmus Villemoes <rv@rasmusvillemoes.dk> wrote:
>>>> [My first attempt at submitting a patch for gcc, so please forgive me
>>>> if I'm not following the right protocol.]
>>>
>>> There are a few things missing.  For one, a testcase or two for the
>>> added optimizations.
>
> I'll see what I can come up with. Thanks for the pointers.
>
>>>> Sometimes rounding a variable to the next even integer is written x += x
>>>> & 1. This usually means using an extra register (and hence at least an
>>>> extra mov instruction) compared to the equivalent x = (x + 1) & ~1. The
>>>> first pattern below tries to do this transformation.
>>>>
>>>> While playing with various ways of rounding down, I noticed that gcc
>>>> already optimizes all of x-(x&3), x^(x&3) and x&~(x&3) to simply
>>>> x&~3.
>>
>> Does it also handle x+(x&3)?
>
> I'm not sure what 'it' refers to, and I'm also not sure how you think
> x+(x&3) could be rewritten.

I was just guessing.

>> Where does it handle x - (x&3)?
>
> If by 'it' you mean gcc, I tried looking for a pattern matching this,
> but couldn't find it, so I don't know where it is handled. I can just
> see by running gcc-5.0 -fdump-tree-original -O2 -c opt.c that "x - (x &
> 3)" is rewritten as x & -4 (which is of course the same as x & ~3).

That's done in fold-const.c:fold_binary_loc here:

          /* Fold A - (A & B) into ~B & A.  */
          if (!TREE_SIDE_EFFECTS (arg0)
              && TREE_CODE (arg1) == BIT_AND_EXPR)
            {
...

(note that patterns are not fully moved to match.pd yet)

> Btw,
> I now see that neither x&~(x&3) or x&~(x&y) are rewritten that early,
> but objdump -d shows that the end result is the same.
>
>> That is, doesn't the pattern also work for constants other than 1?
>
> Here I assume that 'the pattern' refers to the first pattern, and the
> answer is 'not immediately'. To round up a number to the next multiple
> of 2^k we need to add the negative of that number modulo 2^k. It just so
> happens that for k=1 we have x==-x for both possible values of x. So
> with a little tweak, this does in fact lead to an optimization
> opportunity, namely x + ((-x) & m) -> (x + m) & ~m whenever m is one
> less than a power of 2. I don't know how to check for m satisfying this
> in the match.pd language.

you'd need to write some C code involving trees in a if/with.  We do
have a integer_pow2p predicate but not a integer_one_less_than_pow2p
one.

>
>> Please put it before the abs simplifications after the last one handing
>> bit_and/bit_ior.
>
> OK, will do.

Thanks,
Richard.

> Rasmus

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: RFC: Two minor optimization patterns
  2015-01-14 13:58   ` Richard Biener
@ 2015-01-14 14:31     ` Marc Glisse
  2015-01-14 14:49       ` Richard Biener
  0 siblings, 1 reply; 24+ messages in thread
From: Marc Glisse @ 2015-01-14 14:31 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches, Rasmus Villemoes

On Wed, 14 Jan 2015, Richard Biener wrote:

> On Wed, Jan 14, 2015 at 1:45 PM, Marc Glisse <marc.glisse@inria.fr> wrote:
>> On Tue, 13 Jan 2015, Rasmus Villemoes wrote:
>>
>>> diff --git gcc/match.pd gcc/match.pd
>>> index 81c4ee6..04a0bc4 100644
>>> --- gcc/match.pd
>>> +++ gcc/match.pd
>>> @@ -262,6 +262,16 @@ along with GCC; see the file COPYING3.  If not see
>>>  (abs tree_expr_nonnegative_p@0)
>>>  @0)
>>>
>>> +/* x + (x & 1) -> (x + 1) & ~1 */
>>> +(simplify
>>> + (plus @0 (bit_and @0 integer_onep@1))
>>> + (bit_and (plus @0 @1) (bit_not @1)))
>>> +
>>> +/* x | ~(x | y) -> x | ~y */
>>> +(simplify
>>> + (bit_ior @0 (bit_not (bit_ior @0 @1)))
>>> + (bit_ior @0 (bit_not @1)))
>>
>>
>> You may want to consider using has_single_use (see other patterns).
>
> Just to clarify, you mean on the (x | y) sub-expression?

I meant on (x & 1) for the first pattern and on ~(x | y) for the second 
one. That is, cases where the transformation could actually increase the 
number of statements (we don't have a convenient interface to check if 
(x+1) or ~y are already available).

(x | y) could sometimes be cheaper than y if it is already computed and it 
shortens the lifetime of y, but we are way too early in the pipeline to 
make an informed decision about that, so it seems better to canonicalize.

Now that I think about it, some platforms probably have an instruction 
ornot, so my reasons above could be wrong...

-- 
Marc Glisse

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: RFC: Two minor optimization patterns
  2015-01-14 14:31     ` Marc Glisse
@ 2015-01-14 14:49       ` Richard Biener
  0 siblings, 0 replies; 24+ messages in thread
From: Richard Biener @ 2015-01-14 14:49 UTC (permalink / raw)
  To: GCC Patches; +Cc: Rasmus Villemoes

On Wed, Jan 14, 2015 at 3:26 PM, Marc Glisse <marc.glisse@inria.fr> wrote:
> On Wed, 14 Jan 2015, Richard Biener wrote:
>
>> On Wed, Jan 14, 2015 at 1:45 PM, Marc Glisse <marc.glisse@inria.fr> wrote:
>>>
>>> On Tue, 13 Jan 2015, Rasmus Villemoes wrote:
>>>
>>>> diff --git gcc/match.pd gcc/match.pd
>>>> index 81c4ee6..04a0bc4 100644
>>>> --- gcc/match.pd
>>>> +++ gcc/match.pd
>>>> @@ -262,6 +262,16 @@ along with GCC; see the file COPYING3.  If not see
>>>>  (abs tree_expr_nonnegative_p@0)
>>>>  @0)
>>>>
>>>> +/* x + (x & 1) -> (x + 1) & ~1 */
>>>> +(simplify
>>>> + (plus @0 (bit_and @0 integer_onep@1))
>>>> + (bit_and (plus @0 @1) (bit_not @1)))
>>>> +
>>>> +/* x | ~(x | y) -> x | ~y */
>>>> +(simplify
>>>> + (bit_ior @0 (bit_not (bit_ior @0 @1)))
>>>> + (bit_ior @0 (bit_not @1)))
>>>
>>>
>>>
>>> You may want to consider using has_single_use (see other patterns).
>>
>>
>> Just to clarify, you mean on the (x | y) sub-expression?
>
>
> I meant on (x & 1) for the first pattern and on ~(x | y) for the second one.
> That is, cases where the transformation could actually increase the number
> of statements (we don't have a convenient interface to check if (x+1) or ~y
> are already available).
>
> (x | y) could sometimes be cheaper than y if it is already computed and it
> shortens the lifetime of y, but we are way too early in the pipeline to make
> an informed decision about that, so it seems better to canonicalize.

Yeah.  Note that I've not yet settled myself on how to attack the
"single-use issue" generally.  For example I'd like to do code
generation for a specific simplification variant for value-numbering which
can check the availability of expressions - here explicit has_single_use
calls wil be prohibitive.  It should be the responsibility of the caller
to apply restrictions like this (but for the tree-ssa-forwprop.c case I've
not yet come up with a reasonable way to tame things down...)

Richard.

> Now that I think about it, some platforms probably have an instruction
> ornot, so my reasons above could be wrong...



> --
> Marc Glisse

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 1/4] match.pd: Add x + (x & 1) -> (x + 1) & ~1 pattern
  2015-01-21 10:50         ` [PATCH 0/4] A few " Rasmus Villemoes
@ 2015-01-21 10:50           ` Rasmus Villemoes
  2015-04-30  9:34             ` Richard Biener
  2015-05-01 18:26             ` Jeff Law
  2015-01-21 10:55           ` [PATCH 2/4] match.pd: Add x & ~(x & y) -> x & ~y pattern Rasmus Villemoes
                             ` (2 subsequent siblings)
  3 siblings, 2 replies; 24+ messages in thread
From: Rasmus Villemoes @ 2015-01-21 10:50 UTC (permalink / raw)
  To: GCC Patches; +Cc: Richard Biener, Andrew Pinski, Rasmus Villemoes

gcc.dg/20150120-1.c: New test

Rounding an integer to the next even integer is sometimes written x +=
x & 1. The equivalent x = (x+1)&~1 usually uses one less register, and
in practical cases only the new value of x will be used (making it
unlikely that the subexpression x&1 has any uses).

Signed-off-by: Rasmus Villemoes <rv@rasmusvillemoes.dk>
---
 gcc/match.pd                      |  6 +++++
 gcc/testsuite/gcc.dg/20150120-1.c | 51 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 57 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/20150120-1.c

diff --git gcc/match.pd gcc/match.pd
index 81c4ee6..ecefcfb 100644
--- gcc/match.pd
+++ gcc/match.pd
@@ -255,6 +255,12 @@ along with GCC; see the file COPYING3.  If not see
   (bitop @0 @0)
   (non_lvalue @0)))
 
+/* x + (x & 1) -> (x + 1) & ~1 */
+(simplify
+ (plus:c @0 (bit_and@2 @0 integer_onep@1))
+ (if (TREE_CODE (@2) != SSA_NAME || has_single_use (@2))
+  (bit_and (plus @0 @1) (bit_not @1))))
+
 (simplify
  (abs (negate @0))
  (abs @0))
diff --git gcc/testsuite/gcc.dg/20150120-1.c gcc/testsuite/gcc.dg/20150120-1.c
new file mode 100644
index 0000000..18906c4
--- /dev/null
+++ gcc/testsuite/gcc.dg/20150120-1.c
@@ -0,0 +1,51 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-original" } */
+
+/* x + (x & 1) -> (x + 1) & ~1 */
+int
+fn1 (int x)
+{
+	return x + (x & 1);
+}
+int
+fn2 (int x)
+{
+	return (x & 1) + x;
+}
+int
+fn3 (int x)
+{
+	return x + (1 & x);
+}
+int
+fn4 (int x)
+{
+	return (1 & x) + x;
+}
+unsigned int
+fn5 (unsigned int x)
+{
+	return x + (x & 1);
+}
+unsigned int
+fn6 (unsigned int x)
+{
+	return (x & 1) + x;
+}
+unsigned int
+fn7 (unsigned int x)
+{
+	return x + (x % 2);
+}
+unsigned int
+fn8 (unsigned int x)
+{
+	return (x % 2) + x;
+}
+unsigned int
+fn9 (unsigned int x)
+{
+	return (1LL & x) + x;
+}
+
+/* { dg-final { scan-tree-dump-times "x \\+ 1" 9 "original" } } */
-- 
2.1.3

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 0/4] A few optimization patterns
  2015-01-14 14:01       ` Richard Biener
@ 2015-01-21 10:50         ` Rasmus Villemoes
  2015-01-21 10:50           ` [PATCH 1/4] match.pd: Add x + (x & 1) -> (x + 1) & ~1 pattern Rasmus Villemoes
                             ` (3 more replies)
  0 siblings, 4 replies; 24+ messages in thread
From: Rasmus Villemoes @ 2015-01-21 10:50 UTC (permalink / raw)
  To: GCC Patches; +Cc: Richard Biener, Andrew Pinski, Rasmus Villemoes

This adds four optimization patterns to match.pd, along with trivial
test cases. Compiles and works as expected, and 'make check' on x86_64
gives the same number of "unexpected failures" before and after (8
from gfortran.dg/guality/pr41558.f90, 1 from failing to compile
gcc.dg/plugin/ggcplug.c).

I know almost nothing about the internals of gcc, so 4/4 may very well
be considered ugly - it was what I could stitch together from pieces I
picked up here and there.

Rasmus Villemoes (4):
  match.pd: Add x + (x & 1) -> (x + 1) & ~1 pattern
  match.pd: Add x & ~(x & y) -> x & ~y pattern
  match.pd: Add x | ~(x | y) -> x | ~y pattern
  match.pd: Add x + ((-x) & m) -> (x + m) & ~m pattern

 gcc/match.pd                      | 27 ++++++++++++++++++
 gcc/testsuite/gcc.dg/20150120-1.c | 51 +++++++++++++++++++++++++++++++++
 gcc/testsuite/gcc.dg/20150120-2.c | 32 +++++++++++++++++++++
 gcc/testsuite/gcc.dg/20150120-3.c | 32 +++++++++++++++++++++
 gcc/testsuite/gcc.dg/20150120-4.c | 59 +++++++++++++++++++++++++++++++++++++++
 5 files changed, 201 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/20150120-1.c
 create mode 100644 gcc/testsuite/gcc.dg/20150120-2.c
 create mode 100644 gcc/testsuite/gcc.dg/20150120-3.c
 create mode 100644 gcc/testsuite/gcc.dg/20150120-4.c

-- 
2.1.3

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 2/4] match.pd: Add x & ~(x & y) -> x & ~y pattern
  2015-01-21 10:50         ` [PATCH 0/4] A few " Rasmus Villemoes
  2015-01-21 10:50           ` [PATCH 1/4] match.pd: Add x + (x & 1) -> (x + 1) & ~1 pattern Rasmus Villemoes
@ 2015-01-21 10:55           ` Rasmus Villemoes
  2015-05-01 18:29             ` Jeff Law
  2015-01-21 10:58           ` [PATCH 3/4] match.pd: Add x | ~(x | y) -> x | " Rasmus Villemoes
  2015-01-21 11:17           ` [PATCH 4/4] match.pd: Add x + ((-x) & m) -> (x + m) & ~m pattern Rasmus Villemoes
  3 siblings, 1 reply; 24+ messages in thread
From: Rasmus Villemoes @ 2015-01-21 10:55 UTC (permalink / raw)
  To: GCC Patches; +Cc: Richard Biener, Andrew Pinski, Rasmus Villemoes

gcc.dg/20150120-2.c: New test

Clearing a certain subset of bits, for example to round down x to a
multiple of a power of 2, is sometimes written x & ~(x & y), where y
may or may not be a constant. It is shorter to use x & ~y,
particularly when y is a constant. gcc already does this when it is
spelled x - (x & y) or x ^ (x & y).

Signed-off-by: Rasmus Villemoes <rv@rasmusvillemoes.dk>
---
 gcc/match.pd                      |  6 ++++++
 gcc/testsuite/gcc.dg/20150120-2.c | 32 ++++++++++++++++++++++++++++++++
 2 files changed, 38 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/20150120-2.c

diff --git gcc/match.pd gcc/match.pd
index ecefcfb..d25fc3a 100644
--- gcc/match.pd
+++ gcc/match.pd
@@ -261,6 +261,12 @@ along with GCC; see the file COPYING3.  If not see
  (if (TREE_CODE (@2) != SSA_NAME || has_single_use (@2))
   (bit_and (plus @0 @1) (bit_not @1))))
 
+/* x & ~(x & y) -> x & ~y */
+(simplify
+ (bit_and:c @0 (bit_not (bit_and:c@2 @0 @1)))
+ (if (TREE_CODE (@2) != SSA_NAME || has_single_use (@2))
+  (bit_and @0 (bit_not @1))))
+
 (simplify
  (abs (negate @0))
  (abs @0))
diff --git gcc/testsuite/gcc.dg/20150120-2.c gcc/testsuite/gcc.dg/20150120-2.c
new file mode 100644
index 0000000..976b654
--- /dev/null
+++ gcc/testsuite/gcc.dg/20150120-2.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-original" } */
+
+/* x & ~(x & y) -> x & ~y */
+int fn1 (int x, int y)
+{
+	return x & ~(x & y);
+}
+int fn2 (int x, int y)
+{
+	return ~(x & y) & x;
+}
+int fn3 (int x, int y)
+{
+	return x & ~(y & x);
+}
+int fn4 (int x, int y)
+{
+	return ~(y & x) & x;
+}
+int fn5 (int z)
+{
+	return z & ~(z & 3);
+}
+int fn6 (int z)
+{
+	return ~(z & 3) & z;
+}
+
+
+/* { dg-final { scan-tree-dump-times "~y & x" 4 "original" } } */
+/* { dg-final { scan-tree-dump-times "z & -4" 2 "original" } } */
-- 
2.1.3

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 3/4] match.pd: Add x | ~(x | y) -> x | ~y pattern
  2015-01-21 10:50         ` [PATCH 0/4] A few " Rasmus Villemoes
  2015-01-21 10:50           ` [PATCH 1/4] match.pd: Add x + (x & 1) -> (x + 1) & ~1 pattern Rasmus Villemoes
  2015-01-21 10:55           ` [PATCH 2/4] match.pd: Add x & ~(x & y) -> x & ~y pattern Rasmus Villemoes
@ 2015-01-21 10:58           ` Rasmus Villemoes
  2015-01-21 11:32             ` Marek Polacek
  2015-01-21 11:17           ` [PATCH 4/4] match.pd: Add x + ((-x) & m) -> (x + m) & ~m pattern Rasmus Villemoes
  3 siblings, 1 reply; 24+ messages in thread
From: Rasmus Villemoes @ 2015-01-21 10:58 UTC (permalink / raw)
  To: GCC Patches; +Cc: Richard Biener, Andrew Pinski, Rasmus Villemoes

gcc.dg/20150120-3.c: New test

This is simply the 'dual' of the previous pattern, added for
completeness.

Signed-off-by: Rasmus Villemoes <rv@rasmusvillemoes.dk>
---
 gcc/match.pd                      |  6 ++++++
 gcc/testsuite/gcc.dg/20150120-3.c | 32 ++++++++++++++++++++++++++++++++
 2 files changed, 38 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/20150120-3.c

diff --git gcc/match.pd gcc/match.pd
index d25fc3a..47865f1 100644
--- gcc/match.pd
+++ gcc/match.pd
@@ -267,6 +267,12 @@ along with GCC; see the file COPYING3.  If not see
  (if (TREE_CODE (@2) != SSA_NAME || has_single_use (@2))
   (bit_and @0 (bit_not @1))))
 
+/* x | ~(x | y) -> x | ~y */
+(simplify
+ (bit_ior:c @0 (bit_not (bit_ior:c@2 @0 @1)))
+ (if (TREE_CODE (@2) != SSA_NAME || has_single_use (@2))
+  (bit_ior @0 (bit_not @1))))
+
 (simplify
  (abs (negate @0))
  (abs @0))
diff --git gcc/testsuite/gcc.dg/20150120-3.c gcc/testsuite/gcc.dg/20150120-3.c
new file mode 100644
index 0000000..322556f
--- /dev/null
+++ gcc/testsuite/gcc.dg/20150120-3.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-original" } */
+
+/* x | ~(x | y) -> x | ~y */
+int fn1 (int x, int y)
+{
+	return x | ~(x | y);
+}
+int fn2 (int x, int y)
+{
+	return ~(x | y) | x;
+}
+int fn3 (int x, int y)
+{
+	return x | ~(y | x);
+}
+int fn4 (int x, int y)
+{
+	return ~(y | x) | x;
+}
+int fn5 (int z)
+{
+	return z | ~(z | 3);
+}
+int fn6 (int z)
+{
+	return ~(z | 3) | z;
+}
+
+
+/* { dg-final { scan-tree-dump-times "~y \\| x" 4 "original" } } */
+/* { dg-final { scan-tree-dump-times "z \\| -4" 2 "original" } } */
-- 
2.1.3

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 4/4] match.pd: Add x + ((-x) & m) -> (x + m) & ~m pattern
  2015-01-21 10:50         ` [PATCH 0/4] A few " Rasmus Villemoes
                             ` (2 preceding siblings ...)
  2015-01-21 10:58           ` [PATCH 3/4] match.pd: Add x | ~(x | y) -> x | " Rasmus Villemoes
@ 2015-01-21 11:17           ` Rasmus Villemoes
  2015-04-30  9:42             ` Richard Biener
  3 siblings, 1 reply; 24+ messages in thread
From: Rasmus Villemoes @ 2015-01-21 11:17 UTC (permalink / raw)
  To: GCC Patches; +Cc: Richard Biener, Andrew Pinski, Rasmus Villemoes

Generalizing the x+(x&1) pattern, one can round up x to a multiple of
a 2^k by adding the negative of x modulo 2^k. But it is fewer
instructions, and presumably requires fewer registers, to do the more
common (x+m)&~m where m=2^k-1.

Signed-off-by: Rasmus Villemoes <rv@rasmusvillemoes.dk>
---
 gcc/match.pd                      |  9 ++++++
 gcc/testsuite/gcc.dg/20150120-4.c | 59 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 68 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/20150120-4.c

diff --git gcc/match.pd gcc/match.pd
index 47865f1..93c2298 100644
--- gcc/match.pd
+++ gcc/match.pd
@@ -273,6 +273,15 @@ along with GCC; see the file COPYING3.  If not see
  (if (TREE_CODE (@2) != SSA_NAME || has_single_use (@2))
   (bit_ior @0 (bit_not @1))))
 
+/* x + ((-x) & m) -> (x + m) & ~m when m == 2^k-1.  */
+(simplify
+ (plus:c @0 (bit_and@2 (negate @0) CONSTANT_CLASS_P@1))
+ (with { tree cst = fold_binary (PLUS_EXPR, TREE_TYPE (@1),
+				 @1, build_one_cst (TREE_TYPE (@1))); }
+  (if ((TREE_CODE (@2) != SSA_NAME || has_single_use (@2))
+	&& cst && integer_pow2p (cst))
+   (bit_and (plus @0 @1) (bit_not @1)))))
+
 (simplify
  (abs (negate @0))
  (abs @0))
diff --git gcc/testsuite/gcc.dg/20150120-4.c gcc/testsuite/gcc.dg/20150120-4.c
new file mode 100644
index 0000000..c3552bf
--- /dev/null
+++ gcc/testsuite/gcc.dg/20150120-4.c
@@ -0,0 +1,59 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-original" } */
+
+/* x + ((-x) & m) -> (x + m) & ~m for m one less than a pow2.  */
+int
+fn1 (int x)
+{
+	return x + ((-x) & 7);
+}
+int
+fn2 (int x)
+{
+	return ((-x) & 7) + x;
+}
+unsigned int
+fn3 (unsigned int x)
+{
+	return x + ((-x) & 7);
+}
+unsigned int
+fn4 (unsigned int x)
+{
+	return ((-x) & 7) + x;
+}
+unsigned int
+fn5 (unsigned int x)
+{
+	return x + ((-x) % 8);
+}
+unsigned int
+fn6 (unsigned int x)
+{
+	return ((-x) % 8) + x;
+}
+int
+fn7 (int x)
+{
+	return x + ((-x) & 9);
+}
+int
+fn8 (int x)
+{
+	return ((-x) & 9) + x;
+}
+unsigned int
+fn9 (unsigned int x)
+{
+	return x + ((-x) & ~0U);
+}
+unsigned int
+fn10 (unsigned int x)
+{
+	return ((-x) & ~0U) + x;
+}
+
+
+/* { dg-final { scan-tree-dump-times "x \\+ 7" 6 "original" } } */
+/* { dg-final { scan-tree-dump-times "-x & 9" 2 "original" } } */
+/* { dg-final { scan-tree-dump-times "return 0" 2 "original" } } */
-- 
2.1.3

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/4] match.pd: Add x | ~(x | y) -> x | ~y pattern
  2015-01-21 10:58           ` [PATCH 3/4] match.pd: Add x | ~(x | y) -> x | " Rasmus Villemoes
@ 2015-01-21 11:32             ` Marek Polacek
  0 siblings, 0 replies; 24+ messages in thread
From: Marek Polacek @ 2015-01-21 11:32 UTC (permalink / raw)
  To: Rasmus Villemoes; +Cc: GCC Patches, Richard Biener, Andrew Pinski

On Wed, Jan 21, 2015 at 11:49:53AM +0100, Rasmus Villemoes wrote:
> gcc.dg/20150120-3.c: New test
> 
> This is simply the 'dual' of the previous pattern, added for
> completeness.

If this is dual, I think you could make use of
(for op (bit_ior bit_and)
 ...
and do the simplification in one hunk.

	Marek

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/4] match.pd: Add x + (x & 1) -> (x + 1) & ~1 pattern
  2015-01-21 10:50           ` [PATCH 1/4] match.pd: Add x + (x & 1) -> (x + 1) & ~1 pattern Rasmus Villemoes
@ 2015-04-30  9:34             ` Richard Biener
  2015-05-01 18:26             ` Jeff Law
  1 sibling, 0 replies; 24+ messages in thread
From: Richard Biener @ 2015-04-30  9:34 UTC (permalink / raw)
  To: Rasmus Villemoes; +Cc: GCC Patches, Andrew Pinski

On Wed, Jan 21, 2015 at 11:49 AM, Rasmus Villemoes
<rv@rasmusvillemoes.dk> wrote:
> gcc.dg/20150120-1.c: New test
>
> Rounding an integer to the next even integer is sometimes written x +=
> x & 1. The equivalent x = (x+1)&~1 usually uses one less register, and
> in practical cases only the new value of x will be used (making it
> unlikely that the subexpression x&1 has any uses).

Now that we are in stage1 again

You are missig a ChangeLog entry and fail to state how you tested the patch.

Otherwise the patch looks ok.

Thanks,
Richard.

> Signed-off-by: Rasmus Villemoes <rv@rasmusvillemoes.dk>
> ---
>  gcc/match.pd                      |  6 +++++
>  gcc/testsuite/gcc.dg/20150120-1.c | 51 +++++++++++++++++++++++++++++++++++++++
>  2 files changed, 57 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/20150120-1.c
>
> diff --git gcc/match.pd gcc/match.pd
> index 81c4ee6..ecefcfb 100644
> --- gcc/match.pd
> +++ gcc/match.pd
> @@ -255,6 +255,12 @@ along with GCC; see the file COPYING3.  If not see
>    (bitop @0 @0)
>    (non_lvalue @0)))
>
> +/* x + (x & 1) -> (x + 1) & ~1 */
> +(simplify
> + (plus:c @0 (bit_and@2 @0 integer_onep@1))
> + (if (TREE_CODE (@2) != SSA_NAME || has_single_use (@2))
> +  (bit_and (plus @0 @1) (bit_not @1))))
> +
>  (simplify
>   (abs (negate @0))
>   (abs @0))
> diff --git gcc/testsuite/gcc.dg/20150120-1.c gcc/testsuite/gcc.dg/20150120-1.c
> new file mode 100644
> index 0000000..18906c4
> --- /dev/null
> +++ gcc/testsuite/gcc.dg/20150120-1.c
> @@ -0,0 +1,51 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-original" } */
> +
> +/* x + (x & 1) -> (x + 1) & ~1 */
> +int
> +fn1 (int x)
> +{
> +       return x + (x & 1);
> +}
> +int
> +fn2 (int x)
> +{
> +       return (x & 1) + x;
> +}
> +int
> +fn3 (int x)
> +{
> +       return x + (1 & x);
> +}
> +int
> +fn4 (int x)
> +{
> +       return (1 & x) + x;
> +}
> +unsigned int
> +fn5 (unsigned int x)
> +{
> +       return x + (x & 1);
> +}
> +unsigned int
> +fn6 (unsigned int x)
> +{
> +       return (x & 1) + x;
> +}
> +unsigned int
> +fn7 (unsigned int x)
> +{
> +       return x + (x % 2);
> +}
> +unsigned int
> +fn8 (unsigned int x)
> +{
> +       return (x % 2) + x;
> +}
> +unsigned int
> +fn9 (unsigned int x)
> +{
> +       return (1LL & x) + x;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "x \\+ 1" 9 "original" } } */
> --
> 2.1.3
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 4/4] match.pd: Add x + ((-x) & m) -> (x + m) & ~m pattern
  2015-01-21 11:17           ` [PATCH 4/4] match.pd: Add x + ((-x) & m) -> (x + m) & ~m pattern Rasmus Villemoes
@ 2015-04-30  9:42             ` Richard Biener
  2015-04-30 11:56               ` Marc Glisse
  0 siblings, 1 reply; 24+ messages in thread
From: Richard Biener @ 2015-04-30  9:42 UTC (permalink / raw)
  To: Rasmus Villemoes; +Cc: GCC Patches, Andrew Pinski

On Wed, Jan 21, 2015 at 11:49 AM, Rasmus Villemoes
<rv@rasmusvillemoes.dk> wrote:
> Generalizing the x+(x&1) pattern, one can round up x to a multiple of
> a 2^k by adding the negative of x modulo 2^k. But it is fewer
> instructions, and presumably requires fewer registers, to do the more
> common (x+m)&~m where m=2^k-1.
>
> Signed-off-by: Rasmus Villemoes <rv@rasmusvillemoes.dk>
> ---
>  gcc/match.pd                      |  9 ++++++
>  gcc/testsuite/gcc.dg/20150120-4.c | 59 +++++++++++++++++++++++++++++++++++++++
>  2 files changed, 68 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/20150120-4.c
>
> diff --git gcc/match.pd gcc/match.pd
> index 47865f1..93c2298 100644
> --- gcc/match.pd
> +++ gcc/match.pd
> @@ -273,6 +273,15 @@ along with GCC; see the file COPYING3.  If not see
>   (if (TREE_CODE (@2) != SSA_NAME || has_single_use (@2))
>    (bit_ior @0 (bit_not @1))))
>
> +/* x + ((-x) & m) -> (x + m) & ~m when m == 2^k-1.  */
> +(simplify
> + (plus:c @0 (bit_and@2 (negate @0) CONSTANT_CLASS_P@1))

I think you want to restrict this to INTEGER_CST@1

> + (with { tree cst = fold_binary (PLUS_EXPR, TREE_TYPE (@1),
> +                                @1, build_one_cst (TREE_TYPE (@1))); }

We shouldn't dispatch to fold_binary in patterns.  int_const_binop would
be the appropriate function to use - but what happens for @1 == INT_MAX
where @1 + 1 overflows?  Similar, is this also valid for negative @1
and thus signed mask types?  IMHO we should check whether @1
is equal to wi::mask (TYPE_PRECISION (TREE_TYPE (@1)) - wi::clz (@1),
false, TYPE_PRECISION (TREE_TYPE (@1)).

As with the other patch a ChangeLog entry is missing as well as stating
how you tested the patch.

Thanks,
Richard.

> +  (if ((TREE_CODE (@2) != SSA_NAME || has_single_use (@2))
> +       && cst && integer_pow2p (cst))
> +   (bit_and (plus @0 @1) (bit_not @1)))))
> +
>  (simplify
>   (abs (negate @0))
>   (abs @0))
> diff --git gcc/testsuite/gcc.dg/20150120-4.c gcc/testsuite/gcc.dg/20150120-4.c
> new file mode 100644
> index 0000000..c3552bf
> --- /dev/null
> +++ gcc/testsuite/gcc.dg/20150120-4.c
> @@ -0,0 +1,59 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-original" } */
> +
> +/* x + ((-x) & m) -> (x + m) & ~m for m one less than a pow2.  */
> +int
> +fn1 (int x)
> +{
> +       return x + ((-x) & 7);
> +}
> +int
> +fn2 (int x)
> +{
> +       return ((-x) & 7) + x;
> +}
> +unsigned int
> +fn3 (unsigned int x)
> +{
> +       return x + ((-x) & 7);
> +}
> +unsigned int
> +fn4 (unsigned int x)
> +{
> +       return ((-x) & 7) + x;
> +}
> +unsigned int
> +fn5 (unsigned int x)
> +{
> +       return x + ((-x) % 8);
> +}
> +unsigned int
> +fn6 (unsigned int x)
> +{
> +       return ((-x) % 8) + x;
> +}
> +int
> +fn7 (int x)
> +{
> +       return x + ((-x) & 9);
> +}
> +int
> +fn8 (int x)
> +{
> +       return ((-x) & 9) + x;
> +}
> +unsigned int
> +fn9 (unsigned int x)
> +{
> +       return x + ((-x) & ~0U);
> +}
> +unsigned int
> +fn10 (unsigned int x)
> +{
> +       return ((-x) & ~0U) + x;
> +}
> +
> +
> +/* { dg-final { scan-tree-dump-times "x \\+ 7" 6 "original" } } */
> +/* { dg-final { scan-tree-dump-times "-x & 9" 2 "original" } } */
> +/* { dg-final { scan-tree-dump-times "return 0" 2 "original" } } */
> --
> 2.1.3
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 4/4] match.pd: Add x + ((-x) & m) -> (x + m) & ~m pattern
  2015-04-30  9:42             ` Richard Biener
@ 2015-04-30 11:56               ` Marc Glisse
  2015-04-30 12:25                 ` Richard Biener
  0 siblings, 1 reply; 24+ messages in thread
From: Marc Glisse @ 2015-04-30 11:56 UTC (permalink / raw)
  To: Richard Biener; +Cc: Rasmus Villemoes, GCC Patches, Andrew Pinski

On Thu, 30 Apr 2015, Richard Biener wrote:

> On Wed, Jan 21, 2015 at 11:49 AM, Rasmus Villemoes
> <rv@rasmusvillemoes.dk> wrote:
>> Generalizing the x+(x&1) pattern, one can round up x to a multiple of
>> a 2^k by adding the negative of x modulo 2^k. But it is fewer
>> instructions, and presumably requires fewer registers, to do the more
>> common (x+m)&~m where m=2^k-1.
>>
>> Signed-off-by: Rasmus Villemoes <rv@rasmusvillemoes.dk>
>> ---
>>  gcc/match.pd                      |  9 ++++++
>>  gcc/testsuite/gcc.dg/20150120-4.c | 59 +++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 68 insertions(+)
>>  create mode 100644 gcc/testsuite/gcc.dg/20150120-4.c
>>
>> diff --git gcc/match.pd gcc/match.pd
>> index 47865f1..93c2298 100644
>> --- gcc/match.pd
>> +++ gcc/match.pd
>> @@ -273,6 +273,15 @@ along with GCC; see the file COPYING3.  If not see
>>   (if (TREE_CODE (@2) != SSA_NAME || has_single_use (@2))
>>    (bit_ior @0 (bit_not @1))))
>>
>> +/* x + ((-x) & m) -> (x + m) & ~m when m == 2^k-1.  */
>> +(simplify
>> + (plus:c @0 (bit_and@2 (negate @0) CONSTANT_CLASS_P@1))
>
> I think you want to restrict this to INTEGER_CST@1

Is this only to make the following test easier (a good enough reason for 
me) or is there some fundamental reason why this transformation would be 
wrong for vectors?

>> + (with { tree cst = fold_binary (PLUS_EXPR, TREE_TYPE (@1),
>> +                                @1, build_one_cst (TREE_TYPE (@1))); }
>
> We shouldn't dispatch to fold_binary in patterns.  int_const_binop would
> be the appropriate function to use - but what happens for @1 == INT_MAX
> where @1 + 1 overflows?  Similar, is this also valid for negative @1
> and thus signed mask types?  IMHO we should check whether @1
> is equal to wi::mask (TYPE_PRECISION (TREE_TYPE (@1)) - wi::clz (@1),
> false, TYPE_PRECISION (TREE_TYPE (@1)).
>
> As with the other patch a ChangeLog entry is missing as well as stating
> how you tested the patch.
>
> Thanks,
> Richard.
>
>> +  (if ((TREE_CODE (@2) != SSA_NAME || has_single_use (@2))
>> +       && cst && integer_pow2p (cst))
>> +   (bit_and (plus @0 @1) (bit_not @1)))))

-- 
Marc Glisse

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 4/4] match.pd: Add x + ((-x) & m) -> (x + m) & ~m pattern
  2015-04-30 11:56               ` Marc Glisse
@ 2015-04-30 12:25                 ` Richard Biener
  0 siblings, 0 replies; 24+ messages in thread
From: Richard Biener @ 2015-04-30 12:25 UTC (permalink / raw)
  To: GCC Patches; +Cc: Rasmus Villemoes, Andrew Pinski

On Thu, Apr 30, 2015 at 1:44 PM, Marc Glisse <marc.glisse@inria.fr> wrote:
> On Thu, 30 Apr 2015, Richard Biener wrote:
>
>> On Wed, Jan 21, 2015 at 11:49 AM, Rasmus Villemoes
>> <rv@rasmusvillemoes.dk> wrote:
>>>
>>> Generalizing the x+(x&1) pattern, one can round up x to a multiple of
>>> a 2^k by adding the negative of x modulo 2^k. But it is fewer
>>> instructions, and presumably requires fewer registers, to do the more
>>> common (x+m)&~m where m=2^k-1.
>>>
>>> Signed-off-by: Rasmus Villemoes <rv@rasmusvillemoes.dk>
>>> ---
>>>  gcc/match.pd                      |  9 ++++++
>>>  gcc/testsuite/gcc.dg/20150120-4.c | 59
>>> +++++++++++++++++++++++++++++++++++++++
>>>  2 files changed, 68 insertions(+)
>>>  create mode 100644 gcc/testsuite/gcc.dg/20150120-4.c
>>>
>>> diff --git gcc/match.pd gcc/match.pd
>>> index 47865f1..93c2298 100644
>>> --- gcc/match.pd
>>> +++ gcc/match.pd
>>> @@ -273,6 +273,15 @@ along with GCC; see the file COPYING3.  If not see
>>>   (if (TREE_CODE (@2) != SSA_NAME || has_single_use (@2))
>>>    (bit_ior @0 (bit_not @1))))
>>>
>>> +/* x + ((-x) & m) -> (x + m) & ~m when m == 2^k-1.  */
>>> +(simplify
>>> + (plus:c @0 (bit_and@2 (negate @0) CONSTANT_CLASS_P@1))
>>
>>
>> I think you want to restrict this to INTEGER_CST@1
>
>
> Is this only to make the following test easier (a good enough reason for me)
> or is there some fundamental reason why this transformation would be wrong
> for vectors?

Good question - I suppose it also works for vectors (well, the predicates
don't).  for non-ingegers or complex ints we shouldn't arrive here as
we can't have bit_and for them.  for pointers we can't have plus on them.

So yes, it makes the following tests easier.  A TODO comment for vectors
might be appropriate (we'd simply need a predicate that can test for
all emlements being 2^k-1).

Richard.

>
>>> + (with { tree cst = fold_binary (PLUS_EXPR, TREE_TYPE (@1),
>>> +                                @1, build_one_cst (TREE_TYPE (@1))); }
>>
>>
>> We shouldn't dispatch to fold_binary in patterns.  int_const_binop would
>> be the appropriate function to use - but what happens for @1 == INT_MAX
>> where @1 + 1 overflows?  Similar, is this also valid for negative @1
>> and thus signed mask types?  IMHO we should check whether @1
>> is equal to wi::mask (TYPE_PRECISION (TREE_TYPE (@1)) - wi::clz (@1),
>> false, TYPE_PRECISION (TREE_TYPE (@1)).
>>
>> As with the other patch a ChangeLog entry is missing as well as stating
>> how you tested the patch.
>>
>> Thanks,
>> Richard.
>>
>>> +  (if ((TREE_CODE (@2) != SSA_NAME || has_single_use (@2))
>>> +       && cst && integer_pow2p (cst))
>>> +   (bit_and (plus @0 @1) (bit_not @1)))))
>
>
> --
> Marc Glisse

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/4] match.pd: Add x + (x & 1) -> (x + 1) & ~1 pattern
  2015-01-21 10:50           ` [PATCH 1/4] match.pd: Add x + (x & 1) -> (x + 1) & ~1 pattern Rasmus Villemoes
  2015-04-30  9:34             ` Richard Biener
@ 2015-05-01 18:26             ` Jeff Law
  1 sibling, 0 replies; 24+ messages in thread
From: Jeff Law @ 2015-05-01 18:26 UTC (permalink / raw)
  To: Rasmus Villemoes, GCC Patches; +Cc: Richard Biener, Andrew Pinski

On 01/21/2015 03:49 AM, Rasmus Villemoes wrote:
> gcc.dg/20150120-1.c: New test
>
> Rounding an integer to the next even integer is sometimes written x +=
> x & 1. The equivalent x = (x+1)&~1 usually uses one less register, and
> in practical cases only the new value of x will be used (making it
> unlikely that the subexpression x&1 has any uses).

I bootstrapped and regression tested this on x86_64-linux-gnu, created 
the appropriate ChangeLogs and installed the patch on the trunk.

Jeff

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4] match.pd: Add x & ~(x & y) -> x & ~y pattern
  2015-01-21 10:55           ` [PATCH 2/4] match.pd: Add x & ~(x & y) -> x & ~y pattern Rasmus Villemoes
@ 2015-05-01 18:29             ` Jeff Law
  0 siblings, 0 replies; 24+ messages in thread
From: Jeff Law @ 2015-05-01 18:29 UTC (permalink / raw)
  To: Rasmus Villemoes, GCC Patches; +Cc: Richard Biener, Andrew Pinski

[-- Attachment #1: Type: text/plain, Size: 972 bytes --]

On 01/21/2015 03:49 AM, Rasmus Villemoes wrote:
> gcc.dg/20150120-2.c: New test
>
> Clearing a certain subset of bits, for example to round down x to a
> multiple of a power of 2, is sometimes written x & ~(x & y), where y
> may or may not be a constant. It is shorter to use x & ~y,
> particularly when y is a constant. gcc already does this when it is
> spelled x - (x & y) or x ^ (x & y).
>
> Signed-off-by: Rasmus Villemoes <rv@rasmusvillemoes.dk>
> ---
>   gcc/match.pd                      |  6 ++++++
>   gcc/testsuite/gcc.dg/20150120-2.c | 32 ++++++++++++++++++++++++++++++++
>   2 files changed, 38 insertions(+)
>   create mode 100644 gcc/testsuite/gcc.dg/20150120-2.c
I combined this pattern with its bit-ior 'dual' into a single pattern. 
Then bootstrapped, regression tested, wrote the ChangeLog and installed 
the result onto the trunk.

I'm attaching the final patch which includes the changes in #1, #2 and 
#3 of this series for archival purposes.

jeff


[-- Attachment #2: patch --]
[-- Type: text/plain, Size: 4487 bytes --]

commit 0a66cfe379a8217c69705535303626d12aca0c5f
Author: law <law@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Fri May 1 18:25:12 2015 +0000

    	* match.pd: New simplification patterns.
    	(x + (x & 1))  -> ((x + 1) & ~1)
    	(x & ~(x & y)) -> ((x & ~y))
    	(x | ~(x | y)) -> ((x | ~y))
    
    	* gcc.dg/20150120-1.c: New test.
    	* gcc.dg/20150120-2.c: New test.
    	* gcc.dg/20150120-3.c: New test.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@222697 138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 65816c7..e006b26 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,10 @@
+2015-05-01  Rasmus Villemoes  <rv@rasmusvillemoes.dk>
+
+	* match.pd: New simplification patterns.
+	(x + (x & 1))  -> ((x + 1) & ~1)
+	(x & ~(x & y)) -> ((x & ~y))
+	(x | ~(x | y)) -> ((x | ~y))
+
 2015-05-01  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
 
 	* target.def (attribute_table): Mention that struct attribute_spec
diff --git a/gcc/match.pd b/gcc/match.pd
index fc374de..87ecaf1 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -255,6 +255,20 @@ along with GCC; see the file COPYING3.  If not see
   (bitop @0 @0)
   (non_lvalue @0)))
 
+/* x + (x & 1) -> (x + 1) & ~1 */
+(simplify
+ (plus:c @0 (bit_and@2 @0 integer_onep@1))
+ (if (TREE_CODE (@2) != SSA_NAME || has_single_use (@2))
+  (bit_and (plus @0 @1) (bit_not @1))))
+
+/* x & ~(x & y) -> x & ~y */
+/* x | ~(x | y) -> x | ~y  */
+(for bitop (bit_and bit_ior)
+  (simplify
+    (bitop:c @0 (bit_not (bitop:c@2 @0 @1)))
+      (if (TREE_CODE (@2) != SSA_NAME || has_single_use (@2))
+	(bitop @0 (bit_not @1)))))
+
 (simplify
  (abs (negate @0))
  (abs @0))
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index bfdde3b..2aedc46 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,9 @@
+2015-05-01  Rasmus Villemoes  <rv@rasmusvillemoes.dk>
+
+	* gcc.dg/20150120-1.c: New test.
+	* gcc.dg/20150120-2.c: New test.
+	* gcc.dg/20150120-3.c: New test.
+
 2015-05-01  David Edelsohn  <dje.gcc@gmail.com>
 
 	* gcc.dg/debug/pr65771.c: Add "dg-add-options tls".
diff --git a/gcc/testsuite/gcc.dg/20150120-1.c b/gcc/testsuite/gcc.dg/20150120-1.c
new file mode 100644
index 0000000..18906c4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/20150120-1.c
@@ -0,0 +1,51 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-original" } */
+
+/* x + (x & 1) -> (x + 1) & ~1 */
+int
+fn1 (int x)
+{
+	return x + (x & 1);
+}
+int
+fn2 (int x)
+{
+	return (x & 1) + x;
+}
+int
+fn3 (int x)
+{
+	return x + (1 & x);
+}
+int
+fn4 (int x)
+{
+	return (1 & x) + x;
+}
+unsigned int
+fn5 (unsigned int x)
+{
+	return x + (x & 1);
+}
+unsigned int
+fn6 (unsigned int x)
+{
+	return (x & 1) + x;
+}
+unsigned int
+fn7 (unsigned int x)
+{
+	return x + (x % 2);
+}
+unsigned int
+fn8 (unsigned int x)
+{
+	return (x % 2) + x;
+}
+unsigned int
+fn9 (unsigned int x)
+{
+	return (1LL & x) + x;
+}
+
+/* { dg-final { scan-tree-dump-times "x \\+ 1" 9 "original" } } */
diff --git a/gcc/testsuite/gcc.dg/20150120-2.c b/gcc/testsuite/gcc.dg/20150120-2.c
new file mode 100644
index 0000000..976b654
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/20150120-2.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-original" } */
+
+/* x & ~(x & y) -> x & ~y */
+int fn1 (int x, int y)
+{
+	return x & ~(x & y);
+}
+int fn2 (int x, int y)
+{
+	return ~(x & y) & x;
+}
+int fn3 (int x, int y)
+{
+	return x & ~(y & x);
+}
+int fn4 (int x, int y)
+{
+	return ~(y & x) & x;
+}
+int fn5 (int z)
+{
+	return z & ~(z & 3);
+}
+int fn6 (int z)
+{
+	return ~(z & 3) & z;
+}
+
+
+/* { dg-final { scan-tree-dump-times "~y & x" 4 "original" } } */
+/* { dg-final { scan-tree-dump-times "z & -4" 2 "original" } } */
diff --git a/gcc/testsuite/gcc.dg/20150120-3.c b/gcc/testsuite/gcc.dg/20150120-3.c
new file mode 100644
index 0000000..322556f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/20150120-3.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-original" } */
+
+/* x | ~(x | y) -> x | ~y */
+int fn1 (int x, int y)
+{
+	return x | ~(x | y);
+}
+int fn2 (int x, int y)
+{
+	return ~(x | y) | x;
+}
+int fn3 (int x, int y)
+{
+	return x | ~(y | x);
+}
+int fn4 (int x, int y)
+{
+	return ~(y | x) | x;
+}
+int fn5 (int z)
+{
+	return z | ~(z | 3);
+}
+int fn6 (int z)
+{
+	return ~(z | 3) | z;
+}
+
+
+/* { dg-final { scan-tree-dump-times "~y \\| x" 4 "original" } } */
+/* { dg-final { scan-tree-dump-times "z \\| -4" 2 "original" } } */

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4] match.pd: Add x & ~(x & y) -> x & ~y pattern
  2015-01-22  9:24 ` Richard Biener
@ 2015-01-22 14:24   ` Rasmus Villemoes
  0 siblings, 0 replies; 24+ messages in thread
From: Rasmus Villemoes @ 2015-01-22 14:24 UTC (permalink / raw)
  To: Richard Biener; +Cc: Marc Glisse, GCC Patches

On Thu, Jan 22 2015, Richard Biener <richard.guenther@gmail.com> wrote:

> On Wed, Jan 21, 2015 at 9:00 PM, Marc Glisse <marc.glisse@inria.fr> wrote:
>> Hello,
>>
>> (sorry for the broken thread, for some reason I haven't received any email
>> from gcc since about 10am, I'll investigate later)
>>
>> +/* x & ~(x & y) -> x & ~y */
>> +(simplify
>> + (bit_and:c @0 (bit_not (bit_and:c@2 @0 @1)))
>> + (if (TREE_CODE (@2) != SSA_NAME || has_single_use (@2))
>> +  (bit_and @0 (bit_not @1))))
>>
>> Wouldn't it make more sense to put @2 on bit_not? If bit_and is used
>> multiple times, the transformation is neutral so it should be done as a
>> canonicalization. On the other hand, if bit_not is used multiple times, the
>> transformation adds an extra bit_not (which might be free when there is an
>> andn insn). So I believe the 2 main options are:
>> - move @2 on the bit_not
>> - don't test has_single_use at all
>
> I tend to favor not testing has_single_use at all.

Does this apply only to this pattern, to this pattern and its dual
(3/4), or to all four?

Rasmus

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4] match.pd: Add x & ~(x & y) -> x & ~y pattern
  2015-01-21 21:24 [PATCH 2/4] match.pd: Add x & ~(x & y) -> x & ~y pattern Marc Glisse
@ 2015-01-22  9:24 ` Richard Biener
  2015-01-22 14:24   ` Rasmus Villemoes
  0 siblings, 1 reply; 24+ messages in thread
From: Richard Biener @ 2015-01-22  9:24 UTC (permalink / raw)
  To: Marc Glisse; +Cc: GCC Patches, rv

On Wed, Jan 21, 2015 at 9:00 PM, Marc Glisse <marc.glisse@inria.fr> wrote:
> Hello,
>
> (sorry for the broken thread, for some reason I haven't received any email
> from gcc since about 10am, I'll investigate later)
>
> +/* x & ~(x & y) -> x & ~y */
> +(simplify
> + (bit_and:c @0 (bit_not (bit_and:c@2 @0 @1)))
> + (if (TREE_CODE (@2) != SSA_NAME || has_single_use (@2))
> +  (bit_and @0 (bit_not @1))))
>
> Wouldn't it make more sense to put @2 on bit_not? If bit_and is used
> multiple times, the transformation is neutral so it should be done as a
> canonicalization. On the other hand, if bit_not is used multiple times, the
> transformation adds an extra bit_not (which might be free when there is an
> andn insn). So I believe the 2 main options are:
> - move @2 on the bit_not
> - don't test has_single_use at all

I tend to favor not testing has_single_use at all.

Richard.

> --
> Marc Glisse

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4] match.pd: Add x & ~(x & y) -> x & ~y pattern
@ 2015-01-21 21:24 Marc Glisse
  2015-01-22  9:24 ` Richard Biener
  0 siblings, 1 reply; 24+ messages in thread
From: Marc Glisse @ 2015-01-21 21:24 UTC (permalink / raw)
  To: gcc-patches; +Cc: rv

Hello,

(sorry for the broken thread, for some reason I haven't received any email 
from gcc since about 10am, I'll investigate later)

+/* x & ~(x & y) -> x & ~y */
+(simplify
+ (bit_and:c @0 (bit_not (bit_and:c@2 @0 @1)))
+ (if (TREE_CODE (@2) != SSA_NAME || has_single_use (@2))
+  (bit_and @0 (bit_not @1))))

Wouldn't it make more sense to put @2 on bit_not? If bit_and is used 
multiple times, the transformation is neutral so it should be done as a 
canonicalization. On the other hand, if bit_not is used multiple times, 
the transformation adds an extra bit_not (which might be free when there 
is an andn insn). So I believe the 2 main options are:
- move @2 on the bit_not
- don't test has_single_use at all

-- 
Marc Glisse

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2015-05-01 18:29 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-01-13 22:47 RFC: Two minor optimization patterns Rasmus Villemoes
2015-01-13 22:56 ` Andrew Pinski
2015-01-14  9:52   ` Richard Biener
2015-01-14 12:45     ` Rasmus Villemoes
2015-01-14 14:01       ` Richard Biener
2015-01-21 10:50         ` [PATCH 0/4] A few " Rasmus Villemoes
2015-01-21 10:50           ` [PATCH 1/4] match.pd: Add x + (x & 1) -> (x + 1) & ~1 pattern Rasmus Villemoes
2015-04-30  9:34             ` Richard Biener
2015-05-01 18:26             ` Jeff Law
2015-01-21 10:55           ` [PATCH 2/4] match.pd: Add x & ~(x & y) -> x & ~y pattern Rasmus Villemoes
2015-05-01 18:29             ` Jeff Law
2015-01-21 10:58           ` [PATCH 3/4] match.pd: Add x | ~(x | y) -> x | " Rasmus Villemoes
2015-01-21 11:32             ` Marek Polacek
2015-01-21 11:17           ` [PATCH 4/4] match.pd: Add x + ((-x) & m) -> (x + m) & ~m pattern Rasmus Villemoes
2015-04-30  9:42             ` Richard Biener
2015-04-30 11:56               ` Marc Glisse
2015-04-30 12:25                 ` Richard Biener
2015-01-14 13:14 ` RFC: Two minor optimization patterns Marc Glisse
2015-01-14 13:58   ` Richard Biener
2015-01-14 14:31     ` Marc Glisse
2015-01-14 14:49       ` Richard Biener
2015-01-21 21:24 [PATCH 2/4] match.pd: Add x & ~(x & y) -> x & ~y pattern Marc Glisse
2015-01-22  9:24 ` Richard Biener
2015-01-22 14:24   ` Rasmus Villemoes

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).