[PATCH][RFC][match.pd] optimize (X & C) == N when C is power of 2

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH][RFC][match.pd] optimize (X & C) == N when C is power of 2
@ 2015-07-24  8:21 Kyrill Tkachov
  2015-07-24  8:36 ` Jakub Jelinek
  2015-07-25  2:37 ` Segher Boessenkool
  0 siblings, 2 replies; 17+ messages in thread
From: Kyrill Tkachov @ 2015-07-24  8:21 UTC (permalink / raw)
  To: GCC Patches; +Cc: Richard Biener

[-- Attachment #1: Type: text/plain, Size: 1225 bytes --]

Hi all,

This transformation folds (X % C) == N into
X & ((1 << (size - 1)) | (C - 1))) == N
for constants C and N where N is positive and C is a power of 2.

The idea is similar to the existing X % C -> X & (C - 1) transformation
for unsigned values but this time when we have the comparison we use the
C - 1 mask (all 1s for power of 2 N) orred with the sign bit.

At runtime, if X is positive then X & ((1 << (size - 1)) | (C - 1)))
calculates X % C, which is compared against the positive N.

If X is negative then X & ((1 << (size - 1)) | (C - 1))) doesn't calculate
X % C but since we also catch the set sign bit, it will never compare equal
to N (which is positive), so we still get the right comparison result.

I don't have much experience with writing match.pd patterns, so I appreciate
any feedback on how to write this properly.

Bootstrapped and tested on arm, aarch64, x86_64.

Thanks,
Kyrill

2015-07-24  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * match.pd ((X % C) == N -> (X & ((1 << (size - 1)) | (C - 1))) == N):
     Transform when N is positive and C is a power of 2.

2015-07-24  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * gcc.dg/fold-mod-cmp-1.c: New test.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: matchpd-mod-2-cmp.patch --]
[-- Type: text/x-patch; name=matchpd-mod-2-cmp.patch, Size: 2197 bytes --]

commit af785bad5cb32ef2bc35503ebbe65414b67ef8b1
Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Date:   Tue Jul 14 18:20:28 2015 +0100

    [match.pd] optimize (X & C) == N when C is power of 2

diff --git a/gcc/match.pd b/gcc/match.pd
index 9cf0278..02ad708 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -266,7 +266,9 @@ along with GCC; see the file COPYING3.  If not see
 /* Optimize TRUNC_MOD_EXPR by a power of two into a BIT_AND_EXPR,
    i.e. "X % C" into "X & (C - 1)", if X and C are positive.
    Also optimize A % (C << N)  where C is a power of 2,
-   to A & ((C << N) - 1).  */
+   to A & ((C << N) - 1).
+   Optimize (X % C) == N into (X & ((1 << (size - 1)) | (C - 1))) == N
+   when C is signed, a power of 2 and N is positive.  */
 (match (power_of_two_cand @1)
  INTEGER_CST@1)
 (match (power_of_two_cand @1)
@@ -278,7 +280,17 @@ along with GCC; see the file COPYING3.  If not see
 	|| tree_expr_nonnegative_p (@0))
 	&& tree_nop_conversion_p (type, TREE_TYPE (@3))
 	&& integer_pow2p (@2) && tree_int_cst_sgn (@2) > 0)
-   (bit_and @0 (convert (minus @1 { build_int_cst (TREE_TYPE (@1), 1); }))))))
+   (bit_and @0 (convert (minus @1 { build_int_cst (TREE_TYPE (@1), 1); })))))
+
+ (simplify
+  (eq (mod @0 (INTEGER_CST@1)) INTEGER_CST@2)
+  (if (integer_pow2p (@1) && tree_expr_nonnegative_p (@2)
+       && tree_to_uhwi (@2) < tree_to_uhwi (@1))
+   (eq (bit_and @0
+         (bit_ior (minus @1 { build_int_cst (TREE_TYPE (@1), 1); })
+         { build_int_cst (type, 1 << (tree_to_uhwi (TYPE_SIZE (type)) - 1)); })
+        )
+    @2))))
 
 /* X % Y is smaller than Y.  */
 (for cmp (lt ge)
diff --git a/gcc/testsuite/gcc.dg/fold-mod-cmp-1.c b/gcc/testsuite/gcc.dg/fold-mod-cmp-1.c
new file mode 100644
index 0000000..9dcf313
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/fold-mod-cmp-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-fdump-tree-original" } */
+
+int
+modcmp (int a)
+{
+  return (a % 32) == 6;
+}
+
+/* The above should be simplified to (a & -2147483617) == 6.  */
+/* { dg-final { scan-tree-dump "& -2147483617" "original" } } */
+/* { dg-final { scan-tree-dump "== 6" "original" } } */

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH][RFC][match.pd] optimize (X & C) == N when C is power of 2
  2015-07-24  8:21 [PATCH][RFC][match.pd] optimize (X & C) == N when C is power of 2 Kyrill Tkachov
@ 2015-07-24  8:36 ` Jakub Jelinek
  2015-07-24  8:54   ` Kyrill Tkachov
  2015-07-25  2:37 ` Segher Boessenkool
  1 sibling, 1 reply; 17+ messages in thread
From: Jakub Jelinek @ 2015-07-24  8:36 UTC (permalink / raw)
  To: Kyrill Tkachov; +Cc: GCC Patches, Richard Biener

On Fri, Jul 24, 2015 at 09:09:39AM +0100, Kyrill Tkachov wrote:
> This transformation folds (X % C) == N into
> X & ((1 << (size - 1)) | (C - 1))) == N
> for constants C and N where N is positive and C is a power of 2.
> 
> The idea is similar to the existing X % C -> X & (C - 1) transformation
> for unsigned values but this time when we have the comparison we use the
> C - 1 mask (all 1s for power of 2 N) orred with the sign bit.
> 
> At runtime, if X is positive then X & ((1 << (size - 1)) | (C - 1)))
> calculates X % C, which is compared against the positive N.
> 
> If X is negative then X & ((1 << (size - 1)) | (C - 1))) doesn't calculate
> X % C but since we also catch the set sign bit, it will never compare equal
> to N (which is positive), so we still get the right comparison result.
> 
> I don't have much experience with writing match.pd patterns, so I appreciate
> any feedback on how to write this properly.
> 
> Bootstrapped and tested on arm, aarch64, x86_64.

I think this is another case that, if at all, should be done during or right
before RTL expansion and should test rtx costs.
Because, ((1 << (size - 1)) | (C - 1))) constant might be very expensive,
while C cheap, and % might not be that expensive compared to & to offset
that.

E.g. on x86_64, for 32-bit and smaller X the constant is cheap as any other
(well, if we don't take instruction size into account), but 64-bit constant
is at least 3 times more expensive (movabsq is needed with its latency).
In the x86_64 case supposedly the divmod is still more expensive, but there
are many other targets.  On sparc64 for 64-bit constants, you might need
many instructions to create the constants, etc.

	Jakub

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH][RFC][match.pd] optimize (X & C) == N when C is power of 2
  2015-07-24  8:36 ` Jakub Jelinek
@ 2015-07-24  8:54   ` Kyrill Tkachov
  2015-07-24  9:09     ` Richard Biener
  2015-07-24  9:16     ` Jakub Jelinek
  0 siblings, 2 replies; 17+ messages in thread
From: Kyrill Tkachov @ 2015-07-24  8:54 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: GCC Patches, Richard Biener


On 24/07/15 09:23, Jakub Jelinek wrote:
> On Fri, Jul 24, 2015 at 09:09:39AM +0100, Kyrill Tkachov wrote:
>> This transformation folds (X % C) == N into
>> X & ((1 << (size - 1)) | (C - 1))) == N
>> for constants C and N where N is positive and C is a power of 2.
>>
>> The idea is similar to the existing X % C -> X & (C - 1) transformation
>> for unsigned values but this time when we have the comparison we use the
>> C - 1 mask (all 1s for power of 2 N) orred with the sign bit.
>>
>> At runtime, if X is positive then X & ((1 << (size - 1)) | (C - 1)))
>> calculates X % C, which is compared against the positive N.
>>
>> If X is negative then X & ((1 << (size - 1)) | (C - 1))) doesn't calculate
>> X % C but since we also catch the set sign bit, it will never compare equal
>> to N (which is positive), so we still get the right comparison result.
>>
>> I don't have much experience with writing match.pd patterns, so I appreciate
>> any feedback on how to write this properly.
>>
>> Bootstrapped and tested on arm, aarch64, x86_64.
> I think this is another case that, if at all, should be done during or right
> before RTL expansion and should test rtx costs.

Hmm, where would that be?
expmed.c has a lot of code to expand div or mod by a power of 2.
In what form would a compare with a mod by power of 2 arrive to
the expansion phase?

> Because, ((1 << (size - 1)) | (C - 1))) constant might be very expensive,
> while C cheap, and % might not be that expensive compared to & to offset
> that.
>
> E.g. on x86_64, for 32-bit and smaller X the constant is cheap as any other
> (well, if we don't take instruction size into account), but 64-bit constant
> is at least 3 times more expensive (movabsq is needed with its latency).
> In the x86_64 case supposedly the divmod is still more expensive, but there
> are many other targets.  On sparc64 for 64-bit constants, you might need
> many instructions to create the constants, etc.

Ok, I am not familiar with sparc64. The constant is just a 1
in the sign bit orred with a continuous string of ones.
That's usually cheap on aarch64 but may not be so on other targets.

Thanks,
Kyrill

>
> 	Jakub
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH][RFC][match.pd] optimize (X & C) == N when C is power of 2
  2015-07-24  9:09     ` Richard Biener
@ 2015-07-24  9:09       ` Kyrill Tkachov
  2015-07-24  9:22         ` Ramana Radhakrishnan
  0 siblings, 1 reply; 17+ messages in thread
From: Kyrill Tkachov @ 2015-07-24  9:09 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jakub Jelinek, GCC Patches


On 24/07/15 10:00, Richard Biener wrote:
> On Fri, 24 Jul 2015, Kyrill Tkachov wrote:
>
>> On 24/07/15 09:23, Jakub Jelinek wrote:
>>> On Fri, Jul 24, 2015 at 09:09:39AM +0100, Kyrill Tkachov wrote:
>>>> This transformation folds (X % C) == N into
>>>> X & ((1 << (size - 1)) | (C - 1))) == N
>>>> for constants C and N where N is positive and C is a power of 2.
>>>>
>>>> The idea is similar to the existing X % C -> X & (C - 1) transformation
>>>> for unsigned values but this time when we have the comparison we use the
>>>> C - 1 mask (all 1s for power of 2 N) orred with the sign bit.
>>>>
>>>> At runtime, if X is positive then X & ((1 << (size - 1)) | (C - 1)))
>>>> calculates X % C, which is compared against the positive N.
>>>>
>>>> If X is negative then X & ((1 << (size - 1)) | (C - 1))) doesn't calculate
>>>> X % C but since we also catch the set sign bit, it will never compare
>>>> equal
>>>> to N (which is positive), so we still get the right comparison result.
>>>>
>>>> I don't have much experience with writing match.pd patterns, so I
>>>> appreciate
>>>> any feedback on how to write this properly.
>>>>
>>>> Bootstrapped and tested on arm, aarch64, x86_64.
>>> I think this is another case that, if at all, should be done during or right
>>> before RTL expansion and should test rtx costs.
>> Hmm, where would that be?
>> expmed.c has a lot of code to expand div or mod by a power of 2.
>> In what form would a compare with a mod by power of 2 arrive to
>> the expansion phase?
> It arrives as SSA_NAME == N and you can use get_gimple_for_ssa_name
> or get_def_for_expr to get at the defining stmt if that is possible
> (it's still unexpanded and thus TERed) and expand a different
> expression.

Thanks, so it's where we expand compares... (what's TERed?)

>
> But why can't simplify-rtx via combine handle this - it should have
> access to target costs.

That would require for the target to expand to an SMOD rtx
which, if the target has no direct instruction for would be somewhat
awkward.

Thanks,
Kyrill

>
>>> Because, ((1 << (size - 1)) | (C - 1))) constant might be very expensive,
>>> while C cheap, and % might not be that expensive compared to & to offset
>>> that.
>>>
>>> E.g. on x86_64, for 32-bit and smaller X the constant is cheap as any other
>>> (well, if we don't take instruction size into account), but 64-bit constant
>>> is at least 3 times more expensive (movabsq is needed with its latency).
>>> In the x86_64 case supposedly the divmod is still more expensive, but there
>>> are many other targets.  On sparc64 for 64-bit constants, you might need
>>> many instructions to create the constants, etc.
>> Ok, I am not familiar with sparc64. The constant is just a 1
>> in the sign bit orred with a continuous string of ones.
>> That's usually cheap on aarch64 but may not be so on other targets.
> On GIMPLE we might still want to canonicalize to one form.  I'd
> canonicalize to the form with "smaller" constants if the number
> of operations is the same.
>
> Richard.
>
>> Thanks,
>> Kyrill
>>
>>> 	Jakub
>>>
>>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH][RFC][match.pd] optimize (X & C) == N when C is power of 2
  2015-07-24  8:54   ` Kyrill Tkachov
@ 2015-07-24  9:09     ` Richard Biener
  2015-07-24  9:09       ` Kyrill Tkachov
  2015-07-24  9:16     ` Jakub Jelinek
  1 sibling, 1 reply; 17+ messages in thread
From: Richard Biener @ 2015-07-24  9:09 UTC (permalink / raw)
  To: Kyrill Tkachov; +Cc: Jakub Jelinek, GCC Patches

On Fri, 24 Jul 2015, Kyrill Tkachov wrote:

> 
> On 24/07/15 09:23, Jakub Jelinek wrote:
> > On Fri, Jul 24, 2015 at 09:09:39AM +0100, Kyrill Tkachov wrote:
> > > This transformation folds (X % C) == N into
> > > X & ((1 << (size - 1)) | (C - 1))) == N
> > > for constants C and N where N is positive and C is a power of 2.
> > > 
> > > The idea is similar to the existing X % C -> X & (C - 1) transformation
> > > for unsigned values but this time when we have the comparison we use the
> > > C - 1 mask (all 1s for power of 2 N) orred with the sign bit.
> > > 
> > > At runtime, if X is positive then X & ((1 << (size - 1)) | (C - 1)))
> > > calculates X % C, which is compared against the positive N.
> > > 
> > > If X is negative then X & ((1 << (size - 1)) | (C - 1))) doesn't calculate
> > > X % C but since we also catch the set sign bit, it will never compare
> > > equal
> > > to N (which is positive), so we still get the right comparison result.
> > > 
> > > I don't have much experience with writing match.pd patterns, so I
> > > appreciate
> > > any feedback on how to write this properly.
> > > 
> > > Bootstrapped and tested on arm, aarch64, x86_64.
> > I think this is another case that, if at all, should be done during or right
> > before RTL expansion and should test rtx costs.
> 
> Hmm, where would that be?
> expmed.c has a lot of code to expand div or mod by a power of 2.
> In what form would a compare with a mod by power of 2 arrive to
> the expansion phase?

It arrives as SSA_NAME == N and you can use get_gimple_for_ssa_name
or get_def_for_expr to get at the defining stmt if that is possible
(it's still unexpanded and thus TERed) and expand a different
expression.

But why can't simplify-rtx via combine handle this - it should have
access to target costs.

> > Because, ((1 << (size - 1)) | (C - 1))) constant might be very expensive,
> > while C cheap, and % might not be that expensive compared to & to offset
> > that.
> > 
> > E.g. on x86_64, for 32-bit and smaller X the constant is cheap as any other
> > (well, if we don't take instruction size into account), but 64-bit constant
> > is at least 3 times more expensive (movabsq is needed with its latency).
> > In the x86_64 case supposedly the divmod is still more expensive, but there
> > are many other targets.  On sparc64 for 64-bit constants, you might need
> > many instructions to create the constants, etc.
> 
> Ok, I am not familiar with sparc64. The constant is just a 1
> in the sign bit orred with a continuous string of ones.
> That's usually cheap on aarch64 but may not be so on other targets.

On GIMPLE we might still want to canonicalize to one form.  I'd
canonicalize to the form with "smaller" constants if the number
of operations is the same.

Richard.

> Thanks,
> Kyrill
> 
> > 
> > 	Jakub
> > 
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu, Graham Norton, HRB 21284 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH][RFC][match.pd] optimize (X & C) == N when C is power of 2
  2015-07-24  8:54   ` Kyrill Tkachov
  2015-07-24  9:09     ` Richard Biener
@ 2015-07-24  9:16     ` Jakub Jelinek
  2015-07-24  9:31       ` Kyrill Tkachov
  1 sibling, 1 reply; 17+ messages in thread
From: Jakub Jelinek @ 2015-07-24  9:16 UTC (permalink / raw)
  To: Kyrill Tkachov; +Cc: GCC Patches, Richard Biener

On Fri, Jul 24, 2015 at 09:48:30AM +0100, Kyrill Tkachov wrote:
> >>Bootstrapped and tested on arm, aarch64, x86_64.
> >I think this is another case that, if at all, should be done during or right
> >before RTL expansion and should test rtx costs.
> 
> Hmm, where would that be?

That is up to discussions, all I'm saying is that match.pd when run on
GENERIC and/or anywhere among GIMPLE passes that fold stuff is undesirable.

In expr.c, with TER you can detect such patterns, in this case when
expanding the comparison, but perhaps we want a *.pd file that would have
rules that would be only GIMPLE and only enabled in a special pass right
before (or very close to) expansion, that would perform such instruction
selection.

> Ok, I am not familiar with sparc64. The constant is just a 1
> in the sign bit orred with a continuous string of ones.
> That's usually cheap on aarch64 but may not be so on other targets.

It has been a long time since I've done anything on sparc64, but you can
e.g. have a look at config/sparc/sparc.c (sparc_emit_set_const64)
to clearly see that not all constants are equal cost, some are much more
expensive.  Seems sparc_rtx_cost does not model this accurrately, but that
is a backend bug, so shouldn't affect the generic decisions.  Sparc does not
have a moddi3 pattern, so your transformation might still be a win there,
except for -Os where it would be very bad code size pessimization.

All I want to say is that on GIMPLE we usually perform transformations where
simpler (fewer operations) is considered better, and for the same number of
operations where one sequence might be better on one target and another on
another target, supposedly we want some other infrastructure.

	Jakub

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH][RFC][match.pd] optimize (X & C) == N when C is power of 2
  2015-07-24  9:09       ` Kyrill Tkachov
@ 2015-07-24  9:22         ` Ramana Radhakrishnan
  0 siblings, 0 replies; 17+ messages in thread
From: Ramana Radhakrishnan @ 2015-07-24  9:22 UTC (permalink / raw)
  To: Kyrill Tkachov; +Cc: Richard Biener, Jakub Jelinek, GCC Patches

On Fri, Jul 24, 2015 at 10:04 AM, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
>> It arrives as SSA_NAME == N and you can use get_gimple_for_ssa_name
>> or get_def_for_expr to get at the defining stmt if that is possible
>> (it's still unexpanded and thus TERed) and expand a different
>> expression.
>
>
> Thanks, so it's where we expand compares... (what's TERed?)

Temporary Expression Replacement - IIRC something done when you just
come out of ssa . tree-ssa-ter.[hc].

Ramana

>
>>
>> But why can't simplify-rtx via combine handle this - it should have
>> access to target costs.
>
>
> That would require for the target to expand to an SMOD rtx
> which, if the target has no direct instruction for would be somewhat
> awkward.
>
> Thanks,
> Kyrill
>
>
>>
>>>> Because, ((1 << (size - 1)) | (C - 1))) constant might be very
>>>> expensive,
>>>> while C cheap, and % might not be that expensive compared to & to offset
>>>> that.
>>>>
>>>> E.g. on x86_64, for 32-bit and smaller X the constant is cheap as any
>>>> other
>>>> (well, if we don't take instruction size into account), but 64-bit
>>>> constant
>>>> is at least 3 times more expensive (movabsq is needed with its latency).
>>>> In the x86_64 case supposedly the divmod is still more expensive, but
>>>> there
>>>> are many other targets.  On sparc64 for 64-bit constants, you might need
>>>> many instructions to create the constants, etc.
>>>
>>> Ok, I am not familiar with sparc64. The constant is just a 1
>>> in the sign bit orred with a continuous string of ones.
>>> That's usually cheap on aarch64 but may not be so on other targets.
>>
>> On GIMPLE we might still want to canonicalize to one form.  I'd
>> canonicalize to the form with "smaller" constants if the number
>> of operations is the same.
>>
>> Richard.
>>
>>> Thanks,
>>> Kyrill
>>>
>>>>         Jakub
>>>>
>>>
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH][RFC][match.pd] optimize (X & C) == N when C is power of 2
  2015-07-24  9:16     ` Jakub Jelinek
@ 2015-07-24  9:31       ` Kyrill Tkachov
  2015-07-24  9:36         ` Richard Biener
                           ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Kyrill Tkachov @ 2015-07-24  9:31 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: GCC Patches, Richard Biener


On 24/07/15 10:09, Jakub Jelinek wrote:
> On Fri, Jul 24, 2015 at 09:48:30AM +0100, Kyrill Tkachov wrote:
>>>> Bootstrapped and tested on arm, aarch64, x86_64.
>>> I think this is another case that, if at all, should be done during or right
>>> before RTL expansion and should test rtx costs.
>> Hmm, where would that be?
> That is up to discussions, all I'm saying is that match.pd when run on
> GENERIC and/or anywhere among GIMPLE passes that fold stuff is undesirable.
>
> In expr.c, with TER you can detect such patterns, in this case when
> expanding the comparison, but perhaps we want a *.pd file that would have
> rules that would be only GIMPLE and only enabled in a special pass right
> before (or very close to) expansion, that would perform such instruction
> selection.

Wild idea, but could it be considered to have target-specific
match.pd files that can be included in the main match.pd?
  That way, targets would get the benefit of getting
the target-specific folding they benefit from at the very beginning
of compilation without stepping on other targets toes.

Kyrill

>
>> Ok, I am not familiar with sparc64. The constant is just a 1
>> in the sign bit orred with a continuous string of ones.
>> That's usually cheap on aarch64 but may not be so on other targets.
> It has been a long time since I've done anything on sparc64, but you can
> e.g. have a look at config/sparc/sparc.c (sparc_emit_set_const64)
> to clearly see that not all constants are equal cost, some are much more
> expensive.  Seems sparc_rtx_cost does not model this accurrately, but that
> is a backend bug, so shouldn't affect the generic decisions.  Sparc does not
> have a moddi3 pattern, so your transformation might still be a win there,
> except for -Os where it would be very bad code size pessimization.
>
> All I want to say is that on GIMPLE we usually perform transformations where
> simpler (fewer operations) is considered better, and for the same number of
> operations where one sequence might be better on one target and another on
> another target, supposedly we want some other infrastructure.
>
> 	Jakub
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH][RFC][match.pd] optimize (X & C) == N when C is power of 2
  2015-07-24  9:31       ` Kyrill Tkachov
@ 2015-07-24  9:36         ` Richard Biener
  2015-07-24 10:11         ` Jakub Jelinek
  2015-07-24 10:25         ` Ramana Radhakrishnan
  2 siblings, 0 replies; 17+ messages in thread
From: Richard Biener @ 2015-07-24  9:36 UTC (permalink / raw)
  To: Kyrill Tkachov; +Cc: Jakub Jelinek, GCC Patches

On Fri, 24 Jul 2015, Kyrill Tkachov wrote:

> 
> On 24/07/15 10:09, Jakub Jelinek wrote:
> > On Fri, Jul 24, 2015 at 09:48:30AM +0100, Kyrill Tkachov wrote:
> > > > > Bootstrapped and tested on arm, aarch64, x86_64.
> > > > I think this is another case that, if at all, should be done during or
> > > > right
> > > > before RTL expansion and should test rtx costs.
> > > Hmm, where would that be?
> > That is up to discussions, all I'm saying is that match.pd when run on
> > GENERIC and/or anywhere among GIMPLE passes that fold stuff is undesirable.
> > 
> > In expr.c, with TER you can detect such patterns, in this case when
> > expanding the comparison, but perhaps we want a *.pd file that would have
> > rules that would be only GIMPLE and only enabled in a special pass right
> > before (or very close to) expansion, that would perform such instruction
> > selection.

Yes, we can do that - that .pd could also be target specific.

> Wild idea, but could it be considered to have target-specific
> match.pd files that can be included in the main match.pd?
>  That way, targets would get the benefit of getting
> the target-specific folding they benefit from at the very beginning
> of compilation without stepping on other targets toes.

The patterns would interact with those in match.pd, so it adds extra
complication in testing (would need to test all targets) to not
run into infinite recursions for example.

It will also make adding testcases that work on all targets harder
as the IL presented to passes could then wildly differ...

So not sure if we want that (from the very beginning of the compilation).

Richard.

> Kyrill
> 
> > 
> > > Ok, I am not familiar with sparc64. The constant is just a 1
> > > in the sign bit orred with a continuous string of ones.
> > > That's usually cheap on aarch64 but may not be so on other targets.
> > It has been a long time since I've done anything on sparc64, but you can
> > e.g. have a look at config/sparc/sparc.c (sparc_emit_set_const64)
> > to clearly see that not all constants are equal cost, some are much more
> > expensive.  Seems sparc_rtx_cost does not model this accurrately, but that
> > is a backend bug, so shouldn't affect the generic decisions.  Sparc does not
> > have a moddi3 pattern, so your transformation might still be a win there,
> > except for -Os where it would be very bad code size pessimization.
> > 
> > All I want to say is that on GIMPLE we usually perform transformations where
> > simpler (fewer operations) is considered better, and for the same number of
> > operations where one sequence might be better on one target and another on
> > another target, supposedly we want some other infrastructure.
> > 
> > 	Jakub
> > 
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu, Graham Norton, HRB 21284 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH][RFC][match.pd] optimize (X & C) == N when C is power of 2
  2015-07-24  9:31       ` Kyrill Tkachov
  2015-07-24  9:36         ` Richard Biener
@ 2015-07-24 10:11         ` Jakub Jelinek
  2015-07-24 10:25         ` Ramana Radhakrishnan
  2 siblings, 0 replies; 17+ messages in thread
From: Jakub Jelinek @ 2015-07-24 10:11 UTC (permalink / raw)
  To: Kyrill Tkachov; +Cc: GCC Patches, Richard Biener

On Fri, Jul 24, 2015 at 10:23:59AM +0100, Kyrill Tkachov wrote:
> On 24/07/15 10:09, Jakub Jelinek wrote:
> >On Fri, Jul 24, 2015 at 09:48:30AM +0100, Kyrill Tkachov wrote:
> >>>>Bootstrapped and tested on arm, aarch64, x86_64.
> >>>I think this is another case that, if at all, should be done during or right
> >>>before RTL expansion and should test rtx costs.
> >>Hmm, where would that be?
> >That is up to discussions, all I'm saying is that match.pd when run on
> >GENERIC and/or anywhere among GIMPLE passes that fold stuff is undesirable.
> >
> >In expr.c, with TER you can detect such patterns, in this case when
> >expanding the comparison, but perhaps we want a *.pd file that would have
> >rules that would be only GIMPLE and only enabled in a special pass right
> >before (or very close to) expansion, that would perform such instruction
> >selection.
> 
> Wild idea, but could it be considered to have target-specific
> match.pd files that can be included in the main match.pd?
>  That way, targets would get the benefit of getting
> the target-specific folding they benefit from at the very beginning
> of compilation without stepping on other targets toes.

I'd strongly prefer that isn't done.  First of all, you really don't want to
make target-specific foldings during generic folding (yeah, there are
already cases where it is done, but we want to avoid it), or during early
GIMPLE, ideally that should be late GIMPLE only where we introduce gradually
more and more target dependencies.  By making GIMPLE more target specific
earlier, you break e.g. the offloading stuff, but also introduce target
specific bugs more often to GIMPLE optimizers (these days, most GIMPLE
optimizer issues (especially before vectorizer/ivopts and other target
specific changes) affect all targets, or are perhaps ILP32/LP64 etc. related
at most).  Also, by allowing target-specific foldings, we'd end up with duplications
between different target, using rtx_costs, maintaining them more
accurrately and performing some generic code selections based on them IMHO
is desirable.

	Jakub

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH][RFC][match.pd] optimize (X & C) == N when C is power of 2
  2015-07-24  9:31       ` Kyrill Tkachov
  2015-07-24  9:36         ` Richard Biener
  2015-07-24 10:11         ` Jakub Jelinek
@ 2015-07-24 10:25         ` Ramana Radhakrishnan
  2015-07-24 18:44           ` Jeff Law
  2 siblings, 1 reply; 17+ messages in thread
From: Ramana Radhakrishnan @ 2015-07-24 10:25 UTC (permalink / raw)
  To: Kyrill Tkachov; +Cc: Jakub Jelinek, GCC Patches, Richard Biener

>>
>> In expr.c, with TER you can detect such patterns, in this case when
>> expanding the comparison, but perhaps we want a *.pd file that would have
>> rules that would be only GIMPLE and only enabled in a special pass right
>> before (or very close to) expansion, that would perform such instruction
>> selection.
>
>
> Wild idea, but could it be considered to have target-specific
> match.pd files that can be included in the main match.pd?
>  That way, targets would get the benefit of getting
> the target-specific folding they benefit from at the very beginning
> of compilation without stepping on other targets toes.


The downside is preventing duplication, potentially reducing "generic"
improvements and a maintenance headache for gimple optimizers.

regards
Ramana


>
> Kyrill
>
>
>>
>>> Ok, I am not familiar with sparc64. The constant is just a 1
>>> in the sign bit orred with a continuous string of ones.
>>> That's usually cheap on aarch64 but may not be so on other targets.
>>
>> It has been a long time since I've done anything on sparc64, but you can
>> e.g. have a look at config/sparc/sparc.c (sparc_emit_set_const64)
>> to clearly see that not all constants are equal cost, some are much more
>> expensive.  Seems sparc_rtx_cost does not model this accurrately, but that
>> is a backend bug, so shouldn't affect the generic decisions.  Sparc does
>> not
>> have a moddi3 pattern, so your transformation might still be a win there,
>> except for -Os where it would be very bad code size pessimization.
>>
>> All I want to say is that on GIMPLE we usually perform transformations
>> where
>> simpler (fewer operations) is considered better, and for the same number
>> of
>> operations where one sequence might be better on one target and another on
>> another target, supposedly we want some other infrastructure.
>>
>>         Jakub
>>
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH][RFC][match.pd] optimize (X & C) == N when C is power of 2
  2015-07-24 10:25         ` Ramana Radhakrishnan
@ 2015-07-24 18:44           ` Jeff Law
  2015-07-27  8:17             ` Richard Biener
  0 siblings, 1 reply; 17+ messages in thread
From: Jeff Law @ 2015-07-24 18:44 UTC (permalink / raw)
  To: ramrad01, Kyrill Tkachov; +Cc: Jakub Jelinek, GCC Patches, Richard Biener

On 07/24/2015 03:44 AM, Ramana Radhakrishnan wrote:
>>>
>>> In expr.c, with TER you can detect such patterns, in this case when
>>> expanding the comparison, but perhaps we want a *.pd file that would have
>>> rules that would be only GIMPLE and only enabled in a special pass right
>>> before (or very close to) expansion, that would perform such instruction
>>> selection.
>>
>>
>> Wild idea, but could it be considered to have target-specific
>> match.pd files that can be included in the main match.pd?
>>   That way, targets would get the benefit of getting
>> the target-specific folding they benefit from at the very beginning
>> of compilation without stepping on other targets toes.
>
>
> The downside is preventing duplication, potentially reducing "generic"
> improvements and a maintenance headache for gimple optimizers.
So how about wedding the two ideas that have sprouted out of this 
discussion.  Specifically having a pass apply a target specific 
match.pd, but only do so at the end of the gimple optimization pipeline?

The design goal would (of course) be to change representations in ways 
that allow the gimple->rtl expanders to generate more efficient code for 
the target.

It avoids introducing the target bits early in the gimple pipeline, but 
still gives a clean way for targets to rewrite gimple for the benefit of 
gimple->rtl expansion.

jeff

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH][RFC][match.pd] optimize (X & C) == N when C is power of 2
  2015-07-24  8:21 [PATCH][RFC][match.pd] optimize (X & C) == N when C is power of 2 Kyrill Tkachov
  2015-07-24  8:36 ` Jakub Jelinek
@ 2015-07-25  2:37 ` Segher Boessenkool
  2015-07-27  8:23   ` Kyrill Tkachov
  1 sibling, 1 reply; 17+ messages in thread
From: Segher Boessenkool @ 2015-07-25  2:37 UTC (permalink / raw)
  To: Kyrill Tkachov; +Cc: GCC Patches, Richard Biener

On Fri, Jul 24, 2015 at 09:09:39AM +0100, Kyrill Tkachov wrote:
> This transformation folds (X % C) == N into
> X & ((1 << (size - 1)) | (C - 1))) == N
> for constants C and N where N is positive and C is a power of 2.

For N = 0 you can transform it to

	((unsigned)X % C) == N

and for 0 < N < C you can transform it to

	X > 0 && ((unsigned)X % C) == N          (or X >= 0)

and for -C < N < 0 it is

	X < 0 && ((unsigned)X % C) == N + C      (or X <= 0)

and for other N it is

	0.

For N not a constant, well, do you really care?  :-)

(That second case might eventually fold to your original expression).


Segher

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH][RFC][match.pd] optimize (X & C) == N when C is power of 2
  2015-07-24 18:44           ` Jeff Law
@ 2015-07-27  8:17             ` Richard Biener
  2015-07-27 16:00               ` Jeff Law
  0 siblings, 1 reply; 17+ messages in thread
From: Richard Biener @ 2015-07-27  8:17 UTC (permalink / raw)
  To: Jeff Law; +Cc: ramrad01, Kyrill Tkachov, Jakub Jelinek, GCC Patches

On Fri, 24 Jul 2015, Jeff Law wrote:

> On 07/24/2015 03:44 AM, Ramana Radhakrishnan wrote:
> > > > 
> > > > In expr.c, with TER you can detect such patterns, in this case when
> > > > expanding the comparison, but perhaps we want a *.pd file that would
> > > > have
> > > > rules that would be only GIMPLE and only enabled in a special pass right
> > > > before (or very close to) expansion, that would perform such instruction
> > > > selection.
> > > 
> > > 
> > > Wild idea, but could it be considered to have target-specific
> > > match.pd files that can be included in the main match.pd?
> > >   That way, targets would get the benefit of getting
> > > the target-specific folding they benefit from at the very beginning
> > > of compilation without stepping on other targets toes.
> > 
> > 
> > The downside is preventing duplication, potentially reducing "generic"
> > improvements and a maintenance headache for gimple optimizers.
> So how about wedding the two ideas that have sprouted out of this discussion.
> Specifically having a pass apply a target specific match.pd, but only do so at
> the end of the gimple optimization pipeline?
> 
> The design goal would (of course) be to change representations in ways that
> allow the gimple->rtl expanders to generate more efficient code for the
> target.
> 
> It avoids introducing the target bits early in the gimple pipeline, but still
> gives a clean way for targets to rewrite gimple for the benefit of gimple->rtl
> expansion.

I think it also aligns with the idea of pushing back RTL expansion and
expose some target specifics after another GIMPLE lowering phase.
I'm also thinking of addressing-mode selection and register promotion.

So at least if we think of that target specific match.pd pass as
containing all RTL expansion tricks done with TER only then it
should be quite simple to make it work.

Richard.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH][RFC][match.pd] optimize (X & C) == N when C is power of 2
  2015-07-25  2:37 ` Segher Boessenkool
@ 2015-07-27  8:23   ` Kyrill Tkachov
  2015-07-27 15:54     ` Segher Boessenkool
  0 siblings, 1 reply; 17+ messages in thread
From: Kyrill Tkachov @ 2015-07-27  8:23 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: GCC Patches, Richard Biener


On 25/07/15 03:19, Segher Boessenkool wrote:
> On Fri, Jul 24, 2015 at 09:09:39AM +0100, Kyrill Tkachov wrote:
>> This transformation folds (X % C) == N into
>> X & ((1 << (size - 1)) | (C - 1))) == N
>> for constants C and N where N is positive and C is a power of 2.
> For N = 0 you can transform it to
>
> 	((unsigned)X % C) == N
>
> and for 0 < N < C you can transform it to
>
> 	X > 0 && ((unsigned)X % C) == N          (or X >= 0)
>
> and for -C < N < 0 it is
>
> 	X < 0 && ((unsigned)X % C) == N + C      (or X <= 0)
>
> and for other N it is
>
> 	0.
>
> For N not a constant, well, do you really care?  :-)
>
> (That second case might eventually fold to your original expression).

Yeah, these avoid the potentially expensive mask, but introduce more operations,
which I believe may not be desirable at this stage.
Unless these transformations are ok for match.pd I'll try to implement this transformation
at RTL expansion time.

Thanks,
Kyrill

>
>
> Segher
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH][RFC][match.pd] optimize (X & C) == N when C is power of 2
  2015-07-27  8:23   ` Kyrill Tkachov
@ 2015-07-27 15:54     ` Segher Boessenkool
  0 siblings, 0 replies; 17+ messages in thread
From: Segher Boessenkool @ 2015-07-27 15:54 UTC (permalink / raw)
  To: Kyrill Tkachov; +Cc: GCC Patches, Richard Biener

On Mon, Jul 27, 2015 at 09:11:12AM +0100, Kyrill Tkachov wrote:
> On 25/07/15 03:19, Segher Boessenkool wrote:
> >On Fri, Jul 24, 2015 at 09:09:39AM +0100, Kyrill Tkachov wrote:
> >>This transformation folds (X % C) == N into
> >>X & ((1 << (size - 1)) | (C - 1))) == N
> >>for constants C and N where N is positive and C is a power of 2.
> >For N = 0 you can transform it to
> >
> >	((unsigned)X % C) == N
> >
> >and for 0 < N < C you can transform it to
> >
> >	X > 0 && ((unsigned)X % C) == N          (or X >= 0)
> >
> >and for -C < N < 0 it is
> >
> >	X < 0 && ((unsigned)X % C) == N + C      (or X <= 0)
> >
> >and for other N it is
> >
> >	0.
> >
> >For N not a constant, well, do you really care?  :-)
> >
> >(That second case might eventually fold to your original expression).
> 
> Yeah, these avoid the potentially expensive mask,

Fun fact: the current code ends up using the exact same mask, for some
targets.

> but introduce more operations,
> which I believe may not be desirable at this stage.

It is getting rid of the (expensive) division/modulo.  In many cases it
could get rid of the sign test, or hoist it to some outer structure, hard
to test here though (at least, I have no idea how to do that).

> Unless these transformations are ok for match.pd I'll try to implement this 
> transformation
> at RTL expansion time.

If you have to do conditional jumps, the RTL optimisers will not be able
to do very much :-(


Segher

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH][RFC][match.pd] optimize (X & C) == N when C is power of 2
  2015-07-27  8:17             ` Richard Biener
@ 2015-07-27 16:00               ` Jeff Law
  0 siblings, 0 replies; 17+ messages in thread
From: Jeff Law @ 2015-07-27 16:00 UTC (permalink / raw)
  To: Richard Biener; +Cc: ramrad01, Kyrill Tkachov, Jakub Jelinek, GCC Patches

On 07/27/2015 01:36 AM, Richard Biener wrote:
>
> I think it also aligns with the idea of pushing back RTL expansion and
> expose some target specifics after another GIMPLE lowering phase.
> I'm also thinking of addressing-mode selection and register promotion.
>
> So at least if we think of that target specific match.pd pass as
> containing all RTL expansion tricks done with TER only then it
> should be quite simple to make it work.
Yea -- I was also thinking it might allow us to clean up some of the 
actual expansion code as well.  It seems promising enough for some 
experimentation.

jeff

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2015-07-27 15:54 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-24  8:21 [PATCH][RFC][match.pd] optimize (X & C) == N when C is power of 2 Kyrill Tkachov
2015-07-24  8:36 ` Jakub Jelinek
2015-07-24  8:54   ` Kyrill Tkachov
2015-07-24  9:09     ` Richard Biener
2015-07-24  9:09       ` Kyrill Tkachov
2015-07-24  9:22         ` Ramana Radhakrishnan
2015-07-24  9:16     ` Jakub Jelinek
2015-07-24  9:31       ` Kyrill Tkachov
2015-07-24  9:36         ` Richard Biener
2015-07-24 10:11         ` Jakub Jelinek
2015-07-24 10:25         ` Ramana Radhakrishnan
2015-07-24 18:44           ` Jeff Law
2015-07-27  8:17             ` Richard Biener
2015-07-27 16:00               ` Jeff Law
2015-07-25  2:37 ` Segher Boessenkool
2015-07-27  8:23   ` Kyrill Tkachov
2015-07-27 15:54     ` Segher Boessenkool

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).