* [PATCH] match.pd: Add ~(X - Y) -> ~X + Y simplification [PR96685]
@ 2020-12-12 8:10 Jakub Jelinek
2020-12-12 10:58 ` Richard Biener
` (2 more replies)
0 siblings, 3 replies; 13+ messages in thread
From: Jakub Jelinek @ 2020-12-12 8:10 UTC (permalink / raw)
To: Richard Biener; +Cc: gcc-patches
Hi!
This patch adds the ~(X - Y) -> ~X + Y simplification requested
in the PR (plus also ~(X + C) -> ~X + (-C) for constants C that can
be safely negated.
The first two simplify blocks is what has been requested in the PR
and that makes the first testcase pass.
Unfortunately, that change also breaks the second testcase, because
while the same expressions appearing in the same stmt and split
across multiple stmts has been folded (not really) before, with
this optimization fold-const.c optimizes ~X + Y further into
(Y - X) - 1 in fold_binary_loc associate: code, but we have nothing
like that in GIMPLE and so end up with different expressions.
The last simplify is an attempt to deal with just this case,
had to rule out there the Y == -1U case, because then we
reached infinite recursion as ~X + -1U was canonicalized by
the pattern into (-1U - X) + -1U but there is a canonicalization
-1 - A -> ~A that turns it back. Furthermore, had to make it
#if GIMPLE only, because it otherwise resulted in infinite recursion
when interacting with the associate: optimization.
The end result is that we pass all 3 testcases and thus canonizalize
the 3 possible forms of writing the same thing.
Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
2020-12-12 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/96685
* match.pd (~(X - Y) -> ~X + Y): New optimization.
(~X + Y -> (Y - X) - 1): Likewise.
* gcc.dg/tree-ssa/pr96685-1.c: New test.
* gcc.dg/tree-ssa/pr96685-2.c: New test.
* gcc.dg/tree-ssa/pr96685-3.c: New test.
--- gcc/match.pd.jj 2020-12-11 16:41:45.797787831 +0100
+++ gcc/match.pd 2020-12-11 23:12:45.769291586 +0100
@@ -1074,6 +1074,34 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(bit_not (plus:c (bit_not @0) @1))
(minus @0 @1))
+/* ~(X - Y) -> ~X + Y. */
+(simplify
+ (bit_not (minus:s @0 @1))
+ (plus (bit_not @0) @1))
+(simplify
+ (bit_not (plus:s @0 INTEGER_CST@1))
+ (if ((INTEGRAL_TYPE_P (type)
+ && TYPE_UNSIGNED (type))
+ || (!TYPE_OVERFLOW_SANITIZED (type)
+ && may_negate_without_overflow_p (@1)))
+ (plus (bit_not @0) { const_unop (NEGATE_EXPR, type, @1); })))
+
+#if GIMPLE
+/* ~X + Y -> (Y - X) - 1. */
+(simplify
+ (plus:c (bit_not @0) @1)
+ (if (ANY_INTEGRAL_TYPE_P (type)
+ && TYPE_OVERFLOW_WRAPS (type)
+ /* -1 - X is folded to ~X, so we'd recurse endlessly. */
+ && !integer_all_onesp (@1))
+ (plus (minus @1 @0) { build_minus_one_cst (type); })
+ (if (INTEGRAL_TYPE_P (type)
+ && TREE_CODE (@1) == INTEGER_CST
+ && wi::to_wide (@1) != wi::min_value (TYPE_PRECISION (type),
+ SIGNED))
+ (minus (plus @1 { build_minus_one_cst (type); }) @0))))
+#endif
+
/* x + (x & 1) -> (x + 1) & ~1 */
(simplify
(plus:c @0 (bit_and:s @0 integer_onep@1))
--- gcc/testsuite/gcc.dg/tree-ssa/pr96685-1.c.jj 2020-12-11 16:42:03.975584838 +0100
+++ gcc/testsuite/gcc.dg/tree-ssa/pr96685-1.c 2020-12-11 16:42:03.975584838 +0100
@@ -0,0 +1,52 @@
+/* PR tree-optimization/96685 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-times "return 1;" 6 "optimized" } } */
+
+unsigned
+f1 (unsigned x, unsigned y)
+{
+ unsigned a = ~(x - y);
+ unsigned b = ~x + y;
+ return a == b;
+}
+
+unsigned
+f2 (unsigned x)
+{
+ unsigned a = ~(x + -124U);
+ unsigned b = ~x + 124U;
+ return a == b;
+}
+
+unsigned
+f3 (unsigned x)
+{
+ unsigned a = ~(x + 124U);
+ unsigned b = ~x + -124U;
+ return a == b;
+}
+
+int
+f4 (int x, int y)
+{
+ int a = ~(x - y);
+ int b = ~x + y;
+ return a == b;
+}
+
+int
+f5 (int x)
+{
+ int a = ~(x + -124);
+ int b = ~x + 124;
+ return a == b;
+}
+
+int
+f6 (int x)
+{
+ int a = ~(x + 124);
+ int b = ~x + -124;
+ return a == b;
+}
--- gcc/testsuite/gcc.dg/tree-ssa/pr96685-2.c.jj 2020-12-11 16:42:03.975584838 +0100
+++ gcc/testsuite/gcc.dg/tree-ssa/pr96685-2.c 2020-12-11 16:42:03.975584838 +0100
@@ -0,0 +1,40 @@
+/* PR tree-optimization/96685 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-times "return 1;" 4 "optimized" } } */
+
+int
+f1 (unsigned x, unsigned y)
+{
+ unsigned int r1 = (x - y);
+ r1 = ~r1;
+ unsigned int r2 = ~(x - y);
+ return r1 == r2;
+}
+
+int
+f2 (unsigned x, unsigned y)
+{
+ unsigned int r1 = (x - 23);
+ r1 = ~r1;
+ unsigned int r2 = ~(x - 23);
+ return r1 == r2;
+}
+
+int
+f3 (int x, int y)
+{
+ int r1 = (x - y);
+ r1 = ~r1;
+ int r2 = ~(x - y);
+ return r1 == r2;
+}
+
+int
+f4 (int x, int y)
+{
+ int r1 = (x - 23);
+ r1 = ~r1;
+ int r2 = ~(x - 23);
+ return r1 == r2;
+}
--- gcc/testsuite/gcc.dg/tree-ssa/pr96685-3.c.jj 2020-12-11 17:35:09.536123227 +0100
+++ gcc/testsuite/gcc.dg/tree-ssa/pr96685-3.c 2020-12-11 17:34:31.911543423 +0100
@@ -0,0 +1,43 @@
+/* PR tree-optimization/96685 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-times "return 1;" 4 "optimized" } } */
+
+int
+f1 (unsigned x, unsigned y)
+{
+ unsigned int r1 = (x - y);
+ r1 = ~r1;
+ unsigned int r2 = (y - x);
+ r2 = r2 - 1;
+ return r1 == r2;
+}
+
+int
+f2 (unsigned x, unsigned y)
+{
+ unsigned int r1 = (x - 23);
+ r1 = ~r1;
+ unsigned int r2 = (23 - x);
+ r2 = r2 - 1;
+ return r1 == r2;
+}
+
+int
+f3 (int x, int y)
+{
+ int r1 = (x - 23);
+ r1 = ~r1;
+ int r2 = (23 - x);
+ --r2;
+ return r1 == r2;
+}
+
+int
+f4 (int x, int y)
+{
+ int r1 = (x - 23);
+ r1 = ~r1;
+ int r2 = (22 - x);
+ return r1 == r2;
+}
Jakub
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] match.pd: Add ~(X - Y) -> ~X + Y simplification [PR96685]
2020-12-12 8:10 [PATCH] match.pd: Add ~(X - Y) -> ~X + Y simplification [PR96685] Jakub Jelinek
@ 2020-12-12 10:58 ` Richard Biener
2020-12-12 12:25 ` Marc Glisse
2020-12-21 2:50 ` Maciej W. Rozycki
2 siblings, 0 replies; 13+ messages in thread
From: Richard Biener @ 2020-12-12 10:58 UTC (permalink / raw)
To: Jakub Jelinek; +Cc: gcc-patches
On December 12, 2020 9:10:41 AM GMT+01:00, Jakub Jelinek <jakub@redhat.com> wrote:
>Hi!
>
>This patch adds the ~(X - Y) -> ~X + Y simplification requested
>in the PR (plus also ~(X + C) -> ~X + (-C) for constants C that can
>be safely negated.
>
>The first two simplify blocks is what has been requested in the PR
>and that makes the first testcase pass.
>Unfortunately, that change also breaks the second testcase, because
>while the same expressions appearing in the same stmt and split
>across multiple stmts has been folded (not really) before, with
>this optimization fold-const.c optimizes ~X + Y further into
>(Y - X) - 1 in fold_binary_loc associate: code, but we have nothing
>like that in GIMPLE and so end up with different expressions.
>
>The last simplify is an attempt to deal with just this case,
>had to rule out there the Y == -1U case, because then we
>reached infinite recursion as ~X + -1U was canonicalized by
>the pattern into (-1U - X) + -1U but there is a canonicalization
>-1 - A -> ~A that turns it back. Furthermore, had to make it
>#if GIMPLE only, because it otherwise resulted in infinite recursion
>when interacting with the associate: optimization.
>The end result is that we pass all 3 testcases and thus canonizalize
>the 3 possible forms of writing the same thing.
>
>Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
Ok.
Thanks,
Richard.
>2020-12-12 Jakub Jelinek <jakub@redhat.com>
>
> PR tree-optimization/96685
> * match.pd (~(X - Y) -> ~X + Y): New optimization.
> (~X + Y -> (Y - X) - 1): Likewise.
>
> * gcc.dg/tree-ssa/pr96685-1.c: New test.
> * gcc.dg/tree-ssa/pr96685-2.c: New test.
> * gcc.dg/tree-ssa/pr96685-3.c: New test.
>
>--- gcc/match.pd.jj 2020-12-11 16:41:45.797787831 +0100
>+++ gcc/match.pd 2020-12-11 23:12:45.769291586 +0100
>@@ -1074,6 +1074,34 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> (bit_not (plus:c (bit_not @0) @1))
> (minus @0 @1))
>
>+/* ~(X - Y) -> ~X + Y. */
>+(simplify
>+ (bit_not (minus:s @0 @1))
>+ (plus (bit_not @0) @1))
>+(simplify
>+ (bit_not (plus:s @0 INTEGER_CST@1))
>+ (if ((INTEGRAL_TYPE_P (type)
>+ && TYPE_UNSIGNED (type))
>+ || (!TYPE_OVERFLOW_SANITIZED (type)
>+ && may_negate_without_overflow_p (@1)))
>+ (plus (bit_not @0) { const_unop (NEGATE_EXPR, type, @1); })))
>+
>+#if GIMPLE
>+/* ~X + Y -> (Y - X) - 1. */
>+(simplify
>+ (plus:c (bit_not @0) @1)
>+ (if (ANY_INTEGRAL_TYPE_P (type)
>+ && TYPE_OVERFLOW_WRAPS (type)
>+ /* -1 - X is folded to ~X, so we'd recurse endlessly. */
>+ && !integer_all_onesp (@1))
>+ (plus (minus @1 @0) { build_minus_one_cst (type); })
>+ (if (INTEGRAL_TYPE_P (type)
>+ && TREE_CODE (@1) == INTEGER_CST
>+ && wi::to_wide (@1) != wi::min_value (TYPE_PRECISION (type),
>+ SIGNED))
>+ (minus (plus @1 { build_minus_one_cst (type); }) @0))))
>+#endif
>+
> /* x + (x & 1) -> (x + 1) & ~1 */
> (simplify
> (plus:c @0 (bit_and:s @0 integer_onep@1))
>--- gcc/testsuite/gcc.dg/tree-ssa/pr96685-1.c.jj 2020-12-11
>16:42:03.975584838 +0100
>+++ gcc/testsuite/gcc.dg/tree-ssa/pr96685-1.c 2020-12-11
>16:42:03.975584838 +0100
>@@ -0,0 +1,52 @@
>+/* PR tree-optimization/96685 */
>+/* { dg-do compile } */
>+/* { dg-options "-O2 -fdump-tree-optimized" } */
>+/* { dg-final { scan-tree-dump-times "return 1;" 6 "optimized" } } */
>+
>+unsigned
>+f1 (unsigned x, unsigned y)
>+{
>+ unsigned a = ~(x - y);
>+ unsigned b = ~x + y;
>+ return a == b;
>+}
>+
>+unsigned
>+f2 (unsigned x)
>+{
>+ unsigned a = ~(x + -124U);
>+ unsigned b = ~x + 124U;
>+ return a == b;
>+}
>+
>+unsigned
>+f3 (unsigned x)
>+{
>+ unsigned a = ~(x + 124U);
>+ unsigned b = ~x + -124U;
>+ return a == b;
>+}
>+
>+int
>+f4 (int x, int y)
>+{
>+ int a = ~(x - y);
>+ int b = ~x + y;
>+ return a == b;
>+}
>+
>+int
>+f5 (int x)
>+{
>+ int a = ~(x + -124);
>+ int b = ~x + 124;
>+ return a == b;
>+}
>+
>+int
>+f6 (int x)
>+{
>+ int a = ~(x + 124);
>+ int b = ~x + -124;
>+ return a == b;
>+}
>--- gcc/testsuite/gcc.dg/tree-ssa/pr96685-2.c.jj 2020-12-11
>16:42:03.975584838 +0100
>+++ gcc/testsuite/gcc.dg/tree-ssa/pr96685-2.c 2020-12-11
>16:42:03.975584838 +0100
>@@ -0,0 +1,40 @@
>+/* PR tree-optimization/96685 */
>+/* { dg-do compile } */
>+/* { dg-options "-O2 -fdump-tree-optimized" } */
>+/* { dg-final { scan-tree-dump-times "return 1;" 4 "optimized" } } */
>+
>+int
>+f1 (unsigned x, unsigned y)
>+{
>+ unsigned int r1 = (x - y);
>+ r1 = ~r1;
>+ unsigned int r2 = ~(x - y);
>+ return r1 == r2;
>+}
>+
>+int
>+f2 (unsigned x, unsigned y)
>+{
>+ unsigned int r1 = (x - 23);
>+ r1 = ~r1;
>+ unsigned int r2 = ~(x - 23);
>+ return r1 == r2;
>+}
>+
>+int
>+f3 (int x, int y)
>+{
>+ int r1 = (x - y);
>+ r1 = ~r1;
>+ int r2 = ~(x - y);
>+ return r1 == r2;
>+}
>+
>+int
>+f4 (int x, int y)
>+{
>+ int r1 = (x - 23);
>+ r1 = ~r1;
>+ int r2 = ~(x - 23);
>+ return r1 == r2;
>+}
>--- gcc/testsuite/gcc.dg/tree-ssa/pr96685-3.c.jj 2020-12-11
>17:35:09.536123227 +0100
>+++ gcc/testsuite/gcc.dg/tree-ssa/pr96685-3.c 2020-12-11
>17:34:31.911543423 +0100
>@@ -0,0 +1,43 @@
>+/* PR tree-optimization/96685 */
>+/* { dg-do compile } */
>+/* { dg-options "-O2 -fdump-tree-optimized" } */
>+/* { dg-final { scan-tree-dump-times "return 1;" 4 "optimized" } } */
>+
>+int
>+f1 (unsigned x, unsigned y)
>+{
>+ unsigned int r1 = (x - y);
>+ r1 = ~r1;
>+ unsigned int r2 = (y - x);
>+ r2 = r2 - 1;
>+ return r1 == r2;
>+}
>+
>+int
>+f2 (unsigned x, unsigned y)
>+{
>+ unsigned int r1 = (x - 23);
>+ r1 = ~r1;
>+ unsigned int r2 = (23 - x);
>+ r2 = r2 - 1;
>+ return r1 == r2;
>+}
>+
>+int
>+f3 (int x, int y)
>+{
>+ int r1 = (x - 23);
>+ r1 = ~r1;
>+ int r2 = (23 - x);
>+ --r2;
>+ return r1 == r2;
>+}
>+
>+int
>+f4 (int x, int y)
>+{
>+ int r1 = (x - 23);
>+ r1 = ~r1;
>+ int r2 = (22 - x);
>+ return r1 == r2;
>+}
>
> Jakub
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] match.pd: Add ~(X - Y) -> ~X + Y simplification [PR96685]
2020-12-12 8:10 [PATCH] match.pd: Add ~(X - Y) -> ~X + Y simplification [PR96685] Jakub Jelinek
2020-12-12 10:58 ` Richard Biener
@ 2020-12-12 12:25 ` Marc Glisse
2020-12-12 12:41 ` Jakub Jelinek
2020-12-21 2:50 ` Maciej W. Rozycki
2 siblings, 1 reply; 13+ messages in thread
From: Marc Glisse @ 2020-12-12 12:25 UTC (permalink / raw)
To: Jakub Jelinek; +Cc: Richard Biener, gcc-patches
On Sat, 12 Dec 2020, Jakub Jelinek via Gcc-patches wrote:
> This patch adds the ~(X - Y) -> ~X + Y simplification requested
> in the PR (plus also ~(X + C) -> ~X + (-C) for constants C that can
> be safely negated.
Would it have been wrong to produce ~X - C without caring about negating
(and then extending it to non-constants)?
I wonder if this makes
/* ~(~X - Y) -> X + Y and ~(~X + Y) -> X - Y. */
useless.
--
Marc Glisse
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] match.pd: Add ~(X - Y) -> ~X + Y simplification [PR96685]
2020-12-12 12:25 ` Marc Glisse
@ 2020-12-12 12:41 ` Jakub Jelinek
2020-12-12 13:00 ` Marc Glisse
0 siblings, 1 reply; 13+ messages in thread
From: Jakub Jelinek @ 2020-12-12 12:41 UTC (permalink / raw)
To: gcc-patches; +Cc: Richard Biener
On Sat, Dec 12, 2020 at 01:25:39PM +0100, Marc Glisse wrote:
> On Sat, 12 Dec 2020, Jakub Jelinek via Gcc-patches wrote:
>
> > This patch adds the ~(X - Y) -> ~X + Y simplification requested
> > in the PR (plus also ~(X + C) -> ~X + (-C) for constants C that can
> > be safely negated.
>
> Would it have been wrong to produce ~X - C without caring about negating
> (and then extending it to non-constants)?
Extending it to non-constants is what I wanted to avoid.
For ~(X + Y), because + is commutative, it wouldn't be a canonicalization
as it would pick more-less randomly whether to do ~X + Y or X + ~Y.
> I wonder if this makes
> /* ~(~X - Y) -> X + Y and ~(~X + Y) -> X - Y. */
> useless.
Maybe the former, but not the latter.
Jakub
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] match.pd: Add ~(X - Y) -> ~X + Y simplification [PR96685]
2020-12-12 12:41 ` Jakub Jelinek
@ 2020-12-12 13:00 ` Marc Glisse
2020-12-12 13:05 ` Jakub Jelinek
0 siblings, 1 reply; 13+ messages in thread
From: Marc Glisse @ 2020-12-12 13:00 UTC (permalink / raw)
To: Jakub Jelinek; +Cc: gcc-patches, Richard Biener
On Sat, 12 Dec 2020, Jakub Jelinek via Gcc-patches wrote:
> On Sat, Dec 12, 2020 at 01:25:39PM +0100, Marc Glisse wrote:
>> On Sat, 12 Dec 2020, Jakub Jelinek via Gcc-patches wrote:
>>
>>> This patch adds the ~(X - Y) -> ~X + Y simplification requested
>>> in the PR (plus also ~(X + C) -> ~X + (-C) for constants C that can
>>> be safely negated.
>>
>> Would it have been wrong to produce ~X - C without caring about negating
>> (and then extending it to non-constants)?
>
> Extending it to non-constants is what I wanted to avoid.
> For ~(X + Y), because + is commutative, it wouldn't be a canonicalization
> as it would pick more-less randomly whether to do ~X + Y or X + ~Y.
~X - Y or ~Y - X I guess.
Ok, I understand. But then in the constant case, why produce ~X + -C
instead of ~X - C (which I think doesn't need to care about negating), or
even ~C - X (one less operation)? Or do we already have a transformation
from ~X - C to ~C - X?
--
Marc Glisse
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] match.pd: Add ~(X - Y) -> ~X + Y simplification [PR96685]
2020-12-12 13:00 ` Marc Glisse
@ 2020-12-12 13:05 ` Jakub Jelinek
0 siblings, 0 replies; 13+ messages in thread
From: Jakub Jelinek @ 2020-12-12 13:05 UTC (permalink / raw)
To: gcc-patches; +Cc: Richard Biener
On Sat, Dec 12, 2020 at 02:00:49PM +0100, Marc Glisse wrote:
> > Extending it to non-constants is what I wanted to avoid.
> > For ~(X + Y), because + is commutative, it wouldn't be a canonicalization
> > as it would pick more-less randomly whether to do ~X + Y or X + ~Y.
>
> ~X - Y or ~Y - X I guess.
>
> Ok, I understand. But then in the constant case, why produce ~X + -C instead
> of ~X - C (which I think doesn't need to care about negating),
Because we immediately simplify it again into ~X + (-C) then.
Jakub
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] match.pd: Add ~(X - Y) -> ~X + Y simplification [PR96685]
2020-12-12 8:10 [PATCH] match.pd: Add ~(X - Y) -> ~X + Y simplification [PR96685] Jakub Jelinek
2020-12-12 10:58 ` Richard Biener
2020-12-12 12:25 ` Marc Glisse
@ 2020-12-21 2:50 ` Maciej W. Rozycki
2020-12-21 7:54 ` Jakub Jelinek
2 siblings, 1 reply; 13+ messages in thread
From: Maciej W. Rozycki @ 2020-12-21 2:50 UTC (permalink / raw)
To: Jakub Jelinek; +Cc: Richard Biener, gcc-patches
On Sat, 12 Dec 2020, Jakub Jelinek via Gcc-patches wrote:
> This patch adds the ~(X - Y) -> ~X + Y simplification requested
> in the PR (plus also ~(X + C) -> ~X + (-C) for constants C that can
> be safely negated.
This regresses VAX code produced by the cmpelim-eq-notsi.c test case (and
its similar counterparts) with the `vax-netbsdelf' target.
Previously this assembly:
.text
.align 1
.globl eq_notsi
.type eq_notsi, @function
eq_notsi:
.word 0 # 35 [c=0] procedure_entry_mask
subl2 $4,%sp # 46 [c=32] *addsi3
mcoml 4(%ap),%r0 # 32 [c=16] *one_cmplsi2_ccz
jeql .L1 # 34 [c=26] *branch_ccz
addl2 $2,%r0 # 31 [c=32] *addsi3
.L1:
ret # 40 [c=0] return
.size eq_notsi, .-eq_notsi
was produced. Now this:
.text
.align 1
.globl eq_notsi
.type eq_notsi, @function
eq_notsi:
.word 0 # 36 [c=0] procedure_entry_mask
subl2 $4,%sp # 48 [c=32] *addsi3
movl 4(%ap),%r0 # 33 [c=16] *movsi_2
cmpl %r0,$-1 # 34 [c=8] *cmpsi_ccz/1
jeql .L3 # 35 [c=26] *branch_ccz
subl3 %r0,$1,%r0 # 32 [c=32] *subsi3/1
ret # 27 [c=0] return
.L3:
clrl %r0 # 31 [c=2] *movsi_2
ret # 41 [c=0] return
.size eq_notsi, .-eq_notsi
is, which is clearly worse, both in terms of performance and size.
The key here is that the cost of constant 0, here used with a comparison
operation eliminated after MCOML in the former assembly sequence, is lower
(as per `vax_rtx_costs') in the VAX ISA than the cost of constant -1, used
with CMPL in the latter sequence. Not only constant 0 is an implied
operand with some machine instructions saving cycles and space otherwise
used for an explicitly encoded operand, but if used with a comparison
operation it can usually be eliminated, so it should be preferred over all
other constants.
With the example you gave with the PR I can see progression with f3, f4,
f7, f8, regression with f1, f2, and no change in operation cost with f5,
f6.
Shouldn't a transformation like this respect target-specific expression
costs somehow then? Depending on the individual case one form or the
other might be cheaper, and somehow we assume here both are equivalent in
terms of performance and/or code size (as applicable for the optimisation
mode chosen).
Maciej
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] match.pd: Add ~(X - Y) -> ~X + Y simplification [PR96685]
2020-12-21 2:50 ` Maciej W. Rozycki
@ 2020-12-21 7:54 ` Jakub Jelinek
2021-01-10 0:43 ` Maciej W. Rozycki
0 siblings, 1 reply; 13+ messages in thread
From: Jakub Jelinek @ 2020-12-21 7:54 UTC (permalink / raw)
To: Maciej W. Rozycki; +Cc: gcc-patches, Richard Biener
On Mon, Dec 21, 2020 at 02:50:24AM +0000, Maciej W. Rozycki wrote:
> On Sat, 12 Dec 2020, Jakub Jelinek via Gcc-patches wrote:
>
> > This patch adds the ~(X - Y) -> ~X + Y simplification requested
> > in the PR (plus also ~(X + C) -> ~X + (-C) for constants C that can
> > be safely negated.
>
> This regresses VAX code produced by the cmpelim-eq-notsi.c test case (and
> its similar counterparts) with the `vax-netbsdelf' target.
The point of the match.pd changes is to canonicalize GIMPLE on some form
when there are several from GIMPLE POV equivalent or better forms of writing
the same thing. The advantage of having one canonical way is that ICF,
SCCVN etc. optimizations can then understand the different forms are
equivalent.
If another form is then better for a particular machine, it should be done
either during expansion (being able to produce both RTLs and computing their
costs), or during combine with either combine splitters or
define_insn_and_split in the backend, or, if it can't be done in RTL, during
the isel pass.
Jakub
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] match.pd: Add ~(X - Y) -> ~X + Y simplification [PR96685]
2020-12-21 7:54 ` Jakub Jelinek
@ 2021-01-10 0:43 ` Maciej W. Rozycki
2021-01-11 19:22 ` Jeff Law
0 siblings, 1 reply; 13+ messages in thread
From: Maciej W. Rozycki @ 2021-01-10 0:43 UTC (permalink / raw)
To: Jakub Jelinek; +Cc: gcc-patches, Richard Biener, Jeff Law
On Mon, 21 Dec 2020, Jakub Jelinek wrote:
> > > This patch adds the ~(X - Y) -> ~X + Y simplification requested
> > > in the PR (plus also ~(X + C) -> ~X + (-C) for constants C that can
> > > be safely negated.
> >
> > This regresses VAX code produced by the cmpelim-eq-notsi.c test case (and
> > its similar counterparts) with the `vax-netbsdelf' target.
>
> The point of the match.pd changes is to canonicalize GIMPLE on some form
> when there are several from GIMPLE POV equivalent or better forms of writing
> the same thing. The advantage of having one canonical way is that ICF,
> SCCVN etc. optimizations can then understand the different forms are
> equivalent.
Fair enough, though in cases like this I think it is unclear which of the
two forms is going to be ultimately better, especially as it may depend on
the exact form of the operands used, e.g. values of any immediates, so I
think a way to make the reverse transformation (whether to undo one made
here or genuinely) needs to be available at a later compilation stage.
One size doesn't fit all.
With this in mind...
> If another form is then better for a particular machine, it should be done
> either during expansion (being able to produce both RTLs and computing their
> costs), or during combine with either combine splitters or
> define_insn_and_split in the backend, or, if it can't be done in RTL, during
> the isel pass.
Hmm, maybe it has been discussed before, so please forgive me if I write
something silly, but it seems to me like this should be done in a generic
way like match.pd so that all the targets do not have to track the changes
made there and then perhaps repeat the same or similar code each. So I
think it would make sense to make a change like this include that reverse
transformation as well, so that ultimately both forms are tried with RTL,
as there is no clear advantage to either here.
However expand AFAICT essentially boils down to `expand_expr_real_2',
which uses a bijective mapping between individual operations in the tree
and the RTL form respectively rather than sequences as with the unary one
complement and an addition/subtraction covered here, for two operations
total. So this cannot be done with expand, by design, unless I'm missing
something.
So from the look at combine code I infer the way to handle this within
the existing infrastructure would be with `try_combine'/`find_split_point'
(as we have one sequence of insns to be replaced with another sequence
rather than combined into a single insn), right?
Overall it seems to me it would be the best if we were able to produce a
reverse RTL transformation automatically straight from match.pd, however I
am not sure whether the syntax used is suitable for reverse interpretation
in the general case. And in any case this would be a major effort, but it
would be hugely beneficial, as we have lots of knowledge already collected
in match.pd.
Have I missed anything here?
NB I've seen it often enough already to irritate me that someone claims
we ought to do something "because LLVM does it". Well, if someone wants
things done the LLVM way, then there's LLVM readily available. If there
is a technical advantage in doing something, then surely we ought to do
that, however whether LLVM does it too or does not is I think irrelevant.
Maciej
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] match.pd: Add ~(X - Y) -> ~X + Y simplification [PR96685]
2021-01-10 0:43 ` Maciej W. Rozycki
@ 2021-01-11 19:22 ` Jeff Law
2021-01-12 8:42 ` Richard Biener
0 siblings, 1 reply; 13+ messages in thread
From: Jeff Law @ 2021-01-11 19:22 UTC (permalink / raw)
To: Maciej W. Rozycki, Jakub Jelinek; +Cc: gcc-patches, Richard Biener
On 1/9/21 5:43 PM, Maciej W. Rozycki wrote:
> On Mon, 21 Dec 2020, Jakub Jelinek wrote:
>
>>>> This patch adds the ~(X - Y) -> ~X + Y simplification requested
>>>> in the PR (plus also ~(X + C) -> ~X + (-C) for constants C that can
>>>> be safely negated.
>>> This regresses VAX code produced by the cmpelim-eq-notsi.c test case (and
>>> its similar counterparts) with the `vax-netbsdelf' target.
>> The point of the match.pd changes is to canonicalize GIMPLE on some form
>> when there are several from GIMPLE POV equivalent or better forms of writing
>> the same thing. The advantage of having one canonical way is that ICF,
>> SCCVN etc. optimizations can then understand the different forms are
>> equivalent.
> Fair enough, though in cases like this I think it is unclear which of the
> two forms is going to be ultimately better, especially as it may depend on
> the exact form of the operands used, e.g. values of any immediates, so I
> think a way to make the reverse transformation (whether to undo one made
> here or genuinely) needs to be available at a later compilation stage.
> One size doesn't fit all.
>
> With this in mind...
So in this case the number of operations are the same before/after and
parallelism is the same before/after, register lifetimes, etc. I doubt
either form is particularly better suited for CSE or gives better VRP
data, etc. The fact that we can't always do ~(X +C) -> ~X + -C
probably argues against that form ever so slightly.
>
>> If another form is then better for a particular machine, it should be done
>> either during expansion (being able to produce both RTLs and computing their
>> costs), or during combine with either combine splitters or
>> define_insn_and_split in the backend, or, if it can't be done in RTL, during
>> the isel pass.
> Hmm, maybe it has been discussed before, so please forgive me if I write
> something silly, but it seems to me like this should be done in a generic
> way like match.pd so that all the targets do not have to track the changes
> made there and then perhaps repeat the same or similar code each. So I
> think it would make sense to make a change like this include that reverse
> transformation as well, so that ultimately both forms are tried with RTL,
> as there is no clear advantage to either here.
The idea we've kicked around in the past was to use the same syntax as
match.pd, but have it be target dependent to reform expressions in ways
that are beneficial to the target and have it run at the end of the
gimple/ssa pipeline. Nobody's implemented this though.
jeff
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] match.pd: Add ~(X - Y) -> ~X + Y simplification [PR96685]
2021-01-11 19:22 ` Jeff Law
@ 2021-01-12 8:42 ` Richard Biener
2021-01-14 14:57 ` Maciej W. Rozycki
0 siblings, 1 reply; 13+ messages in thread
From: Richard Biener @ 2021-01-12 8:42 UTC (permalink / raw)
To: Jeff Law; +Cc: Maciej W. Rozycki, Jakub Jelinek, gcc-patches
On Mon, 11 Jan 2021, Jeff Law wrote:
>
>
> On 1/9/21 5:43 PM, Maciej W. Rozycki wrote:
>> On Mon, 21 Dec 2020, Jakub Jelinek wrote:
>>
>>>>> This patch adds the ~(X - Y) -> ~X + Y simplification requested
>>>>> in the PR (plus also ~(X + C) -> ~X + (-C) for constants C that can
>>>>> be safely negated.
>>>> This regresses VAX code produced by the cmpelim-eq-notsi.c test case (and
>>>> its similar counterparts) with the `vax-netbsdelf' target.
>>> The point of the match.pd changes is to canonicalize GIMPLE on some form
>>> when there are several from GIMPLE POV equivalent or better forms of writing
>>> the same thing. The advantage of having one canonical way is that ICF,
>>> SCCVN etc. optimizations can then understand the different forms are
>>> equivalent.
>> Fair enough, though in cases like this I think it is unclear which of the
>> two forms is going to be ultimately better, especially as it may depend on
>> the exact form of the operands used, e.g. values of any immediates, so I
>> think a way to make the reverse transformation (whether to undo one made
>> here or genuinely) needs to be available at a later compilation stage.
>> One size doesn't fit all.
>>
>> With this in mind...
> So in this case the number of operations are the same before/after and
> parallelism is the same before/after, register lifetimes, etc. I doubt
> either form is particularly better suited for CSE or gives better VRP
> data, etc. The fact that we can't always do ~(X +C) -> ~X + -C
> probably argues against that form ever so slightly.
>
>
>>
>>> If another form is then better for a particular machine, it should be done
>>> either during expansion (being able to produce both RTLs and computing their
>>> costs), or during combine with either combine splitters or
>>> define_insn_and_split in the backend, or, if it can't be done in RTL, during
>>> the isel pass.
>> Hmm, maybe it has been discussed before, so please forgive me if I write
>> something silly, but it seems to me like this should be done in a generic
>> way like match.pd so that all the targets do not have to track the changes
>> made there and then perhaps repeat the same or similar code each. So I
>> think it would make sense to make a change like this include that reverse
>> transformation as well, so that ultimately both forms are tried with RTL,
>> as there is no clear advantage to either here.
> The idea we've kicked around in the past was to use the same syntax as
> match.pd, but have it be target dependent to reform expressions in ways
> that are beneficial to the target and have it run at the end of the
> gimple/ssa pipeline. Nobody's implemented this though.
Yes. And while a gimple-to-gimple transform is indeed quite simple
to make eventually a match.pd-like gimple-to-RTL would be more
useful in the end. Note RTL can eventually be emulated close enough
via the use of internal functions mapping to optabs. But still
complex combined instructions will need expander help unless we
expose all named instruction RTL patterns as target specific
internal functions to use from that .pd file.
Richard.
> jeff
>
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] match.pd: Add ~(X - Y) -> ~X + Y simplification [PR96685]
2021-01-12 8:42 ` Richard Biener
@ 2021-01-14 14:57 ` Maciej W. Rozycki
2021-01-14 15:03 ` Richard Biener
0 siblings, 1 reply; 13+ messages in thread
From: Maciej W. Rozycki @ 2021-01-14 14:57 UTC (permalink / raw)
To: Richard Biener; +Cc: Jeff Law, Jakub Jelinek, gcc-patches
On Tue, 12 Jan 2021, Richard Biener wrote:
> >>> The point of the match.pd changes is to canonicalize GIMPLE on some form
> >>> when there are several from GIMPLE POV equivalent or better forms of
> >>> writing
> >>> the same thing. The advantage of having one canonical way is that ICF,
> >>> SCCVN etc. optimizations can then understand the different forms are
> >>> equivalent.
> >> Fair enough, though in cases like this I think it is unclear which of the
> >> two forms is going to be ultimately better, especially as it may depend on
> >> the exact form of the operands used, e.g. values of any immediates, so I
> >> think a way to make the reverse transformation (whether to undo one made
> >> here or genuinely) needs to be available at a later compilation stage.
> >> One size doesn't fit all.
> >>
> >> With this in mind...
> > So in this case the number of operations are the same before/after and
> > parallelism is the same before/after, register lifetimes, etc. I doubt
> > either form is particularly better suited for CSE or gives better VRP
> > data, etc. The fact that we can't always do ~(X +C) -> ~X + -C
> > probably argues against that form ever so slightly.
FWIW I agree with Jakub here, that having one canonical form for the
middle end to operate on is advantageous. It is just that when we
eventually get to the backend we may want to do the reverse transformation
in some cases, which may be specific immediate operand values or whatever
the backend may see fit.
> >>> If another form is then better for a particular machine, it should be done
> >>> either during expansion (being able to produce both RTLs and computing
> >>> their
> >>> costs), or during combine with either combine splitters or
> >>> define_insn_and_split in the backend, or, if it can't be done in RTL,
> >>> during
> >>> the isel pass.
> >> Hmm, maybe it has been discussed before, so please forgive me if I write
> >> something silly, but it seems to me like this should be done in a generic
> >> way like match.pd so that all the targets do not have to track the changes
> >> made there and then perhaps repeat the same or similar code each. So I
> >> think it would make sense to make a change like this include that reverse
> >> transformation as well, so that ultimately both forms are tried with RTL,
> >> as there is no clear advantage to either here.
> > The idea we've kicked around in the past was to use the same syntax as
> > match.pd, but have it be target dependent to reform expressions in ways
> > that are beneficial to the target and have it run at the end of the
> > gimple/ssa pipeline. Nobody's implemented this though.
Hmm, but why does it have to be target dependent? For match.pd we do
things unconditionally, to have a uniform intermediate representation,
however here we wouldn't have to, as we can check the costs respectively
of the original and the transformed expression and choose the cheaper of
the two. Would that be so we don't waste cycles with targets we know
beforehand a given transformation won't buy anything?
In that case however no code quality regression would happen anyway, so I
think it would be more productive if we still had all transformations
defined in a generic manner and then possibly the hopeless ones excluded
by hand for targets listed. This way if anything is omitted by chance,
i.e. not excluded for a given target, then good code will still be
produced and only some compilation performance lost.
While if we require all port maintainers to qualify individual
transformations by hand as they are added by someone to their pet target,
we'll end up with a lot of duplicate effort and missed bits. Of course
some very exotic transformations that match unique target machine
instructions may indeed best be added to a single target only.
> Yes. And while a gimple-to-gimple transform is indeed quite simple
> to make eventually a match.pd-like gimple-to-RTL would be more
> useful in the end. Note RTL can eventually be emulated close enough
> via the use of internal functions mapping to optabs. But still
> complex combined instructions will need expander help unless we
> expose all named instruction RTL patterns as target specific
> internal functions to use from that .pd file.
Hmm, why aren't the standard named patterns we already have going to be
sufficient?
Maciej
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] match.pd: Add ~(X - Y) -> ~X + Y simplification [PR96685]
2021-01-14 14:57 ` Maciej W. Rozycki
@ 2021-01-14 15:03 ` Richard Biener
0 siblings, 0 replies; 13+ messages in thread
From: Richard Biener @ 2021-01-14 15:03 UTC (permalink / raw)
To: Maciej W. Rozycki; +Cc: Jeff Law, Jakub Jelinek, gcc-patches
On Thu, 14 Jan 2021, Maciej W. Rozycki wrote:
> On Tue, 12 Jan 2021, Richard Biener wrote:
>
> > >>> The point of the match.pd changes is to canonicalize GIMPLE on some form
> > >>> when there are several from GIMPLE POV equivalent or better forms of
> > >>> writing
> > >>> the same thing. The advantage of having one canonical way is that ICF,
> > >>> SCCVN etc. optimizations can then understand the different forms are
> > >>> equivalent.
> > >> Fair enough, though in cases like this I think it is unclear which of the
> > >> two forms is going to be ultimately better, especially as it may depend on
> > >> the exact form of the operands used, e.g. values of any immediates, so I
> > >> think a way to make the reverse transformation (whether to undo one made
> > >> here or genuinely) needs to be available at a later compilation stage.
> > >> One size doesn't fit all.
> > >>
> > >> With this in mind...
> > > So in this case the number of operations are the same before/after and
> > > parallelism is the same before/after, register lifetimes, etc. I doubt
> > > either form is particularly better suited for CSE or gives better VRP
> > > data, etc. The fact that we can't always do ~(X +C) -> ~X + -C
> > > probably argues against that form ever so slightly.
>
> FWIW I agree with Jakub here, that having one canonical form for the
> middle end to operate on is advantageous. It is just that when we
> eventually get to the backend we may want to do the reverse transformation
> in some cases, which may be specific immediate operand values or whatever
> the backend may see fit.
>
> > >>> If another form is then better for a particular machine, it should be done
> > >>> either during expansion (being able to produce both RTLs and computing
> > >>> their
> > >>> costs), or during combine with either combine splitters or
> > >>> define_insn_and_split in the backend, or, if it can't be done in RTL,
> > >>> during
> > >>> the isel pass.
> > >> Hmm, maybe it has been discussed before, so please forgive me if I write
> > >> something silly, but it seems to me like this should be done in a generic
> > >> way like match.pd so that all the targets do not have to track the changes
> > >> made there and then perhaps repeat the same or similar code each. So I
> > >> think it would make sense to make a change like this include that reverse
> > >> transformation as well, so that ultimately both forms are tried with RTL,
> > >> as there is no clear advantage to either here.
> > > The idea we've kicked around in the past was to use the same syntax as
> > > match.pd, but have it be target dependent to reform expressions in ways
> > > that are beneficial to the target and have it run at the end of the
> > > gimple/ssa pipeline. Nobody's implemented this though.
>
> Hmm, but why does it have to be target dependent? For match.pd we do
> things unconditionally, to have a uniform intermediate representation,
> however here we wouldn't have to, as we can check the costs respectively
> of the original and the transformed expression and choose the cheaper of
> the two. Would that be so we don't waste cycles with targets we know
> beforehand a given transformation won't buy anything?
>
> In that case however no code quality regression would happen anyway, so I
> think it would be more productive if we still had all transformations
> defined in a generic manner and then possibly the hopeless ones excluded
> by hand for targets listed. This way if anything is omitted by chance,
> i.e. not excluded for a given target, then good code will still be
> produced and only some compilation performance lost.
>
> While if we require all port maintainers to qualify individual
> transformations by hand as they are added by someone to their pet target,
> we'll end up with a lot of duplicate effort and missed bits. Of course
> some very exotic transformations that match unique target machine
> instructions may indeed best be added to a single target only.
>
> > Yes. And while a gimple-to-gimple transform is indeed quite simple
> > to make eventually a match.pd-like gimple-to-RTL would be more
> > useful in the end. Note RTL can eventually be emulated close enough
> > via the use of internal functions mapping to optabs. But still
> > complex combined instructions will need expander help unless we
> > expose all named instruction RTL patterns as target specific
> > internal functions to use from that .pd file.
>
> Hmm, why aren't the standard named patterns we already have going to be
> sufficient?
Because two standard name pattern expansions can later be combined
to a non-standard define-insn which may have lower cost and we'd
of course like to see such instruction selection opportunities here.
Richard.
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2021-01-14 15:03 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-12 8:10 [PATCH] match.pd: Add ~(X - Y) -> ~X + Y simplification [PR96685] Jakub Jelinek
2020-12-12 10:58 ` Richard Biener
2020-12-12 12:25 ` Marc Glisse
2020-12-12 12:41 ` Jakub Jelinek
2020-12-12 13:00 ` Marc Glisse
2020-12-12 13:05 ` Jakub Jelinek
2020-12-21 2:50 ` Maciej W. Rozycki
2020-12-21 7:54 ` Jakub Jelinek
2021-01-10 0:43 ` Maciej W. Rozycki
2021-01-11 19:22 ` Jeff Law
2021-01-12 8:42 ` Richard Biener
2021-01-14 14:57 ` Maciej W. Rozycki
2021-01-14 15:03 ` Richard Biener
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).