[PATCH] match.pd: Add ~(X - Y) -> ~X + Y simplification [PR96685]

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH] match.pd: Add ~(X - Y) -> ~X + Y simplification [PR96685]
@ 2020-12-12  8:10 Jakub Jelinek
  2020-12-12 10:58 ` Richard Biener
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Jakub Jelinek @ 2020-12-12  8:10 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

Hi!

This patch adds the ~(X - Y) -> ~X + Y simplification requested
in the PR (plus also ~(X + C) -> ~X + (-C) for constants C that can
be safely negated.

The first two simplify blocks is what has been requested in the PR
and that makes the first testcase pass.
Unfortunately, that change also breaks the second testcase, because
while the same expressions appearing in the same stmt and split
across multiple stmts has been folded (not really) before, with
this optimization fold-const.c optimizes ~X + Y further into
(Y - X) - 1 in fold_binary_loc associate: code, but we have nothing
like that in GIMPLE and so end up with different expressions.

The last simplify is an attempt to deal with just this case,
had to rule out there the Y == -1U case, because then we
reached infinite recursion as ~X + -1U was canonicalized by
the pattern into (-1U - X) + -1U but there is a canonicalization
-1 - A -> ~A that turns it back.  Furthermore, had to make it
#if GIMPLE only, because it otherwise resulted in infinite recursion
when interacting with the associate: optimization.
The end result is that we pass all 3 testcases and thus canonizalize
the 3 possible forms of writing the same thing.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2020-12-12  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/96685
	* match.pd (~(X - Y) -> ~X + Y): New optimization.
	(~X + Y -> (Y - X) - 1): Likewise.

	* gcc.dg/tree-ssa/pr96685-1.c: New test.
	* gcc.dg/tree-ssa/pr96685-2.c: New test.
	* gcc.dg/tree-ssa/pr96685-3.c: New test.

--- gcc/match.pd.jj	2020-12-11 16:41:45.797787831 +0100
+++ gcc/match.pd	2020-12-11 23:12:45.769291586 +0100
@@ -1074,6 +1074,34 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (bit_not (plus:c (bit_not @0) @1))
  (minus @0 @1))
 
+/* ~(X - Y) -> ~X + Y.  */
+(simplify
+ (bit_not (minus:s @0 @1))
+ (plus (bit_not @0) @1))
+(simplify
+ (bit_not (plus:s @0 INTEGER_CST@1))
+ (if ((INTEGRAL_TYPE_P (type)
+       && TYPE_UNSIGNED (type))
+      || (!TYPE_OVERFLOW_SANITIZED (type)
+	  && may_negate_without_overflow_p (@1)))
+  (plus (bit_not @0) { const_unop (NEGATE_EXPR, type, @1); })))
+
+#if GIMPLE
+/* ~X + Y -> (Y - X) - 1.  */
+(simplify
+ (plus:c (bit_not @0) @1)
+  (if (ANY_INTEGRAL_TYPE_P (type)
+       && TYPE_OVERFLOW_WRAPS (type)
+       /* -1 - X is folded to ~X, so we'd recurse endlessly.  */
+       && !integer_all_onesp (@1))
+   (plus (minus @1 @0) { build_minus_one_cst (type); })
+   (if (INTEGRAL_TYPE_P (type)
+	&& TREE_CODE (@1) == INTEGER_CST
+	&& wi::to_wide (@1) != wi::min_value (TYPE_PRECISION (type),
+					      SIGNED))
+    (minus (plus @1 { build_minus_one_cst (type); }) @0))))
+#endif
+
 /* x + (x & 1) -> (x + 1) & ~1 */
 (simplify
  (plus:c @0 (bit_and:s @0 integer_onep@1))
--- gcc/testsuite/gcc.dg/tree-ssa/pr96685-1.c.jj	2020-12-11 16:42:03.975584838 +0100
+++ gcc/testsuite/gcc.dg/tree-ssa/pr96685-1.c	2020-12-11 16:42:03.975584838 +0100
@@ -0,0 +1,52 @@
+/* PR tree-optimization/96685 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-times "return 1;" 6 "optimized" } } */
+
+unsigned
+f1 (unsigned x, unsigned y)
+{
+  unsigned a = ~(x - y);
+  unsigned b = ~x + y;
+  return a == b;
+}
+
+unsigned
+f2 (unsigned x)
+{
+  unsigned a = ~(x + -124U);
+  unsigned b = ~x + 124U;
+  return a == b;
+}
+
+unsigned
+f3 (unsigned x)
+{
+  unsigned a = ~(x + 124U);
+  unsigned b = ~x + -124U;
+  return a == b;
+}
+
+int
+f4 (int x, int y)
+{
+  int a = ~(x - y);
+  int b = ~x + y;
+  return a == b;
+}
+
+int
+f5 (int x)
+{
+  int a = ~(x + -124);
+  int b = ~x + 124;
+  return a == b;
+}
+
+int
+f6 (int x)
+{
+  int a = ~(x + 124);
+  int b = ~x + -124;
+  return a == b;
+}
--- gcc/testsuite/gcc.dg/tree-ssa/pr96685-2.c.jj	2020-12-11 16:42:03.975584838 +0100
+++ gcc/testsuite/gcc.dg/tree-ssa/pr96685-2.c	2020-12-11 16:42:03.975584838 +0100
@@ -0,0 +1,40 @@
+/* PR tree-optimization/96685 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-times "return 1;" 4 "optimized" } } */
+
+int
+f1 (unsigned x, unsigned y)
+{
+  unsigned int r1 = (x - y);
+  r1 = ~r1;
+  unsigned int r2 = ~(x - y);
+  return r1 == r2;
+}
+
+int
+f2 (unsigned x, unsigned y)
+{
+  unsigned int r1 = (x - 23);
+  r1 = ~r1;
+  unsigned int r2 = ~(x - 23);
+  return r1 == r2;
+}
+
+int
+f3 (int x, int y)
+{
+  int r1 = (x - y);
+  r1 = ~r1;
+  int r2 = ~(x - y);
+  return r1 == r2;
+}
+
+int
+f4 (int x, int y)
+{
+  int r1 = (x - 23);
+  r1 = ~r1;
+  int r2 = ~(x - 23);
+  return r1 == r2;
+}
--- gcc/testsuite/gcc.dg/tree-ssa/pr96685-3.c.jj	2020-12-11 17:35:09.536123227 +0100
+++ gcc/testsuite/gcc.dg/tree-ssa/pr96685-3.c	2020-12-11 17:34:31.911543423 +0100
@@ -0,0 +1,43 @@
+/* PR tree-optimization/96685 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-times "return 1;" 4 "optimized" } } */
+
+int
+f1 (unsigned x, unsigned y)
+{
+  unsigned int r1 = (x - y);
+  r1 = ~r1;
+  unsigned int r2 = (y - x);
+  r2 = r2 - 1;
+  return r1 == r2;
+}
+
+int
+f2 (unsigned x, unsigned y)
+{
+  unsigned int r1 = (x - 23);
+  r1 = ~r1;
+  unsigned int r2 = (23 - x);
+  r2 = r2 - 1;
+  return r1 == r2;
+}
+
+int
+f3 (int x, int y)
+{
+  int r1 = (x - 23);
+  r1 = ~r1;
+  int r2 = (23 - x);
+  --r2;
+  return r1 == r2;
+}
+
+int
+f4 (int x, int y)
+{
+  int r1 = (x - 23);
+  r1 = ~r1;
+  int r2 = (22 - x);
+  return r1 == r2;
+}

	Jakub


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] match.pd: Add ~(X - Y) -> ~X + Y simplification [PR96685]
  2020-12-12  8:10 [PATCH] match.pd: Add ~(X - Y) -> ~X + Y simplification [PR96685] Jakub Jelinek
@ 2020-12-12 10:58 ` Richard Biener
  2020-12-12 12:25 ` Marc Glisse
  2020-12-21  2:50 ` Maciej W. Rozycki
  2 siblings, 0 replies; 13+ messages in thread
From: Richard Biener @ 2020-12-12 10:58 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches

On December 12, 2020 9:10:41 AM GMT+01:00, Jakub Jelinek <jakub@redhat.com> wrote:
>Hi!
>
>This patch adds the ~(X - Y) -> ~X + Y simplification requested
>in the PR (plus also ~(X + C) -> ~X + (-C) for constants C that can
>be safely negated.
>
>The first two simplify blocks is what has been requested in the PR
>and that makes the first testcase pass.
>Unfortunately, that change also breaks the second testcase, because
>while the same expressions appearing in the same stmt and split
>across multiple stmts has been folded (not really) before, with
>this optimization fold-const.c optimizes ~X + Y further into
>(Y - X) - 1 in fold_binary_loc associate: code, but we have nothing
>like that in GIMPLE and so end up with different expressions.
>
>The last simplify is an attempt to deal with just this case,
>had to rule out there the Y == -1U case, because then we
>reached infinite recursion as ~X + -1U was canonicalized by
>the pattern into (-1U - X) + -1U but there is a canonicalization
>-1 - A -> ~A that turns it back.  Furthermore, had to make it
>#if GIMPLE only, because it otherwise resulted in infinite recursion
>when interacting with the associate: optimization.
>The end result is that we pass all 3 testcases and thus canonizalize
>the 3 possible forms of writing the same thing.
>
>Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok. 

Thanks, 
Richard. 

>2020-12-12  Jakub Jelinek  <jakub@redhat.com>
>
>	PR tree-optimization/96685
>	* match.pd (~(X - Y) -> ~X + Y): New optimization.
>	(~X + Y -> (Y - X) - 1): Likewise.
>
>	* gcc.dg/tree-ssa/pr96685-1.c: New test.
>	* gcc.dg/tree-ssa/pr96685-2.c: New test.
>	* gcc.dg/tree-ssa/pr96685-3.c: New test.
>
>--- gcc/match.pd.jj	2020-12-11 16:41:45.797787831 +0100
>+++ gcc/match.pd	2020-12-11 23:12:45.769291586 +0100
>@@ -1074,6 +1074,34 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  (bit_not (plus:c (bit_not @0) @1))
>  (minus @0 @1))
> 
>+/* ~(X - Y) -> ~X + Y.  */
>+(simplify
>+ (bit_not (minus:s @0 @1))
>+ (plus (bit_not @0) @1))
>+(simplify
>+ (bit_not (plus:s @0 INTEGER_CST@1))
>+ (if ((INTEGRAL_TYPE_P (type)
>+       && TYPE_UNSIGNED (type))
>+      || (!TYPE_OVERFLOW_SANITIZED (type)
>+	  && may_negate_without_overflow_p (@1)))
>+  (plus (bit_not @0) { const_unop (NEGATE_EXPR, type, @1); })))
>+
>+#if GIMPLE
>+/* ~X + Y -> (Y - X) - 1.  */
>+(simplify
>+ (plus:c (bit_not @0) @1)
>+  (if (ANY_INTEGRAL_TYPE_P (type)
>+       && TYPE_OVERFLOW_WRAPS (type)
>+       /* -1 - X is folded to ~X, so we'd recurse endlessly.  */
>+       && !integer_all_onesp (@1))
>+   (plus (minus @1 @0) { build_minus_one_cst (type); })
>+   (if (INTEGRAL_TYPE_P (type)
>+	&& TREE_CODE (@1) == INTEGER_CST
>+	&& wi::to_wide (@1) != wi::min_value (TYPE_PRECISION (type),
>+					      SIGNED))
>+    (minus (plus @1 { build_minus_one_cst (type); }) @0))))
>+#endif
>+
> /* x + (x & 1) -> (x + 1) & ~1 */
> (simplify
>  (plus:c @0 (bit_and:s @0 integer_onep@1))
>--- gcc/testsuite/gcc.dg/tree-ssa/pr96685-1.c.jj	2020-12-11
>16:42:03.975584838 +0100
>+++ gcc/testsuite/gcc.dg/tree-ssa/pr96685-1.c	2020-12-11
>16:42:03.975584838 +0100
>@@ -0,0 +1,52 @@
>+/* PR tree-optimization/96685 */
>+/* { dg-do compile } */
>+/* { dg-options "-O2 -fdump-tree-optimized" } */
>+/* { dg-final { scan-tree-dump-times "return 1;" 6 "optimized" } } */
>+
>+unsigned
>+f1 (unsigned x, unsigned y)
>+{
>+  unsigned a = ~(x - y);
>+  unsigned b = ~x + y;
>+  return a == b;
>+}
>+
>+unsigned
>+f2 (unsigned x)
>+{
>+  unsigned a = ~(x + -124U);
>+  unsigned b = ~x + 124U;
>+  return a == b;
>+}
>+
>+unsigned
>+f3 (unsigned x)
>+{
>+  unsigned a = ~(x + 124U);
>+  unsigned b = ~x + -124U;
>+  return a == b;
>+}
>+
>+int
>+f4 (int x, int y)
>+{
>+  int a = ~(x - y);
>+  int b = ~x + y;
>+  return a == b;
>+}
>+
>+int
>+f5 (int x)
>+{
>+  int a = ~(x + -124);
>+  int b = ~x + 124;
>+  return a == b;
>+}
>+
>+int
>+f6 (int x)
>+{
>+  int a = ~(x + 124);
>+  int b = ~x + -124;
>+  return a == b;
>+}
>--- gcc/testsuite/gcc.dg/tree-ssa/pr96685-2.c.jj	2020-12-11
>16:42:03.975584838 +0100
>+++ gcc/testsuite/gcc.dg/tree-ssa/pr96685-2.c	2020-12-11
>16:42:03.975584838 +0100
>@@ -0,0 +1,40 @@
>+/* PR tree-optimization/96685 */
>+/* { dg-do compile } */
>+/* { dg-options "-O2 -fdump-tree-optimized" } */
>+/* { dg-final { scan-tree-dump-times "return 1;" 4 "optimized" } } */
>+
>+int
>+f1 (unsigned x, unsigned y)
>+{
>+  unsigned int r1 = (x - y);
>+  r1 = ~r1;
>+  unsigned int r2 = ~(x - y);
>+  return r1 == r2;
>+}
>+
>+int
>+f2 (unsigned x, unsigned y)
>+{
>+  unsigned int r1 = (x - 23);
>+  r1 = ~r1;
>+  unsigned int r2 = ~(x - 23);
>+  return r1 == r2;
>+}
>+
>+int
>+f3 (int x, int y)
>+{
>+  int r1 = (x - y);
>+  r1 = ~r1;
>+  int r2 = ~(x - y);
>+  return r1 == r2;
>+}
>+
>+int
>+f4 (int x, int y)
>+{
>+  int r1 = (x - 23);
>+  r1 = ~r1;
>+  int r2 = ~(x - 23);
>+  return r1 == r2;
>+}
>--- gcc/testsuite/gcc.dg/tree-ssa/pr96685-3.c.jj	2020-12-11
>17:35:09.536123227 +0100
>+++ gcc/testsuite/gcc.dg/tree-ssa/pr96685-3.c	2020-12-11
>17:34:31.911543423 +0100
>@@ -0,0 +1,43 @@
>+/* PR tree-optimization/96685 */
>+/* { dg-do compile } */
>+/* { dg-options "-O2 -fdump-tree-optimized" } */
>+/* { dg-final { scan-tree-dump-times "return 1;" 4 "optimized" } } */
>+
>+int
>+f1 (unsigned x, unsigned y)
>+{
>+  unsigned int r1 = (x - y);
>+  r1 = ~r1;
>+  unsigned int r2 = (y - x);
>+  r2 = r2 - 1;
>+  return r1 == r2;
>+}
>+
>+int
>+f2 (unsigned x, unsigned y)
>+{
>+  unsigned int r1 = (x - 23);
>+  r1 = ~r1;
>+  unsigned int r2 = (23 - x);
>+  r2 = r2 - 1;
>+  return r1 == r2;
>+}
>+
>+int
>+f3 (int x, int y)
>+{
>+  int r1 = (x - 23);
>+  r1 = ~r1;
>+  int r2 = (23 - x);
>+  --r2;
>+  return r1 == r2;
>+}
>+
>+int
>+f4 (int x, int y)
>+{
>+  int r1 = (x - 23);
>+  r1 = ~r1;
>+  int r2 = (22 - x);
>+  return r1 == r2;
>+}
>
>	Jakub


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] match.pd: Add ~(X - Y) -> ~X + Y simplification [PR96685]
  2020-12-12  8:10 [PATCH] match.pd: Add ~(X - Y) -> ~X + Y simplification [PR96685] Jakub Jelinek
  2020-12-12 10:58 ` Richard Biener
@ 2020-12-12 12:25 ` Marc Glisse
  2020-12-12 12:41   ` Jakub Jelinek
  2020-12-21  2:50 ` Maciej W. Rozycki
  2 siblings, 1 reply; 13+ messages in thread
From: Marc Glisse @ 2020-12-12 12:25 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Richard Biener, gcc-patches

On Sat, 12 Dec 2020, Jakub Jelinek via Gcc-patches wrote:

> This patch adds the ~(X - Y) -> ~X + Y simplification requested
> in the PR (plus also ~(X + C) -> ~X + (-C) for constants C that can
> be safely negated.

Would it have been wrong to produce ~X - C without caring about negating 
(and then extending it to non-constants)?

I wonder if this makes
/* ~(~X - Y) -> X + Y and ~(~X + Y) -> X - Y.  */
useless.

-- 
Marc Glisse

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] match.pd: Add ~(X - Y) -> ~X + Y simplification [PR96685]
  2020-12-12 12:25 ` Marc Glisse
@ 2020-12-12 12:41   ` Jakub Jelinek
  2020-12-12 13:00     ` Marc Glisse
  0 siblings, 1 reply; 13+ messages in thread
From: Jakub Jelinek @ 2020-12-12 12:41 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Biener

On Sat, Dec 12, 2020 at 01:25:39PM +0100, Marc Glisse wrote:
> On Sat, 12 Dec 2020, Jakub Jelinek via Gcc-patches wrote:
> 
> > This patch adds the ~(X - Y) -> ~X + Y simplification requested
> > in the PR (plus also ~(X + C) -> ~X + (-C) for constants C that can
> > be safely negated.
> 
> Would it have been wrong to produce ~X - C without caring about negating
> (and then extending it to non-constants)?

Extending it to non-constants is what I wanted to avoid.
For ~(X + Y), because + is commutative, it wouldn't be a canonicalization
as it would pick more-less randomly whether to do ~X + Y or X + ~Y.

> I wonder if this makes
> /* ~(~X - Y) -> X + Y and ~(~X + Y) -> X - Y.  */
> useless.

Maybe the former, but not the latter.

	Jakub


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] match.pd: Add ~(X - Y) -> ~X + Y simplification [PR96685]
  2020-12-12 12:41   ` Jakub Jelinek
@ 2020-12-12 13:00     ` Marc Glisse
  2020-12-12 13:05       ` Jakub Jelinek
  0 siblings, 1 reply; 13+ messages in thread
From: Marc Glisse @ 2020-12-12 13:00 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches, Richard Biener

On Sat, 12 Dec 2020, Jakub Jelinek via Gcc-patches wrote:

> On Sat, Dec 12, 2020 at 01:25:39PM +0100, Marc Glisse wrote:
>> On Sat, 12 Dec 2020, Jakub Jelinek via Gcc-patches wrote:
>>
>>> This patch adds the ~(X - Y) -> ~X + Y simplification requested
>>> in the PR (plus also ~(X + C) -> ~X + (-C) for constants C that can
>>> be safely negated.
>>
>> Would it have been wrong to produce ~X - C without caring about negating
>> (and then extending it to non-constants)?
>
> Extending it to non-constants is what I wanted to avoid.
> For ~(X + Y), because + is commutative, it wouldn't be a canonicalization
> as it would pick more-less randomly whether to do ~X + Y or X + ~Y.

~X - Y or ~Y - X I guess.

Ok, I understand. But then in the constant case, why produce ~X + -C 
instead of ~X - C (which I think doesn't need to care about negating), or 
even ~C - X (one less operation)? Or do we already have a transformation 
from ~X - C to ~C - X?

-- 
Marc Glisse

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] match.pd: Add ~(X - Y) -> ~X + Y simplification [PR96685]
  2020-12-12 13:00     ` Marc Glisse
@ 2020-12-12 13:05       ` Jakub Jelinek
  0 siblings, 0 replies; 13+ messages in thread
From: Jakub Jelinek @ 2020-12-12 13:05 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Biener

On Sat, Dec 12, 2020 at 02:00:49PM +0100, Marc Glisse wrote:
> > Extending it to non-constants is what I wanted to avoid.
> > For ~(X + Y), because + is commutative, it wouldn't be a canonicalization
> > as it would pick more-less randomly whether to do ~X + Y or X + ~Y.
> 
> ~X - Y or ~Y - X I guess.
> 
> Ok, I understand. But then in the constant case, why produce ~X + -C instead
> of ~X - C (which I think doesn't need to care about negating),

Because we immediately simplify it again into ~X + (-C) then.

	Jakub


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] match.pd: Add ~(X - Y) -> ~X + Y simplification [PR96685]
  2020-12-12  8:10 [PATCH] match.pd: Add ~(X - Y) -> ~X + Y simplification [PR96685] Jakub Jelinek
  2020-12-12 10:58 ` Richard Biener
  2020-12-12 12:25 ` Marc Glisse
@ 2020-12-21  2:50 ` Maciej W. Rozycki
  2020-12-21  7:54   ` Jakub Jelinek
  2 siblings, 1 reply; 13+ messages in thread
From: Maciej W. Rozycki @ 2020-12-21  2:50 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Richard Biener, gcc-patches

On Sat, 12 Dec 2020, Jakub Jelinek via Gcc-patches wrote:

> This patch adds the ~(X - Y) -> ~X + Y simplification requested
> in the PR (plus also ~(X + C) -> ~X + (-C) for constants C that can
> be safely negated.

 This regresses VAX code produced by the cmpelim-eq-notsi.c test case (and 
its similar counterparts) with the `vax-netbsdelf' target.

 Previously this assembly:

	.text
	.align 1
.globl eq_notsi
	.type	eq_notsi, @function
eq_notsi:
	.word 0	# 35	[c=0]  procedure_entry_mask
	subl2 $4,%sp	# 46	[c=32]  *addsi3
	mcoml 4(%ap),%r0	# 32	[c=16]  *one_cmplsi2_ccz
	jeql .L1		# 34	[c=26]  *branch_ccz
	addl2 $2,%r0	# 31	[c=32]  *addsi3
.L1:
	ret		# 40	[c=0]  return
	.size	eq_notsi, .-eq_notsi

was produced.  Now this:

	.text
	.align 1
.globl eq_notsi
	.type	eq_notsi, @function
eq_notsi:
	.word 0	# 36	[c=0]  procedure_entry_mask
	subl2 $4,%sp	# 48	[c=32]  *addsi3
	movl 4(%ap),%r0	# 33	[c=16]  *movsi_2
	cmpl %r0,$-1	# 34	[c=8]  *cmpsi_ccz/1
	jeql .L3		# 35	[c=26]  *branch_ccz
	subl3 %r0,$1,%r0	# 32	[c=32]  *subsi3/1
	ret		# 27	[c=0]  return
.L3:
	clrl %r0		# 31	[c=2]  *movsi_2
	ret		# 41	[c=0]  return
	.size	eq_notsi, .-eq_notsi

is, which is clearly worse, both in terms of performance and size.

 The key here is that the cost of constant 0, here used with a comparison 
operation eliminated after MCOML in the former assembly sequence, is lower 
(as per `vax_rtx_costs') in the VAX ISA than the cost of constant -1, used 
with CMPL in the latter sequence.  Not only constant 0 is an implied 
operand with some machine instructions saving cycles and space otherwise 
used for an explicitly encoded operand, but if used with a comparison 
operation it can usually be eliminated, so it should be preferred over all 
other constants.

 With the example you gave with the PR I can see progression with f3, f4, 
f7, f8, regression with f1, f2, and no change in operation cost with f5, 
f6.

 Shouldn't a transformation like this respect target-specific expression 
costs somehow then?  Depending on the individual case one form or the 
other might be cheaper, and somehow we assume here both are equivalent in 
terms of performance and/or code size (as applicable for the optimisation 
mode chosen).

  Maciej

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] match.pd: Add ~(X - Y) -> ~X + Y simplification [PR96685]
  2020-12-21  2:50 ` Maciej W. Rozycki
@ 2020-12-21  7:54   ` Jakub Jelinek
  2021-01-10  0:43     ` Maciej W. Rozycki
  0 siblings, 1 reply; 13+ messages in thread
From: Jakub Jelinek @ 2020-12-21  7:54 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: gcc-patches, Richard Biener

On Mon, Dec 21, 2020 at 02:50:24AM +0000, Maciej W. Rozycki wrote:
> On Sat, 12 Dec 2020, Jakub Jelinek via Gcc-patches wrote:
> 
> > This patch adds the ~(X - Y) -> ~X + Y simplification requested
> > in the PR (plus also ~(X + C) -> ~X + (-C) for constants C that can
> > be safely negated.
> 
>  This regresses VAX code produced by the cmpelim-eq-notsi.c test case (and 
> its similar counterparts) with the `vax-netbsdelf' target.

The point of the match.pd changes is to canonicalize GIMPLE on some form
when there are several from GIMPLE POV equivalent or better forms of writing
the same thing.  The advantage of having one canonical way is that ICF,
SCCVN etc. optimizations can then understand the different forms are
equivalent.

If another form is then better for a particular machine, it should be done
either during expansion (being able to produce both RTLs and computing their
costs), or during combine with either combine splitters or
define_insn_and_split in the backend, or, if it can't be done in RTL, during
the isel pass.

	Jakub

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] match.pd: Add ~(X - Y) -> ~X + Y simplification [PR96685]
  2020-12-21  7:54   ` Jakub Jelinek
@ 2021-01-10  0:43     ` Maciej W. Rozycki
  2021-01-11 19:22       ` Jeff Law
  0 siblings, 1 reply; 13+ messages in thread
From: Maciej W. Rozycki @ 2021-01-10  0:43 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches, Richard Biener, Jeff Law

On Mon, 21 Dec 2020, Jakub Jelinek wrote:

> > > This patch adds the ~(X - Y) -> ~X + Y simplification requested
> > > in the PR (plus also ~(X + C) -> ~X + (-C) for constants C that can
> > > be safely negated.
> > 
> >  This regresses VAX code produced by the cmpelim-eq-notsi.c test case (and 
> > its similar counterparts) with the `vax-netbsdelf' target.
> 
> The point of the match.pd changes is to canonicalize GIMPLE on some form
> when there are several from GIMPLE POV equivalent or better forms of writing
> the same thing.  The advantage of having one canonical way is that ICF,
> SCCVN etc. optimizations can then understand the different forms are
> equivalent.

 Fair enough, though in cases like this I think it is unclear which of the 
two forms is going to be ultimately better, especially as it may depend on 
the exact form of the operands used, e.g. values of any immediates, so I 
think a way to make the reverse transformation (whether to undo one made 
here or genuinely) needs to be available at a later compilation stage.  
One size doesn't fit all.

 With this in mind...

> If another form is then better for a particular machine, it should be done
> either during expansion (being able to produce both RTLs and computing their
> costs), or during combine with either combine splitters or
> define_insn_and_split in the backend, or, if it can't be done in RTL, during
> the isel pass.

 Hmm, maybe it has been discussed before, so please forgive me if I write 
something silly, but it seems to me like this should be done in a generic 
way like match.pd so that all the targets do not have to track the changes 
made there and then perhaps repeat the same or similar code each.  So I 
think it would make sense to make a change like this include that reverse 
transformation as well, so that ultimately both forms are tried with RTL, 
as there is no clear advantage to either here.

 However expand AFAICT essentially boils down to `expand_expr_real_2', 
which uses a bijective mapping between individual operations in the tree 
and the RTL form respectively rather than sequences as with the unary one 
complement and an addition/subtraction covered here, for two operations 
total.  So this cannot be done with expand, by design, unless I'm missing 
something.

 So from the look at combine code I infer the way to handle this within 
the existing infrastructure would be with `try_combine'/`find_split_point' 
(as we have one sequence of insns to be replaced with another sequence 
rather than combined into a single insn), right?

 Overall it seems to me it would be the best if we were able to produce a 
reverse RTL transformation automatically straight from match.pd, however I 
am not sure whether the syntax used is suitable for reverse interpretation 
in the general case.  And in any case this would be a major effort, but it 
would be hugely beneficial, as we have lots of knowledge already collected 
in match.pd.

 Have I missed anything here?

 NB I've seen it often enough already to irritate me that someone claims 
we ought to do something "because LLVM does it".  Well, if someone wants 
things done the LLVM way, then there's LLVM readily available.  If there 
is a technical advantage in doing something, then surely we ought to do 
that, however whether LLVM does it too or does not is I think irrelevant.

  Maciej

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] match.pd: Add ~(X - Y) -> ~X + Y simplification [PR96685]
  2021-01-10  0:43     ` Maciej W. Rozycki
@ 2021-01-11 19:22       ` Jeff Law
  2021-01-12  8:42         ` Richard Biener
  0 siblings, 1 reply; 13+ messages in thread
From: Jeff Law @ 2021-01-11 19:22 UTC (permalink / raw)
  To: Maciej W. Rozycki, Jakub Jelinek; +Cc: gcc-patches, Richard Biener



On 1/9/21 5:43 PM, Maciej W. Rozycki wrote:
> On Mon, 21 Dec 2020, Jakub Jelinek wrote:
>
>>>> This patch adds the ~(X - Y) -> ~X + Y simplification requested
>>>> in the PR (plus also ~(X + C) -> ~X + (-C) for constants C that can
>>>> be safely negated.
>>>  This regresses VAX code produced by the cmpelim-eq-notsi.c test case (and 
>>> its similar counterparts) with the `vax-netbsdelf' target.
>> The point of the match.pd changes is to canonicalize GIMPLE on some form
>> when there are several from GIMPLE POV equivalent or better forms of writing
>> the same thing.  The advantage of having one canonical way is that ICF,
>> SCCVN etc. optimizations can then understand the different forms are
>> equivalent.
>  Fair enough, though in cases like this I think it is unclear which of the 
> two forms is going to be ultimately better, especially as it may depend on 
> the exact form of the operands used, e.g. values of any immediates, so I 
> think a way to make the reverse transformation (whether to undo one made 
> here or genuinely) needs to be available at a later compilation stage.  
> One size doesn't fit all.
>
>  With this in mind...
So in this case the number of operations are the same before/after and
parallelism is the same before/after, register lifetimes, etc.   I doubt
either form is particularly better suited for CSE or gives better VRP
data, etc.   The fact that we can't always do ~(X +C) -> ~X + -C
probably argues against that form ever so slightly.


>
>> If another form is then better for a particular machine, it should be done
>> either during expansion (being able to produce both RTLs and computing their
>> costs), or during combine with either combine splitters or
>> define_insn_and_split in the backend, or, if it can't be done in RTL, during
>> the isel pass.
>  Hmm, maybe it has been discussed before, so please forgive me if I write 
> something silly, but it seems to me like this should be done in a generic 
> way like match.pd so that all the targets do not have to track the changes 
> made there and then perhaps repeat the same or similar code each.  So I 
> think it would make sense to make a change like this include that reverse 
> transformation as well, so that ultimately both forms are tried with RTL, 
> as there is no clear advantage to either here.
The idea we've kicked around in the past was to use the same syntax as
match.pd, but have it be target dependent to reform expressions in ways
that are beneficial to the target and have it run at the end of the
gimple/ssa pipeline.  Nobody's implemented this though.

jeff


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] match.pd: Add ~(X - Y) -> ~X + Y simplification [PR96685]
  2021-01-11 19:22       ` Jeff Law
@ 2021-01-12  8:42         ` Richard Biener
  2021-01-14 14:57           ` Maciej W. Rozycki
  0 siblings, 1 reply; 13+ messages in thread
From: Richard Biener @ 2021-01-12  8:42 UTC (permalink / raw)
  To: Jeff Law; +Cc: Maciej W. Rozycki, Jakub Jelinek, gcc-patches



On Mon, 11 Jan 2021, Jeff Law wrote:

>
>
> On 1/9/21 5:43 PM, Maciej W. Rozycki wrote:
>> On Mon, 21 Dec 2020, Jakub Jelinek wrote:
>>
>>>>> This patch adds the ~(X - Y) -> ~X + Y simplification requested
>>>>> in the PR (plus also ~(X + C) -> ~X + (-C) for constants C that can
>>>>> be safely negated.
>>>>  This regresses VAX code produced by the cmpelim-eq-notsi.c test case (and
>>>> its similar counterparts) with the `vax-netbsdelf' target.
>>> The point of the match.pd changes is to canonicalize GIMPLE on some form
>>> when there are several from GIMPLE POV equivalent or better forms of writing
>>> the same thing.  The advantage of having one canonical way is that ICF,
>>> SCCVN etc. optimizations can then understand the different forms are
>>> equivalent.
>>  Fair enough, though in cases like this I think it is unclear which of the
>> two forms is going to be ultimately better, especially as it may depend on
>> the exact form of the operands used, e.g. values of any immediates, so I
>> think a way to make the reverse transformation (whether to undo one made
>> here or genuinely) needs to be available at a later compilation stage.
>> One size doesn't fit all.
>>
>>  With this in mind...
> So in this case the number of operations are the same before/after and
> parallelism is the same before/after, register lifetimes, etc.   I doubt
> either form is particularly better suited for CSE or gives better VRP
> data, etc.   The fact that we can't always do ~(X +C) -> ~X + -C
> probably argues against that form ever so slightly.
>
>
>>
>>> If another form is then better for a particular machine, it should be done
>>> either during expansion (being able to produce both RTLs and computing their
>>> costs), or during combine with either combine splitters or
>>> define_insn_and_split in the backend, or, if it can't be done in RTL, during
>>> the isel pass.
>>  Hmm, maybe it has been discussed before, so please forgive me if I write
>> something silly, but it seems to me like this should be done in a generic
>> way like match.pd so that all the targets do not have to track the changes
>> made there and then perhaps repeat the same or similar code each.  So I
>> think it would make sense to make a change like this include that reverse
>> transformation as well, so that ultimately both forms are tried with RTL,
>> as there is no clear advantage to either here.
> The idea we've kicked around in the past was to use the same syntax as
> match.pd, but have it be target dependent to reform expressions in ways
> that are beneficial to the target and have it run at the end of the
> gimple/ssa pipeline.  Nobody's implemented this though.

Yes.  And while a gimple-to-gimple transform is indeed quite simple
to make eventually a match.pd-like gimple-to-RTL would be more
useful in the end.  Note RTL can eventually be emulated close enough
via the use of internal functions mapping to optabs.  But still
complex combined instructions will need expander help unless we
expose all named instruction RTL patterns as target specific
internal functions to use from that .pd file.

Richard.

> jeff
>
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] match.pd: Add ~(X - Y) -> ~X + Y simplification [PR96685]
  2021-01-12  8:42         ` Richard Biener
@ 2021-01-14 14:57           ` Maciej W. Rozycki
  2021-01-14 15:03             ` Richard Biener
  0 siblings, 1 reply; 13+ messages in thread
From: Maciej W. Rozycki @ 2021-01-14 14:57 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jeff Law, Jakub Jelinek, gcc-patches

On Tue, 12 Jan 2021, Richard Biener wrote:

> >>> The point of the match.pd changes is to canonicalize GIMPLE on some form
> >>> when there are several from GIMPLE POV equivalent or better forms of
> >>> writing
> >>> the same thing.  The advantage of having one canonical way is that ICF,
> >>> SCCVN etc. optimizations can then understand the different forms are
> >>> equivalent.
> >> Fair enough, though in cases like this I think it is unclear which of the
> >> two forms is going to be ultimately better, especially as it may depend on
> >> the exact form of the operands used, e.g. values of any immediates, so I
> >> think a way to make the reverse transformation (whether to undo one made
> >> here or genuinely) needs to be available at a later compilation stage.
> >> One size doesn't fit all.
> >>
> >>  With this in mind...
> > So in this case the number of operations are the same before/after and
> > parallelism is the same before/after, register lifetimes, etc.   I doubt
> > either form is particularly better suited for CSE or gives better VRP
> > data, etc.   The fact that we can't always do ~(X +C) -> ~X + -C
> > probably argues against that form ever so slightly.

 FWIW I agree with Jakub here, that having one canonical form for the 
middle end to operate on is advantageous.  It is just that when we 
eventually get to the backend we may want to do the reverse transformation 
in some cases, which may be specific immediate operand values or whatever 
the backend may see fit.

> >>> If another form is then better for a particular machine, it should be done
> >>> either during expansion (being able to produce both RTLs and computing
> >>> their
> >>> costs), or during combine with either combine splitters or
> >>> define_insn_and_split in the backend, or, if it can't be done in RTL,
> >>> during
> >>> the isel pass.
> >> Hmm, maybe it has been discussed before, so please forgive me if I write
> >> something silly, but it seems to me like this should be done in a generic
> >> way like match.pd so that all the targets do not have to track the changes
> >> made there and then perhaps repeat the same or similar code each.  So I
> >> think it would make sense to make a change like this include that reverse
> >> transformation as well, so that ultimately both forms are tried with RTL,
> >> as there is no clear advantage to either here.
> > The idea we've kicked around in the past was to use the same syntax as
> > match.pd, but have it be target dependent to reform expressions in ways
> > that are beneficial to the target and have it run at the end of the
> > gimple/ssa pipeline.  Nobody's implemented this though.

 Hmm, but why does it have to be target dependent?  For match.pd we do 
things unconditionally, to have a uniform intermediate representation, 
however here we wouldn't have to, as we can check the costs respectively 
of the original and the transformed expression and choose the cheaper of 
the two.  Would that be so we don't waste cycles with targets we know 
beforehand a given transformation won't buy anything?

 In that case however no code quality regression would happen anyway, so I 
think it would be more productive if we still had all transformations 
defined in a generic manner and then possibly the hopeless ones excluded 
by hand for targets listed.  This way if anything is omitted by chance, 
i.e. not excluded for a given target, then good code will still be 
produced and only some compilation performance lost.

 While if we require all port maintainers to qualify individual 
transformations by hand as they are added by someone to their pet target, 
we'll end up with a lot of duplicate effort and missed bits.  Of course 
some very exotic transformations that match unique target machine 
instructions may indeed best be added to a single target only.

> Yes.  And while a gimple-to-gimple transform is indeed quite simple
> to make eventually a match.pd-like gimple-to-RTL would be more
> useful in the end.  Note RTL can eventually be emulated close enough
> via the use of internal functions mapping to optabs.  But still
> complex combined instructions will need expander help unless we
> expose all named instruction RTL patterns as target specific
> internal functions to use from that .pd file.

 Hmm, why aren't the standard named patterns we already have going to be 
sufficient?

  Maciej

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] match.pd: Add ~(X - Y) -> ~X + Y simplification [PR96685]
  2021-01-14 14:57           ` Maciej W. Rozycki
@ 2021-01-14 15:03             ` Richard Biener
  0 siblings, 0 replies; 13+ messages in thread
From: Richard Biener @ 2021-01-14 15:03 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: Jeff Law, Jakub Jelinek, gcc-patches

On Thu, 14 Jan 2021, Maciej W. Rozycki wrote:

> On Tue, 12 Jan 2021, Richard Biener wrote:
> 
> > >>> The point of the match.pd changes is to canonicalize GIMPLE on some form
> > >>> when there are several from GIMPLE POV equivalent or better forms of
> > >>> writing
> > >>> the same thing.  The advantage of having one canonical way is that ICF,
> > >>> SCCVN etc. optimizations can then understand the different forms are
> > >>> equivalent.
> > >> Fair enough, though in cases like this I think it is unclear which of the
> > >> two forms is going to be ultimately better, especially as it may depend on
> > >> the exact form of the operands used, e.g. values of any immediates, so I
> > >> think a way to make the reverse transformation (whether to undo one made
> > >> here or genuinely) needs to be available at a later compilation stage.
> > >> One size doesn't fit all.
> > >>
> > >>  With this in mind...
> > > So in this case the number of operations are the same before/after and
> > > parallelism is the same before/after, register lifetimes, etc.   I doubt
> > > either form is particularly better suited for CSE or gives better VRP
> > > data, etc.   The fact that we can't always do ~(X +C) -> ~X + -C
> > > probably argues against that form ever so slightly.
> 
>  FWIW I agree with Jakub here, that having one canonical form for the 
> middle end to operate on is advantageous.  It is just that when we 
> eventually get to the backend we may want to do the reverse transformation 
> in some cases, which may be specific immediate operand values or whatever 
> the backend may see fit.
> 
> > >>> If another form is then better for a particular machine, it should be done
> > >>> either during expansion (being able to produce both RTLs and computing
> > >>> their
> > >>> costs), or during combine with either combine splitters or
> > >>> define_insn_and_split in the backend, or, if it can't be done in RTL,
> > >>> during
> > >>> the isel pass.
> > >> Hmm, maybe it has been discussed before, so please forgive me if I write
> > >> something silly, but it seems to me like this should be done in a generic
> > >> way like match.pd so that all the targets do not have to track the changes
> > >> made there and then perhaps repeat the same or similar code each.  So I
> > >> think it would make sense to make a change like this include that reverse
> > >> transformation as well, so that ultimately both forms are tried with RTL,
> > >> as there is no clear advantage to either here.
> > > The idea we've kicked around in the past was to use the same syntax as
> > > match.pd, but have it be target dependent to reform expressions in ways
> > > that are beneficial to the target and have it run at the end of the
> > > gimple/ssa pipeline.  Nobody's implemented this though.
> 
>  Hmm, but why does it have to be target dependent?  For match.pd we do 
> things unconditionally, to have a uniform intermediate representation, 
> however here we wouldn't have to, as we can check the costs respectively 
> of the original and the transformed expression and choose the cheaper of 
> the two.  Would that be so we don't waste cycles with targets we know 
> beforehand a given transformation won't buy anything?
> 
>  In that case however no code quality regression would happen anyway, so I 
> think it would be more productive if we still had all transformations 
> defined in a generic manner and then possibly the hopeless ones excluded 
> by hand for targets listed.  This way if anything is omitted by chance, 
> i.e. not excluded for a given target, then good code will still be 
> produced and only some compilation performance lost.
> 
>  While if we require all port maintainers to qualify individual 
> transformations by hand as they are added by someone to their pet target, 
> we'll end up with a lot of duplicate effort and missed bits.  Of course 
> some very exotic transformations that match unique target machine 
> instructions may indeed best be added to a single target only.
> 
> > Yes.  And while a gimple-to-gimple transform is indeed quite simple
> > to make eventually a match.pd-like gimple-to-RTL would be more
> > useful in the end.  Note RTL can eventually be emulated close enough
> > via the use of internal functions mapping to optabs.  But still
> > complex combined instructions will need expander help unless we
> > expose all named instruction RTL patterns as target specific
> > internal functions to use from that .pd file.
> 
>  Hmm, why aren't the standard named patterns we already have going to be 
> sufficient?

Because two standard name pattern expansions can later be combined
to a non-standard define-insn which may have lower cost and we'd
of course like to see such instruction selection opportunities here.

Richard.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2021-01-14 15:03 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-12  8:10 [PATCH] match.pd: Add ~(X - Y) -> ~X + Y simplification [PR96685] Jakub Jelinek
2020-12-12 10:58 ` Richard Biener
2020-12-12 12:25 ` Marc Glisse
2020-12-12 12:41   ` Jakub Jelinek
2020-12-12 13:00     ` Marc Glisse
2020-12-12 13:05       ` Jakub Jelinek
2020-12-21  2:50 ` Maciej W. Rozycki
2020-12-21  7:54   ` Jakub Jelinek
2021-01-10  0:43     ` Maciej W. Rozycki
2021-01-11 19:22       ` Jeff Law
2021-01-12  8:42         ` Richard Biener
2021-01-14 14:57           ` Maciej W. Rozycki
2021-01-14 15:03             ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).