public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] tree-optimization: Optimize division followed by multiply [PR95176]
@ 2021-03-31 23:02 Victor Tong
  2021-04-27  8:29 ` Richard Biener
  0 siblings, 1 reply; 15+ messages in thread
From: Victor Tong @ 2021-03-31 23:02 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1568 bytes --]

Hello,

This patch fixes PR tree-optimization/95176. A new pattern in match.pd was added to transform "a * (b / a)" --> "b - (b % a)". A new test case was also added to cover this scenario.

The new pattern interfered with the existing pattern of "X - (X / Y) * Y". In some cases (such as in fn4() in gcc/testsuite/gcc.dg/fold-minus-6.c), the new pattern is applied causing the existing pattern to no longer apply. This results in worse code generation because the expression is left as "X - (X - Y)". An additional subtraction pattern of "X - (X - Y) --> Y" was added to this patch to avoid this regression.

I also didn't remove the existing pattern because it triggered in more cases than the new pattern because of a tree_invariant_p check that's inserted by genmatch for the new pattern.

I verified that all "make -k check" tests pass when targeting x86_64-pc-linux-gnu.

2021-03-31  Victor Tong  <vitong@microsoft.com> 

gcc/ChangeLog:

	* match.pd: Two new patterns: One to optimize division followed by multiply and the other to avoid a regression as explained above

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/20030807-10.c: Update existing test to look for a subtraction because a shift is no longer emitted
	* gcc.dg/pr95176.c: New test to cover optimizing division followed by multiply

I don't have write access to the GCC repo but I've completed the FSF paperwork as I plan to make more contributions in the future. I'm looking for a sponsorship from an existing GCC maintainer before applying for write access.

Thanks,
Victor

[-- Attachment #2: pr95176.patch --]
[-- Type: application/octet-stream, Size: 3025 bytes --]

diff --git a/gcc/match.pd b/gcc/match.pd
index 036f92fa959..ef9c6719a0a 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -599,6 +599,18 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (if (INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type))
   (convert (trunc_mod @0 @1))))
 
+/* X * (Y / X) is the same as Y - (Y % X).  */
+(simplify
+ (mult:c (convert1? @0) (convert2? (trunc_div @1 @@0)))
+ (if (INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type))
+  (minus (convert @1) (convert (trunc_mod @1 @0)))))
+
+/* X - (X - Y) --> Y */
+(simplify
+ (minus (convert1? @0) (convert2? (minus @@0 @1)))
+ (if ((INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type)) && TYPE_OVERFLOW_UNDEFINED(type))
+  (convert @1)))
+
 /* Optimize TRUNC_MOD_EXPR by a power of two into a BIT_AND_EXPR,
    i.e. "X % C" into "X & (C - 1)", if X and C are positive.
    Also optimize A % (C << N)  where C is a power of 2,
diff --git a/gcc/testsuite/gcc.dg/pr95176.c b/gcc/testsuite/gcc.dg/pr95176.c
new file mode 100644
index 00000000000..ef087317187
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr95176.c
@@ -0,0 +1,45 @@
+/* { dg-do run } */
+/* { dg-options "-O -fdump-tree-optimized-raw" } */
+
+extern int printf (__const char *__restrict __format, ...);
+
+int __attribute__ ((noinline))
+f(int a, int b)
+{
+    return a * (b / a);
+}
+
+int __attribute__((optimize("O0"))) __attribute__ ((noinline))
+fNoOpt(int a, int b)
+{
+    return a * (b / a);
+}
+
+int __attribute__ ((noinline))
+f2(int a, int b)
+{
+    return (b / a) * a;
+}
+
+int __attribute__((optimize("O0"))) __attribute__ ((noinline))
+f2NoOpt(int a, int b)
+{
+    return (b / a) * a;
+}
+
+int main()
+{
+    if (f(2, 15) != fNoOpt(2, 15))
+        __builtin_abort();
+    if (f2(2, 15) != f2NoOpt(2, 15))
+        __builtin_abort();
+    printf("pass");
+}
+
+// There should be two instances of trunc_div_expr and 
+// mult_expr in the output. One in fNoOpt and one in f2NoOpt.
+/* { dg-final { scan-tree-dump-times "trunc_div_expr" 2 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "mult_expr" 2 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "trunc_mod_expr" 2 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "minus_expr" 2 "optimized" } } */
+/* { dg-output "pass" } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/20030807-10.c b/gcc/testsuite/gcc.dg/tree-ssa/20030807-10.c
index 0e01e511b78..4cd35738057 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/20030807-10.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/20030807-10.c
@@ -20,6 +20,9 @@ subreg_highpart_offset (outermode, innermode)
 /* There should be one mask with the value 3.  */
 /* { dg-final { scan-tree-dump-times " \& 3" 1 "vrp1"} } */
   
-/* There should be one right shift by 2 places.  */
-/* { dg-final { scan-tree-dump-times " >> 2" 1 "vrp1"} } */
+/* There should be no right shift by 2 places.  */
+/* { dg-final { scan-tree-dump-times " >> 2" 0 "vrp1"} } */
+
+/* The "difference / 4 * 4" should become a subtraction */
+/* { dg-final { scan-tree-dump-times " - " 2 "vrp1"} } */
 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] tree-optimization: Optimize division followed by multiply [PR95176]
  2021-03-31 23:02 [PATCH] tree-optimization: Optimize division followed by multiply [PR95176] Victor Tong
@ 2021-04-27  8:29 ` Richard Biener
  2021-06-02 20:55   ` [EXTERNAL] " Victor Tong
  0 siblings, 1 reply; 15+ messages in thread
From: Richard Biener @ 2021-04-27  8:29 UTC (permalink / raw)
  To: Victor Tong; +Cc: gcc-patches

On Thu, Apr 1, 2021 at 1:03 AM Victor Tong via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Hello,
>
> This patch fixes PR tree-optimization/95176. A new pattern in match.pd was added to transform "a * (b / a)" --> "b - (b % a)". A new test case was also added to cover this scenario.
>
> The new pattern interfered with the existing pattern of "X - (X / Y) * Y". In some cases (such as in fn4() in gcc/testsuite/gcc.dg/fold-minus-6.c), the new pattern is applied causing the existing pattern to no longer apply. This results in worse code generation because the expression is left as "X - (X - Y)". An additional subtraction pattern of "X - (X - Y) --> Y" was added to this patch to avoid this regression.
>
> I also didn't remove the existing pattern because it triggered in more cases than the new pattern because of a tree_invariant_p check that's inserted by genmatch for the new pattern.

Yes, we do not handle using Y multiple times when it might contain
side-effects in GENERIC folding
(comments in genmatch suggest we can use save_expr but we don't
implement this [anymore]).

On GIMPLE there's also the issue that your new pattern creates a
complex expression which
makes it failed to be used by value-numbering for example where the
old pattern was OK
(eventually, if no conversion was required).

So indeed it looks OK to preserve both.

I wonder why you needed the

+/* X - (X - Y) --> Y */
+(simplify
+ (minus (convert1? @0) (convert2? (minus @@0 @1)))
+ (if ((INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type)) &&
TYPE_OVERFLOW_UNDEFINED(type))
+  (convert @1)))

pattern since it should be handled by

  /* Match patterns that allow contracting a plus-minus pair
     irrespective of overflow issues.  */
  /* (A +- B) - A       ->  +- B */
  /* (A +- B) -+ B      ->  A */
  /* A - (A +- B)       -> -+ B */
  /* A +- (B -+ A)      ->  +- B */

in particular

  (simplify
   (minus @0 (nop_convert1? (minus (nop_convert2? @0) @1)))
   (view_convert @1))

if there's supported cases missing I'd rather extend this pattern than
replicating it.

+/* X * (Y / X) is the same as Y - (Y % X).  */
+(simplify
+ (mult:c (convert1? @0) (convert2? (trunc_div @1 @@0)))
+ (if (INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type))
+  (minus (convert @1) (convert (trunc_mod @1 @0)))))

note that if you're allowing vector types you have to use
(view_convert ...) in the
transform and you also need to make sure that the target can expand
the modulo - I suspect that's an issue with the existing pattern as well.
I don't know of any vector ISA that supports modulo (or integer
division, that is).
Restricting the patterns to integer types is probably the most
sensible solution.

Thanks,
Richard.

> I verified that all "make -k check" tests pass when targeting x86_64-pc-linux-gnu.
>
> 2021-03-31  Victor Tong  <vitong@microsoft.com>
>
> gcc/ChangeLog:
>
>         * match.pd: Two new patterns: One to optimize division followed by multiply and the other to avoid a regression as explained above
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.dg/tree-ssa/20030807-10.c: Update existing test to look for a subtraction because a shift is no longer emitted
>         * gcc.dg/pr95176.c: New test to cover optimizing division followed by multiply
>
> I don't have write access to the GCC repo but I've completed the FSF paperwork as I plan to make more contributions in the future. I'm looking for a sponsorship from an existing GCC maintainer before applying for write access.
>
> Thanks,
> Victor

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division followed by multiply [PR95176]
  2021-04-27  8:29 ` Richard Biener
@ 2021-06-02 20:55   ` Victor Tong
  2021-06-07  8:25     ` Richard Biener
  0 siblings, 1 reply; 15+ messages in thread
From: Victor Tong @ 2021-06-02 20:55 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

Hi Richard,

Thanks for reviewing my patch. I did a search online and you're right -- there isn't a vector modulo instruction. I'll remove the X * (Y / X) --> Y - (Y % X) pattern and the existing X - (X / Y) * Y --> X % Y from triggering on vector types.

I looked into why the following pattern isn't triggering:

  (simplify
   (minus @0 (nop_convert1? (minus (nop_convert2? @0) @1)))
   (view_convert @1))

The nop_converts expand into tree_nop_conversion_p checks. In fn2() of the testsuite/gcc.dg/fold-minus-6.c, the expression during generic matching looks like: 

42 - (long int) (42 - 42 % x)

When looking at the right-hand side of the expression (the (long int) (42 - 42 % x)), the tree_nop_conversion_p check fails because of the type precision difference. The expression inside of the cast has a 32-bit precision and the outer expression has a 64-bit precision.

I looked around at other patterns and it seems like nop_convert and view_convert are used because of underflow/overflow concerns. I'm not familiar with the two constructs. What's the difference between using them and checking TYPE_OVERFLOW_UNDEFINED? In the scenario above, since TYPE_OVERFLOW_UNDEFINED is true, the second pattern that I added (X - (X - Y) --> Y) gets triggered.

Thanks,
Victor


From: Richard Biener <richard.guenther@gmail.com>
Sent: Tuesday, April 27, 2021 1:29 AM
To: Victor Tong <vitong@microsoft.com>
Cc: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>
Subject: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division followed by multiply [PR95176] 
 
On Thu, Apr 1, 2021 at 1:03 AM Victor Tong via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Hello,
>
> This patch fixes PR tree-optimization/95176. A new pattern in match.pd was added to transform "a * (b / a)" --> "b - (b % a)". A new test case was also added to cover this scenario.
>
> The new pattern interfered with the existing pattern of "X - (X / Y) * Y". In some cases (such as in fn4() in gcc/testsuite/gcc.dg/fold-minus-6.c), the new pattern is applied causing the existing pattern to no longer apply. This results in worse code generation because the expression is left as "X - (X - Y)". An additional subtraction pattern of "X - (X - Y) --> Y" was added to this patch to avoid this regression.
>
> I also didn't remove the existing pattern because it triggered in more cases than the new pattern because of a tree_invariant_p check that's inserted by genmatch for the new pattern.

Yes, we do not handle using Y multiple times when it might contain
side-effects in GENERIC folding
(comments in genmatch suggest we can use save_expr but we don't
implement this [anymore]).

On GIMPLE there's also the issue that your new pattern creates a
complex expression which
makes it failed to be used by value-numbering for example where the
old pattern was OK
(eventually, if no conversion was required).

So indeed it looks OK to preserve both.

I wonder why you needed the

+/* X - (X - Y) --> Y */
+(simplify
+ (minus (convert1? @0) (convert2? (minus @@0 @1)))
+ (if ((INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type)) &&
TYPE_OVERFLOW_UNDEFINED(type))
+  (convert @1)))

pattern since it should be handled by

  /* Match patterns that allow contracting a plus-minus pair
     irrespective of overflow issues.  */
  /* (A +- B) - A       ->  +- B */
  /* (A +- B) -+ B      ->  A */
  /* A - (A +- B)       -> -+ B */
  /* A +- (B -+ A)      ->  +- B */

in particular

  (simplify
   (minus @0 (nop_convert1? (minus (nop_convert2? @0) @1)))
   (view_convert @1))

if there's supported cases missing I'd rather extend this pattern than
replicating it.

+/* X * (Y / X) is the same as Y - (Y % X).  */
+(simplify
+ (mult:c (convert1? @0) (convert2? (trunc_div @1 @@0)))
+ (if (INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type))
+  (minus (convert @1) (convert (trunc_mod @1 @0)))))

note that if you're allowing vector types you have to use
(view_convert ...) in the
transform and you also need to make sure that the target can expand
the modulo - I suspect that's an issue with the existing pattern as well.
I don't know of any vector ISA that supports modulo (or integer
division, that is).
Restricting the patterns to integer types is probably the most
sensible solution.

Thanks,
Richard.

> I verified that all "make -k check" tests pass when targeting x86_64-pc-linux-gnu.
>
> 2021-03-31  Victor Tong  <vitong@microsoft.com>
>
> gcc/ChangeLog:
>
>         * match.pd: Two new patterns: One to optimize division followed by multiply and the other to avoid a regression as explained above
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.dg/tree-ssa/20030807-10.c: Update existing test to look for a subtraction because a shift is no longer emitted
>         * gcc.dg/pr95176.c: New test to cover optimizing division followed by multiply
>
> I don't have write access to the GCC repo but I've completed the FSF paperwork as I plan to make more contributions in the future. I'm looking for a sponsorship from an existing GCC maintainer before applying for write access.
>
> Thanks,
> Victor

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division followed by multiply [PR95176]
  2021-06-02 20:55   ` [EXTERNAL] " Victor Tong
@ 2021-06-07  8:25     ` Richard Biener
  2021-06-16 18:49       ` Victor Tong
  0 siblings, 1 reply; 15+ messages in thread
From: Richard Biener @ 2021-06-07  8:25 UTC (permalink / raw)
  To: Victor Tong; +Cc: gcc-patches

On Wed, Jun 2, 2021 at 10:55 PM Victor Tong <vitong@microsoft.com> wrote:
>
> Hi Richard,
>
> Thanks for reviewing my patch. I did a search online and you're right -- there isn't a vector modulo instruction. I'll remove the X * (Y / X) --> Y - (Y % X) pattern and the existing X - (X / Y) * Y --> X % Y from triggering on vector types.
>
> I looked into why the following pattern isn't triggering:
>
>   (simplify
>    (minus @0 (nop_convert1? (minus (nop_convert2? @0) @1)))
>    (view_convert @1))
>
> The nop_converts expand into tree_nop_conversion_p checks. In fn2() of the testsuite/gcc.dg/fold-minus-6.c, the expression during generic matching looks like:
>
> 42 - (long int) (42 - 42 % x)
>
> When looking at the right-hand side of the expression (the (long int) (42 - 42 % x)), the tree_nop_conversion_p check fails because of the type precision difference. The expression inside of the cast has a 32-bit precision and the outer expression has a 64-bit precision.
>
> I looked around at other patterns and it seems like nop_convert and view_convert are used because of underflow/overflow concerns. I'm not familiar with the two constructs. What's the difference between using them and checking TYPE_OVERFLOW_UNDEFINED? In the scenario above, since TYPE_OVERFLOW_UNDEFINED is true, the second pattern that I added (X - (X - Y) --> Y) gets triggered.

But TYPE_OVERFLOW_UNDEFINED is not a good condition here since the
conversion is the problematic one and
conversions have implementation defined behavior.  Now, the above does
not match because it wasn't designed to,
and for non-constant '42' it would have needed a (convert ...) around
the first @0 as well (matching of constants is
by value, not by value + type).

That said, your

+/* X - (X - Y) --> Y */
+(simplify
+ (minus (convert1? @0) (convert2? (minus @@0 @1)))
+ (if ((INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type)) &&
TYPE_OVERFLOW_UNDEFINED(type))
+  (convert @1)))

would match (int)x - (int)(x - y) where you assert the outer subtract
has undefined behavior
on overflow but the inner subtract could wrap and the (int) conversion
can be truncating
or widening.  Is that really always a valid transform then?

Richard.

> Thanks,
> Victor
>
>
> From: Richard Biener <richard.guenther@gmail.com>
> Sent: Tuesday, April 27, 2021 1:29 AM
> To: Victor Tong <vitong@microsoft.com>
> Cc: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>
> Subject: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division followed by multiply [PR95176]
>
> On Thu, Apr 1, 2021 at 1:03 AM Victor Tong via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > Hello,
> >
> > This patch fixes PR tree-optimization/95176. A new pattern in match.pd was added to transform "a * (b / a)" --> "b - (b % a)". A new test case was also added to cover this scenario.
> >
> > The new pattern interfered with the existing pattern of "X - (X / Y) * Y". In some cases (such as in fn4() in gcc/testsuite/gcc.dg/fold-minus-6.c), the new pattern is applied causing the existing pattern to no longer apply. This results in worse code generation because the expression is left as "X - (X - Y)". An additional subtraction pattern of "X - (X - Y) --> Y" was added to this patch to avoid this regression.
> >
> > I also didn't remove the existing pattern because it triggered in more cases than the new pattern because of a tree_invariant_p check that's inserted by genmatch for the new pattern.
>
> Yes, we do not handle using Y multiple times when it might contain
> side-effects in GENERIC folding
> (comments in genmatch suggest we can use save_expr but we don't
> implement this [anymore]).
>
> On GIMPLE there's also the issue that your new pattern creates a
> complex expression which
> makes it failed to be used by value-numbering for example where the
> old pattern was OK
> (eventually, if no conversion was required).
>
> So indeed it looks OK to preserve both.
>
> I wonder why you needed the
>
> +/* X - (X - Y) --> Y */
> +(simplify
> + (minus (convert1? @0) (convert2? (minus @@0 @1)))
> + (if ((INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type)) &&
> TYPE_OVERFLOW_UNDEFINED(type))
> +  (convert @1)))
>
> pattern since it should be handled by
>
>   /* Match patterns that allow contracting a plus-minus pair
>      irrespective of overflow issues.  */
>   /* (A +- B) - A       ->  +- B */
>   /* (A +- B) -+ B      ->  A */
>   /* A - (A +- B)       -> -+ B */
>   /* A +- (B -+ A)      ->  +- B */
>
> in particular
>
>   (simplify
>    (minus @0 (nop_convert1? (minus (nop_convert2? @0) @1)))
>    (view_convert @1))
>
> if there's supported cases missing I'd rather extend this pattern than
> replicating it.
>
> +/* X * (Y / X) is the same as Y - (Y % X).  */
> +(simplify
> + (mult:c (convert1? @0) (convert2? (trunc_div @1 @@0)))
> + (if (INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type))
> +  (minus (convert @1) (convert (trunc_mod @1 @0)))))
>
> note that if you're allowing vector types you have to use
> (view_convert ...) in the
> transform and you also need to make sure that the target can expand
> the modulo - I suspect that's an issue with the existing pattern as well.
> I don't know of any vector ISA that supports modulo (or integer
> division, that is).
> Restricting the patterns to integer types is probably the most
> sensible solution.
>
> Thanks,
> Richard.
>
> > I verified that all "make -k check" tests pass when targeting x86_64-pc-linux-gnu.
> >
> > 2021-03-31  Victor Tong  <vitong@microsoft.com>
> >
> > gcc/ChangeLog:
> >
> >         * match.pd: Two new patterns: One to optimize division followed by multiply and the other to avoid a regression as explained above
> >
> > gcc/testsuite/ChangeLog:
> >
> >         * gcc.dg/tree-ssa/20030807-10.c: Update existing test to look for a subtraction because a shift is no longer emitted
> >         * gcc.dg/pr95176.c: New test to cover optimizing division followed by multiply
> >
> > I don't have write access to the GCC repo but I've completed the FSF paperwork as I plan to make more contributions in the future. I'm looking for a sponsorship from an existing GCC maintainer before applying for write access.
> >
> > Thanks,
> > Victor

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division followed by multiply [PR95176]
  2021-06-07  8:25     ` Richard Biener
@ 2021-06-16 18:49       ` Victor Tong
  2021-06-18  9:43         ` Richard Biener
  0 siblings, 1 reply; 15+ messages in thread
From: Victor Tong @ 2021-06-16 18:49 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

Hi Richard,

Thanks for the feedback. From what you said, I can think of two possible solutions (though I'm not sure if either is feasible/fully correct):

Option 1: Have the new X * (Y / X) --> Y - (Y % X) optimization only run in scenarios that don't interfere with the existing X - (X / Y) * Y --> X % Y optimization. 

This would involve checking the expression one level up to see if there's a subtraction that would trigger the existing optimization. I looked through the match.pd file and couldn't find a bail condition like this. It doesn't seem like there's a link from an expression to its parent expression one level up. This also feels a bit counter-intuitive since it would be doing the opposite of the bottom-up expression matching where the compiler would like to match a larger expression rather than a smaller one.

Option 2: Add a new pattern to support scenarios that the existing nop_convert pattern bails out on.

Existing pattern:

(simplify
   (minus (nop_convert1? @0) (nop_convert2? (minus (nop_convert3? @@0) @1)))
   (view_convert @1))

New pattern to add:

  /* X - (X - Y) --> Y */
  (simplify
  (minus @0 (convert? (minus @@0 @1)))
  (if (INTEGRAL_TYPE_P (type) 
        && TYPE_OVERFLOW_UNDEFINED(type)
        && INTEGRAL_TYPE_P (TREE_TYPE(@1))
        && TYPE_OVERFLOW_UNDEFINED(TREE_TYPE(@1))
        && !TYPE_UNSIGNED (TREE_TYPE (@1))
        && !TYPE_UNSIGNED (type)
        && TYPE_PRECISION (TREE_TYPE (@1)) <= TYPE_PRECISION (type))
    (convert @1)))

I think the truncation concerns that you brought up should be covered if the external expression type precision is greater than or equal to the internal expression type. There may be a sign extension operation (which is why the nop_convert check fails) but that shouldn't affect the value of the expression. And if the types involved are signed integers where overflow/underflow results in undefined behavior, the X - (X - Y) --> Y optimization should be legal.

Please correct me if I'm wrong with either one of these options, or if you can think of a better option to fix the regression.

Thanks,
Victor




From: Richard Biener <richard.guenther@gmail.com>
Sent: Monday, June 7, 2021 1:25 AM
To: Victor Tong <vitong@microsoft.com>
Cc: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>
Subject: Re: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division followed by multiply [PR95176] 
 
On Wed, Jun 2, 2021 at 10:55 PM Victor Tong <vitong@microsoft.com> wrote:
>
> Hi Richard,
>
> Thanks for reviewing my patch. I did a search online and you're right -- there isn't a vector modulo instruction. I'll remove the X * (Y / X) --> Y - (Y % X) pattern and the existing X - (X / Y) * Y --> X % Y from triggering on vector types.
>
> I looked into why the following pattern isn't triggering:
>
>   (simplify
>    (minus @0 (nop_convert1? (minus (nop_convert2? @0) @1)))
>    (view_convert @1))
>
> The nop_converts expand into tree_nop_conversion_p checks. In fn2() of the testsuite/gcc.dg/fold-minus-6.c, the expression during generic matching looks like:
>
> 42 - (long int) (42 - 42 % x)
>
> When looking at the right-hand side of the expression (the (long int) (42 - 42 % x)), the tree_nop_conversion_p check fails because of the type precision difference. The expression inside of the cast has a 32-bit precision and the outer expression has a 64-bit precision.
>
> I looked around at other patterns and it seems like nop_convert and view_convert are used because of underflow/overflow concerns. I'm not familiar with the two constructs. What's the difference between using them and checking TYPE_OVERFLOW_UNDEFINED? In the scenario above, since TYPE_OVERFLOW_UNDEFINED is true, the second pattern that I added (X - (X - Y) --> Y) gets triggered.

But TYPE_OVERFLOW_UNDEFINED is not a good condition here since the
conversion is the problematic one and
conversions have implementation defined behavior.  Now, the above does
not match because it wasn't designed to,
and for non-constant '42' it would have needed a (convert ...) around
the first @0 as well (matching of constants is
by value, not by value + type).

That said, your

+/* X - (X - Y) --> Y */
+(simplify
+ (minus (convert1? @0) (convert2? (minus @@0 @1)))
+ (if ((INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type)) &&
TYPE_OVERFLOW_UNDEFINED(type))
+  (convert @1)))

would match (int)x - (int)(x - y) where you assert the outer subtract
has undefined behavior
on overflow but the inner subtract could wrap and the (int) conversion
can be truncating
or widening.  Is that really always a valid transform then?

Richard.

> Thanks,
> Victor
>
>
> From: Richard Biener <richard.guenther@gmail.com>
> Sent: Tuesday, April 27, 2021 1:29 AM
> To: Victor Tong <vitong@microsoft.com>
> Cc: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>
> Subject: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division followed by multiply [PR95176]
>
> On Thu, Apr 1, 2021 at 1:03 AM Victor Tong via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > Hello,
> >
> > This patch fixes PR tree-optimization/95176. A new pattern in match.pd was added to transform "a * (b / a)" --> "b - (b % a)". A new test case was also added to cover this scenario.
> >
> > The new pattern interfered with the existing pattern of "X - (X / Y) * Y". In some cases (such as in fn4() in gcc/testsuite/gcc.dg/fold-minus-6.c), the new pattern is applied causing the existing pattern to no longer apply. This results in worse code generation because the expression is left as "X - (X - Y)". An additional subtraction pattern of "X - (X - Y) --> Y" was added to this patch to avoid this regression.
> >
> > I also didn't remove the existing pattern because it triggered in more cases than the new pattern because of a tree_invariant_p check that's inserted by genmatch for the new pattern.
>
> Yes, we do not handle using Y multiple times when it might contain
> side-effects in GENERIC folding
> (comments in genmatch suggest we can use save_expr but we don't
> implement this [anymore]).
>
> On GIMPLE there's also the issue that your new pattern creates a
> complex expression which
> makes it failed to be used by value-numbering for example where the
> old pattern was OK
> (eventually, if no conversion was required).
>
> So indeed it looks OK to preserve both.
>
> I wonder why you needed the
>
> +/* X - (X - Y) --> Y */
> +(simplify
> + (minus (convert1? @0) (convert2? (minus @@0 @1)))
> + (if ((INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type)) &&
> TYPE_OVERFLOW_UNDEFINED(type))
> +  (convert @1)))
>
> pattern since it should be handled by
>
>   /* Match patterns that allow contracting a plus-minus pair
>      irrespective of overflow issues.  */
>   /* (A +- B) - A       ->  +- B */
>   /* (A +- B) -+ B      ->  A */
>   /* A - (A +- B)       -> -+ B */
>   /* A +- (B -+ A)      ->  +- B */
>
> in particular
>
>   (simplify
>    (minus @0 (nop_convert1? (minus (nop_convert2? @0) @1)))
>    (view_convert @1))
>
> if there's supported cases missing I'd rather extend this pattern than
> replicating it.
>
> +/* X * (Y / X) is the same as Y - (Y % X).  */
> +(simplify
> + (mult:c (convert1? @0) (convert2? (trunc_div @1 @@0)))
> + (if (INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type))
> +  (minus (convert @1) (convert (trunc_mod @1 @0)))))
>
> note that if you're allowing vector types you have to use
> (view_convert ...) in the
> transform and you also need to make sure that the target can expand
> the modulo - I suspect that's an issue with the existing pattern as well.
> I don't know of any vector ISA that supports modulo (or integer
> division, that is).
> Restricting the patterns to integer types is probably the most
> sensible solution.
>
> Thanks,
> Richard.
>
> > I verified that all "make -k check" tests pass when targeting x86_64-pc-linux-gnu.
> >
> > 2021-03-31  Victor Tong  <vitong@microsoft.com>
> >
> > gcc/ChangeLog:
> >
> >         * match.pd: Two new patterns: One to optimize division followed by multiply and the other to avoid a regression as explained above
> >
> > gcc/testsuite/ChangeLog:
> >
> >         * gcc.dg/tree-ssa/20030807-10.c: Update existing test to look for a subtraction because a shift is no longer emitted
> >         * gcc.dg/pr95176.c: New test to cover optimizing division followed by multiply
> >
> > I don't have write access to the GCC repo but I've completed the FSF paperwork as I plan to make more contributions in the future. I'm looking for a sponsorship from an existing GCC maintainer before applying for write access.
> >
> > Thanks,
> > Victor

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division followed by multiply [PR95176]
  2021-06-16 18:49       ` Victor Tong
@ 2021-06-18  9:43         ` Richard Biener
  2021-06-19 17:05           ` Marc Glisse
  0 siblings, 1 reply; 15+ messages in thread
From: Richard Biener @ 2021-06-18  9:43 UTC (permalink / raw)
  To: Victor Tong; +Cc: gcc-patches, Marc Glisse

On Wed, Jun 16, 2021 at 8:49 PM Victor Tong <vitong@microsoft.com> wrote:
>
> Hi Richard,
>
> Thanks for the feedback. From what you said, I can think of two possible solutions (though I'm not sure if either is feasible/fully correct):
>
> Option 1: Have the new X * (Y / X) --> Y - (Y % X) optimization only run in scenarios that don't interfere with the existing X - (X / Y) * Y --> X % Y optimization.
>
> This would involve checking the expression one level up to see if there's a subtraction that would trigger the existing optimization. I looked through the match.pd file and couldn't find a bail condition like this. It doesn't seem like there's a link from an expression to its parent expression one level up. This also feels a bit counter-intuitive since it would be doing the opposite of the bottom-up expression matching where the compiler would like to match a larger expression rather than a smaller one.

Yes, that option is not really possible from match.pd.

> Option 2: Add a new pattern to support scenarios that the existing nop_convert pattern bails out on.
>
> Existing pattern:
>
> (simplify
>    (minus (nop_convert1? @0) (nop_convert2? (minus (nop_convert3? @@0) @1)))
>    (view_convert @1))
>
> New pattern to add:
>
>   /* X - (X - Y) --> Y */
>   (simplify
>   (minus @0 (convert? (minus @@0 @1)))
>   (if (INTEGRAL_TYPE_P (type)
>         && TYPE_OVERFLOW_UNDEFINED(type)
>         && INTEGRAL_TYPE_P (TREE_TYPE(@1))
>         && TYPE_OVERFLOW_UNDEFINED(TREE_TYPE(@1))
>         && !TYPE_UNSIGNED (TREE_TYPE (@1))
>         && !TYPE_UNSIGNED (type)
>         && TYPE_PRECISION (TREE_TYPE (@1)) <= TYPE_PRECISION (type))
>     (convert @1)))
>
> I think the truncation concerns that you brought up should be covered if the external expression type precision is greater than or equal to the internal expression type. There may be a sign extension operation (which is why the nop_convert check fails) but that shouldn't affect the value of the expression. And if the types involved are signed integers where overflow/underflow results in undefined behavior, the X - (X - Y) --> Y optimization should be legal.
>
> Please correct me if I'm wrong with either one of these options, or if you can think of a better option to fix the regression.

So to recap, we're looking to simplify 42 - (long int) (42 - 42 % x)
(simplified from gcc.dg/fold-minus-6.c), or
simply (new testcase):

long
fn1 (int x)
{
  return 42L - (long)(42 - x);
}

where the existing pattern does not apply because the conversion is
not a NOP one:

  (simplify
   (minus (nop_convert1? (minus (nop_convert2? @0) @1)) @0)
   (if (!ANY_INTEGRAL_TYPE_P (type)
        || TYPE_OVERFLOW_WRAPS (type))
   (negate (view_convert @1))
   (view_convert (negate @1))))

so let's consider replacing nop_convert1? with convert1? and thus obtain

  (simplify
   (minus (convert1? (minus (nop_convert2? @0) @1)) @0)
   (if (!ANY_INTEGRAL_TYPE_P (type)
        || TYPE_OVERFLOW_WRAPS (type))
   (negate (view_convert @1))
   (view_convert (negate @1))))

given we still require a matching @0 (as in operand_requal_p) it looks like
a convert1 that is not the inverse of the nop_convert2, and thus also
a nop_convert
is only possible for constants (because operand_equal_p does not verify type
equality).  Now - can we construct any testcase for which this conversion would
be wrong?

Richard.

> Thanks,
> Victor
>
>
>
>
> From: Richard Biener <richard.guenther@gmail.com>
> Sent: Monday, June 7, 2021 1:25 AM
> To: Victor Tong <vitong@microsoft.com>
> Cc: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>
> Subject: Re: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division followed by multiply [PR95176]
>
> On Wed, Jun 2, 2021 at 10:55 PM Victor Tong <vitong@microsoft.com> wrote:
> >
> > Hi Richard,
> >
> > Thanks for reviewing my patch. I did a search online and you're right -- there isn't a vector modulo instruction. I'll remove the X * (Y / X) --> Y - (Y % X) pattern and the existing X - (X / Y) * Y --> X % Y from triggering on vector types.
> >
> > I looked into why the following pattern isn't triggering:
> >
> >   (simplify
> >    (minus @0 (nop_convert1? (minus (nop_convert2? @0) @1)))
> >    (view_convert @1))
> >
> > The nop_converts expand into tree_nop_conversion_p checks. In fn2() of the testsuite/gcc.dg/fold-minus-6.c, the expression during generic matching looks like:
> >
> > 42 - (long int) (42 - 42 % x)
> >
> > When looking at the right-hand side of the expression (the (long int) (42 - 42 % x)), the tree_nop_conversion_p check fails because of the type precision difference. The expression inside of the cast has a 32-bit precision and the outer expression has a 64-bit precision.
> >
> > I looked around at other patterns and it seems like nop_convert and view_convert are used because of underflow/overflow concerns. I'm not familiar with the two constructs. What's the difference between using them and checking TYPE_OVERFLOW_UNDEFINED? In the scenario above, since TYPE_OVERFLOW_UNDEFINED is true, the second pattern that I added (X - (X - Y) --> Y) gets triggered.
>
> But TYPE_OVERFLOW_UNDEFINED is not a good condition here since the
> conversion is the problematic one and
> conversions have implementation defined behavior.  Now, the above does
> not match because it wasn't designed to,
> and for non-constant '42' it would have needed a (convert ...) around
> the first @0 as well (matching of constants is
> by value, not by value + type).
>
> That said, your
>
> +/* X - (X - Y) --> Y */
> +(simplify
> + (minus (convert1? @0) (convert2? (minus @@0 @1)))
> + (if ((INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type)) &&
> TYPE_OVERFLOW_UNDEFINED(type))
> +  (convert @1)))
>
> would match (int)x - (int)(x - y) where you assert the outer subtract
> has undefined behavior
> on overflow but the inner subtract could wrap and the (int) conversion
> can be truncating
> or widening.  Is that really always a valid transform then?
>
> Richard.
>
> > Thanks,
> > Victor
> >
> >
> > From: Richard Biener <richard.guenther@gmail.com>
> > Sent: Tuesday, April 27, 2021 1:29 AM
> > To: Victor Tong <vitong@microsoft.com>
> > Cc: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>
> > Subject: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division followed by multiply [PR95176]
> >
> > On Thu, Apr 1, 2021 at 1:03 AM Victor Tong via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > > Hello,
> > >
> > > This patch fixes PR tree-optimization/95176. A new pattern in match.pd was added to transform "a * (b / a)" --> "b - (b % a)". A new test case was also added to cover this scenario.
> > >
> > > The new pattern interfered with the existing pattern of "X - (X / Y) * Y". In some cases (such as in fn4() in gcc/testsuite/gcc.dg/fold-minus-6.c), the new pattern is applied causing the existing pattern to no longer apply. This results in worse code generation because the expression is left as "X - (X - Y)". An additional subtraction pattern of "X - (X - Y) --> Y" was added to this patch to avoid this regression.
> > >
> > > I also didn't remove the existing pattern because it triggered in more cases than the new pattern because of a tree_invariant_p check that's inserted by genmatch for the new pattern.
> >
> > Yes, we do not handle using Y multiple times when it might contain
> > side-effects in GENERIC folding
> > (comments in genmatch suggest we can use save_expr but we don't
> > implement this [anymore]).
> >
> > On GIMPLE there's also the issue that your new pattern creates a
> > complex expression which
> > makes it failed to be used by value-numbering for example where the
> > old pattern was OK
> > (eventually, if no conversion was required).
> >
> > So indeed it looks OK to preserve both.
> >
> > I wonder why you needed the
> >
> > +/* X - (X - Y) --> Y */
> > +(simplify
> > + (minus (convert1? @0) (convert2? (minus @@0 @1)))
> > + (if ((INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type)) &&
> > TYPE_OVERFLOW_UNDEFINED(type))
> > +  (convert @1)))
> >
> > pattern since it should be handled by
> >
> >   /* Match patterns that allow contracting a plus-minus pair
> >      irrespective of overflow issues.  */
> >   /* (A +- B) - A       ->  +- B */
> >   /* (A +- B) -+ B      ->  A */
> >   /* A - (A +- B)       -> -+ B */
> >   /* A +- (B -+ A)      ->  +- B */
> >
> > in particular
> >
> >   (simplify
> >    (minus @0 (nop_convert1? (minus (nop_convert2? @0) @1)))
> >    (view_convert @1))
> >
> > if there's supported cases missing I'd rather extend this pattern than
> > replicating it.
> >
> > +/* X * (Y / X) is the same as Y - (Y % X).  */
> > +(simplify
> > + (mult:c (convert1? @0) (convert2? (trunc_div @1 @@0)))
> > + (if (INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type))
> > +  (minus (convert @1) (convert (trunc_mod @1 @0)))))
> >
> > note that if you're allowing vector types you have to use
> > (view_convert ...) in the
> > transform and you also need to make sure that the target can expand
> > the modulo - I suspect that's an issue with the existing pattern as well.
> > I don't know of any vector ISA that supports modulo (or integer
> > division, that is).
> > Restricting the patterns to integer types is probably the most
> > sensible solution.
> >
> > Thanks,
> > Richard.
> >
> > > I verified that all "make -k check" tests pass when targeting x86_64-pc-linux-gnu.
> > >
> > > 2021-03-31  Victor Tong  <vitong@microsoft.com>
> > >
> > > gcc/ChangeLog:
> > >
> > >         * match.pd: Two new patterns: One to optimize division followed by multiply and the other to avoid a regression as explained above
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >         * gcc.dg/tree-ssa/20030807-10.c: Update existing test to look for a subtraction because a shift is no longer emitted
> > >         * gcc.dg/pr95176.c: New test to cover optimizing division followed by multiply
> > >
> > > I don't have write access to the GCC repo but I've completed the FSF paperwork as I plan to make more contributions in the future. I'm looking for a sponsorship from an existing GCC maintainer before applying for write access.
> > >
> > > Thanks,
> > > Victor

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division followed by multiply [PR95176]
  2021-06-18  9:43         ` Richard Biener
@ 2021-06-19 17:05           ` Marc Glisse
  2021-06-21  7:08             ` Richard Biener
  0 siblings, 1 reply; 15+ messages in thread
From: Marc Glisse @ 2021-06-19 17:05 UTC (permalink / raw)
  To: Richard Biener; +Cc: Victor Tong, gcc-patches

On Fri, 18 Jun 2021, Richard Biener wrote:

>> Option 2: Add a new pattern to support scenarios that the existing nop_convert pattern bails out on.
>>
>> Existing pattern:
>>
>> (simplify
>>    (minus (nop_convert1? @0) (nop_convert2? (minus (nop_convert3? @@0) @1)))
>>    (view_convert @1))

I tried to check with a program when

T3 x;
T1 y;
(T2)x-(T2)((T1)x-y)

can be safely replaced with

(T2)y

From the output, it looks like this is safe when T1 is at least as large 
as T2. It is wrong when T1 is unsigned and smaller than T2. And when T1 is 
signed and smaller than T2, it is ok if T3 is the same type as T1 (signed 
then) or has strictly less precision (any sign), and not in other cases.

Note that this is when signed implies undefined overflow and unsigned 
implies wrapping, and I wouldn't put too much faith in this recently 
dusted program. And it doesn't say how to write the match.pd pattern with 
'?', "@@", disabling it if TYPE_OVERFLOW_SANITIZED, etc.

Mostly, I wanted to say that if we are going to go handle more than 
nop_convert for more than just 1 or 2 easy transformations, I think some 
kind of computer verification would be useful, it would save a lot of time 
and headaches.

(I just check by brute force all possible precisions (from 1 to 6) and 
signedness for T1, T2 and T3, all possible values for x and y, compute 
the before and after formulas, accepting if there is UB before, rejecting 
if there is UB after (and not before), and manually try to see a pattern 
in the list of types that work)

-- 
Marc Glisse

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division followed by multiply [PR95176]
  2021-06-19 17:05           ` Marc Glisse
@ 2021-06-21  7:08             ` Richard Biener
  2021-06-28 23:10               ` Victor Tong
  0 siblings, 1 reply; 15+ messages in thread
From: Richard Biener @ 2021-06-21  7:08 UTC (permalink / raw)
  To: Marc Glisse; +Cc: Victor Tong, gcc-patches

On Sat, Jun 19, 2021 at 7:05 PM Marc Glisse <marc.glisse@inria.fr> wrote:
>
> On Fri, 18 Jun 2021, Richard Biener wrote:
>
> >> Option 2: Add a new pattern to support scenarios that the existing nop_convert pattern bails out on.
> >>
> >> Existing pattern:
> >>
> >> (simplify
> >>    (minus (nop_convert1? @0) (nop_convert2? (minus (nop_convert3? @@0) @1)))
> >>    (view_convert @1))
>
> I tried to check with a program when
>
> T3 x;
> T1 y;
> (T2)x-(T2)((T1)x-y)
>
> can be safely replaced with
>
> (T2)y
>
> From the output, it looks like this is safe when T1 is at least as large
> as T2. It is wrong when T1 is unsigned and smaller than T2. And when T1 is
> signed and smaller than T2, it is ok if T3 is the same type as T1 (signed
> then) or has strictly less precision (any sign), and not in other cases.
>
> Note that this is when signed implies undefined overflow and unsigned
> implies wrapping, and I wouldn't put too much faith in this recently
> dusted program. And it doesn't say how to write the match.pd pattern with
> '?', "@@", disabling it if TYPE_OVERFLOW_SANITIZED, etc.
>
> Mostly, I wanted to say that if we are going to go handle more than
> nop_convert for more than just 1 or 2 easy transformations, I think some
> kind of computer verification would be useful, it would save a lot of time
> and headaches.

True.  I wonder if auto-generating such tests from match.pd rules would
be a good project to work on (GSoC?).  When there's define_match
involved things get a little tricky, but then one item on the TODO list
is "inlining" those at the use site (exploding the decision tree even more).

Richard.

> (I just check by brute force all possible precisions (from 1 to 6) and
> signedness for T1, T2 and T3, all possible values for x and y, compute
> the before and after formulas, accepting if there is UB before, rejecting
> if there is UB after (and not before), and manually try to see a pattern
> in the list of types that work)
>
> --
> Marc Glisse

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division followed by multiply [PR95176]
  2021-06-21  7:08             ` Richard Biener
@ 2021-06-28 23:10               ` Victor Tong
  2021-07-19 20:58                 ` Victor Tong
  2021-07-28  9:59                 ` Richard Biener
  0 siblings, 2 replies; 15+ messages in thread
From: Victor Tong @ 2021-06-28 23:10 UTC (permalink / raw)
  To: Richard Biener, Marc Glisse; +Cc: gcc-patches

​Thanks Richard and Marc.

I wrote the following test case to compare the outputs of fn1() and fn1NoOpt() below with my extra pattern being applied. I tested the two functions with all of the integers from INT_MIN to INT_MAX.

long
fn1 (int x)
{
  return 42L - (long)(42 - x);
}

#pragma GCC push_options
#pragma GCC optimize ("O0")
long
fn1NoOpt (int x)
{
  volatile int y = (42 - x);
  return 42L - (long)y;
}
#pragma GCC pop_options

int main ()
{
	for (long i=INT_MIN; i<=INT_MAX;i++)
	{
		auto valNoOpt = fn1NoOpt(i);
		auto valOpt = fn1(i);
		if (valNoOpt != valOpt)
			printf("valOpt=%ld, valNoOpt=%ld\n", valOpt, valNoOpt);
	}
	return 0;
}

I saw that the return values of fn1() and fn1NoOpt() differed when the input was between INT_MIN and INT_MIN+42 inclusive. When passing values in this range to fn1NoOpt(), a signed overflow is triggered which causes the value to differ (undefined behavior). This seems to go in line with what Marc described and I think the transformation is correct in the scenario above. I do think that type casts that result in truncation (i.e. from a higher precision to a lower one) or with unsigned types will result in an incorrect transformation so those scenarios need to be avoided.

Given that the extra pattern I'm adding is taking advantage the undefined behavior of signed integer overflow, I'm considering keeping the existing nop_convert pattern in place and adding a new pattern to cover these new cases. I'd also like to avoid touching nop_convert given that it's used in a number of other patterns.

This is the pattern I have currently:

  (simplify
    (minus (convert1? @0) (convert2? (minus (convert3? @2) @1)))
    (if (operand_equal_p(@0, @2, 0)
        && INTEGRAL_TYPE_P (type)
        && TYPE_OVERFLOW_UNDEFINED(type)
        && !TYPE_OVERFLOW_SANITIZED(type)
        && INTEGRAL_TYPE_P (TREE_TYPE(@1))
        && TYPE_OVERFLOW_UNDEFINED(TREE_TYPE(@1))
        && !TYPE_OVERFLOW_SANITIZED(TREE_TYPE(@1))
        && !TYPE_UNSIGNED (TREE_TYPE (@1))
        && !TYPE_UNSIGNED (type)
        && TYPE_PRECISION (TREE_TYPE (@1)) <= TYPE_PRECISION (type)
        && INTEGRAL_TYPE_P (TREE_TYPE(@0))
        && TYPE_OVERFLOW_UNDEFINED(TREE_TYPE(@0))
        && !TYPE_OVERFLOW_SANITIZED(TREE_TYPE(@0))
        && !TYPE_UNSIGNED (TREE_TYPE (@0))
        && TYPE_PRECISION (TREE_TYPE (@0)) <= TYPE_PRECISION (type)
        && TREE_TYPE(@1) == TREE_TYPE(@2))
    (convert @1)))

Is there a more concise/better way of writing the pattern? I was looking for similar checks in match.pd and I couldn't find anything that I could leverage.

I also kept my pattern to the specific scenario I'm seeing with the regression to lower the risk of something breaking. I've limited @1 and @2 to have the same type.

I'm also in favor of adding/running computer verification to make sure the transformation is legal. I've written some tests to verify that the pattern is being applied in the right scenarios and not being applied in others, but I think there are too many possibilities to manually write them all. Is there anything in GCC that can be used to verify that match.pd transformations are correct? I'm thinking of something like Alive https://github.com/AliveToolkit/alive2.

Thanks,
Victor



From: Richard Biener <richard.guenther@gmail.com>
Sent: Monday, June 21, 2021 12:08 AM
To: Marc Glisse <marc.glisse@inria.fr>
Cc: Victor Tong <vitong@microsoft.com>; gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>
Subject: Re: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division followed by multiply [PR95176] 
 
On Sat, Jun 19, 2021 at 7:05 PM Marc Glisse <marc.glisse@inria.fr> wrote:
>
> On Fri, 18 Jun 2021, Richard Biener wrote:
>
> >> Option 2: Add a new pattern to support scenarios that the existing nop_convert pattern bails out on.
> >>
> >> Existing pattern:
> >>
> >> (simplify
> >>    (minus (nop_convert1? @0) (nop_convert2? (minus (nop_convert3? @@0) @1)))
> >>    (view_convert @1))
>
> I tried to check with a program when
>
> T3 x;
> T1 y;
> (T2)x-(T2)((T1)x-y)
>
> can be safely replaced with
>
> (T2)y
>
> From the output, it looks like this is safe when T1 is at least as large
> as T2. It is wrong when T1 is unsigned and smaller than T2. And when T1 is
> signed and smaller than T2, it is ok if T3 is the same type as T1 (signed
> then) or has strictly less precision (any sign), and not in other cases.
>
> Note that this is when signed implies undefined overflow and unsigned
> implies wrapping, and I wouldn't put too much faith in this recently
> dusted program. And it doesn't say how to write the match.pd pattern with
> '?', "@@", disabling it if TYPE_OVERFLOW_SANITIZED, etc.
>
> Mostly, I wanted to say that if we are going to go handle more than
> nop_convert for more than just 1 or 2 easy transformations, I think some
> kind of computer verification would be useful, it would save a lot of time
> and headaches.

True.  I wonder if auto-generating such tests from match.pd rules would
be a good project to work on (GSoC?).  When there's define_match
involved things get a little tricky, but then one item on the TODO list
is "inlining" those at the use site (exploding the decision tree even more).

Richard.

> (I just check by brute force all possible precisions (from 1 to 6) and
> signedness for T1, T2 and T3, all possible values for x and y, compute
> the before and after formulas, accepting if there is UB before, rejecting
> if there is UB after (and not before), and manually try to see a pattern
> in the list of types that work)
>
> --
> Marc Glisse

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division followed by multiply [PR95176]
  2021-06-28 23:10               ` Victor Tong
@ 2021-07-19 20:58                 ` Victor Tong
  2021-07-28  9:59                 ` Richard Biener
  1 sibling, 0 replies; 15+ messages in thread
From: Victor Tong @ 2021-07-19 20:58 UTC (permalink / raw)
  To: Richard Biener, Marc Glisse, Victor Tong; +Cc: gcc-patches

Gentle ping.
________________________________
From: Gcc-patches <gcc-patches-bounces+vitong=microsoft.com@gcc.gnu.org> on behalf of Victor Tong via Gcc-patches <gcc-patches@gcc.gnu.org>
Sent: Monday, June 28, 2021 4:10 PM
To: Richard Biener <richard.guenther@gmail.com>; Marc Glisse <marc.glisse@inria.fr>
Cc: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>
Subject: Re: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division followed by multiply [PR95176]

​Thanks Richard and Marc.

I wrote the following test case to compare the outputs of fn1() and fn1NoOpt() below with my extra pattern being applied. I tested the two functions with all of the integers from INT_MIN to INT_MAX.

long
fn1 (int x)
{
  return 42L - (long)(42 - x);
}

#pragma GCC push_options
#pragma GCC optimize ("O0")
long
fn1NoOpt (int x)
{
  volatile int y = (42 - x);
  return 42L - (long)y;
}
#pragma GCC pop_options

int main ()
{
        for (long i=INT_MIN; i<=INT_MAX;i++)
        {
                auto valNoOpt = fn1NoOpt(i);
                auto valOpt = fn1(i);
                if (valNoOpt != valOpt)
                        printf("valOpt=%ld, valNoOpt=%ld\n", valOpt, valNoOpt);
        }
        return 0;
}

I saw that the return values of fn1() and fn1NoOpt() differed when the input was between INT_MIN and INT_MIN+42 inclusive. When passing values in this range to fn1NoOpt(), a signed overflow is triggered which causes the value to differ (undefined behavior). This seems to go in line with what Marc described and I think the transformation is correct in the scenario above. I do think that type casts that result in truncation (i.e. from a higher precision to a lower one) or with unsigned types will result in an incorrect transformation so those scenarios need to be avoided.

Given that the extra pattern I'm adding is taking advantage the undefined behavior of signed integer overflow, I'm considering keeping the existing nop_convert pattern in place and adding a new pattern to cover these new cases. I'd also like to avoid touching nop_convert given that it's used in a number of other patterns.

This is the pattern I have currently:

  (simplify
    (minus (convert1? @0) (convert2? (minus (convert3? @2) @1)))
    (if (operand_equal_p(@0, @2, 0)
        && INTEGRAL_TYPE_P (type)
        && TYPE_OVERFLOW_UNDEFINED(type)
        && !TYPE_OVERFLOW_SANITIZED(type)
        && INTEGRAL_TYPE_P (TREE_TYPE(@1))
        && TYPE_OVERFLOW_UNDEFINED(TREE_TYPE(@1))
        && !TYPE_OVERFLOW_SANITIZED(TREE_TYPE(@1))
        && !TYPE_UNSIGNED (TREE_TYPE (@1))
        && !TYPE_UNSIGNED (type)
        && TYPE_PRECISION (TREE_TYPE (@1)) <= TYPE_PRECISION (type)
        && INTEGRAL_TYPE_P (TREE_TYPE(@0))
        && TYPE_OVERFLOW_UNDEFINED(TREE_TYPE(@0))
        && !TYPE_OVERFLOW_SANITIZED(TREE_TYPE(@0))
        && !TYPE_UNSIGNED (TREE_TYPE (@0))
        && TYPE_PRECISION (TREE_TYPE (@0)) <= TYPE_PRECISION (type)
        && TREE_TYPE(@1) == TREE_TYPE(@2))
    (convert @1)))

Is there a more concise/better way of writing the pattern? I was looking for similar checks in match.pd and I couldn't find anything that I could leverage.

I also kept my pattern to the specific scenario I'm seeing with the regression to lower the risk of something breaking. I've limited @1 and @2 to have the same type.

I'm also in favor of adding/running computer verification to make sure the transformation is legal. I've written some tests to verify that the pattern is being applied in the right scenarios and not being applied in others, but I think there are too many possibilities to manually write them all. Is there anything in GCC that can be used to verify that match.pd transformations are correct? I'm thinking of something like Alive https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FAliveToolkit%2Falive2&amp;data=04%7C01%7Cvitong%40microsoft.com%7Cba7d8f9f9b774462148608d93a8a0471%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637605186726283785%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=ugx%2FTw58OPjLzamE9yqQThV5u4EfQ8JLrurnIy00AzQ%3D&amp;reserved=0.

Thanks,
Victor



From: Richard Biener <richard.guenther@gmail.com>
Sent: Monday, June 21, 2021 12:08 AM
To: Marc Glisse <marc.glisse@inria.fr>
Cc: Victor Tong <vitong@microsoft.com>; gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>
Subject: Re: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division followed by multiply [PR95176]

On Sat, Jun 19, 2021 at 7:05 PM Marc Glisse <marc.glisse@inria.fr> wrote:
>
> On Fri, 18 Jun 2021, Richard Biener wrote:
>
> >> Option 2: Add a new pattern to support scenarios that the existing nop_convert pattern bails out on.
> >>
> >> Existing pattern:
> >>
> >> (simplify
> >>    (minus (nop_convert1? @0) (nop_convert2? (minus (nop_convert3? @@0) @1)))
> >>    (view_convert @1))
>
> I tried to check with a program when
>
> T3 x;
> T1 y;
> (T2)x-(T2)((T1)x-y)
>
> can be safely replaced with
>
> (T2)y
>
> From the output, it looks like this is safe when T1 is at least as large
> as T2. It is wrong when T1 is unsigned and smaller than T2. And when T1 is
> signed and smaller than T2, it is ok if T3 is the same type as T1 (signed
> then) or has strictly less precision (any sign), and not in other cases.
>
> Note that this is when signed implies undefined overflow and unsigned
> implies wrapping, and I wouldn't put too much faith in this recently
> dusted program. And it doesn't say how to write the match.pd pattern with
> '?', "@@", disabling it if TYPE_OVERFLOW_SANITIZED, etc.
>
> Mostly, I wanted to say that if we are going to go handle more than
> nop_convert for more than just 1 or 2 easy transformations, I think some
> kind of computer verification would be useful, it would save a lot of time
> and headaches.

True.  I wonder if auto-generating such tests from match.pd rules would
be a good project to work on (GSoC?).  When there's define_match
involved things get a little tricky, but then one item on the TODO list
is "inlining" those at the use site (exploding the decision tree even more).

Richard.

> (I just check by brute force all possible precisions (from 1 to 6) and
> signedness for T1, T2 and T3, all possible values for x and y, compute
> the before and after formulas, accepting if there is UB before, rejecting
> if there is UB after (and not before), and manually try to see a pattern
> in the list of types that work)
>
> --
> Marc Glisse

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division followed by multiply [PR95176]
  2021-06-28 23:10               ` Victor Tong
  2021-07-19 20:58                 ` Victor Tong
@ 2021-07-28  9:59                 ` Richard Biener
  2021-08-06 21:17                   ` Victor Tong
  1 sibling, 1 reply; 15+ messages in thread
From: Richard Biener @ 2021-07-28  9:59 UTC (permalink / raw)
  To: Victor Tong; +Cc: Marc Glisse, gcc-patches

On Tue, Jun 29, 2021 at 1:10 AM Victor Tong <vitong@microsoft.com> wrote:
>
> Thanks Richard and Marc.
>
> I wrote the following test case to compare the outputs of fn1() and fn1NoOpt() below with my extra pattern being applied. I tested the two functions with all of the integers from INT_MIN to INT_MAX.
>
> long
> fn1 (int x)
> {
>   return 42L - (long)(42 - x);
> }
>
> #pragma GCC push_options
> #pragma GCC optimize ("O0")
> long
> fn1NoOpt (int x)
> {
>   volatile int y = (42 - x);
>   return 42L - (long)y;
> }
> #pragma GCC pop_options
>
> int main ()
> {
>         for (long i=INT_MIN; i<=INT_MAX;i++)
>         {
>                 auto valNoOpt = fn1NoOpt(i);
>                 auto valOpt = fn1(i);
>                 if (valNoOpt != valOpt)
>                         printf("valOpt=%ld, valNoOpt=%ld\n", valOpt, valNoOpt);
>         }
>         return 0;
> }
>
> I saw that the return values of fn1() and fn1NoOpt() differed when the input was between INT_MIN and INT_MIN+42 inclusive. When passing values in this range to fn1NoOpt(), a signed overflow is triggered which causes the value to differ (undefined behavior). This seems to go in line with what Marc described and I think the transformation is correct in the scenario above. I do think that type casts that result in truncation (i.e. from a higher precision to a lower one) or with unsigned types will result in an incorrect transformation so those scenarios need to be avoided.
>
> Given that the extra pattern I'm adding is taking advantage the undefined behavior of signed integer overflow, I'm considering keeping the existing nop_convert pattern in place and adding a new pattern to cover these new cases. I'd also like to avoid touching nop_convert given that it's used in a number of other patterns.
>
> This is the pattern I have currently:
>
>   (simplify
>     (minus (convert1? @0) (convert2? (minus (convert3? @2) @1)))
>     (if (operand_equal_p(@0, @2, 0)

The operand_equal_p should be reflected by using @0 in place of @2.

>         && INTEGRAL_TYPE_P (type)
>         && TYPE_OVERFLOW_UNDEFINED(type)
>         && !TYPE_OVERFLOW_SANITIZED(type)
>         && INTEGRAL_TYPE_P (TREE_TYPE(@1))
>         && TYPE_OVERFLOW_UNDEFINED(TREE_TYPE(@1))
>         && !TYPE_OVERFLOW_SANITIZED(TREE_TYPE(@1))
>         && !TYPE_UNSIGNED (TREE_TYPE (@1))
>         && !TYPE_UNSIGNED (type)

please group checks on common argument.  I think a single test
on INTEGRAL_TYPE_P (type) is enough, it could be ANY_INTEGRAL_TYPE_P
to include vector and complex types.

>         && TYPE_PRECISION (TREE_TYPE (@1)) <= TYPE_PRECISION (type)
>         && INTEGRAL_TYPE_P (TREE_TYPE(@0))
>         && TYPE_OVERFLOW_UNDEFINED(TREE_TYPE(@0))

why is this testing TREE_TYPE (@0)?  The outer subtract is using 'type',
the inner subtract uses TREE_TYPE (@1) though you could place
a capture on the minus to make what you test more obvious, like

  (minus (convert1? @0) (convert2? (minus@3 (convert3? @2) @1)))

TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@3))

there's one set of checks too much I think.

>         && !TYPE_OVERFLOW_SANITIZED(TREE_TYPE(@0))
>         && !TYPE_UNSIGNED (TREE_TYPE (@0))

we only ever have TYPE_OVERFLOW_UNDEFINED on singed types so
the !TYPE_UNSIGNED checks are superfluous.

>         && TYPE_PRECISION (TREE_TYPE (@0)) <= TYPE_PRECISION (type)
>         && TREE_TYPE(@1) == TREE_TYPE(@2))

This check means that convert3 is never present since a MINUS requires
compatible types.

Sorry for the late reply.

Note the pattern will necessarily be partly redundant with the

  (simplify
   (minus (nop_convert1? (minus (nop_convert2? @0) @1)) @0)
   (if (!ANY_INTEGRAL_TYPE_P (type)
        || TYPE_OVERFLOW_WRAPS (type))
   (negate (view_convert @1))
   (view_convert (negate @1))))

pattern.  Once we'd "inline" nop_convert genmatch would complain
about this.

>     (convert @1)))
>
> Is there a more concise/better way of writing the pattern? I was looking for similar checks in match.pd and I couldn't find anything that I could leverage.
>
> I also kept my pattern to the specific scenario I'm seeing with the regression to lower the risk of something breaking. I've limited @1 and @2 to have the same type.
>
> I'm also in favor of adding/running computer verification to make sure the transformation is legal. I've written some tests to verify that the pattern is being applied in the right scenarios and not being applied in others, but I think there are too many possibilities to manually write them all. Is there anything in GCC that can be used to verify that match.pd transformations are correct? I'm thinking of something like Alive https://github.com/AliveToolkit/alive2.
>
> Thanks,
> Victor
>
>
>
> From: Richard Biener <richard.guenther@gmail.com>
> Sent: Monday, June 21, 2021 12:08 AM
> To: Marc Glisse <marc.glisse@inria.fr>
> Cc: Victor Tong <vitong@microsoft.com>; gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>
> Subject: Re: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division followed by multiply [PR95176]
>
> On Sat, Jun 19, 2021 at 7:05 PM Marc Glisse <marc.glisse@inria.fr> wrote:
> >
> > On Fri, 18 Jun 2021, Richard Biener wrote:
> >
> > >> Option 2: Add a new pattern to support scenarios that the existing nop_convert pattern bails out on.
> > >>
> > >> Existing pattern:
> > >>
> > >> (simplify
> > >>    (minus (nop_convert1? @0) (nop_convert2? (minus (nop_convert3? @@0) @1)))
> > >>    (view_convert @1))
> >
> > I tried to check with a program when
> >
> > T3 x;
> > T1 y;
> > (T2)x-(T2)((T1)x-y)
> >
> > can be safely replaced with
> >
> > (T2)y
> >
> > From the output, it looks like this is safe when T1 is at least as large
> > as T2. It is wrong when T1 is unsigned and smaller than T2. And when T1 is
> > signed and smaller than T2, it is ok if T3 is the same type as T1 (signed
> > then) or has strictly less precision (any sign), and not in other cases.
> >
> > Note that this is when signed implies undefined overflow and unsigned
> > implies wrapping, and I wouldn't put too much faith in this recently
> > dusted program. And it doesn't say how to write the match.pd pattern with
> > '?', "@@", disabling it if TYPE_OVERFLOW_SANITIZED, etc.
> >
> > Mostly, I wanted to say that if we are going to go handle more than
> > nop_convert for more than just 1 or 2 easy transformations, I think some
> > kind of computer verification would be useful, it would save a lot of time
> > and headaches.
>
> True.  I wonder if auto-generating such tests from match.pd rules would
> be a good project to work on (GSoC?).  When there's define_match
> involved things get a little tricky, but then one item on the TODO list
> is "inlining" those at the use site (exploding the decision tree even more).
>
> Richard.
>
> > (I just check by brute force all possible precisions (from 1 to 6) and
> > signedness for T1, T2 and T3, all possible values for x and y, compute
> > the before and after formulas, accepting if there is UB before, rejecting
> > if there is UB after (and not before), and manually try to see a pattern
> > in the list of types that work)
> >
> > --
> > Marc Glisse

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division followed by multiply [PR95176]
  2021-07-28  9:59                 ` Richard Biener
@ 2021-08-06 21:17                   ` Victor Tong
  2021-08-06 22:49                     ` Marc Glisse
  0 siblings, 1 reply; 15+ messages in thread
From: Victor Tong @ 2021-08-06 21:17 UTC (permalink / raw)
  To: Richard Biener; +Cc: Marc Glisse, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 9780 bytes --]

Thanks for the feedback. Here's the updated pattern:

  /* X - (X - Y) --> Y */
  (simplify
    (minus (convert1? @0) (convert2? (minus@2 (convert3? @@0) @1)))
    (if (ANY_INTEGRAL_TYPE_P (type)
        && TYPE_OVERFLOW_UNDEFINED(type)
        && !TYPE_OVERFLOW_SANITIZED(type)
        && ANY_INTEGRAL_TYPE_P (TREE_TYPE(@2))
        && TYPE_OVERFLOW_UNDEFINED(TREE_TYPE(@2))
        && !TYPE_OVERFLOW_SANITIZED(TREE_TYPE(@2))
        && ANY_INTEGRAL_TYPE_P (TREE_TYPE(@0))
        && TYPE_OVERFLOW_UNDEFINED(TREE_TYPE(@0))
        && !TYPE_OVERFLOW_SANITIZED(TREE_TYPE(@0))
        && TYPE_PRECISION (TREE_TYPE (@2)) <= TYPE_PRECISION (type)
        && TYPE_PRECISION (TREE_TYPE (@0)) <= TYPE_PRECISION (type))
    (convert @1)))

I kept the TYPE_OVERFLOW_SANITIZED checks because I saw other patterns that leverage undefined overflows check for it. I think this new pattern shouldn't be applied if overflow sanitizer checks are enabled.

>> why is this testing TREE_TYPE (@0)?

I'm checking the type of @0 because I'm concerned that there could be a case where @0's type isn't an integer type with undefined overflow. I tried creating a test case and couldn't seem to create one where this is violated but I kept the checks to avoid causing a regression. If I'm being overcautious and you feel that the type checks on @0 aren't needed, I can remove them. I think the precision check on TREE_TYPE(@0) is needed to avoid truncation cases though.

>> Once we'd "inline" nop_convert genmatch would complain
about this.

Is someone working on inlining nop_convert? I'd like to avoid breaking someone else's work if that's being worked on right now.

I've also added some extra tests to cover this new pattern. I've attached a patch with my latest changes.


From: Richard Biener <richard.guenther@gmail.com>
Sent: Wednesday, July 28, 2021 2:59 AM
To: Victor Tong <vitong@microsoft.com>
Cc: Marc Glisse <marc.glisse@inria.fr>; gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>
Subject: Re: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division followed by multiply [PR95176] 
 
On Tue, Jun 29, 2021 at 1:10 AM Victor Tong <vitong@microsoft.com> wrote:
>
> Thanks Richard and Marc.
>
> I wrote the following test case to compare the outputs of fn1() and fn1NoOpt() below with my extra pattern being applied. I tested the two functions with all of the integers from INT_MIN to INT_MAX.
>
> long
> fn1 (int x)
> {
>   return 42L - (long)(42 - x);
> }
>
> #pragma GCC push_options
> #pragma GCC optimize ("O0")
> long
> fn1NoOpt (int x)
> {
>   volatile int y = (42 - x);
>   return 42L - (long)y;
> }
> #pragma GCC pop_options
>
> int main ()
> {
>         for (long i=INT_MIN; i<=INT_MAX;i++)
>         {
>                 auto valNoOpt = fn1NoOpt(i);
>                 auto valOpt = fn1(i);
>                 if (valNoOpt != valOpt)
>                         printf("valOpt=%ld, valNoOpt=%ld\n", valOpt, valNoOpt);
>         }
>         return 0;
> }
>
> I saw that the return values of fn1() and fn1NoOpt() differed when the input was between INT_MIN and INT_MIN+42 inclusive. When passing values in this range to fn1NoOpt(), a signed overflow is triggered which causes the value to differ (undefined behavior). This seems to go in line with what Marc described and I think the transformation is correct in the scenario above. I do think that type casts that result in truncation (i.e. from a higher precision to a lower one) or with unsigned types will result in an incorrect transformation so those scenarios need to be avoided.
>
> Given that the extra pattern I'm adding is taking advantage the undefined behavior of signed integer overflow, I'm considering keeping the existing nop_convert pattern in place and adding a new pattern to cover these new cases. I'd also like to avoid touching nop_convert given that it's used in a number of other patterns.
>
> This is the pattern I have currently:
>
>   (simplify
>     (minus (convert1? @0) (convert2? (minus (convert3? @2) @1)))
>     (if (operand_equal_p(@0, @2, 0)

The operand_equal_p should be reflected by using @0 in place of @2.

>         && INTEGRAL_TYPE_P (type)
>         && TYPE_OVERFLOW_UNDEFINED(type)
>         && !TYPE_OVERFLOW_SANITIZED(type)
>         && INTEGRAL_TYPE_P (TREE_TYPE(@1))
>         && TYPE_OVERFLOW_UNDEFINED(TREE_TYPE(@1))
>         && !TYPE_OVERFLOW_SANITIZED(TREE_TYPE(@1))
>         && !TYPE_UNSIGNED (TREE_TYPE (@1))
>         && !TYPE_UNSIGNED (type)

please group checks on common argument.  I think a single test
on INTEGRAL_TYPE_P (type) is enough, it could be ANY_INTEGRAL_TYPE_P
to include vector and complex types.

>         && TYPE_PRECISION (TREE_TYPE (@1)) <= TYPE_PRECISION (type)
>         && INTEGRAL_TYPE_P (TREE_TYPE(@0))
>         && TYPE_OVERFLOW_UNDEFINED(TREE_TYPE(@0))

why is this testing TREE_TYPE (@0)?  The outer subtract is using 'type',
the inner subtract uses TREE_TYPE (@1) though you could place
a capture on the minus to make what you test more obvious, like

  (minus (convert1? @0) (convert2? (minus@3 (convert3? @2) @1)))

TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@3))

there's one set of checks too much I think.

>         && !TYPE_OVERFLOW_SANITIZED(TREE_TYPE(@0))
>         && !TYPE_UNSIGNED (TREE_TYPE (@0))

we only ever have TYPE_OVERFLOW_UNDEFINED on singed types so
the !TYPE_UNSIGNED checks are superfluous.

>         && TYPE_PRECISION (TREE_TYPE (@0)) <= TYPE_PRECISION (type)
>         && TREE_TYPE(@1) == TREE_TYPE(@2))

This check means that convert3 is never present since a MINUS requires
compatible types.

Sorry for the late reply.

Note the pattern will necessarily be partly redundant with the

  (simplify
   (minus (nop_convert1? (minus (nop_convert2? @0) @1)) @0)
   (if (!ANY_INTEGRAL_TYPE_P (type)
        || TYPE_OVERFLOW_WRAPS (type))
   (negate (view_convert @1))
   (view_convert (negate @1))))

pattern.  Once we'd "inline" nop_convert genmatch would complain
about this.

>     (convert @1)))
>
> Is there a more concise/better way of writing the pattern? I was looking for similar checks in match.pd and I couldn't find anything that I could leverage.
>
> I also kept my pattern to the specific scenario I'm seeing with the regression to lower the risk of something breaking. I've limited @1 and @2 to have the same type.
>
> I'm also in favor of adding/running computer verification to make sure the transformation is legal. I've written some tests to verify that the pattern is being applied in the right scenarios and not being applied in others, but I think there are too many possibilities to manually write them all. Is there anything in GCC that can be used to verify that match.pd transformations are correct? I'm thinking of something like Alive https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FAliveToolkit%2Falive2&amp;data=04%7C01%7Cvitong%40microsoft.com%7C64334bd5c79443af04ec08d951ae6117%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637630631678231097%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=x9Heq%2Bjgb%2Fy1PJFZ0gT9yfkQuzOz0rzWnZsl7VOymp0%3D&amp;reserved=0.
>
> Thanks,
> Victor
>
>
>
> From: Richard Biener <richard.guenther@gmail.com>
> Sent: Monday, June 21, 2021 12:08 AM
> To: Marc Glisse <marc.glisse@inria.fr>
> Cc: Victor Tong <vitong@microsoft.com>; gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>
> Subject: Re: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division followed by multiply [PR95176]
>
> On Sat, Jun 19, 2021 at 7:05 PM Marc Glisse <marc.glisse@inria.fr> wrote:
> >
> > On Fri, 18 Jun 2021, Richard Biener wrote:
> >
> > >> Option 2: Add a new pattern to support scenarios that the existing nop_convert pattern bails out on.
> > >>
> > >> Existing pattern:
> > >>
> > >> (simplify
> > >>    (minus (nop_convert1? @0) (nop_convert2? (minus (nop_convert3? @@0) @1)))
> > >>    (view_convert @1))
> >
> > I tried to check with a program when
> >
> > T3 x;
> > T1 y;
> > (T2)x-(T2)((T1)x-y)
> >
> > can be safely replaced with
> >
> > (T2)y
> >
> > From the output, it looks like this is safe when T1 is at least as large
> > as T2. It is wrong when T1 is unsigned and smaller than T2. And when T1 is
> > signed and smaller than T2, it is ok if T3 is the same type as T1 (signed
> > then) or has strictly less precision (any sign), and not in other cases.
> >
> > Note that this is when signed implies undefined overflow and unsigned
> > implies wrapping, and I wouldn't put too much faith in this recently
> > dusted program. And it doesn't say how to write the match.pd pattern with
> > '?', "@@", disabling it if TYPE_OVERFLOW_SANITIZED, etc.
> >
> > Mostly, I wanted to say that if we are going to go handle more than
> > nop_convert for more than just 1 or 2 easy transformations, I think some
> > kind of computer verification would be useful, it would save a lot of time
> > and headaches.
>
> True.  I wonder if auto-generating such tests from match.pd rules would
> be a good project to work on (GSoC?).  When there's define_match
> involved things get a little tricky, but then one item on the TODO list
> is "inlining" those at the use site (exploding the decision tree even more).
>
> Richard.
>
> > (I just check by brute force all possible precisions (from 1 to 6) and
> > signedness for T1, T2 and T3, all possible values for x and y, compute
> > the before and after formulas, accepting if there is UB before, rejecting
> > if there is UB after (and not before), and manually try to see a pattern
> > in the list of types that work)
> >
> > --
> > Marc Glisse

[-- Attachment #2: pr95176-2.patch --]
[-- Type: application/octet-stream, Size: 6019 bytes --]

diff --git a/gcc/match.pd b/gcc/match.pd
index 19f4a782ae9..106895aa568 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -599,6 +599,12 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (if (INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type))
   (convert (trunc_mod @0 @1))))
 
+/* X * (Y / X) is the same as Y - (Y % X).  */
+(simplify
+ (mult:c (convert1? @0) (convert2? (trunc_div @1 @@0)))
+ (if (INTEGRAL_TYPE_P (type))
+  (minus (convert @1) (convert (trunc_mod @1 @0)))))
+
 /* Optimize TRUNC_MOD_EXPR by a power of two into a BIT_AND_EXPR,
    i.e. "X % C" into "X & (C - 1)", if X and C are positive.
    Also optimize A % (C << N)  where C is a power of 2,
@@ -2388,9 +2394,27 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 	 || TYPE_OVERFLOW_WRAPS (type))
      (negate (view_convert @1))
      (view_convert (negate @1))))
+
   (simplify
    (minus @0 (nop_convert1? (minus (nop_convert2? @0) @1)))
    (view_convert @1))
+
+  /* X - (X - Y) --> Y */
+  (simplify
+    (minus (convert1? @0) (convert2? (minus@2 (convert3? @@0) @1)))
+    (if (ANY_INTEGRAL_TYPE_P (type)
+        && TYPE_OVERFLOW_UNDEFINED(type)
+        && !TYPE_OVERFLOW_SANITIZED(type)
+        && ANY_INTEGRAL_TYPE_P (TREE_TYPE(@2))
+        && TYPE_OVERFLOW_UNDEFINED(TREE_TYPE(@2))
+        && !TYPE_OVERFLOW_SANITIZED(TREE_TYPE(@2))
+        && ANY_INTEGRAL_TYPE_P (TREE_TYPE(@0))
+        && TYPE_OVERFLOW_UNDEFINED(TREE_TYPE(@0))
+        && !TYPE_OVERFLOW_SANITIZED(TREE_TYPE(@0))
+        && TYPE_PRECISION (TREE_TYPE (@2)) <= TYPE_PRECISION (type)
+        && TYPE_PRECISION (TREE_TYPE (@0)) <= TYPE_PRECISION (type))
+    (convert @1)))
+
   /* (A +- B) + (C - A)   -> C +- B */
   /* (A +  B) - (A - C)   -> B + C */
   /* More cases are handled with comparisons.  */
diff --git a/gcc/testsuite/gcc.dg/pr95176-2.c b/gcc/testsuite/gcc.dg/pr95176-2.c
new file mode 100644
index 00000000000..03fed413240
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr95176-2.c
@@ -0,0 +1,91 @@
+/* { dg-do run } */
+/* { dg-options "-O -fdump-tree-optimized-raw" } */
+
+/* Test the X - (X - Y) --> Y transformation */
+
+extern int printf (__const char *__restrict __format, ...);
+
+int __attribute__ ((noinline))
+f(int a, int b)
+{
+    return a - (a-b);
+}
+
+int __attribute__((optimize("O0"))) __attribute__ ((noinline))
+fNoOpt(volatile int a, volatile int b)
+{
+    return a - (a-b);
+}
+
+long __attribute__ ((noinline))
+f2(short a, long b)
+{
+	return a - (((int) a)-b);
+}
+
+long __attribute__((optimize("O0"))) __attribute__ ((noinline))
+f2NoOpt(volatile short a, volatile long b)
+{
+    return a - (((int) a)-b);
+}
+
+long __attribute__ ((noinline))
+f3(short a, long b)
+{
+    return a - (a-b);
+}
+
+long __attribute__((optimize("O0"))) __attribute__ ((noinline))
+f3NoOpt(volatile short a, volatile long b)
+{
+    return a - (a-b);
+}
+
+long long __attribute__ ((noinline))
+f4(long b)
+{
+    return ((short)40L) - (((int)40)-b);
+}
+
+long __attribute__((optimize("O0"))) __attribute__ ((noinline))
+f4NoOpt(volatile long b)
+{
+    return ((short)40L) - (((int)40)-b);
+}
+
+long __attribute__ ((noinline))
+fNeg(long a, long b)
+{
+	return a - (((short) a)-b);
+}
+
+long long __attribute__ ((noinline))
+fNeg2(long long v, long b)
+{
+    return ((int)v) - (((short)v)-b);
+}
+
+long __attribute__ ((noinline))
+fNeg3(short a, long b)
+{
+	return a - (((unsigned int) a)-b);
+}
+
+int main()
+{
+    if (f(2, 15) != fNoOpt(2, 15))
+        __builtin_abort();
+    if (f2(2, 15) != f2NoOpt(2, 15))
+        __builtin_abort();
+    if (f3(2, 15) != f3NoOpt(2, 15))
+        __builtin_abort();
+    if (f4(15) != f4NoOpt(15))
+        __builtin_abort();
+    printf("pass");
+}
+
+// There should be 14 instances of minus_expr in the output:
+// Two for each NoOpt* function
+// Two for each fNeg* function
+/* { dg-final { scan-tree-dump-times "minus_expr" 12 "optimized" } } */
+/* { dg-output "pass" } */
diff --git a/gcc/testsuite/gcc.dg/pr95176.c b/gcc/testsuite/gcc.dg/pr95176.c
new file mode 100644
index 00000000000..ef087317187
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr95176.c
@@ -0,0 +1,45 @@
+/* { dg-do run } */
+/* { dg-options "-O -fdump-tree-optimized-raw" } */
+
+extern int printf (__const char *__restrict __format, ...);
+
+int __attribute__ ((noinline))
+f(int a, int b)
+{
+    return a * (b / a);
+}
+
+int __attribute__((optimize("O0"))) __attribute__ ((noinline))
+fNoOpt(int a, int b)
+{
+    return a * (b / a);
+}
+
+int __attribute__ ((noinline))
+f2(int a, int b)
+{
+    return (b / a) * a;
+}
+
+int __attribute__((optimize("O0"))) __attribute__ ((noinline))
+f2NoOpt(int a, int b)
+{
+    return (b / a) * a;
+}
+
+int main()
+{
+    if (f(2, 15) != fNoOpt(2, 15))
+        __builtin_abort();
+    if (f2(2, 15) != f2NoOpt(2, 15))
+        __builtin_abort();
+    printf("pass");
+}
+
+// There should be two instances of trunc_div_expr and 
+// mult_expr in the output. One in fNoOpt and one in f2NoOpt.
+/* { dg-final { scan-tree-dump-times "trunc_div_expr" 2 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "mult_expr" 2 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "trunc_mod_expr" 2 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "minus_expr" 2 "optimized" } } */
+/* { dg-output "pass" } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/20030807-10.c b/gcc/testsuite/gcc.dg/tree-ssa/20030807-10.c
index 0e01e511b78..4cd35738057 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/20030807-10.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/20030807-10.c
@@ -20,6 +20,9 @@ subreg_highpart_offset (outermode, innermode)
 /* There should be one mask with the value 3.  */
 /* { dg-final { scan-tree-dump-times " \& 3" 1 "vrp1"} } */
   
-/* There should be one right shift by 2 places.  */
-/* { dg-final { scan-tree-dump-times " >> 2" 1 "vrp1"} } */
+/* There should be no right shift by 2 places.  */
+/* { dg-final { scan-tree-dump-times " >> 2" 0 "vrp1"} } */
+
+/* The "difference / 4 * 4" should become a subtraction */
+/* { dg-final { scan-tree-dump-times " - " 2 "vrp1"} } */
 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division followed by multiply [PR95176]
  2021-08-06 21:17                   ` Victor Tong
@ 2021-08-06 22:49                     ` Marc Glisse
  2021-08-09  9:58                       ` Richard Biener
  0 siblings, 1 reply; 15+ messages in thread
From: Marc Glisse @ 2021-08-06 22:49 UTC (permalink / raw)
  To: Victor Tong; +Cc: Richard Biener, gcc-patches

On Fri, 6 Aug 2021, Victor Tong wrote:

> Thanks for the feedback. Here's the updated pattern:
>
>  /* X - (X - Y) --> Y */
>  (simplify
>    (minus (convert1? @0) (convert2? (minus@2 (convert3? @@0) @1)))
>    (if (ANY_INTEGRAL_TYPE_P (type)
>        && TYPE_OVERFLOW_UNDEFINED(type)
>        && !TYPE_OVERFLOW_SANITIZED(type)
>        && ANY_INTEGRAL_TYPE_P (TREE_TYPE(@2))
>        && TYPE_OVERFLOW_UNDEFINED(TREE_TYPE(@2))
>        && !TYPE_OVERFLOW_SANITIZED(TREE_TYPE(@2))
>        && ANY_INTEGRAL_TYPE_P (TREE_TYPE(@0))
>        && TYPE_OVERFLOW_UNDEFINED(TREE_TYPE(@0))
>        && !TYPE_OVERFLOW_SANITIZED(TREE_TYPE(@0))
>        && TYPE_PRECISION (TREE_TYPE (@2)) <= TYPE_PRECISION (type)
>        && TYPE_PRECISION (TREE_TYPE (@0)) <= TYPE_PRECISION (type))
>    (convert @1)))
>
> I kept the TYPE_OVERFLOW_SANITIZED checks because I saw other patterns that leverage undefined overflows check for it. I think this new pattern shouldn't be applied if overflow sanitizer checks are enabled.
>
>>> why is this testing TREE_TYPE (@0)?
>
> I'm checking the type of @0 because I'm concerned that there could be a case where @0's type isn't an integer type with undefined overflow. I tried creating a test case and couldn't seem to create one where this is violated but I kept the checks to avoid causing a regression. If I'm being overcautious and you feel that the type checks on @0 aren't needed, I can remove them. I think the precision check on TREE_TYPE(@0) is needed to avoid truncation cases though.

It doesn't matter if @0 has undefined overflow, but it can matter that it 
be signed (yes, the 2 are correlated...) if it has the same precision as 
@2. Otherwise (int64_t)(-1u)-(int64_t)((int)(-1u)-0) is definitely not 0 
and it has type:int64_t, @2:int, @0:unsigned.

Ignoring the sanitizer, the confusing double matching of constants, and 
restricting to scalars, I think the tightest condition (without vrp) that 
works is

TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@2)) ||
  TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@2)) &&
   (TYPE_PRECISION (TREE_TYPE (@0)) < TYPE_PRECISION (TREE_TYPE (@2)) ||
    TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@2)) &&
    !TYPE_UNSIGNED (TREE_TYPE (@0))
   )

(where implicitly undefined => signed) but of course it is ok to handle
only a subset. It is too late for me to think about @0 vs @@0.

The patch you posted seems to accept the wrong 
(int128t)LLONG_MAX-(int128_t)((int)LLONG_MAX-0) -> (int128_t)0

Using ANY_INTEGRAL_TYPE_P allows vectors, and TYPE_PRECISION mustn't be 
used for vectors (maybe we should add more checking to catch such uses), 
and they may not be very happy with convert either...

>>> Once we'd "inline" nop_convert genmatch would complain about this.

Maybe the new transform could be about scalars, and we could restrict the 
old one to vectors, to simplify the code, but that wouldn't help genmatch 
with the fact that the pattern x-(x-y) would appear twice. Is it really 
that bad? It is suspicious, but can be justified.

> Is someone working on inlining nop_convert? I'd like to avoid breaking someone else's work if that's being worked on right now.
>
> I've also added some extra tests to cover this new pattern. I've attached a patch with my latest changes.
>
>
> From: Richard Biener <richard.guenther@gmail.com>
> Sent: Wednesday, July 28, 2021 2:59 AM
> To: Victor Tong <vitong@microsoft.com>
> Cc: Marc Glisse <marc.glisse@inria.fr>; gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>
> Subject: Re: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division followed by multiply [PR95176] 
>  
> On Tue, Jun 29, 2021 at 1:10 AM Victor Tong <vitong@microsoft.com> wrote:
>>
>> Thanks Richard and Marc.
>>
>> I wrote the following test case to compare the outputs of fn1() and fn1NoOpt() below with my extra pattern being applied. I tested the two functions with all of the integers from INT_MIN to INT_MAX.
>>
>> long
>> fn1 (int x)
>> {
>>    return 42L - (long)(42 - x);
>> }
>>
>> #pragma GCC push_options
>> #pragma GCC optimize ("O0")
>> long
>> fn1NoOpt (int x)
>> {
>>    volatile int y = (42 - x);
>>    return 42L - (long)y;
>> }
>> #pragma GCC pop_options
>>
>> int main ()
>> {
>>          for (long i=INT_MIN; i<=INT_MAX;i++)
>>          {
>>                  auto valNoOpt = fn1NoOpt(i);
>>                  auto valOpt = fn1(i);
>>                  if (valNoOpt != valOpt)
>>                          printf("valOpt=%ld, valNoOpt=%ld\n", valOpt, valNoOpt);
>>          }
>>          return 0;
>> }
>>
>> I saw that the return values of fn1() and fn1NoOpt() differed when the input was between INT_MIN and INT_MIN+42 inclusive. When passing values in this range to fn1NoOpt(), a signed overflow is triggered which causes the value to differ (undefined behavior). This seems to go in line with what Marc described and I think the transformation is correct in the scenario above. I do think that type casts that result in truncation (i.e. from a higher precision to a lower one) or with unsigned types will result in an incorrect transformation so those scenarios need to be avoided.
>>
>> Given that the extra pattern I'm adding is taking advantage the undefined behavior of signed integer overflow, I'm considering keeping the existing nop_convert pattern in place and adding a new pattern to cover these new cases. I'd also like to avoid touching nop_convert given that it's used in a number of other patterns.
>>
>> This is the pattern I have currently:
>>
>>    (simplify
>>      (minus (convert1? @0) (convert2? (minus (convert3? @2) @1)))
>>      (if (operand_equal_p(@0, @2, 0)
>
> The operand_equal_p should be reflected by using @0 in place of @2.
>
>>          && INTEGRAL_TYPE_P (type)
>>          && TYPE_OVERFLOW_UNDEFINED(type)
>>          && !TYPE_OVERFLOW_SANITIZED(type)
>>          && INTEGRAL_TYPE_P (TREE_TYPE(@1))
>>          && TYPE_OVERFLOW_UNDEFINED(TREE_TYPE(@1))
>>          && !TYPE_OVERFLOW_SANITIZED(TREE_TYPE(@1))
>>          && !TYPE_UNSIGNED (TREE_TYPE (@1))
>>          && !TYPE_UNSIGNED (type)
>
> please group checks on common argument.  I think a single test
> on INTEGRAL_TYPE_P (type) is enough, it could be ANY_INTEGRAL_TYPE_P
> to include vector and complex types.
>
>>          && TYPE_PRECISION (TREE_TYPE (@1)) <= TYPE_PRECISION (type)
>>          && INTEGRAL_TYPE_P (TREE_TYPE(@0))
>>          && TYPE_OVERFLOW_UNDEFINED(TREE_TYPE(@0))
>
> why is this testing TREE_TYPE (@0)?  The outer subtract is using 'type',
> the inner subtract uses TREE_TYPE (@1) though you could place
> a capture on the minus to make what you test more obvious, like
>
>   (minus (convert1? @0) (convert2? (minus@3 (convert3? @2) @1)))
>
> TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@3))
>
> there's one set of checks too much I think.
>
>>          && !TYPE_OVERFLOW_SANITIZED(TREE_TYPE(@0))
>>          && !TYPE_UNSIGNED (TREE_TYPE (@0))
>
> we only ever have TYPE_OVERFLOW_UNDEFINED on singed types so
> the !TYPE_UNSIGNED checks are superfluous.
>
>>          && TYPE_PRECISION (TREE_TYPE (@0)) <= TYPE_PRECISION (type)
>>          && TREE_TYPE(@1) == TREE_TYPE(@2))
>
> This check means that convert3 is never present since a MINUS requires
> compatible types.
>
> Sorry for the late reply.
>
> Note the pattern will necessarily be partly redundant with the
>
>   (simplify
>    (minus (nop_convert1? (minus (nop_convert2? @0) @1)) @0)
>    (if (!ANY_INTEGRAL_TYPE_P (type)
>         || TYPE_OVERFLOW_WRAPS (type))
>    (negate (view_convert @1))
>    (view_convert (negate @1))))
>
> pattern.  Once we'd "inline" nop_convert genmatch would complain
> about this.
>
>>      (convert @1)))
>>
>> Is there a more concise/better way of writing the pattern? I was looking for similar checks in match.pd and I couldn't find anything that I could leverage.
>>
>> I also kept my pattern to the specific scenario I'm seeing with the regression to lower the risk of something breaking. I've limited @1 and @2 to have the same type.
>>
>> I'm also in favor of adding/running computer verification to make sure the transformation is legal. I've written some tests to verify that the pattern is being applied in the right scenarios and not being applied in others, but I think there are too many possibilities to manually write them all. Is there anything in GCC that can be used to verify that match.pd transformations are correct? I'm thinking of something like Alive https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FAliveToolkit%2Falive2&amp;data=04%7C01%7Cvitong%40microsoft.com%7C64334bd5c79443af04ec08d951ae6117%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637630631678231097%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=x9Heq%2Bjgb%2Fy1PJFZ0gT9yfkQuzOz0rzWnZsl7VOymp0%3D&amp;reserved=0.
>>
>> Thanks,
>> Victor
>>
>>
>>
>> From: Richard Biener <richard.guenther@gmail.com>
>> Sent: Monday, June 21, 2021 12:08 AM
>> To: Marc Glisse <marc.glisse@inria.fr>
>> Cc: Victor Tong <vitong@microsoft.com>; gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>
>> Subject: Re: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division followed by multiply [PR95176]
>>
>> On Sat, Jun 19, 2021 at 7:05 PM Marc Glisse <marc.glisse@inria.fr> wrote:
>> >
>> > On Fri, 18 Jun 2021, Richard Biener wrote:
>> >
>> > >> Option 2: Add a new pattern to support scenarios that the existing nop_convert pattern bails out on.
>> > >>
>> > >> Existing pattern:
>> > >>
>> > >> (simplify
>> > >>    (minus (nop_convert1? @0) (nop_convert2? (minus (nop_convert3? @@0) @1)))
>> > >>    (view_convert @1))
>> >
>> > I tried to check with a program when
>> >
>> > T3 x;
>> > T1 y;
>> > (T2)x-(T2)((T1)x-y)
>> >
>> > can be safely replaced with
>> >
>> > (T2)y
>> >
>> > From the output, it looks like this is safe when T1 is at least as large
>> > as T2. It is wrong when T1 is unsigned and smaller than T2. And when T1 is
>> > signed and smaller than T2, it is ok if T3 is the same type as T1 (signed
>> > then) or has strictly less precision (any sign), and not in other cases.
>> >
>> > Note that this is when signed implies undefined overflow and unsigned
>> > implies wrapping, and I wouldn't put too much faith in this recently
>> > dusted program. And it doesn't say how to write the match.pd pattern with
>> > '?', "@@", disabling it if TYPE_OVERFLOW_SANITIZED, etc.
>> >
>> > Mostly, I wanted to say that if we are going to go handle more than
>> > nop_convert for more than just 1 or 2 easy transformations, I think some
>> > kind of computer verification would be useful, it would save a lot of time
>> > and headaches.
>>
>> True.  I wonder if auto-generating such tests from match.pd rules would
>> be a good project to work on (GSoC?).  When there's define_match
>> involved things get a little tricky, but then one item on the TODO list
>> is "inlining" those at the use site (exploding the decision tree even more).
>>
>> Richard.
>>
>> > (I just check by brute force all possible precisions (from 1 to 6) and
>> > signedness for T1, T2 and T3, all possible values for x and y, compute
>> > the before and after formulas, accepting if there is UB before, rejecting
>> > if there is UB after (and not before), and manually try to see a pattern
>> > in the list of types that work)
>> >
>> > --
>> > Marc Glisse

-- 
Marc Glisse

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division followed by multiply [PR95176]
  2021-08-06 22:49                     ` Marc Glisse
@ 2021-08-09  9:58                       ` Richard Biener
  2021-08-23  0:44                         ` Victor Tong
  0 siblings, 1 reply; 15+ messages in thread
From: Richard Biener @ 2021-08-09  9:58 UTC (permalink / raw)
  To: Marc Glisse; +Cc: Victor Tong, gcc-patches

On Sat, Aug 7, 2021 at 12:49 AM Marc Glisse <marc.glisse@inria.fr> wrote:
>
> On Fri, 6 Aug 2021, Victor Tong wrote:
>
> > Thanks for the feedback. Here's the updated pattern:
> >
> >  /* X - (X - Y) --> Y */
> >  (simplify
> >    (minus (convert1? @0) (convert2? (minus@2 (convert3? @@0) @1)))
> >    (if (ANY_INTEGRAL_TYPE_P (type)
> >        && TYPE_OVERFLOW_UNDEFINED(type)
> >        && !TYPE_OVERFLOW_SANITIZED(type)
> >        && ANY_INTEGRAL_TYPE_P (TREE_TYPE(@2))
> >        && TYPE_OVERFLOW_UNDEFINED(TREE_TYPE(@2))
> >        && !TYPE_OVERFLOW_SANITIZED(TREE_TYPE(@2))
> >        && ANY_INTEGRAL_TYPE_P (TREE_TYPE(@0))
> >        && TYPE_OVERFLOW_UNDEFINED(TREE_TYPE(@0))
> >        && !TYPE_OVERFLOW_SANITIZED(TREE_TYPE(@0))
> >        && TYPE_PRECISION (TREE_TYPE (@2)) <= TYPE_PRECISION (type)
> >        && TYPE_PRECISION (TREE_TYPE (@0)) <= TYPE_PRECISION (type))
> >    (convert @1)))
> >
> > I kept the TYPE_OVERFLOW_SANITIZED checks because I saw other patterns that leverage undefined overflows check for it. I think this new pattern shouldn't be applied if overflow sanitizer checks are enabled.
> >
> >>> why is this testing TREE_TYPE (@0)?
> >
> > I'm checking the type of @0 because I'm concerned that there could be a case where @0's type isn't an integer type with undefined overflow. I tried creating a test case and couldn't seem to create one where this is violated but I kept the checks to avoid causing a regression. If I'm being overcautious and you feel that the type checks on @0 aren't needed, I can remove them. I think the precision check on TREE_TYPE(@0) is needed to avoid truncation cases though.
>
> It doesn't matter if @0 has undefined overflow, but it can matter that it
> be signed (yes, the 2 are correlated...) if it has the same precision as
> @2. Otherwise (int64_t)(-1u)-(int64_t)((int)(-1u)-0) is definitely not 0
> and it has type:int64_t, @2:int, @0:unsigned.
>
> Ignoring the sanitizer, the confusing double matching of constants, and
> restricting to scalars, I think the tightest condition (without vrp) that
> works is
>
> TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@2)) ||
>   TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@2)) &&
>    (TYPE_PRECISION (TREE_TYPE (@0)) < TYPE_PRECISION (TREE_TYPE (@2)) ||
>     TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@2)) &&
>     !TYPE_UNSIGNED (TREE_TYPE (@0))
>    )
>
> (where implicitly undefined => signed) but of course it is ok to handle
> only a subset. It is too late for me to think about @0 vs @@0.
>
> The patch you posted seems to accept the wrong
> (int128t)LLONG_MAX-(int128_t)((int)LLONG_MAX-0) -> (int128_t)0
>
> Using ANY_INTEGRAL_TYPE_P allows vectors, and TYPE_PRECISION mustn't be
> used for vectors (maybe we should add more checking to catch such uses),
> and they may not be very happy with convert either...
>
> >>> Once we'd "inline" nop_convert genmatch would complain about this.
>
> Maybe the new transform could be about scalars, and we could restrict the
> old one to vectors, to simplify the code,

Hmm, I guess that might indeed be a good idea.

> but that wouldn't help genmatch
> with the fact that the pattern x-(x-y) would appear twice. Is it really
> that bad? It is suspicious, but can be justified.

I think genmatch will either complain or might end up generating
unreachable code.
Note with the present patch there's probably order forcing patterns inbetween so
it could simply work.  Otherwise whether it works or not (as expected)
depends on placing
positions of the patterns ...

So no, it's not inherently "bad" but it's not designed to work.

> > Is someone working on inlining nop_convert? I'd like to avoid breaking someone else's work if that's being worked on right now.
> >
> > I've also added some extra tests to cover this new pattern. I've attached a patch with my latest changes.
> >
> >
> > From: Richard Biener <richard.guenther@gmail.com>
> > Sent: Wednesday, July 28, 2021 2:59 AM
> > To: Victor Tong <vitong@microsoft.com>
> > Cc: Marc Glisse <marc.glisse@inria.fr>; gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>
> > Subject: Re: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division followed by multiply [PR95176]
> >
> > On Tue, Jun 29, 2021 at 1:10 AM Victor Tong <vitong@microsoft.com> wrote:
> >>
> >> Thanks Richard and Marc.
> >>
> >> I wrote the following test case to compare the outputs of fn1() and fn1NoOpt() below with my extra pattern being applied. I tested the two functions with all of the integers from INT_MIN to INT_MAX.
> >>
> >> long
> >> fn1 (int x)
> >> {
> >>    return 42L - (long)(42 - x);
> >> }
> >>
> >> #pragma GCC push_options
> >> #pragma GCC optimize ("O0")
> >> long
> >> fn1NoOpt (int x)
> >> {
> >>    volatile int y = (42 - x);
> >>    return 42L - (long)y;
> >> }
> >> #pragma GCC pop_options
> >>
> >> int main ()
> >> {
> >>          for (long i=INT_MIN; i<=INT_MAX;i++)
> >>          {
> >>                  auto valNoOpt = fn1NoOpt(i);
> >>                  auto valOpt = fn1(i);
> >>                  if (valNoOpt != valOpt)
> >>                          printf("valOpt=%ld, valNoOpt=%ld\n", valOpt, valNoOpt);
> >>          }
> >>          return 0;
> >> }
> >>
> >> I saw that the return values of fn1() and fn1NoOpt() differed when the input was between INT_MIN and INT_MIN+42 inclusive. When passing values in this range to fn1NoOpt(), a signed overflow is triggered which causes the value to differ (undefined behavior). This seems to go in line with what Marc described and I think the transformation is correct in the scenario above. I do think that type casts that result in truncation (i.e. from a higher precision to a lower one) or with unsigned types will result in an incorrect transformation so those scenarios need to be avoided.
> >>
> >> Given that the extra pattern I'm adding is taking advantage the undefined behavior of signed integer overflow, I'm considering keeping the existing nop_convert pattern in place and adding a new pattern to cover these new cases. I'd also like to avoid touching nop_convert given that it's used in a number of other patterns.
> >>
> >> This is the pattern I have currently:
> >>
> >>    (simplify
> >>      (minus (convert1? @0) (convert2? (minus (convert3? @2) @1)))
> >>      (if (operand_equal_p(@0, @2, 0)
> >
> > The operand_equal_p should be reflected by using @0 in place of @2.
> >
> >>          && INTEGRAL_TYPE_P (type)
> >>          && TYPE_OVERFLOW_UNDEFINED(type)
> >>          && !TYPE_OVERFLOW_SANITIZED(type)
> >>          && INTEGRAL_TYPE_P (TREE_TYPE(@1))
> >>          && TYPE_OVERFLOW_UNDEFINED(TREE_TYPE(@1))
> >>          && !TYPE_OVERFLOW_SANITIZED(TREE_TYPE(@1))
> >>          && !TYPE_UNSIGNED (TREE_TYPE (@1))
> >>          && !TYPE_UNSIGNED (type)
> >
> > please group checks on common argument.  I think a single test
> > on INTEGRAL_TYPE_P (type) is enough, it could be ANY_INTEGRAL_TYPE_P
> > to include vector and complex types.
> >
> >>          && TYPE_PRECISION (TREE_TYPE (@1)) <= TYPE_PRECISION (type)
> >>          && INTEGRAL_TYPE_P (TREE_TYPE(@0))
> >>          && TYPE_OVERFLOW_UNDEFINED(TREE_TYPE(@0))
> >
> > why is this testing TREE_TYPE (@0)?  The outer subtract is using 'type',
> > the inner subtract uses TREE_TYPE (@1) though you could place
> > a capture on the minus to make what you test more obvious, like
> >
> >   (minus (convert1? @0) (convert2? (minus@3 (convert3? @2) @1)))
> >
> > TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@3))
> >
> > there's one set of checks too much I think.
> >
> >>          && !TYPE_OVERFLOW_SANITIZED(TREE_TYPE(@0))
> >>          && !TYPE_UNSIGNED (TREE_TYPE (@0))
> >
> > we only ever have TYPE_OVERFLOW_UNDEFINED on singed types so
> > the !TYPE_UNSIGNED checks are superfluous.
> >
> >>          && TYPE_PRECISION (TREE_TYPE (@0)) <= TYPE_PRECISION (type)
> >>          && TREE_TYPE(@1) == TREE_TYPE(@2))
> >
> > This check means that convert3 is never present since a MINUS requires
> > compatible types.
> >
> > Sorry for the late reply.
> >
> > Note the pattern will necessarily be partly redundant with the
> >
> >   (simplify
> >    (minus (nop_convert1? (minus (nop_convert2? @0) @1)) @0)
> >    (if (!ANY_INTEGRAL_TYPE_P (type)
> >         || TYPE_OVERFLOW_WRAPS (type))
> >    (negate (view_convert @1))
> >    (view_convert (negate @1))))
> >
> > pattern.  Once we'd "inline" nop_convert genmatch would complain
> > about this.
> >
> >>      (convert @1)))
> >>
> >> Is there a more concise/better way of writing the pattern? I was looking for similar checks in match.pd and I couldn't find anything that I could leverage.
> >>
> >> I also kept my pattern to the specific scenario I'm seeing with the regression to lower the risk of something breaking. I've limited @1 and @2 to have the same type.
> >>
> >> I'm also in favor of adding/running computer verification to make sure the transformation is legal. I've written some tests to verify that the pattern is being applied in the right scenarios and not being applied in others, but I think there are too many possibilities to manually write them all. Is there anything in GCC that can be used to verify that match.pd transformations are correct? I'm thinking of something like Alive https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FAliveToolkit%2Falive2&amp;data=04%7C01%7Cvitong%40microsoft.com%7C64334bd5c79443af04ec08d951ae6117%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637630631678231097%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=x9Heq%2Bjgb%2Fy1PJFZ0gT9yfkQuzOz0rzWnZsl7VOymp0%3D&amp;reserved=0.
> >>
> >> Thanks,
> >> Victor
> >>
> >>
> >>
> >> From: Richard Biener <richard.guenther@gmail.com>
> >> Sent: Monday, June 21, 2021 12:08 AM
> >> To: Marc Glisse <marc.glisse@inria.fr>
> >> Cc: Victor Tong <vitong@microsoft.com>; gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>
> >> Subject: Re: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division followed by multiply [PR95176]
> >>
> >> On Sat, Jun 19, 2021 at 7:05 PM Marc Glisse <marc.glisse@inria.fr> wrote:
> >> >
> >> > On Fri, 18 Jun 2021, Richard Biener wrote:
> >> >
> >> > >> Option 2: Add a new pattern to support scenarios that the existing nop_convert pattern bails out on.
> >> > >>
> >> > >> Existing pattern:
> >> > >>
> >> > >> (simplify
> >> > >>    (minus (nop_convert1? @0) (nop_convert2? (minus (nop_convert3? @@0) @1)))
> >> > >>    (view_convert @1))
> >> >
> >> > I tried to check with a program when
> >> >
> >> > T3 x;
> >> > T1 y;
> >> > (T2)x-(T2)((T1)x-y)
> >> >
> >> > can be safely replaced with
> >> >
> >> > (T2)y
> >> >
> >> > From the output, it looks like this is safe when T1 is at least as large
> >> > as T2. It is wrong when T1 is unsigned and smaller than T2. And when T1 is
> >> > signed and smaller than T2, it is ok if T3 is the same type as T1 (signed
> >> > then) or has strictly less precision (any sign), and not in other cases.
> >> >
> >> > Note that this is when signed implies undefined overflow and unsigned
> >> > implies wrapping, and I wouldn't put too much faith in this recently
> >> > dusted program. And it doesn't say how to write the match.pd pattern with
> >> > '?', "@@", disabling it if TYPE_OVERFLOW_SANITIZED, etc.
> >> >
> >> > Mostly, I wanted to say that if we are going to go handle more than
> >> > nop_convert for more than just 1 or 2 easy transformations, I think some
> >> > kind of computer verification would be useful, it would save a lot of time
> >> > and headaches.
> >>
> >> True.  I wonder if auto-generating such tests from match.pd rules would
> >> be a good project to work on (GSoC?).  When there's define_match
> >> involved things get a little tricky, but then one item on the TODO list
> >> is "inlining" those at the use site (exploding the decision tree even more).
> >>
> >> Richard.
> >>
> >> > (I just check by brute force all possible precisions (from 1 to 6) and
> >> > signedness for T1, T2 and T3, all possible values for x and y, compute
> >> > the before and after formulas, accepting if there is UB before, rejecting
> >> > if there is UB after (and not before), and manually try to see a pattern
> >> > in the list of types that work)
> >> >
> >> > --
> >> > Marc Glisse
>
> --
> Marc Glisse

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division followed by multiply [PR95176]
  2021-08-09  9:58                       ` Richard Biener
@ 2021-08-23  0:44                         ` Victor Tong
  0 siblings, 0 replies; 15+ messages in thread
From: Victor Tong @ 2021-08-23  0:44 UTC (permalink / raw)
  To: Richard Biener, Marc Glisse; +Cc: gcc-patches

Thanks for the feedback. I updated the pattern and it passes all tests (existing and the new ones I wrote). I added some brackets since there were some warnings about missing brackets on the || and &&. Here's the updated pattern:

  (simplify
    (minus (convert1? @0) (convert2? (minus@2 (convert3? @@0) @1)))
    (if (INTEGRAL_TYPE_P (type)
        && !TYPE_OVERFLOW_SANITIZED(type)
        && INTEGRAL_TYPE_P (TREE_TYPE(@2))
        && !TYPE_OVERFLOW_SANITIZED(TREE_TYPE(@2))
        && INTEGRAL_TYPE_P (TREE_TYPE(@0))
        && !TYPE_OVERFLOW_SANITIZED(TREE_TYPE(@0))
        && 
         (TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@2)) ||
           (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@2)) &&
             (TYPE_PRECISION (TREE_TYPE (@0)) < TYPE_PRECISION (TREE_TYPE (@2)) ||
             (TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@2)) &&
              !TYPE_UNSIGNED (TREE_TYPE (@0)))
         ))))
    (convert @1)))


>>> TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@2))

Did you mean > instead of <=? With the condition you proposed, that would trigger the optimization in cases where values may get truncated which I think should be avoided for this optimization.

>>> Maybe the new transform could be about scalars, and we could restrict the old one to vectors, to simplify the code,

I tried limiting the existing pattern to vector types by changing it to the following:

  (simplify
   (minus @0 (nop_convert1? (minus (nop_convert2? @0) @1)))
   (if (VECTOR_TYPE_P(type))
   (view_convert @1)))
   
I found that the new pattern doesn't cover some cases. Specifically, it doesn't cover a case in pr92734-2.c:

unsigned
f10 (unsigned x, int y)
{
  unsigned a = (int) x - y;
  return x - a;
}

I think the pattern isn't triggering because of the !TYPE_UNSIGNED (TREE_TYPE (@0)) check. I'm slightly concerned that changing the new pattern to cover the existing cases would add complexity to the new pattern, making it difficult to understand.

I also think the new pattern could be simplified by removing the convert on @0. I don't think it's needed for the regression pattern that I was seeing, but I had added it to be more thorough so the pattern covers more cases. 

From: Richard Biener <richard.guenther@gmail.com>
Sent: Monday, August 9, 2021 2:58 AM
To: Marc Glisse <marc.glisse@inria.fr>
Cc: Victor Tong <vitong@microsoft.com>; gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>
Subject: Re: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division followed by multiply [PR95176] 
 
On Sat, Aug 7, 2021 at 12:49 AM Marc Glisse <marc.glisse@inria.fr> wrote:
>
> On Fri, 6 Aug 2021, Victor Tong wrote:
>
> > Thanks for the feedback. Here's the updated pattern:
> >
> >  /* X - (X - Y) --> Y */
> >  (simplify
> >    (minus (convert1? @0) (convert2? (minus@2 (convert3? @@0) @1)))
> >    (if (ANY_INTEGRAL_TYPE_P (type)
> >        && TYPE_OVERFLOW_UNDEFINED(type)
> >        && !TYPE_OVERFLOW_SANITIZED(type)
> >        && ANY_INTEGRAL_TYPE_P (TREE_TYPE(@2))
> >        && TYPE_OVERFLOW_UNDEFINED(TREE_TYPE(@2))
> >        && !TYPE_OVERFLOW_SANITIZED(TREE_TYPE(@2))
> >        && ANY_INTEGRAL_TYPE_P (TREE_TYPE(@0))
> >        && TYPE_OVERFLOW_UNDEFINED(TREE_TYPE(@0))
> >        && !TYPE_OVERFLOW_SANITIZED(TREE_TYPE(@0))
> >        && TYPE_PRECISION (TREE_TYPE (@2)) <= TYPE_PRECISION (type)
> >        && TYPE_PRECISION (TREE_TYPE (@0)) <= TYPE_PRECISION (type))
> >    (convert @1)))
> >
> > I kept the TYPE_OVERFLOW_SANITIZED checks because I saw other patterns that leverage undefined overflows check for it. I think this new pattern shouldn't be applied if overflow sanitizer checks are enabled.
> >
> >>> why is this testing TREE_TYPE (@0)?
> >
> > I'm checking the type of @0 because I'm concerned that there could be a case where @0's type isn't an integer type with undefined overflow. I tried creating a test case and couldn't seem to create one where this is violated but I kept the checks to avoid causing a regression. If I'm being overcautious and you feel that the type checks on @0 aren't needed, I can remove them. I think the precision check on TREE_TYPE(@0) is needed to avoid truncation cases though.
>
> It doesn't matter if @0 has undefined overflow, but it can matter that it
> be signed (yes, the 2 are correlated...) if it has the same precision as
> @2. Otherwise (int64_t)(-1u)-(int64_t)((int)(-1u)-0) is definitely not 0
> and it has type:int64_t, @2:int, @0:unsigned.
>
> Ignoring the sanitizer, the confusing double matching of constants, and
> restricting to scalars, I think the tightest condition (without vrp) that
> works is
>
> TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@2)) ||
>   TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@2)) &&
>    (TYPE_PRECISION (TREE_TYPE (@0)) < TYPE_PRECISION (TREE_TYPE (@2)) ||
>     TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@2)) &&
>     !TYPE_UNSIGNED (TREE_TYPE (@0))
>    )
>
> (where implicitly undefined => signed) but of course it is ok to handle
> only a subset. It is too late for me to think about @0 vs @@0.
>
> The patch you posted seems to accept the wrong
> (int128t)LLONG_MAX-(int128_t)((int)LLONG_MAX-0) -> (int128_t)0
>
> Using ANY_INTEGRAL_TYPE_P allows vectors, and TYPE_PRECISION mustn't be
> used for vectors (maybe we should add more checking to catch such uses),
> and they may not be very happy with convert either...
>
> >>> Once we'd "inline" nop_convert genmatch would complain about this.
>
> Maybe the new transform could be about scalars, and we could restrict the
> old one to vectors, to simplify the code,

Hmm, I guess that might indeed be a good idea.

> but that wouldn't help genmatch
> with the fact that the pattern x-(x-y) would appear twice. Is it really
> that bad? It is suspicious, but can be justified.

I think genmatch will either complain or might end up generating
unreachable code.
Note with the present patch there's probably order forcing patterns inbetween so
it could simply work.  Otherwise whether it works or not (as expected)
depends on placing
positions of the patterns ...

So no, it's not inherently "bad" but it's not designed to work.

> > Is someone working on inlining nop_convert? I'd like to avoid breaking someone else's work if that's being worked on right now.
> >
> > I've also added some extra tests to cover this new pattern. I've attached a patch with my latest changes.
> >
> >
> > From: Richard Biener <richard.guenther@gmail.com>
> > Sent: Wednesday, July 28, 2021 2:59 AM
> > To: Victor Tong <vitong@microsoft.com>
> > Cc: Marc Glisse <marc.glisse@inria.fr>; gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>
> > Subject: Re: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division followed by multiply [PR95176]
> >
> > On Tue, Jun 29, 2021 at 1:10 AM Victor Tong <vitong@microsoft.com> wrote:
> >>
> >> Thanks Richard and Marc.
> >>
> >> I wrote the following test case to compare the outputs of fn1() and fn1NoOpt() below with my extra pattern being applied. I tested the two functions with all of the integers from INT_MIN to INT_MAX.
> >>
> >> long
> >> fn1 (int x)
> >> {
> >>    return 42L - (long)(42 - x);
> >> }
> >>
> >> #pragma GCC push_options
> >> #pragma GCC optimize ("O0")
> >> long
> >> fn1NoOpt (int x)
> >> {
> >>    volatile int y = (42 - x);
> >>    return 42L - (long)y;
> >> }
> >> #pragma GCC pop_options
> >>
> >> int main ()
> >> {
> >>          for (long i=INT_MIN; i<=INT_MAX;i++)
> >>          {
> >>                  auto valNoOpt = fn1NoOpt(i);
> >>                  auto valOpt = fn1(i);
> >>                  if (valNoOpt != valOpt)
> >>                          printf("valOpt=%ld, valNoOpt=%ld\n", valOpt, valNoOpt);
> >>          }
> >>          return 0;
> >> }
> >>
> >> I saw that the return values of fn1() and fn1NoOpt() differed when the input was between INT_MIN and INT_MIN+42 inclusive. When passing values in this range to fn1NoOpt(), a signed overflow is triggered which causes the value to differ (undefined behavior). This seems to go in line with what Marc described and I think the transformation is correct in the scenario above. I do think that type casts that result in truncation (i.e. from a higher precision to a lower one) or with unsigned types will result in an incorrect transformation so those scenarios need to be avoided.
> >>
> >> Given that the extra pattern I'm adding is taking advantage the undefined behavior of signed integer overflow, I'm considering keeping the existing nop_convert pattern in place and adding a new pattern to cover these new cases. I'd also like to avoid touching nop_convert given that it's used in a number of other patterns.
> >>
> >> This is the pattern I have currently:
> >>
> >>    (simplify
> >>      (minus (convert1? @0) (convert2? (minus (convert3? @2) @1)))
> >>      (if (operand_equal_p(@0, @2, 0)
> >
> > The operand_equal_p should be reflected by using @0 in place of @2.
> >
> >>          && INTEGRAL_TYPE_P (type)
> >>          && TYPE_OVERFLOW_UNDEFINED(type)
> >>          && !TYPE_OVERFLOW_SANITIZED(type)
> >>          && INTEGRAL_TYPE_P (TREE_TYPE(@1))
> >>          && TYPE_OVERFLOW_UNDEFINED(TREE_TYPE(@1))
> >>          && !TYPE_OVERFLOW_SANITIZED(TREE_TYPE(@1))
> >>          && !TYPE_UNSIGNED (TREE_TYPE (@1))
> >>          && !TYPE_UNSIGNED (type)
> >
> > please group checks on common argument.  I think a single test
> > on INTEGRAL_TYPE_P (type) is enough, it could be ANY_INTEGRAL_TYPE_P
> > to include vector and complex types.
> >
> >>          && TYPE_PRECISION (TREE_TYPE (@1)) <= TYPE_PRECISION (type)
> >>          && INTEGRAL_TYPE_P (TREE_TYPE(@0))
> >>          && TYPE_OVERFLOW_UNDEFINED(TREE_TYPE(@0))
> >
> > why is this testing TREE_TYPE (@0)?  The outer subtract is using 'type',
> > the inner subtract uses TREE_TYPE (@1) though you could place
> > a capture on the minus to make what you test more obvious, like
> >
> >   (minus (convert1? @0) (convert2? (minus@3 (convert3? @2) @1)))
> >
> > TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@3))
> >
> > there's one set of checks too much I think.
> >
> >>          && !TYPE_OVERFLOW_SANITIZED(TREE_TYPE(@0))
> >>          && !TYPE_UNSIGNED (TREE_TYPE (@0))
> >
> > we only ever have TYPE_OVERFLOW_UNDEFINED on singed types so
> > the !TYPE_UNSIGNED checks are superfluous.
> >
> >>          && TYPE_PRECISION (TREE_TYPE (@0)) <= TYPE_PRECISION (type)
> >>          && TREE_TYPE(@1) == TREE_TYPE(@2))
> >
> > This check means that convert3 is never present since a MINUS requires
> > compatible types.
> >
> > Sorry for the late reply.
> >
> > Note the pattern will necessarily be partly redundant with the
> >
> >   (simplify
> >    (minus (nop_convert1? (minus (nop_convert2? @0) @1)) @0)
> >    (if (!ANY_INTEGRAL_TYPE_P (type)
> >         || TYPE_OVERFLOW_WRAPS (type))
> >    (negate (view_convert @1))
> >    (view_convert (negate @1))))
> >
> > pattern.  Once we'd "inline" nop_convert genmatch would complain
> > about this.
> >
> >>      (convert @1)))
> >>
> >> Is there a more concise/better way of writing the pattern? I was looking for similar checks in match.pd and I couldn't find anything that I could leverage.
> >>
> >> I also kept my pattern to the specific scenario I'm seeing with the regression to lower the risk of something breaking. I've limited @1 and @2 to have the same type.
> >>
> >> I'm also in favor of adding/running computer verification to make sure the transformation is legal. I've written some tests to verify that the pattern is being applied in the right scenarios and not being applied in others, but I think there are too many possibilities to manually write them all. Is there anything in GCC that can be used to verify that match.pd transformations are correct? I'm thinking of something like Alive https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FAliveToolkit%2Falive2&amp;data=04%7C01%7Cvitong%40microsoft.com%7C1abb4ed5e6e745eb756a08d95b1c3d72%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637640999133594608%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=olTfA5yAjtKs9Sckdnwn4iRGSynb7wX0Xx6uKrUNHhs%3D&amp;reserved=0.
> >>
> >> Thanks,
> >> Victor
> >>
> >>
> >>
> >> From: Richard Biener <richard.guenther@gmail.com>
> >> Sent: Monday, June 21, 2021 12:08 AM
> >> To: Marc Glisse <marc.glisse@inria.fr>
> >> Cc: Victor Tong <vitong@microsoft.com>; gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>
> >> Subject: Re: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division followed by multiply [PR95176]
> >>
> >> On Sat, Jun 19, 2021 at 7:05 PM Marc Glisse <marc.glisse@inria.fr> wrote:
> >> >
> >> > On Fri, 18 Jun 2021, Richard Biener wrote:
> >> >
> >> > >> Option 2: Add a new pattern to support scenarios that the existing nop_convert pattern bails out on.
> >> > >>
> >> > >> Existing pattern:
> >> > >>
> >> > >> (simplify
> >> > >>    (minus (nop_convert1? @0) (nop_convert2? (minus (nop_convert3? @@0) @1)))
> >> > >>    (view_convert @1))
> >> >
> >> > I tried to check with a program when
> >> >
> >> > T3 x;
> >> > T1 y;
> >> > (T2)x-(T2)((T1)x-y)
> >> >
> >> > can be safely replaced with
> >> >
> >> > (T2)y
> >> >
> >> > From the output, it looks like this is safe when T1 is at least as large
> >> > as T2. It is wrong when T1 is unsigned and smaller than T2. And when T1 is
> >> > signed and smaller than T2, it is ok if T3 is the same type as T1 (signed
> >> > then) or has strictly less precision (any sign), and not in other cases.
> >> >
> >> > Note that this is when signed implies undefined overflow and unsigned
> >> > implies wrapping, and I wouldn't put too much faith in this recently
> >> > dusted program. And it doesn't say how to write the match.pd pattern with
> >> > '?', "@@", disabling it if TYPE_OVERFLOW_SANITIZED, etc.
> >> >
> >> > Mostly, I wanted to say that if we are going to go handle more than
> >> > nop_convert for more than just 1 or 2 easy transformations, I think some
> >> > kind of computer verification would be useful, it would save a lot of time
> >> > and headaches.
> >>
> >> True.  I wonder if auto-generating such tests from match.pd rules would
> >> be a good project to work on (GSoC?).  When there's define_match
> >> involved things get a little tricky, but then one item on the TODO list
> >> is "inlining" those at the use site (exploding the decision tree even more).
> >>
> >> Richard.
> >>
> >> > (I just check by brute force all possible precisions (from 1 to 6) and
> >> > signedness for T1, T2 and T3, all possible values for x and y, compute
> >> > the before and after formulas, accepting if there is UB before, rejecting
> >> > if there is UB after (and not before), and manually try to see a pattern
> >> > in the list of types that work)
> >> >
> >> > --
> >> > Marc Glisse
>
> --
> Marc Glisse

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2021-08-23  0:44 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-31 23:02 [PATCH] tree-optimization: Optimize division followed by multiply [PR95176] Victor Tong
2021-04-27  8:29 ` Richard Biener
2021-06-02 20:55   ` [EXTERNAL] " Victor Tong
2021-06-07  8:25     ` Richard Biener
2021-06-16 18:49       ` Victor Tong
2021-06-18  9:43         ` Richard Biener
2021-06-19 17:05           ` Marc Glisse
2021-06-21  7:08             ` Richard Biener
2021-06-28 23:10               ` Victor Tong
2021-07-19 20:58                 ` Victor Tong
2021-07-28  9:59                 ` Richard Biener
2021-08-06 21:17                   ` Victor Tong
2021-08-06 22:49                     ` Marc Glisse
2021-08-09  9:58                       ` Richard Biener
2021-08-23  0:44                         ` Victor Tong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).