public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] MATCH: Maybe expand (T)(A + C1) * C2 and (T)(A + C1) * C2 + C3 [PR109393]
@ 2024-04-23 10:33 Manolis Tsamis
  2024-05-02 13:00 ` Richard Biener
  2024-05-14  8:57 ` Manolis Tsamis
  0 siblings, 2 replies; 10+ messages in thread
From: Manolis Tsamis @ 2024-04-23 10:33 UTC (permalink / raw)
  To: gcc-patches
  Cc: Richard Biener, Jiangning Liu, Philipp Tomsich, Andrew Pinski,
	Manolis Tsamis

The original motivation for this pattern was that the following function does
not fold to 'return 1':

int foo(int *a, int j)
{
  int k = j - 1;
  return a[j - 1] == a[k];
}

The expression ((unsigned long) (X +- C1) * C2) appears frequently as part of
address calculations (e.g. arrays). These patterns help fold and simplify more
expressions.

	PR tree-optimization/109393

gcc/ChangeLog:

	* match.pd: Add new patterns for ((T)(A +- CST1)) * CST2 and
	  ((T)(A +- CST1)) * CST2 + CST3.

gcc/testsuite/ChangeLog:

	* gcc.dg/pr109393.c: New test.

Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu>
---

 gcc/match.pd                    | 30 ++++++++++++++++++++++++++++++
 gcc/testsuite/gcc.dg/pr109393.c | 16 ++++++++++++++++
 2 files changed, 46 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/pr109393.c

diff --git a/gcc/match.pd b/gcc/match.pd
index d401e7503e6..13c828ba70d 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3650,6 +3650,36 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
        (plus (convert @0) (op @2 (convert @1))))))
 #endif
 
+/* ((T)(A + CST1)) * CST2 + CST3
+     -> ((T)(A) * CST2) + ((T)CST1 * CST2 + CST3)
+   Where (A + CST1) doesn't need to have a single use.  */
+#if GIMPLE
+  (for op (plus minus)
+   (simplify
+    (plus (mult (convert:s (op @0 INTEGER_CST@1)) INTEGER_CST@2) INTEGER_CST@3)
+     (if (TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE
+	  && TREE_CODE (type) == INTEGER_TYPE
+	  && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
+	  && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
+	  && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
+	  && TYPE_OVERFLOW_WRAPS (type))
+       (op (mult @2 (convert @0)) (plus (mult @2 (convert @1)) @3)))))
+#endif
+
+/* ((T)(A + CST1)) * CST2 -> ((T)(A) * CST2) + ((T)CST1 * CST2)  */
+#if GIMPLE
+  (for op (plus minus)
+   (simplify
+    (mult (convert:s (op:s @0 INTEGER_CST@1)) INTEGER_CST@2)
+     (if (TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE
+	  && TREE_CODE (type) == INTEGER_TYPE
+	  && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
+	  && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
+	  && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
+	  && TYPE_OVERFLOW_WRAPS (type))
+       (op (mult @2 (convert @0)) (mult @2 (convert @1))))))
+#endif
+
 /* (T)(A) +- (T)(B) -> (T)(A +- B) only when (A +- B) could be simplified
    to a simple value.  */
   (for op (plus minus)
diff --git a/gcc/testsuite/gcc.dg/pr109393.c b/gcc/testsuite/gcc.dg/pr109393.c
new file mode 100644
index 00000000000..e9051273672
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr109393.c
@@ -0,0 +1,16 @@
+/* PR tree-optimization/109393 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-times "return 1;" 2 "optimized" } } */
+
+int foo(int *a, int j)
+{
+  int k = j - 1;
+  return a[j - 1] == a[k];
+}
+
+int bar(int *a, int j)
+{
+  int k = j - 1;
+  return (&a[j + 1] - 2) == &a[k];
+}
-- 
2.34.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] MATCH: Maybe expand (T)(A + C1) * C2 and (T)(A + C1) * C2 + C3 [PR109393]
  2024-04-23 10:33 [PATCH] MATCH: Maybe expand (T)(A + C1) * C2 and (T)(A + C1) * C2 + C3 [PR109393] Manolis Tsamis
@ 2024-05-02 13:00 ` Richard Biener
  2024-05-02 13:08   ` Manolis Tsamis
  2024-05-14  8:57 ` Manolis Tsamis
  1 sibling, 1 reply; 10+ messages in thread
From: Richard Biener @ 2024-05-02 13:00 UTC (permalink / raw)
  To: Manolis Tsamis; +Cc: gcc-patches, Jiangning Liu, Philipp Tomsich, Andrew Pinski

On Tue, 23 Apr 2024, Manolis Tsamis wrote:

> The original motivation for this pattern was that the following function does
> not fold to 'return 1':
> 
> int foo(int *a, int j)
> {
>   int k = j - 1;
>   return a[j - 1] == a[k];
> }
> 
> The expression ((unsigned long) (X +- C1) * C2) appears frequently as part of
> address calculations (e.g. arrays). These patterns help fold and simplify more
> expressions.
> 
> 	PR tree-optimization/109393
> 
> gcc/ChangeLog:
> 
> 	* match.pd: Add new patterns for ((T)(A +- CST1)) * CST2 and
> 	  ((T)(A +- CST1)) * CST2 + CST3.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.dg/pr109393.c: New test.
> 
> Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu>
> ---
> 
>  gcc/match.pd                    | 30 ++++++++++++++++++++++++++++++
>  gcc/testsuite/gcc.dg/pr109393.c | 16 ++++++++++++++++
>  2 files changed, 46 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/pr109393.c
> 
> diff --git a/gcc/match.pd b/gcc/match.pd
> index d401e7503e6..13c828ba70d 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3650,6 +3650,36 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>         (plus (convert @0) (op @2 (convert @1))))))
>  #endif
>  
> +/* ((T)(A + CST1)) * CST2 + CST3
> +     -> ((T)(A) * CST2) + ((T)CST1 * CST2 + CST3)
> +   Where (A + CST1) doesn't need to have a single use.  */
> +#if GIMPLE
> +  (for op (plus minus)
> +   (simplify
> +    (plus (mult (convert:s (op @0 INTEGER_CST@1)) INTEGER_CST@2) INTEGER_CST@3)
> +     (if (TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE
> +	  && TREE_CODE (type) == INTEGER_TYPE
> +	  && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> +	  && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> +	  && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
> +	  && TYPE_OVERFLOW_WRAPS (type))
> +       (op (mult @2 (convert @0)) (plus (mult @2 (convert @1)) @3)))))
> +#endif
> +
> +/* ((T)(A + CST1)) * CST2 -> ((T)(A) * CST2) + ((T)CST1 * CST2)  */
> +#if GIMPLE
> +  (for op (plus minus)
> +   (simplify
> +    (mult (convert:s (op:s @0 INTEGER_CST@1)) INTEGER_CST@2)
> +     (if (TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE

Please use INTEGRAL_TYPE_P

> +	  && TREE_CODE (type) == INTEGER_TYPE
> +	  && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> +	  && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> +	  && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
> +	  && TYPE_OVERFLOW_WRAPS (type))
> +       (op (mult @2 (convert @0)) (mult @2 (convert @1))))))

(mult @2 (convert @0)) is non-canonical for no good reason if @0
isn't constant - constant should be 2nd, please swap operands here.

> +#endif

The first pattern is an extension of the second, why's the first
necessary at all?  The add of CST3 is unchanged (OK, you seem to
associate here, but that's again a different thing).

I'd say the 2nd pattern is OK with the above changes but the first
looks redundant.

Thanks,
Richard.

> +
>  /* (T)(A) +- (T)(B) -> (T)(A +- B) only when (A +- B) could be simplified
>     to a simple value.  */
>    (for op (plus minus)
> diff --git a/gcc/testsuite/gcc.dg/pr109393.c b/gcc/testsuite/gcc.dg/pr109393.c
> new file mode 100644
> index 00000000000..e9051273672
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr109393.c
> @@ -0,0 +1,16 @@
> +/* PR tree-optimization/109393 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +/* { dg-final { scan-tree-dump-times "return 1;" 2 "optimized" } } */
> +
> +int foo(int *a, int j)
> +{
> +  int k = j - 1;
> +  return a[j - 1] == a[k];
> +}
> +
> +int bar(int *a, int j)
> +{
> +  int k = j - 1;
> +  return (&a[j + 1] - 2) == &a[k];
> +}
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] MATCH: Maybe expand (T)(A + C1) * C2 and (T)(A + C1) * C2 + C3 [PR109393]
  2024-05-02 13:00 ` Richard Biener
@ 2024-05-02 13:08   ` Manolis Tsamis
  2024-05-02 13:27     ` Richard Biener
  0 siblings, 1 reply; 10+ messages in thread
From: Manolis Tsamis @ 2024-05-02 13:08 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Jiangning Liu, Philipp Tomsich, Andrew Pinski

On Thu, May 2, 2024 at 4:00 PM Richard Biener <rguenther@suse.de> wrote:
>
> On Tue, 23 Apr 2024, Manolis Tsamis wrote:
>
> > The original motivation for this pattern was that the following function does
> > not fold to 'return 1':
> >
> > int foo(int *a, int j)
> > {
> >   int k = j - 1;
> >   return a[j - 1] == a[k];
> > }
> >
> > The expression ((unsigned long) (X +- C1) * C2) appears frequently as part of
> > address calculations (e.g. arrays). These patterns help fold and simplify more
> > expressions.
> >
> >       PR tree-optimization/109393
> >
> > gcc/ChangeLog:
> >
> >       * match.pd: Add new patterns for ((T)(A +- CST1)) * CST2 and
> >         ((T)(A +- CST1)) * CST2 + CST3.
> >
> > gcc/testsuite/ChangeLog:
> >
> >       * gcc.dg/pr109393.c: New test.
> >
> > Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu>
> > ---
> >
> >  gcc/match.pd                    | 30 ++++++++++++++++++++++++++++++
> >  gcc/testsuite/gcc.dg/pr109393.c | 16 ++++++++++++++++
> >  2 files changed, 46 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.dg/pr109393.c
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index d401e7503e6..13c828ba70d 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -3650,6 +3650,36 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >         (plus (convert @0) (op @2 (convert @1))))))
> >  #endif
> >
> > +/* ((T)(A + CST1)) * CST2 + CST3
> > +     -> ((T)(A) * CST2) + ((T)CST1 * CST2 + CST3)
> > +   Where (A + CST1) doesn't need to have a single use.  */
> > +#if GIMPLE
> > +  (for op (plus minus)
> > +   (simplify
> > +    (plus (mult (convert:s (op @0 INTEGER_CST@1)) INTEGER_CST@2) INTEGER_CST@3)
> > +     (if (TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE
> > +       && TREE_CODE (type) == INTEGER_TYPE
> > +       && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> > +       && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> > +       && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
> > +       && TYPE_OVERFLOW_WRAPS (type))
> > +       (op (mult @2 (convert @0)) (plus (mult @2 (convert @1)) @3)))))
> > +#endif
> > +
> > +/* ((T)(A + CST1)) * CST2 -> ((T)(A) * CST2) + ((T)CST1 * CST2)  */
> > +#if GIMPLE
> > +  (for op (plus minus)
> > +   (simplify
> > +    (mult (convert:s (op:s @0 INTEGER_CST@1)) INTEGER_CST@2)
> > +     (if (TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE
>
> Please use INTEGRAL_TYPE_P
>
> > +       && TREE_CODE (type) == INTEGER_TYPE
> > +       && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> > +       && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> > +       && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
> > +       && TYPE_OVERFLOW_WRAPS (type))
> > +       (op (mult @2 (convert @0)) (mult @2 (convert @1))))))
>
> (mult @2 (convert @0)) is non-canonical for no good reason if @0
> isn't constant - constant should be 2nd, please swap operands here.
>
> > +#endif
>
> The first pattern is an extension of the second, why's the first
> necessary at all?  The add of CST3 is unchanged (OK, you seem to
> associate here, but that's again a different thing).
>
> I'd say the 2nd pattern is OK with the above changes but the first
> looks redundant.
>
Hi Richard,

Thanks for the comments, I'll fix these.

The difference is that the second uses op:s while the first uses just op.
In the second case if A + CST1 has other uses expanding the pattern
may not be a good idea but in the first case it always is because we
know + CST1 * CST2 will merge with + CST3.

Thanks,
Manolis

> Thanks,
> Richard.
>
> > +
> >  /* (T)(A) +- (T)(B) -> (T)(A +- B) only when (A +- B) could be simplified
> >     to a simple value.  */
> >    (for op (plus minus)
> > diff --git a/gcc/testsuite/gcc.dg/pr109393.c b/gcc/testsuite/gcc.dg/pr109393.c
> > new file mode 100644
> > index 00000000000..e9051273672
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/pr109393.c
> > @@ -0,0 +1,16 @@
> > +/* PR tree-optimization/109393 */
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -fdump-tree-optimized" } */
> > +/* { dg-final { scan-tree-dump-times "return 1;" 2 "optimized" } } */
> > +
> > +int foo(int *a, int j)
> > +{
> > +  int k = j - 1;
> > +  return a[j - 1] == a[k];
> > +}
> > +
> > +int bar(int *a, int j)
> > +{
> > +  int k = j - 1;
> > +  return (&a[j + 1] - 2) == &a[k];
> > +}
> >
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] MATCH: Maybe expand (T)(A + C1) * C2 and (T)(A + C1) * C2 + C3 [PR109393]
  2024-05-02 13:08   ` Manolis Tsamis
@ 2024-05-02 13:27     ` Richard Biener
  0 siblings, 0 replies; 10+ messages in thread
From: Richard Biener @ 2024-05-02 13:27 UTC (permalink / raw)
  To: Manolis Tsamis; +Cc: gcc-patches, Jiangning Liu, Philipp Tomsich, Andrew Pinski

[-- Attachment #1: Type: text/plain, Size: 5361 bytes --]

On Thu, 2 May 2024, Manolis Tsamis wrote:

> On Thu, May 2, 2024 at 4:00 PM Richard Biener <rguenther@suse.de> wrote:
> >
> > On Tue, 23 Apr 2024, Manolis Tsamis wrote:
> >
> > > The original motivation for this pattern was that the following function does
> > > not fold to 'return 1':
> > >
> > > int foo(int *a, int j)
> > > {
> > >   int k = j - 1;
> > >   return a[j - 1] == a[k];
> > > }
> > >
> > > The expression ((unsigned long) (X +- C1) * C2) appears frequently as part of
> > > address calculations (e.g. arrays). These patterns help fold and simplify more
> > > expressions.
> > >
> > >       PR tree-optimization/109393
> > >
> > > gcc/ChangeLog:
> > >
> > >       * match.pd: Add new patterns for ((T)(A +- CST1)) * CST2 and
> > >         ((T)(A +- CST1)) * CST2 + CST3.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >       * gcc.dg/pr109393.c: New test.
> > >
> > > Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu>
> > > ---
> > >
> > >  gcc/match.pd                    | 30 ++++++++++++++++++++++++++++++
> > >  gcc/testsuite/gcc.dg/pr109393.c | 16 ++++++++++++++++
> > >  2 files changed, 46 insertions(+)
> > >  create mode 100644 gcc/testsuite/gcc.dg/pr109393.c
> > >
> > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > index d401e7503e6..13c828ba70d 100644
> > > --- a/gcc/match.pd
> > > +++ b/gcc/match.pd
> > > @@ -3650,6 +3650,36 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > >         (plus (convert @0) (op @2 (convert @1))))))
> > >  #endif
> > >
> > > +/* ((T)(A + CST1)) * CST2 + CST3
> > > +     -> ((T)(A) * CST2) + ((T)CST1 * CST2 + CST3)
> > > +   Where (A + CST1) doesn't need to have a single use.  */
> > > +#if GIMPLE
> > > +  (for op (plus minus)
> > > +   (simplify
> > > +    (plus (mult (convert:s (op @0 INTEGER_CST@1)) INTEGER_CST@2) INTEGER_CST@3)
> > > +     (if (TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE
> > > +       && TREE_CODE (type) == INTEGER_TYPE
> > > +       && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> > > +       && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> > > +       && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
> > > +       && TYPE_OVERFLOW_WRAPS (type))
> > > +       (op (mult @2 (convert @0)) (plus (mult @2 (convert @1)) @3)))))
> > > +#endif
> > > +
> > > +/* ((T)(A + CST1)) * CST2 -> ((T)(A) * CST2) + ((T)CST1 * CST2)  */
> > > +#if GIMPLE
> > > +  (for op (plus minus)
> > > +   (simplify
> > > +    (mult (convert:s (op:s @0 INTEGER_CST@1)) INTEGER_CST@2)
> > > +     (if (TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE
> >
> > Please use INTEGRAL_TYPE_P
> >
> > > +       && TREE_CODE (type) == INTEGER_TYPE
> > > +       && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> > > +       && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> > > +       && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
> > > +       && TYPE_OVERFLOW_WRAPS (type))
> > > +       (op (mult @2 (convert @0)) (mult @2 (convert @1))))))
> >
> > (mult @2 (convert @0)) is non-canonical for no good reason if @0
> > isn't constant - constant should be 2nd, please swap operands here.
> >
> > > +#endif
> >
> > The first pattern is an extension of the second, why's the first
> > necessary at all?  The add of CST3 is unchanged (OK, you seem to
> > associate here, but that's again a different thing).
> >
> > I'd say the 2nd pattern is OK with the above changes but the first
> > looks redundant.
> >
> Hi Richard,
> 
> Thanks for the comments, I'll fix these.
> 
> The difference is that the second uses op:s while the first uses just op.
> In the second case if A + CST1 has other uses expanding the pattern
> may not be a good idea but in the first case it always is because we
> know + CST1 * CST2 will merge with + CST3.

I see.  But that pattern misses a :s on the multiplication result then, 
no?  Local pattern-matching isn't the best vehicle to handle multi-use
cases, introducing context dependent canonicalizations can lead to
SCEV analysis no longer matching up for related accesses and then
data dependence analysis failing.  It's been a trade-off here.

Richard.

> 
> Thanks,
> Manolis
> 
> > Thanks,
> > Richard.
> >
> > > +
> > >  /* (T)(A) +- (T)(B) -> (T)(A +- B) only when (A +- B) could be simplified
> > >     to a simple value.  */
> > >    (for op (plus minus)
> > > diff --git a/gcc/testsuite/gcc.dg/pr109393.c b/gcc/testsuite/gcc.dg/pr109393.c
> > > new file mode 100644
> > > index 00000000000..e9051273672
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/pr109393.c
> > > @@ -0,0 +1,16 @@
> > > +/* PR tree-optimization/109393 */
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O2 -fdump-tree-optimized" } */
> > > +/* { dg-final { scan-tree-dump-times "return 1;" 2 "optimized" } } */
> > > +
> > > +int foo(int *a, int j)
> > > +{
> > > +  int k = j - 1;
> > > +  return a[j - 1] == a[k];
> > > +}
> > > +
> > > +int bar(int *a, int j)
> > > +{
> > > +  int k = j - 1;
> > > +  return (&a[j + 1] - 2) == &a[k];
> > > +}
> > >
> >
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] MATCH: Maybe expand (T)(A + C1) * C2 and (T)(A + C1) * C2 + C3 [PR109393]
  2024-04-23 10:33 [PATCH] MATCH: Maybe expand (T)(A + C1) * C2 and (T)(A + C1) * C2 + C3 [PR109393] Manolis Tsamis
  2024-05-02 13:00 ` Richard Biener
@ 2024-05-14  8:57 ` Manolis Tsamis
  2024-05-16  8:15   ` Richard Biener
  1 sibling, 1 reply; 10+ messages in thread
From: Manolis Tsamis @ 2024-05-14  8:57 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Biener, Jiangning Liu, Philipp Tomsich, Andrew Pinski

New patch with the requested changes can be found below.

I don't know how much this affects SCEV, but I do believe that we
should incorporate this change somehow. I've seen various cases of
suboptimal address calculation codegen that boil down to this.

gcc/match.pd | 31 +++++++++++++++++++++++++++++++
gcc/testsuite/gcc.dg/pr109393.c | 16 ++++++++++++++++
2 files changed, 47 insertions(+)
create mode 100644 gcc/testsuite/gcc.dg/pr109393.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 07e743ae464..1d642c205f0 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3650,6 +3650,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(plus (convert @0) (op @2 (convert @1))))))
#endif
+/* ((T)(A + CST1)) * CST2 + CST3
+ -> ((T)(A) * CST2) + ((T)CST1 * CST2 + CST3)
+ Where (A + CST1) doesn't need to have a single use. */
+#if GIMPLE
+ (for op (plus minus)
+ (simplify
+ (plus (mult:s (convert:s (op @0 INTEGER_CST@1)) INTEGER_CST@2)
+ INTEGER_CST@3)
+ (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
+ && INTEGRAL_TYPE_P (type)
+ && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
+ && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
+ && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
+ && TYPE_OVERFLOW_WRAPS (type))
+ (op (mult (convert @0) @2) (plus (mult (convert @1) @2) @3)))))
+#endif
+
+/* ((T)(A + CST1)) * CST2 -> ((T)(A) * CST2) + ((T)CST1 * CST2) */
+#if GIMPLE
+ (for op (plus minus)
+ (simplify
+ (mult (convert:s (op:s @0 INTEGER_CST@1)) INTEGER_CST@2)
+ (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
+ && INTEGRAL_TYPE_P (type)
+ && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
+ && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
+ && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
+ && TYPE_OVERFLOW_WRAPS (type))
+ (op (mult (convert @0) @2) (mult (convert @1) @2)))))
+#endif
+
/* (T)(A) +- (T)(B) -> (T)(A +- B) only when (A +- B) could be simplified
to a simple value. */
(for op (plus minus)
diff --git a/gcc/testsuite/gcc.dg/pr109393.c b/gcc/testsuite/gcc.dg/pr109393.c
new file mode 100644
index 00000000000..e9051273672
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr109393.c
@@ -0,0 +1,16 @@
+/* PR tree-optimization/109393 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-times "return 1;" 2 "optimized" } } */
+
+int foo(int *a, int j)
+{
+ int k = j - 1;
+ return a[j - 1] == a[k];
+}
+
+int bar(int *a, int j)
+{
+ int k = j - 1;
+ return (&a[j + 1] - 2) == &a[k];
+}
-- 
2.44.0


On Tue, Apr 23, 2024 at 1:33 PM Manolis Tsamis <manolis.tsamis@vrull.eu> wrote:
>
> The original motivation for this pattern was that the following function does
> not fold to 'return 1':
>
> int foo(int *a, int j)
> {
>   int k = j - 1;
>   return a[j - 1] == a[k];
> }
>
> The expression ((unsigned long) (X +- C1) * C2) appears frequently as part of
> address calculations (e.g. arrays). These patterns help fold and simplify more
> expressions.
>
>         PR tree-optimization/109393
>
> gcc/ChangeLog:
>
>         * match.pd: Add new patterns for ((T)(A +- CST1)) * CST2 and
>           ((T)(A +- CST1)) * CST2 + CST3.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.dg/pr109393.c: New test.
>
> Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu>
> ---
>
>  gcc/match.pd                    | 30 ++++++++++++++++++++++++++++++
>  gcc/testsuite/gcc.dg/pr109393.c | 16 ++++++++++++++++
>  2 files changed, 46 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/pr109393.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index d401e7503e6..13c828ba70d 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3650,6 +3650,36 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>         (plus (convert @0) (op @2 (convert @1))))))
>  #endif
>
> +/* ((T)(A + CST1)) * CST2 + CST3
> +     -> ((T)(A) * CST2) + ((T)CST1 * CST2 + CST3)
> +   Where (A + CST1) doesn't need to have a single use.  */
> +#if GIMPLE
> +  (for op (plus minus)
> +   (simplify
> +    (plus (mult (convert:s (op @0 INTEGER_CST@1)) INTEGER_CST@2) INTEGER_CST@3)
> +     (if (TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE
> +         && TREE_CODE (type) == INTEGER_TYPE
> +         && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> +         && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> +         && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
> +         && TYPE_OVERFLOW_WRAPS (type))
> +       (op (mult @2 (convert @0)) (plus (mult @2 (convert @1)) @3)))))
> +#endif
> +
> +/* ((T)(A + CST1)) * CST2 -> ((T)(A) * CST2) + ((T)CST1 * CST2)  */
> +#if GIMPLE
> +  (for op (plus minus)
> +   (simplify
> +    (mult (convert:s (op:s @0 INTEGER_CST@1)) INTEGER_CST@2)
> +     (if (TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE
> +         && TREE_CODE (type) == INTEGER_TYPE
> +         && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> +         && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> +         && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
> +         && TYPE_OVERFLOW_WRAPS (type))
> +       (op (mult @2 (convert @0)) (mult @2 (convert @1))))))
> +#endif
> +
>  /* (T)(A) +- (T)(B) -> (T)(A +- B) only when (A +- B) could be simplified
>     to a simple value.  */
>    (for op (plus minus)
> diff --git a/gcc/testsuite/gcc.dg/pr109393.c b/gcc/testsuite/gcc.dg/pr109393.c
> new file mode 100644
> index 00000000000..e9051273672
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr109393.c
> @@ -0,0 +1,16 @@
> +/* PR tree-optimization/109393 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +/* { dg-final { scan-tree-dump-times "return 1;" 2 "optimized" } } */
> +
> +int foo(int *a, int j)
> +{
> +  int k = j - 1;
> +  return a[j - 1] == a[k];
> +}
> +
> +int bar(int *a, int j)
> +{
> +  int k = j - 1;
> +  return (&a[j + 1] - 2) == &a[k];
> +}
> --
> 2.34.1
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] MATCH: Maybe expand (T)(A + C1) * C2 and (T)(A + C1) * C2 + C3 [PR109393]
  2024-05-14  8:57 ` Manolis Tsamis
@ 2024-05-16  8:15   ` Richard Biener
  2024-05-17  8:23     ` Manolis Tsamis
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Biener @ 2024-05-16  8:15 UTC (permalink / raw)
  To: Manolis Tsamis
  Cc: gcc-patches, Richard Biener, Jiangning Liu, Philipp Tomsich,
	Andrew Pinski

On Tue, May 14, 2024 at 10:58 AM Manolis Tsamis <manolis.tsamis@vrull.eu> wrote:
>
> New patch with the requested changes can be found below.
>
> I don't know how much this affects SCEV, but I do believe that we
> should incorporate this change somehow. I've seen various cases of
> suboptimal address calculation codegen that boil down to this.

This misses the ChangeLog (I assume it's unchanged) and indent
of the match.pd part is now off.

Please fix that, the patch is OK with that change.

Thanks,
Richard.

> gcc/match.pd | 31 +++++++++++++++++++++++++++++++
> gcc/testsuite/gcc.dg/pr109393.c | 16 ++++++++++++++++
> 2 files changed, 47 insertions(+)
> create mode 100644 gcc/testsuite/gcc.dg/pr109393.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 07e743ae464..1d642c205f0 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3650,6 +3650,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> (plus (convert @0) (op @2 (convert @1))))))
> #endif
> +/* ((T)(A + CST1)) * CST2 + CST3
> + -> ((T)(A) * CST2) + ((T)CST1 * CST2 + CST3)
> + Where (A + CST1) doesn't need to have a single use. */
> +#if GIMPLE
> + (for op (plus minus)
> + (simplify
> + (plus (mult:s (convert:s (op @0 INTEGER_CST@1)) INTEGER_CST@2)
> + INTEGER_CST@3)
> + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> + && INTEGRAL_TYPE_P (type)
> + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> + && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
> + && TYPE_OVERFLOW_WRAPS (type))
> + (op (mult (convert @0) @2) (plus (mult (convert @1) @2) @3)))))
> +#endif
> +
> +/* ((T)(A + CST1)) * CST2 -> ((T)(A) * CST2) + ((T)CST1 * CST2) */
> +#if GIMPLE
> + (for op (plus minus)
> + (simplify
> + (mult (convert:s (op:s @0 INTEGER_CST@1)) INTEGER_CST@2)
> + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> + && INTEGRAL_TYPE_P (type)
> + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> + && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
> + && TYPE_OVERFLOW_WRAPS (type))
> + (op (mult (convert @0) @2) (mult (convert @1) @2)))))
> +#endif
> +
> /* (T)(A) +- (T)(B) -> (T)(A +- B) only when (A +- B) could be simplified
> to a simple value. */
> (for op (plus minus)
> diff --git a/gcc/testsuite/gcc.dg/pr109393.c b/gcc/testsuite/gcc.dg/pr109393.c
> new file mode 100644
> index 00000000000..e9051273672
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr109393.c
> @@ -0,0 +1,16 @@
> +/* PR tree-optimization/109393 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +/* { dg-final { scan-tree-dump-times "return 1;" 2 "optimized" } } */
> +
> +int foo(int *a, int j)
> +{
> + int k = j - 1;
> + return a[j - 1] == a[k];
> +}
> +
> +int bar(int *a, int j)
> +{
> + int k = j - 1;
> + return (&a[j + 1] - 2) == &a[k];
> +}
> --
> 2.44.0
>
>
> On Tue, Apr 23, 2024 at 1:33 PM Manolis Tsamis <manolis.tsamis@vrull.eu> wrote:
> >
> > The original motivation for this pattern was that the following function does
> > not fold to 'return 1':
> >
> > int foo(int *a, int j)
> > {
> >   int k = j - 1;
> >   return a[j - 1] == a[k];
> > }
> >
> > The expression ((unsigned long) (X +- C1) * C2) appears frequently as part of
> > address calculations (e.g. arrays). These patterns help fold and simplify more
> > expressions.
> >
> >         PR tree-optimization/109393
> >
> > gcc/ChangeLog:
> >
> >         * match.pd: Add new patterns for ((T)(A +- CST1)) * CST2 and
> >           ((T)(A +- CST1)) * CST2 + CST3.
> >
> > gcc/testsuite/ChangeLog:
> >
> >         * gcc.dg/pr109393.c: New test.
> >
> > Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu>
> > ---
> >
> >  gcc/match.pd                    | 30 ++++++++++++++++++++++++++++++
> >  gcc/testsuite/gcc.dg/pr109393.c | 16 ++++++++++++++++
> >  2 files changed, 46 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.dg/pr109393.c
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index d401e7503e6..13c828ba70d 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -3650,6 +3650,36 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >         (plus (convert @0) (op @2 (convert @1))))))
> >  #endif
> >
> > +/* ((T)(A + CST1)) * CST2 + CST3
> > +     -> ((T)(A) * CST2) + ((T)CST1 * CST2 + CST3)
> > +   Where (A + CST1) doesn't need to have a single use.  */
> > +#if GIMPLE
> > +  (for op (plus minus)
> > +   (simplify
> > +    (plus (mult (convert:s (op @0 INTEGER_CST@1)) INTEGER_CST@2) INTEGER_CST@3)
> > +     (if (TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE
> > +         && TREE_CODE (type) == INTEGER_TYPE
> > +         && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> > +         && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> > +         && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
> > +         && TYPE_OVERFLOW_WRAPS (type))
> > +       (op (mult @2 (convert @0)) (plus (mult @2 (convert @1)) @3)))))
> > +#endif
> > +
> > +/* ((T)(A + CST1)) * CST2 -> ((T)(A) * CST2) + ((T)CST1 * CST2)  */
> > +#if GIMPLE
> > +  (for op (plus minus)
> > +   (simplify
> > +    (mult (convert:s (op:s @0 INTEGER_CST@1)) INTEGER_CST@2)
> > +     (if (TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE
> > +         && TREE_CODE (type) == INTEGER_TYPE
> > +         && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> > +         && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> > +         && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
> > +         && TYPE_OVERFLOW_WRAPS (type))
> > +       (op (mult @2 (convert @0)) (mult @2 (convert @1))))))
> > +#endif
> > +
> >  /* (T)(A) +- (T)(B) -> (T)(A +- B) only when (A +- B) could be simplified
> >     to a simple value.  */
> >    (for op (plus minus)
> > diff --git a/gcc/testsuite/gcc.dg/pr109393.c b/gcc/testsuite/gcc.dg/pr109393.c
> > new file mode 100644
> > index 00000000000..e9051273672
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/pr109393.c
> > @@ -0,0 +1,16 @@
> > +/* PR tree-optimization/109393 */
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -fdump-tree-optimized" } */
> > +/* { dg-final { scan-tree-dump-times "return 1;" 2 "optimized" } } */
> > +
> > +int foo(int *a, int j)
> > +{
> > +  int k = j - 1;
> > +  return a[j - 1] == a[k];
> > +}
> > +
> > +int bar(int *a, int j)
> > +{
> > +  int k = j - 1;
> > +  return (&a[j + 1] - 2) == &a[k];
> > +}
> > --
> > 2.34.1
> >

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] MATCH: Maybe expand (T)(A + C1) * C2 and (T)(A + C1) * C2 + C3 [PR109393]
  2024-05-16  8:15   ` Richard Biener
@ 2024-05-17  8:23     ` Manolis Tsamis
  2024-05-17  9:22       ` Richard Biener
  0 siblings, 1 reply; 10+ messages in thread
From: Manolis Tsamis @ 2024-05-17  8:23 UTC (permalink / raw)
  To: Richard Biener
  Cc: gcc-patches, Richard Biener, Jiangning Liu, Philipp Tomsich,
	Andrew Pinski

Hi Richard,

While I was re-testing the latest version of this patch I noticed that
it FAILs an AArch64 test, gcc.target/aarch64/subsp.c. With the patch
we generate one instruction more:

        sbfiz   x1, x1, 4, 32
        stp     x29, x30, [sp, -16]!
        add     x1, x1, 16
        mov     x29, sp
        sub     sp, sp, x1
        mov     x0, sp
        bl      foo

Instead of:

        stp     x29, x30, [sp, -16]!
        add     w1, w1, 1
        mov     x29, sp
        sub     sp, sp, w1, sxtw 4
        mov     x0, sp
        bl      foo

I've looked at it but can't really find a way to solve the regression.
Any thoughts on this?

Thanks,
Manolis



On Thu, May 16, 2024 at 11:15 AM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Tue, May 14, 2024 at 10:58 AM Manolis Tsamis <manolis.tsamis@vrull.eu> wrote:
> >
> > New patch with the requested changes can be found below.
> >
> > I don't know how much this affects SCEV, but I do believe that we
> > should incorporate this change somehow. I've seen various cases of
> > suboptimal address calculation codegen that boil down to this.
>
> This misses the ChangeLog (I assume it's unchanged) and indent
> of the match.pd part is now off.
>
> Please fix that, the patch is OK with that change.
>
> Thanks,
> Richard.
>
> > gcc/match.pd | 31 +++++++++++++++++++++++++++++++
> > gcc/testsuite/gcc.dg/pr109393.c | 16 ++++++++++++++++
> > 2 files changed, 47 insertions(+)
> > create mode 100644 gcc/testsuite/gcc.dg/pr109393.c
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 07e743ae464..1d642c205f0 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -3650,6 +3650,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > (plus (convert @0) (op @2 (convert @1))))))
> > #endif
> > +/* ((T)(A + CST1)) * CST2 + CST3
> > + -> ((T)(A) * CST2) + ((T)CST1 * CST2 + CST3)
> > + Where (A + CST1) doesn't need to have a single use. */
> > +#if GIMPLE
> > + (for op (plus minus)
> > + (simplify
> > + (plus (mult:s (convert:s (op @0 INTEGER_CST@1)) INTEGER_CST@2)
> > + INTEGER_CST@3)
> > + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> > + && INTEGRAL_TYPE_P (type)
> > + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> > + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> > + && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
> > + && TYPE_OVERFLOW_WRAPS (type))
> > + (op (mult (convert @0) @2) (plus (mult (convert @1) @2) @3)))))
> > +#endif
> > +
> > +/* ((T)(A + CST1)) * CST2 -> ((T)(A) * CST2) + ((T)CST1 * CST2) */
> > +#if GIMPLE
> > + (for op (plus minus)
> > + (simplify
> > + (mult (convert:s (op:s @0 INTEGER_CST@1)) INTEGER_CST@2)
> > + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> > + && INTEGRAL_TYPE_P (type)
> > + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> > + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> > + && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
> > + && TYPE_OVERFLOW_WRAPS (type))
> > + (op (mult (convert @0) @2) (mult (convert @1) @2)))))
> > +#endif
> > +
> > /* (T)(A) +- (T)(B) -> (T)(A +- B) only when (A +- B) could be simplified
> > to a simple value. */
> > (for op (plus minus)
> > diff --git a/gcc/testsuite/gcc.dg/pr109393.c b/gcc/testsuite/gcc.dg/pr109393.c
> > new file mode 100644
> > index 00000000000..e9051273672
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/pr109393.c
> > @@ -0,0 +1,16 @@
> > +/* PR tree-optimization/109393 */
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -fdump-tree-optimized" } */
> > +/* { dg-final { scan-tree-dump-times "return 1;" 2 "optimized" } } */
> > +
> > +int foo(int *a, int j)
> > +{
> > + int k = j - 1;
> > + return a[j - 1] == a[k];
> > +}
> > +
> > +int bar(int *a, int j)
> > +{
> > + int k = j - 1;
> > + return (&a[j + 1] - 2) == &a[k];
> > +}
> > --
> > 2.44.0
> >
> >
> > On Tue, Apr 23, 2024 at 1:33 PM Manolis Tsamis <manolis.tsamis@vrull.eu> wrote:
> > >
> > > The original motivation for this pattern was that the following function does
> > > not fold to 'return 1':
> > >
> > > int foo(int *a, int j)
> > > {
> > >   int k = j - 1;
> > >   return a[j - 1] == a[k];
> > > }
> > >
> > > The expression ((unsigned long) (X +- C1) * C2) appears frequently as part of
> > > address calculations (e.g. arrays). These patterns help fold and simplify more
> > > expressions.
> > >
> > >         PR tree-optimization/109393
> > >
> > > gcc/ChangeLog:
> > >
> > >         * match.pd: Add new patterns for ((T)(A +- CST1)) * CST2 and
> > >           ((T)(A +- CST1)) * CST2 + CST3.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >         * gcc.dg/pr109393.c: New test.
> > >
> > > Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu>
> > > ---
> > >
> > >  gcc/match.pd                    | 30 ++++++++++++++++++++++++++++++
> > >  gcc/testsuite/gcc.dg/pr109393.c | 16 ++++++++++++++++
> > >  2 files changed, 46 insertions(+)
> > >  create mode 100644 gcc/testsuite/gcc.dg/pr109393.c
> > >
> > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > index d401e7503e6..13c828ba70d 100644
> > > --- a/gcc/match.pd
> > > +++ b/gcc/match.pd
> > > @@ -3650,6 +3650,36 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > >         (plus (convert @0) (op @2 (convert @1))))))
> > >  #endif
> > >
> > > +/* ((T)(A + CST1)) * CST2 + CST3
> > > +     -> ((T)(A) * CST2) + ((T)CST1 * CST2 + CST3)
> > > +   Where (A + CST1) doesn't need to have a single use.  */
> > > +#if GIMPLE
> > > +  (for op (plus minus)
> > > +   (simplify
> > > +    (plus (mult (convert:s (op @0 INTEGER_CST@1)) INTEGER_CST@2) INTEGER_CST@3)
> > > +     (if (TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE
> > > +         && TREE_CODE (type) == INTEGER_TYPE
> > > +         && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> > > +         && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> > > +         && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
> > > +         && TYPE_OVERFLOW_WRAPS (type))
> > > +       (op (mult @2 (convert @0)) (plus (mult @2 (convert @1)) @3)))))
> > > +#endif
> > > +
> > > +/* ((T)(A + CST1)) * CST2 -> ((T)(A) * CST2) + ((T)CST1 * CST2)  */
> > > +#if GIMPLE
> > > +  (for op (plus minus)
> > > +   (simplify
> > > +    (mult (convert:s (op:s @0 INTEGER_CST@1)) INTEGER_CST@2)
> > > +     (if (TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE
> > > +         && TREE_CODE (type) == INTEGER_TYPE
> > > +         && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> > > +         && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> > > +         && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
> > > +         && TYPE_OVERFLOW_WRAPS (type))
> > > +       (op (mult @2 (convert @0)) (mult @2 (convert @1))))))
> > > +#endif
> > > +
> > >  /* (T)(A) +- (T)(B) -> (T)(A +- B) only when (A +- B) could be simplified
> > >     to a simple value.  */
> > >    (for op (plus minus)
> > > diff --git a/gcc/testsuite/gcc.dg/pr109393.c b/gcc/testsuite/gcc.dg/pr109393.c
> > > new file mode 100644
> > > index 00000000000..e9051273672
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/pr109393.c
> > > @@ -0,0 +1,16 @@
> > > +/* PR tree-optimization/109393 */
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O2 -fdump-tree-optimized" } */
> > > +/* { dg-final { scan-tree-dump-times "return 1;" 2 "optimized" } } */
> > > +
> > > +int foo(int *a, int j)
> > > +{
> > > +  int k = j - 1;
> > > +  return a[j - 1] == a[k];
> > > +}
> > > +
> > > +int bar(int *a, int j)
> > > +{
> > > +  int k = j - 1;
> > > +  return (&a[j + 1] - 2) == &a[k];
> > > +}
> > > --
> > > 2.34.1
> > >

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] MATCH: Maybe expand (T)(A + C1) * C2 and (T)(A + C1) * C2 + C3 [PR109393]
  2024-05-17  8:23     ` Manolis Tsamis
@ 2024-05-17  9:22       ` Richard Biener
  2024-05-17 11:35         ` Manolis Tsamis
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Biener @ 2024-05-17  9:22 UTC (permalink / raw)
  To: Manolis Tsamis
  Cc: Richard Biener, gcc-patches, Jiangning Liu, Philipp Tomsich,
	Andrew Pinski

[-- Attachment #1: Type: text/plain, Size: 8704 bytes --]

On Fri, 17 May 2024, Manolis Tsamis wrote:

> Hi Richard,
> 
> While I was re-testing the latest version of this patch I noticed that
> it FAILs an AArch64 test, gcc.target/aarch64/subsp.c. With the patch
> we generate one instruction more:
> 
>         sbfiz   x1, x1, 4, 32
>         stp     x29, x30, [sp, -16]!
>         add     x1, x1, 16
>         mov     x29, sp
>         sub     sp, sp, x1
>         mov     x0, sp
>         bl      foo
> 
> Instead of:
> 
>         stp     x29, x30, [sp, -16]!
>         add     w1, w1, 1
>         mov     x29, sp
>         sub     sp, sp, w1, sxtw 4
>         mov     x0, sp
>         bl      foo
> 
> I've looked at it but can't really find a way to solve the regression.
> Any thoughts on this?

Can you explain what goes wrong?  As I said rewriting parts of
address calculation is tricky, there's always the chance that some
cases regress (see your observation in comment#4 of the PR).

Note that I still believe that avoiding the early and premature
promotion of the addition to unsigned is a good thing.

Note the testcase in the PR is fixed with -fwrapv because then
we do _not_ perform this premature optimization.  Without -fwrapv
the optimization is valid but as you note we do not perform it
consistently - otherwise we wouldn't regress.

Richard.



> Thanks,
> Manolis
> 
> 
> 
> On Thu, May 16, 2024 at 11:15 AM Richard Biener
> <richard.guenther@gmail.com> wrote:
> >
> > On Tue, May 14, 2024 at 10:58 AM Manolis Tsamis <manolis.tsamis@vrull.eu> wrote:
> > >
> > > New patch with the requested changes can be found below.
> > >
> > > I don't know how much this affects SCEV, but I do believe that we
> > > should incorporate this change somehow. I've seen various cases of
> > > suboptimal address calculation codegen that boil down to this.
> >
> > This misses the ChangeLog (I assume it's unchanged) and indent
> > of the match.pd part is now off.
> >
> > Please fix that, the patch is OK with that change.
> >
> > Thanks,
> > Richard.
> >
> > > gcc/match.pd | 31 +++++++++++++++++++++++++++++++
> > > gcc/testsuite/gcc.dg/pr109393.c | 16 ++++++++++++++++
> > > 2 files changed, 47 insertions(+)
> > > create mode 100644 gcc/testsuite/gcc.dg/pr109393.c
> > >
> > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > index 07e743ae464..1d642c205f0 100644
> > > --- a/gcc/match.pd
> > > +++ b/gcc/match.pd
> > > @@ -3650,6 +3650,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > > (plus (convert @0) (op @2 (convert @1))))))
> > > #endif
> > > +/* ((T)(A + CST1)) * CST2 + CST3
> > > + -> ((T)(A) * CST2) + ((T)CST1 * CST2 + CST3)
> > > + Where (A + CST1) doesn't need to have a single use. */
> > > +#if GIMPLE
> > > + (for op (plus minus)
> > > + (simplify
> > > + (plus (mult:s (convert:s (op @0 INTEGER_CST@1)) INTEGER_CST@2)
> > > + INTEGER_CST@3)
> > > + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> > > + && INTEGRAL_TYPE_P (type)
> > > + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> > > + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> > > + && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
> > > + && TYPE_OVERFLOW_WRAPS (type))
> > > + (op (mult (convert @0) @2) (plus (mult (convert @1) @2) @3)))))
> > > +#endif
> > > +
> > > +/* ((T)(A + CST1)) * CST2 -> ((T)(A) * CST2) + ((T)CST1 * CST2) */
> > > +#if GIMPLE
> > > + (for op (plus minus)
> > > + (simplify
> > > + (mult (convert:s (op:s @0 INTEGER_CST@1)) INTEGER_CST@2)
> > > + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> > > + && INTEGRAL_TYPE_P (type)
> > > + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> > > + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> > > + && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
> > > + && TYPE_OVERFLOW_WRAPS (type))
> > > + (op (mult (convert @0) @2) (mult (convert @1) @2)))))
> > > +#endif
> > > +
> > > /* (T)(A) +- (T)(B) -> (T)(A +- B) only when (A +- B) could be simplified
> > > to a simple value. */
> > > (for op (plus minus)
> > > diff --git a/gcc/testsuite/gcc.dg/pr109393.c b/gcc/testsuite/gcc.dg/pr109393.c
> > > new file mode 100644
> > > index 00000000000..e9051273672
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/pr109393.c
> > > @@ -0,0 +1,16 @@
> > > +/* PR tree-optimization/109393 */
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O2 -fdump-tree-optimized" } */
> > > +/* { dg-final { scan-tree-dump-times "return 1;" 2 "optimized" } } */
> > > +
> > > +int foo(int *a, int j)
> > > +{
> > > + int k = j - 1;
> > > + return a[j - 1] == a[k];
> > > +}
> > > +
> > > +int bar(int *a, int j)
> > > +{
> > > + int k = j - 1;
> > > + return (&a[j + 1] - 2) == &a[k];
> > > +}
> > > --
> > > 2.44.0
> > >
> > >
> > > On Tue, Apr 23, 2024 at 1:33 PM Manolis Tsamis <manolis.tsamis@vrull.eu> wrote:
> > > >
> > > > The original motivation for this pattern was that the following function does
> > > > not fold to 'return 1':
> > > >
> > > > int foo(int *a, int j)
> > > > {
> > > >   int k = j - 1;
> > > >   return a[j - 1] == a[k];
> > > > }
> > > >
> > > > The expression ((unsigned long) (X +- C1) * C2) appears frequently as part of
> > > > address calculations (e.g. arrays). These patterns help fold and simplify more
> > > > expressions.
> > > >
> > > >         PR tree-optimization/109393
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > >         * match.pd: Add new patterns for ((T)(A +- CST1)) * CST2 and
> > > >           ((T)(A +- CST1)) * CST2 + CST3.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > >         * gcc.dg/pr109393.c: New test.
> > > >
> > > > Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu>
> > > > ---
> > > >
> > > >  gcc/match.pd                    | 30 ++++++++++++++++++++++++++++++
> > > >  gcc/testsuite/gcc.dg/pr109393.c | 16 ++++++++++++++++
> > > >  2 files changed, 46 insertions(+)
> > > >  create mode 100644 gcc/testsuite/gcc.dg/pr109393.c
> > > >
> > > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > > index d401e7503e6..13c828ba70d 100644
> > > > --- a/gcc/match.pd
> > > > +++ b/gcc/match.pd
> > > > @@ -3650,6 +3650,36 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > > >         (plus (convert @0) (op @2 (convert @1))))))
> > > >  #endif
> > > >
> > > > +/* ((T)(A + CST1)) * CST2 + CST3
> > > > +     -> ((T)(A) * CST2) + ((T)CST1 * CST2 + CST3)
> > > > +   Where (A + CST1) doesn't need to have a single use.  */
> > > > +#if GIMPLE
> > > > +  (for op (plus minus)
> > > > +   (simplify
> > > > +    (plus (mult (convert:s (op @0 INTEGER_CST@1)) INTEGER_CST@2) INTEGER_CST@3)
> > > > +     (if (TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE
> > > > +         && TREE_CODE (type) == INTEGER_TYPE
> > > > +         && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> > > > +         && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> > > > +         && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
> > > > +         && TYPE_OVERFLOW_WRAPS (type))
> > > > +       (op (mult @2 (convert @0)) (plus (mult @2 (convert @1)) @3)))))
> > > > +#endif
> > > > +
> > > > +/* ((T)(A + CST1)) * CST2 -> ((T)(A) * CST2) + ((T)CST1 * CST2)  */
> > > > +#if GIMPLE
> > > > +  (for op (plus minus)
> > > > +   (simplify
> > > > +    (mult (convert:s (op:s @0 INTEGER_CST@1)) INTEGER_CST@2)
> > > > +     (if (TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE
> > > > +         && TREE_CODE (type) == INTEGER_TYPE
> > > > +         && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> > > > +         && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> > > > +         && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
> > > > +         && TYPE_OVERFLOW_WRAPS (type))
> > > > +       (op (mult @2 (convert @0)) (mult @2 (convert @1))))))
> > > > +#endif
> > > > +
> > > >  /* (T)(A) +- (T)(B) -> (T)(A +- B) only when (A +- B) could be simplified
> > > >     to a simple value.  */
> > > >    (for op (plus minus)
> > > > diff --git a/gcc/testsuite/gcc.dg/pr109393.c b/gcc/testsuite/gcc.dg/pr109393.c
> > > > new file mode 100644
> > > > index 00000000000..e9051273672
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/pr109393.c
> > > > @@ -0,0 +1,16 @@
> > > > +/* PR tree-optimization/109393 */
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-O2 -fdump-tree-optimized" } */
> > > > +/* { dg-final { scan-tree-dump-times "return 1;" 2 "optimized" } } */
> > > > +
> > > > +int foo(int *a, int j)
> > > > +{
> > > > +  int k = j - 1;
> > > > +  return a[j - 1] == a[k];
> > > > +}
> > > > +
> > > > +int bar(int *a, int j)
> > > > +{
> > > > +  int k = j - 1;
> > > > +  return (&a[j + 1] - 2) == &a[k];
> > > > +}
> > > > --
> > > > 2.34.1
> > > >
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] MATCH: Maybe expand (T)(A + C1) * C2 and (T)(A + C1) * C2 + C3 [PR109393]
  2024-05-17  9:22       ` Richard Biener
@ 2024-05-17 11:35         ` Manolis Tsamis
  2024-05-17 11:41           ` Richard Biener
  0 siblings, 1 reply; 10+ messages in thread
From: Manolis Tsamis @ 2024-05-17 11:35 UTC (permalink / raw)
  To: Richard Biener
  Cc: Richard Biener, gcc-patches, Jiangning Liu, Philipp Tomsich,
	Andrew Pinski

On Fri, May 17, 2024 at 12:22 PM Richard Biener <rguenther@suse.de> wrote:
>
> On Fri, 17 May 2024, Manolis Tsamis wrote:
>
> > Hi Richard,
> >
> > While I was re-testing the latest version of this patch I noticed that
> > it FAILs an AArch64 test, gcc.target/aarch64/subsp.c. With the patch
> > we generate one instruction more:
> >
> >         sbfiz   x1, x1, 4, 32
> >         stp     x29, x30, [sp, -16]!
> >         add     x1, x1, 16
> >         mov     x29, sp
> >         sub     sp, sp, x1
> >         mov     x0, sp
> >         bl      foo
> >
> > Instead of:
> >
> >         stp     x29, x30, [sp, -16]!
> >         add     w1, w1, 1
> >         mov     x29, sp
> >         sub     sp, sp, w1, sxtw 4
> >         mov     x0, sp
> >         bl      foo
> >
> > I've looked at it but can't really find a way to solve the regression.
> > Any thoughts on this?
>
> Can you explain what goes wrong?  As I said rewriting parts of
> address calculation is tricky, there's always the chance that some
> cases regress (see your observation in comment#4 of the PR).
>

In this case the int -> sizetype cast ends up happening earlier. Instead of

  _7 = y_6(D) + 1;
  _1 = (sizetype) _7;
  _2 = _1 * 16;

We get

  _13 = (sizetype) y_6(D);
  _15 = _13 + 1;
  _2 = _15 * 16;

and then in RTL we have

x1 = ((sizetype) x1) << 4
sp = sp - (x1 + 16)

instead of

x1 = x1 + 1
sp = sp - ((sizetype) x1) << 4

which doesn't form sub sp, sp, w1, sxtw 4.

But more importantly, I realized that (in this case among others) the
pattern is undone by (A * C) +- (B * C) -> (A+-B) * C and (A * C) +- A
-> A * (C+-1). AFAIK having one pattern and its reverse is a bad thing
so something needs to be changed.
One idea could be to only keep the larger one ((T)(A + CST1)) * CST2 +
CST3 -> ((T)(A) * CST2) + ((T)CST1 * CST2 + CST3). it's not enough to
deal with the testcases of the ticket but it does help in other cases.

Manolis

> Note that I still believe that avoiding the early and premature
> promotion of the addition to unsigned is a good thing.
>
> Note the testcase in the PR is fixed with -fwrapv because then
> we do _not_ perform this premature optimization.  Without -fwrapv
> the optimization is valid but as you note we do not perform it
> consistently - otherwise we wouldn't regress.
>
> Richard.
>
>
>
> > Thanks,
> > Manolis
> >
> >
> >
> > On Thu, May 16, 2024 at 11:15 AM Richard Biener
> > <richard.guenther@gmail.com> wrote:
> > >
> > > On Tue, May 14, 2024 at 10:58 AM Manolis Tsamis <manolis.tsamis@vrull.eu> wrote:
> > > >
> > > > New patch with the requested changes can be found below.
> > > >
> > > > I don't know how much this affects SCEV, but I do believe that we
> > > > should incorporate this change somehow. I've seen various cases of
> > > > suboptimal address calculation codegen that boil down to this.
> > >
> > > This misses the ChangeLog (I assume it's unchanged) and indent
> > > of the match.pd part is now off.
> > >
> > > Please fix that, the patch is OK with that change.
> > >
> > > Thanks,
> > > Richard.
> > >
> > > > gcc/match.pd | 31 +++++++++++++++++++++++++++++++
> > > > gcc/testsuite/gcc.dg/pr109393.c | 16 ++++++++++++++++
> > > > 2 files changed, 47 insertions(+)
> > > > create mode 100644 gcc/testsuite/gcc.dg/pr109393.c
> > > >
> > > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > > index 07e743ae464..1d642c205f0 100644
> > > > --- a/gcc/match.pd
> > > > +++ b/gcc/match.pd
> > > > @@ -3650,6 +3650,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > > > (plus (convert @0) (op @2 (convert @1))))))
> > > > #endif
> > > > +/* ((T)(A + CST1)) * CST2 + CST3
> > > > + -> ((T)(A) * CST2) + ((T)CST1 * CST2 + CST3)
> > > > + Where (A + CST1) doesn't need to have a single use. */
> > > > +#if GIMPLE
> > > > + (for op (plus minus)
> > > > + (simplify
> > > > + (plus (mult:s (convert:s (op @0 INTEGER_CST@1)) INTEGER_CST@2)
> > > > + INTEGER_CST@3)
> > > > + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> > > > + && INTEGRAL_TYPE_P (type)
> > > > + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> > > > + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> > > > + && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
> > > > + && TYPE_OVERFLOW_WRAPS (type))
> > > > + (op (mult (convert @0) @2) (plus (mult (convert @1) @2) @3)))))
> > > > +#endif
> > > > +
> > > > +/* ((T)(A + CST1)) * CST2 -> ((T)(A) * CST2) + ((T)CST1 * CST2) */
> > > > +#if GIMPLE
> > > > + (for op (plus minus)
> > > > + (simplify
> > > > + (mult (convert:s (op:s @0 INTEGER_CST@1)) INTEGER_CST@2)
> > > > + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> > > > + && INTEGRAL_TYPE_P (type)
> > > > + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> > > > + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> > > > + && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
> > > > + && TYPE_OVERFLOW_WRAPS (type))
> > > > + (op (mult (convert @0) @2) (mult (convert @1) @2)))))
> > > > +#endif
> > > > +
> > > > /* (T)(A) +- (T)(B) -> (T)(A +- B) only when (A +- B) could be simplified
> > > > to a simple value. */
> > > > (for op (plus minus)
> > > > diff --git a/gcc/testsuite/gcc.dg/pr109393.c b/gcc/testsuite/gcc.dg/pr109393.c
> > > > new file mode 100644
> > > > index 00000000000..e9051273672
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/pr109393.c
> > > > @@ -0,0 +1,16 @@
> > > > +/* PR tree-optimization/109393 */
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-O2 -fdump-tree-optimized" } */
> > > > +/* { dg-final { scan-tree-dump-times "return 1;" 2 "optimized" } } */
> > > > +
> > > > +int foo(int *a, int j)
> > > > +{
> > > > + int k = j - 1;
> > > > + return a[j - 1] == a[k];
> > > > +}
> > > > +
> > > > +int bar(int *a, int j)
> > > > +{
> > > > + int k = j - 1;
> > > > + return (&a[j + 1] - 2) == &a[k];
> > > > +}
> > > > --
> > > > 2.44.0
> > > >
> > > >
> > > > On Tue, Apr 23, 2024 at 1:33 PM Manolis Tsamis <manolis.tsamis@vrull.eu> wrote:
> > > > >
> > > > > The original motivation for this pattern was that the following function does
> > > > > not fold to 'return 1':
> > > > >
> > > > > int foo(int *a, int j)
> > > > > {
> > > > >   int k = j - 1;
> > > > >   return a[j - 1] == a[k];
> > > > > }
> > > > >
> > > > > The expression ((unsigned long) (X +- C1) * C2) appears frequently as part of
> > > > > address calculations (e.g. arrays). These patterns help fold and simplify more
> > > > > expressions.
> > > > >
> > > > >         PR tree-optimization/109393
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > >         * match.pd: Add new patterns for ((T)(A +- CST1)) * CST2 and
> > > > >           ((T)(A +- CST1)) * CST2 + CST3.
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > >
> > > > >         * gcc.dg/pr109393.c: New test.
> > > > >
> > > > > Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu>
> > > > > ---
> > > > >
> > > > >  gcc/match.pd                    | 30 ++++++++++++++++++++++++++++++
> > > > >  gcc/testsuite/gcc.dg/pr109393.c | 16 ++++++++++++++++
> > > > >  2 files changed, 46 insertions(+)
> > > > >  create mode 100644 gcc/testsuite/gcc.dg/pr109393.c
> > > > >
> > > > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > > > index d401e7503e6..13c828ba70d 100644
> > > > > --- a/gcc/match.pd
> > > > > +++ b/gcc/match.pd
> > > > > @@ -3650,6 +3650,36 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > > > >         (plus (convert @0) (op @2 (convert @1))))))
> > > > >  #endif
> > > > >
> > > > > +/* ((T)(A + CST1)) * CST2 + CST3
> > > > > +     -> ((T)(A) * CST2) + ((T)CST1 * CST2 + CST3)
> > > > > +   Where (A + CST1) doesn't need to have a single use.  */
> > > > > +#if GIMPLE
> > > > > +  (for op (plus minus)
> > > > > +   (simplify
> > > > > +    (plus (mult (convert:s (op @0 INTEGER_CST@1)) INTEGER_CST@2) INTEGER_CST@3)
> > > > > +     (if (TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE
> > > > > +         && TREE_CODE (type) == INTEGER_TYPE
> > > > > +         && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> > > > > +         && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> > > > > +         && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
> > > > > +         && TYPE_OVERFLOW_WRAPS (type))
> > > > > +       (op (mult @2 (convert @0)) (plus (mult @2 (convert @1)) @3)))))
> > > > > +#endif
> > > > > +
> > > > > +/* ((T)(A + CST1)) * CST2 -> ((T)(A) * CST2) + ((T)CST1 * CST2)  */
> > > > > +#if GIMPLE
> > > > > +  (for op (plus minus)
> > > > > +   (simplify
> > > > > +    (mult (convert:s (op:s @0 INTEGER_CST@1)) INTEGER_CST@2)
> > > > > +     (if (TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE
> > > > > +         && TREE_CODE (type) == INTEGER_TYPE
> > > > > +         && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> > > > > +         && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> > > > > +         && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
> > > > > +         && TYPE_OVERFLOW_WRAPS (type))
> > > > > +       (op (mult @2 (convert @0)) (mult @2 (convert @1))))))
> > > > > +#endif
> > > > > +
> > > > >  /* (T)(A) +- (T)(B) -> (T)(A +- B) only when (A +- B) could be simplified
> > > > >     to a simple value.  */
> > > > >    (for op (plus minus)
> > > > > diff --git a/gcc/testsuite/gcc.dg/pr109393.c b/gcc/testsuite/gcc.dg/pr109393.c
> > > > > new file mode 100644
> > > > > index 00000000000..e9051273672
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/pr109393.c
> > > > > @@ -0,0 +1,16 @@
> > > > > +/* PR tree-optimization/109393 */
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-options "-O2 -fdump-tree-optimized" } */
> > > > > +/* { dg-final { scan-tree-dump-times "return 1;" 2 "optimized" } } */
> > > > > +
> > > > > +int foo(int *a, int j)
> > > > > +{
> > > > > +  int k = j - 1;
> > > > > +  return a[j - 1] == a[k];
> > > > > +}
> > > > > +
> > > > > +int bar(int *a, int j)
> > > > > +{
> > > > > +  int k = j - 1;
> > > > > +  return (&a[j + 1] - 2) == &a[k];
> > > > > +}
> > > > > --
> > > > > 2.34.1
> > > > >
> >
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] MATCH: Maybe expand (T)(A + C1) * C2 and (T)(A + C1) * C2 + C3 [PR109393]
  2024-05-17 11:35         ` Manolis Tsamis
@ 2024-05-17 11:41           ` Richard Biener
  0 siblings, 0 replies; 10+ messages in thread
From: Richard Biener @ 2024-05-17 11:41 UTC (permalink / raw)
  To: Manolis Tsamis
  Cc: Richard Biener, gcc-patches, Jiangning Liu, Philipp Tomsich,
	Andrew Pinski

[-- Attachment #1: Type: text/plain, Size: 11496 bytes --]

On Fri, 17 May 2024, Manolis Tsamis wrote:

> On Fri, May 17, 2024 at 12:22 PM Richard Biener <rguenther@suse.de> wrote:
> >
> > On Fri, 17 May 2024, Manolis Tsamis wrote:
> >
> > > Hi Richard,
> > >
> > > While I was re-testing the latest version of this patch I noticed that
> > > it FAILs an AArch64 test, gcc.target/aarch64/subsp.c. With the patch
> > > we generate one instruction more:
> > >
> > >         sbfiz   x1, x1, 4, 32
> > >         stp     x29, x30, [sp, -16]!
> > >         add     x1, x1, 16
> > >         mov     x29, sp
> > >         sub     sp, sp, x1
> > >         mov     x0, sp
> > >         bl      foo
> > >
> > > Instead of:
> > >
> > >         stp     x29, x30, [sp, -16]!
> > >         add     w1, w1, 1
> > >         mov     x29, sp
> > >         sub     sp, sp, w1, sxtw 4
> > >         mov     x0, sp
> > >         bl      foo
> > >
> > > I've looked at it but can't really find a way to solve the regression.
> > > Any thoughts on this?
> >
> > Can you explain what goes wrong?  As I said rewriting parts of
> > address calculation is tricky, there's always the chance that some
> > cases regress (see your observation in comment#4 of the PR).
> >
> 
> In this case the int -> sizetype cast ends up happening earlier. Instead of
> 
>   _7 = y_6(D) + 1;
>   _1 = (sizetype) _7;
>   _2 = _1 * 16;
> 
> We get
> 
>   _13 = (sizetype) y_6(D);
>   _15 = _13 + 1;
>   _2 = _15 * 16;
> 
> and then in RTL we have
> 
> x1 = ((sizetype) x1) << 4
> sp = sp - (x1 + 16)
> 
> instead of
> 
> x1 = x1 + 1
> sp = sp - ((sizetype) x1) << 4
> 
> which doesn't form sub sp, sp, w1, sxtw 4.
> 
> But more importantly, I realized that (in this case among others) the
> pattern is undone by (A * C) +- (B * C) -> (A+-B) * C and (A * C) +- A
> -> A * (C+-1). AFAIK having one pattern and its reverse is a bad thing
> so something needs to be changed.

Yes, we have that issue.  And we've guarded GIMPLE vs. non-GIMPLE and
have recursion limits in match to deal with this.  But yes, having
both is bad.  I'd say that clearly patterns reducing the number of
operations are good at least for canonicalization.

> One idea could be to only keep the larger one ((T)(A + CST1)) * CST2 +
> CST3 -> ((T)(A) * CST2) + ((T)CST1 * CST2 + CST3). it's not enough to
> deal with the testcases of the ticket but it does help in other cases.

The issue with such larger patterns is that they hint at the fact
the transform should happen with an eye on more than just the
small expresion.  Thus not in match.pd but in a pass like reassoc
or SLSR or IVOPTs or even CSE itself.  We also have to avoid
doing changes that cannot be undone when canonicalizing.

Richard.

> Manolis
> 
> > Note that I still believe that avoiding the early and premature
> > promotion of the addition to unsigned is a good thing.
> >
> > Note the testcase in the PR is fixed with -fwrapv because then
> > we do _not_ perform this premature optimization.  Without -fwrapv
> > the optimization is valid but as you note we do not perform it
> > consistently - otherwise we wouldn't regress.
> >
> > Richard.
> >
> >
> >
> > > Thanks,
> > > Manolis
> > >
> > >
> > >
> > > On Thu, May 16, 2024 at 11:15 AM Richard Biener
> > > <richard.guenther@gmail.com> wrote:
> > > >
> > > > On Tue, May 14, 2024 at 10:58 AM Manolis Tsamis <manolis.tsamis@vrull.eu> wrote:
> > > > >
> > > > > New patch with the requested changes can be found below.
> > > > >
> > > > > I don't know how much this affects SCEV, but I do believe that we
> > > > > should incorporate this change somehow. I've seen various cases of
> > > > > suboptimal address calculation codegen that boil down to this.
> > > >
> > > > This misses the ChangeLog (I assume it's unchanged) and indent
> > > > of the match.pd part is now off.
> > > >
> > > > Please fix that, the patch is OK with that change.
> > > >
> > > > Thanks,
> > > > Richard.
> > > >
> > > > > gcc/match.pd | 31 +++++++++++++++++++++++++++++++
> > > > > gcc/testsuite/gcc.dg/pr109393.c | 16 ++++++++++++++++
> > > > > 2 files changed, 47 insertions(+)
> > > > > create mode 100644 gcc/testsuite/gcc.dg/pr109393.c
> > > > >
> > > > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > > > index 07e743ae464..1d642c205f0 100644
> > > > > --- a/gcc/match.pd
> > > > > +++ b/gcc/match.pd
> > > > > @@ -3650,6 +3650,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > > > > (plus (convert @0) (op @2 (convert @1))))))
> > > > > #endif
> > > > > +/* ((T)(A + CST1)) * CST2 + CST3
> > > > > + -> ((T)(A) * CST2) + ((T)CST1 * CST2 + CST3)
> > > > > + Where (A + CST1) doesn't need to have a single use. */
> > > > > +#if GIMPLE
> > > > > + (for op (plus minus)
> > > > > + (simplify
> > > > > + (plus (mult:s (convert:s (op @0 INTEGER_CST@1)) INTEGER_CST@2)
> > > > > + INTEGER_CST@3)
> > > > > + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> > > > > + && INTEGRAL_TYPE_P (type)
> > > > > + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> > > > > + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> > > > > + && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
> > > > > + && TYPE_OVERFLOW_WRAPS (type))
> > > > > + (op (mult (convert @0) @2) (plus (mult (convert @1) @2) @3)))))
> > > > > +#endif
> > > > > +
> > > > > +/* ((T)(A + CST1)) * CST2 -> ((T)(A) * CST2) + ((T)CST1 * CST2) */
> > > > > +#if GIMPLE
> > > > > + (for op (plus minus)
> > > > > + (simplify
> > > > > + (mult (convert:s (op:s @0 INTEGER_CST@1)) INTEGER_CST@2)
> > > > > + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> > > > > + && INTEGRAL_TYPE_P (type)
> > > > > + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> > > > > + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> > > > > + && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
> > > > > + && TYPE_OVERFLOW_WRAPS (type))
> > > > > + (op (mult (convert @0) @2) (mult (convert @1) @2)))))
> > > > > +#endif
> > > > > +
> > > > > /* (T)(A) +- (T)(B) -> (T)(A +- B) only when (A +- B) could be simplified
> > > > > to a simple value. */
> > > > > (for op (plus minus)
> > > > > diff --git a/gcc/testsuite/gcc.dg/pr109393.c b/gcc/testsuite/gcc.dg/pr109393.c
> > > > > new file mode 100644
> > > > > index 00000000000..e9051273672
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/pr109393.c
> > > > > @@ -0,0 +1,16 @@
> > > > > +/* PR tree-optimization/109393 */
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-options "-O2 -fdump-tree-optimized" } */
> > > > > +/* { dg-final { scan-tree-dump-times "return 1;" 2 "optimized" } } */
> > > > > +
> > > > > +int foo(int *a, int j)
> > > > > +{
> > > > > + int k = j - 1;
> > > > > + return a[j - 1] == a[k];
> > > > > +}
> > > > > +
> > > > > +int bar(int *a, int j)
> > > > > +{
> > > > > + int k = j - 1;
> > > > > + return (&a[j + 1] - 2) == &a[k];
> > > > > +}
> > > > > --
> > > > > 2.44.0
> > > > >
> > > > >
> > > > > On Tue, Apr 23, 2024 at 1:33 PM Manolis Tsamis <manolis.tsamis@vrull.eu> wrote:
> > > > > >
> > > > > > The original motivation for this pattern was that the following function does
> > > > > > not fold to 'return 1':
> > > > > >
> > > > > > int foo(int *a, int j)
> > > > > > {
> > > > > >   int k = j - 1;
> > > > > >   return a[j - 1] == a[k];
> > > > > > }
> > > > > >
> > > > > > The expression ((unsigned long) (X +- C1) * C2) appears frequently as part of
> > > > > > address calculations (e.g. arrays). These patterns help fold and simplify more
> > > > > > expressions.
> > > > > >
> > > > > >         PR tree-optimization/109393
> > > > > >
> > > > > > gcc/ChangeLog:
> > > > > >
> > > > > >         * match.pd: Add new patterns for ((T)(A +- CST1)) * CST2 and
> > > > > >           ((T)(A +- CST1)) * CST2 + CST3.
> > > > > >
> > > > > > gcc/testsuite/ChangeLog:
> > > > > >
> > > > > >         * gcc.dg/pr109393.c: New test.
> > > > > >
> > > > > > Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu>
> > > > > > ---
> > > > > >
> > > > > >  gcc/match.pd                    | 30 ++++++++++++++++++++++++++++++
> > > > > >  gcc/testsuite/gcc.dg/pr109393.c | 16 ++++++++++++++++
> > > > > >  2 files changed, 46 insertions(+)
> > > > > >  create mode 100644 gcc/testsuite/gcc.dg/pr109393.c
> > > > > >
> > > > > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > > > > index d401e7503e6..13c828ba70d 100644
> > > > > > --- a/gcc/match.pd
> > > > > > +++ b/gcc/match.pd
> > > > > > @@ -3650,6 +3650,36 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > > > > >         (plus (convert @0) (op @2 (convert @1))))))
> > > > > >  #endif
> > > > > >
> > > > > > +/* ((T)(A + CST1)) * CST2 + CST3
> > > > > > +     -> ((T)(A) * CST2) + ((T)CST1 * CST2 + CST3)
> > > > > > +   Where (A + CST1) doesn't need to have a single use.  */
> > > > > > +#if GIMPLE
> > > > > > +  (for op (plus minus)
> > > > > > +   (simplify
> > > > > > +    (plus (mult (convert:s (op @0 INTEGER_CST@1)) INTEGER_CST@2) INTEGER_CST@3)
> > > > > > +     (if (TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE
> > > > > > +         && TREE_CODE (type) == INTEGER_TYPE
> > > > > > +         && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> > > > > > +         && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> > > > > > +         && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
> > > > > > +         && TYPE_OVERFLOW_WRAPS (type))
> > > > > > +       (op (mult @2 (convert @0)) (plus (mult @2 (convert @1)) @3)))))
> > > > > > +#endif
> > > > > > +
> > > > > > +/* ((T)(A + CST1)) * CST2 -> ((T)(A) * CST2) + ((T)CST1 * CST2)  */
> > > > > > +#if GIMPLE
> > > > > > +  (for op (plus minus)
> > > > > > +   (simplify
> > > > > > +    (mult (convert:s (op:s @0 INTEGER_CST@1)) INTEGER_CST@2)
> > > > > > +     (if (TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE
> > > > > > +         && TREE_CODE (type) == INTEGER_TYPE
> > > > > > +         && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> > > > > > +         && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> > > > > > +         && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
> > > > > > +         && TYPE_OVERFLOW_WRAPS (type))
> > > > > > +       (op (mult @2 (convert @0)) (mult @2 (convert @1))))))
> > > > > > +#endif
> > > > > > +
> > > > > >  /* (T)(A) +- (T)(B) -> (T)(A +- B) only when (A +- B) could be simplified
> > > > > >     to a simple value.  */
> > > > > >    (for op (plus minus)
> > > > > > diff --git a/gcc/testsuite/gcc.dg/pr109393.c b/gcc/testsuite/gcc.dg/pr109393.c
> > > > > > new file mode 100644
> > > > > > index 00000000000..e9051273672
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.dg/pr109393.c
> > > > > > @@ -0,0 +1,16 @@
> > > > > > +/* PR tree-optimization/109393 */
> > > > > > +/* { dg-do compile } */
> > > > > > +/* { dg-options "-O2 -fdump-tree-optimized" } */
> > > > > > +/* { dg-final { scan-tree-dump-times "return 1;" 2 "optimized" } } */
> > > > > > +
> > > > > > +int foo(int *a, int j)
> > > > > > +{
> > > > > > +  int k = j - 1;
> > > > > > +  return a[j - 1] == a[k];
> > > > > > +}
> > > > > > +
> > > > > > +int bar(int *a, int j)
> > > > > > +{
> > > > > > +  int k = j - 1;
> > > > > > +  return (&a[j + 1] - 2) == &a[k];
> > > > > > +}
> > > > > > --
> > > > > > 2.34.1
> > > > > >
> > >
> >
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2024-05-17 11:41 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-23 10:33 [PATCH] MATCH: Maybe expand (T)(A + C1) * C2 and (T)(A + C1) * C2 + C3 [PR109393] Manolis Tsamis
2024-05-02 13:00 ` Richard Biener
2024-05-02 13:08   ` Manolis Tsamis
2024-05-02 13:27     ` Richard Biener
2024-05-14  8:57 ` Manolis Tsamis
2024-05-16  8:15   ` Richard Biener
2024-05-17  8:23     ` Manolis Tsamis
2024-05-17  9:22       ` Richard Biener
2024-05-17 11:35         ` Manolis Tsamis
2024-05-17 11:41           ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).