* [PATCH] MATCH: Maybe expand (T)(A + C1) * C2 and (T)(A + C1) * C2 + C3 [PR109393] @ 2024-04-23 10:33 Manolis Tsamis 2024-05-02 13:00 ` Richard Biener 2024-05-14 8:57 ` Manolis Tsamis 0 siblings, 2 replies; 10+ messages in thread From: Manolis Tsamis @ 2024-04-23 10:33 UTC (permalink / raw) To: gcc-patches Cc: Richard Biener, Jiangning Liu, Philipp Tomsich, Andrew Pinski, Manolis Tsamis The original motivation for this pattern was that the following function does not fold to 'return 1': int foo(int *a, int j) { int k = j - 1; return a[j - 1] == a[k]; } The expression ((unsigned long) (X +- C1) * C2) appears frequently as part of address calculations (e.g. arrays). These patterns help fold and simplify more expressions. PR tree-optimization/109393 gcc/ChangeLog: * match.pd: Add new patterns for ((T)(A +- CST1)) * CST2 and ((T)(A +- CST1)) * CST2 + CST3. gcc/testsuite/ChangeLog: * gcc.dg/pr109393.c: New test. Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu> --- gcc/match.pd | 30 ++++++++++++++++++++++++++++++ gcc/testsuite/gcc.dg/pr109393.c | 16 ++++++++++++++++ 2 files changed, 46 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/pr109393.c diff --git a/gcc/match.pd b/gcc/match.pd index d401e7503e6..13c828ba70d 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -3650,6 +3650,36 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (plus (convert @0) (op @2 (convert @1)))))) #endif +/* ((T)(A + CST1)) * CST2 + CST3 + -> ((T)(A) * CST2) + ((T)CST1 * CST2 + CST3) + Where (A + CST1) doesn't need to have a single use. */ +#if GIMPLE + (for op (plus minus) + (simplify + (plus (mult (convert:s (op @0 INTEGER_CST@1)) INTEGER_CST@2) INTEGER_CST@3) + (if (TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE + && TREE_CODE (type) == INTEGER_TYPE + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)) + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)) + && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0)) + && TYPE_OVERFLOW_WRAPS (type)) + (op (mult @2 (convert @0)) (plus (mult @2 (convert @1)) @3))))) +#endif + +/* ((T)(A + CST1)) * CST2 -> ((T)(A) * CST2) + ((T)CST1 * CST2) */ +#if GIMPLE + (for op (plus minus) + (simplify + (mult (convert:s (op:s @0 INTEGER_CST@1)) INTEGER_CST@2) + (if (TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE + && TREE_CODE (type) == INTEGER_TYPE + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)) + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)) + && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0)) + && TYPE_OVERFLOW_WRAPS (type)) + (op (mult @2 (convert @0)) (mult @2 (convert @1)))))) +#endif + /* (T)(A) +- (T)(B) -> (T)(A +- B) only when (A +- B) could be simplified to a simple value. */ (for op (plus minus) diff --git a/gcc/testsuite/gcc.dg/pr109393.c b/gcc/testsuite/gcc.dg/pr109393.c new file mode 100644 index 00000000000..e9051273672 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr109393.c @@ -0,0 +1,16 @@ +/* PR tree-optimization/109393 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ +/* { dg-final { scan-tree-dump-times "return 1;" 2 "optimized" } } */ + +int foo(int *a, int j) +{ + int k = j - 1; + return a[j - 1] == a[k]; +} + +int bar(int *a, int j) +{ + int k = j - 1; + return (&a[j + 1] - 2) == &a[k]; +} -- 2.34.1 ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] MATCH: Maybe expand (T)(A + C1) * C2 and (T)(A + C1) * C2 + C3 [PR109393] 2024-04-23 10:33 [PATCH] MATCH: Maybe expand (T)(A + C1) * C2 and (T)(A + C1) * C2 + C3 [PR109393] Manolis Tsamis @ 2024-05-02 13:00 ` Richard Biener 2024-05-02 13:08 ` Manolis Tsamis 2024-05-14 8:57 ` Manolis Tsamis 1 sibling, 1 reply; 10+ messages in thread From: Richard Biener @ 2024-05-02 13:00 UTC (permalink / raw) To: Manolis Tsamis; +Cc: gcc-patches, Jiangning Liu, Philipp Tomsich, Andrew Pinski On Tue, 23 Apr 2024, Manolis Tsamis wrote: > The original motivation for this pattern was that the following function does > not fold to 'return 1': > > int foo(int *a, int j) > { > int k = j - 1; > return a[j - 1] == a[k]; > } > > The expression ((unsigned long) (X +- C1) * C2) appears frequently as part of > address calculations (e.g. arrays). These patterns help fold and simplify more > expressions. > > PR tree-optimization/109393 > > gcc/ChangeLog: > > * match.pd: Add new patterns for ((T)(A +- CST1)) * CST2 and > ((T)(A +- CST1)) * CST2 + CST3. > > gcc/testsuite/ChangeLog: > > * gcc.dg/pr109393.c: New test. > > Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu> > --- > > gcc/match.pd | 30 ++++++++++++++++++++++++++++++ > gcc/testsuite/gcc.dg/pr109393.c | 16 ++++++++++++++++ > 2 files changed, 46 insertions(+) > create mode 100644 gcc/testsuite/gcc.dg/pr109393.c > > diff --git a/gcc/match.pd b/gcc/match.pd > index d401e7503e6..13c828ba70d 100644 > --- a/gcc/match.pd > +++ b/gcc/match.pd > @@ -3650,6 +3650,36 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > (plus (convert @0) (op @2 (convert @1)))))) > #endif > > +/* ((T)(A + CST1)) * CST2 + CST3 > + -> ((T)(A) * CST2) + ((T)CST1 * CST2 + CST3) > + Where (A + CST1) doesn't need to have a single use. */ > +#if GIMPLE > + (for op (plus minus) > + (simplify > + (plus (mult (convert:s (op @0 INTEGER_CST@1)) INTEGER_CST@2) INTEGER_CST@3) > + (if (TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE > + && TREE_CODE (type) == INTEGER_TYPE > + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)) > + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)) > + && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0)) > + && TYPE_OVERFLOW_WRAPS (type)) > + (op (mult @2 (convert @0)) (plus (mult @2 (convert @1)) @3))))) > +#endif > + > +/* ((T)(A + CST1)) * CST2 -> ((T)(A) * CST2) + ((T)CST1 * CST2) */ > +#if GIMPLE > + (for op (plus minus) > + (simplify > + (mult (convert:s (op:s @0 INTEGER_CST@1)) INTEGER_CST@2) > + (if (TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE Please use INTEGRAL_TYPE_P > + && TREE_CODE (type) == INTEGER_TYPE > + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)) > + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)) > + && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0)) > + && TYPE_OVERFLOW_WRAPS (type)) > + (op (mult @2 (convert @0)) (mult @2 (convert @1)))))) (mult @2 (convert @0)) is non-canonical for no good reason if @0 isn't constant - constant should be 2nd, please swap operands here. > +#endif The first pattern is an extension of the second, why's the first necessary at all? The add of CST3 is unchanged (OK, you seem to associate here, but that's again a different thing). I'd say the 2nd pattern is OK with the above changes but the first looks redundant. Thanks, Richard. > + > /* (T)(A) +- (T)(B) -> (T)(A +- B) only when (A +- B) could be simplified > to a simple value. */ > (for op (plus minus) > diff --git a/gcc/testsuite/gcc.dg/pr109393.c b/gcc/testsuite/gcc.dg/pr109393.c > new file mode 100644 > index 00000000000..e9051273672 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/pr109393.c > @@ -0,0 +1,16 @@ > +/* PR tree-optimization/109393 */ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > +/* { dg-final { scan-tree-dump-times "return 1;" 2 "optimized" } } */ > + > +int foo(int *a, int j) > +{ > + int k = j - 1; > + return a[j - 1] == a[k]; > +} > + > +int bar(int *a, int j) > +{ > + int k = j - 1; > + return (&a[j + 1] - 2) == &a[k]; > +} > -- Richard Biener <rguenther@suse.de> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg) ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] MATCH: Maybe expand (T)(A + C1) * C2 and (T)(A + C1) * C2 + C3 [PR109393] 2024-05-02 13:00 ` Richard Biener @ 2024-05-02 13:08 ` Manolis Tsamis 2024-05-02 13:27 ` Richard Biener 0 siblings, 1 reply; 10+ messages in thread From: Manolis Tsamis @ 2024-05-02 13:08 UTC (permalink / raw) To: Richard Biener; +Cc: gcc-patches, Jiangning Liu, Philipp Tomsich, Andrew Pinski On Thu, May 2, 2024 at 4:00 PM Richard Biener <rguenther@suse.de> wrote: > > On Tue, 23 Apr 2024, Manolis Tsamis wrote: > > > The original motivation for this pattern was that the following function does > > not fold to 'return 1': > > > > int foo(int *a, int j) > > { > > int k = j - 1; > > return a[j - 1] == a[k]; > > } > > > > The expression ((unsigned long) (X +- C1) * C2) appears frequently as part of > > address calculations (e.g. arrays). These patterns help fold and simplify more > > expressions. > > > > PR tree-optimization/109393 > > > > gcc/ChangeLog: > > > > * match.pd: Add new patterns for ((T)(A +- CST1)) * CST2 and > > ((T)(A +- CST1)) * CST2 + CST3. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.dg/pr109393.c: New test. > > > > Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu> > > --- > > > > gcc/match.pd | 30 ++++++++++++++++++++++++++++++ > > gcc/testsuite/gcc.dg/pr109393.c | 16 ++++++++++++++++ > > 2 files changed, 46 insertions(+) > > create mode 100644 gcc/testsuite/gcc.dg/pr109393.c > > > > diff --git a/gcc/match.pd b/gcc/match.pd > > index d401e7503e6..13c828ba70d 100644 > > --- a/gcc/match.pd > > +++ b/gcc/match.pd > > @@ -3650,6 +3650,36 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > > (plus (convert @0) (op @2 (convert @1)))))) > > #endif > > > > +/* ((T)(A + CST1)) * CST2 + CST3 > > + -> ((T)(A) * CST2) + ((T)CST1 * CST2 + CST3) > > + Where (A + CST1) doesn't need to have a single use. */ > > +#if GIMPLE > > + (for op (plus minus) > > + (simplify > > + (plus (mult (convert:s (op @0 INTEGER_CST@1)) INTEGER_CST@2) INTEGER_CST@3) > > + (if (TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE > > + && TREE_CODE (type) == INTEGER_TYPE > > + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)) > > + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)) > > + && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0)) > > + && TYPE_OVERFLOW_WRAPS (type)) > > + (op (mult @2 (convert @0)) (plus (mult @2 (convert @1)) @3))))) > > +#endif > > + > > +/* ((T)(A + CST1)) * CST2 -> ((T)(A) * CST2) + ((T)CST1 * CST2) */ > > +#if GIMPLE > > + (for op (plus minus) > > + (simplify > > + (mult (convert:s (op:s @0 INTEGER_CST@1)) INTEGER_CST@2) > > + (if (TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE > > Please use INTEGRAL_TYPE_P > > > + && TREE_CODE (type) == INTEGER_TYPE > > + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)) > > + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)) > > + && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0)) > > + && TYPE_OVERFLOW_WRAPS (type)) > > + (op (mult @2 (convert @0)) (mult @2 (convert @1)))))) > > (mult @2 (convert @0)) is non-canonical for no good reason if @0 > isn't constant - constant should be 2nd, please swap operands here. > > > +#endif > > The first pattern is an extension of the second, why's the first > necessary at all? The add of CST3 is unchanged (OK, you seem to > associate here, but that's again a different thing). > > I'd say the 2nd pattern is OK with the above changes but the first > looks redundant. > Hi Richard, Thanks for the comments, I'll fix these. The difference is that the second uses op:s while the first uses just op. In the second case if A + CST1 has other uses expanding the pattern may not be a good idea but in the first case it always is because we know + CST1 * CST2 will merge with + CST3. Thanks, Manolis > Thanks, > Richard. > > > + > > /* (T)(A) +- (T)(B) -> (T)(A +- B) only when (A +- B) could be simplified > > to a simple value. */ > > (for op (plus minus) > > diff --git a/gcc/testsuite/gcc.dg/pr109393.c b/gcc/testsuite/gcc.dg/pr109393.c > > new file mode 100644 > > index 00000000000..e9051273672 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/pr109393.c > > @@ -0,0 +1,16 @@ > > +/* PR tree-optimization/109393 */ > > +/* { dg-do compile } */ > > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > > +/* { dg-final { scan-tree-dump-times "return 1;" 2 "optimized" } } */ > > + > > +int foo(int *a, int j) > > +{ > > + int k = j - 1; > > + return a[j - 1] == a[k]; > > +} > > + > > +int bar(int *a, int j) > > +{ > > + int k = j - 1; > > + return (&a[j + 1] - 2) == &a[k]; > > +} > > > > -- > Richard Biener <rguenther@suse.de> > SUSE Software Solutions Germany GmbH, > Frankenstrasse 146, 90461 Nuernberg, Germany; > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg) ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] MATCH: Maybe expand (T)(A + C1) * C2 and (T)(A + C1) * C2 + C3 [PR109393] 2024-05-02 13:08 ` Manolis Tsamis @ 2024-05-02 13:27 ` Richard Biener 0 siblings, 0 replies; 10+ messages in thread From: Richard Biener @ 2024-05-02 13:27 UTC (permalink / raw) To: Manolis Tsamis; +Cc: gcc-patches, Jiangning Liu, Philipp Tomsich, Andrew Pinski [-- Attachment #1: Type: text/plain, Size: 5361 bytes --] On Thu, 2 May 2024, Manolis Tsamis wrote: > On Thu, May 2, 2024 at 4:00 PM Richard Biener <rguenther@suse.de> wrote: > > > > On Tue, 23 Apr 2024, Manolis Tsamis wrote: > > > > > The original motivation for this pattern was that the following function does > > > not fold to 'return 1': > > > > > > int foo(int *a, int j) > > > { > > > int k = j - 1; > > > return a[j - 1] == a[k]; > > > } > > > > > > The expression ((unsigned long) (X +- C1) * C2) appears frequently as part of > > > address calculations (e.g. arrays). These patterns help fold and simplify more > > > expressions. > > > > > > PR tree-optimization/109393 > > > > > > gcc/ChangeLog: > > > > > > * match.pd: Add new patterns for ((T)(A +- CST1)) * CST2 and > > > ((T)(A +- CST1)) * CST2 + CST3. > > > > > > gcc/testsuite/ChangeLog: > > > > > > * gcc.dg/pr109393.c: New test. > > > > > > Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu> > > > --- > > > > > > gcc/match.pd | 30 ++++++++++++++++++++++++++++++ > > > gcc/testsuite/gcc.dg/pr109393.c | 16 ++++++++++++++++ > > > 2 files changed, 46 insertions(+) > > > create mode 100644 gcc/testsuite/gcc.dg/pr109393.c > > > > > > diff --git a/gcc/match.pd b/gcc/match.pd > > > index d401e7503e6..13c828ba70d 100644 > > > --- a/gcc/match.pd > > > +++ b/gcc/match.pd > > > @@ -3650,6 +3650,36 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > > > (plus (convert @0) (op @2 (convert @1)))))) > > > #endif > > > > > > +/* ((T)(A + CST1)) * CST2 + CST3 > > > + -> ((T)(A) * CST2) + ((T)CST1 * CST2 + CST3) > > > + Where (A + CST1) doesn't need to have a single use. */ > > > +#if GIMPLE > > > + (for op (plus minus) > > > + (simplify > > > + (plus (mult (convert:s (op @0 INTEGER_CST@1)) INTEGER_CST@2) INTEGER_CST@3) > > > + (if (TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE > > > + && TREE_CODE (type) == INTEGER_TYPE > > > + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)) > > > + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)) > > > + && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0)) > > > + && TYPE_OVERFLOW_WRAPS (type)) > > > + (op (mult @2 (convert @0)) (plus (mult @2 (convert @1)) @3))))) > > > +#endif > > > + > > > +/* ((T)(A + CST1)) * CST2 -> ((T)(A) * CST2) + ((T)CST1 * CST2) */ > > > +#if GIMPLE > > > + (for op (plus minus) > > > + (simplify > > > + (mult (convert:s (op:s @0 INTEGER_CST@1)) INTEGER_CST@2) > > > + (if (TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE > > > > Please use INTEGRAL_TYPE_P > > > > > + && TREE_CODE (type) == INTEGER_TYPE > > > + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)) > > > + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)) > > > + && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0)) > > > + && TYPE_OVERFLOW_WRAPS (type)) > > > + (op (mult @2 (convert @0)) (mult @2 (convert @1)))))) > > > > (mult @2 (convert @0)) is non-canonical for no good reason if @0 > > isn't constant - constant should be 2nd, please swap operands here. > > > > > +#endif > > > > The first pattern is an extension of the second, why's the first > > necessary at all? The add of CST3 is unchanged (OK, you seem to > > associate here, but that's again a different thing). > > > > I'd say the 2nd pattern is OK with the above changes but the first > > looks redundant. > > > Hi Richard, > > Thanks for the comments, I'll fix these. > > The difference is that the second uses op:s while the first uses just op. > In the second case if A + CST1 has other uses expanding the pattern > may not be a good idea but in the first case it always is because we > know + CST1 * CST2 will merge with + CST3. I see. But that pattern misses a :s on the multiplication result then, no? Local pattern-matching isn't the best vehicle to handle multi-use cases, introducing context dependent canonicalizations can lead to SCEV analysis no longer matching up for related accesses and then data dependence analysis failing. It's been a trade-off here. Richard. > > Thanks, > Manolis > > > Thanks, > > Richard. > > > > > + > > > /* (T)(A) +- (T)(B) -> (T)(A +- B) only when (A +- B) could be simplified > > > to a simple value. */ > > > (for op (plus minus) > > > diff --git a/gcc/testsuite/gcc.dg/pr109393.c b/gcc/testsuite/gcc.dg/pr109393.c > > > new file mode 100644 > > > index 00000000000..e9051273672 > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.dg/pr109393.c > > > @@ -0,0 +1,16 @@ > > > +/* PR tree-optimization/109393 */ > > > +/* { dg-do compile } */ > > > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > > > +/* { dg-final { scan-tree-dump-times "return 1;" 2 "optimized" } } */ > > > + > > > +int foo(int *a, int j) > > > +{ > > > + int k = j - 1; > > > + return a[j - 1] == a[k]; > > > +} > > > + > > > +int bar(int *a, int j) > > > +{ > > > + int k = j - 1; > > > + return (&a[j + 1] - 2) == &a[k]; > > > +} > > > > > > > -- > > Richard Biener <rguenther@suse.de> > > SUSE Software Solutions Germany GmbH, > > Frankenstrasse 146, 90461 Nuernberg, Germany; > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg) > -- Richard Biener <rguenther@suse.de> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg) ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] MATCH: Maybe expand (T)(A + C1) * C2 and (T)(A + C1) * C2 + C3 [PR109393] 2024-04-23 10:33 [PATCH] MATCH: Maybe expand (T)(A + C1) * C2 and (T)(A + C1) * C2 + C3 [PR109393] Manolis Tsamis 2024-05-02 13:00 ` Richard Biener @ 2024-05-14 8:57 ` Manolis Tsamis 2024-05-16 8:15 ` Richard Biener 1 sibling, 1 reply; 10+ messages in thread From: Manolis Tsamis @ 2024-05-14 8:57 UTC (permalink / raw) To: gcc-patches; +Cc: Richard Biener, Jiangning Liu, Philipp Tomsich, Andrew Pinski New patch with the requested changes can be found below. I don't know how much this affects SCEV, but I do believe that we should incorporate this change somehow. I've seen various cases of suboptimal address calculation codegen that boil down to this. gcc/match.pd | 31 +++++++++++++++++++++++++++++++ gcc/testsuite/gcc.dg/pr109393.c | 16 ++++++++++++++++ 2 files changed, 47 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/pr109393.c diff --git a/gcc/match.pd b/gcc/match.pd index 07e743ae464..1d642c205f0 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -3650,6 +3650,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (plus (convert @0) (op @2 (convert @1)))))) #endif +/* ((T)(A + CST1)) * CST2 + CST3 + -> ((T)(A) * CST2) + ((T)CST1 * CST2 + CST3) + Where (A + CST1) doesn't need to have a single use. */ +#if GIMPLE + (for op (plus minus) + (simplify + (plus (mult:s (convert:s (op @0 INTEGER_CST@1)) INTEGER_CST@2) + INTEGER_CST@3) + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) + && INTEGRAL_TYPE_P (type) + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)) + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)) + && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0)) + && TYPE_OVERFLOW_WRAPS (type)) + (op (mult (convert @0) @2) (plus (mult (convert @1) @2) @3))))) +#endif + +/* ((T)(A + CST1)) * CST2 -> ((T)(A) * CST2) + ((T)CST1 * CST2) */ +#if GIMPLE + (for op (plus minus) + (simplify + (mult (convert:s (op:s @0 INTEGER_CST@1)) INTEGER_CST@2) + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) + && INTEGRAL_TYPE_P (type) + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)) + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)) + && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0)) + && TYPE_OVERFLOW_WRAPS (type)) + (op (mult (convert @0) @2) (mult (convert @1) @2))))) +#endif + /* (T)(A) +- (T)(B) -> (T)(A +- B) only when (A +- B) could be simplified to a simple value. */ (for op (plus minus) diff --git a/gcc/testsuite/gcc.dg/pr109393.c b/gcc/testsuite/gcc.dg/pr109393.c new file mode 100644 index 00000000000..e9051273672 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr109393.c @@ -0,0 +1,16 @@ +/* PR tree-optimization/109393 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ +/* { dg-final { scan-tree-dump-times "return 1;" 2 "optimized" } } */ + +int foo(int *a, int j) +{ + int k = j - 1; + return a[j - 1] == a[k]; +} + +int bar(int *a, int j) +{ + int k = j - 1; + return (&a[j + 1] - 2) == &a[k]; +} -- 2.44.0 On Tue, Apr 23, 2024 at 1:33 PM Manolis Tsamis <manolis.tsamis@vrull.eu> wrote: > > The original motivation for this pattern was that the following function does > not fold to 'return 1': > > int foo(int *a, int j) > { > int k = j - 1; > return a[j - 1] == a[k]; > } > > The expression ((unsigned long) (X +- C1) * C2) appears frequently as part of > address calculations (e.g. arrays). These patterns help fold and simplify more > expressions. > > PR tree-optimization/109393 > > gcc/ChangeLog: > > * match.pd: Add new patterns for ((T)(A +- CST1)) * CST2 and > ((T)(A +- CST1)) * CST2 + CST3. > > gcc/testsuite/ChangeLog: > > * gcc.dg/pr109393.c: New test. > > Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu> > --- > > gcc/match.pd | 30 ++++++++++++++++++++++++++++++ > gcc/testsuite/gcc.dg/pr109393.c | 16 ++++++++++++++++ > 2 files changed, 46 insertions(+) > create mode 100644 gcc/testsuite/gcc.dg/pr109393.c > > diff --git a/gcc/match.pd b/gcc/match.pd > index d401e7503e6..13c828ba70d 100644 > --- a/gcc/match.pd > +++ b/gcc/match.pd > @@ -3650,6 +3650,36 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > (plus (convert @0) (op @2 (convert @1)))))) > #endif > > +/* ((T)(A + CST1)) * CST2 + CST3 > + -> ((T)(A) * CST2) + ((T)CST1 * CST2 + CST3) > + Where (A + CST1) doesn't need to have a single use. */ > +#if GIMPLE > + (for op (plus minus) > + (simplify > + (plus (mult (convert:s (op @0 INTEGER_CST@1)) INTEGER_CST@2) INTEGER_CST@3) > + (if (TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE > + && TREE_CODE (type) == INTEGER_TYPE > + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)) > + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)) > + && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0)) > + && TYPE_OVERFLOW_WRAPS (type)) > + (op (mult @2 (convert @0)) (plus (mult @2 (convert @1)) @3))))) > +#endif > + > +/* ((T)(A + CST1)) * CST2 -> ((T)(A) * CST2) + ((T)CST1 * CST2) */ > +#if GIMPLE > + (for op (plus minus) > + (simplify > + (mult (convert:s (op:s @0 INTEGER_CST@1)) INTEGER_CST@2) > + (if (TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE > + && TREE_CODE (type) == INTEGER_TYPE > + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)) > + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)) > + && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0)) > + && TYPE_OVERFLOW_WRAPS (type)) > + (op (mult @2 (convert @0)) (mult @2 (convert @1)))))) > +#endif > + > /* (T)(A) +- (T)(B) -> (T)(A +- B) only when (A +- B) could be simplified > to a simple value. */ > (for op (plus minus) > diff --git a/gcc/testsuite/gcc.dg/pr109393.c b/gcc/testsuite/gcc.dg/pr109393.c > new file mode 100644 > index 00000000000..e9051273672 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/pr109393.c > @@ -0,0 +1,16 @@ > +/* PR tree-optimization/109393 */ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > +/* { dg-final { scan-tree-dump-times "return 1;" 2 "optimized" } } */ > + > +int foo(int *a, int j) > +{ > + int k = j - 1; > + return a[j - 1] == a[k]; > +} > + > +int bar(int *a, int j) > +{ > + int k = j - 1; > + return (&a[j + 1] - 2) == &a[k]; > +} > -- > 2.34.1 > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] MATCH: Maybe expand (T)(A + C1) * C2 and (T)(A + C1) * C2 + C3 [PR109393] 2024-05-14 8:57 ` Manolis Tsamis @ 2024-05-16 8:15 ` Richard Biener 2024-05-17 8:23 ` Manolis Tsamis 0 siblings, 1 reply; 10+ messages in thread From: Richard Biener @ 2024-05-16 8:15 UTC (permalink / raw) To: Manolis Tsamis Cc: gcc-patches, Richard Biener, Jiangning Liu, Philipp Tomsich, Andrew Pinski On Tue, May 14, 2024 at 10:58 AM Manolis Tsamis <manolis.tsamis@vrull.eu> wrote: > > New patch with the requested changes can be found below. > > I don't know how much this affects SCEV, but I do believe that we > should incorporate this change somehow. I've seen various cases of > suboptimal address calculation codegen that boil down to this. This misses the ChangeLog (I assume it's unchanged) and indent of the match.pd part is now off. Please fix that, the patch is OK with that change. Thanks, Richard. > gcc/match.pd | 31 +++++++++++++++++++++++++++++++ > gcc/testsuite/gcc.dg/pr109393.c | 16 ++++++++++++++++ > 2 files changed, 47 insertions(+) > create mode 100644 gcc/testsuite/gcc.dg/pr109393.c > > diff --git a/gcc/match.pd b/gcc/match.pd > index 07e743ae464..1d642c205f0 100644 > --- a/gcc/match.pd > +++ b/gcc/match.pd > @@ -3650,6 +3650,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > (plus (convert @0) (op @2 (convert @1)))))) > #endif > +/* ((T)(A + CST1)) * CST2 + CST3 > + -> ((T)(A) * CST2) + ((T)CST1 * CST2 + CST3) > + Where (A + CST1) doesn't need to have a single use. */ > +#if GIMPLE > + (for op (plus minus) > + (simplify > + (plus (mult:s (convert:s (op @0 INTEGER_CST@1)) INTEGER_CST@2) > + INTEGER_CST@3) > + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) > + && INTEGRAL_TYPE_P (type) > + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)) > + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)) > + && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0)) > + && TYPE_OVERFLOW_WRAPS (type)) > + (op (mult (convert @0) @2) (plus (mult (convert @1) @2) @3))))) > +#endif > + > +/* ((T)(A + CST1)) * CST2 -> ((T)(A) * CST2) + ((T)CST1 * CST2) */ > +#if GIMPLE > + (for op (plus minus) > + (simplify > + (mult (convert:s (op:s @0 INTEGER_CST@1)) INTEGER_CST@2) > + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) > + && INTEGRAL_TYPE_P (type) > + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)) > + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)) > + && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0)) > + && TYPE_OVERFLOW_WRAPS (type)) > + (op (mult (convert @0) @2) (mult (convert @1) @2))))) > +#endif > + > /* (T)(A) +- (T)(B) -> (T)(A +- B) only when (A +- B) could be simplified > to a simple value. */ > (for op (plus minus) > diff --git a/gcc/testsuite/gcc.dg/pr109393.c b/gcc/testsuite/gcc.dg/pr109393.c > new file mode 100644 > index 00000000000..e9051273672 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/pr109393.c > @@ -0,0 +1,16 @@ > +/* PR tree-optimization/109393 */ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > +/* { dg-final { scan-tree-dump-times "return 1;" 2 "optimized" } } */ > + > +int foo(int *a, int j) > +{ > + int k = j - 1; > + return a[j - 1] == a[k]; > +} > + > +int bar(int *a, int j) > +{ > + int k = j - 1; > + return (&a[j + 1] - 2) == &a[k]; > +} > -- > 2.44.0 > > > On Tue, Apr 23, 2024 at 1:33 PM Manolis Tsamis <manolis.tsamis@vrull.eu> wrote: > > > > The original motivation for this pattern was that the following function does > > not fold to 'return 1': > > > > int foo(int *a, int j) > > { > > int k = j - 1; > > return a[j - 1] == a[k]; > > } > > > > The expression ((unsigned long) (X +- C1) * C2) appears frequently as part of > > address calculations (e.g. arrays). These patterns help fold and simplify more > > expressions. > > > > PR tree-optimization/109393 > > > > gcc/ChangeLog: > > > > * match.pd: Add new patterns for ((T)(A +- CST1)) * CST2 and > > ((T)(A +- CST1)) * CST2 + CST3. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.dg/pr109393.c: New test. > > > > Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu> > > --- > > > > gcc/match.pd | 30 ++++++++++++++++++++++++++++++ > > gcc/testsuite/gcc.dg/pr109393.c | 16 ++++++++++++++++ > > 2 files changed, 46 insertions(+) > > create mode 100644 gcc/testsuite/gcc.dg/pr109393.c > > > > diff --git a/gcc/match.pd b/gcc/match.pd > > index d401e7503e6..13c828ba70d 100644 > > --- a/gcc/match.pd > > +++ b/gcc/match.pd > > @@ -3650,6 +3650,36 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > > (plus (convert @0) (op @2 (convert @1)))))) > > #endif > > > > +/* ((T)(A + CST1)) * CST2 + CST3 > > + -> ((T)(A) * CST2) + ((T)CST1 * CST2 + CST3) > > + Where (A + CST1) doesn't need to have a single use. */ > > +#if GIMPLE > > + (for op (plus minus) > > + (simplify > > + (plus (mult (convert:s (op @0 INTEGER_CST@1)) INTEGER_CST@2) INTEGER_CST@3) > > + (if (TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE > > + && TREE_CODE (type) == INTEGER_TYPE > > + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)) > > + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)) > > + && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0)) > > + && TYPE_OVERFLOW_WRAPS (type)) > > + (op (mult @2 (convert @0)) (plus (mult @2 (convert @1)) @3))))) > > +#endif > > + > > +/* ((T)(A + CST1)) * CST2 -> ((T)(A) * CST2) + ((T)CST1 * CST2) */ > > +#if GIMPLE > > + (for op (plus minus) > > + (simplify > > + (mult (convert:s (op:s @0 INTEGER_CST@1)) INTEGER_CST@2) > > + (if (TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE > > + && TREE_CODE (type) == INTEGER_TYPE > > + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)) > > + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)) > > + && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0)) > > + && TYPE_OVERFLOW_WRAPS (type)) > > + (op (mult @2 (convert @0)) (mult @2 (convert @1)))))) > > +#endif > > + > > /* (T)(A) +- (T)(B) -> (T)(A +- B) only when (A +- B) could be simplified > > to a simple value. */ > > (for op (plus minus) > > diff --git a/gcc/testsuite/gcc.dg/pr109393.c b/gcc/testsuite/gcc.dg/pr109393.c > > new file mode 100644 > > index 00000000000..e9051273672 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/pr109393.c > > @@ -0,0 +1,16 @@ > > +/* PR tree-optimization/109393 */ > > +/* { dg-do compile } */ > > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > > +/* { dg-final { scan-tree-dump-times "return 1;" 2 "optimized" } } */ > > + > > +int foo(int *a, int j) > > +{ > > + int k = j - 1; > > + return a[j - 1] == a[k]; > > +} > > + > > +int bar(int *a, int j) > > +{ > > + int k = j - 1; > > + return (&a[j + 1] - 2) == &a[k]; > > +} > > -- > > 2.34.1 > > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] MATCH: Maybe expand (T)(A + C1) * C2 and (T)(A + C1) * C2 + C3 [PR109393] 2024-05-16 8:15 ` Richard Biener @ 2024-05-17 8:23 ` Manolis Tsamis 2024-05-17 9:22 ` Richard Biener 0 siblings, 1 reply; 10+ messages in thread From: Manolis Tsamis @ 2024-05-17 8:23 UTC (permalink / raw) To: Richard Biener Cc: gcc-patches, Richard Biener, Jiangning Liu, Philipp Tomsich, Andrew Pinski Hi Richard, While I was re-testing the latest version of this patch I noticed that it FAILs an AArch64 test, gcc.target/aarch64/subsp.c. With the patch we generate one instruction more: sbfiz x1, x1, 4, 32 stp x29, x30, [sp, -16]! add x1, x1, 16 mov x29, sp sub sp, sp, x1 mov x0, sp bl foo Instead of: stp x29, x30, [sp, -16]! add w1, w1, 1 mov x29, sp sub sp, sp, w1, sxtw 4 mov x0, sp bl foo I've looked at it but can't really find a way to solve the regression. Any thoughts on this? Thanks, Manolis On Thu, May 16, 2024 at 11:15 AM Richard Biener <richard.guenther@gmail.com> wrote: > > On Tue, May 14, 2024 at 10:58 AM Manolis Tsamis <manolis.tsamis@vrull.eu> wrote: > > > > New patch with the requested changes can be found below. > > > > I don't know how much this affects SCEV, but I do believe that we > > should incorporate this change somehow. I've seen various cases of > > suboptimal address calculation codegen that boil down to this. > > This misses the ChangeLog (I assume it's unchanged) and indent > of the match.pd part is now off. > > Please fix that, the patch is OK with that change. > > Thanks, > Richard. > > > gcc/match.pd | 31 +++++++++++++++++++++++++++++++ > > gcc/testsuite/gcc.dg/pr109393.c | 16 ++++++++++++++++ > > 2 files changed, 47 insertions(+) > > create mode 100644 gcc/testsuite/gcc.dg/pr109393.c > > > > diff --git a/gcc/match.pd b/gcc/match.pd > > index 07e743ae464..1d642c205f0 100644 > > --- a/gcc/match.pd > > +++ b/gcc/match.pd > > @@ -3650,6 +3650,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > > (plus (convert @0) (op @2 (convert @1)))))) > > #endif > > +/* ((T)(A + CST1)) * CST2 + CST3 > > + -> ((T)(A) * CST2) + ((T)CST1 * CST2 + CST3) > > + Where (A + CST1) doesn't need to have a single use. */ > > +#if GIMPLE > > + (for op (plus minus) > > + (simplify > > + (plus (mult:s (convert:s (op @0 INTEGER_CST@1)) INTEGER_CST@2) > > + INTEGER_CST@3) > > + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) > > + && INTEGRAL_TYPE_P (type) > > + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)) > > + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)) > > + && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0)) > > + && TYPE_OVERFLOW_WRAPS (type)) > > + (op (mult (convert @0) @2) (plus (mult (convert @1) @2) @3))))) > > +#endif > > + > > +/* ((T)(A + CST1)) * CST2 -> ((T)(A) * CST2) + ((T)CST1 * CST2) */ > > +#if GIMPLE > > + (for op (plus minus) > > + (simplify > > + (mult (convert:s (op:s @0 INTEGER_CST@1)) INTEGER_CST@2) > > + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) > > + && INTEGRAL_TYPE_P (type) > > + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)) > > + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)) > > + && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0)) > > + && TYPE_OVERFLOW_WRAPS (type)) > > + (op (mult (convert @0) @2) (mult (convert @1) @2))))) > > +#endif > > + > > /* (T)(A) +- (T)(B) -> (T)(A +- B) only when (A +- B) could be simplified > > to a simple value. */ > > (for op (plus minus) > > diff --git a/gcc/testsuite/gcc.dg/pr109393.c b/gcc/testsuite/gcc.dg/pr109393.c > > new file mode 100644 > > index 00000000000..e9051273672 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/pr109393.c > > @@ -0,0 +1,16 @@ > > +/* PR tree-optimization/109393 */ > > +/* { dg-do compile } */ > > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > > +/* { dg-final { scan-tree-dump-times "return 1;" 2 "optimized" } } */ > > + > > +int foo(int *a, int j) > > +{ > > + int k = j - 1; > > + return a[j - 1] == a[k]; > > +} > > + > > +int bar(int *a, int j) > > +{ > > + int k = j - 1; > > + return (&a[j + 1] - 2) == &a[k]; > > +} > > -- > > 2.44.0 > > > > > > On Tue, Apr 23, 2024 at 1:33 PM Manolis Tsamis <manolis.tsamis@vrull.eu> wrote: > > > > > > The original motivation for this pattern was that the following function does > > > not fold to 'return 1': > > > > > > int foo(int *a, int j) > > > { > > > int k = j - 1; > > > return a[j - 1] == a[k]; > > > } > > > > > > The expression ((unsigned long) (X +- C1) * C2) appears frequently as part of > > > address calculations (e.g. arrays). These patterns help fold and simplify more > > > expressions. > > > > > > PR tree-optimization/109393 > > > > > > gcc/ChangeLog: > > > > > > * match.pd: Add new patterns for ((T)(A +- CST1)) * CST2 and > > > ((T)(A +- CST1)) * CST2 + CST3. > > > > > > gcc/testsuite/ChangeLog: > > > > > > * gcc.dg/pr109393.c: New test. > > > > > > Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu> > > > --- > > > > > > gcc/match.pd | 30 ++++++++++++++++++++++++++++++ > > > gcc/testsuite/gcc.dg/pr109393.c | 16 ++++++++++++++++ > > > 2 files changed, 46 insertions(+) > > > create mode 100644 gcc/testsuite/gcc.dg/pr109393.c > > > > > > diff --git a/gcc/match.pd b/gcc/match.pd > > > index d401e7503e6..13c828ba70d 100644 > > > --- a/gcc/match.pd > > > +++ b/gcc/match.pd > > > @@ -3650,6 +3650,36 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > > > (plus (convert @0) (op @2 (convert @1)))))) > > > #endif > > > > > > +/* ((T)(A + CST1)) * CST2 + CST3 > > > + -> ((T)(A) * CST2) + ((T)CST1 * CST2 + CST3) > > > + Where (A + CST1) doesn't need to have a single use. */ > > > +#if GIMPLE > > > + (for op (plus minus) > > > + (simplify > > > + (plus (mult (convert:s (op @0 INTEGER_CST@1)) INTEGER_CST@2) INTEGER_CST@3) > > > + (if (TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE > > > + && TREE_CODE (type) == INTEGER_TYPE > > > + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)) > > > + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)) > > > + && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0)) > > > + && TYPE_OVERFLOW_WRAPS (type)) > > > + (op (mult @2 (convert @0)) (plus (mult @2 (convert @1)) @3))))) > > > +#endif > > > + > > > +/* ((T)(A + CST1)) * CST2 -> ((T)(A) * CST2) + ((T)CST1 * CST2) */ > > > +#if GIMPLE > > > + (for op (plus minus) > > > + (simplify > > > + (mult (convert:s (op:s @0 INTEGER_CST@1)) INTEGER_CST@2) > > > + (if (TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE > > > + && TREE_CODE (type) == INTEGER_TYPE > > > + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)) > > > + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)) > > > + && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0)) > > > + && TYPE_OVERFLOW_WRAPS (type)) > > > + (op (mult @2 (convert @0)) (mult @2 (convert @1)))))) > > > +#endif > > > + > > > /* (T)(A) +- (T)(B) -> (T)(A +- B) only when (A +- B) could be simplified > > > to a simple value. */ > > > (for op (plus minus) > > > diff --git a/gcc/testsuite/gcc.dg/pr109393.c b/gcc/testsuite/gcc.dg/pr109393.c > > > new file mode 100644 > > > index 00000000000..e9051273672 > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.dg/pr109393.c > > > @@ -0,0 +1,16 @@ > > > +/* PR tree-optimization/109393 */ > > > +/* { dg-do compile } */ > > > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > > > +/* { dg-final { scan-tree-dump-times "return 1;" 2 "optimized" } } */ > > > + > > > +int foo(int *a, int j) > > > +{ > > > + int k = j - 1; > > > + return a[j - 1] == a[k]; > > > +} > > > + > > > +int bar(int *a, int j) > > > +{ > > > + int k = j - 1; > > > + return (&a[j + 1] - 2) == &a[k]; > > > +} > > > -- > > > 2.34.1 > > > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] MATCH: Maybe expand (T)(A + C1) * C2 and (T)(A + C1) * C2 + C3 [PR109393] 2024-05-17 8:23 ` Manolis Tsamis @ 2024-05-17 9:22 ` Richard Biener 2024-05-17 11:35 ` Manolis Tsamis 0 siblings, 1 reply; 10+ messages in thread From: Richard Biener @ 2024-05-17 9:22 UTC (permalink / raw) To: Manolis Tsamis Cc: Richard Biener, gcc-patches, Jiangning Liu, Philipp Tomsich, Andrew Pinski [-- Attachment #1: Type: text/plain, Size: 8704 bytes --] On Fri, 17 May 2024, Manolis Tsamis wrote: > Hi Richard, > > While I was re-testing the latest version of this patch I noticed that > it FAILs an AArch64 test, gcc.target/aarch64/subsp.c. With the patch > we generate one instruction more: > > sbfiz x1, x1, 4, 32 > stp x29, x30, [sp, -16]! > add x1, x1, 16 > mov x29, sp > sub sp, sp, x1 > mov x0, sp > bl foo > > Instead of: > > stp x29, x30, [sp, -16]! > add w1, w1, 1 > mov x29, sp > sub sp, sp, w1, sxtw 4 > mov x0, sp > bl foo > > I've looked at it but can't really find a way to solve the regression. > Any thoughts on this? Can you explain what goes wrong? As I said rewriting parts of address calculation is tricky, there's always the chance that some cases regress (see your observation in comment#4 of the PR). Note that I still believe that avoiding the early and premature promotion of the addition to unsigned is a good thing. Note the testcase in the PR is fixed with -fwrapv because then we do _not_ perform this premature optimization. Without -fwrapv the optimization is valid but as you note we do not perform it consistently - otherwise we wouldn't regress. Richard. > Thanks, > Manolis > > > > On Thu, May 16, 2024 at 11:15 AM Richard Biener > <richard.guenther@gmail.com> wrote: > > > > On Tue, May 14, 2024 at 10:58 AM Manolis Tsamis <manolis.tsamis@vrull.eu> wrote: > > > > > > New patch with the requested changes can be found below. > > > > > > I don't know how much this affects SCEV, but I do believe that we > > > should incorporate this change somehow. I've seen various cases of > > > suboptimal address calculation codegen that boil down to this. > > > > This misses the ChangeLog (I assume it's unchanged) and indent > > of the match.pd part is now off. > > > > Please fix that, the patch is OK with that change. > > > > Thanks, > > Richard. > > > > > gcc/match.pd | 31 +++++++++++++++++++++++++++++++ > > > gcc/testsuite/gcc.dg/pr109393.c | 16 ++++++++++++++++ > > > 2 files changed, 47 insertions(+) > > > create mode 100644 gcc/testsuite/gcc.dg/pr109393.c > > > > > > diff --git a/gcc/match.pd b/gcc/match.pd > > > index 07e743ae464..1d642c205f0 100644 > > > --- a/gcc/match.pd > > > +++ b/gcc/match.pd > > > @@ -3650,6 +3650,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > > > (plus (convert @0) (op @2 (convert @1)))))) > > > #endif > > > +/* ((T)(A + CST1)) * CST2 + CST3 > > > + -> ((T)(A) * CST2) + ((T)CST1 * CST2 + CST3) > > > + Where (A + CST1) doesn't need to have a single use. */ > > > +#if GIMPLE > > > + (for op (plus minus) > > > + (simplify > > > + (plus (mult:s (convert:s (op @0 INTEGER_CST@1)) INTEGER_CST@2) > > > + INTEGER_CST@3) > > > + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) > > > + && INTEGRAL_TYPE_P (type) > > > + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)) > > > + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)) > > > + && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0)) > > > + && TYPE_OVERFLOW_WRAPS (type)) > > > + (op (mult (convert @0) @2) (plus (mult (convert @1) @2) @3))))) > > > +#endif > > > + > > > +/* ((T)(A + CST1)) * CST2 -> ((T)(A) * CST2) + ((T)CST1 * CST2) */ > > > +#if GIMPLE > > > + (for op (plus minus) > > > + (simplify > > > + (mult (convert:s (op:s @0 INTEGER_CST@1)) INTEGER_CST@2) > > > + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) > > > + && INTEGRAL_TYPE_P (type) > > > + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)) > > > + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)) > > > + && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0)) > > > + && TYPE_OVERFLOW_WRAPS (type)) > > > + (op (mult (convert @0) @2) (mult (convert @1) @2))))) > > > +#endif > > > + > > > /* (T)(A) +- (T)(B) -> (T)(A +- B) only when (A +- B) could be simplified > > > to a simple value. */ > > > (for op (plus minus) > > > diff --git a/gcc/testsuite/gcc.dg/pr109393.c b/gcc/testsuite/gcc.dg/pr109393.c > > > new file mode 100644 > > > index 00000000000..e9051273672 > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.dg/pr109393.c > > > @@ -0,0 +1,16 @@ > > > +/* PR tree-optimization/109393 */ > > > +/* { dg-do compile } */ > > > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > > > +/* { dg-final { scan-tree-dump-times "return 1;" 2 "optimized" } } */ > > > + > > > +int foo(int *a, int j) > > > +{ > > > + int k = j - 1; > > > + return a[j - 1] == a[k]; > > > +} > > > + > > > +int bar(int *a, int j) > > > +{ > > > + int k = j - 1; > > > + return (&a[j + 1] - 2) == &a[k]; > > > +} > > > -- > > > 2.44.0 > > > > > > > > > On Tue, Apr 23, 2024 at 1:33 PM Manolis Tsamis <manolis.tsamis@vrull.eu> wrote: > > > > > > > > The original motivation for this pattern was that the following function does > > > > not fold to 'return 1': > > > > > > > > int foo(int *a, int j) > > > > { > > > > int k = j - 1; > > > > return a[j - 1] == a[k]; > > > > } > > > > > > > > The expression ((unsigned long) (X +- C1) * C2) appears frequently as part of > > > > address calculations (e.g. arrays). These patterns help fold and simplify more > > > > expressions. > > > > > > > > PR tree-optimization/109393 > > > > > > > > gcc/ChangeLog: > > > > > > > > * match.pd: Add new patterns for ((T)(A +- CST1)) * CST2 and > > > > ((T)(A +- CST1)) * CST2 + CST3. > > > > > > > > gcc/testsuite/ChangeLog: > > > > > > > > * gcc.dg/pr109393.c: New test. > > > > > > > > Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu> > > > > --- > > > > > > > > gcc/match.pd | 30 ++++++++++++++++++++++++++++++ > > > > gcc/testsuite/gcc.dg/pr109393.c | 16 ++++++++++++++++ > > > > 2 files changed, 46 insertions(+) > > > > create mode 100644 gcc/testsuite/gcc.dg/pr109393.c > > > > > > > > diff --git a/gcc/match.pd b/gcc/match.pd > > > > index d401e7503e6..13c828ba70d 100644 > > > > --- a/gcc/match.pd > > > > +++ b/gcc/match.pd > > > > @@ -3650,6 +3650,36 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > > > > (plus (convert @0) (op @2 (convert @1)))))) > > > > #endif > > > > > > > > +/* ((T)(A + CST1)) * CST2 + CST3 > > > > + -> ((T)(A) * CST2) + ((T)CST1 * CST2 + CST3) > > > > + Where (A + CST1) doesn't need to have a single use. */ > > > > +#if GIMPLE > > > > + (for op (plus minus) > > > > + (simplify > > > > + (plus (mult (convert:s (op @0 INTEGER_CST@1)) INTEGER_CST@2) INTEGER_CST@3) > > > > + (if (TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE > > > > + && TREE_CODE (type) == INTEGER_TYPE > > > > + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)) > > > > + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)) > > > > + && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0)) > > > > + && TYPE_OVERFLOW_WRAPS (type)) > > > > + (op (mult @2 (convert @0)) (plus (mult @2 (convert @1)) @3))))) > > > > +#endif > > > > + > > > > +/* ((T)(A + CST1)) * CST2 -> ((T)(A) * CST2) + ((T)CST1 * CST2) */ > > > > +#if GIMPLE > > > > + (for op (plus minus) > > > > + (simplify > > > > + (mult (convert:s (op:s @0 INTEGER_CST@1)) INTEGER_CST@2) > > > > + (if (TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE > > > > + && TREE_CODE (type) == INTEGER_TYPE > > > > + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)) > > > > + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)) > > > > + && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0)) > > > > + && TYPE_OVERFLOW_WRAPS (type)) > > > > + (op (mult @2 (convert @0)) (mult @2 (convert @1)))))) > > > > +#endif > > > > + > > > > /* (T)(A) +- (T)(B) -> (T)(A +- B) only when (A +- B) could be simplified > > > > to a simple value. */ > > > > (for op (plus minus) > > > > diff --git a/gcc/testsuite/gcc.dg/pr109393.c b/gcc/testsuite/gcc.dg/pr109393.c > > > > new file mode 100644 > > > > index 00000000000..e9051273672 > > > > --- /dev/null > > > > +++ b/gcc/testsuite/gcc.dg/pr109393.c > > > > @@ -0,0 +1,16 @@ > > > > +/* PR tree-optimization/109393 */ > > > > +/* { dg-do compile } */ > > > > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > > > > +/* { dg-final { scan-tree-dump-times "return 1;" 2 "optimized" } } */ > > > > + > > > > +int foo(int *a, int j) > > > > +{ > > > > + int k = j - 1; > > > > + return a[j - 1] == a[k]; > > > > +} > > > > + > > > > +int bar(int *a, int j) > > > > +{ > > > > + int k = j - 1; > > > > + return (&a[j + 1] - 2) == &a[k]; > > > > +} > > > > -- > > > > 2.34.1 > > > > > -- Richard Biener <rguenther@suse.de> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg) ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] MATCH: Maybe expand (T)(A + C1) * C2 and (T)(A + C1) * C2 + C3 [PR109393] 2024-05-17 9:22 ` Richard Biener @ 2024-05-17 11:35 ` Manolis Tsamis 2024-05-17 11:41 ` Richard Biener 0 siblings, 1 reply; 10+ messages in thread From: Manolis Tsamis @ 2024-05-17 11:35 UTC (permalink / raw) To: Richard Biener Cc: Richard Biener, gcc-patches, Jiangning Liu, Philipp Tomsich, Andrew Pinski On Fri, May 17, 2024 at 12:22 PM Richard Biener <rguenther@suse.de> wrote: > > On Fri, 17 May 2024, Manolis Tsamis wrote: > > > Hi Richard, > > > > While I was re-testing the latest version of this patch I noticed that > > it FAILs an AArch64 test, gcc.target/aarch64/subsp.c. With the patch > > we generate one instruction more: > > > > sbfiz x1, x1, 4, 32 > > stp x29, x30, [sp, -16]! > > add x1, x1, 16 > > mov x29, sp > > sub sp, sp, x1 > > mov x0, sp > > bl foo > > > > Instead of: > > > > stp x29, x30, [sp, -16]! > > add w1, w1, 1 > > mov x29, sp > > sub sp, sp, w1, sxtw 4 > > mov x0, sp > > bl foo > > > > I've looked at it but can't really find a way to solve the regression. > > Any thoughts on this? > > Can you explain what goes wrong? As I said rewriting parts of > address calculation is tricky, there's always the chance that some > cases regress (see your observation in comment#4 of the PR). > In this case the int -> sizetype cast ends up happening earlier. Instead of _7 = y_6(D) + 1; _1 = (sizetype) _7; _2 = _1 * 16; We get _13 = (sizetype) y_6(D); _15 = _13 + 1; _2 = _15 * 16; and then in RTL we have x1 = ((sizetype) x1) << 4 sp = sp - (x1 + 16) instead of x1 = x1 + 1 sp = sp - ((sizetype) x1) << 4 which doesn't form sub sp, sp, w1, sxtw 4. But more importantly, I realized that (in this case among others) the pattern is undone by (A * C) +- (B * C) -> (A+-B) * C and (A * C) +- A -> A * (C+-1). AFAIK having one pattern and its reverse is a bad thing so something needs to be changed. One idea could be to only keep the larger one ((T)(A + CST1)) * CST2 + CST3 -> ((T)(A) * CST2) + ((T)CST1 * CST2 + CST3). it's not enough to deal with the testcases of the ticket but it does help in other cases. Manolis > Note that I still believe that avoiding the early and premature > promotion of the addition to unsigned is a good thing. > > Note the testcase in the PR is fixed with -fwrapv because then > we do _not_ perform this premature optimization. Without -fwrapv > the optimization is valid but as you note we do not perform it > consistently - otherwise we wouldn't regress. > > Richard. > > > > > Thanks, > > Manolis > > > > > > > > On Thu, May 16, 2024 at 11:15 AM Richard Biener > > <richard.guenther@gmail.com> wrote: > > > > > > On Tue, May 14, 2024 at 10:58 AM Manolis Tsamis <manolis.tsamis@vrull.eu> wrote: > > > > > > > > New patch with the requested changes can be found below. > > > > > > > > I don't know how much this affects SCEV, but I do believe that we > > > > should incorporate this change somehow. I've seen various cases of > > > > suboptimal address calculation codegen that boil down to this. > > > > > > This misses the ChangeLog (I assume it's unchanged) and indent > > > of the match.pd part is now off. > > > > > > Please fix that, the patch is OK with that change. > > > > > > Thanks, > > > Richard. > > > > > > > gcc/match.pd | 31 +++++++++++++++++++++++++++++++ > > > > gcc/testsuite/gcc.dg/pr109393.c | 16 ++++++++++++++++ > > > > 2 files changed, 47 insertions(+) > > > > create mode 100644 gcc/testsuite/gcc.dg/pr109393.c > > > > > > > > diff --git a/gcc/match.pd b/gcc/match.pd > > > > index 07e743ae464..1d642c205f0 100644 > > > > --- a/gcc/match.pd > > > > +++ b/gcc/match.pd > > > > @@ -3650,6 +3650,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > > > > (plus (convert @0) (op @2 (convert @1)))))) > > > > #endif > > > > +/* ((T)(A + CST1)) * CST2 + CST3 > > > > + -> ((T)(A) * CST2) + ((T)CST1 * CST2 + CST3) > > > > + Where (A + CST1) doesn't need to have a single use. */ > > > > +#if GIMPLE > > > > + (for op (plus minus) > > > > + (simplify > > > > + (plus (mult:s (convert:s (op @0 INTEGER_CST@1)) INTEGER_CST@2) > > > > + INTEGER_CST@3) > > > > + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) > > > > + && INTEGRAL_TYPE_P (type) > > > > + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)) > > > > + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)) > > > > + && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0)) > > > > + && TYPE_OVERFLOW_WRAPS (type)) > > > > + (op (mult (convert @0) @2) (plus (mult (convert @1) @2) @3))))) > > > > +#endif > > > > + > > > > +/* ((T)(A + CST1)) * CST2 -> ((T)(A) * CST2) + ((T)CST1 * CST2) */ > > > > +#if GIMPLE > > > > + (for op (plus minus) > > > > + (simplify > > > > + (mult (convert:s (op:s @0 INTEGER_CST@1)) INTEGER_CST@2) > > > > + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) > > > > + && INTEGRAL_TYPE_P (type) > > > > + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)) > > > > + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)) > > > > + && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0)) > > > > + && TYPE_OVERFLOW_WRAPS (type)) > > > > + (op (mult (convert @0) @2) (mult (convert @1) @2))))) > > > > +#endif > > > > + > > > > /* (T)(A) +- (T)(B) -> (T)(A +- B) only when (A +- B) could be simplified > > > > to a simple value. */ > > > > (for op (plus minus) > > > > diff --git a/gcc/testsuite/gcc.dg/pr109393.c b/gcc/testsuite/gcc.dg/pr109393.c > > > > new file mode 100644 > > > > index 00000000000..e9051273672 > > > > --- /dev/null > > > > +++ b/gcc/testsuite/gcc.dg/pr109393.c > > > > @@ -0,0 +1,16 @@ > > > > +/* PR tree-optimization/109393 */ > > > > +/* { dg-do compile } */ > > > > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > > > > +/* { dg-final { scan-tree-dump-times "return 1;" 2 "optimized" } } */ > > > > + > > > > +int foo(int *a, int j) > > > > +{ > > > > + int k = j - 1; > > > > + return a[j - 1] == a[k]; > > > > +} > > > > + > > > > +int bar(int *a, int j) > > > > +{ > > > > + int k = j - 1; > > > > + return (&a[j + 1] - 2) == &a[k]; > > > > +} > > > > -- > > > > 2.44.0 > > > > > > > > > > > > On Tue, Apr 23, 2024 at 1:33 PM Manolis Tsamis <manolis.tsamis@vrull.eu> wrote: > > > > > > > > > > The original motivation for this pattern was that the following function does > > > > > not fold to 'return 1': > > > > > > > > > > int foo(int *a, int j) > > > > > { > > > > > int k = j - 1; > > > > > return a[j - 1] == a[k]; > > > > > } > > > > > > > > > > The expression ((unsigned long) (X +- C1) * C2) appears frequently as part of > > > > > address calculations (e.g. arrays). These patterns help fold and simplify more > > > > > expressions. > > > > > > > > > > PR tree-optimization/109393 > > > > > > > > > > gcc/ChangeLog: > > > > > > > > > > * match.pd: Add new patterns for ((T)(A +- CST1)) * CST2 and > > > > > ((T)(A +- CST1)) * CST2 + CST3. > > > > > > > > > > gcc/testsuite/ChangeLog: > > > > > > > > > > * gcc.dg/pr109393.c: New test. > > > > > > > > > > Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu> > > > > > --- > > > > > > > > > > gcc/match.pd | 30 ++++++++++++++++++++++++++++++ > > > > > gcc/testsuite/gcc.dg/pr109393.c | 16 ++++++++++++++++ > > > > > 2 files changed, 46 insertions(+) > > > > > create mode 100644 gcc/testsuite/gcc.dg/pr109393.c > > > > > > > > > > diff --git a/gcc/match.pd b/gcc/match.pd > > > > > index d401e7503e6..13c828ba70d 100644 > > > > > --- a/gcc/match.pd > > > > > +++ b/gcc/match.pd > > > > > @@ -3650,6 +3650,36 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > > > > > (plus (convert @0) (op @2 (convert @1)))))) > > > > > #endif > > > > > > > > > > +/* ((T)(A + CST1)) * CST2 + CST3 > > > > > + -> ((T)(A) * CST2) + ((T)CST1 * CST2 + CST3) > > > > > + Where (A + CST1) doesn't need to have a single use. */ > > > > > +#if GIMPLE > > > > > + (for op (plus minus) > > > > > + (simplify > > > > > + (plus (mult (convert:s (op @0 INTEGER_CST@1)) INTEGER_CST@2) INTEGER_CST@3) > > > > > + (if (TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE > > > > > + && TREE_CODE (type) == INTEGER_TYPE > > > > > + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)) > > > > > + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)) > > > > > + && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0)) > > > > > + && TYPE_OVERFLOW_WRAPS (type)) > > > > > + (op (mult @2 (convert @0)) (plus (mult @2 (convert @1)) @3))))) > > > > > +#endif > > > > > + > > > > > +/* ((T)(A + CST1)) * CST2 -> ((T)(A) * CST2) + ((T)CST1 * CST2) */ > > > > > +#if GIMPLE > > > > > + (for op (plus minus) > > > > > + (simplify > > > > > + (mult (convert:s (op:s @0 INTEGER_CST@1)) INTEGER_CST@2) > > > > > + (if (TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE > > > > > + && TREE_CODE (type) == INTEGER_TYPE > > > > > + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)) > > > > > + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)) > > > > > + && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0)) > > > > > + && TYPE_OVERFLOW_WRAPS (type)) > > > > > + (op (mult @2 (convert @0)) (mult @2 (convert @1)))))) > > > > > +#endif > > > > > + > > > > > /* (T)(A) +- (T)(B) -> (T)(A +- B) only when (A +- B) could be simplified > > > > > to a simple value. */ > > > > > (for op (plus minus) > > > > > diff --git a/gcc/testsuite/gcc.dg/pr109393.c b/gcc/testsuite/gcc.dg/pr109393.c > > > > > new file mode 100644 > > > > > index 00000000000..e9051273672 > > > > > --- /dev/null > > > > > +++ b/gcc/testsuite/gcc.dg/pr109393.c > > > > > @@ -0,0 +1,16 @@ > > > > > +/* PR tree-optimization/109393 */ > > > > > +/* { dg-do compile } */ > > > > > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > > > > > +/* { dg-final { scan-tree-dump-times "return 1;" 2 "optimized" } } */ > > > > > + > > > > > +int foo(int *a, int j) > > > > > +{ > > > > > + int k = j - 1; > > > > > + return a[j - 1] == a[k]; > > > > > +} > > > > > + > > > > > +int bar(int *a, int j) > > > > > +{ > > > > > + int k = j - 1; > > > > > + return (&a[j + 1] - 2) == &a[k]; > > > > > +} > > > > > -- > > > > > 2.34.1 > > > > > > > > > -- > Richard Biener <rguenther@suse.de> > SUSE Software Solutions Germany GmbH, > Frankenstrasse 146, 90461 Nuernberg, Germany; > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg) ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] MATCH: Maybe expand (T)(A + C1) * C2 and (T)(A + C1) * C2 + C3 [PR109393] 2024-05-17 11:35 ` Manolis Tsamis @ 2024-05-17 11:41 ` Richard Biener 0 siblings, 0 replies; 10+ messages in thread From: Richard Biener @ 2024-05-17 11:41 UTC (permalink / raw) To: Manolis Tsamis Cc: Richard Biener, gcc-patches, Jiangning Liu, Philipp Tomsich, Andrew Pinski [-- Attachment #1: Type: text/plain, Size: 11496 bytes --] On Fri, 17 May 2024, Manolis Tsamis wrote: > On Fri, May 17, 2024 at 12:22 PM Richard Biener <rguenther@suse.de> wrote: > > > > On Fri, 17 May 2024, Manolis Tsamis wrote: > > > > > Hi Richard, > > > > > > While I was re-testing the latest version of this patch I noticed that > > > it FAILs an AArch64 test, gcc.target/aarch64/subsp.c. With the patch > > > we generate one instruction more: > > > > > > sbfiz x1, x1, 4, 32 > > > stp x29, x30, [sp, -16]! > > > add x1, x1, 16 > > > mov x29, sp > > > sub sp, sp, x1 > > > mov x0, sp > > > bl foo > > > > > > Instead of: > > > > > > stp x29, x30, [sp, -16]! > > > add w1, w1, 1 > > > mov x29, sp > > > sub sp, sp, w1, sxtw 4 > > > mov x0, sp > > > bl foo > > > > > > I've looked at it but can't really find a way to solve the regression. > > > Any thoughts on this? > > > > Can you explain what goes wrong? As I said rewriting parts of > > address calculation is tricky, there's always the chance that some > > cases regress (see your observation in comment#4 of the PR). > > > > In this case the int -> sizetype cast ends up happening earlier. Instead of > > _7 = y_6(D) + 1; > _1 = (sizetype) _7; > _2 = _1 * 16; > > We get > > _13 = (sizetype) y_6(D); > _15 = _13 + 1; > _2 = _15 * 16; > > and then in RTL we have > > x1 = ((sizetype) x1) << 4 > sp = sp - (x1 + 16) > > instead of > > x1 = x1 + 1 > sp = sp - ((sizetype) x1) << 4 > > which doesn't form sub sp, sp, w1, sxtw 4. > > But more importantly, I realized that (in this case among others) the > pattern is undone by (A * C) +- (B * C) -> (A+-B) * C and (A * C) +- A > -> A * (C+-1). AFAIK having one pattern and its reverse is a bad thing > so something needs to be changed. Yes, we have that issue. And we've guarded GIMPLE vs. non-GIMPLE and have recursion limits in match to deal with this. But yes, having both is bad. I'd say that clearly patterns reducing the number of operations are good at least for canonicalization. > One idea could be to only keep the larger one ((T)(A + CST1)) * CST2 + > CST3 -> ((T)(A) * CST2) + ((T)CST1 * CST2 + CST3). it's not enough to > deal with the testcases of the ticket but it does help in other cases. The issue with such larger patterns is that they hint at the fact the transform should happen with an eye on more than just the small expresion. Thus not in match.pd but in a pass like reassoc or SLSR or IVOPTs or even CSE itself. We also have to avoid doing changes that cannot be undone when canonicalizing. Richard. > Manolis > > > Note that I still believe that avoiding the early and premature > > promotion of the addition to unsigned is a good thing. > > > > Note the testcase in the PR is fixed with -fwrapv because then > > we do _not_ perform this premature optimization. Without -fwrapv > > the optimization is valid but as you note we do not perform it > > consistently - otherwise we wouldn't regress. > > > > Richard. > > > > > > > > > Thanks, > > > Manolis > > > > > > > > > > > > On Thu, May 16, 2024 at 11:15 AM Richard Biener > > > <richard.guenther@gmail.com> wrote: > > > > > > > > On Tue, May 14, 2024 at 10:58 AM Manolis Tsamis <manolis.tsamis@vrull.eu> wrote: > > > > > > > > > > New patch with the requested changes can be found below. > > > > > > > > > > I don't know how much this affects SCEV, but I do believe that we > > > > > should incorporate this change somehow. I've seen various cases of > > > > > suboptimal address calculation codegen that boil down to this. > > > > > > > > This misses the ChangeLog (I assume it's unchanged) and indent > > > > of the match.pd part is now off. > > > > > > > > Please fix that, the patch is OK with that change. > > > > > > > > Thanks, > > > > Richard. > > > > > > > > > gcc/match.pd | 31 +++++++++++++++++++++++++++++++ > > > > > gcc/testsuite/gcc.dg/pr109393.c | 16 ++++++++++++++++ > > > > > 2 files changed, 47 insertions(+) > > > > > create mode 100644 gcc/testsuite/gcc.dg/pr109393.c > > > > > > > > > > diff --git a/gcc/match.pd b/gcc/match.pd > > > > > index 07e743ae464..1d642c205f0 100644 > > > > > --- a/gcc/match.pd > > > > > +++ b/gcc/match.pd > > > > > @@ -3650,6 +3650,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > > > > > (plus (convert @0) (op @2 (convert @1)))))) > > > > > #endif > > > > > +/* ((T)(A + CST1)) * CST2 + CST3 > > > > > + -> ((T)(A) * CST2) + ((T)CST1 * CST2 + CST3) > > > > > + Where (A + CST1) doesn't need to have a single use. */ > > > > > +#if GIMPLE > > > > > + (for op (plus minus) > > > > > + (simplify > > > > > + (plus (mult:s (convert:s (op @0 INTEGER_CST@1)) INTEGER_CST@2) > > > > > + INTEGER_CST@3) > > > > > + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) > > > > > + && INTEGRAL_TYPE_P (type) > > > > > + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)) > > > > > + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)) > > > > > + && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0)) > > > > > + && TYPE_OVERFLOW_WRAPS (type)) > > > > > + (op (mult (convert @0) @2) (plus (mult (convert @1) @2) @3))))) > > > > > +#endif > > > > > + > > > > > +/* ((T)(A + CST1)) * CST2 -> ((T)(A) * CST2) + ((T)CST1 * CST2) */ > > > > > +#if GIMPLE > > > > > + (for op (plus minus) > > > > > + (simplify > > > > > + (mult (convert:s (op:s @0 INTEGER_CST@1)) INTEGER_CST@2) > > > > > + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) > > > > > + && INTEGRAL_TYPE_P (type) > > > > > + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)) > > > > > + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)) > > > > > + && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0)) > > > > > + && TYPE_OVERFLOW_WRAPS (type)) > > > > > + (op (mult (convert @0) @2) (mult (convert @1) @2))))) > > > > > +#endif > > > > > + > > > > > /* (T)(A) +- (T)(B) -> (T)(A +- B) only when (A +- B) could be simplified > > > > > to a simple value. */ > > > > > (for op (plus minus) > > > > > diff --git a/gcc/testsuite/gcc.dg/pr109393.c b/gcc/testsuite/gcc.dg/pr109393.c > > > > > new file mode 100644 > > > > > index 00000000000..e9051273672 > > > > > --- /dev/null > > > > > +++ b/gcc/testsuite/gcc.dg/pr109393.c > > > > > @@ -0,0 +1,16 @@ > > > > > +/* PR tree-optimization/109393 */ > > > > > +/* { dg-do compile } */ > > > > > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > > > > > +/* { dg-final { scan-tree-dump-times "return 1;" 2 "optimized" } } */ > > > > > + > > > > > +int foo(int *a, int j) > > > > > +{ > > > > > + int k = j - 1; > > > > > + return a[j - 1] == a[k]; > > > > > +} > > > > > + > > > > > +int bar(int *a, int j) > > > > > +{ > > > > > + int k = j - 1; > > > > > + return (&a[j + 1] - 2) == &a[k]; > > > > > +} > > > > > -- > > > > > 2.44.0 > > > > > > > > > > > > > > > On Tue, Apr 23, 2024 at 1:33 PM Manolis Tsamis <manolis.tsamis@vrull.eu> wrote: > > > > > > > > > > > > The original motivation for this pattern was that the following function does > > > > > > not fold to 'return 1': > > > > > > > > > > > > int foo(int *a, int j) > > > > > > { > > > > > > int k = j - 1; > > > > > > return a[j - 1] == a[k]; > > > > > > } > > > > > > > > > > > > The expression ((unsigned long) (X +- C1) * C2) appears frequently as part of > > > > > > address calculations (e.g. arrays). These patterns help fold and simplify more > > > > > > expressions. > > > > > > > > > > > > PR tree-optimization/109393 > > > > > > > > > > > > gcc/ChangeLog: > > > > > > > > > > > > * match.pd: Add new patterns for ((T)(A +- CST1)) * CST2 and > > > > > > ((T)(A +- CST1)) * CST2 + CST3. > > > > > > > > > > > > gcc/testsuite/ChangeLog: > > > > > > > > > > > > * gcc.dg/pr109393.c: New test. > > > > > > > > > > > > Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu> > > > > > > --- > > > > > > > > > > > > gcc/match.pd | 30 ++++++++++++++++++++++++++++++ > > > > > > gcc/testsuite/gcc.dg/pr109393.c | 16 ++++++++++++++++ > > > > > > 2 files changed, 46 insertions(+) > > > > > > create mode 100644 gcc/testsuite/gcc.dg/pr109393.c > > > > > > > > > > > > diff --git a/gcc/match.pd b/gcc/match.pd > > > > > > index d401e7503e6..13c828ba70d 100644 > > > > > > --- a/gcc/match.pd > > > > > > +++ b/gcc/match.pd > > > > > > @@ -3650,6 +3650,36 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > > > > > > (plus (convert @0) (op @2 (convert @1)))))) > > > > > > #endif > > > > > > > > > > > > +/* ((T)(A + CST1)) * CST2 + CST3 > > > > > > + -> ((T)(A) * CST2) + ((T)CST1 * CST2 + CST3) > > > > > > + Where (A + CST1) doesn't need to have a single use. */ > > > > > > +#if GIMPLE > > > > > > + (for op (plus minus) > > > > > > + (simplify > > > > > > + (plus (mult (convert:s (op @0 INTEGER_CST@1)) INTEGER_CST@2) INTEGER_CST@3) > > > > > > + (if (TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE > > > > > > + && TREE_CODE (type) == INTEGER_TYPE > > > > > > + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)) > > > > > > + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)) > > > > > > + && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0)) > > > > > > + && TYPE_OVERFLOW_WRAPS (type)) > > > > > > + (op (mult @2 (convert @0)) (plus (mult @2 (convert @1)) @3))))) > > > > > > +#endif > > > > > > + > > > > > > +/* ((T)(A + CST1)) * CST2 -> ((T)(A) * CST2) + ((T)CST1 * CST2) */ > > > > > > +#if GIMPLE > > > > > > + (for op (plus minus) > > > > > > + (simplify > > > > > > + (mult (convert:s (op:s @0 INTEGER_CST@1)) INTEGER_CST@2) > > > > > > + (if (TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE > > > > > > + && TREE_CODE (type) == INTEGER_TYPE > > > > > > + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)) > > > > > > + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)) > > > > > > + && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0)) > > > > > > + && TYPE_OVERFLOW_WRAPS (type)) > > > > > > + (op (mult @2 (convert @0)) (mult @2 (convert @1)))))) > > > > > > +#endif > > > > > > + > > > > > > /* (T)(A) +- (T)(B) -> (T)(A +- B) only when (A +- B) could be simplified > > > > > > to a simple value. */ > > > > > > (for op (plus minus) > > > > > > diff --git a/gcc/testsuite/gcc.dg/pr109393.c b/gcc/testsuite/gcc.dg/pr109393.c > > > > > > new file mode 100644 > > > > > > index 00000000000..e9051273672 > > > > > > --- /dev/null > > > > > > +++ b/gcc/testsuite/gcc.dg/pr109393.c > > > > > > @@ -0,0 +1,16 @@ > > > > > > +/* PR tree-optimization/109393 */ > > > > > > +/* { dg-do compile } */ > > > > > > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > > > > > > +/* { dg-final { scan-tree-dump-times "return 1;" 2 "optimized" } } */ > > > > > > + > > > > > > +int foo(int *a, int j) > > > > > > +{ > > > > > > + int k = j - 1; > > > > > > + return a[j - 1] == a[k]; > > > > > > +} > > > > > > + > > > > > > +int bar(int *a, int j) > > > > > > +{ > > > > > > + int k = j - 1; > > > > > > + return (&a[j + 1] - 2) == &a[k]; > > > > > > +} > > > > > > -- > > > > > > 2.34.1 > > > > > > > > > > > > > -- > > Richard Biener <rguenther@suse.de> > > SUSE Software Solutions Germany GmbH, > > Frankenstrasse 146, 90461 Nuernberg, Germany; > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg) > -- Richard Biener <rguenther@suse.de> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg) ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2024-05-17 11:41 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2024-04-23 10:33 [PATCH] MATCH: Maybe expand (T)(A + C1) * C2 and (T)(A + C1) * C2 + C3 [PR109393] Manolis Tsamis 2024-05-02 13:00 ` Richard Biener 2024-05-02 13:08 ` Manolis Tsamis 2024-05-02 13:27 ` Richard Biener 2024-05-14 8:57 ` Manolis Tsamis 2024-05-16 8:15 ` Richard Biener 2024-05-17 8:23 ` Manolis Tsamis 2024-05-17 9:22 ` Richard Biener 2024-05-17 11:35 ` Manolis Tsamis 2024-05-17 11:41 ` Richard Biener
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).