public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Richard Biener <rguenther@suse.de>
To: Manolis Tsamis <manolis.tsamis@vrull.eu>
Cc: Richard Biener <richard.guenther@gmail.com>,
	gcc-patches@gcc.gnu.org,
	 Jiangning Liu <jiangning.liu@amperecomputing.com>,
	 Philipp Tomsich <philipp.tomsich@vrull.eu>,
	 Andrew Pinski <quic_apinski@quicinc.com>
Subject: Re: [PATCH] MATCH: Maybe expand (T)(A + C1) * C2 and (T)(A + C1) * C2 + C3 [PR109393]
Date: Fri, 17 May 2024 13:41:56 +0200 (CEST)	[thread overview]
Message-ID: <p5470479-1pn6-s0qs-s388-8nr6qs7184n3@fhfr.qr> (raw)
In-Reply-To: <CAM3yNXovPF=Cys5AyTvmHnK-g2GZFU=it0r0dXf62DvGxqJkxw@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 11496 bytes --]

On Fri, 17 May 2024, Manolis Tsamis wrote:

> On Fri, May 17, 2024 at 12:22 PM Richard Biener <rguenther@suse.de> wrote:
> >
> > On Fri, 17 May 2024, Manolis Tsamis wrote:
> >
> > > Hi Richard,
> > >
> > > While I was re-testing the latest version of this patch I noticed that
> > > it FAILs an AArch64 test, gcc.target/aarch64/subsp.c. With the patch
> > > we generate one instruction more:
> > >
> > >         sbfiz   x1, x1, 4, 32
> > >         stp     x29, x30, [sp, -16]!
> > >         add     x1, x1, 16
> > >         mov     x29, sp
> > >         sub     sp, sp, x1
> > >         mov     x0, sp
> > >         bl      foo
> > >
> > > Instead of:
> > >
> > >         stp     x29, x30, [sp, -16]!
> > >         add     w1, w1, 1
> > >         mov     x29, sp
> > >         sub     sp, sp, w1, sxtw 4
> > >         mov     x0, sp
> > >         bl      foo
> > >
> > > I've looked at it but can't really find a way to solve the regression.
> > > Any thoughts on this?
> >
> > Can you explain what goes wrong?  As I said rewriting parts of
> > address calculation is tricky, there's always the chance that some
> > cases regress (see your observation in comment#4 of the PR).
> >
> 
> In this case the int -> sizetype cast ends up happening earlier. Instead of
> 
>   _7 = y_6(D) + 1;
>   _1 = (sizetype) _7;
>   _2 = _1 * 16;
> 
> We get
> 
>   _13 = (sizetype) y_6(D);
>   _15 = _13 + 1;
>   _2 = _15 * 16;
> 
> and then in RTL we have
> 
> x1 = ((sizetype) x1) << 4
> sp = sp - (x1 + 16)
> 
> instead of
> 
> x1 = x1 + 1
> sp = sp - ((sizetype) x1) << 4
> 
> which doesn't form sub sp, sp, w1, sxtw 4.
> 
> But more importantly, I realized that (in this case among others) the
> pattern is undone by (A * C) +- (B * C) -> (A+-B) * C and (A * C) +- A
> -> A * (C+-1). AFAIK having one pattern and its reverse is a bad thing
> so something needs to be changed.

Yes, we have that issue.  And we've guarded GIMPLE vs. non-GIMPLE and
have recursion limits in match to deal with this.  But yes, having
both is bad.  I'd say that clearly patterns reducing the number of
operations are good at least for canonicalization.

> One idea could be to only keep the larger one ((T)(A + CST1)) * CST2 +
> CST3 -> ((T)(A) * CST2) + ((T)CST1 * CST2 + CST3). it's not enough to
> deal with the testcases of the ticket but it does help in other cases.

The issue with such larger patterns is that they hint at the fact
the transform should happen with an eye on more than just the
small expresion.  Thus not in match.pd but in a pass like reassoc
or SLSR or IVOPTs or even CSE itself.  We also have to avoid
doing changes that cannot be undone when canonicalizing.

Richard.

> Manolis
> 
> > Note that I still believe that avoiding the early and premature
> > promotion of the addition to unsigned is a good thing.
> >
> > Note the testcase in the PR is fixed with -fwrapv because then
> > we do _not_ perform this premature optimization.  Without -fwrapv
> > the optimization is valid but as you note we do not perform it
> > consistently - otherwise we wouldn't regress.
> >
> > Richard.
> >
> >
> >
> > > Thanks,
> > > Manolis
> > >
> > >
> > >
> > > On Thu, May 16, 2024 at 11:15 AM Richard Biener
> > > <richard.guenther@gmail.com> wrote:
> > > >
> > > > On Tue, May 14, 2024 at 10:58 AM Manolis Tsamis <manolis.tsamis@vrull.eu> wrote:
> > > > >
> > > > > New patch with the requested changes can be found below.
> > > > >
> > > > > I don't know how much this affects SCEV, but I do believe that we
> > > > > should incorporate this change somehow. I've seen various cases of
> > > > > suboptimal address calculation codegen that boil down to this.
> > > >
> > > > This misses the ChangeLog (I assume it's unchanged) and indent
> > > > of the match.pd part is now off.
> > > >
> > > > Please fix that, the patch is OK with that change.
> > > >
> > > > Thanks,
> > > > Richard.
> > > >
> > > > > gcc/match.pd | 31 +++++++++++++++++++++++++++++++
> > > > > gcc/testsuite/gcc.dg/pr109393.c | 16 ++++++++++++++++
> > > > > 2 files changed, 47 insertions(+)
> > > > > create mode 100644 gcc/testsuite/gcc.dg/pr109393.c
> > > > >
> > > > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > > > index 07e743ae464..1d642c205f0 100644
> > > > > --- a/gcc/match.pd
> > > > > +++ b/gcc/match.pd
> > > > > @@ -3650,6 +3650,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > > > > (plus (convert @0) (op @2 (convert @1))))))
> > > > > #endif
> > > > > +/* ((T)(A + CST1)) * CST2 + CST3
> > > > > + -> ((T)(A) * CST2) + ((T)CST1 * CST2 + CST3)
> > > > > + Where (A + CST1) doesn't need to have a single use. */
> > > > > +#if GIMPLE
> > > > > + (for op (plus minus)
> > > > > + (simplify
> > > > > + (plus (mult:s (convert:s (op @0 INTEGER_CST@1)) INTEGER_CST@2)
> > > > > + INTEGER_CST@3)
> > > > > + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> > > > > + && INTEGRAL_TYPE_P (type)
> > > > > + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> > > > > + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> > > > > + && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
> > > > > + && TYPE_OVERFLOW_WRAPS (type))
> > > > > + (op (mult (convert @0) @2) (plus (mult (convert @1) @2) @3)))))
> > > > > +#endif
> > > > > +
> > > > > +/* ((T)(A + CST1)) * CST2 -> ((T)(A) * CST2) + ((T)CST1 * CST2) */
> > > > > +#if GIMPLE
> > > > > + (for op (plus minus)
> > > > > + (simplify
> > > > > + (mult (convert:s (op:s @0 INTEGER_CST@1)) INTEGER_CST@2)
> > > > > + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> > > > > + && INTEGRAL_TYPE_P (type)
> > > > > + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> > > > > + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> > > > > + && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
> > > > > + && TYPE_OVERFLOW_WRAPS (type))
> > > > > + (op (mult (convert @0) @2) (mult (convert @1) @2)))))
> > > > > +#endif
> > > > > +
> > > > > /* (T)(A) +- (T)(B) -> (T)(A +- B) only when (A +- B) could be simplified
> > > > > to a simple value. */
> > > > > (for op (plus minus)
> > > > > diff --git a/gcc/testsuite/gcc.dg/pr109393.c b/gcc/testsuite/gcc.dg/pr109393.c
> > > > > new file mode 100644
> > > > > index 00000000000..e9051273672
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/pr109393.c
> > > > > @@ -0,0 +1,16 @@
> > > > > +/* PR tree-optimization/109393 */
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-options "-O2 -fdump-tree-optimized" } */
> > > > > +/* { dg-final { scan-tree-dump-times "return 1;" 2 "optimized" } } */
> > > > > +
> > > > > +int foo(int *a, int j)
> > > > > +{
> > > > > + int k = j - 1;
> > > > > + return a[j - 1] == a[k];
> > > > > +}
> > > > > +
> > > > > +int bar(int *a, int j)
> > > > > +{
> > > > > + int k = j - 1;
> > > > > + return (&a[j + 1] - 2) == &a[k];
> > > > > +}
> > > > > --
> > > > > 2.44.0
> > > > >
> > > > >
> > > > > On Tue, Apr 23, 2024 at 1:33 PM Manolis Tsamis <manolis.tsamis@vrull.eu> wrote:
> > > > > >
> > > > > > The original motivation for this pattern was that the following function does
> > > > > > not fold to 'return 1':
> > > > > >
> > > > > > int foo(int *a, int j)
> > > > > > {
> > > > > >   int k = j - 1;
> > > > > >   return a[j - 1] == a[k];
> > > > > > }
> > > > > >
> > > > > > The expression ((unsigned long) (X +- C1) * C2) appears frequently as part of
> > > > > > address calculations (e.g. arrays). These patterns help fold and simplify more
> > > > > > expressions.
> > > > > >
> > > > > >         PR tree-optimization/109393
> > > > > >
> > > > > > gcc/ChangeLog:
> > > > > >
> > > > > >         * match.pd: Add new patterns for ((T)(A +- CST1)) * CST2 and
> > > > > >           ((T)(A +- CST1)) * CST2 + CST3.
> > > > > >
> > > > > > gcc/testsuite/ChangeLog:
> > > > > >
> > > > > >         * gcc.dg/pr109393.c: New test.
> > > > > >
> > > > > > Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu>
> > > > > > ---
> > > > > >
> > > > > >  gcc/match.pd                    | 30 ++++++++++++++++++++++++++++++
> > > > > >  gcc/testsuite/gcc.dg/pr109393.c | 16 ++++++++++++++++
> > > > > >  2 files changed, 46 insertions(+)
> > > > > >  create mode 100644 gcc/testsuite/gcc.dg/pr109393.c
> > > > > >
> > > > > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > > > > index d401e7503e6..13c828ba70d 100644
> > > > > > --- a/gcc/match.pd
> > > > > > +++ b/gcc/match.pd
> > > > > > @@ -3650,6 +3650,36 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > > > > >         (plus (convert @0) (op @2 (convert @1))))))
> > > > > >  #endif
> > > > > >
> > > > > > +/* ((T)(A + CST1)) * CST2 + CST3
> > > > > > +     -> ((T)(A) * CST2) + ((T)CST1 * CST2 + CST3)
> > > > > > +   Where (A + CST1) doesn't need to have a single use.  */
> > > > > > +#if GIMPLE
> > > > > > +  (for op (plus minus)
> > > > > > +   (simplify
> > > > > > +    (plus (mult (convert:s (op @0 INTEGER_CST@1)) INTEGER_CST@2) INTEGER_CST@3)
> > > > > > +     (if (TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE
> > > > > > +         && TREE_CODE (type) == INTEGER_TYPE
> > > > > > +         && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> > > > > > +         && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> > > > > > +         && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
> > > > > > +         && TYPE_OVERFLOW_WRAPS (type))
> > > > > > +       (op (mult @2 (convert @0)) (plus (mult @2 (convert @1)) @3)))))
> > > > > > +#endif
> > > > > > +
> > > > > > +/* ((T)(A + CST1)) * CST2 -> ((T)(A) * CST2) + ((T)CST1 * CST2)  */
> > > > > > +#if GIMPLE
> > > > > > +  (for op (plus minus)
> > > > > > +   (simplify
> > > > > > +    (mult (convert:s (op:s @0 INTEGER_CST@1)) INTEGER_CST@2)
> > > > > > +     (if (TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE
> > > > > > +         && TREE_CODE (type) == INTEGER_TYPE
> > > > > > +         && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> > > > > > +         && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> > > > > > +         && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
> > > > > > +         && TYPE_OVERFLOW_WRAPS (type))
> > > > > > +       (op (mult @2 (convert @0)) (mult @2 (convert @1))))))
> > > > > > +#endif
> > > > > > +
> > > > > >  /* (T)(A) +- (T)(B) -> (T)(A +- B) only when (A +- B) could be simplified
> > > > > >     to a simple value.  */
> > > > > >    (for op (plus minus)
> > > > > > diff --git a/gcc/testsuite/gcc.dg/pr109393.c b/gcc/testsuite/gcc.dg/pr109393.c
> > > > > > new file mode 100644
> > > > > > index 00000000000..e9051273672
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.dg/pr109393.c
> > > > > > @@ -0,0 +1,16 @@
> > > > > > +/* PR tree-optimization/109393 */
> > > > > > +/* { dg-do compile } */
> > > > > > +/* { dg-options "-O2 -fdump-tree-optimized" } */
> > > > > > +/* { dg-final { scan-tree-dump-times "return 1;" 2 "optimized" } } */
> > > > > > +
> > > > > > +int foo(int *a, int j)
> > > > > > +{
> > > > > > +  int k = j - 1;
> > > > > > +  return a[j - 1] == a[k];
> > > > > > +}
> > > > > > +
> > > > > > +int bar(int *a, int j)
> > > > > > +{
> > > > > > +  int k = j - 1;
> > > > > > +  return (&a[j + 1] - 2) == &a[k];
> > > > > > +}
> > > > > > --
> > > > > > 2.34.1
> > > > > >
> > >
> >
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

      reply	other threads:[~2024-05-17 11:41 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-23 10:33 Manolis Tsamis
2024-05-02 13:00 ` Richard Biener
2024-05-02 13:08   ` Manolis Tsamis
2024-05-02 13:27     ` Richard Biener
2024-05-14  8:57 ` Manolis Tsamis
2024-05-16  8:15   ` Richard Biener
2024-05-17  8:23     ` Manolis Tsamis
2024-05-17  9:22       ` Richard Biener
2024-05-17 11:35         ` Manolis Tsamis
2024-05-17 11:41           ` Richard Biener [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=p5470479-1pn6-s0qs-s388-8nr6qs7184n3@fhfr.qr \
    --to=rguenther@suse.de \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jiangning.liu@amperecomputing.com \
    --cc=manolis.tsamis@vrull.eu \
    --cc=philipp.tomsich@vrull.eu \
    --cc=quic_apinski@quicinc.com \
    --cc=richard.guenther@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).