public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH/RFC] Simplify wrapped RTL op
@ 2019-08-27  9:47 Robin Dapp
  2019-08-27 10:50 ` Segher Boessenkool
  0 siblings, 1 reply; 6+ messages in thread
From: Robin Dapp @ 2019-08-27  9:47 UTC (permalink / raw)
  To: GCC Patches

Hi,

as announced in the wrapped-binop gimple patch mail, on s390 we still
emit odd code in front of loops:

  void v1 (unsigned long *in, unsigned long *out, unsigned int n)
  {
    int i;
    for (i = 0; i < n; i++)
    {
      out[i] = in[i];
    }
   }

   -->

   aghi    %r1,-8
   srlg    %r1,%r1,3
   aghi    %r1,1

This is created by doloop after getting niter from the loop as n - 1 or
"n * 8 - 8" with a step width of 8.  Realizing s390's doloop pattern
compares against 1, we add 1 to niter resulting in the code above.

When going a similar route as with the gimple patch, something like

diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 9359a3cdb4d..9c06c9b6ee9 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -2364,6 +2364,24 @@ simplify_binary_operation_1 (enum rtx_code code,
machine_mode mode,
                                                           in1, in2));
        }

+      /* Transform (plus (lshiftrt (plus A -C1) C2) C3) to (lshiftrt A C2)
+         if C1 == -C3 * (1 << C2).  */
+      if (CONST_SCALAR_INT_P (op1)
+         && GET_CODE (op0) == LSHIFTRT
+         && CONST_SCALAR_INT_P (XEXP (op0, 1))
+         && GET_CODE (XEXP (op0, 0)) == PLUS
+         && CONST_SCALAR_INT_P (XEXP (XEXP (op0, 0), 1)))
+       {
+         rtx c3 = op1;
+         rtx c2 = XEXP (op0, 1);
+         rtx c1 = XEXP (XEXP (op0, 0), 1);
+
+         rtx a = XEXP (XEXP (op0, 0), 0);
+
+         if (-INTVAL (c3) * (1 << INTVAL (c2)) == INTVAL (c1))
+           return simplify_gen_binary (LSHIFTRT, mode, a, c2);
+       }
+
       /* (plus (comparison A B) C) can become (neg (rev-comp A B)) if
         C is 1 and STORE_FLAG_VALUE is -1 or if C is -1 and
STORE_FLAG_VALUE
         is 1.  */

helps immediately, yet overflow/range information is not considered.  Do
we somehow guarantee that the niter-related we created until doloop do
not overflow?  I did not note something when looking through the code.
Granted, the simplification seems oddly specific and is probably not
useful for a wide range of targets and situations.


Another approach would be to store "niter+1" (== n) when niter (== n-1)
is calculated and, when we need to do the increment, use the niter+1
that we already have without needing to simplify (n - 8) >> 3 + 1.

Any comments on this?

The patch above bootstraps and test suite is without regressions on s390
fwiw.

Regards
 Robin

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH/RFC] Simplify wrapped RTL op
  2019-08-27  9:47 [PATCH/RFC] Simplify wrapped RTL op Robin Dapp
@ 2019-08-27 10:50 ` Segher Boessenkool
  2019-08-28  7:44   ` Segher Boessenkool
  0 siblings, 1 reply; 6+ messages in thread
From: Segher Boessenkool @ 2019-08-27 10:50 UTC (permalink / raw)
  To: Robin Dapp; +Cc: GCC Patches

On Tue, Aug 27, 2019 at 11:12:32AM +0200, Robin Dapp wrote:
> as announced in the wrapped-binop gimple patch mail, on s390 we still
> emit odd code in front of loops:

>    aghi    %r1,-8
>    srlg    %r1,%r1,3
>    aghi    %r1,1

This is done like this because %r1 might be 0.

We see this same problem on Power; there are quite a few PRs about it.

[ ... ]

> helps immediately, yet overflow/range information is not considered.

Yeah, and it has to be.

> Do
> we somehow guarantee that the niter-related we created until doloop do
> not overflow?  I did not note something when looking through the code.
> Granted, the simplification seems oddly specific and is probably not
> useful for a wide range of targets and situations.

You're at least the third target, and it's pretty annoying, and it tends
to cost more than two insns (because things can often be simplified
further after this).  It won't do super much for execution time, there
is a loop after this after all, a handful of insns executed once can't
be all that expensive relatively.

> Another approach would be to store "niter+1" (== n) when niter (== n-1)
> is calculated and, when we need to do the increment, use the niter+1
> that we already have without needing to simplify (n - 8) >> 3 + 1.
> 
> Any comments on this?
> 
> The patch above bootstraps and test suite is without regressions on s390
> fwiw.

When something similar was tried before there were regressions for
rs6000.  I'll find the PR later.

I was hoping that now that ivopts learns about doloops, this can be
handled better as well.  Ideally the doloop pass can move closer to
expand, and do much less analysis and work, all the heavy lifting has
been done already.


Segher

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH/RFC] Simplify wrapped RTL op
  2019-08-27 10:50 ` Segher Boessenkool
@ 2019-08-28  7:44   ` Segher Boessenkool
  2019-08-28  8:14     ` Segher Boessenkool
  0 siblings, 1 reply; 6+ messages in thread
From: Segher Boessenkool @ 2019-08-28  7:44 UTC (permalink / raw)
  To: Robin Dapp; +Cc: GCC Patches

On Tue, Aug 27, 2019 at 05:09:52AM -0500, Segher Boessenkool wrote:
> On Tue, Aug 27, 2019 at 11:12:32AM +0200, Robin Dapp wrote:
> > as announced in the wrapped-binop gimple patch mail, on s390 we still
> > emit odd code in front of loops:
> 
> >    aghi    %r1,-8
> >    srlg    %r1,%r1,3
> >    aghi    %r1,1
> 
> This is done like this because %r1 might be 0.
> 
> We see this same problem on Power; there are quite a few PRs about it.
> 
> [ ... ]
> 
> > helps immediately, yet overflow/range information is not considered.
> 
> Yeah, and it has to be.
> 
> > Do
> > we somehow guarantee that the niter-related we created until doloop do
> > not overflow?  I did not note something when looking through the code.
> > Granted, the simplification seems oddly specific and is probably not
> > useful for a wide range of targets and situations.
> 
> You're at least the third target, and it's pretty annoying, and it tends
> to cost more than two insns (because things can often be simplified
> further after this).  It won't do super much for execution time, there
> is a loop after this after all, a handful of insns executed once can't
> be all that expensive relatively.
> 
> > Another approach would be to store "niter+1" (== n) when niter (== n-1)
> > is calculated and, when we need to do the increment, use the niter+1
> > that we already have without needing to simplify (n - 8) >> 3 + 1.
> > 
> > Any comments on this?
> > 
> > The patch above bootstraps and test suite is without regressions on s390
> > fwiw.
> 
> When something similar was tried before there were regressions for
> rs6000.  I'll find the PR later.

PR37451.  Not clear what target that regressed on, btw.

> I was hoping that now that ivopts learns about doloops, this can be
> handled better as well.  Ideally the doloop pass can move closer to
> expand, and do much less analysis and work, all the heavy lifting has
> been done already.


Segher

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH/RFC] Simplify wrapped RTL op
  2019-08-28  7:44   ` Segher Boessenkool
@ 2019-08-28  8:14     ` Segher Boessenkool
  2019-08-29  9:31       ` Robin Dapp
  0 siblings, 1 reply; 6+ messages in thread
From: Segher Boessenkool @ 2019-08-28  8:14 UTC (permalink / raw)
  To: Robin Dapp; +Cc: GCC Patches

On Wed, Aug 28, 2019 at 02:05:58AM -0500, Segher Boessenkool wrote:
> On Tue, Aug 27, 2019 at 05:09:52AM -0500, Segher Boessenkool wrote:
> > On Tue, Aug 27, 2019 at 11:12:32AM +0200, Robin Dapp wrote:
> > > as announced in the wrapped-binop gimple patch mail, on s390 we still
> > > emit odd code in front of loops:
> > 
> > >    aghi    %r1,-8
> > >    srlg    %r1,%r1,3
> > >    aghi    %r1,1
> > 
> > This is done like this because %r1 might be 0.
> > 
> > We see this same problem on Power; there are quite a few PRs about it.
> > 
> > [ ... ]
> > 
> > > helps immediately, yet overflow/range information is not considered.
> > 
> > Yeah, and it has to be.
> > 
> > > Do
> > > we somehow guarantee that the niter-related we created until doloop do
> > > not overflow?  I did not note something when looking through the code.
> > > Granted, the simplification seems oddly specific and is probably not
> > > useful for a wide range of targets and situations.
> > 
> > You're at least the third target, and it's pretty annoying, and it tends
> > to cost more than two insns (because things can often be simplified
> > further after this).  It won't do super much for execution time, there
> > is a loop after this after all, a handful of insns executed once can't
> > be all that expensive relatively.
> > 
> > > Another approach would be to store "niter+1" (== n) when niter (== n-1)
> > > is calculated and, when we need to do the increment, use the niter+1
> > > that we already have without needing to simplify (n - 8) >> 3 + 1.
> > > 
> > > Any comments on this?
> > > 
> > > The patch above bootstraps and test suite is without regressions on s390
> > > fwiw.
> > 
> > When something similar was tried before there were regressions for
> > rs6000.  I'll find the PR later.
> 
> PR37451.  Not clear what target that regressed on, btw.

And PR55190 and PR67288 and probably more.

> > I was hoping that now that ivopts learns about doloops, this can be
> > handled better as well.  Ideally the doloop pass can move closer to
> > expand, and do much less analysis and work, all the heavy lifting has
> > been done already.


Segher

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH/RFC] Simplify wrapped RTL op
  2019-08-28  8:14     ` Segher Boessenkool
@ 2019-08-29  9:31       ` Robin Dapp
  2019-08-29 13:57         ` Segher Boessenkool
  0 siblings, 1 reply; 6+ messages in thread
From: Robin Dapp @ 2019-08-29  9:31 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: GCC Patches

>> PR37451.  Not clear what target that regressed on, btw.
> 
> And PR55190 and PR67288 and probably more.

Thanks for finding those.  So the hope is to get this fixed or rather
move towards a fix with the patch series that's currently reviewed which
injects some doloop knowledge into ivopts?

As said before, I was thinking of storing niter + 1 somewhere and use
this instead of doing the + 1 later when it cannot be simplified
anymore. But if we expect a larger rewrite anyway, it's probably not
worthwhile to pursue that now.

Regards
 Robin

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH/RFC] Simplify wrapped RTL op
  2019-08-29  9:31       ` Robin Dapp
@ 2019-08-29 13:57         ` Segher Boessenkool
  0 siblings, 0 replies; 6+ messages in thread
From: Segher Boessenkool @ 2019-08-29 13:57 UTC (permalink / raw)
  To: Robin Dapp; +Cc: GCC Patches

Hi Robin,

On Thu, Aug 29, 2019 at 11:08:11AM +0200, Robin Dapp wrote:
> >> PR37451.  Not clear what target that regressed on, btw.
> > 
> > And PR55190 and PR67288 and probably more.
> 
> Thanks for finding those.  So the hope is to get this fixed or rather
> move towards a fix with the patch series that's currently reviewed which
> injects some doloop knowledge into ivopts?

The long-term plan is to do more and more of the loop optimisation earlier,
at Gimple level, and to certainly do almost no analysis on the RTL, as done
currently.  But this is a longer term still somewhat vague plan.

Right now ivopts decides if a loop can use doloop, and costs stuff based
on that.  Next steps will be to communicate "please use / do not use doloop
here" down to RTL.  Perhaps the doloop expansion should move closer to the
expand pass as well.

And it's not just doloop -- the unrolling should be done earlier, too.

How much of this will ever work out, and how much of it will make GCC 10,
hrm I need a new crystal ball :-)

> As said before, I was thinking of storing niter + 1 somewhere and use
> this instead of doing the + 1 later when it cannot be simplified
> anymore. But if we expect a larger rewrite anyway, it's probably not
> worthwhile to pursue that now.

This sounds like a pretty simple short-term solution.  If it works, I'm
all for it :-)


Segher

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2019-08-29 12:13 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-27  9:47 [PATCH/RFC] Simplify wrapped RTL op Robin Dapp
2019-08-27 10:50 ` Segher Boessenkool
2019-08-28  7:44   ` Segher Boessenkool
2019-08-28  8:14     ` Segher Boessenkool
2019-08-29  9:31       ` Robin Dapp
2019-08-29 13:57         ` Segher Boessenkool

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).