public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/97784] New: Expressions evaluated as long chain instead of as tree or the like
From: segher at gcc dot gnu.org @ 2020-11-10 16:51 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97784

            Bug ID: 97784
           Summary: Expressions evaluated as long chain instead of as tree
                    or the like
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: segher at gcc dot gnu.org
  Target Milestone: ---

When compiling something like

#define O +
long x4(long x, long a, long b, long c, long d) { return x O a O b O c O d; }

we end up with machine code like

        add 3,3,4        # 10   [c=4 l=4]  *adddi3/0
        add 3,3,5        # 11   [c=4 l=4]  *adddi3/0
        add 3,3,6        # 12   [c=4 l=4]  *adddi3/0
        add 3,3,7        # 18   [c=4 l=4]  *adddi3/0
        blr              # 30   [c=4 l=4]  simple_return

Each of those "add" insns depends on the result of the previous one,
making this slower than necessary: it has the latency of 4 add insns in
series, while some of the adds could be done in parallel (evaluated as a
tree, the critical path is only 3 adds).


This problem already exists at the GIMPLE level:

  _1 = x_4(D) + a_5(D);
  _2 = _1 + b_6(D);
  _3 = _2 + c_7(D);
  _9 = _3 + d_8(D);
  return _9;
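
For comparison, here is a minimal sketch of the same sum written once as a
chain and once as a tree.  The function names x4_chain and x4_tree are made up
for illustration, and unsigned long is used so that regrouping cannot change
the result.  Writing the tree by hand does not guarantee the compiler keeps
that shape.

```c
/* Linear chain: each add needs the previous result, so the critical
   path is 4 dependent adds.  */
unsigned long x4_chain(unsigned long x, unsigned long a, unsigned long b,
                       unsigned long c, unsigned long d)
{
    return (((x + a) + b) + c) + d;
}

/* Tree: t0 and t1 do not depend on each other, so they can issue in
   parallel and the critical path is 3 dependent adds.  */
unsigned long x4_tree(unsigned long x, unsigned long a, unsigned long b,
                      unsigned long c, unsigned long d)
{
    unsigned long t0 = x + a;
    unsigned long t1 = b + c;
    return (t0 + t1) + d;
}
```

Both functions return the same value for all inputs because unsigned
arithmetic wraps modulo 2^N.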


A very similar problem also happens as a result of RTL unrolling.


* [Bug rtl-optimization/97784] Expressions evaluated as long chain instead of as tree or the like
From: pinskia at gcc dot gnu.org @ 2020-11-10 16:58 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97784

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Reassociation is not done for signed types, or in other places where it could
introduce overflows.

If you use -fwrapv, you should get the optimization. Likewise for unsigned types.
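
A small sketch of why regrouping is a concern for signed types: the same three
values can be summed without overflow in one order and with overflow in
another.  The helper names are made up for this illustration;
__builtin_add_overflow is the GCC builtin that reports whether an addition
overflowed.

```c
#include <limits.h>
#include <stdbool.h>

/* True if evaluating (x + a) + b overflows at any step.  */
bool chain_overflows(long x, long a, long b)
{
    long t, r;
    return __builtin_add_overflow(x, a, &t)
           || __builtin_add_overflow(t, b, &r);
}

/* True if evaluating x + (a + b) overflows at any step.  */
bool regrouped_overflows(long x, long a, long b)
{
    long t, r;
    return __builtin_add_overflow(a, b, &t)
           || __builtin_add_overflow(x, t, &r);
}
```

With x = -1, a = LONG_MAX, b = 1 the chain order stays in range, while the
regrouped order overflows in the intermediate a + b.  With wrapping semantics
(unsigned types, or -fwrapv) both orders give the same final value.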


* [Bug rtl-optimization/97784] Expressions evaluated as long chain instead of as tree or the like
From: segher at gcc dot gnu.org @ 2020-11-10 17:50 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97784

--- Comment #2 from Segher Boessenkool <segher at gcc dot gnu.org> ---
No, it is exactly the same with unsigned types :-(

Use  -Dlong="unsigned long"  or use  #define O ^  (as in my original test).
I forgot about this signed thing, but it has nothing to do with it (that
matters on gimple level, sure, but the problem exists in pure RTL as well).


* [Bug target/97784] Expressions evaluated as long chain instead of as tree or the like
From: rguenth at gcc dot gnu.org @ 2020-11-11  7:28 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97784

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|rtl-optimization            |target
             Target|                            |powerpc

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
There is targetm.sched.reassociation_width which specifies how reassociation
should make such a sequence "wide".  Andrew is correct that we don't do this
for any types that are TYPE_OVERFLOW_UNDEFINED.

And powerpc has

static int
rs6000_reassociation_width (unsigned int opc ATTRIBUTE_UNUSED,
                            machine_mode mode)
{
  switch (rs6000_tune)
    {
    case PROCESSOR_POWER8:
    case PROCESSOR_POWER9:
    case PROCESSOR_POWER10:
      if (DECIMAL_FLOAT_MODE_P (mode))
        return 1;
      if (VECTOR_MODE_P (mode))
        return 4;
      if (INTEGRAL_MODE_P (mode))
        return 1;

thus you get width 1 which means a linear chain (even if the user wrote
a tree).
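
To put numbers on the difference between a linear chain and a tree, the
following sketch computes the critical-path length for n operands.  It is a
toy model with made-up function names, not GCC's reassociation algorithm, and
it assumes every operation has the same latency.

```c
#include <assert.h>

/* Dependent operations on the critical path of a linear chain
   ((a0 op a1) op a2) ... op a(n-1).  */
unsigned chain_depth(unsigned n)
{
    assert(n >= 1);
    return n - 1;
}

/* Critical path of a balanced tree over n operands, i.e. ceil(log2(n)),
   computed as the number of significant bits in n - 1.  */
unsigned tree_depth(unsigned n)
{
    assert(n >= 1);
    unsigned depth = 0;
    for (unsigned rest = n - 1; rest > 0; rest /= 2)
        depth++;
    return depth;
}
```

For the five-operand example in this report, the chain has depth 4 and a
balanced tree has depth 3.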

Note that RTL doesn't do anything like reassociation (I guess in principle
scheduling could, and that's the only place where it would make sense
on RTL).


* [Bug target/97784] Expressions evaluated as long chain instead of as tree or the like
From: rguenth at gcc dot gnu.org @ 2020-11-11  7:34 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97784

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
Oh, and for the GIMPLE side there's still my TODO item of "finishing" turning
late GIMPLE to -fwrapv so as to make the last reassoc pass effective here.  But
it's again a bit late for that.  Hmm, I might simply try ...


* [Bug target/97784] Expressions evaluated as long chain instead of as tree or the like
From: rguenth at gcc dot gnu.org @ 2020-11-11  9:10 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97784

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 49544
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49544&action=edit
patch doing -fwrapv late

One piece of the approach would move reassoc after the last VRP pass:

diff --git a/gcc/passes.def b/gcc/passes.def
index c68231287b6..872511442f1 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -317,7 +317,6 @@ along with GCC; see the file COPYING3.  If not see
       NEXT_PASS (pass_lower_vector_ssa);
       NEXT_PASS (pass_lower_switch);
       NEXT_PASS (pass_cse_reciprocals);
-      NEXT_PASS (pass_reassoc, false /* insert_powi_p */);
       NEXT_PASS (pass_strength_reduction);
       NEXT_PASS (pass_split_paths);
       NEXT_PASS (pass_tracer);
@@ -332,6 +331,8 @@ along with GCC; see the file COPYING3.  If not see
       /* Threading can leave many const/copy propagations in the IL.
         Clean them up.  */
       NEXT_PASS (pass_copy_prop);
+      NEXT_PASS (pass_reassoc, false /* insert_powi_p */);
       NEXT_PASS (pass_warn_restrict);
       NEXT_PASS (pass_dse);
       NEXT_PASS (pass_cd_dce);

that seems to at least regress

FAIL: gcc.dg/tree-ssa/pr96480.c scan-tree-dump optimized " = _[0-9]* <= 3;"

maybe also some Wstringop-overflow.c diagnostics.

Now, altering a flag is a bit awkward since we have to restore it somewhere
as 'flag_wrapv' is global state also affecting other functions not yet in
late state.  A cleaner approach would be to move flag_wrapv (& friends)
to struct function fully (and change all users) much like we did for
can_throw_non_call_exceptions & flag_non_call_exceptions.  Maybe it's not
too bad (not so many users of flag_wrapv), but for now I have coded the
"ugly" variant here.

We also can't alter flag_wrapv in case flag_trapv is set, obviously.
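
A rough sketch of the "cleaner" per-function variant described above, as
standalone C.  Everything here is hypothetical: struct fn_state and
enter_late_wrapv are invented for illustration and are not GCC's struct
function or any existing GCC interface.

```c
#include <stdbool.h>

/* Hypothetical per-function overflow semantics, standing in for state
   that currently lives in global flags.  */
struct fn_state {
    bool wrapv;   /* signed overflow wraps */
    bool trapv;   /* signed overflow traps */
    bool late;    /* function has reached the late pass pipeline */
};

/* Mark one function as being in the late pipeline and switch it to
   wrapping semantics, unless trapping was requested.  Returns true if
   the switch happened.  Other functions are unaffected, so nothing has
   to be restored afterwards.  */
bool enter_late_wrapv(struct fn_state *fn)
{
    fn->late = true;
    if (fn->trapv)
        return false;   /* trapping semantics must be preserved */
    fn->wrapv = true;
    return true;
}
```

Because the state is per function, a function that is not yet in the late
pipeline keeps its undefined-overflow semantics while another one has already
switched.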

It might be interesting to do some more pass shuffling here, eventually
moving VRP a bit earlier.  We're doing a bit much threading and CSE late
nowadays.  It's also not entirely clear what late passes actually benefit
from undefined overflow (besides VRP, that is).

Bootstrapped / tested on x86_64-unknown-linux-gnu.

FAIL: gcc.dg/pr64434.c scan-rtl-dump-times expand "Swap operands" 1
FAIL: gcc.dg/tree-ssa/phi-opt-15.c scan-tree-dump-not optimized "ABS"
FAIL: gcc.dg/tree-ssa/pr44133.c (test for excess errors)
FAIL: gcc.dg/tree-ssa/pr92712-3.c scan-tree-dump-not optimized " = [tv]_[0-9]*\\\\(D\\\\) \\\\* [tv]_[0-9]*\\\\(D\\\\);"
FAIL: gcc.dg/tree-ssa/pr96480.c scan-tree-dump optimized " = _[0-9]* <= 3;"
XPASS: gcc.dg/tree-ssa/reassoc-2.c scan-tree-dump-times optimized "return 0" 1
FAIL: gcc.dg/tree-ssa/slsr-10.c scan-tree-dump-times optimized " \\\\* " 1
FAIL: gcc.dg/tree-ssa/slsr-11.c scan-tree-dump-times optimized " \\\\* " 1
FAIL: gcc.dg/tree-ssa/slsr-13.c scan-tree-dump-times optimized " \\\\* 4" 2
FAIL: gcc.dg/tree-ssa/slsr-20.c scan-tree-dump-times optimized " \\\\* s" 1
FAIL: gcc.dg/tree-ssa/slsr-31.c scan-tree-dump-times optimized " \\\\* 2" 1
FAIL: gcc.dg/tree-ssa/slsr-32.c scan-tree-dump-times optimized " \\\\* 2" 1
FAIL: gcc.dg/tree-ssa/slsr-33.c scan-tree-dump-times optimized " \\\\* " 1
FAIL: gcc.dg/tree-ssa/slsr-34.c scan-tree-dump-times optimized " \\\\* " 1
FAIL: gcc.dg/tree-ssa/slsr-37.c scan-tree-dump-times optimized " \\\\* 2" 1
FAIL: gcc.dg/tree-ssa/slsr-38.c scan-tree-dump-times optimized " \\\\* " 1
FAIL: gcc.dg/tree-ssa/slsr-5.c scan-tree-dump-times optimized " \\\\* " 1
FAIL: gcc.dg/tree-ssa/slsr-7.c scan-tree-dump-times optimized " \\\\* " 1
FAIL: gcc.dg/tree-ssa/slsr-9.c scan-tree-dump-times optimized " \\\\* " 1


* [Bug target/97784] Expressions evaluated as long chain instead of as tree or the like
From: segher at gcc dot gnu.org @ 2020-11-11 20:32 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97784

--- Comment #6 from Segher Boessenkool <segher at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #3)
> There is targetm.sched.reassociation_width which specifies how reassociation
> should make such a sequence "wide".

Ah cool, thank you :-)

> Andrew is correct that we don't do this
> for any types that are TYPE_OVERFLOW_UNDEFINED.

Yes; but I see the sub-optimal behaviour for unsigned, too.

> And powerpc has
> 
> static int
> rs6000_reassociation_width (unsigned int opc ATTRIBUTE_UNUSED,
>                             machine_mode mode)
> {
>   switch (rs6000_tune)
>     {
>     case PROCESSOR_POWER8:
>     case PROCESSOR_POWER9:
>     case PROCESSOR_POWER10:
>       if (DECIMAL_FLOAT_MODE_P (mode))
>         return 1;
>       if (VECTOR_MODE_P (mode))
>         return 4;
>       if (INTEGRAL_MODE_P (mode))
>         return 1;

Yeah this last 1 is the problem :-)

> thus you get width 1 which means a linear chain (even if the user wrote
> a tree).

Yup.

> Note that RTL doesn't do anything like reassociation (I guess in principle
> scheduling could, and that's the only place where it would make sense
> on RTL).

RTL unrolling can, actually!  "Variable expansion" is its horrible name
(and it makes a lot of sense there: it allows breaking a big linear chain
into pieces).
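
As a source-level illustration of what variable expansion does to an unrolled
reduction, here is a sketch with made-up function names.  It shows the idea
only, not the RTL the unroller produces.  (In GCC the transformation is
controlled by -fvariable-expansion-in-unroller.)

```c
#include <stddef.h>

/* Unrolled by 4 with a single accumulator: every add still depends on
   the previous one, so the loop body is one long chain.  */
unsigned long sum_chain(const unsigned long *v, size_t n)
{
    unsigned long s = 0;
    size_t i = 0;
    for (; n - i >= 4; i += 4)
        s = s + v[i] + v[i + 1] + v[i + 2] + v[i + 3];
    for (; i < n; i++)
        s += v[i];
    return s;
}

/* Accumulator expanded into four: the four chains are independent and
   are only combined after the loop.  */
unsigned long sum_expanded(const unsigned long *v, size_t n)
{
    unsigned long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; n - i >= 4; i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    for (; i < n; i++)
        s0 += v[i];
    return (s0 + s1) + (s2 + s3);
}
```

Both versions return the same sum for unsigned elements; the expanded one has
a per-iteration critical path of one add instead of four.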


* [Bug target/97784] Expressions evaluated as long chain instead of as tree or the like
From: pinskia at gcc dot gnu.org @ 2022-10-31  6:35 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97784

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
           Keywords|                            |missed-optimization


* [Bug target/97784] Expressions evaluated as long chain instead of as tree or the like
From: pinskia at gcc dot gnu.org @ 2023-06-07  3:15 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97784

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2023-06-07
