public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/97784] New: Expressions evaluated as long chain instead of as tree or the like
From: segher at gcc dot gnu.org @ 2020-11-10 16:51 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97784
Bug ID: 97784
Summary: Expressions evaluated as long chain instead of as tree
or the like
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: segher at gcc dot gnu.org
Target Milestone: ---
When compiling something like
#define O +
long x4(long x, long a, long b, long c, long d) { return x O a O b O c O d; }
we end up with machine code like
add 3,3,4 # 10 [c=4 l=4] *adddi3/0
add 3,3,5 # 11 [c=4 l=4] *adddi3/0
add 3,3,6 # 12 [c=4 l=4] *adddi3/0
add 3,3,7 # 18 [c=4 l=4] *adddi3/0
blr # 30 [c=4 l=4] simple_return
Each of those "add" insns depends on the result of the previous one,
making this slower than necessary: it has the latency of 4 add insns in
series, while some of them could be done in parallel.
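To make the dependency structure concrete, here is a minimal C sketch (the function names x4_chain and x4_tree are hypothetical; unsigned long is used so that both groupings are valid in C). The first function has the chain shape of the code above; the second is a hand-written tree of the same sum. It only illustrates the two shapes: the compiler remains free to rewrite either one.

```c
/* Chain shape, as in the generated code: each add needs the previous
   result, so the critical path is 4 add latencies. */
unsigned long x4_chain(unsigned long x, unsigned long a, unsigned long b,
                       unsigned long c, unsigned long d)
{
    return (((x + a) + b) + c) + d;
}

/* Tree shape: x + a and b + c are independent and can execute in
   parallel, so the critical path is 3 add latencies (the minimum for
   five operands, ceil(log2(5))).  Unsigned arithmetic wraps, so the
   result is identical to x4_chain for all inputs. */
unsigned long x4_tree(unsigned long x, unsigned long a, unsigned long b,
                      unsigned long c, unsigned long d)
{
    return ((x + a) + (b + c)) + d;
}
```

Both functions perform four additions; only the length of the longest dependency path differs.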
This problem already exists at the GIMPLE level:
_1 = x_4(D) + a_5(D);
_2 = _1 + b_6(D);
_3 = _2 + c_7(D);
_9 = _3 + d_8(D);
return _9;
A very similar problem also happens as a result of RTL unrolling.
* [Bug rtl-optimization/97784] Expressions evaluated as long chain instead of as tree or the like
From: pinskia at gcc dot gnu.org @ 2020-11-10 16:58 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97784
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Reassociation is not done for signed types, or in places where it could
introduce overflow.
If you use -fwrapv, you should get the optimization. Likewise for unsigned types.
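Why signed types are excluded can be shown with a small C sketch (the function names are hypothetical; __builtin_add_overflow is the GCC/Clang overflow-checking builtin). Regrouping a signed sum can create an intermediate overflow that the original order did not have, which is undefined behaviour in C, while unsigned (or -fwrapv) arithmetic wraps and gives the same result in any order.

```c
#include <limits.h>
#include <stdbool.h>

/* True if evaluating (x + a) + b, in that order, overflows 'long'. */
bool chain_order_overflows(long x, long a, long b)
{
    long t;
    return __builtin_add_overflow(x, a, &t) || __builtin_add_overflow(t, b, &t);
}

/* True if the regrouped x + (a + b) overflows 'long'. */
bool tree_order_overflows(long x, long a, long b)
{
    long t;
    return __builtin_add_overflow(a, b, &t) || __builtin_add_overflow(x, t, &t);
}

/* With wrapping (unsigned) arithmetic, both groupings always agree. */
bool unsigned_orders_agree(unsigned long x, unsigned long a, unsigned long b)
{
    return ((x + a) + b) == (x + (a + b));
}
```

With x = -1, a = LONG_MAX and b = 1, the written order stays in range, but computing a + b first overflows.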
* [Bug rtl-optimization/97784] Expressions evaluated as long chain instead of as tree or the like
From: segher at gcc dot gnu.org @ 2020-11-10 17:50 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97784
--- Comment #2 from Segher Boessenkool <segher at gcc dot gnu.org> ---
No, it is exactly the same with unsigned types :-(
Use -Dlong="unsigned long" or use #define O ^ (as in my original test).
I forgot about the signed-overflow issue, but it is not what causes this (it
matters at the GIMPLE level, sure, but the problem exists in pure RTL as well).
* [Bug target/97784] Expressions evaluated as long chain instead of as tree or the like
From: rguenth at gcc dot gnu.org @ 2020-11-11 7:28 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97784
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|rtl-optimization            |target
             Target|                            |powerpc
--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
There is targetm.sched.reassociation_width, which specifies how reassociation
should make such sequences "wide". Andrew is correct that we don't do this
for any types that are TYPE_OVERFLOW_UNDEFINED.
And powerpc has
static int
rs6000_reassociation_width (unsigned int opc ATTRIBUTE_UNUSED,
                            machine_mode mode)
{
  switch (rs6000_tune)
    {
    case PROCESSOR_POWER8:
    case PROCESSOR_POWER9:
    case PROCESSOR_POWER10:
      if (DECIMAL_FLOAT_MODE_P (mode))
        return 1;
      if (VECTOR_MODE_P (mode))
        return 4;
      if (INTEGRAL_MODE_P (mode))
        return 1;
Thus you get width 1, which means a linear chain (even if the user wrote
a tree).
Note that RTL does not do anything like reassociation (I guess in principle
scheduling could, and that's the only place where it would make sense
on RTL).
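A rough way to see what the width buys: the sketch below is a hypothetical latency model (not GCC's actual cost model or its reassociation algorithm). It assumes unit-latency adds and enough execution units, splits n operands into `width` independent partial chains, and then combines the partial results pairwise. The names critical_path and ceil_log2 are illustrative.

```c
/* Smallest r with 2^r >= v, for v >= 1. */
static unsigned ceil_log2(unsigned v)
{
    unsigned r = 0;
    while ((1u << r) < v)
        r++;
    return r;
}

/* Hypothetical model: n operands are split into 'width' partial chains of
   at most ceil(n / width) operands each; the partial sums are then added
   pairwise as a balanced tree.  Returns the critical path length in adds. */
unsigned critical_path(unsigned n, unsigned width)
{
    if (n < 2)
        return 0;
    if (width < 1)
        width = 1;
    if (width > n)
        width = n;
    unsigned per_chain = (n + width - 1) / width;
    return (per_chain - 1) + ceil_log2(width);
}
```

For the five-operand example from the report, critical_path(5, 1) is 4, matching the four dependent adds in the assembly, while critical_path(5, 2) is already 3.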
* [Bug target/97784] Expressions evaluated as long chain instead of as tree or the like
From: rguenth at gcc dot gnu.org @ 2020-11-11 7:34 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97784
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
Oh, and for the GIMPLE side there is still my TODO item of "finishing" the
switch of late GIMPLE to -fwrapv semantics, so as to make the last reassoc
pass effective here. But it is again a bit late for that. Hmm, I might simply
try ...
* [Bug target/97784] Expressions evaluated as long chain instead of as tree or the like
From: rguenth at gcc dot gnu.org @ 2020-11-11 9:10 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97784
--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 49544
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49544&action=edit
patch doing -fwrapv late
One piece of the approach would move reassoc after the last VRP pass:
diff --git a/gcc/passes.def b/gcc/passes.def
index c68231287b6..872511442f1 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -317,7 +317,6 @@ along with GCC; see the file COPYING3. If not see
       NEXT_PASS (pass_lower_vector_ssa);
       NEXT_PASS (pass_lower_switch);
       NEXT_PASS (pass_cse_reciprocals);
-      NEXT_PASS (pass_reassoc, false /* insert_powi_p */);
       NEXT_PASS (pass_strength_reduction);
       NEXT_PASS (pass_split_paths);
       NEXT_PASS (pass_tracer);
@@ -332,6 +331,8 @@ along with GCC; see the file COPYING3. If not see
       /* Threading can leave many const/copy propagations in the IL.
          Clean them up. */
       NEXT_PASS (pass_copy_prop);
+      NEXT_PASS (pass_reassoc, false /* insert_powi_p */);
       NEXT_PASS (pass_warn_restrict);
       NEXT_PASS (pass_dse);
       NEXT_PASS (pass_cd_dce);
That seems to regress at least
FAIL: gcc.dg/tree-ssa/pr96480.c scan-tree-dump optimized " = _[0-9]* <= 3;"
and maybe also some Wstringop-overflow.c diagnostics.
Now, altering a flag is a bit awkward, since we have to restore it somewhere:
'flag_wrapv' is global state that also affects other functions not yet in the
late state. A cleaner approach would be to move flag_wrapv (and friends)
fully into struct function (and change all users), much like we did for
can_throw_non_call_exceptions and flag_non_call_exceptions. Maybe that is not
too bad (there are not many users of flag_wrapv), but I have coded the "ugly"
variant here.
We also can't alter flag_wrapv in case flag_trapv is set, obviously.
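The two options described above can be contrasted in a small C sketch. Everything in it is hypothetical: sketch_flag_wrapv, sketch_flag_trapv and struct sketch_function are stand-ins, not GCC's flag_wrapv, flag_trapv or struct function, and no real pass is involved. The first variant flips a global around one pass and must restore it; the second records the late state on the function itself, so nothing shared has to be undone. Both leave the semantics alone when trapping overflow is requested.

```c
#include <stdbool.h>

/* Hypothetical global flags (stand-ins, not GCC's real declarations). */
static bool sketch_flag_wrapv = false;
static bool sketch_flag_trapv = false;

/* Hypothetical per-function state, analogous to moving the flag into
   the function structure. */
struct sketch_function {
    bool late_wrapv;   /* set once the function has reached late passes */
};

/* What a pass would query: does signed overflow wrap here? */
bool sketch_overflow_wraps(const struct sketch_function *fn)
{
    return sketch_flag_wrapv || fn->late_wrapv;
}

/* Variant 1 (the "ugly" one): flip the global around a single pass, then
   restore it, because functions that are not yet late share the global. */
bool sketch_run_with_global_wrapv(const struct sketch_function *fn,
                                  bool (*pass)(const struct sketch_function *))
{
    bool saved = sketch_flag_wrapv;
    if (!sketch_flag_trapv)   /* never switch to wrapping under -ftrapv */
        sketch_flag_wrapv = true;
    bool result = pass(fn);
    sketch_flag_wrapv = saved;
    return result;
}

/* Variant 2 (cleaner): record the state on the function itself, so there
   is no shared state to restore afterwards. */
void sketch_mark_late(struct sketch_function *fn)
{
    if (!sketch_flag_trapv)
        fn->late_wrapv = true;
}
```

The price of the second variant is the one named above: every user of the global has to be changed to query the per-function state instead.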
It might be interesting to do some more pass shuffling here, possibly
moving VRP a bit earlier. We do rather a lot of threading and CSE late
nowadays. It is also not entirely clear which late passes actually benefit
from undefined overflow (besides VRP, that is).
Bootstrapped and tested on x86_64-unknown-linux-gnu, with these testsuite results:
FAIL: gcc.dg/pr64434.c scan-rtl-dump-times expand "Swap operands" 1
FAIL: gcc.dg/tree-ssa/phi-opt-15.c scan-tree-dump-not optimized "ABS"
FAIL: gcc.dg/tree-ssa/pr44133.c (test for excess errors)
FAIL: gcc.dg/tree-ssa/pr92712-3.c scan-tree-dump-not optimized " = [tv]_[0-9]*\\\\(D\\\\) \\\\* [tv]_[0-9]*\\\\(D\\\\);"
FAIL: gcc.dg/tree-ssa/pr96480.c scan-tree-dump optimized " = _[0-9]* <= 3;"
XPASS: gcc.dg/tree-ssa/reassoc-2.c scan-tree-dump-times optimized "return 0" 1
FAIL: gcc.dg/tree-ssa/slsr-10.c scan-tree-dump-times optimized " \\\\* " 1
FAIL: gcc.dg/tree-ssa/slsr-11.c scan-tree-dump-times optimized " \\\\* " 1
FAIL: gcc.dg/tree-ssa/slsr-13.c scan-tree-dump-times optimized " \\\\* 4" 2
FAIL: gcc.dg/tree-ssa/slsr-20.c scan-tree-dump-times optimized " \\\\* s" 1
FAIL: gcc.dg/tree-ssa/slsr-31.c scan-tree-dump-times optimized " \\\\* 2" 1
FAIL: gcc.dg/tree-ssa/slsr-32.c scan-tree-dump-times optimized " \\\\* 2" 1
FAIL: gcc.dg/tree-ssa/slsr-33.c scan-tree-dump-times optimized " \\\\* " 1
FAIL: gcc.dg/tree-ssa/slsr-34.c scan-tree-dump-times optimized " \\\\* " 1
FAIL: gcc.dg/tree-ssa/slsr-37.c scan-tree-dump-times optimized " \\\\* 2" 1
FAIL: gcc.dg/tree-ssa/slsr-38.c scan-tree-dump-times optimized " \\\\* " 1
FAIL: gcc.dg/tree-ssa/slsr-5.c scan-tree-dump-times optimized " \\\\* " 1
FAIL: gcc.dg/tree-ssa/slsr-7.c scan-tree-dump-times optimized " \\\\* " 1
FAIL: gcc.dg/tree-ssa/slsr-9.c scan-tree-dump-times optimized " \\\\* " 1
* [Bug target/97784] Expressions evaluated as long chain instead of as tree or the like
From: segher at gcc dot gnu.org @ 2020-11-11 20:32 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97784
--- Comment #6 from Segher Boessenkool <segher at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #3)
> There is targetm.sched.reassociation_width which specifies how reassociation
> should make such sequence "wide".
Ah cool, thank you :-)
> Andrew is correct that we don't do this
> for any types that are TYPE_OVERFLOW_UNDEFINED.
Yes; but I see the sub-optimal behaviour for unsigned, too.
> And powerpc has
>
> static int
> rs6000_reassociation_width (unsigned int opc ATTRIBUTE_UNUSED,
>                             machine_mode mode)
> {
>   switch (rs6000_tune)
>     {
>     case PROCESSOR_POWER8:
>     case PROCESSOR_POWER9:
>     case PROCESSOR_POWER10:
>       if (DECIMAL_FLOAT_MODE_P (mode))
>         return 1;
>       if (VECTOR_MODE_P (mode))
>         return 4;
>       if (INTEGRAL_MODE_P (mode))
>         return 1;
Yeah, this last 1 is the problem :-)
> thus you get width 1 which means a linear chain (even if the user wrote
> a tree).
Yup.
> Note RTL doesn't do any such thing like reassociation (I guess in principle
> scheduling could, and that's the only place where it would make sense
> on RTL).
RTL unrolling can, actually! "Variable expansion" is its horrible name
(and it makes a lot of sense there: it allows breaking a big linear chain
into pieces).
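The effect of variable expansion can be sketched at the source level (the function names are hypothetical; GCC performs the real transformation on RTL when -fvariable-expansion-in-unroller is enabled). Both functions sum an array with the loop unrolled by two. The first keeps a single accumulator, so every add waits for the previous one; the second uses two independent accumulators and combines them at the end. unsigned long is used so that the regrouping cannot change the result.

```c
#include <stddef.h>

/* Unrolled by 2 with one accumulator: one long dependent chain. */
unsigned long sum_unrolled_chain(const unsigned long *v, size_t n)
{
    unsigned long s = 0;
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        s += v[i];
        s += v[i + 1];
    }
    for (; i < n; i++)   /* leftover element when n is odd */
        s += v[i];
    return s;
}

/* Unrolled by 2 with the accumulator "expanded" into two copies: the
   two chains are independent and are only joined after the loop. */
unsigned long sum_unrolled_expanded(const unsigned long *v, size_t n)
{
    unsigned long s0 = 0, s1 = 0;
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        s0 += v[i];
        s1 += v[i + 1];
    }
    for (; i < n; i++)
        s0 += v[i];
    return s0 + s1;
}
```

Each loop iteration in the second version has two adds that do not depend on each other, which is exactly the breaking of the linear chain described above.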
* [Bug target/97784] Expressions evaluated as long chain instead of as tree or the like
From: pinskia at gcc dot gnu.org @ 2022-10-31 6:35 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97784
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
           Keywords|                            |missed-optimization
* [Bug target/97784] Expressions evaluated as long chain instead of as tree or the like
From: pinskia at gcc dot gnu.org @ 2023-06-07 3:15 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97784
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2023-06-07