public inbox for gcc-bugs@sourceware.org
* [Bug middle-end/54394] New: fatigue2 -flto run time regression
From: jamborm at gcc dot gnu.org @ 2012-08-28 22:32 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54394
Bug #: 54394
Summary: fatigue2 -flto run time regression
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: jamborm@gcc.gnu.org
CC: rguenth@gcc.gnu.org
Host: x86_64-linux-gnu
Target: x86_64-linux-gnu
Revision 190346 caused a large run time regression of the fatigue2
Polyhedron benchmark when run with -Ofast -flto. On an x86_64-linux
box, the run time went from 150 seconds to 215 seconds (about 43% slower),
and there is a similar percentage increase on my i686-linux desktop.
The commit leading to that revision is:
2012-08-13 Richard Guenther <rguenther@suse.de>
* basic-block.h (struct basic_block): Remove loop_depth
member, move flags and index members next to each other.
* cfgloop.h (bb_loop_depth): New inline function.
* cfghooks.c (split_block): Do not set loop_depth.
(duplicate_block): Likewise.
* cfgloop.c (flow_loop_nodes_find): Likewise.
(flow_loops_find): Likewise.
(add_bb_to_loop): Likewise.
(remove_bb_from_loops): Likewise.
* cfgrtl.c (force_nonfallthru_and_redirect): Likewise.
* gimple-streamer-in.c (input_bb): Do not stream loop_depth.
* gimple-streamer-out.c (output_bb): Likewise.
* bt-load.c: Include cfgloop.h.
(migrate_btr_defs): Use bb_loop_depth.
* cfg.c (dump_bb_info): Likewise.
* final.c (compute_alignments): Likewise.
* ira.c (update_equiv_regs): Likewise.
* tree-ssa-copy.c (init_copy_prop): Likewise.
* tree-ssa-dom.c (loop_depth_of_name): Likewise.
* tree-ssa-forwprop.c: Include cfgloop.h.
(forward_propagate_addr_expr): Use bb_loop_depth.
* tree-ssa-pre.c (insert_into_preds_of_block): Likewise.
* tree-ssa-sink.c (select_best_block): Likewise.
* ipa-inline-analysis.c: Include cfgloop.h.
(estimate_function_body_sizes): Use bb_loop_depth.
* Makefile.in (tree-ssa-forwprop.o): Depend on $(CFGLOOP_H).
(ipa-inline-analysis.o): Likewise.
(bt-load.o): Likewise.
* gcc.dg/tree-prof/update-loopch.c: Adjust.
I believe the patch was not supposed to alter compiler output in any
(significant) way. However, the inlining decisions differ (file 1 is the
dump before the patch, file 2 the dump with it):
In file 1: extra inlining into function MAIN__.2477/17
  Function __computer_time_m_MOD_computer_time/13 inlined 1 times (as opposed to 0 times)
  Function __perdida_m_MOD_perdida/16 inlined 1 times (as opposed to 0 times)
In file 2: extra inlining into function MAIN__.2477/17
  Function __free_input_MOD_convert_lower_case/9 inlined 1 times (as opposed to 0 times)
  Function __free_input_MOD_convert_lower_case.part.2.2390/62 inlined 1 times (as opposed to 0 times)
  Function __read_input_m_MOD_read_input/12 inlined 1 times (as opposed to 0 times)
In file 2: extra un-inlined function __perdida_m_MOD_perdida/16
  Callers: 1, Callees: 27, Inlinees: 0
In file 1: extra un-inlined function __read_input_m_MOD_read_input.constprop.0/122
  Originally a clone of __read_input_m_MOD_read_input/12
  Callers: 1, Callees: 530, Inlinees: 22
At the same time, this does not seem to be an LTO issue, because the
inline dump of the compilation stage (as opposed to linking) before the
patch contains these lines:
__perdida_m_MOD_perdida/9 function not considered for inlining
loop depth: 2 freq:53666 size:21 time: 30 callee size: 0 stack: 0
which the patch changes to:
__perdida_m_MOD_perdida/9 function not considered for inlining
loop depth: 0 freq:53666 size:21 time: 30 callee size: 0 stack: 0
LTO merely lets the heuristics inline perdida as a function called just
once. Loop depth 0 makes the candidate look cold/not beneficial even
though we know it has no other callers.
Loop depth is zero because at the time of inlining analysis, the
bb->loop_father is NULL. So it seems we need to compute loops at the
beginning of inline summary generation?
* [Bug middle-end/54394] [4.8 Regression] fatigue2 -flto run time regression
From: dominiq at lps dot ens.fr @ 2012-08-29 15:47 UTC
To: gcc-bugs
Dominique d'Humieres <dominiq at lps dot ens.fr> changed:
What    |Removed                            |Added
----------------------------------------------------------------------------
Summary |fatigue2 -flto run time regression |[4.8 Regression] fatigue2 -flto run time regression
--- Comment #1 from Dominique d'Humieres <dominiq at lps dot ens.fr> 2012-08-29 15:46:58 UTC ---
I see something similar on x86_64-apple-darwin10, but with -fwhole-program
rather than -flto: with -Ofast, the run time is ~150s both before and after
r190346, and it does not change with -flto.
At r190345 and earlier, the run time with -fwhole-program is ~100s (and ~56s
with --param max-inline-insns-auto=98 -funroll-loops), whereas it is ~142s at
r190346 and later.
* [Bug middle-end/54394] [4.8 Regression] fatigue2 -flto run time regression
From: jamborm at gcc dot gnu.org @ 2012-08-29 18:28 UTC
To: gcc-bugs
Martin Jambor <jamborm at gcc dot gnu.org> changed:
What             |Removed                       |Added
----------------------------------------------------------------------------
Status           |UNCONFIRMED                   |ASSIGNED
URL              |                              |http://gcc.gnu.org/ml/gcc-patches/2012-08/msg01991.html
Last reconfirmed |                              |2012-08-29
AssignedTo       |unassigned at gcc dot gnu.org |jamborm at gcc dot gnu.org
Ever Confirmed   |0                             |1
--- Comment #2 from Martin Jambor <jamborm at gcc dot gnu.org> 2012-08-29 18:28:41 UTC ---
I have posted a fix to the mailing list:
http://gcc.gnu.org/ml/gcc-patches/2012-08/msg01991.html
You can try whether it fixes your regression too.
* [Bug middle-end/54394] [4.8 Regression] fatigue2 -flto run time regression
From: dominiq at lps dot ens.fr @ 2012-08-29 21:31 UTC
To: gcc-bugs
--- Comment #3 from Dominique d'Humieres <dominiq at lps dot ens.fr> 2012-08-29 21:30:45 UTC ---
> You can try whether it fixes your regression too.
Yes, it does. Thanks.
Did you check if you get the same run time with -flto and -fwhole-program? If
yes, it would probably mean that -flto is not working as it should on darwin.
How could I check that?
* [Bug middle-end/54394] [4.8 Regression] fatigue2 -flto run time regression
From: jamborm at gcc dot gnu.org @ 2012-08-30 15:33 UTC
To: gcc-bugs
--- Comment #4 from Martin Jambor <jamborm at gcc dot gnu.org> 2012-08-30 15:32:53 UTC ---
(In reply to comment #3)
> > You can try whether it fixes your regression too.
>
> Yes, it does. Thanks.
Great, thanks.
>
> Did you check if you get the same run time with -flto and -fwhole-program? If
> yes, it would probably mean that -flto is not working as it should on darwin.
> How could I check that?
With the patch, -fwhole-program is only 1% slower than -flto on my
i686 desktop (the x86_64 machine is currently running a bootstrap).
With the linker plugin, -flto can do better than -fwhole-program. I do not
know what the status of that is on Darwin (whether it works at all, what
minimum ld version you need, etc.); you will have to look that up yourself.
* [Bug middle-end/54394] [4.8 Regression] fatigue2 -flto run time regression
From: jamborm at gcc dot gnu.org @ 2012-08-31 13:13 UTC
To: gcc-bugs
--- Comment #5 from Martin Jambor <jamborm at gcc dot gnu.org> 2012-08-31 13:13:09 UTC ---
Author: jamborm
Date: Fri Aug 31 13:13:03 2012
New Revision: 190831
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190831
Log:
2012-08-31 Martin Jambor <mjambor@suse.cz>
PR middle-end/54394
* ipa-inline-analysis.c (estimate_function_body_sizes): Compute
dominance info and loops whenever optimizing.
Modified:
trunk/gcc/ChangeLog
trunk/gcc/ipa-inline-analysis.c
* [Bug middle-end/54394] [4.8 Regression] fatigue2 -flto run time regression
From: jamborm at gcc dot gnu.org @ 2012-08-31 13:27 UTC
To: gcc-bugs
Martin Jambor <jamborm at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |RESOLVED
Resolution| |FIXED
--- Comment #6 from Martin Jambor <jamborm at gcc dot gnu.org> 2012-08-31 13:27:08 UTC ---
Fixed.