* [Bug middle-end/38474] slow compilation at -O0 (callgraph optimization, inline heuristics, ggc expand )
2008-12-10 15:26 [Bug middle-end/38474] New: slow compilation at -O0 (callgraph optimization, inline heuristics, ggc expand ) jv244 at cam dot ac dot uk
@ 2008-12-10 15:27 ` jv244 at cam dot ac dot uk
2008-12-10 15:41 ` [Bug middle-end/38474] slow compilation at -O0 (callgraph optimization, inline heuristics, " rguenth at gcc dot gnu dot org
` (49 subsequent siblings)
50 siblings, 0 replies; 52+ messages in thread
From: jv244 at cam dot ac dot uk @ 2008-12-10 15:27 UTC (permalink / raw)
To: gcc-bugs
------- Comment #1 from jv244 at cam dot ac dot uk 2008-12-10 15:25 -------
Created an attachment (id=16873)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16873&action=view)
testcase
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474
* [Bug middle-end/38474] slow compilation at -O0 (callgraph optimization, inline heuristics, expand )
From: rguenth at gcc dot gnu dot org @ 2008-12-10 15:41 UTC (permalink / raw)
To: gcc-bugs
------- Comment #2 from rguenth at gcc dot gnu dot org 2008-12-10 15:39 -------
Confirmed. 4.3 is worse (I ran out of memory).
Probably the FE presents us with sth funny.
--
rguenth at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Ever Confirmed|0 |1
Keywords| |memory-hog
Last reconfirmed|0000-00-00 00:00:00 |2008-12-10 15:39:38
date| |
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474
* [Bug middle-end/38474] slow compilation at -O0 (callgraph optimization, inline heuristics, expand )
From: jv244 at cam dot ac dot uk @ 2008-12-10 16:14 UTC (permalink / raw)
To: gcc-bugs
------- Comment #3 from jv244 at cam dot ac dot uk 2008-12-10 16:13 -------
(In reply to comment #2)
> Confirmed. 4.3 is worse (I ran out of memory).
>
> Probably the FE presents us with sth funny.
>
actually, I just got a timing report from 4.3 [4.3.1 20080507 (prerelease)
[gcc-4_3-branch revision 135036]] (on a different machine, but with roughly the
same clock speed, and plenty of RAM):
Execution times (seconds)
garbage collection      :    1.05 ( 0%) usr   0.00 ( 0%) sys    1.05 ( 0%) wall       0 kB ( 0%) ggc
cfg cleanup             :    0.00 ( 0%) usr   0.00 ( 0%) sys    0.01 ( 0%) wall       0 kB ( 0%) ggc
trivially dead code     :    0.70 ( 0%) usr   0.00 ( 0%) sys    0.70 ( 0%) wall       0 kB ( 0%) ggc
df live regs            :    0.96 ( 0%) usr   0.00 ( 0%) sys    0.96 ( 0%) wall       0 kB ( 0%) ggc
df reg dead/unused notes:    1.11 ( 0%) usr   0.00 ( 0%) sys    1.12 ( 0%) wall   25889 kB ( 4%) ggc
register information    :    0.90 ( 0%) usr   0.00 ( 0%) sys    0.89 ( 0%) wall       0 kB ( 0%) ggc
alias analysis          :    0.82 ( 0%) usr   0.00 ( 0%) sys    0.83 ( 0%) wall    8335 kB ( 1%) ggc
rebuild jump labels     :    1.10 ( 0%) usr   0.00 ( 0%) sys    1.10 ( 0%) wall       0 kB ( 0%) ggc
parser                  :    3.02 ( 0%) usr   0.12 ( 1%) sys    3.96 ( 0%) wall   75960 kB (12%) ggc
inline heuristics       : 1862.97 (65%) usr   4.94 (59%) sys 2078.10 (65%) wall       1 kB ( 0%) ggc
tree gimplify           :    0.48 ( 0%) usr   0.00 ( 0%) sys    0.47 ( 0%) wall    3446 kB ( 1%) ggc
tree eh                 :    0.01 ( 0%) usr   0.00 ( 0%) sys    0.01 ( 0%) wall       0 kB ( 0%) ggc
tree CFG construction   :    0.03 ( 0%) usr   0.00 ( 0%) sys    0.03 ( 0%) wall    7151 kB ( 1%) ggc
expand                  :  967.65 (34%) usr   2.96 (35%) sys 1102.03 (34%) wall  357297 kB (54%) ggc
local alloc             :    5.22 ( 0%) usr   0.08 ( 1%) sys    5.29 ( 0%) wall    8652 kB ( 1%) ggc
global alloc            :   12.28 ( 0%) usr   0.27 ( 3%) sys   12.59 ( 0%) wall  163884 kB (25%) ggc
thread pro- & epilogue  :    0.74 ( 0%) usr   0.00 ( 0%) sys    0.75 ( 0%) wall     172 kB ( 0%) ggc
final                   :    2.91 ( 0%) usr   0.05 ( 1%) sys    2.92 ( 0%) wall     541 kB ( 0%) ggc
symout                  :    0.06 ( 0%) usr   0.01 ( 0%) sys    0.07 ( 0%) wall    5690 kB ( 1%) ggc
TOTAL                   : 2862.03             8.43           3212.90             657217 kB
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474
* [Bug middle-end/38474] slow compilation at -O0 (callgraph optimization, inline heuristics, expand )
From: rguenth at gcc dot gnu dot org @ 2008-12-10 16:58 UTC (permalink / raw)
To: gcc-bugs
------- Comment #4 from rguenth at gcc dot gnu dot org 2008-12-10 16:57 -------
Could you capture the memory requirements on the 4.3 branch?
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474
* [Bug middle-end/38474] slow compilation at -O0 (callgraph optimization, inline heuristics, expand )
From: jv244 at cam dot ac dot uk @ 2008-12-10 22:35 UTC (permalink / raw)
To: gcc-bugs
------- Comment #5 from jv244 at cam dot ac dot uk 2008-12-10 22:34 -------
(In reply to comment #4)
> Could you capture the memory requirements on the 4.3 branch?
I watched top (for 4.3.1), but can't recall anything more than 3Gb. It's a bit
boring to watch top for 45min.... any better approach?
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474
* [Bug middle-end/38474] slow compilation at -O0 (callgraph optimization, inline heuristics, expand )
From: rguenth at gcc dot gnu dot org @ 2008-12-10 22:49 UTC (permalink / raw)
To: gcc-bugs
------- Comment #6 from rguenth at gcc dot gnu dot org 2008-12-10 22:48 -------
Created an attachment (id=16881)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16881&action=view)
memory measurement tool
Of course! Try the attached with just
~/bin/maxmem2.sh gfortran ...
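For readers who do not have the attached script, a rough alternative on a current Linux system is to let the kernel report the peak resident set size of the child process. The sketch below is not maxmem2.sh (which, judging from the later comments, traces syscalls); it is a different technique. The helper name peak_rss_kb is made up for illustration, and the code assumes a Linux/glibc system where wait4() fills in ru_maxrss in kilobytes.

```c
#define _DEFAULT_SOURCE   /* exposes wait4() on glibc */
#include <assert.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with the given arguments, wait for it,
   and return the peak resident set size in kB that the kernel reports for
   the child.  Returns -1 if fork() or wait4() fails.  */
long peak_rss_kb(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);              /* only reached if exec failed */
    }

    int status;
    struct rusage ru;
    if (wait4(pid, &status, 0, &ru) < 0)
        return -1;
    return ru.ru_maxrss;         /* kilobytes on Linux */
}
```

Called with an argument vector such as { "gfortran", "-O0", "-c", "testcase.f90", NULL }, this would return one number after the compile finishes. Two caveats: this is resident memory, not allocated address space, and the compiler driver runs the actual compiler as a subprocess, so it is worth checking which process the figure refers to.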
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474
* [Bug middle-end/38474] slow compilation at -O0 (callgraph optimization, inline heuristics, expand )
From: jv244 at cam dot ac dot uk @ 2008-12-10 22:58 UTC (permalink / raw)
To: gcc-bugs
------- Comment #7 from jv244 at cam dot ac dot uk 2008-12-10 22:57 -------
(In reply to comment #6)
> Created an attachment (id=16881)
> --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16881&action=view)
> memory measurement tool
>
> Of course! Try the attached with just
>
> ~/bin/maxmem2.sh gfortran ...
>
ugh how intuitive... but very useful. Will try to run it tomorrow.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474
* [Bug middle-end/38474] slow compilation at -O0 (callgraph optimization, inline heuristics, expand )
From: jv244 at cam dot ac dot uk @ 2008-12-11 8:28 UTC (permalink / raw)
To: gcc-bugs
------- Comment #8 from jv244 at cam dot ac dot uk 2008-12-11 08:27 -------
(In reply to comment #6)
> Created an attachment (id=16881)
> --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16881&action=view)
> memory measurement tool
>
> Of course! Try the attached with just
>
> ~/bin/maxmem2.sh gfortran ...
>
Hmmm. So this is what is returned:
4.3.3 is GNU Fortran (GCC) 4.3.3 20080912 (prerelease)
trunk is GNU Fortran (GCC) 4.4.0 20081206 (experimental) [trunk revision 142525]
4.3.3: 899675 kB (and about 33min)
trunk: 1145308 kB (and about 45min).
this is on the same machine, so times can be compared (modulo checking being
enabled?).
However, for the memory usage, top (oh no) showed 2.3-2.5Gb, which is quite
different from what the script returns?
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474
* [Bug middle-end/38474] slow compilation at -O0 (callgraph optimization, inline heuristics, expand )
From: rguenth at gcc dot gnu dot org @ 2008-12-11 11:35 UTC (permalink / raw)
To: gcc-bugs
------- Comment #9 from rguenth at gcc dot gnu dot org 2008-12-11 11:33 -------
The script only has received testing on linux systems, so if you are running
somewhere else it is likely that either the regexps do not match or that
you require different/additional syscalls to be traced. It's not perfect, but
it works reliably for me.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474
* [Bug middle-end/38474] slow compilation at -O0 (callgraph optimization, inline heuristics, expand )
From: jv244 at cam dot ac dot uk @ 2008-12-11 12:04 UTC (permalink / raw)
To: gcc-bugs
------- Comment #10 from jv244 at cam dot ac dot uk 2008-12-11 12:02 -------
(In reply to comment #9)
> The script only has received testing on linux systems, so if you are running
> somewhere else it is likely that either the regexps do not match or that
> you require different/additional syscalls to be traced. It's not perfect, but
> it works reliably for me.
>
no, it is a linux box (actually SUSE as far as I can tell,
> uname -a
Linux pcihopt3 2.6.16.21-0.8-smp #1 SMP Mon Jul 3 18:25:39 UTC 2006 x86_64
x86_64 x86_64 GNU/Linux
> cat /etc/SuSE-release
SUSE Linux Enterprise Server 10 (x86_64)
VERSION = 10
).
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474
* [Bug middle-end/38474] slow compilation at -O0 (callgraph optimization, inline heuristics, expand )
From: jv244 at cam dot ac dot uk @ 2008-12-15 19:39 UTC (permalink / raw)
To: gcc-bugs
------- Comment #11 from jv244 at cam dot ac dot uk 2008-12-15 19:38 -------
As this file is part of a project normally compiled with '-O3 -march=native
-funroll-loops', the timing in that case is also important. As I'm finding out,
this becomes unworkable (>6h, and still compiling).
Looking at -fdump-tree-original, the overloaded operators (+,-,..) expand to
function calls, leading to a subroutine which contains 73000 function calls. So
it is likely that something scales at least quadratically with respect to this
parameter.
--
jv244 at cam dot ac dot uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Severity|normal |critical
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474
* [Bug middle-end/38474] slow compilation at -O0 (callgraph optimization, inline heuristics, expand )
From: steven at gcc dot gnu dot org @ 2008-12-15 21:19 UTC (permalink / raw)
To: gcc-bugs
------- Comment #12 from steven at gcc dot gnu dot org 2008-12-15 21:17 -------
One of the bottlenecks seems to be find_temp_slot_from_address.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474
* [Bug middle-end/38474] slow compilation at -O0 (callgraph optimization, inline heuristics, expand )
From: steven at gcc dot gnu dot org @ 2008-12-15 21:29 UTC (permalink / raw)
To: gcc-bugs
------- Comment #13 from steven at gcc dot gnu dot org 2008-12-15 21:27 -------
OK, to elaborate: I'm playing with this test case on ia64-linux, and I reduced
the test case by some 8000 lines to make it compilable at all. With these 8000
lines removed, it actually spends more time for me in "expand", in the function
"find_temp_slot_from_address (rtx x)". It spends all of its time...
  for (i = max_slot_level (); i >= 0; i--)
    for (p = *temp_slots_at_level (i); p; p = p->next)
      {
        if (XEXP (p->slot, 0) == x
            || p->address == x
            || (GET_CODE (x) == PLUS
                && XEXP (x, 0) == virtual_stack_vars_rtx
                && GET_CODE (XEXP (x, 1)) == CONST_INT
                && INTVAL (XEXP (x, 1)) >= p->base_offset
                && INTVAL (XEXP (x, 1)) < p->base_offset + p->full_size))
          return p;
        else if (p->address != 0 && GET_CODE (p->address) == EXPR_LIST)
          for (next = p->address; next; next = XEXP (next, 1))
            if (XEXP (next, 0) == x) /* ...here in this loop... */
              return p;
in the "for (next = p->address; ...)" loop. This list in p->address is actually
several thousand items long and it is traversed many times:
traversals ~ max_slot_level()*temp_slots_at_level(i)*list length of p->address
which is, at best, cubic behavior.
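To make the cost of that traversal concrete, here is a small stand-alone model of the lookup. It is only a sketch: the struct and function names (addr_node, slot, find_slot, cost_of_misses) are invented for illustration and are much simpler than the real temp-slot structures. It counts how many address comparisons a number of unsuccessful lookups perform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for a temp slot and its address list. */
struct addr_node { const void *addr; struct addr_node *next; };
struct slot { struct addr_node *addresses; struct slot *next; };

static size_t comparisons;

/* Linear search in the style of the quoted loop: every slot, then every
   address recorded for that slot.  */
static struct slot *find_slot(struct slot *slots, const void *x)
{
    for (struct slot *p = slots; p; p = p->next)
        for (struct addr_node *a = p->addresses; a; a = a->next) {
            comparisons++;
            if (a->addr == x)
                return p;
        }
    return NULL;
}

/* Build nslots slots with naddrs addresses each, perform nlookups lookups
   of an address that is in no list, and return the comparison count.  */
size_t cost_of_misses(size_t nslots, size_t naddrs, size_t nlookups)
{
    assert(nslots > 0 && naddrs > 0);
    size_t total = nslots * naddrs;
    char *pool = malloc(total + 1);   /* one distinct address per byte */
    struct slot *slots = calloc(nslots, sizeof *slots);
    struct addr_node *nodes = calloc(total, sizeof *nodes);
    assert(pool && slots && nodes);

    for (size_t s = 0; s < nslots; s++) {
        slots[s].next = (s + 1 < nslots) ? &slots[s + 1] : NULL;
        for (size_t a = 0; a < naddrs; a++) {
            struct addr_node *n = &nodes[s * naddrs + a];
            n->addr = &pool[s * naddrs + a];
            n->next = slots[s].addresses;
            slots[s].addresses = n;
        }
    }

    comparisons = 0;
    for (size_t i = 0; i < nlookups; i++)
        (void) find_slot(slots, &pool[total]);  /* never inserted: a miss */

    free(nodes);
    free(slots);
    free(pool);
    return comparisons;
}
```

In this model every miss costs nslots * naddrs comparisons, so the total grows as the product of three quantities once the number of lookups also scales with the input. A keyed lookup (for example a hash table from address to slot) is a common way to avoid this kind of repeated list walk; that is a general remark, not a statement about how GCC handles it.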
--
steven at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Last reconfirmed|2008-12-10 15:39:38 |2008-12-15 21:27:40
date| |
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474
* [Bug middle-end/38474] slow compilation at -O0 (callgraph optimization, inline heuristics, expand )
From: steven at gcc dot gnu dot org @ 2008-12-15 21:54 UTC (permalink / raw)
To: gcc-bugs
------- Comment #14 from steven at gcc dot gnu dot org 2008-12-15 21:53 -------
For the inline heuristics, almost all time is also spent in stack slot related
stuff. The culprit is estimate_stack_frame_size (or actually,
add_alias_set_conflicts) in cfgexpand.c.
(What are we doing in cfgexpand anyway, for inlining?!?)
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474
* [Bug middle-end/38474] slow compilation at -O0 (callgraph optimization, inline heuristics, expand )
From: steven at gcc dot gnu dot org @ 2008-12-15 21:56 UTC (permalink / raw)
To: gcc-bugs
------- Comment #15 from steven at gcc dot gnu dot org 2008-12-15 21:55 -------
From cfgexpand.c:
static void
add_alias_set_conflicts (void)
{
  size_t i, j, n = stack_vars_num;
  for (i = 0; i < n; ++i)
    {
      tree type_i = TREE_TYPE (stack_vars[i].decl);
      bool aggr_i = AGGREGATE_TYPE_P (type_i);
      bool contains_union;
      contains_union = aggregate_contains_union_type (type_i);
      for (j = 0; j < i; ++j)
        {
Classic example of quadratic algorithm...
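As a rough illustration of how quickly that loop shape grows, the sketch below counts the inner-loop visits for a given n, once by brute force and once with the closed form n * (n - 1) / 2. The function names are made up for this example and nothing here is taken from cfgexpand.c beyond the loop bounds.

```c
#include <assert.h>
#include <stdint.h>

/* Count inner-loop visits of a nest shaped like the one quoted above:
   for each i in [0, n), visit every j in [0, i).  */
uint64_t pair_visits_by_loop(uint64_t n)
{
    uint64_t visits = 0;
    for (uint64_t i = 0; i < n; ++i)
        for (uint64_t j = 0; j < i; ++j)
            ++visits;
    return visits;
}

/* Closed form of the same count: n * (n - 1) / 2.  */
uint64_t pair_visits(uint64_t n)
{
    assert(n <= UINT32_MAX);   /* keeps n * (n - 1) within 64 bits */
    return n == 0 ? 0 : n * (n - 1) / 2;
}
```

For n = 67551 (the stack_vars_num figure reported later in this thread) the closed form gives 2281535025 visits, each of which does some work on a pair of variables.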
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474
* [Bug middle-end/38474] slow compilation at -O0 (callgraph optimization, inline heuristics, expand )
From: steven at gcc dot gnu dot org @ 2008-12-15 21:57 UTC (permalink / raw)
To: gcc-bugs
------- Comment #16 from steven at gcc dot gnu dot org 2008-12-15 21:56 -------
Oh, and FWIW, for yukawa_gn_full, stack_vars_num == 67551 for me.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474
* [Bug middle-end/38474] slow compilation at -O0 (callgraph optimization, inline heuristics, expand )
From: jv244 at cam dot ac dot uk @ 2008-12-16 7:52 UTC (permalink / raw)
To: gcc-bugs
------- Comment #17 from jv244 at cam dot ac dot uk 2008-12-16 07:51 -------
(In reply to comment #16)
> Oh, and FWIW, for yukawa_gn_full, stack_vars_num == 67551 for me.
Thanks for the analysis. Detailed enough to have me peek into the GCC code for
once.
This would mean that the array stack_vars_conflict takes about 2.2Gb, since it
is O(stack_vars_num**2/2) (assuming a bool is 8 bits, quite consistent with
what we see).
There is already a function (defer_stack_allocation) that decides to give up
due to 'the quadratic problem'. Maybe gcc should use some drastic short-cut,
including not allocating the stack_vars_conflict array, as soon as ~10000 stack
variables are detected ?
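The arithmetic behind that estimate, and the suggested cut-off, can be written down as a short sketch. Everything here is an assumption for illustration: the function names are invented, the table size uses the n**2/2 estimate with one byte per entry, and the 10000 limit is just the figure floated above, not an existing GCC parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bytes for a triangular conflict table with one byte per entry,
   using the n**2/2 estimate.  */
uint64_t conflict_table_bytes(uint64_t n)
{
    assert(n <= UINT32_MAX);   /* keeps n * n within 64 bits */
    return n * n / 2;
}

/* The same table packed one bit per entry, rounded up to whole bytes.  */
uint64_t conflict_table_bytes_packed(uint64_t n)
{
    return (conflict_table_bytes(n) + 7) / 8;
}

/* Hypothetical cut-off: give up on the table once the number of stack
   variables exceeds a limit.  */
bool skip_conflict_table(uint64_t n, uint64_t limit)
{
    return n > limit;
}
```

With n = 67551 the first function returns 2281568800 bytes, roughly 2.1 GiB, which is in the same range as the estimate above and the figures seen in top. Packing one bit per entry would bring that to 285196100 bytes (about 272 MiB), but it would not change the quadratic time spent filling the table.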
BTW, the -O3 compilation is still running (for 17h now).
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474
* [Bug middle-end/38474] slow compilation at -O0 (callgraph optimization, inline heuristics, expand )
From: jv244 at cam dot ac dot uk @ 2008-12-16 11:59 UTC (permalink / raw)
To: gcc-bugs
------- Comment #18 from jv244 at cam dot ac dot uk 2008-12-16 11:58 -------
(In reply to comment #17)
> BTW, the -O3 compilation is still running (for 17h now).
finished successfully after 23h...
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474
* [Bug middle-end/38474] slow compilation at -O0 (callgraph optimization, inline heuristics, expand )
From: steven at gcc dot gnu dot org @ 2008-12-16 12:46 UTC (permalink / raw)
To: gcc-bugs
------- Comment #19 from steven at gcc dot gnu dot org 2008-12-16 12:45 -------
Re. comment #18, I'd say "brilliant" if it wasn't such a poor performance :-)
Did you manage to get a time report out of that run?
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474
* [Bug middle-end/38474] slow compilation at -O0 (callgraph optimization, inline heuristics, expand )
From: jv244 at cam dot ac dot uk @ 2008-12-16 12:48 UTC (permalink / raw)
To: gcc-bugs
------- Comment #20 from jv244 at cam dot ac dot uk 2008-12-16 12:47 -------
(In reply to comment #19)
> Re. comment #18, I'd say "brilliant" if it wasn't such a poor performance :-)
I agree... quite an achievement not to crash in such a case.
> Did you manage to get a time report out of that run?
no... obviously I can rerun this (numbers tomorrow, of course)?
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474
* [Bug middle-end/38474] slow compilation at -O0 (callgraph optimization, inline heuristics, expand )
From: jv244 at cam dot ac dot uk @ 2008-12-16 12:50 UTC (permalink / raw)
To: gcc-bugs
------- Comment #21 from jv244 at cam dot ac dot uk 2008-12-16 12:48 -------
(In reply to comment #16)
> Oh, and FWIW, for yukawa_gn_full, stack_vars_num == 67551 for me.
btw, that routine only has 3800 user variables; the rest are FE-generated
temporaries (which should have a limited lifetime).
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474
* [Bug middle-end/38474] slow compilation at -O0 (callgraph optimization, inline heuristics, expand )
From: steven at gcc dot gnu dot org @ 2008-12-16 13:43 UTC (permalink / raw)
To: gcc-bugs
------- Comment #22 from steven at gcc dot gnu dot org 2008-12-16 13:41 -------
We may be better off with a slightly reduced test case for the -O3 report.
It's not difficult to cut out ~8000 lines (like I did yesterday) and still have
a huge test case (and the horrendous compile times to go with that).
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474
* [Bug middle-end/38474] slow compilation at -O0 (callgraph optimization, inline heuristics, expand )
From: jv244 at cam dot ac dot uk @ 2008-12-16 14:19 UTC (permalink / raw)
To: gcc-bugs
------- Comment #23 from jv244 at cam dot ac dot uk 2008-12-16 14:17 -------
Created an attachment (id=16913)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16913&action=view)
reduced testcase
just so we talk about the same file, I've reduced the testcase to a more
manageable size. This one compiles in about 1min at -O0. I'll attach
time-report output in a sec.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474
* [Bug middle-end/38474] slow compilation at -O0 (callgraph optimization, inline heuristics, expand )
From: jv244 at cam dot ac dot uk @ 2008-12-16 14:21 UTC (permalink / raw)
To: gcc-bugs
------- Comment #24 from jv244 at cam dot ac dot uk 2008-12-16 14:20 -------
(In reply to comment #23)
reduced testcase timings at -O0 and -O3. Tree operand scan anybody?
> time gfortran -O0 -ffree-line-length-512 -c -ftime-report testcase_reduced.f90
Execution times (seconds)
garbage collection      :    0.51 ( 1%) usr   0.00 ( 0%) sys    0.49 ( 1%) wall       0 kB ( 0%) ggc
callgraph construction  :    0.05 ( 0%) usr   0.00 ( 0%) sys    0.06 ( 0%) wall    4956 kB ( 2%) ggc
callgraph optimization  :    8.13 (18%) usr   0.20 (16%) sys    8.36 (18%) wall    1280 kB ( 1%) ggc
cfg cleanup             :    0.01 ( 0%) usr   0.00 ( 0%) sys    0.01 ( 0%) wall       0 kB ( 0%) ggc
CFG verifier            :    0.48 ( 1%) usr   0.02 ( 2%) sys    0.46 ( 1%) wall       0 kB ( 0%) ggc
trivially dead code     :    0.17 ( 0%) usr   0.00 ( 0%) sys    0.18 ( 0%) wall       0 kB ( 0%) ggc
df live regs            :    0.11 ( 0%) usr   0.00 ( 0%) sys    0.11 ( 0%) wall       0 kB ( 0%) ggc
df reg dead/unused notes:    0.24 ( 1%) usr   0.00 ( 0%) sys    0.23 ( 0%) wall    9445 kB ( 4%) ggc
register information    :    0.11 ( 0%) usr   0.01 ( 1%) sys    0.12 ( 0%) wall       0 kB ( 0%) ggc
alias analysis          :    0.10 ( 0%) usr   0.01 ( 1%) sys    0.10 ( 0%) wall    4239 kB ( 2%) ggc
rebuild jump labels     :    0.12 ( 0%) usr   0.00 ( 0%) sys    0.11 ( 0%) wall       0 kB ( 0%) ggc
parser                  :    1.07 ( 2%) usr   0.05 ( 4%) sys    1.12 ( 2%) wall   22673 kB ( 9%) ggc
inline heuristics       :   16.30 (36%) usr   0.41 (33%) sys   16.75 (36%) wall       0 kB ( 0%) ggc
tree gimplify           :    0.06 ( 0%) usr   0.01 ( 1%) sys    0.08 ( 0%) wall    6435 kB ( 3%) ggc
tree CFG construction   :    0.01 ( 0%) usr   0.00 ( 0%) sys    0.00 ( 0%) wall     180 kB ( 0%) ggc
tree find ref. vars     :    0.02 ( 0%) usr   0.00 ( 0%) sys    0.02 ( 0%) wall    3231 kB ( 1%) ggc
tree SSA rewrite        :    0.01 ( 0%) usr   0.00 ( 0%) sys    0.00 ( 0%) wall      63 kB ( 0%) ggc
tree SSA other          :    0.01 ( 0%) usr   0.00 ( 0%) sys    0.01 ( 0%) wall       0 kB ( 0%) ggc
tree operand scan       :    0.02 ( 0%) usr   0.00 ( 0%) sys    0.02 ( 0%) wall     236 kB ( 0%) ggc
tree SSA to normal      :    0.00 ( 0%) usr   0.00 ( 0%) sys    0.01 ( 0%) wall       0 kB ( 0%) ggc
tree SSA verifier       :    0.01 ( 0%) usr   0.00 ( 0%) sys    0.00 ( 0%) wall       0 kB ( 0%) ggc
tree STMT verifier      :    0.20 ( 0%) usr   0.01 ( 1%) sys    0.22 ( 0%) wall       0 kB ( 0%) ggc
callgraph verifier      :    0.05 ( 0%) usr   0.00 ( 0%) sys    0.05 ( 0%) wall       0 kB ( 0%) ggc
dominance computation   :    0.01 ( 0%) usr   0.00 ( 0%) sys    0.00 ( 0%) wall       0 kB ( 0%) ggc
expand                  :   10.86 (24%) usr   0.38 (30%) sys   11.26 (24%) wall  132856 kB (52%) ggc
integrated RA           :    4.08 ( 9%) usr   0.05 ( 4%) sys    4.13 ( 9%) wall    4604 kB ( 2%) ggc
reload                  :    1.88 ( 4%) usr   0.07 ( 6%) sys    1.97 ( 4%) wall   59269 kB (23%) ggc
thread pro- & epilogue  :    0.17 ( 0%) usr   0.00 ( 0%) sys    0.18 ( 0%) wall     175 kB ( 0%) ggc
final                   :    0.63 ( 1%) usr   0.03 ( 2%) sys    0.66 ( 1%) wall    3790 kB ( 1%) ggc
TOTAL                   :   45.42             1.25             46.73             253684 kB
Extra diagnostic checks enabled; compiler may run slowly.
Configure with --enable-checking=release to disable checks.
real 0m47.298s
user 0m45.923s
sys 0m1.316s
> time gfortran -march=native -O3 -ffree-line-length-512 -c -ftime-report testcase_reduced.f90
Execution times (seconds)
garbage collection : 1.48 ( 1%) usr 0.01 ( 0%) sys 1.50 ( 1%) wall
0 kB ( 0%) ggc
callgraph construction: 0.03 ( 0%) usr 0.01 ( 0%) sys 0.05 ( 0%) wall
4955 kB ( 1%) ggc
callgraph optimization: 6.27 ( 3%) usr 0.15 ( 7%) sys 6.46 ( 4%) wall
2366 kB ( 0%) ggc
ipa cp : 0.05 ( 0%) usr 0.01 ( 0%) sys 0.06 ( 0%) wall
34 kB ( 0%) ggc
cfg cleanup : 0.01 ( 0%) usr 0.01 ( 0%) sys 0.02 ( 0%) wall
0 kB ( 0%) ggc
CFG verifier : 1.41 ( 1%) usr 0.00 ( 0%) sys 1.33 ( 1%) wall
0 kB ( 0%) ggc
trivially dead code : 0.62 ( 0%) usr 0.00 ( 0%) sys 0.66 ( 0%) wall
0 kB ( 0%) ggc
df reaching defs : 0.69 ( 0%) usr 0.01 ( 0%) sys 0.67 ( 0%) wall
0 kB ( 0%) ggc
df live regs : 1.86 ( 1%) usr 0.00 ( 0%) sys 1.86 ( 1%) wall
0 kB ( 0%) ggc
df live&initialized regs: 0.93 ( 1%) usr 0.00 ( 0%) sys 0.94 ( 1%) wall
0 kB ( 0%) ggc
df use-def / def-use chains: 1.33 ( 1%) usr 0.04 ( 2%) sys 1.38 ( 1%)
wall 0 kB ( 0%) ggc
df reg dead/unused notes: 0.92 ( 1%) usr 0.00 ( 0%) sys 0.96 ( 1%) wall
13469 kB ( 3%) ggc
register information : 0.44 ( 0%) usr 0.00 ( 0%) sys 0.43 ( 0%) wall
0 kB ( 0%) ggc
alias analysis : 1.05 ( 1%) usr 0.00 ( 0%) sys 1.05 ( 1%) wall
24068 kB ( 5%) ggc
register scan : 0.20 ( 0%) usr 0.00 ( 0%) sys 0.17 ( 0%) wall
18 kB ( 0%) ggc
rebuild jump labels : 0.31 ( 0%) usr 0.00 ( 0%) sys 0.30 ( 0%) wall
0 kB ( 0%) ggc
parser : 1.16 ( 1%) usr 0.03 ( 1%) sys 1.21 ( 1%) wall
22673 kB ( 5%) ggc
inline heuristics : 15.83 ( 9%) usr 0.40 (20%) sys 16.25 ( 9%) wall
138 kB ( 0%) ggc
integration : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
885 kB ( 0%) ggc
tree gimplify : 0.06 ( 0%) usr 0.01 ( 0%) sys 0.07 ( 0%) wall
6434 kB ( 1%) ggc
tree CFG construction : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
179 kB ( 0%) ggc
tree CFG cleanup : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
8 kB ( 0%) ggc
tree VRP : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall
448 kB ( 0%) ggc
tree copy propagation : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall
159 kB ( 0%) ggc
tree find ref. vars : 0.01 ( 0%) usr 0.01 ( 0%) sys 0.01 ( 0%) wall
3229 kB ( 1%) ggc
tree PTA : 1.29 ( 1%) usr 0.03 ( 1%) sys 1.34 ( 1%) wall
540 kB ( 0%) ggc
tree alias analysis : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall
57 kB ( 0%) ggc
tree call clobbering : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
19 kB ( 0%) ggc
tree flow sensitive alias: 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
60 kB ( 0%) ggc
tree flow insensitive alias: 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
0 kB ( 0%) ggc
tree memory partitioning: 0.09 ( 0%) usr 0.00 ( 0%) sys 0.07 ( 0%) wall
0 kB ( 0%) ggc
tree SSA rewrite : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
8391 kB ( 2%) ggc
tree SSA other : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
0 kB ( 0%) ggc
tree SSA incremental : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
21 kB ( 0%) ggc
tree operand scan : 98.14 (55%) usr 0.03 ( 1%) sys 98.31 (54%) wall
4048 kB ( 1%) ggc
dominator optimization: 0.03 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
73 kB ( 0%) ggc
tree CCP : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
119 kB ( 0%) ggc
tree PRE : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall
62 kB ( 0%) ggc
tree FRE : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
33 kB ( 0%) ggc
tree forward propagate: 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
3 kB ( 0%) ggc
tree conservative DCE : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
0 kB ( 0%) ggc
tree aggressive DCE : 0.02 ( 0%) usr 0.01 ( 0%) sys 0.01 ( 0%) wall
12 kB ( 0%) ggc
tree loop init : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
44 kB ( 0%) ggc
tree SSA to normal : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
4 kB ( 0%) ggc
tree rename SSA copies: 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
0 kB ( 0%) ggc
tree SSA verifier : 0.95 ( 1%) usr 0.00 ( 0%) sys 0.97 ( 1%) wall
0 kB ( 0%) ggc
tree STMT verifier : 2.05 ( 1%) usr 0.04 ( 2%) sys 2.11 ( 1%) wall
0 kB ( 0%) ggc
callgraph verifier : 0.10 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall
0 kB ( 0%) ggc
dominance computation : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
0 kB ( 0%) ggc
expand : 12.80 ( 7%) usr 0.33 (16%) sys 13.09 ( 7%) wall
131225 kB (27%) ggc
lower subreg : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
0 kB ( 0%) ggc
forward prop : 0.91 ( 1%) usr 0.01 ( 0%) sys 0.91 ( 1%) wall
9021 kB ( 2%) ggc
CSE : 2.70 ( 2%) usr 0.01 ( 0%) sys 2.70 ( 1%) wall
8941 kB ( 2%) ggc
dead code elimination : 0.37 ( 0%) usr 0.00 ( 0%) sys 0.38 ( 0%) wall
0 kB ( 0%) ggc
dead store elim1 : 0.59 ( 0%) usr 0.02 ( 1%) sys 0.62 ( 0%) wall
13140 kB ( 3%) ggc
dead store elim2 : 0.77 ( 0%) usr 0.00 ( 0%) sys 0.76 ( 0%) wall
13219 kB ( 3%) ggc
CSE 2 : 2.04 ( 1%) usr 0.00 ( 0%) sys 2.04 ( 1%) wall
3477 kB ( 1%) ggc
combiner : 0.77 ( 0%) usr 0.00 ( 0%) sys 0.79 ( 0%) wall
6633 kB ( 1%) ggc
if-conversion : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
50 kB ( 0%) ggc
regmove : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall
28 kB ( 0%) ggc
integrated RA : 9.17 ( 5%) usr 0.71 (35%) sys 9.92 ( 5%) wall
25558 kB ( 5%) ggc
reload : 3.36 ( 2%) usr 0.10 ( 5%) sys 3.47 ( 2%) wall
101799 kB (21%) ggc
reload CSE regs : 1.76 ( 1%) usr 0.00 ( 0%) sys 1.75 ( 1%) wall
27970 kB ( 6%) ggc
load CSE after reload : 0.09 ( 0%) usr 0.00 ( 0%) sys 0.09 ( 0%) wall
0 kB ( 0%) ggc
thread pro- & epilogue: 0.18 ( 0%) usr 0.00 ( 0%) sys 0.19 ( 0%) wall
231 kB ( 0%) ggc
peephole 2 : 0.16 ( 0%) usr 0.00 ( 0%) sys 0.15 ( 0%) wall
29 kB ( 0%) ggc
rename registers : 1.03 ( 1%) usr 0.00 ( 0%) sys 1.04 ( 1%) wall
0 kB ( 0%) ggc
scheduling 2 : 3.82 ( 2%) usr 0.03 ( 1%) sys 3.82 ( 2%) wall
53812 kB (11%) ggc
machine dep reorg : 0.41 ( 0%) usr 0.00 ( 0%) sys 0.41 ( 0%) wall
0 kB ( 0%) ggc
reorder blocks : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall
3 kB ( 0%) ggc
final : 0.64 ( 0%) usr 0.02 ( 1%) sys 0.65 ( 0%) wall
3824 kB ( 1%) ggc
TOTAL : 179.47 2.03 181.70
492168 kB
Extra diagnostic checks enabled; compiler may run slowly.
Configure with --enable-checking=release to disable checks.
real 3m2.238s
user 2m59.927s
sys 0m2.128s
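Reports like the one above are easier to compare once the per-pass lines are parsed and sorted. What follows is only a minimal sketch in Python, assuming the line layout shown above (pass name, then usr/sys/wall times with percentages). The names `parse_time_report` and `slowest_passes` are made up for illustration and are not part of GCC.

```python
import re

# One timing line of a -ftime-report dump, e.g.
#   "expand : 12.80 ( 7%) usr 0.33 (16%) sys 13.09 ( 7%) wall"
# Wide values may touch the preceding column (":2516.42", "sys2542.00"),
# so whitespace between fields is optional. The trailing "wall" label is
# not required because mail wrapping sometimes pushes it to the next line.
TIMING_LINE = re.compile(
    r"^\s*(?P<name>.+?)\s*:\s*"
    r"(?P<usr>\d+\.\d+)\s*\(\s*\d+%\)\s*usr\s*"
    r"(?P<sys>\d+\.\d+)\s*\(\s*\d+%\)\s*sys\s*"
    r"(?P<wall>\d+\.\d+)\s*\(\s*\d+%\)"
)


def parse_time_report(text):
    """Return {pass name: (usr, sys, wall)} for every timing line in text."""
    timings = {}
    for line in text.splitlines():
        match = TIMING_LINE.match(line)
        if match:
            timings[match["name"]] = (
                float(match["usr"]),
                float(match["sys"]),
                float(match["wall"]),
            )
    return timings


def slowest_passes(timings, count=3):
    """Return the `count` pass names with the largest usr time, slowest first."""
    return sorted(timings, key=lambda name: timings[name][0], reverse=True)[:count]


# Four lines copied from the -O3 report above.
SAMPLE = """\
inline heuristics : 15.83 ( 9%) usr 0.40 (20%) sys 16.25 ( 9%) wall
tree operand scan : 98.14 (55%) usr 0.03 ( 1%) sys 98.31 (54%) wall
expand : 12.80 ( 7%) usr 0.33 (16%) sys 13.09 ( 7%) wall
integrated RA : 9.17 ( 5%) usr 0.71 (35%) sys 9.92 ( 5%) wall
"""

if __name__ == "__main__":
    report = parse_time_report(SAMPLE)
    for name in slowest_passes(report):
        print(f"{name:20s} {report[name][0]:8.2f} s usr")
```

The ggc memory lines and the TOTAL line do not match the pattern and are skipped, so the whole report can be pasted in unchanged.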
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474
* [Bug middle-end/38474] slow compilation at -O0 (callgraph optimization, inline heuristics, expand )
2008-12-10 15:26 [Bug middle-end/38474] New: slow compilation at -O0 (callgraph optimization, inline heuristics, ggc expand ) jv244 at cam dot ac dot uk
` (23 preceding siblings ...)
2008-12-16 14:21 ` jv244 at cam dot ac dot uk
@ 2008-12-16 16:19 ` jv244 at cam dot ac dot uk
2008-12-16 16:28 ` steven at gcc dot gnu dot org
` (25 subsequent siblings)
50 siblings, 0 replies; 52+ messages in thread
From: jv244 at cam dot ac dot uk @ 2008-12-16 16:19 UTC (permalink / raw)
To: gcc-bugs
------- Comment #25 from jv244 at cam dot ac dot uk 2008-12-16 16:17 -------
Doing some more profiling, the -O0 problem is to a large extent due to
compute_inline_parameters and estimate_stack_frame_size. Spending 10-30min just
on estimating the stack frame size of something that can't reasonably be
inlined anyway seems a waste.
--
jv244 at cam dot ac dot uk changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |hubicka at gcc dot gnu dot
| |org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474
* [Bug middle-end/38474] slow compilation at -O0 (callgraph optimization, inline heuristics, expand )
2008-12-10 15:26 [Bug middle-end/38474] New: slow compilation at -O0 (callgraph optimization, inline heuristics, ggc expand ) jv244 at cam dot ac dot uk
` (24 preceding siblings ...)
2008-12-16 16:19 ` jv244 at cam dot ac dot uk
@ 2008-12-16 16:28 ` steven at gcc dot gnu dot org
2008-12-16 16:31 ` jv244 at cam dot ac dot uk
` (24 subsequent siblings)
50 siblings, 0 replies; 52+ messages in thread
From: steven at gcc dot gnu dot org @ 2008-12-16 16:28 UTC (permalink / raw)
To: gcc-bugs
------- Comment #26 from steven at gcc dot gnu dot org 2008-12-16 16:26 -------
I am going to work on the -O0 problems a bit.
The operand scanner is the problem at -O3. Richi, this is one you may want to
try on the alias improvements branch, if most of the time is spent on virtual
SSA names (I haven't checked, but it's likely with so many aggregate-typed
variables).
--
steven at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
AssignedTo|unassigned at gcc dot gnu |steven at gcc dot gnu dot
|dot org |org
Status|NEW |ASSIGNED
Last reconfirmed|2008-12-15 21:27:40 |2008-12-16 16:26:45
date| |
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474
* [Bug middle-end/38474] slow compilation at -O0 (callgraph optimization, inline heuristics, expand )
2008-12-10 15:26 [Bug middle-end/38474] New: slow compilation at -O0 (callgraph optimization, inline heuristics, ggc expand ) jv244 at cam dot ac dot uk
` (25 preceding siblings ...)
2008-12-16 16:28 ` steven at gcc dot gnu dot org
@ 2008-12-16 16:31 ` jv244 at cam dot ac dot uk
2008-12-16 19:35 ` [Bug middle-end/38474] [4.3/4.4 Regression] " pinskia at gcc dot gnu dot org
` (23 subsequent siblings)
50 siblings, 0 replies; 52+ messages in thread
From: jv244 at cam dot ac dot uk @ 2008-12-16 16:31 UTC (permalink / raw)
To: gcc-bugs
------- Comment #27 from jv244 at cam dot ac dot uk 2008-12-16 16:29 -------
The slow routines at -O3 are related to compute_may_aliases. At the point where
I interrupted the profiling, this routine had called add_virtual_operand 200M
times.
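A call count of this size is what the product of two moderately large numbers looks like. The following is only a toy model, not GCC's code: it assumes, hypothetically, one add_virtual_operand call per (memory statement, aliased variable) pair, and both input figures are invented for illustration rather than measured from the testcase.

```python
def virtual_operand_calls(memory_statements, aliased_variables):
    """Toy model: one call per (statement, aliased variable) pair."""
    return memory_statements * aliased_variables


if __name__ == "__main__":
    # Hypothetical sizes, not measured from the testcase.
    for statements, variables in [(1_000, 500), (10_000, 5_000), (20_000, 10_000)]:
        calls = virtual_operand_calls(statements, variables)
        print(f"{statements:6d} statements x {variables:6d} variables -> {calls:,} calls")
```

Under that assumption, growing both the statement count and the variable count by a factor of ten multiplies the number of calls by a hundred.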
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474
* [Bug middle-end/38474] [4.3/4.4 Regression] slow compilation at -O0 (callgraph optimization, inline heuristics, expand )
2008-12-10 15:26 [Bug middle-end/38474] New: slow compilation at -O0 (callgraph optimization, inline heuristics, ggc expand ) jv244 at cam dot ac dot uk
` (26 preceding siblings ...)
2008-12-16 16:31 ` jv244 at cam dot ac dot uk
@ 2008-12-16 19:35 ` pinskia at gcc dot gnu dot org
2008-12-16 20:32 ` jv244 at cam dot ac dot uk
` (22 subsequent siblings)
50 siblings, 0 replies; 52+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2008-12-16 19:35 UTC (permalink / raw)
To: gcc-bugs
------- Comment #28 from pinskia at gcc dot gnu dot org 2008-12-16 19:32 -------
The stack heuristic is new for 4.3.
--
pinskia at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Severity|critical |normal
Summary|slow compilation at -O0 |[4.3/4.4 Regression] slow
|(callgraph optimization, |compilation at -O0
|inline heuristics, expand ) |(callgraph optimization,
| |inline heuristics, expand )
Target Milestone|--- |4.3.4
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474
* [Bug middle-end/38474] [4.3/4.4 Regression] slow compilation at -O0 (callgraph optimization, inline heuristics, expand )
2008-12-10 15:26 [Bug middle-end/38474] New: slow compilation at -O0 (callgraph optimization, inline heuristics, ggc expand ) jv244 at cam dot ac dot uk
` (27 preceding siblings ...)
2008-12-16 19:35 ` [Bug middle-end/38474] [4.3/4.4 Regression] " pinskia at gcc dot gnu dot org
@ 2008-12-16 20:32 ` jv244 at cam dot ac dot uk
2008-12-17 6:52 ` jv244 at cam dot ac dot uk
` (21 subsequent siblings)
50 siblings, 0 replies; 52+ messages in thread
From: jv244 at cam dot ac dot uk @ 2008-12-16 20:32 UTC (permalink / raw)
To: gcc-bugs
--
jv244 at cam dot ac dot uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|4.3.4 |4.3.3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474
* [Bug middle-end/38474] [4.3/4.4 Regression] slow compilation at -O0 (callgraph optimization, inline heuristics, expand )
2008-12-10 15:26 [Bug middle-end/38474] New: slow compilation at -O0 (callgraph optimization, inline heuristics, ggc expand ) jv244 at cam dot ac dot uk
` (28 preceding siblings ...)
2008-12-16 20:32 ` jv244 at cam dot ac dot uk
@ 2008-12-17 6:52 ` jv244 at cam dot ac dot uk
2008-12-17 7:03 ` steven at gcc dot gnu dot org
` (20 subsequent siblings)
50 siblings, 0 replies; 52+ messages in thread
From: jv244 at cam dot ac dot uk @ 2008-12-17 6:52 UTC (permalink / raw)
To: gcc-bugs
------- Comment #29 from jv244 at cam dot ac dot uk 2008-12-17 06:50 -------
Doing the original testcase again at -O3 has been a useful exercise, I think.
13.5h is spent in rename_registers, 2h in tree operand scan, ~40min in inline
heuristics, and 20min in expand. (Note that this is a 4.3-based compiler; maybe
I should redo this with 4.4, or has nothing changed there despite IRA?)
gfortran-4.3 -ffree-line-length-512 -g -fopenmp -ffree-form -D__T_C_G0
-ftime-report -c -O3 -march=native -funroll-loops mpfr_yukawa.f
Execution times (seconds)
garbage collection : 3.96 ( 0%) usr 0.00 ( 0%) sys 3.95 ( 0%) wall
0 kB ( 0%) ggc
callgraph construction: 0.42 ( 0%) usr 0.01 ( 0%) sys 0.44 ( 0%) wall
10751 kB ( 1%) ggc
callgraph optimization: 0.78 ( 0%) usr 0.02 ( 0%) sys 0.82 ( 0%) wall
14320 kB ( 1%) ggc
ipa reference : 0.17 ( 0%) usr 0.00 ( 0%) sys 0.17 ( 0%) wall
0 kB ( 0%) ggc
ipa pure const : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
0 kB ( 0%) ggc
cfg cleanup : 0.08 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall
0 kB ( 0%) ggc
trivially dead code : 5.36 ( 0%) usr 0.00 ( 0%) sys 5.38 ( 0%) wall
0 kB ( 0%) ggc
df reaching defs : 6.56 ( 0%) usr 0.07 ( 1%) sys 6.60 ( 0%) wall
0 kB ( 0%) ggc
df live regs : 16.93 ( 0%) usr 0.00 ( 0%) sys 16.96 ( 0%) wall
0 kB ( 0%) ggc
df live&initialized regs: 7.39 ( 0%) usr 0.00 ( 0%) sys 7.38 ( 0%) wall
0 kB ( 0%) ggc
df use-def / def-use chains: 14.49 ( 0%) usr 0.02 ( 0%) sys 14.52 ( 0%) wall
0 kB ( 0%) ggc
df reg dead/unused notes: 10.94 ( 0%) usr 0.00 ( 0%) sys 10.96 ( 0%) wall
37217 kB ( 3%) ggc
register information : 4.97 ( 0%) usr 0.00 ( 0%) sys 4.97 ( 0%) wall
0 kB ( 0%) ggc
alias analysis : 8.84 ( 0%) usr 0.01 ( 0%) sys 8.86 ( 0%) wall
49164 kB ( 3%) ggc
register scan : 1.78 ( 0%) usr 0.00 ( 0%) sys 1.80 ( 0%) wall
0 kB ( 0%) ggc
rebuild jump labels : 3.93 ( 0%) usr 0.00 ( 0%) sys 3.93 ( 0%) wall
0 kB ( 0%) ggc
parser : 3.24 ( 0%) usr 0.10 ( 1%) sys 3.35 ( 0%) wall
60522 kB ( 4%) ggc
inline heuristics : 2516.42 ( 4%) usr 4.20 (35%) sys 2542.00 ( 4%) wall
0 kB ( 0%) ggc
tree gimplify : 0.55 ( 0%) usr 0.00 ( 0%) sys 0.54 ( 0%) wall
3453 kB ( 0%) ggc
tree eh : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
0 kB ( 0%) ggc
tree CFG construction : 0.02 ( 0%) usr 0.01 ( 0%) sys 0.03 ( 0%) wall
7185 kB ( 1%) ggc
tree CFG cleanup : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
1 kB ( 0%) ggc
tree VRP : 0.25 ( 0%) usr 0.00 ( 0%) sys 0.23 ( 0%) wall
36 kB ( 0%) ggc
tree copy propagation : 0.63 ( 0%) usr 0.00 ( 0%) sys 0.60 ( 0%) wall
6 kB ( 0%) ggc
tree find ref. vars : 0.12 ( 0%) usr 0.01 ( 0%) sys 0.12 ( 0%) wall
8202 kB ( 1%) ggc
tree PTA : 3.61 ( 0%) usr 0.07 ( 1%) sys 3.64 ( 0%) wall
96 kB ( 0%) ggc
tree alias analysis : 6.71 ( 0%) usr 0.11 ( 1%) sys 6.74 ( 0%) wall
3 kB ( 0%) ggc
tree call clobbering : 0.39 ( 0%) usr 0.00 ( 0%) sys 0.37 ( 0%) wall
0 kB ( 0%) ggc
tree flow sensitive alias: 0.11 ( 0%) usr 0.00 ( 0%) sys 0.12 ( 0%) wall
9 kB ( 0%) ggc
tree flow insensitive alias: 3.54 ( 0%) usr 0.00 ( 0%) sys 3.53 ( 0%) wall
0 kB ( 0%) ggc
tree memory partitioning: 18.88 ( 0%) usr 0.03 ( 0%) sys 18.93 ( 0%) wall
0 kB ( 0%) ggc
tree PHI insertion : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
1 kB ( 0%) ggc
tree SSA rewrite : 0.17 ( 0%) usr 0.00 ( 0%) sys 0.18 ( 0%) wall
25972 kB ( 2%) ggc
tree SSA other : 0.02 ( 0%) usr 0.03 ( 0%) sys 0.08 ( 0%) wall
0 kB ( 0%) ggc
tree SSA incremental : 0.82 ( 0%) usr 0.00 ( 0%) sys 0.82 ( 0%) wall
9 kB ( 0%) ggc
tree operand scan : 6681.62 (11%) usr 0.55 ( 5%) sys 6698.35 (11%) wall
19727 kB ( 1%) ggc
dominator optimization: 0.26 ( 0%) usr 0.00 ( 0%) sys 0.25 ( 0%) wall
11 kB ( 0%) ggc
tree SRA : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall
0 kB ( 0%) ggc
tree STORE-CCP : 0.15 ( 0%) usr 0.00 ( 0%) sys 0.15 ( 0%) wall
1 kB ( 0%) ggc
tree CCP : 0.18 ( 0%) usr 0.00 ( 0%) sys 0.19 ( 0%) wall
3 kB ( 0%) ggc
tree reassociation : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall
2 kB ( 0%) ggc
tree PRE : 0.37 ( 0%) usr 0.00 ( 0%) sys 0.38 ( 0%) wall
2289 kB ( 0%) ggc
tree FRE : 0.38 ( 0%) usr 0.01 ( 0%) sys 0.38 ( 0%) wall
2297 kB ( 0%) ggc
tree linearize phis : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
0 kB ( 0%) ggc
tree forward propagate: 0.04 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
0 kB ( 0%) ggc
tree conservative DCE : 0.53 ( 0%) usr 0.00 ( 0%) sys 0.56 ( 0%) wall
0 kB ( 0%) ggc
tree aggressive DCE : 0.11 ( 0%) usr 0.00 ( 0%) sys 0.12 ( 0%) wall
0 kB ( 0%) ggc
tree DSE : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall
0 kB ( 0%) ggc
tree SSA to normal : 0.09 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall
3 kB ( 0%) ggc
tree rename SSA copies: 0.06 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall
0 kB ( 0%) ggc
expand : 1279.48 ( 2%) usr 2.74 (23%) sys 1285.18 ( 2%) wall
440026 kB (31%) ggc
lower subreg : 0.09 ( 0%) usr 0.00 ( 0%) sys 0.09 ( 0%) wall
0 kB ( 0%) ggc
forward prop : 1.74 ( 0%) usr 0.03 ( 0%) sys 1.73 ( 0%) wall
8458 kB ( 1%) ggc
CSE : 9.02 ( 0%) usr 0.04 ( 0%) sys 9.04 ( 0%) wall
27164 kB ( 2%) ggc
dead code elimination : 3.63 ( 0%) usr 0.00 ( 0%) sys 3.62 ( 0%) wall
0 kB ( 0%) ggc
dead store elim1 : 5.64 ( 0%) usr 0.05 ( 0%) sys 5.69 ( 0%) wall
37803 kB ( 3%) ggc
dead store elim2 : 7.64 ( 0%) usr 0.01 ( 0%) sys 7.65 ( 0%) wall
38112 kB ( 3%) ggc
web : 1.38 ( 0%) usr 0.02 ( 0%) sys 1.41 ( 0%) wall
0 kB ( 0%) ggc
CSE 2 : 4.26 ( 0%) usr 0.02 ( 0%) sys 4.29 ( 0%) wall
10087 kB ( 1%) ggc
branch prediction : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall
3 kB ( 0%) ggc
combiner : 5.02 ( 0%) usr 0.03 ( 0%) sys 5.06 ( 0%) wall
17823 kB ( 1%) ggc
regmove : 4.83 ( 0%) usr 0.00 ( 0%) sys 4.83 ( 0%) wall
1 kB ( 0%) ggc
local alloc : 11.33 ( 0%) usr 0.08 ( 1%) sys 11.36 ( 0%) wall
64779 kB ( 5%) ggc
global alloc : 21.86 ( 0%) usr 0.69 ( 6%) sys 22.55 ( 0%) wall
275939 kB (20%) ggc
reload CSE regs : 10.28 ( 0%) usr 0.00 ( 0%) sys 10.29 ( 0%) wall
79246 kB ( 6%) ggc
load CSE after reload : 0.84 ( 0%) usr 0.00 ( 0%) sys 0.85 ( 0%) wall
0 kB ( 0%) ggc
thread pro- & epilogue: 0.89 ( 0%) usr 0.00 ( 0%) sys 0.89 ( 0%) wall
13 kB ( 0%) ggc
peephole 2 : 1.46 ( 0%) usr 0.00 ( 0%) sys 1.46 ( 0%) wall
0 kB ( 0%) ggc
rename registers : 49100.30 (82%) usr 2.30 (19%) sys 49376.54 (82%) wall
16842 kB ( 1%) ggc
scheduling 2 : 21.80 ( 0%) usr 0.60 ( 5%) sys 22.39 ( 0%) wall
150933 kB (11%) ggc
machine dep reorg : 1.99 ( 0%) usr 0.00 ( 0%) sys 2.00 ( 0%) wall
0 kB ( 0%) ggc
reorder blocks : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
1 kB ( 0%) ggc
final : 3.83 ( 0%) usr 0.04 ( 0%) sys 3.88 ( 0%) wall
738 kB ( 0%) ggc
symout : 0.10 ( 0%) usr 0.00 ( 0%) sys 0.10 ( 0%) wall
5418 kB ( 0%) ggc
variable tracking : 1.86 ( 0%) usr 0.02 ( 0%) sys 1.87 ( 0%) wall
17 kB ( 0%) ggc
TOTAL :59825.47 11.92 60151.75
1415015 kB
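The hour and minute figures quoted before the report can be checked against the usr column. A small sketch, with the four values and the TOTAL copied from the report above; the helper name `format_duration` is made up for illustration.

```python
def format_duration(seconds):
    """Render a duration in seconds as whole hours and minutes, e.g. '13h38m'."""
    total_minutes = int(seconds // 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}h{minutes:02d}m"


# usr seconds copied from the report above.
USR_SECONDS = {
    "rename registers": 49100.30,
    "tree operand scan": 6681.62,
    "inline heuristics": 2516.42,
    "expand": 1279.48,
}
TOTAL_USR = 59825.47  # TOTAL line of the same report

if __name__ == "__main__":
    for name, seconds in USR_SECONDS.items():
        share = 100.0 * seconds / TOTAL_USR
        print(f"{name:18s} {format_duration(seconds)}  {share:5.1f}% of usr time")
```

Durations are truncated to whole minutes, which is precise enough for a ranking like this one.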
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474
* [Bug middle-end/38474] [4.3/4.4 Regression] slow compilation at -O0 (callgraph optimization, inline heuristics, expand )
2008-12-10 15:26 [Bug middle-end/38474] New: slow compilation at -O0 (callgraph optimization, inline heuristics, ggc expand ) jv244 at cam dot ac dot uk
` (29 preceding siblings ...)
2008-12-17 6:52 ` jv244 at cam dot ac dot uk
@ 2008-12-17 7:03 ` steven at gcc dot gnu dot org
2008-12-17 8:37 ` jv244 at cam dot ac dot uk
` (19 subsequent siblings)
50 siblings, 0 replies; 52+ messages in thread
From: steven at gcc dot gnu dot org @ 2008-12-17 7:03 UTC (permalink / raw)
To: gcc-bugs
------- Comment #30 from steven at gcc dot gnu dot org 2008-12-17 07:01 -------
I think redoing this with 4.4.0 would be useful, to check if new code (like
IRA) uses this kind of non-linear algorithm. But the register renaming pass
hasn't changed between 4.3 and 4.4, so I would compile with
-fno-rename-registers ;-)
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474
* [Bug middle-end/38474] [4.3/4.4 Regression] slow compilation at -O0 (callgraph optimization, inline heuristics, expand )
2008-12-10 15:26 [Bug middle-end/38474] New: slow compilation at -O0 (callgraph optimization, inline heuristics, ggc expand ) jv244 at cam dot ac dot uk
` (30 preceding siblings ...)
2008-12-17 7:03 ` steven at gcc dot gnu dot org
@ 2008-12-17 8:37 ` jv244 at cam dot ac dot uk
2008-12-17 12:59 ` jv244 at cam dot ac dot uk
` (18 subsequent siblings)
50 siblings, 0 replies; 52+ messages in thread
From: jv244 at cam dot ac dot uk @ 2008-12-17 8:37 UTC (permalink / raw)
To: gcc-bugs
------- Comment #31 from jv244 at cam dot ac dot uk 2008-12-17 08:36 -------
(In reply to comment #30)
> I think redoing this with 4.4.0 would be useful, to check if new code (like
> IRA) uses this kind of non-linear algorithm. But the register renaming pass
> hasn't changed between 4.3 and 4.4, so I would compile with
> -fno-rename-registers ;-)
>
Thanks for the '-fno-rename-registers' trick; something like that is not
obvious to beginners. I tried with 4.4, but virtual memory usage went to 9 GB,
so I'll have to run it on a larger machine. I can't remember seeing this with
4.3, but I'll test.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474
* [Bug middle-end/38474] [4.3/4.4 Regression] slow compilation at -O0 (callgraph optimization, inline heuristics, expand )
2008-12-10 15:26 [Bug middle-end/38474] New: slow compilation at -O0 (callgraph optimization, inline heuristics, ggc expand ) jv244 at cam dot ac dot uk
` (31 preceding siblings ...)
2008-12-17 8:37 ` jv244 at cam dot ac dot uk
@ 2008-12-17 12:59 ` jv244 at cam dot ac dot uk
2008-12-17 19:42 ` steven at gcc dot gnu dot org
` (17 subsequent siblings)
50 siblings, 0 replies; 52+ messages in thread
From: jv244 at cam dot ac dot uk @ 2008-12-17 12:59 UTC (permalink / raw)
To: gcc-bugs
------- Comment #32 from jv244 at cam dot ac dot uk 2008-12-17 12:58 -------
The 9.3 GB for 4.4 is confirmed. I attached gdb to the process at that point
(after about 70min of compilation), and this is the backtrace:
#0 0x0000000000b48a9a in bucket_allocno_compare_func (v1p=0x7fffe3592d98,
v2p=0x7fffe3592d90)
at /data04/vondele/gcc_trunk/gcc/gcc/ira-color.c:746
#1 0x0000000000b4abbd in push_allocno_to_stack (allocno=<value optimized out>)
at /data04/vondele/gcc_trunk/gcc/gcc/ira-color.c:803
#2 0x0000000000b4e4d2 in color_allocnos () at
/data04/vondele/gcc_trunk/gcc/gcc/ira-color.c:989
#3 0x0000000000b4f614 in color_pass (loop_tree_node=<value optimized out>)
at /data04/vondele/gcc_trunk/gcc/gcc/ira-color.c:1936
#4 0x0000000000b3ef2a in ira_traverse_loop_tree (bb_p=0 '\0',
loop_node=0x7fffe3592d90,
preorder_func=0x337d54f8, postorder_func=0) at
/data04/vondele/gcc_trunk/gcc/gcc/ira-build.c:1381
#5 0x0000000000b4a320 in ira_color () at
/data04/vondele/gcc_trunk/gcc/gcc/ira-color.c:2080
#6 0x0000000000b3d7eb in rest_of_handle_ira () at
/data04/vondele/gcc_trunk/gcc/gcc/ira.c:1926
#7 0x000000000069e48d in execute_one_pass (pass=0x10980e0) at
/data04/vondele/gcc_trunk/gcc/gcc/passes.c:1279
#8 0x000000000069e6d5 in execute_pass_list (pass=0x10980e0) at
/data04/vondele/gcc_trunk/gcc/gcc/passes.c:1328
#9 0x000000000069e6ed in execute_pass_list (pass=0x1093060) at
/data04/vondele/gcc_trunk/gcc/gcc/passes.c:1329
#10 0x0000000000794ddc in tree_rest_of_compilation (fndecl=0x7f45da99eb00)
at /data04/vondele/gcc_trunk/gcc/gcc/tree-optimize.c:419
Apart from this, 4.4 is actually a bit faster. This is the time report:
gfortran -ffree-line-length-512 -g -ffree-form -ftime-report -c -O3
-march=native -funroll-loops -fno-rename-registers testcase.f90
Execution times (seconds)
garbage collection : 7.00 ( 0%) usr 0.02 ( 0%) sys 7.05 ( 0%) wall
0 kB ( 0%) ggc
callgraph construction: 0.25 ( 0%) usr 0.01 ( 0%) sys 0.25 ( 0%) wall
12496 kB ( 1%) ggc
callgraph optimization: 387.87 ( 8%) usr 1.54 (11%) sys 389.44 ( 8%) wall
4414 kB ( 0%) ggc
ipa cp : 0.30 ( 0%) usr 0.00 ( 0%) sys 0.30 ( 0%) wall
34 kB ( 0%) ggc
ipa reference : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall
0 kB ( 0%) ggc
ipa pure const : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
0 kB ( 0%) ggc
cfg cleanup : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall
1 kB ( 0%) ggc
CFG verifier : 10.67 ( 0%) usr 0.01 ( 0%) sys 10.74 ( 0%) wall
0 kB ( 0%) ggc
trivially dead code : 4.31 ( 0%) usr 0.00 ( 0%) sys 4.30 ( 0%) wall
0 kB ( 0%) ggc
df reaching defs : 4.51 ( 0%) usr 0.02 ( 0%) sys 4.54 ( 0%) wall
0 kB ( 0%) ggc
df live regs : 14.18 ( 0%) usr 0.00 ( 0%) sys 14.22 ( 0%) wall
0 kB ( 0%) ggc
df live&initialized regs: 6.79 ( 0%) usr 0.00 ( 0%) sys 6.82 ( 0%) wall
0 kB ( 0%) ggc
df use-def / def-use chains: 20.22 ( 0%) usr 0.05 ( 0%) sys 20.27 ( 0%) wall
0 kB ( 0%) ggc
df reg dead/unused notes: 6.55 ( 0%) usr 0.01 ( 0%) sys 6.53 ( 0%) wall
36992 kB ( 3%) ggc
register information : 2.68 ( 0%) usr 0.00 ( 0%) sys 2.71 ( 0%) wall
0 kB ( 0%) ggc
alias analysis : 6.93 ( 0%) usr 0.00 ( 0%) sys 6.92 ( 0%) wall
46600 kB ( 4%) ggc
register scan : 1.32 ( 0%) usr 0.00 ( 0%) sys 1.33 ( 0%) wall
18 kB ( 0%) ggc
rebuild jump labels : 2.14 ( 0%) usr 0.00 ( 0%) sys 2.15 ( 0%) wall
0 kB ( 0%) ggc
parser : 6.23 ( 0%) usr 0.09 ( 1%) sys 6.30 ( 0%) wall
59009 kB ( 4%) ggc
inline heuristics : 1328.68 (28%) usr 3.27 (24%) sys 1331.94 (28%) wall
138 kB ( 0%) ggc
tree gimplify : 0.36 ( 0%) usr 0.00 ( 0%) sys 0.38 ( 0%) wall
16833 kB ( 1%) ggc
tree eh : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
0 kB ( 0%) ggc
tree CFG construction : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
179 kB ( 0%) ggc
tree CFG cleanup : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
8 kB ( 0%) ggc
tree VRP : 0.21 ( 0%) usr 0.00 ( 0%) sys 0.20 ( 0%) wall
448 kB ( 0%) ggc
tree copy propagation : 0.41 ( 0%) usr 0.00 ( 0%) sys 0.38 ( 0%) wall
159 kB ( 0%) ggc
tree find ref. vars : 0.07 ( 0%) usr 0.02 ( 0%) sys 0.08 ( 0%) wall
7873 kB ( 1%) ggc
tree PTA : 19.23 ( 0%) usr 0.06 ( 0%) sys 19.31 ( 0%) wall
540 kB ( 0%) ggc
tree alias analysis : 0.47 ( 0%) usr 0.04 ( 0%) sys 0.53 ( 0%) wall
75 kB ( 0%) ggc
tree call clobbering : 0.13 ( 0%) usr 0.00 ( 0%) sys 0.12 ( 0%) wall
36 kB ( 0%) ggc
tree flow sensitive alias: 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
61 kB ( 0%) ggc
tree flow insensitive alias: 0.03 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
0 kB ( 0%) ggc
tree memory partitioning: 0.61 ( 0%) usr 0.00 ( 0%) sys 0.60 ( 0%) wall
0 kB ( 0%) ggc
tree SSA rewrite : 0.09 ( 0%) usr 0.01 ( 0%) sys 0.10 ( 0%) wall
20578 kB ( 2%) ggc
tree SSA other : 0.06 ( 0%) usr 0.01 ( 0%) sys 0.02 ( 0%) wall
0 kB ( 0%) ggc
tree SSA incremental : 0.21 ( 0%) usr 0.00 ( 0%) sys 0.17 ( 0%) wall
21 kB ( 0%) ggc
tree operand scan : 2339.81 (49%) usr 0.26 ( 2%) sys 2340.13 (49%) wall
9840 kB ( 1%) ggc
dominator optimization: 0.12 ( 0%) usr 0.00 ( 0%) sys 0.12 ( 0%) wall
73 kB ( 0%) ggc
tree SRA : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
78 kB ( 0%) ggc
tree CCP : 0.18 ( 0%) usr 0.00 ( 0%) sys 0.18 ( 0%) wall
119 kB ( 0%) ggc
tree reassociation : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
50 kB ( 0%) ggc
tree PRE : 0.13 ( 0%) usr 0.00 ( 0%) sys 0.14 ( 0%) wall
62 kB ( 0%) ggc
tree FRE : 0.14 ( 0%) usr 0.00 ( 0%) sys 0.16 ( 0%) wall
33 kB ( 0%) ggc
tree linearize phis : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
17 kB ( 0%) ggc
tree forward propagate: 0.03 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
3 kB ( 0%) ggc
tree conservative DCE : 0.30 ( 0%) usr 0.00 ( 0%) sys 0.31 ( 0%) wall
0 kB ( 0%) ggc
tree aggressive DCE : 0.13 ( 0%) usr 0.00 ( 0%) sys 0.13 ( 0%) wall
12 kB ( 0%) ggc
tree buildin call DCE : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
0 kB ( 0%) ggc
tree DSE : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
10 kB ( 0%) ggc
complete unrolling : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
46 kB ( 0%) ggc
tree loop init : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
44 kB ( 0%) ggc
tree SSA uncprop : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
0 kB ( 0%) ggc
tree SSA to normal : 0.09 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall
4 kB ( 0%) ggc
tree rename SSA copies: 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
0 kB ( 0%) ggc
tree SSA verifier : 7.53 ( 0%) usr 0.01 ( 0%) sys 7.55 ( 0%) wall
0 kB ( 0%) ggc
tree STMT verifier : 11.14 ( 0%) usr 0.01 ( 0%) sys 11.19 ( 0%) wall
0 kB ( 0%) ggc
callgraph verifier : 1.01 ( 0%) usr 0.00 ( 0%) sys 1.02 ( 0%) wall
0 kB ( 0%) ggc
dominance computation : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall
0 kB ( 0%) ggc
expand : 421.25 ( 9%) usr 2.03 (15%) sys 423.13 ( 9%) wall
365559 kB (28%) ggc
lower subreg : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall
0 kB ( 0%) ggc
forward prop : 5.34 ( 0%) usr 0.03 ( 0%) sys 5.36 ( 0%) wall
24812 kB ( 2%) ggc
CSE : 10.24 ( 0%) usr 0.07 ( 1%) sys 10.30 ( 0%) wall
24343 kB ( 2%) ggc
dead code elimination : 2.74 ( 0%) usr 0.00 ( 0%) sys 2.75 ( 0%) wall
0 kB ( 0%) ggc
dead store elim1 : 2.46 ( 0%) usr 0.05 ( 0%) sys 2.52 ( 0%) wall
36487 kB ( 3%) ggc
dead store elim2 : 3.81 ( 0%) usr 0.00 ( 0%) sys 3.82 ( 0%) wall
36569 kB ( 3%) ggc
loop analysis : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
109 kB ( 0%) ggc
global CSE : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
0 kB ( 0%) ggc
web : 4.48 ( 0%) usr 0.01 ( 0%) sys 4.49 ( 0%) wall
7 kB ( 0%) ggc
CSE 2 : 4.96 ( 0%) usr 0.02 ( 0%) sys 4.99 ( 0%) wall
9448 kB ( 1%) ggc
branch prediction : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall
46 kB ( 0%) ggc
combiner : 3.41 ( 0%) usr 0.00 ( 0%) sys 3.41 ( 0%) wall
18359 kB ( 1%) ggc
regmove : 0.27 ( 0%) usr 0.00 ( 0%) sys 0.26 ( 0%) wall
28 kB ( 0%) ggc
integrated RA : 71.46 ( 1%) usr 5.20 (38%) sys 96.42 ( 2%) wall
66797 kB ( 5%) ggc
reload : 14.31 ( 0%) usr 0.19 ( 1%) sys 14.49 ( 0%) wall
284193 kB (21%) ggc
reload CSE regs : 8.05 ( 0%) usr 0.00 ( 0%) sys 8.04 ( 0%) wall
77580 kB ( 6%) ggc
load CSE after reload : 0.62 ( 0%) usr 0.00 ( 0%) sys 0.61 ( 0%) wall
0 kB ( 0%) ggc
thread pro- & epilogue: 0.79 ( 0%) usr 0.00 ( 0%) sys 0.79 ( 0%) wall
216 kB ( 0%) ggc
peephole 2 : 0.99 ( 0%) usr 0.00 ( 0%) sys 0.99 ( 0%) wall
29 kB ( 0%) ggc
rename registers : 3.99 ( 0%) usr 0.00 ( 0%) sys 4.02 ( 0%) wall
0 kB ( 0%) ggc
scheduling 2 : 18.25 ( 0%) usr 0.54 ( 4%) sys 18.77 ( 0%) wall
147959 kB (11%) ggc
machine dep reorg : 1.87 ( 0%) usr 0.00 ( 0%) sys 1.88 ( 0%) wall
2 kB ( 0%) ggc
reorder blocks : 0.36 ( 0%) usr 0.00 ( 0%) sys 0.35 ( 0%) wall
17 kB ( 0%) ggc
final : 2.93 ( 0%) usr 0.09 ( 1%) sys 3.01 ( 0%) wall
11989 kB ( 1%) ggc
symout : 0.04 ( 0%) usr 0.01 ( 0%) sys 0.05 ( 0%) wall
5340 kB ( 0%) ggc
variable tracking : 1.39 ( 0%) usr 0.06 ( 0%) sys 1.44 ( 0%) wall
63 kB ( 0%) ggc
TOTAL :4777.54 13.76 4811.10
1328156 kB
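To see where 4.4 and 4.3 differ, the usr times of a few passes can be subtracted pairwise. A minimal sketch: the values are copied from the 4.4 report above and from the 4.3 report that follows it, and the helper name `usr_delta` is made up for illustration.

```python
# usr seconds per pass, copied from the 4.4 report above and from the
# 4.3 report that follows it.
USR_44 = {
    "callgraph optimization": 387.87,
    "inline heuristics": 1328.68,
    "tree PTA": 19.23,
    "tree memory partitioning": 0.61,
    "tree operand scan": 2339.81,
}
USR_43 = {
    "callgraph optimization": 0.88,
    "inline heuristics": 1683.93,
    "tree PTA": 3.88,
    "tree memory partitioning": 23.61,
    "tree operand scan": 5025.85,
}


def usr_delta(new, old):
    """Return {pass: new - old} in seconds for passes present in both reports."""
    return {name: round(new[name] - old[name], 2) for name in new if name in old}


if __name__ == "__main__":
    deltas = usr_delta(USR_44, USR_43)
    # Largest slowdown first, largest speedup last.
    for name in sorted(deltas, key=deltas.get, reverse=True):
        print(f"{name:26s} {deltas[name]:+10.2f} s")
```

Of the passes listed, tree operand scan shows the largest saving in 4.4 and callgraph optimization the largest increase.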
For comparison, on the same machine 4.3 has this time report:
gfortran-4.3 -ffree-line-length-512 -g -ffree-form -ftime-report -c -O3
-march=native -funroll-loops -fno-rename-registers testcase.f90
Execution times (seconds)
garbage collection : 2.22 ( 0%) usr 0.01 ( 0%) sys 2.26 ( 0%) wall
0 kB ( 0%) ggc
callgraph construction: 0.24 ( 0%) usr 0.05 ( 0%) sys 0.30 ( 0%) wall
10435 kB ( 1%) ggc
callgraph optimization: 0.88 ( 0%) usr 0.02 ( 0%) sys 0.94 ( 0%) wall
13970 kB ( 1%) ggc
ipa reference : 0.16 ( 0%) usr 0.00 ( 0%) sys 0.16 ( 0%) wall
3 kB ( 0%) ggc
ipa pure const : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
0 kB ( 0%) ggc
cfg cleanup : 0.06 ( 0%) usr 0.02 ( 0%) sys 0.15 ( 0%) wall
1 kB ( 0%) ggc
trivially dead code : 3.72 ( 0%) usr 0.02 ( 0%) sys 3.76 ( 0%) wall
0 kB ( 0%) ggc
df reaching defs : 4.71 ( 0%) usr 0.06 ( 0%) sys 4.76 ( 0%) wall
0 kB ( 0%) ggc
df live regs : 14.35 ( 0%) usr 0.00 ( 0%) sys 14.35 ( 0%) wall
0 kB ( 0%) ggc
df live&initialized regs: 6.28 ( 0%) usr 0.05 ( 0%) sys 6.41 ( 0%) wall
0 kB ( 0%) ggc
df use-def / def-use chains: 12.90 ( 0%) usr 0.03 ( 0%) sys 12.96 ( 0%) wall
0 kB ( 0%) ggc
df reg dead/unused notes: 6.17 ( 0%) usr 0.01 ( 0%) sys 6.19 ( 0%) wall
35719 kB ( 3%) ggc
register information : 2.84 ( 0%) usr 0.01 ( 0%) sys 2.91 ( 0%) wall
0 kB ( 0%) ggc
alias analysis : 6.60 ( 0%) usr 0.02 ( 0%) sys 6.69 ( 0%) wall
46609 kB ( 3%) ggc
register scan : 1.24 ( 0%) usr 0.00 ( 0%) sys 1.28 ( 0%) wall
8 kB ( 0%) ggc
rebuild jump labels : 2.33 ( 0%) usr 0.00 ( 0%) sys 2.32 ( 0%) wall
0 kB ( 0%) ggc
parser : 2.77 ( 0%) usr 0.86 ( 6%) sys 5.64 ( 0%) wall
58659 kB ( 4%) ggc
inline heuristics : 1683.93 (21%) usr 4.08 (29%) sys 1689.76 (21%) wall
136 kB ( 0%) ggc
integration : 0.00 ( 0%) usr 0.02 ( 0%) sys 0.12 ( 0%) wall
824 kB ( 0%) ggc
tree gimplify : 0.43 ( 0%) usr 0.01 ( 0%) sys 0.43 ( 0%) wall
3446 kB ( 0%) ggc
tree eh : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
0 kB ( 0%) ggc
tree CFG construction : 0.03 ( 0%) usr 0.01 ( 0%) sys 0.03 ( 0%) wall
7150 kB ( 1%) ggc
tree CFG cleanup : 0.02 ( 0%) usr 0.04 ( 0%) sys 0.21 ( 0%) wall
8 kB ( 0%) ggc
tree VRP : 0.19 ( 0%) usr 0.00 ( 0%) sys 0.21 ( 0%) wall
479 kB ( 0%) ggc
tree copy propagation : 0.51 ( 0%) usr 0.00 ( 0%) sys 0.55 ( 0%) wall
159 kB ( 0%) ggc
tree find ref. vars : 0.11 ( 0%) usr 0.02 ( 0%) sys 0.12 ( 0%) wall
7876 kB ( 1%) ggc
tree PTA : 3.88 ( 0%) usr 0.07 ( 1%) sys 3.98 ( 0%) wall
380 kB ( 0%) ggc
tree alias analysis : 8.07 ( 0%) usr 0.83 ( 6%) sys 10.31 ( 0%) wall
49 kB ( 0%) ggc
tree call clobbering : 0.32 ( 0%) usr 0.00 ( 0%) sys 0.31 ( 0%) wall
0 kB ( 0%) ggc
tree flow sensitive alias: 0.08 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall
62 kB ( 0%) ggc
tree flow insensitive alias: 4.79 ( 0%) usr 0.00 ( 0%) sys 4.79 ( 0%) wall
0 kB ( 0%) ggc
tree memory partitioning: 23.61 ( 0%) usr 0.01 ( 0%) sys 23.65 ( 0%) wall
2 kB ( 0%) ggc
tree PHI insertion : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
0 kB ( 0%) ggc
tree SSA rewrite : 0.15 ( 0%) usr 0.02 ( 0%) sys 0.19 ( 0%) wall
25410 kB ( 2%) ggc
tree SSA other : 0.15 ( 0%) usr 0.52 ( 4%) sys 2.07 ( 0%) wall
0 kB ( 0%) ggc
tree SSA incremental : 0.63 ( 0%) usr 0.02 ( 0%) sys 0.69 ( 0%) wall
23 kB ( 0%) ggc
tree operand scan :5025.85 (64%) usr 3.00 (22%) sys5036.21 (64%) wall
19771 kB ( 1%) ggc
dominator optimization: 0.19 ( 0%) usr 0.01 ( 0%) sys 0.18 ( 0%) wall
381 kB ( 0%) ggc
tree SRA : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall
71 kB ( 0%) ggc
tree STORE-CCP : 0.12 ( 0%) usr 0.00 ( 0%) sys 0.12 ( 0%) wall
39 kB ( 0%) ggc
tree CCP : 0.17 ( 0%) usr 0.00 ( 0%) sys 0.20 ( 0%) wall
81 kB ( 0%) ggc
tree PHI const/copy prop: 0.00 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
0 kB ( 0%) ggc
tree reassociation : 0.05 ( 0%) usr 0.01 ( 0%) sys 0.04 ( 0%) wall
50 kB ( 0%) ggc
tree PRE : 0.37 ( 0%) usr 0.00 ( 0%) sys 0.36 ( 0%) wall
2286 kB ( 0%) ggc
tree FRE : 0.31 ( 0%) usr 0.01 ( 0%) sys 0.34 ( 0%) wall
2268 kB ( 0%) ggc
tree code sinking : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
25 kB ( 0%) ggc
tree linearize phis : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall
10 kB ( 0%) ggc
tree forward propagate: 0.04 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall
1 kB ( 0%) ggc
tree conservative DCE : 0.44 ( 0%) usr 0.00 ( 0%) sys 0.47 ( 0%) wall
4 kB ( 0%) ggc
tree aggressive DCE : 0.10 ( 0%) usr 0.00 ( 0%) sys 0.12 ( 0%) wall
0 kB ( 0%) ggc
tree DSE : 0.05 ( 0%) usr 0.01 ( 0%) sys 0.04 ( 0%) wall
4 kB ( 0%) ggc
PHI merge : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
0 kB ( 0%) ggc
tree loop optimization: 0.00 ( 0%) usr 0.01 ( 0%) sys 0.01 ( 0%) wall
0 kB ( 0%) ggc
loop invariant motion : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
0 kB ( 0%) ggc
tree canonical iv : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
0 kB ( 0%) ggc
tree loop unswitching : 0.00 ( 0%) usr 0.01 ( 0%) sys 0.00 ( 0%) wall
0 kB ( 0%) ggc
complete unrolling : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
0 kB ( 0%) ggc
predictive commoning : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
1 kB ( 0%) ggc
tree copy headers : 0.00 ( 0%) usr 0.01 ( 0%) sys 0.01 ( 0%) wall
26 kB ( 0%) ggc
tree SSA uncprop : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
0 kB ( 0%) ggc
tree SSA to normal : 0.08 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall
1 kB ( 0%) ggc
tree NRV optimization : 0.00 ( 0%) usr 0.02 ( 0%) sys 0.00 ( 0%) wall
0 kB ( 0%) ggc
tree rename SSA copies: 0.03 ( 0%) usr 0.01 ( 0%) sys 0.06 ( 0%) wall
0 kB ( 0%) ggc
dominance frontiers : 0.00 ( 0%) usr 0.01 ( 0%) sys 0.00 ( 0%) wall
0 kB ( 0%) ggc
dominance computation : 0.00 ( 0%) usr 0.02 ( 0%) sys 0.09 ( 0%) wall
0 kB ( 0%) ggc
expand : 959.04 (12%) usr 2.65 (19%) sys 964.23 (12%) wall
422621 kB (31%) ggc
varconst : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
0 kB ( 0%) ggc
lower subreg : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall
0 kB ( 0%) ggc
jump : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
0 kB ( 0%) ggc
forward prop : 1.26 ( 0%) usr 0.04 ( 0%) sys 1.34 ( 0%) wall
8174 kB ( 1%) ggc
CSE : 7.64 ( 0%) usr 0.04 ( 0%) sys 7.71 ( 0%) wall
26097 kB ( 2%) ggc
dead code elimination : 2.69 ( 0%) usr 0.00 ( 0%) sys 2.70 ( 0%) wall
0 kB ( 0%) ggc
dead store elim1 : 4.05 ( 0%) usr 0.10 ( 1%) sys 4.21 ( 0%) wall
36200 kB ( 3%) ggc
dead store elim2 : 5.95 ( 0%) usr 0.01 ( 0%) sys 5.98 ( 0%) wall
36506 kB ( 3%) ggc
loop analysis : 0.00 ( 0%) usr 0.01 ( 0%) sys 0.04 ( 0%) wall
111 kB ( 0%) ggc
global CSE : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
0 kB ( 0%) ggc
bypass jumps : 0.00 ( 0%) usr 0.01 ( 0%) sys 0.00 ( 0%) wall
6 kB ( 0%) ggc
web : 0.98 ( 0%) usr 0.01 ( 0%) sys 1.00 ( 0%) wall
0 kB ( 0%) ggc
CSE 2 : 3.65 ( 0%) usr 0.01 ( 0%) sys 3.70 ( 0%) wall
9710 kB ( 1%) ggc
branch prediction : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall
48 kB ( 0%) ggc
combiner : 3.34 ( 0%) usr 0.00 ( 0%) sys 3.34 ( 0%) wall
17434 kB ( 1%) ggc
if-conversion : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
50 kB ( 0%) ggc
regmove : 3.07 ( 0%) usr 0.00 ( 0%) sys 3.07 ( 0%) wall
28 kB ( 0%) ggc
mode switching : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
0 kB ( 0%) ggc
local alloc : 9.49 ( 0%) usr 0.07 ( 1%) sys 9.54 ( 0%) wall
62106 kB ( 5%) ggc
global alloc : 17.28 ( 0%) usr 0.52 ( 4%) sys 18.12 ( 0%) wall
264214 kB (20%) ggc
reload CSE regs : 7.61 ( 0%) usr 0.01 ( 0%) sys 7.62 ( 0%) wall
75867 kB ( 6%) ggc
load CSE after reload : 0.60 ( 0%) usr 0.00 ( 0%) sys 0.60 ( 0%) wall
0 kB ( 0%) ggc
thread pro- & epilogue: 0.68 ( 0%) usr 0.01 ( 0%) sys 0.68 ( 0%) wall
236 kB ( 0%) ggc
if-conversion 2 : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
25 kB ( 0%) ggc
peephole 2 : 1.08 ( 0%) usr 0.00 ( 0%) sys 1.09 ( 0%) wall
28 kB ( 0%) ggc
rename registers : 1.87 ( 0%) usr 0.00 ( 0%) sys 1.90 ( 0%) wall
0 kB ( 0%) ggc
scheduling 2 : 15.12 ( 0%) usr 0.38 ( 3%) sys 15.51 ( 0%) wall
144759 kB (11%) ggc
machine dep reorg : 1.52 ( 0%) usr 0.00 ( 0%) sys 1.51 ( 0%) wall
2 kB ( 0%) ggc
reorder blocks : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
18 kB ( 0%) ggc
reg stack : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
0 kB ( 0%) ggc
final : 2.94 ( 0%) usr 0.10 ( 1%) sys 3.29 ( 0%) wall
1267 kB ( 0%) ggc
symout : 0.15 ( 0%) usr 0.01 ( 0%) sys 0.16 ( 0%) wall
6928 kB ( 1%) ggc
variable tracking : 1.36 ( 0%) usr 0.00 ( 0%) sys 1.37 ( 0%) wall
76 kB ( 0%) ggc
tree if-combine : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
0 kB ( 0%) ggc
rest of compilation : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
0 kB ( 0%) ggc
TOTAL :7873.75 13.93 7906.31
1349263 kB
total: 1271414 kB
--
jv244 at cam dot ac dot uk changed:
What |Removed |Added
----------------------------------------------------------------------------
GCC build triplet| |vmakarov
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474
* [Bug middle-end/38474] [4.3/4.4 Regression] slow compilation at -O0 (callgraph optimization, inline heuristics, expand )
From: steven at gcc dot gnu dot org @ 2008-12-17 19:42 UTC (permalink / raw)
To: gcc-bugs
------- Comment #33 from steven at gcc dot gnu dot org 2008-12-17 19:40 -------
cfgexpand.c:defer_stack_allocation() has this gem:
  /* Without optimization, *most* variables are allocated from the
     stack, which makes the quadratic problem large exactly when we
     want compilation to proceed as quickly as possible.  On the
     other hand, we don't want the function's stack frame size to
     get completely out of hand.  So we avoid adding scalars and
     "small" aggregates to the list at all.  */
  if (optimize == 0 && tree_low_cst (DECL_SIZE_UNIT (var), 1) < 32)
    return false;
In our case, most variables are of type mpfr_type, which is ... yes, 32 bytes :-)
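A minimal sketch of why the boundary matters, assuming the quoted test is the only gate. The helper name `skips_conflict_list` and the `fake_mpfr` struct are hypothetical stand-ins, not GCC code; only the `optimize == 0` and `< 32` conditions come from the snippet above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the quoted check: at -O0 a variable stays
   off the deferred list only if its size is strictly below 32 bytes.  */
static bool
skips_conflict_list (int optimize, size_t size_in_bytes)
{
  return optimize == 0 && size_in_bytes < 32;
}

/* Assumed layout: a 32-byte aggregate standing in for the mpfr_type
   mentioned above; the real type's fields are not shown in the thread.  */
struct fake_mpfr
{
  unsigned char payload[32];
};
```

With these definitions a 31-byte object is skipped at -O0, while `sizeof (struct fake_mpfr)` is exactly 32 and so falls just outside the exemption, which is the situation the comment describes.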
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474
* [Bug middle-end/38474] [4.3/4.4 Regression] slow compilation at -O0 (callgraph optimization, inline heuristics, expand )
From: jv244 at cam dot ac dot uk @ 2008-12-20 9:00 UTC (permalink / raw)
To: gcc-bugs
------- Comment #34 from jv244 at cam dot ac dot uk 2008-12-20 08:58 -------
BTW, should I split this PR into 4 sub-PRs, and make them block this one?
1) inline heuristics (4.3/4.4 Regression)
2) IRA mem explosion (4.4 Regression)
3) rename registers issue (?)
4) may_alias issue (?)
This would make sense to me.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474
* [Bug middle-end/38474] [4.3/4.4 Regression] slow compilation at -O0 (callgraph optimization, inline heuristics, expand )
From: steven at gcc dot gnu dot org @ 2008-12-20 9:56 UTC (permalink / raw)
To: gcc-bugs
------- Comment #35 from steven at gcc dot gnu dot org 2008-12-20 09:54 -------
Re comment #34: Good idea, but add:
5) quadratic behaviour in find_temp_slot_from_address.
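To illustrate what "quadratic behaviour" means for a lookup like this, here is a sketch under stated assumptions: it is not the code of find_temp_slot_from_address, just a hypothetical table searched linearly once per insertion. All names (`slot_table`, `find_slot_linear`, `total_comparisons`) are made up for the example.

```c
#include <stddef.h>

/* Hypothetical slot table: addresses kept in insertion order.  */
struct slot_table
{
  const long *addresses;
  size_t count;
};

/* Linear lookup; *comparisons accumulates the work done.  */
static long
find_slot_linear (const struct slot_table *t, long address,
                  size_t *comparisons)
{
  for (size_t i = 0; i < t->count; i++)
    {
      (*comparisons)++;
      if (t->addresses[i] == address)
        return (long) i;
    }
  return -1;
}

/* Look up the newest address after each of n insertions and return
   the total number of comparisons performed.  */
static size_t
total_comparisons (const long *addresses, size_t n)
{
  size_t comparisons = 0;
  for (size_t k = 1; k <= n; k++)
    {
      struct slot_table t = { addresses, k };
      find_slot_linear (&t, addresses[k - 1], &comparisons);
    }
  return comparisons;
}
```

For n distinct addresses the total is n(n+1)/2 comparisons, so doubling the number of slots roughly quadruples the work.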
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474
* [Bug middle-end/38474] [Meta] slow compilation at -O0 (callgraph optimization, inline heuristics, expand )
From: jv244 at cam dot ac dot uk @ 2008-12-20 11:33 UTC (permalink / raw)
To: gcc-bugs
------- Comment #36 from jv244 at cam dot ac dot uk 2008-12-20 11:30 -------
I've added
PR38582 : rename registers
PR38583 : ira
PR38584 : inline heuristic
PR38585 : compute_may_aliases
PR38586 : find_temp_slot_from_address
and turned this one into a meta bug.
--
jv244 at cam dot ac dot uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|[4.3/4.4 Regression] slow |[Meta] slow compilation at -
|compilation at -O0 |O0 (callgraph optimization,
|(callgraph optimization, |inline heuristics, expand )
|inline heuristics, expand ) |
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474
* [Bug middle-end/38474] [Meta] slow compilation at -O0 (callgraph optimization, inline heuristics, expand )
From: steven at gcc dot gnu dot org @ 2008-12-20 15:51 UTC (permalink / raw)
To: gcc-bugs
--
steven at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
AssignedTo|steven at gcc dot gnu dot |unassigned at gcc dot gnu
|org |dot org
Status|ASSIGNED |NEW
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474
* [Bug middle-end/38474] [Meta] slow compilation at -O0 (callgraph optimization, inline heuristics, expand )
From: rguenth at gcc dot gnu dot org @ 2009-01-24 10:27 UTC (permalink / raw)
To: gcc-bugs
------- Comment #37 from rguenth at gcc dot gnu dot org 2009-01-24 10:21 -------
GCC 4.3.3 is being released, adjusting target milestone.
--
rguenth at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|4.3.3 |4.3.4
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474
* [Bug middle-end/38474] [Meta] slow compilation at -O0 (callgraph optimization, inline heuristics, expand )
From: rguenth at gcc dot gnu dot org @ 2009-08-04 12:45 UTC (permalink / raw)
To: gcc-bugs
------- Comment #38 from rguenth at gcc dot gnu dot org 2009-08-04 12:29 -------
GCC 4.3.4 is being released, adjusting target milestone.
--
rguenth at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|4.3.4 |4.3.5
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474
* [Bug middle-end/38474] [Meta] slow compilation at -O0 (callgraph optimization, inline heuristics, expand )
From: jv244 at cam dot ac dot uk @ 2009-11-27 8:52 UTC (permalink / raw)
To: gcc-bugs
------- Comment #39 from jv244 at cam dot ac dot uk 2009-11-27 08:52 -------
I've rerun the initial (non-reduced) testcase at -O0, and I'm now getting more
reasonable memory usage (2.5 GB), with essentially all of the time in 'expand'.
'expand' is now about 3 times slower than 1 year ago, but this is with checking
enabled, so I don't know if this is relevant:
Execution times (seconds)
garbage collection : 2.22 ( 0%) usr 0.00 ( 0%) sys 2.22 ( 0%) wall
0 kB ( 0%) ggc
callgraph construction: 0.22 ( 0%) usr 0.02 ( 0%) sys 0.28 ( 0%) wall
12487 kB ( 2%) ggc
callgraph optimization: 0.23 ( 0%) usr 0.00 ( 0%) sys 0.19 ( 0%) wall
4370 kB ( 1%) ggc
cfg cleanup : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall
0 kB ( 0%) ggc
CFG verifier : 3.33 ( 0%) usr 0.01 ( 0%) sys 3.36 ( 0%) wall
0 kB ( 0%) ggc
trivially dead code : 0.92 ( 0%) usr 0.00 ( 0%) sys 0.91 ( 0%) wall
0 kB ( 0%) ggc
df live regs : 0.62 ( 0%) usr 0.00 ( 0%) sys 0.64 ( 0%) wall
0 kB ( 0%) ggc
df reg dead/unused notes: 1.33 ( 0%) usr 0.02 ( 0%) sys 1.31 ( 0%) wall
19416 kB ( 3%) ggc
register information : 0.63 ( 0%) usr 0.01 ( 0%) sys 0.64 ( 0%) wall
0 kB ( 0%) ggc
alias analysis : 0.58 ( 0%) usr 0.01 ( 0%) sys 0.59 ( 0%) wall
8335 kB ( 1%) ggc
rebuild jump labels : 0.65 ( 0%) usr 0.00 ( 0%) sys 0.65 ( 0%) wall
0 kB ( 0%) ggc
parser : 4.96 ( 1%) usr 0.09 ( 2%) sys 5.06 ( 1%) wall
50732 kB ( 9%) ggc
inline heuristics : 0.15 ( 0%) usr 0.00 ( 0%) sys 0.11 ( 0%) wall
0 kB ( 0%) ggc
tree gimplify : 0.72 ( 0%) usr 0.01 ( 0%) sys 0.69 ( 0%) wall
13184 kB ( 2%) ggc
tree eh : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
0 kB ( 0%) ggc
tree CFG construction : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
172 kB ( 0%) ggc
tree find ref. vars : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall
3263 kB ( 1%) ggc
tree SSA rewrite : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
46 kB ( 0%) ggc
tree SSA other : 0.02 ( 0%) usr 0.02 ( 0%) sys 0.04 ( 0%) wall
18 kB ( 0%) ggc
tree operand scan : 0.06 ( 0%) usr 0.02 ( 0%) sys 0.07 ( 0%) wall
118 kB ( 0%) ggc
tree SSA verifier : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall
0 kB ( 0%) ggc
tree STMT verifier : 1.49 ( 0%) usr 0.03 ( 1%) sys 1.53 ( 0%) wall
0 kB ( 0%) ggc
callgraph verifier : 1.22 ( 0%) usr 0.00 ( 0%) sys 1.25 ( 0%) wall
0 kB ( 0%) ggc
expand : 737.90 (94%) usr 3.54 (79%) sys 741.44 (94%) wall
309551 kB (55%) ggc
integrated RA : 18.91 ( 2%) usr 0.28 ( 6%) sys 19.24 ( 2%) wall
8696 kB ( 2%) ggc
reload : 8.08 ( 1%) usr 0.26 ( 6%) sys 8.33 ( 1%) wall
123546 kB (22%) ggc
thread pro- & epilogue: 0.80 ( 0%) usr 0.00 ( 0%) sys 0.81 ( 0%) wall
239 kB ( 0%) ggc
final : 3.13 ( 0%) usr 0.15 ( 3%) sys 3.30 ( 0%) wall
533 kB ( 0%) ggc
symout : 0.08 ( 0%) usr 0.02 ( 0%) sys 0.08 ( 0%) wall
4818 kB ( 1%) ggc
TOTAL : 788.49 4.49 792.98
559736 kB
Extra diagnostic checks enabled; compiler may run slowly.
Configure with --enable-checking=release to disable checks.
COLLECT_GCC_OPTIONS='-ffree-line-length-512' '-g' '-ffree-form' '-ftime-report'
'-c' '-O0' '-ffree-line-length-512' '-v' '-mtune=generic'
as -V -Qy -o PR38582.o /tmp/ccfulxg5.s
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474
* [Bug middle-end/38474] [Meta] slow compilation at -O0 (callgraph optimization, inline heuristics, expand )
From: jv244 at cam dot ac dot uk @ 2009-11-27 9:00 UTC (permalink / raw)
To: gcc-bugs
------- Comment #40 from jv244 at cam dot ac dot uk 2009-11-27 09:00 -------
With the fix for rename registers, this now also runs 'fast' at -O3 (see below),
and memory usage is reasonable as well. Most of the time is again spent in
expand. This is the time report for -O3:
Execution times (seconds)
garbage collection : 7.60 ( 1%) usr 0.03 ( 0%) sys 7.65 ( 1%) wall
0 kB ( 0%) ggc
callgraph construction: 0.23 ( 0%) usr 0.01 ( 0%) sys 0.25 ( 0%) wall
12524 kB ( 1%) ggc
callgraph optimization: 0.48 ( 0%) usr 0.03 ( 0%) sys 0.51 ( 0%) wall
4370 kB ( 0%) ggc
ipa cp : 0.12 ( 0%) usr 0.00 ( 0%) sys 0.12 ( 0%) wall
2061 kB ( 0%) ggc
ipa reference : 0.10 ( 0%) usr 0.00 ( 0%) sys 0.11 ( 0%) wall
0 kB ( 0%) ggc
ipa pure const : 0.18 ( 0%) usr 0.00 ( 0%) sys 0.18 ( 0%) wall
2 kB ( 0%) ggc
cfg cleanup : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall
0 kB ( 0%) ggc
CFG verifier : 11.18 ( 1%) usr 0.04 ( 0%) sys 11.25 ( 1%) wall
0 kB ( 0%) ggc
trivially dead code : 2.70 ( 0%) usr 0.01 ( 0%) sys 2.72 ( 0%) wall
0 kB ( 0%) ggc
df multiple defs : 3.28 ( 0%) usr 0.00 ( 0%) sys 3.28 ( 0%) wall
0 kB ( 0%) ggc
df reaching defs : 1.30 ( 0%) usr 0.04 ( 0%) sys 1.33 ( 0%) wall
0 kB ( 0%) ggc
df live regs : 11.46 ( 1%) usr 0.01 ( 0%) sys 11.47 ( 1%) wall
0 kB ( 0%) ggc
df live&initialized regs: 6.86 ( 1%) usr 0.02 ( 0%) sys 6.87 ( 1%) wall
0 kB ( 0%) ggc
df use-def / def-use chains: 3.87 ( 0%) usr 0.02 ( 0%) sys 3.91 ( 0%) wall
0 kB ( 0%) ggc
df reg dead/unused notes: 9.18 ( 1%) usr 0.01 ( 0%) sys 9.23 ( 1%) wall
28894 kB ( 2%) ggc
register information : 3.54 ( 0%) usr 0.02 ( 0%) sys 3.58 ( 0%) wall
0 kB ( 0%) ggc
alias analysis : 5.55 ( 1%) usr 0.01 ( 0%) sys 5.60 ( 1%) wall
42254 kB ( 4%) ggc
alias stmt walking : 0.23 ( 0%) usr 0.11 ( 1%) sys 0.33 ( 0%) wall
0 kB ( 0%) ggc
register scan : 0.70 ( 0%) usr 0.00 ( 0%) sys 0.71 ( 0%) wall
4 kB ( 0%) ggc
rebuild jump labels : 1.43 ( 0%) usr 0.00 ( 0%) sys 1.46 ( 0%) wall
0 kB ( 0%) ggc
parser : 4.66 ( 1%) usr 0.11 ( 1%) sys 4.78 ( 1%) wall
50732 kB ( 4%) ggc
inline heuristics : 40.66 ( 5%) usr 8.08 (51%) sys 48.90 ( 6%) wall
112 kB ( 0%) ggc
integration : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall
951 kB ( 0%) ggc
tree gimplify : 0.67 ( 0%) usr 0.00 ( 0%) sys 0.67 ( 0%) wall
13182 kB ( 1%) ggc
tree eh : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
0 kB ( 0%) ggc
tree CFG construction : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
172 kB ( 0%) ggc
tree CFG cleanup : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
1 kB ( 0%) ggc
tree VRP : 0.15 ( 0%) usr 0.00 ( 0%) sys 0.17 ( 0%) wall
425 kB ( 0%) ggc
tree copy propagation : 0.26 ( 0%) usr 0.00 ( 0%) sys 0.22 ( 0%) wall
139 kB ( 0%) ggc
tree find ref. vars : 0.08 ( 0%) usr 0.00 ( 0%) sys 0.09 ( 0%) wall
3262 kB ( 0%) ggc
tree PTA : 21.39 ( 3%) usr 0.38 ( 2%) sys 21.76 ( 3%) wall
371 kB ( 0%) ggc
tree PHI insertion : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
0 kB ( 0%) ggc
tree SSA rewrite : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall
8504 kB ( 1%) ggc
tree SSA other : 0.04 ( 0%) usr 0.01 ( 0%) sys 0.05 ( 0%) wall
18 kB ( 0%) ggc
tree SSA incremental : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
24 kB ( 0%) ggc
tree operand scan : 0.04 ( 0%) usr 0.07 ( 0%) sys 0.10 ( 0%) wall
4721 kB ( 0%) ggc
dominator optimization: 0.04 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall
68 kB ( 0%) ggc
tree SRA : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
86 kB ( 0%) ggc
tree CCP : 0.18 ( 0%) usr 0.00 ( 0%) sys 0.19 ( 0%) wall
105 kB ( 0%) ggc
tree reassociation : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall
48 kB ( 0%) ggc
tree PRE : 0.11 ( 0%) usr 0.00 ( 0%) sys 0.11 ( 0%) wall
171 kB ( 0%) ggc
tree FRE : 0.12 ( 0%) usr 0.00 ( 0%) sys 0.13 ( 0%) wall
140 kB ( 0%) ggc
tree code sinking : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
24 kB ( 0%) ggc
tree linearize phis : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
14 kB ( 0%) ggc
tree forward propagate: 0.03 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall
7 kB ( 0%) ggc
tree conservative DCE : 0.40 ( 0%) usr 0.05 ( 0%) sys 0.46 ( 0%) wall
0 kB ( 0%) ggc
tree aggressive DCE : 0.21 ( 0%) usr 0.03 ( 0%) sys 0.19 ( 0%) wall
319 kB ( 0%) ggc
tree buildin call DCE : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
0 kB ( 0%) ggc
tree DSE : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall
8 kB ( 0%) ggc
complete unrolling : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
43 kB ( 0%) ggc
tree vectorization : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
0 kB ( 0%) ggc
tree slp vectorization: 0.03 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
27 kB ( 0%) ggc
tree rename SSA copies: 0.04 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall
0 kB ( 0%) ggc
tree SSA verifier : 2.85 ( 0%) usr 0.02 ( 0%) sys 2.83 ( 0%) wall
0 kB ( 0%) ggc
tree STMT verifier : 13.12 ( 2%) usr 0.06 ( 0%) sys 13.20 ( 2%) wall
0 kB ( 0%) ggc
callgraph verifier : 1.85 ( 0%) usr 0.00 ( 0%) sys 1.86 ( 0%) wall
0 kB ( 0%) ggc
dominance computation : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
0 kB ( 0%) ggc
expand : 548.68 (65%) usr 4.27 (27%) sys 552.92 (64%) wall
311209 kB (26%) ggc
lower subreg : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.07 ( 0%) wall
0 kB ( 0%) ggc
forward prop : 6.49 ( 1%) usr 0.08 ( 1%) sys 6.57 ( 1%) wall
18623 kB ( 2%) ggc
CSE : 4.60 ( 1%) usr 0.02 ( 0%) sys 4.62 ( 1%) wall
11149 kB ( 1%) ggc
dead code elimination : 2.60 ( 0%) usr 0.01 ( 0%) sys 2.60 ( 0%) wall
0 kB ( 0%) ggc
dead store elim1 : 3.33 ( 0%) usr 0.22 ( 1%) sys 3.51 ( 0%) wall
27472 kB ( 2%) ggc
dead store elim2 : 8.94 ( 1%) usr 0.02 ( 0%) sys 8.92 ( 1%) wall
40503 kB ( 3%) ggc
CPROP : 3.82 ( 0%) usr 0.01 ( 0%) sys 3.84 ( 0%) wall
10 kB ( 0%) ggc
CSE 2 : 4.43 ( 1%) usr 0.02 ( 0%) sys 4.44 ( 1%) wall
7115 kB ( 1%) ggc
branch prediction : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
43 kB ( 0%) ggc
combiner : 3.60 ( 0%) usr 0.03 ( 0%) sys 3.62 ( 0%) wall
13773 kB ( 1%) ggc
regmove : 1.00 ( 0%) usr 0.01 ( 0%) sys 1.00 ( 0%) wall
0 kB ( 0%) ggc
integrated RA : 30.06 ( 4%) usr 0.29 ( 2%) sys 30.38 ( 4%) wall
52314 kB ( 4%) ggc
reload : 11.54 ( 1%) usr 0.52 ( 3%) sys 12.09 ( 1%) wall
216344 kB (18%) ggc
reload CSE regs : 9.15 ( 1%) usr 0.01 ( 0%) sys 9.16 ( 1%) wall
59432 kB ( 5%) ggc
load CSE after reload : 0.53 ( 0%) usr 0.01 ( 0%) sys 0.53 ( 0%) wall
0 kB ( 0%) ggc
thread pro- & epilogue: 0.86 ( 0%) usr 0.00 ( 0%) sys 0.86 ( 0%) wall
302 kB ( 0%) ggc
if-conversion 2 : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
24 kB ( 0%) ggc
combine stack adjustments: 0.18 ( 0%) usr 0.00 ( 0%) sys 0.19 ( 0%) wall
0 kB ( 0%) ggc
peephole 2 : 1.07 ( 0%) usr 0.00 ( 0%) sys 1.07 ( 0%) wall
27 kB ( 0%) ggc
hard reg cprop : 3.83 ( 0%) usr 0.00 ( 0%) sys 3.85 ( 0%) wall
2 kB ( 0%) ggc
scheduling 2 : 20.89 ( 2%) usr 0.83 ( 5%) sys 21.75 ( 3%) wall
125198 kB (10%) ggc
machine dep reorg : 1.51 ( 0%) usr 0.00 ( 0%) sys 1.53 ( 0%) wall
0 kB ( 0%) ggc
reorder blocks : 0.31 ( 0%) usr 0.00 ( 0%) sys 0.30 ( 0%) wall
1 kB ( 0%) ggc
final : 3.47 ( 0%) usr 0.13 ( 1%) sys 3.56 ( 0%) wall
1631 kB ( 0%) ggc
symout : 0.08 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall
4315 kB ( 0%) ggc
variable tracking : 15.85 ( 2%) usr 0.03 ( 0%) sys 15.90 ( 2%) wall
133442 kB (11%) ggc
TOTAL : 844.13 15.69 860.19
1197120 kB
Extra diagnostic checks enabled; compiler may run slowly.
Configure with --enable-checking=release to disable checks.
COLLECT_GCC_OPTIONS='-ffree-line-length-512' '-g' '-ffree-form' '-ftime-report'
'-c' '-O3' '-ffree-line-length-512' '-v' '-mtune=generic'
as -V -Qy -o PR38582.o /tmp/ccoKMKzI.s
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474
* [Bug middle-end/38474] [Meta] slow compilation at -O0 (callgraph optimization, inline heuristics, expand )
From: rguenth at gcc dot gnu dot org @ 2009-11-27 10:50 UTC (permalink / raw)
To: gcc-bugs
------- Comment #41 from rguenth at gcc dot gnu dot org 2009-11-27 10:50 -------
Micha - we still spend most of the time in expand_used_vars even at -O0.
Maybe you want to have a look.
expand : 555.46 (92%) usr 4.88 (77%) sys 579.14 (92%) wall
310089 kB (56%) ggc
integrated RA : 13.46 ( 2%) usr 0.14 ( 2%) sys 13.57 ( 2%) wall
8685 kB ( 2%) ggc
reload : 14.29 ( 2%) usr 0.71 (11%) sys 15.04 ( 2%) wall
123548 kB (22%) ggc
TOTAL : 605.85 6.35 631.31
552067 kB
We also still peak at 2.4GB for this testcase... the detailed memory
report is as follows (just the biggest pieces):
Kind Nodes Bytes
---------------------------------------
decls 145133 24456664
exprs 490000 28838464
random kinds 226360 9054736
---------------------------------------
Total 923230 66015565
GIMPLE statements
Kind Stmts Bytes
---------------------------------------
assignments 361 29112
phi nodes 1 240
conditionals 61 4880
sequences 3874 92976
everything else 78134 10240768
---------------------------------------
Total 82431 10367976
(it would probably be interesting to count calls separately)
RTX Kind Count Bytes
---------------------------------------
expr_list 1113875 26733000
insn 1106192 79645824
set 1106448 26554752
reg 829905 26556960
mem 2508095 60194280
plus 2894284 69462816
---------------------------------------
Total 10615719 311545008
DF as usual is a big memory consumer, even at -O0 ...
source location                       Garbage          Freed            Leak           Overhead         Times
emit-rtl.c:907 (gen_reg_rtx)          11781120: 2.3%   11769792:33.0%   0: 0.0%        6519744:60.7%    18
rtl.c:285 (copy_rtx)                  21352344: 4.1%   0: 0.0%          0: 0.0%        0: 0.0%          889681
emit-rtl.c:425 (gen_raw_REG)          26543328: 5.1%   0: 0.0%          13600: 0.1%    0: 0.0%          829904
reload1.c:2622 (eliminate_regs_1)     26589096: 5.1%   0: 0.0%          0: 0.0%        0: 0.0%          1107879
emit-rtl.c:640 (gen_rtx_MEM)          59946576:11.5%   0: 0.0%          247704: 1.2%   0: 0.0%          2508095
emit-rtl.c:5457 (copy_insn_1)         61260256:11.8%   0: 0.0%          0: 0.0%        0: 0.0%          2608317
emit-rtl.c:3610 (make_insn_raw)       79645680:15.3%   0: 0.0%          0: 0.0%        0: 0.0%          1106190
Total                                 520012383        35665696         20430230       10739229         16135674
source location                       Garbage          Freed            Leak           Overhead         Times
So most of the memory is used in the RTL parts of the compiler.
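As a rough cross-check of that conclusion, the three "Total" byte counts quoted above can be compared directly. This is only a sketch: it assumes the first table counts tree nodes, and it measures the RTX share among these three tables, not of the whole 2.4GB peak. The helper name `share_of_total` is made up for the example.

```c
#include <stddef.h>

/* Fraction of the summed byte counts contributed by entry WHICH.
   Returns 0.0 for an empty or all-zero input.  */
static double
share_of_total (const double *bytes, size_t n, size_t which)
{
  double total = 0.0;
  for (size_t i = 0; i < n; i++)
    total += bytes[i];
  return total > 0.0 ? bytes[which] / total : 0.0;
}
```

Called with the totals 66015565 (tree nodes), 10367976 (GIMPLE statements) and 311545008 (RTX), the RTX entry comes out at roughly 80% of the three combined.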
--
rguenth at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |matz at gcc dot gnu dot org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474
* [Bug middle-end/38474] [Meta] slow compilation at -O0 (callgraph optimization, inline heuristics, expand )
From: matz at gcc dot gnu dot org @ 2009-12-03 13:37 UTC (permalink / raw)
To: gcc-bugs
------- Comment #42 from matz at gcc dot gnu dot org 2009-12-03 13:36 -------
Subject: Bug 38474
Author: matz
Date: Thu Dec 3 13:36:32 2009
New Revision: 154945
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=154945
Log:
	PR middle-end/38474
	* cfgexpand.c (struct stack_var): Add conflicts member.
	(stack_vars_conflict, stack_vars_conflict_alloc,
	n_stack_vars_conflict): Remove.
	(add_stack_var): Initialize conflicts member.
	(triangular_index, resize_stack_vars_conflict): Remove.
	(add_stack_var_conflict, stack_var_conflict_p): Rewrite in
	terms of new member.
	(union_stack_vars): Only run over the conflicts.
	(partition_stack_vars): Remove special case.
	(expand_used_vars_for_block): Don't call resize_stack_vars_conflict,
	don't create self-conflicts.
	(account_used_vars_for_block): Don't create any conflicts.
	(fini_vars_expansion): Free bitmaps, don't free or clear removed
	globals.
Modified:
trunk/gcc/ChangeLog
trunk/gcc/cfgexpand.c
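The log above describes replacing a global triangular conflict table with a per-variable conflicts member. A sketch of the difference, under assumptions: the log mentions bitmaps, but this example uses a plain growable array per variable instead, and `struct var_conflicts`, `add_conflict` and `conflicts_with` are hypothetical names. Only `triangular_index` is a name taken from the log, and its body here is a guess at the usual lower-triangular formula.

```c
#include <stddef.h>
#include <stdlib.h>

/* Index of the unordered pair (i, j) in a lower-triangular array.  */
static size_t
triangular_index (size_t i, size_t j)
{
  if (i < j)
    {
      size_t t = i;
      i = j;
      j = t;
    }
  return i * (i + 1) / 2 + j;
}

/* Entries a triangular table needs for n variables: grows as n^2.  */
static size_t
triangular_size (size_t n)
{
  return n * (n + 1) / 2;
}

/* Sparse alternative: each variable owns a list of its conflicts, so
   storage is proportional to the conflicts actually recorded.  */
struct var_conflicts
{
  size_t *others;
  size_t count;
  size_t capacity;
};

static int
conflicts_with (const struct var_conflicts *v, size_t other)
{
  for (size_t k = 0; k < v->count; k++)
    if (v->others[k] == other)
      return 1;
  return 0;
}

/* Append OTHER to V's list unless already present; 0 on allocation failure.  */
static int
add_one (struct var_conflicts *v, size_t other)
{
  if (conflicts_with (v, other))
    return 1;
  if (v->count == v->capacity)
    {
      size_t cap = v->capacity ? 2 * v->capacity : 4;
      size_t *p = realloc (v->others, cap * sizeof *p);
      if (p == NULL)
        return 0;
      v->others = p;
      v->capacity = cap;
    }
  v->others[v->count++] = other;
  return 1;
}

/* Record a symmetric conflict between variables X and Y.  */
static int
add_conflict (struct var_conflicts *vars, size_t x, size_t y)
{
  return add_one (&vars[x], y) && add_one (&vars[y], x);
}
```

For example, `triangular_size (1000)` is 500500 entries regardless of how many conflicts exist, whereas the per-variable lists stay empty until `add_conflict` is called.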
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474
* [Bug middle-end/38474] [Meta] slow compilation at -O0 (callgraph optimization, inline heuristics, expand )
From: jv244 at cam dot ac dot uk @ 2009-12-03 17:48 UTC (permalink / raw)
To: gcc-bugs
------- Comment #43 from jv244 at cam dot ac dot uk 2009-12-03 17:47 -------
(In reply to comment #42)
> Subject: Bug 38474
>
> Author: matz
> Date: Thu Dec 3 13:36:32 2009
> New Revision: 154945
Looks like the initial testcase now runs with 1.3 GB, and with the following
timings (so memory and time are both better by roughly a factor of two):
expand : 386.46 (89%) usr 0.81 (48%) sys 387.21 (89%) wall
309554 kB (56%) ggc
integrated RA : 17.97 ( 4%) usr 0.26 (15%) sys 18.28 ( 4%) wall
8696 kB ( 2%) ggc
reload : 7.78 ( 2%) usr 0.25 (15%) sys 8.07 ( 2%) wall
123546 kB (22%) ggc
thread pro- & epilogue: 0.74 ( 0%) usr 0.00 ( 0%) sys 0.76 ( 0%) wall
239 kB ( 0%) ggc
final : 2.84 ( 1%) usr 0.12 ( 7%) sys 2.95 ( 1%) wall
20 kB ( 0%) ggc
TOTAL : 434.29 1.70 436.00
553866 kB
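The "factor of two" can be checked against the figures in this thread. A small sketch, assuming the comparison baseline is the -O0 run from comment #39 (792.98 s total wall time, 2.5 GB) and that both runs used comparable builds, which the thread does not confirm. The helper `improvement_factor` is a made-up name.

```c
/* Ratio of an old measurement to a new one; values above 1.0 mean the
   new run is better.  Returns 0.0 if AFTER is not positive.  */
static double
improvement_factor (double before, double after)
{
  return after > 0.0 ? before / after : 0.0;
}
```

With 792.98 s against 436.00 s the time ratio is about 1.8, and 2.5 GB against 1.3 GB gives about 1.9, so "a factor of two" is a fair rounding for both.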
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474
* [Bug middle-end/38474] [Meta] slow compilation at -O0 (callgraph optimization, inline heuristics, expand )
From: matz at gcc dot gnu dot org @ 2009-12-03 21:05 UTC (permalink / raw)
To: gcc-bugs
------- Comment #44 from matz at gcc dot gnu dot org 2009-12-03 21:05 -------
I'm glad. I also plan to work on the other slow part of expand, which is the
temp slot goo, but a full solution requires touching very old and stable parts
of GCC, and hence is IMO nothing for stage 3. I have an obvious band-aid patch
giving at least some further improvements that I plan to submit for 4.5.
--
matz at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
AssignedTo|unassigned at gcc dot gnu |matz at gcc dot gnu dot org
|dot org |
Status|NEW |ASSIGNED
Last reconfirmed|2008-12-16 16:26:45 |2009-12-03 21:05:09
date| |
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474
* [Bug middle-end/38474] [Meta] slow compilation at -O0 (callgraph optimization, inline heuristics, expand )
From: matz at gcc dot gnu dot org @ 2009-12-08 13:56 UTC (permalink / raw)
To: gcc-bugs
------- Comment #45 from matz at gcc dot gnu dot org 2009-12-08 13:56 -------
Subject: Bug 38474
Author: matz
Date: Tue Dec 8 13:56:06 2009
New Revision: 155087
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=155087
Log:
PR middle-end/38474
* function.c (free_temp_slots): Only walk the temp slot
addresses and combine slots if we actually changed something.
(pop_temp_slots): Ditto.
Modified:
trunk/gcc/ChangeLog
trunk/gcc/function.c
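
The idea in the log entry above (skip the costly walk unless something was actually freed) can be sketched as follows. This is an illustration only, not the code from function.c: `struct slot`, `expensive_cleanup`, `free_slots_at_level` and the `cleanup_runs` counter are hypothetical names, and the slot list is reduced to a plain array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a temp slot.  */
struct slot
{
  int level;     /* nesting level the slot belongs to */
  bool in_use;   /* whether the slot is currently allocated */
};

/* Counts how often the costly pass runs, for demonstration only.  */
static int cleanup_runs = 0;

/* Stand-in for the costly work (walking addresses, combining slots).  */
static void expensive_cleanup(struct slot *slots, size_t n)
{
  (void) slots;
  (void) n;
  cleanup_runs++;
}

/* Free every in-use slot at LEVEL and return how many were freed.
   The costly cleanup runs only if at least one slot changed state.  */
static size_t free_slots_at_level(struct slot *slots, size_t n, int level)
{
  assert(slots != NULL || n == 0);
  size_t freed = 0;
  for (size_t i = 0; i < n; i++)
    if (slots[i].in_use && slots[i].level == level)
      {
        slots[i].in_use = false;
        freed++;
      }
  if (freed > 0)
    expensive_cleanup(slots, n);
  return freed;
}
```

When most calls free nothing, as can happen when the function is invoked after every statement, the guard turns the common case into a single cheap scan.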
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474
* [Bug middle-end/38474] [Meta] slow compilation at -O0 (callgraph optimization, inline heuristics, expand )
From: rguenth at gcc dot gnu dot org @ 2010-05-22 18:33 UTC (permalink / raw)
To: gcc-bugs
------- Comment #46 from rguenth at gcc dot gnu dot org 2010-05-22 18:12 -------
GCC 4.3.5 is being released, adjusting target milestone.
--
rguenth at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|4.3.5 |4.3.6
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474
* [Bug middle-end/38474] [Meta] slow compilation at -O0 (callgraph optimization, inline heuristics, expand )
From: jv244 at cam dot ac dot uk @ 2010-05-23 6:31 UTC (permalink / raw)
To: gcc-bugs
------- Comment #47 from jv244 at cam dot ac dot uk 2010-05-23 06:31 -------
All dependencies are fixed, and so is this bug.
--
jv244 at cam dot ac dot uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |RESOLVED
Resolution| |FIXED
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474
* [Bug middle-end/38474] [Meta] slow compilation at -O0 (callgraph optimization, inline heuristics, expand )
From: rguenth at gcc dot gnu dot org @ 2010-05-23 20:09 UTC (permalink / raw)
To: gcc-bugs
------- Comment #48 from rguenth at gcc dot gnu dot org 2010-05-23 20:08 -------
Nope. See comment#44.
--
rguenth at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|RESOLVED |REOPENED
Resolution|FIXED |
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474
* [Bug middle-end/38474] slow compilation at -O0 due to expand's temp slot goo
From: steven at gcc dot gnu dot org @ 2010-05-23 21:03 UTC (permalink / raw)
To: gcc-bugs
------- Comment #49 from steven at gcc dot gnu dot org 2010-05-23 21:02 -------
Let's change the bug type at least, from a meta bug to a normal bug.
--
steven at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|[Meta] slow compilation at -|slow compilation at -O0 due
|O0 (callgraph optimization, |to expand's temp slot goo
|inline heuristics, expand ) |
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474