public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug tree-optimization/44563] GCC uses a lot of RAM when compiling a large numbers of functions
[not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
@ 2010-12-12 23:55 ` hubicka at gcc dot gnu.org
2010-12-13 0:58 ` hubicka at gcc dot gnu.org
` (27 subsequent siblings)
28 siblings, 0 replies; 29+ messages in thread
From: hubicka at gcc dot gnu.org @ 2010-12-12 23:55 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563
--- Comment #5 from Jan Hubicka <hubicka at gcc dot gnu.org> 2010-12-12 23:55:24 UTC ---
With -O2 during early optimization we get to 68% in cgraph_check_inline_limits.
This is weird since the early inliner should not be terribly sensitive to this. I
guess it is because we end up walking very long edge lists. I will check if I
can do something here.
Honza
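The nonlinearity suspected here can be sketched in a toy model (hypothetical code, not GCC's actual cgraph structures): if every inline-limit check re-walks the caller's full list of call edges, n checks over n edges cost O(n^2) in total, while a running total kept in sync makes each check O(1).

```cpp
// Toy model (hypothetical, not GCC's cgraph code) of the edge-list walk.
#include <cassert>
#include <vector>

struct Edge { long callee_size; };

// O(n) per check: recompute the caller's estimated size from scratch by
// walking every call edge.
static long size_by_walking(const std::vector<Edge> &edges) {
  long size = 0;
  for (const Edge &e : edges)
    size += e.callee_size;
  return size;
}

// O(1) per check: maintain the total incrementally as edges are added.
struct Caller {
  std::vector<Edge> edges;
  long cached_size = 0;
  void add_edge(long callee_size) {
    edges.push_back(Edge{callee_size});
    cached_size += callee_size;  // keep the running total in sync
  }
};
```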
^ permalink raw reply [flat|nested] 29+ messages in thread
* [Bug tree-optimization/44563] GCC uses a lot of RAM when compiling a large numbers of functions
[not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
2010-12-12 23:55 ` [Bug tree-optimization/44563] GCC uses a lot of RAM when compiling a large numbers of functions hubicka at gcc dot gnu.org
@ 2010-12-13 0:58 ` hubicka at gcc dot gnu.org
2010-12-13 1:00 ` hubicka at gcc dot gnu.org
` (26 subsequent siblings)
28 siblings, 0 replies; 29+ messages in thread
From: hubicka at gcc dot gnu.org @ 2010-12-13 0:58 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563
Jan Hubicka <hubicka at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |dnovillo at google dot com
--- Comment #6 from Jan Hubicka <hubicka at gcc dot gnu.org> 2010-12-13 00:58:04 UTC ---
Letting the compilation run longer I get:
3068656 60.0462  cc1             cc1             gsi_for_stmt
1211665 23.7093  cc1             cc1             cgraph_check_inline_limits
 396594  7.7604  cc1             cc1             gimple_set_bb
  29937  0.5858  libc-2.11.1.so  libc-2.11.1.so  _IO_vfscanf
  23409  0.4581  libc-2.11.1.so  libc-2.11.1.so  _int_malloc
  14539  0.2845  cc1             cc1             gimple_split_block
  13307  0.2604  libc-2.11.1.so  libc-2.11.1.so  memset
   9532  0.1865  cc1             cc1             htab_delete
   8275  0.1619  cc1             cc1             bitmap_set_bit
so the gsi_for_stmt nonlinearity kicks in. I guess it is the inliner calling BB
split and that calling gsi_for_stmt. We probably can have gsi_split_bb with a
gsi argument too.
I always wondered why Diego did not embed gimple_seq_node_d into the gimple
statement (we don't have too many statements that would not be living in basic
blocks, so it would not be wasteful, and it would result in better locality when
walking the lists). In that case gsi_for_stmt would become O(1).
Honza
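A minimal sketch of the embedding idea (illustrative types, not GCC's actual gimple_seq_node_d or gimple types): when the list links live inside each statement, turning a statement pointer into an iterator needs no search of the sequence, so the gsi_for_stmt equivalent is O(1) instead of O(n).

```cpp
// Hypothetical intrusive statement list; names are illustrative only.
#include <cassert>

struct Stmt {
  int id;
  Stmt *prev = nullptr;  // embedded links: the statement *is* the list node
  Stmt *next = nullptr;
};

struct StmtList {
  Stmt *head = nullptr, *tail = nullptr;
  void append(Stmt *s) {
    s->prev = tail;
    s->next = nullptr;
    if (tail) tail->next = s; else head = s;
    tail = s;
  }
};

// O(1) analogue of gsi_for_stmt: the statement already carries its
// neighbours, so the "iterator" is just the pointer itself.
static Stmt *iterator_for(Stmt *s) { return s; }
```

With external list nodes, by contrast, finding the node that points at a given statement means walking the sequence from its head.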
^ permalink raw reply [flat|nested] 29+ messages in thread
* [Bug tree-optimization/44563] GCC uses a lot of RAM when compiling a large numbers of functions
[not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
2010-12-12 23:55 ` [Bug tree-optimization/44563] GCC uses a lot of RAM when compiling a large numbers of functions hubicka at gcc dot gnu.org
2010-12-13 0:58 ` hubicka at gcc dot gnu.org
@ 2010-12-13 1:00 ` hubicka at gcc dot gnu.org
2010-12-13 1:07 ` hubicka at gcc dot gnu.org
` (25 subsequent siblings)
28 siblings, 0 replies; 29+ messages in thread
From: hubicka at gcc dot gnu.org @ 2010-12-13 1:00 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563
--- Comment #7 from Jan Hubicka <hubicka at gcc dot gnu.org> 2010-12-13 00:59:53 UTC ---
... actually split_bb does not use gsi_for_stmt since it has to walk all the
statements in the BB anyway. It seems that it is one of the routines updating
callers from cgraph edges.
^ permalink raw reply [flat|nested] 29+ messages in thread
* [Bug tree-optimization/44563] GCC uses a lot of RAM when compiling a large numbers of functions
[not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
` (2 preceding siblings ...)
2010-12-13 1:00 ` hubicka at gcc dot gnu.org
@ 2010-12-13 1:07 ` hubicka at gcc dot gnu.org
2010-12-13 1:17 ` hubicka at gcc dot gnu.org
` (24 subsequent siblings)
28 siblings, 0 replies; 29+ messages in thread
From: hubicka at gcc dot gnu.org @ 2010-12-13 1:07 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563
--- Comment #8 from Jan Hubicka <hubicka at gcc dot gnu.org> 2010-12-13 01:07:33 UTC ---
My profile was at -O2. Concerning Jakub's callgrind, the -O0 compilation
finishes in about 44s for me. The profile is:
4349  3.8607  libc-2.11.1.so  libc-2.11.1.so  _int_malloc
3150  2.7963  cc1             cc1             record_reg_classes.constprop.9
2881  2.5575  cc1             cc1             htab_find_slot_with_hash
2104  1.8678  cc1             cc1             ggc_set_mark
2039  1.8101  libc-2.11.1.so  libc-2.11.1.so  msort_with_tmp
2005  1.7799  cc1             cc1             bitmap_set_bit
1836  1.6299  cc1             cc1             df_ref_create_structure
1775  1.5757  cc1             cc1             find_reloads
1738  1.5429  cc1             cc1             ggc_internal_alloc_stat
1538  1.3653  libc-2.11.1.so  libc-2.11.1.so  memset
1430  1.2694  cc1             cc1             eq_node
1375  1.2206  cc1             cc1             preprocess_constraints
1317  1.1691  libc-2.11.1.so  libc-2.11.1.so  _int_free
1309  1.1620  cc1             cc1             df_insn_refs_collect
1289  1.1443  cc1             cc1             ix86_function_arg_regno_p
1277  1.1336  cc1             cc1             df_ref_record
1249  1.1088  cc1             cc1             ix86_save_reg
1215  1.0786  cc1             cc1             ix86_compute_frame_layout
1171  1.0395  libc-2.11.1.so  libc-2.11.1.so  malloc_consolidate
1134  1.0067  cc1             cc1             extract_insn
So I don't see that much of the RA by itself. Tracking down that malloc
inefficiency might be low-hanging fruit.
^ permalink raw reply [flat|nested] 29+ messages in thread
* [Bug tree-optimization/44563] GCC uses a lot of RAM when compiling a large numbers of functions
[not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
` (3 preceding siblings ...)
2010-12-13 1:07 ` hubicka at gcc dot gnu.org
@ 2010-12-13 1:17 ` hubicka at gcc dot gnu.org
2010-12-13 1:47 ` hubicka at gcc dot gnu.org
` (23 subsequent siblings)
28 siblings, 0 replies; 29+ messages in thread
From: hubicka at gcc dot gnu.org @ 2010-12-13 1:17 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563
Jan Hubicka <hubicka at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |vmakarov at redhat dot com
--- Comment #9 from Jan Hubicka <hubicka at gcc dot gnu.org> 2010-12-13 01:17:19 UTC ---
And since Richard did not include 4.6 in his -ftime-reports, here is -O0
garbage collection    :   1.74 ( 4%) usr   0.00 ( 0%) sys   1.76 ( 3%) wall       0 kB ( 0%) ggc
callgraph construction:   1.07 ( 2%) usr   0.26 ( 5%) sys   1.33 ( 3%) wall   41984 kB ( 3%) ggc
callgraph optimization:   0.62 ( 1%) usr   0.24 ( 5%) sys   0.98 ( 2%) wall   91137 kB ( 6%) ggc
df scan insns         :   3.47 ( 8%) usr   0.32 ( 6%) sys   3.72 ( 7%) wall    7168 kB ( 0%) ggc
parser                :   1.37 ( 3%) usr   0.41 ( 8%) sys   1.59 ( 3%) wall  202094 kB (13%) ggc
inline heuristics     :   1.33 ( 3%) usr   0.01 ( 0%) sys   1.29 ( 3%) wall       0 kB ( 0%) ggc
tree gimplify         :   0.70 ( 2%) usr   0.06 ( 1%) sys   0.70 ( 1%) wall   73728 kB ( 5%) ggc
tree CFG construction :   0.21 ( 0%) usr   0.09 ( 2%) sys   0.25 ( 0%) wall   82433 kB ( 5%) ggc
tree operand scan     :   0.10 ( 0%) usr   0.10 ( 2%) sys   0.15 ( 0%) wall   69141 kB ( 5%) ggc
expand                :   2.32 ( 5%) usr   0.24 ( 5%) sys   2.45 ( 5%) wall  242940 kB (16%) ggc
post expand cleanups  :   0.50 ( 1%) usr   0.02 ( 0%) sys   0.49 ( 1%) wall   76289 kB ( 5%) ggc
integrated RA         :  10.03 (22%) usr   0.15 ( 3%) sys  10.44 (20%) wall  135680 kB ( 9%) ggc
reload                :   4.42 (10%) usr   0.07 ( 1%) sys   4.60 ( 9%) wall   55295 kB ( 4%) ggc
rest of compilation   :   3.03 ( 7%) usr   0.82 (16%) sys   3.65 ( 7%) wall  170498 kB (11%) ggc
final                 :   3.67 ( 8%) usr   0.09 ( 2%) sys   3.65 ( 7%) wall   30144 kB ( 2%) ggc
unaccounted todo      :   1.63 ( 4%) usr   0.91 (18%) sys   2.58 ( 5%) wall       0 kB ( 0%) ggc
I wonder what is accounted into "rest of compilation".
Otherwise the RA remains the compile-time bottleneck. Will do memory stats later.
Adding Vladimir to CC for the RA.
^ permalink raw reply [flat|nested] 29+ messages in thread
* [Bug tree-optimization/44563] GCC uses a lot of RAM when compiling a large numbers of functions
[not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
` (4 preceding siblings ...)
2010-12-13 1:17 ` hubicka at gcc dot gnu.org
@ 2010-12-13 1:47 ` hubicka at gcc dot gnu.org
2010-12-13 13:22 ` hubicka at gcc dot gnu.org
` (22 subsequent siblings)
28 siblings, 0 replies; 29+ messages in thread
From: hubicka at gcc dot gnu.org @ 2010-12-13 1:47 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563
--- Comment #10 from Jan Hubicka <hubicka at gcc dot gnu.org> 2010-12-13 01:46:39 UTC ---
Created attachment 22730
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=22730
Fix for inline cost problem
The attached patch fixes the inliner cost problem so we converge at -O1. It is
a bit brute force, but I guess it should work well in practice. With the fix,
-O1 converges in 90 seconds.
The profile is similar to the one at -O0:
14898  6.7090  libc-2.11.1.so  libc-2.11.1.so  _int_malloc
 7981  3.5941  cc1             cc1             bitmap_set_bit
 5993  2.6988  libc-2.11.1.so  libc-2.11.1.so  memset
 5063  2.2800  cc1             cc1             htab_delete
 4091  1.8423  libc-2.11.1.so  libc-2.11.1.so  _IO_vfscanf
 3849  1.7333  no-vmlinux      no-vmlinux      /no-vmlinux
 3807  1.7144  libc-2.11.1.so  libc-2.11.1.so  _int_free
 3632  1.6356  cc1             cc1             df_note_compute
 3469  1.5622  libc-2.11.1.so  libc-2.11.1.so  malloc_consolidate
 3352  1.5095  libc-2.11.1.so  libc-2.11.1.so  msort_with_tmp
 2978  1.3411  cc1             cc1             htab_traverse_noresize
 2941  1.3244  libc-2.11.1.so  libc-2.11.1.so  free
 2824  1.2717  cc1             cc1             bitmap_clear_bit
 2653  1.1947  cc1             cc1             df_ref_create_structure
 2429  1.0938  libc-2.11.1.so  libc-2.11.1.so  malloc
 2239  1.0083  cc1             cc1             df_insn_refs_collect
^ permalink raw reply [flat|nested] 29+ messages in thread
* [Bug tree-optimization/44563] GCC uses a lot of RAM when compiling a large numbers of functions
[not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
` (5 preceding siblings ...)
2010-12-13 1:47 ` hubicka at gcc dot gnu.org
@ 2010-12-13 13:22 ` hubicka at gcc dot gnu.org
2010-12-17 0:08 ` hubicka at gcc dot gnu.org
` (21 subsequent siblings)
28 siblings, 0 replies; 29+ messages in thread
From: hubicka at gcc dot gnu.org @ 2010-12-13 13:22 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563
--- Comment #11 from Jan Hubicka <hubicka at gcc dot gnu.org> 2010-12-13 13:22:28 UTC ---
Patched compiler at -O2 now shows:
integration : 166.20 (16%) usr   0.19 ( 1%) sys 166.86 (15%) wall   92691 kB ( 4%) ggc
tree CCP    : 792.75 (74%) usr   0.15 ( 1%) sys 794.63 (73%) wall   66560 kB ( 3%) ggc
integration is probably the overhead of splitting BBs. I wonder what makes tree
CCP so slow; it is probably worth investigating.
The profile shows:
2009425 73.3643  cc1             cc1             gsi_for_stmt
 398058 14.5331  cc1             cc1             gimple_set_bb
  22230  0.8116  libc-2.11.1.so  libc-2.11.1.so  _int_malloc
  14339  0.5235  cc1             cc1             gimple_split_block
  11727  0.4282  libc-2.11.1.so  libc-2.11.1.so  memset
  10411  0.3801  libc-2.11.1.so  libc-2.11.1.so  _IO_vfscanf
   9095  0.3321  cc1             cc1             htab_delete
   6061  0.2213  cc1             cc1             bitmap_set_bit
   5990  0.2187  no-vmlinux      no-vmlinux      /no-vmlinux
   5912  0.2158  libc-2.11.1.so  libc-2.11.1.so  _int_free
   5077  0.1854  libc-2.11.1.so  libc-2.11.1.so  malloc_consolidate
   4516  0.1649  cc1             cc1             htab_find_slot_with_hash
   4515  0.1648  opreport        opreport        /usr/bin/opreport
   4284  0.1564  libc-2.11.1.so  libc-2.11.1.so  free
   4234  0.1546  libc-2.11.1.so  libc-2.11.1.so  malloc
   4106  0.1499  cc1             cc1             htab_traverse_noresize
   3737  0.1364  libc-2.11.1.so  libc-2.11.1.so  calloc
   3197  0.1167  cc1             cc1             eq_node
   2996  0.1094  cc1             cc1             df_note_compute
   2632  0.0961  cc1             cc1             ggc_internal_alloc_stat
   2476  0.0904  cc1             cc1             bitmap_bit_p
Other passes are sub 10s each.
Maybe tree-ccp just gets the overhead of merging the large BBs into a single one?
Honza
^ permalink raw reply [flat|nested] 29+ messages in thread
* [Bug tree-optimization/44563] GCC uses a lot of RAM when compiling a large numbers of functions
[not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
` (6 preceding siblings ...)
2010-12-13 13:22 ` hubicka at gcc dot gnu.org
@ 2010-12-17 0:08 ` hubicka at gcc dot gnu.org
2015-03-07 0:22 ` hubicka at gcc dot gnu.org
` (20 subsequent siblings)
28 siblings, 0 replies; 29+ messages in thread
From: hubicka at gcc dot gnu.org @ 2010-12-17 0:08 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563
--- Comment #12 from Jan Hubicka <hubicka at gcc dot gnu.org> 2010-12-17 00:08:07 UTC ---
Author: hubicka
Date: Fri Dec 17 00:08:02 2010
New Revision: 167964
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=167964
Log:
PR middle-end/44563
* ipa-inline.c: Update doplevel comment.
(cgraph_estimate_size_after_inlining): Remove times attribute.
(cgraph_mark_inline_edge): Update.
(cgraph_mark_inline): Remove.
(cgraph_estimate_growth): Update.
(cgraph_check_inline_limits): Remove one only argument.
(cgraph_edge_badness): Update.
(cgraph_decide_recursive_inlining): Update.
(cgraph_decide_inlining_of_small_function): Fix handling of
tree_can_inline_p
and call_stmt_cannot_inline_p.
(cgraph_flatten): Likewise.
(cgraph_decide_inlining): Update.
(cgraph_decide_inlining_incrementally): Fix handling of
call_stmt_cannot_inline_p.
Modified:
trunk/gcc/ChangeLog
trunk/gcc/ipa-inline.c
^ permalink raw reply [flat|nested] 29+ messages in thread
* [Bug tree-optimization/44563] GCC uses a lot of RAM when compiling a large numbers of functions
[not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
` (7 preceding siblings ...)
2010-12-17 0:08 ` hubicka at gcc dot gnu.org
@ 2015-03-07 0:22 ` hubicka at gcc dot gnu.org
2015-03-09 9:36 ` rguenth at gcc dot gnu.org
` (19 subsequent siblings)
28 siblings, 0 replies; 29+ messages in thread
From: hubicka at gcc dot gnu.org @ 2015-03-07 0:22 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563
--- Comment #14 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Yeah, this is the old problem that after each inline we recompute the size of
the whole function inlined into. This means walking the whole inline tree and
summing the sizes of all non-inlined call sites. If you get very many functions
inlined into a single caller, the nonlinearity kicks in. Here the main()
function calls 65536 empty functions, which takes time to process.
My plan is to turn the sizes/times into sreal and then update them incrementally
(subtract the size of the call statement and account the changes). With the old
fixed point+capping this did not work well and I always ended up with too many
misaccounting issues.
This is however more intrusive than what I would like to do in stage3. I did
not complete the sreal conversion because the sreal class came a bit too late in
this development cycle. I have an unfinished patch for that, but it is >100K and
hits interesting problems like gengtype not understanding the sreal.h header.
I will perf it and check for micro-optimization possibilities.
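The incremental-update plan described above can be sketched as follows (assumed semantics, not the actual GCC patch; the real summaries track both size and time): on inlining an edge, subtract the size of the call statement and add the callee's body size, instead of re-walking the whole inline tree to re-sum it.

```cpp
// Illustrative sketch of incremental inline-size accounting.
#include <cassert>

struct FunctionSummary {
  double size;  // sreal-like: a real-valued size estimate
};

// O(1) per inlined edge, versus O(inline-tree) for a full recomputation.
static void account_inline(FunctionSummary &caller,
                           const FunctionSummary &callee,
                           double call_stmt_size) {
  caller.size -= call_stmt_size;  // the call statement itself disappears
  caller.size += callee.size;     // the callee body is spliced in
}
```

The fixed-point-with-capping scheme mentioned above loses information on every capped update, which is why incremental deltas drift; a real-valued representation avoids that.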
^ permalink raw reply [flat|nested] 29+ messages in thread
* [Bug tree-optimization/44563] GCC uses a lot of RAM when compiling a large numbers of functions
[not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
` (8 preceding siblings ...)
2015-03-07 0:22 ` hubicka at gcc dot gnu.org
@ 2015-03-09 9:36 ` rguenth at gcc dot gnu.org
2015-03-09 11:33 ` rguenth at gcc dot gnu.org
` (18 subsequent siblings)
28 siblings, 0 replies; 29+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-03-09 9:36 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |rguenth at gcc dot gnu.org
--- Comment #15 from Richard Biener <rguenth at gcc dot gnu.org> ---
Meanwhile it finished compiling at -O1:
ipa inlining heuristics : 155.14 (12%) usr   0.37 ( 2%) sys 155.31 (12%) wall  396289 kB (11%) ggc
integration             : 958.73 (75%) usr   2.18 (13%) sys 960.89 (74%) wall   86527 kB ( 2%) ggc
tree CFG cleanup        : 116.57 ( 9%) usr   0.30 ( 2%) sys 116.77 ( 9%) wall       0 kB ( 0%) ggc
TOTAL                   :1285.25          17.15           1302.17           3514948 kB
^ permalink raw reply [flat|nested] 29+ messages in thread
* [Bug tree-optimization/44563] GCC uses a lot of RAM when compiling a large numbers of functions
[not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
` (9 preceding siblings ...)
2015-03-09 9:36 ` rguenth at gcc dot gnu.org
@ 2015-03-09 11:33 ` rguenth at gcc dot gnu.org
2015-03-09 15:26 ` rguenth at gcc dot gnu.org
` (17 subsequent siblings)
28 siblings, 0 replies; 29+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-03-09 11:33 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563
--- Comment #16 from Richard Biener <rguenth at gcc dot gnu.org> ---
callgrind shows the cgraph_edge_hasher quite high in the profile (via
redirect_all_calls). I suppose that, as the large main is a single BB, walking
all stmts over and over is quite bad. Also, hash_pointer isn't inlined!?
Ah - it's external in libiberty hashtab.c ... - it should transition to
using/inheriting from pointer_hash.
cgraph_edge *
cgraph_node::get_edge (gimple call_stmt)
{
cgraph_edge *e, *e2;
int n = 0;
if (call_site_hash)
return call_site_hash->find_with_hash (call_stmt,
htab_hash_pointer (call_stmt));
The estimate_calls_size_and_time portion is quite smaller.
cleanup-cfgs main portion is split_bb_on_noreturn_calls.
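The suggested transition can be sketched like this (a stand-in modeled on the idea, not the exact pointer_hash template from GCC's hash-traits.h): a hasher whose hash function is defined inline in a header, so every call to it can be inlined, unlike the out-of-line htab_hash_pointer in libiberty.

```cpp
// Hypothetical pointer-hash traits; the real GCC template differs.
#include <cassert>
#include <cstddef>
#include <cstdint>

template <typename T>
struct pointer_hash {
  // Inline pointer hash: drop the low alignment bits, in the spirit of
  // htab_hash_pointer, but visible to the inliner.
  static size_t hash(const T *p) {
    return static_cast<size_t>(reinterpret_cast<uintptr_t>(p) >> 3);
  }
  static bool equal(const T *a, const T *b) { return a == b; }
};

struct gimple_stmt {};  // placeholder for the call statement type

// The call-site hasher then inherits the inline hash/equal functions.
struct cgraph_edge_hasher : pointer_hash<gimple_stmt> {};
```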
^ permalink raw reply [flat|nested] 29+ messages in thread
* [Bug tree-optimization/44563] GCC uses a lot of RAM when compiling a large numbers of functions
[not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
` (10 preceding siblings ...)
2015-03-09 11:33 ` rguenth at gcc dot gnu.org
@ 2015-03-09 15:26 ` rguenth at gcc dot gnu.org
2015-03-09 15:36 ` rguenth at gcc dot gnu.org
` (16 subsequent siblings)
28 siblings, 0 replies; 29+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-03-09 15:26 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563
--- Comment #17 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #16)
> callgrind shows the cgraph_edge_hasher quite high in the profile (via
> redirect_all_calls). I suppose as the large main is a single BB walking
> all stmts over-and-over is quite bad. Also hash_pointer isn't inlined!?
> Ah - it's external in libiberty hashtab.c ... - it should transition to
> using/inheriting from pointer_hash.
>
> cgraph_edge *
> cgraph_node::get_edge (gimple call_stmt)
> {
> cgraph_edge *e, *e2;
> int n = 0;
>
> if (call_site_hash)
> return call_site_hash->find_with_hash (call_stmt,
> htab_hash_pointer (call_stmt));
>
Btw, for 10000 calls (smaller testcase) we get 100 000 000 calls to
cgraph_edge::redirect_call_stmt_to_callee () (that's from 40000
redirect_all_calls calls which is from 10000 optimize_inline_calls calls).
Ah - we do this also for the ENTRY/EXIT block!
Index: gcc/tree-inline.c
===================================================================
--- gcc/tree-inline.c (revision 221278)
+++ gcc/tree-inline.c (working copy)
@@ -2802,11 +2802,13 @@ copy_cfg_body (copy_body_data * id, gcov
if (need_debug_cleanup
&& bb->index != ENTRY_BLOCK
&& bb->index != EXIT_BLOCK)
- maybe_move_debug_stmts_to_successors (id, (basic_block) bb->aux);
- /* Update call edge destinations. This can not be done before loop
- info is updated, because we may split basic blocks. */
- if (id->transform_call_graph_edges == CB_CGE_DUPLICATE)
- redirect_all_calls (id, (basic_block)bb->aux);
+ {
+ maybe_move_debug_stmts_to_successors (id, (basic_block) bb->aux);
+ /* Update call edge destinations. This can not be done before loop
+ info is updated, because we may split basic blocks. */
+ if (id->transform_call_graph_edges == CB_CGE_DUPLICATE)
+ redirect_all_calls (id, (basic_block)bb->aux);
+ }
((basic_block)bb->aux)->aux = NULL;
bb->aux = NULL;
}
makes sense?
> The estimate_calls_size_and_time portion is quite smaller.
>
> cleanup-cfgs main portion is split_bb_on_noreturn_calls.
^ permalink raw reply [flat|nested] 29+ messages in thread
* [Bug tree-optimization/44563] GCC uses a lot of RAM when compiling a large numbers of functions
[not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
` (11 preceding siblings ...)
2015-03-09 15:26 ` rguenth at gcc dot gnu.org
@ 2015-03-09 15:36 ` rguenth at gcc dot gnu.org
2015-03-10 4:55 ` hubicka at ucw dot cz
` (15 subsequent siblings)
28 siblings, 0 replies; 29+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-03-09 15:36 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563
--- Comment #18 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #17)
> (In reply to Richard Biener from comment #16)
> > callgrind shows the cgraph_edge_hasher quite high in the profile (via
> > redirect_all_calls). I suppose as the large main is a single BB walking
> > all stmts over-and-over is quite bad. Also hash_pointer isn't inlined!?
> > Ah - it's external in libiberty hashtab.c ... - it should transition to
> > using/inheriting from pointer_hash.
> >
> > cgraph_edge *
> > cgraph_node::get_edge (gimple call_stmt)
> > {
> > cgraph_edge *e, *e2;
> > int n = 0;
> >
> > if (call_site_hash)
> > return call_site_hash->find_with_hash (call_stmt,
> > htab_hash_pointer (call_stmt));
> >
>
> Btw, for 10000 calls (smaller testcase) we get 100 000 000 calls to
> cgraph_edge::redirect_call_stmt_to_callee () (that's from 40000
> redirect_all_calls calls which is from 10000 optimize_inline_calls calls).
>
> Ah - we do this also for the ENTRY/EXIT block!
>
> Index: gcc/tree-inline.c
> ===================================================================
> --- gcc/tree-inline.c (revision 221278)
> +++ gcc/tree-inline.c (working copy)
> @@ -2802,11 +2802,13 @@ copy_cfg_body (copy_body_data * id, gcov
> if (need_debug_cleanup
> && bb->index != ENTRY_BLOCK
> && bb->index != EXIT_BLOCK)
> - maybe_move_debug_stmts_to_successors (id, (basic_block) bb->aux);
> - /* Update call edge destinations. This can not be done before loop
> - info is updated, because we may split basic blocks. */
> - if (id->transform_call_graph_edges == CB_CGE_DUPLICATE)
> - redirect_all_calls (id, (basic_block)bb->aux);
> + {
> + maybe_move_debug_stmts_to_successors (id, (basic_block) bb->aux);
> + /* Update call edge destinations. This can not be done before
> loop
> + info is updated, because we may split basic blocks. */
> + if (id->transform_call_graph_edges == CB_CGE_DUPLICATE)
> + redirect_all_calls (id, (basic_block)bb->aux);
> + }
> ((basic_block)bb->aux)->aux = NULL;
> bb->aux = NULL;
> }
>
> makes sense?
Fails to bootstrap :/ But it would improve the testcase to only have the
inline heuristic issue.
/space/rguenther/src/svn/trunk/libstdc++-v3/libsupc++/pbase_type_info.cc: In
member function ‘virtual bool __cxxabiv1::__pbase_type_info::__do_catch(const
std::type_info*, void**, unsigned int) const’:
/space/rguenther/src/svn/trunk/libstdc++-v3/libsupc++/pbase_type_info.cc:32:6:
error: reference to dead statement
bool __pbase_type_info::
^
# .MEM = VDEF <.MEM>
_30 = OBJ_TYPE_REF(_28;(const struct __pbase_type_info)this_3(D)->6)
(this_3(D), thr_type_5(D), thr_obj_9(D), outer_29);
_ZNK10__cxxabiv117__pbase_type_info10__do_catchEPKSt9type_infoPPvj/74 (virtual
bool __cxxabiv1::__pbase_type_info::__do_catch(const std::type_info*, void**,
unsigned int) const) @0x2aaaac8d3ab8
Type: function definition analyzed
Visibility: externally_visible public visibility_specified virtual
Address is taken.
References: _ZNK10__cxxabiv117__pbase_type_info15__pointer_catchEPKS0_PPvj/34
(addr) (speculative)
Referring: _ZTVN10__cxxabiv117__pbase_type_infoE/77 (addr)
Availability: overwritable
First run: 0
Function flags: body
Called by:
Calls: strcmp/85 (0.39 per call) __cxa_bad_typeid/83 (can throw external)
strcmp/85 (0.61 per call)
Indirect call(0.11 per call) (can throw external)
Polymorphic indirect call of type const struct __pbase_type_info
token:6(speculative) (0.03 per call) (can throw external) of param:0
Outer type (dynamic):struct __pbase_type_info (or a derived type) offset 0
/space/rguenther/src/svn/trunk/libstdc++-v3/libsupc++/pbase_type_info.cc:32:6:
internal compiler error: verify_cgraph_node failed
0xa8eebc cgraph_node::verify_node()
/space/rguenther/src/svn/trunk/gcc/cgraph.c:3115
0xa8473f symtab_node::verify()
/space/rguenther/src/svn/trunk/gcc/symtab.c:1103
0x1025861 optimize_inline_calls(tree_node*)
/space/rguenther/src/svn/trunk/gcc/tree-inline.c:4938
> > The estimate_calls_size_and_time portion is quite smaller.
> >
> > cleanup-cfgs main portion is split_bb_on_noreturn_calls.
From: "derodat at adacore dot com" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug debug/53927] wrong value for DW_AT_static_link
Date: Mon, 09 Mar 2015 15:55:00 -0000
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53927
--- Comment #21 from Pierre-Marie de Rodat <derodat at adacore dot com> ---
(In reply to Eric Botcazou from comment #18)
> I think this is worth investigating though because it's conceptually
> much simpler than adding yet another indirection. And we should
> concentrate on -O0 (and -Og), we don't really care about what happens
> with aggressive optimization.
Understood and agreed. Nevertheless...
> I guess the question is: can we arrange to have a constant offset
> between the frame base and the FRAME object, "constant" meaning valid
> for every function but possibly target-dependent?
I started to hack into cfgexpand.c and dwarf2out.c, but I realized this
is not possible in the general case. Consider the following example:
#include <stdlib.h>
int
nestee (void)
{
int a __attribute__((aligned(64))) = 1234;
void
nested (int b)
{
a = b;
}
nested (2345);
return a;
}
int
call_nestee (int n)
{
int *v = alloca (sizeof (int) * n);
v[0] = nestee ();
return v[0];
}
int
main (void)
{
call_nestee (1);
call_nestee (8);
return 0;
}
With a GCC 5.0 built from fairly recent sources, I get the following CFA
information:
00000090 000000000000002c 00000064 FDE cie=00000030 pc=00000000004004ac..00000000004004eb
DW_CFA_advance_loc: 5 to 00000000004004b1
DW_CFA_def_cfa: r10 (r10) ofs 0
DW_CFA_advance_loc: 9 to 00000000004004ba
DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0)
DW_CFA_advance_loc: 5 to 00000000004004bf
DW_CFA_def_cfa_expression (DW_OP_breg6 (rbp): -8; DW_OP_deref)
DW_CFA_advance_loc: 38 to 00000000004004e5
And now here is what I get under GDB:
$ gdb -n -q -ex 'b nestee' ./dyn_frame
Reading symbols from ./dyn_frame...done.
Breakpoint 1 at 0x4004c3: file dyn_frame.c, line 6.
(gdb) r
[...]
Breakpoint 1, nestee () at dyn_frame.c:6
6 int a __attribute__((aligned(64))) = 1234;
(gdb) p $pc
$1 = (void (*)()) 0x4004c3 <nestee+23>
(gdb) x/1xg $rbp - 8
0x7fffffffdf28: 0x00007fffffffdf60
(gdb) p/x (char *) 0x00007fffffffdf60 - (char *) &a
$2 = 0xa0
... so for this frame, the CFA and the FRAME object are 0xa0 bytes from
each other. Now let's resume to see the next "nestee" frame:
(gdb) c
Continuing.
Breakpoint 1, nestee () at dyn_frame.c:6
6 int a __attribute__((aligned(64))) = 1234;
(gdb) p $pc
$3 = (void (*)()) 0x4004c3 <nestee+23>
(gdb) x/1xg $rbp - 8
0x7fffffffdf28: 0x00007fffffffdf50
(gdb) p/x (char *) 0x00007fffffffdf50 - (char *) &a
$4 = 0x90
The offset between the CFA and the FRAME object is now 0x90 bytes. So,
because of alignment constraints, I think we cannot assume a constant
offset (even a function-dependent one).
This offset is dynamic and the only way to compute it is to use the
frame's context: here, nestee's saved registers, which we don't have
access to in DWARF when computing the static link attribute.
^ permalink raw reply [flat|nested] 29+ messages in thread
* [Bug tree-optimization/44563] GCC uses a lot of RAM when compiling a large numbers of functions
[not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
` (12 preceding siblings ...)
2015-03-09 15:36 ` rguenth at gcc dot gnu.org
@ 2015-03-10 4:55 ` hubicka at ucw dot cz
2015-03-10 8:26 ` rguenth at gcc dot gnu.org
` (14 subsequent siblings)
28 siblings, 0 replies; 29+ messages in thread
From: hubicka at ucw dot cz @ 2015-03-10 4:55 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563
--- Comment #19 from Jan Hubicka <hubicka at ucw dot cz> ---
Hmm, it is definitely wasteful to call the call stmt redirection so many times
- it only needs to be called once per statement. We probably could call it only
on newly introduced BBs, I will take a look.
Honza
^ permalink raw reply [flat|nested] 29+ messages in thread
* [Bug tree-optimization/44563] GCC uses a lot of RAM when compiling a large numbers of functions
[not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
` (13 preceding siblings ...)
2015-03-10 4:55 ` hubicka at ucw dot cz
@ 2015-03-10 8:26 ` rguenth at gcc dot gnu.org
2015-03-10 8:35 ` rguenther at suse dot de
` (13 subsequent siblings)
28 siblings, 0 replies; 29+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-03-10 8:26 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563
--- Comment #20 from Richard Biener <rguenth at gcc dot gnu.org> ---
Author: rguenth
Date: Tue Mar 10 08:25:31 2015
New Revision: 221308
URL: https://gcc.gnu.org/viewcvs?rev=221308&root=gcc&view=rev
Log:
2015-03-10 Richard Biener <rguenther@suse.de>
PR middle-end/44563
* cgraph.h (struct cgraph_edge_hasher): Add hash overload
for compare_type.
* cgraph.c (cgraph_edge_hasher::hash): Inline htab_hash_pointer.
(cgraph_update_edge_in_call_site_hash): Use cgraph_edge_hasher::hash.
(cgraph_add_edge_to_call_site_hash): Likewise.
(cgraph_node::get_edge): Likewise.
(cgraph_edge::set_call_stmt): Likewise.
(cgraph_edge::remove_caller): Likewise.
Modified:
trunk/gcc/ChangeLog
trunk/gcc/cgraph.c
trunk/gcc/cgraph.h
^ permalink raw reply [flat|nested] 29+ messages in thread
* [Bug tree-optimization/44563] GCC uses a lot of RAM when compiling a large numbers of functions
[not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
` (14 preceding siblings ...)
2015-03-10 8:26 ` rguenth at gcc dot gnu.org
@ 2015-03-10 8:35 ` rguenther at suse dot de
2015-03-10 11:54 ` rguenth at gcc dot gnu.org
` (12 subsequent siblings)
28 siblings, 0 replies; 29+ messages in thread
From: rguenther at suse dot de @ 2015-03-10 8:35 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563
--- Comment #21 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 10 Mar 2015, hubicka at ucw dot cz wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563
>
> --- Comment #19 from Jan Hubicka <hubicka at ucw dot cz> ---
> Hmm, it is definitely wasteful to call the call stmt redirection so many times
> - it only needs
> to be called once per statement. We probably could call it only on newly
> introduced BBs, I will
> take a look.
Ah - stupid error on my part producing the "obvious" patch (and not
seeing the bogus need_debug_cleanup guard...)
Testing the proper patch.
^ permalink raw reply [flat|nested] 29+ messages in thread
* [Bug tree-optimization/44563] GCC uses a lot of RAM when compiling a large numbers of functions
[not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
` (15 preceding siblings ...)
2015-03-10 8:35 ` rguenther at suse dot de
@ 2015-03-10 11:54 ` rguenth at gcc dot gnu.org
2015-03-10 12:03 ` rguenth at gcc dot gnu.org
` (11 subsequent siblings)
28 siblings, 0 replies; 29+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-03-10 11:54 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563
--- Comment #23 from Richard Biener <rguenth at gcc dot gnu.org> ---
Funnily, apart from the IPA inline summary updating issue, the next important
time-hog is the basic-block splitting we do for inlining a call.  This is because
split_block moves the tail of the block to a new one (and we process inlines
from the BB head towards its tail).  Thus we hit gimple_set_bb () quite hard
(quadratic in the number of calls in main(), which is composed of a single BB).
So in theory it's better to work backwards through the BB, but that doesn't play
well with catching all BBs in gimple_expand_calls_inline and its caller.
We are talking about 8% of compile-time spent in gimple_set_bb here (according
to callgrind).
It's bad that splitting blocks is O(n) (but that stmt -> bb pointer saves us
in other places).
If we knew that we perform multiple inlinings in a block we could use a
special "split block" function that splits the block in multiple places
at once, avoiding the quadratic behavior seen here.  Basically, first split
all calls that we want to inline into separate blocks and then do the actual
inlining run.
^ permalink raw reply [flat|nested] 29+ messages in thread
* [Bug tree-optimization/44563] GCC uses a lot of RAM when compiling a large numbers of functions
[not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
` (16 preceding siblings ...)
2015-03-10 11:54 ` rguenth at gcc dot gnu.org
@ 2015-03-10 12:03 ` rguenth at gcc dot gnu.org
2015-03-10 12:31 ` jakub at gcc dot gnu.org
` (10 subsequent siblings)
28 siblings, 0 replies; 29+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-03-10 12:03 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563
--- Comment #24 from Richard Biener <rguenth at gcc dot gnu.org> ---
So in gimple_expand_calls_inline we could look only at the BB's last stmt for
the actual inlining and, for the rest, just do the basic-block splitting, and
then perform that walk backwards.  This should remove the quadratic behavior
that arises when expanding many calls inline from a single basic block.
^ permalink raw reply [flat|nested] 29+ messages in thread
* [Bug tree-optimization/44563] GCC uses a lot of RAM when compiling a large numbers of functions
[not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
` (17 preceding siblings ...)
2015-03-10 12:03 ` rguenth at gcc dot gnu.org
@ 2015-03-10 12:31 ` jakub at gcc dot gnu.org
2015-03-10 12:44 ` rguenth at gcc dot gnu.org
` (9 subsequent siblings)
28 siblings, 0 replies; 29+ messages in thread
From: jakub at gcc dot gnu.org @ 2015-03-10 12:31 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |jakub at gcc dot gnu.org
--- Comment #25 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Or perhaps add a split_block variant that uses the old bb for the second part
rather than the first one, and use it in the inliner?
^ permalink raw reply [flat|nested] 29+ messages in thread
* [Bug tree-optimization/44563] GCC uses a lot of RAM when compiling a large numbers of functions
[not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
` (18 preceding siblings ...)
2015-03-10 12:31 ` jakub at gcc dot gnu.org
@ 2015-03-10 12:44 ` rguenth at gcc dot gnu.org
2015-03-10 12:51 ` rguenther at suse dot de
` (8 subsequent siblings)
28 siblings, 0 replies; 29+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-03-10 12:44 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563
--- Comment #26 from Richard Biener <rguenth at gcc dot gnu.org> ---
Author: rguenth
Date: Tue Mar 10 12:44:01 2015
New Revision: 221321
URL: https://gcc.gnu.org/viewcvs?rev=221321&root=gcc&view=rev
Log:
2015-03-09 Richard Biener <rguenther@suse.de>
PR middle-end/44563
* tree-inline.c (copy_cfg_body): Skip block mapped to entry/exit
for redirect_all_calls.
Modified:
trunk/gcc/ChangeLog
trunk/gcc/tree-inline.c
^ permalink raw reply [flat|nested] 29+ messages in thread
* [Bug tree-optimization/44563] GCC uses a lot of RAM when compiling a large numbers of functions
[not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
` (19 preceding siblings ...)
2015-03-10 12:44 ` rguenth at gcc dot gnu.org
@ 2015-03-10 12:51 ` rguenther at suse dot de
2015-03-10 13:19 ` rguenth at gcc dot gnu.org
` (7 subsequent siblings)
28 siblings, 0 replies; 29+ messages in thread
From: rguenther at suse dot de @ 2015-03-10 12:51 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563
--- Comment #27 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 10 Mar 2015, jakub at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563
>
> Jakub Jelinek <jakub at gcc dot gnu.org> changed:
>
> What |Removed |Added
> ----------------------------------------------------------------------------
> CC| |jakub at gcc dot gnu.org
>
> --- Comment #25 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> Or perhaps add split_block variant that uses the old bb for the second part
> rather than the first one, and use it in the inliner?
Seems like
Index: gcc/tree-inline.c
===================================================================
--- gcc/tree-inline.c (revision 221317)
+++ gcc/tree-inline.c (working copy)
@@ -4777,18 +4781,19 @@ static bool
gimple_expand_calls_inline (basic_block bb, copy_body_data *id)
{
gimple_stmt_iterator gsi;
+ bool inlined = false;
- for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+ for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi);)
{
gimple stmt = gsi_stmt (gsi);
+ gsi_prev (&gsi);
if (is_gimple_call (stmt)
- && !gimple_call_internal_p (stmt)
- && expand_call_inline (bb, stmt, id))
- return true;
+ && !gimple_call_internal_p (stmt))
+ inlined |= expand_call_inline (bb, stmt, id);
}
- return false;
+ return inlined;
}
fixes the issue, as the gsi stays valid across inline expansion if
we advance it beforehand.
^ permalink raw reply [flat|nested] 29+ messages in thread
* [Bug tree-optimization/44563] GCC uses a lot of RAM when compiling a large numbers of functions
[not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
` (20 preceding siblings ...)
2015-03-10 12:51 ` rguenther at suse dot de
@ 2015-03-10 13:19 ` rguenth at gcc dot gnu.org
2015-03-12 15:09 ` rguenth at gcc dot gnu.org
` (6 subsequent siblings)
28 siblings, 0 replies; 29+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-03-10 13:19 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563
--- Comment #28 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to rguenther@suse.de from comment #27)
> On Tue, 10 Mar 2015, jakub at gcc dot gnu.org wrote:
>
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563
> >
> > Jakub Jelinek <jakub at gcc dot gnu.org> changed:
> >
> > What |Removed |Added
> > ----------------------------------------------------------------------------
> > CC| |jakub at gcc dot gnu.org
> >
> > --- Comment #25 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> > Or perhaps add split_block variant that uses the old bb for the second part
> > rather than the first one, and use it in the inliner?
>
> Seems like
>
> Index: gcc/tree-inline.c
> ===================================================================
> --- gcc/tree-inline.c (revision 221317)
> +++ gcc/tree-inline.c (working copy)
> @@ -4777,18 +4781,19 @@ static bool
> gimple_expand_calls_inline (basic_block bb, copy_body_data *id)
> {
> gimple_stmt_iterator gsi;
> + bool inlined = false;
>
> - for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> + for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi);)
> {
> gimple stmt = gsi_stmt (gsi);
> + gsi_prev (&gsi);
>
> if (is_gimple_call (stmt)
> - && !gimple_call_internal_p (stmt)
> - && expand_call_inline (bb, stmt, id))
> - return true;
> + && !gimple_call_internal_p (stmt))
> + inlined |= expand_call_inline (bb, stmt, id);
> }
>
> - return false;
> + return inlined;
> }
>
>
> fixes the issue as well as gsi stays valid over inline expansion if
> we advance it before that.
Funnily, this makes us hit merge_blocks now via cleanup-tree-cfg walking BBs
in block-number order (and the inliner now allocating blocks in a less
"optimal" order...).  This loop:
  n = last_basic_block_for_fn (cfun);
  for (i = NUM_FIXED_BLOCKS; i < n; i++)
    {
      bb = BASIC_BLOCK_FOR_FN (cfun, i);
      if (bb)
        retval |= cleanup_tree_cfg_bb (bb);
    }
should work (for merging blocks) from entry to exit, but with the new block
assignment order it now effectively works backwards :/
So the above doesn't really fix the issue but just shifts it elsewhere.
^ permalink raw reply [flat|nested] 29+ messages in thread
* [Bug tree-optimization/44563] GCC uses a lot of RAM when compiling a large numbers of functions
[not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
` (21 preceding siblings ...)
2015-03-10 13:19 ` rguenth at gcc dot gnu.org
@ 2015-03-12 15:09 ` rguenth at gcc dot gnu.org
2015-03-13 8:43 ` [Bug ipa/44563] " rguenth at gcc dot gnu.org
` (5 subsequent siblings)
28 siblings, 0 replies; 29+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-03-12 15:09 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563
--- Comment #29 from Richard Biener <rguenth at gcc dot gnu.org> ---
Something like
@@ -672,8 +650,18 @@ cleanup_tree_cfg_bb (basic_block bb)
if (single_succ_p (bb)
&& can_merge_blocks_p (bb, single_succ (bb)))
{
- merge_blocks (bb, single_succ (bb));
- return true;
+ /* If there is a merge opportunity with the predecessor
+ do nothing now but wait until we process the predecessor.
+ This happens when we visit BBs in a non-optimal order and
+ avoids quadratic behavior with adjusting stmts BB pointer. */
+ if (single_pred_p (bb)
+ && can_merge_blocks_p (single_pred (bb), bb))
+ ;
+ else
+ {
+ merge_blocks (bb, single_succ (bb));
+ return true;
+ }
}
return retval;
in addition should do the job. Iteration on the predecessor should
cover both merges (so we don't actually need to revisit this block itself).
^ permalink raw reply [flat|nested] 29+ messages in thread
* [Bug ipa/44563] GCC uses a lot of RAM when compiling a large numbers of functions
[not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
` (22 preceding siblings ...)
2015-03-12 15:09 ` rguenth at gcc dot gnu.org
@ 2015-03-13 8:43 ` rguenth at gcc dot gnu.org
2015-03-13 8:47 ` rguenth at gcc dot gnu.org
` (4 subsequent siblings)
28 siblings, 0 replies; 29+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-03-13 8:43 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Component|tree-optimization |ipa
Known to fail| |5.0
--- Comment #30 from Richard Biener <rguenth at gcc dot gnu.org> ---
With all the patches I have for now we end up with a pure IPA issue:
phase opt and generate  : 193.97 (99%) usr  13.82 (93%) sys 207.75 (99%) wall 3311016 kB (94%) ggc
ipa inlining heuristics : 140.48 (72%) usr   0.44 ( 3%) sys 141.13 (67%) wall  396289 kB (11%) ggc
dominance computation   :   2.99 ( 2%) usr   1.00 ( 7%) sys   3.89 ( 2%) wall       0 kB ( 0%) ggc
integrated RA           :   4.05 ( 2%) usr   0.85 ( 6%) sys   5.26 ( 3%) wall 1577496 kB (45%) ggc
rest of compilation     :   6.53 ( 3%) usr   1.67 (11%) sys   7.91 ( 4%) wall  155664 kB ( 4%) ggc
unaccounted todo        :   3.82 ( 2%) usr   1.07 ( 7%) sys   4.98 ( 2%) wall       0 kB ( 0%) ggc
TOTAL                   : 195.46         14.79         210.23            3514948 kB
everything <= 1% dropped. I wonder what that unaccounted todo is ;)
^ permalink raw reply [flat|nested] 29+ messages in thread
* [Bug ipa/44563] GCC uses a lot of RAM when compiling a large numbers of functions
[not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
` (23 preceding siblings ...)
2015-03-13 8:43 ` [Bug ipa/44563] " rguenth at gcc dot gnu.org
@ 2015-03-13 8:47 ` rguenth at gcc dot gnu.org
2015-03-13 8:53 ` rguenth at gcc dot gnu.org
` (3 subsequent siblings)
28 siblings, 0 replies; 29+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-03-13 8:47 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563
--- Comment #31 from Richard Biener <rguenth at gcc dot gnu.org> ---
Author: rguenth
Date: Fri Mar 13 08:47:14 2015
New Revision: 221409
URL: https://gcc.gnu.org/viewcvs?rev=221409&root=gcc&view=rev
Log:
2015-03-10 Richard Biener <rguenther@suse.de>
PR middle-end/44563
* tree-cfgcleanup.c (split_bb_on_noreturn_calls): Remove.
(cleanup_tree_cfg_1): Do not call it.
(execute_cleanup_cfg_post_optimizing): Fixup the CFG here.
(fixup_noreturn_call): Mark the stmt as control altering.
* tree-cfg.c (execute_fixup_cfg): Do not dump the function
here.
(pass_data_fixup_cfg): Produce a dump file.
* tree-ssa-dom.c: Include tree-cfgcleanup.h.
(need_noreturn_fixup): New global.
(pass_dominator::execute): Fixup queued noreturn calls.
(optimize_stmt): Queue calls that became noreturn for fixup.
* tree-ssa-forwprop.c (pass_forwprop::execute): Likewise.
* tree-ssa-pre.c: Include tree-cfgcleanup.h.
(el_to_fixup): New global.
(eliminate_dom_walker::before_dom_children): Queue calls that
became noreturn for fixup.
(eliminate): Fixup queued noreturn calls.
* tree-ssa-propagate.c: Include tree-cfgcleanup.h.
(substitute_and_fold_dom_walker): New member stmts_to_fixup.
(substitute_and_fold_dom_walker::before_dom_children): Queue
calls that became noreturn for fixup.
(substitute_and_fold): Fixup queued noreturn calls.
Modified:
trunk/gcc/ChangeLog
trunk/gcc/tree-cfg.c
trunk/gcc/tree-cfgcleanup.c
trunk/gcc/tree-ssa-dom.c
trunk/gcc/tree-ssa-forwprop.c
trunk/gcc/tree-ssa-pre.c
trunk/gcc/tree-ssa-propagate.c
^ permalink raw reply [flat|nested] 29+ messages in thread
* [Bug ipa/44563] GCC uses a lot of RAM when compiling a large numbers of functions
[not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
` (24 preceding siblings ...)
2015-03-13 8:47 ` rguenth at gcc dot gnu.org
@ 2015-03-13 8:53 ` rguenth at gcc dot gnu.org
2015-03-13 8:55 ` rguenth at gcc dot gnu.org
` (2 subsequent siblings)
28 siblings, 0 replies; 29+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-03-13 8:53 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563
--- Comment #32 from Richard Biener <rguenth at gcc dot gnu.org> ---
Author: rguenth
Date: Fri Mar 13 08:52:51 2015
New Revision: 221410
URL: https://gcc.gnu.org/viewcvs?rev=221410&root=gcc&view=rev
Log:
2015-03-12 Richard Biener <rguenther@suse.de>
PR middle-end/44563
* tree-inline.c (gimple_expand_calls_inline): Walk BB backwards
to avoid quadratic behavior with inline expansion splitting blocks.
* tree-cfgcleanup.c (cleanup_tree_cfg_bb): Do not merge block
with the successor if the predecessor will be merged with it.
* tree-cfg.c (gimple_can_merge_blocks_p): We can't merge the
entry block with its successor.
Modified:
trunk/gcc/ChangeLog
trunk/gcc/tree-cfg.c
trunk/gcc/tree-cfgcleanup.c
trunk/gcc/tree-inline.c
^ permalink raw reply [flat|nested] 29+ messages in thread
* [Bug ipa/44563] GCC uses a lot of RAM when compiling a large numbers of functions
[not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
` (25 preceding siblings ...)
2015-03-13 8:53 ` rguenth at gcc dot gnu.org
@ 2015-03-13 8:55 ` rguenth at gcc dot gnu.org
2015-03-16 0:07 ` hubicka at ucw dot cz
2024-02-16 13:52 ` rguenth at gcc dot gnu.org
28 siblings, 0 replies; 29+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-03-13 8:55 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |ASSIGNED
Assignee|unassigned at gcc dot gnu.org |hubicka at gcc dot gnu.org
--- Comment #33 from Richard Biener <rguenth at gcc dot gnu.org> ---
Assigning to Honza - I wonder if there is any low-hanging fruit to improve
things for GCC 5 still.
^ permalink raw reply [flat|nested] 29+ messages in thread
* [Bug ipa/44563] GCC uses a lot of RAM when compiling a large numbers of functions
[not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
` (26 preceding siblings ...)
2015-03-13 8:55 ` rguenth at gcc dot gnu.org
@ 2015-03-16 0:07 ` hubicka at ucw dot cz
2024-02-16 13:52 ` rguenth at gcc dot gnu.org
28 siblings, 0 replies; 29+ messages in thread
From: hubicka at ucw dot cz @ 2015-03-16 0:07 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563
--- Comment #34 from Jan Hubicka <hubicka at ucw dot cz> ---
The problem is (as described earlier) the fact that we sum the size of all
call statements in the function after every inline decision.
Most of the time is spent in calling estimate_edge_size_and_time:
79.95% cc1 cc1 [.] _ZL28estimate_calls_size_and_timeP11cgraph_nodePiS1_S1_S1_j3vecIP9tree_node7va_heap6vl_ptrES2_I28ipa_polymorphic_call_contextS5_S6_ES2_IP21ipa_agg
2.21% cc1 libc-2.13.so [.] _int_malloc
0.59% cc1 libc-2.13.so [.] _int_free
Updating summaries incrementally will solve it, but at the moment I do not see
any really simple change for GCC 5 (I have looked at this code a couple of
times already because of this PR).
Honza
^ permalink raw reply [flat|nested] 29+ messages in thread
* [Bug ipa/44563] GCC uses a lot of RAM when compiling a large numbers of functions
[not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
` (27 preceding siblings ...)
2015-03-16 0:07 ` hubicka at ucw dot cz
@ 2024-02-16 13:52 ` rguenth at gcc dot gnu.org
28 siblings, 0 replies; 29+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-02-16 13:52 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Last reconfirmed|2010-06-17 10:36:48 |2024-2-16
--- Comment #40 from Richard Biener <rguenth at gcc dot gnu.org> ---
Reconfirmed.  For GCC 14 we use about 2GB of RAM on x86_64 with -O0, and 20s.
With -O1 that regresses to 60s and a little less peak memory.
callgraph ipa passes : 14.18 ( 23%)
tree PTA : 16.43 ( 27%)
At -O2, memory usage improves further at about the same compile time.
^ permalink raw reply [flat|nested] 29+ messages in thread
end of thread, other threads:[~2024-02-16 13:52 UTC | newest]
Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
2010-12-12 23:55 ` [Bug tree-optimization/44563] GCC uses a lot of RAM when compiling a large numbers of functions hubicka at gcc dot gnu.org
2010-12-13 0:58 ` hubicka at gcc dot gnu.org
2010-12-13 1:00 ` hubicka at gcc dot gnu.org
2010-12-13 1:07 ` hubicka at gcc dot gnu.org
2010-12-13 1:17 ` hubicka at gcc dot gnu.org
2010-12-13 1:47 ` hubicka at gcc dot gnu.org
2010-12-13 13:22 ` hubicka at gcc dot gnu.org
2010-12-17 0:08 ` hubicka at gcc dot gnu.org
2015-03-07 0:22 ` hubicka at gcc dot gnu.org
2015-03-09 9:36 ` rguenth at gcc dot gnu.org
2015-03-09 11:33 ` rguenth at gcc dot gnu.org
2015-03-09 15:26 ` rguenth at gcc dot gnu.org
2015-03-09 15:36 ` rguenth at gcc dot gnu.org
2015-03-10 4:55 ` hubicka at ucw dot cz
2015-03-10 8:26 ` rguenth at gcc dot gnu.org
2015-03-10 8:35 ` rguenther at suse dot de
2015-03-10 11:54 ` rguenth at gcc dot gnu.org
2015-03-10 12:03 ` rguenth at gcc dot gnu.org
2015-03-10 12:31 ` jakub at gcc dot gnu.org
2015-03-10 12:44 ` rguenth at gcc dot gnu.org
2015-03-10 12:51 ` rguenther at suse dot de
2015-03-10 13:19 ` rguenth at gcc dot gnu.org
2015-03-12 15:09 ` rguenth at gcc dot gnu.org
2015-03-13 8:43 ` [Bug ipa/44563] " rguenth at gcc dot gnu.org
2015-03-13 8:47 ` rguenth at gcc dot gnu.org
2015-03-13 8:53 ` rguenth at gcc dot gnu.org
2015-03-13 8:55 ` rguenth at gcc dot gnu.org
2015-03-16 0:07 ` hubicka at ucw dot cz
2024-02-16 13:52 ` rguenth at gcc dot gnu.org
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).