public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug tree-optimization/44563] GCC uses a lot of RAM when compiling a large numbers of functions
       [not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
@ 2010-12-12 23:55 ` hubicka at gcc dot gnu.org
  2010-12-13  0:58 ` hubicka at gcc dot gnu.org
                   ` (27 subsequent siblings)
  28 siblings, 0 replies; 29+ messages in thread
From: hubicka at gcc dot gnu.org @ 2010-12-12 23:55 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563

--- Comment #5 from Jan Hubicka <hubicka at gcc dot gnu.org> 2010-12-12 23:55:24 UTC ---
With -O2, during early optimization we spend 68% of the time in
cgraph_check_inline_limits.  This is weird, since the early inliner should not
be terribly sensitive to the inline limits.  I guess it is because we end up
walking very long edge lists.  I will check whether I can do something here.

Honza


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug tree-optimization/44563] GCC uses a lot of RAM when compiling a large numbers of functions
       [not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
  2010-12-12 23:55 ` [Bug tree-optimization/44563] GCC uses a lot of RAM when compiling a large numbers of functions hubicka at gcc dot gnu.org
@ 2010-12-13  0:58 ` hubicka at gcc dot gnu.org
  2010-12-13  1:00 ` hubicka at gcc dot gnu.org
                   ` (26 subsequent siblings)
  28 siblings, 0 replies; 29+ messages in thread
From: hubicka at gcc dot gnu.org @ 2010-12-13  0:58 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563

Jan Hubicka <hubicka at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dnovillo at google dot com

--- Comment #6 from Jan Hubicka <hubicka at gcc dot gnu.org> 2010-12-13 00:58:04 UTC ---
Letting the compilation run longer, I get:
3068656  60.0462  cc1                      cc1                      gsi_for_stmt
1211665  23.7093  cc1                      cc1                      cgraph_check_inline_limits
396594    7.7604  cc1                      cc1                      gimple_set_bb
29937     0.5858  libc-2.11.1.so           libc-2.11.1.so           _IO_vfscanf
23409     0.4581  libc-2.11.1.so           libc-2.11.1.so           _int_malloc
14539     0.2845  cc1                      cc1                      gimple_split_block
13307     0.2604  libc-2.11.1.so           libc-2.11.1.so           memset
9532      0.1865  cc1                      cc1                      htab_delete
8275      0.1619  cc1                      cc1                      bitmap_set_bit

so the gsi_for_stmt nonlinearity kicks in.  I guess it is the inliner calling
the BB splitting code, and that calling gsi_for_stmt.  We could probably add a
gsi_split_bb variant taking a gsi argument, too.

I always wondered why Diego did not embed gimple_seq_node_d into the gimple
statement (we do not have too many statements living outside of basic blocks,
so it would not be wasteful, and it would give better locality when walking
the lists).  In that case gsi_for_stmt would become O(1).

Honza


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug tree-optimization/44563] GCC uses a lot of RAM when compiling a large numbers of functions
       [not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
  2010-12-12 23:55 ` [Bug tree-optimization/44563] GCC uses a lot of RAM when compiling a large numbers of functions hubicka at gcc dot gnu.org
  2010-12-13  0:58 ` hubicka at gcc dot gnu.org
@ 2010-12-13  1:00 ` hubicka at gcc dot gnu.org
  2010-12-13  1:07 ` hubicka at gcc dot gnu.org
                   ` (25 subsequent siblings)
  28 siblings, 0 replies; 29+ messages in thread
From: hubicka at gcc dot gnu.org @ 2010-12-13  1:00 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563

--- Comment #7 from Jan Hubicka <hubicka at gcc dot gnu.org> 2010-12-13 00:59:53 UTC ---
... actually split_bb does not use gsi_for_stmt, since it has to walk all the
statements in the BB anyway.  It seems that it is one of the routines updating
callers from cgraph edges.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug tree-optimization/44563] GCC uses a lot of RAM when compiling a large numbers of functions
       [not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
                   ` (2 preceding siblings ...)
  2010-12-13  1:00 ` hubicka at gcc dot gnu.org
@ 2010-12-13  1:07 ` hubicka at gcc dot gnu.org
  2010-12-13  1:17 ` hubicka at gcc dot gnu.org
                   ` (24 subsequent siblings)
  28 siblings, 0 replies; 29+ messages in thread
From: hubicka at gcc dot gnu.org @ 2010-12-13  1:07 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563

--- Comment #8 from Jan Hubicka <hubicka at gcc dot gnu.org> 2010-12-13 01:07:33 UTC ---
My profile was at -O2.  Concerning Jakub's callgrind numbers, the -O0
compilation finishes in about 44s for me.  The profile is:
4349      3.8607  libc-2.11.1.so           libc-2.11.1.so           _int_malloc
3150      2.7963  cc1                      cc1                      record_reg_classes.constprop.9
2881      2.5575  cc1                      cc1                      htab_find_slot_with_hash
2104      1.8678  cc1                      cc1                      ggc_set_mark
2039      1.8101  libc-2.11.1.so           libc-2.11.1.so           msort_with_tmp
2005      1.7799  cc1                      cc1                      bitmap_set_bit
1836      1.6299  cc1                      cc1                      df_ref_create_structure
1775      1.5757  cc1                      cc1                      find_reloads
1738      1.5429  cc1                      cc1                      ggc_internal_alloc_stat
1538      1.3653  libc-2.11.1.so           libc-2.11.1.so           memset
1430      1.2694  cc1                      cc1                      eq_node
1375      1.2206  cc1                      cc1                      preprocess_constraints
1317      1.1691  libc-2.11.1.so           libc-2.11.1.so           _int_free
1309      1.1620  cc1                      cc1                      df_insn_refs_collect
1289      1.1443  cc1                      cc1                      ix86_function_arg_regno_p
1277      1.1336  cc1                      cc1                      df_ref_record
1249      1.1088  cc1                      cc1                      ix86_save_reg
1215      1.0786  cc1                      cc1                      ix86_compute_frame_layout
1171      1.0395  libc-2.11.1.so           libc-2.11.1.so           malloc_consolidate
1134      1.0067  cc1                      cc1                      extract_insn

So I don't see that much from the RA by itself.  Tracking down that malloc
inefficiency might be low-hanging fruit.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug tree-optimization/44563] GCC uses a lot of RAM when compiling a large numbers of functions
       [not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
                   ` (3 preceding siblings ...)
  2010-12-13  1:07 ` hubicka at gcc dot gnu.org
@ 2010-12-13  1:17 ` hubicka at gcc dot gnu.org
  2010-12-13  1:47 ` hubicka at gcc dot gnu.org
                   ` (23 subsequent siblings)
  28 siblings, 0 replies; 29+ messages in thread
From: hubicka at gcc dot gnu.org @ 2010-12-13  1:17 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563

Jan Hubicka <hubicka at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vmakarov at redhat dot com

--- Comment #9 from Jan Hubicka <hubicka at gcc dot gnu.org> 2010-12-13 01:17:19 UTC ---
And since Richard did not include 4.6 in his -ftime-report numbers, here is -O0:
 garbage collection    :   1.74 ( 4%) usr   0.00 ( 0%) sys   1.76 ( 3%) wall       0 kB ( 0%) ggc
 callgraph construction:   1.07 ( 2%) usr   0.26 ( 5%) sys   1.33 ( 3%) wall   41984 kB ( 3%) ggc
 callgraph optimization:   0.62 ( 1%) usr   0.24 ( 5%) sys   0.98 ( 2%) wall   91137 kB ( 6%) ggc
 df scan insns         :   3.47 ( 8%) usr   0.32 ( 6%) sys   3.72 ( 7%) wall    7168 kB ( 0%) ggc
 parser                :   1.37 ( 3%) usr   0.41 ( 8%) sys   1.59 ( 3%) wall  202094 kB (13%) ggc
 inline heuristics     :   1.33 ( 3%) usr   0.01 ( 0%) sys   1.29 ( 3%) wall       0 kB ( 0%) ggc
 tree gimplify         :   0.70 ( 2%) usr   0.06 ( 1%) sys   0.70 ( 1%) wall   73728 kB ( 5%) ggc
 tree CFG construction :   0.21 ( 0%) usr   0.09 ( 2%) sys   0.25 ( 0%) wall   82433 kB ( 5%) ggc
 tree operand scan     :   0.10 ( 0%) usr   0.10 ( 2%) sys   0.15 ( 0%) wall   69141 kB ( 5%) ggc
 expand                :   2.32 ( 5%) usr   0.24 ( 5%) sys   2.45 ( 5%) wall  242940 kB (16%) ggc
 post expand cleanups  :   0.50 ( 1%) usr   0.02 ( 0%) sys   0.49 ( 1%) wall   76289 kB ( 5%) ggc
 integrated RA         :  10.03 (22%) usr   0.15 ( 3%) sys  10.44 (20%) wall  135680 kB ( 9%) ggc
 reload                :   4.42 (10%) usr   0.07 ( 1%) sys   4.60 ( 9%) wall   55295 kB ( 4%) ggc
 rest of compilation   :   3.03 ( 7%) usr   0.82 (16%) sys   3.65 ( 7%) wall  170498 kB (11%) ggc
 final                 :   3.67 ( 8%) usr   0.09 ( 2%) sys   3.65 ( 7%) wall   30144 kB ( 2%) ggc
 unaccounted todo      :   1.63 ( 4%) usr   0.91 (18%) sys   2.58 ( 5%) wall       0 kB ( 0%) ggc

I wonder what is accounted under "rest of compilation".
Otherwise the RA remains the compile-time bottleneck.  Will do memory stats later.
Adding Vladimir to CC for the RA.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug tree-optimization/44563] GCC uses a lot of RAM when compiling a large numbers of functions
       [not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
                   ` (4 preceding siblings ...)
  2010-12-13  1:17 ` hubicka at gcc dot gnu.org
@ 2010-12-13  1:47 ` hubicka at gcc dot gnu.org
  2010-12-13 13:22 ` hubicka at gcc dot gnu.org
                   ` (22 subsequent siblings)
  28 siblings, 0 replies; 29+ messages in thread
From: hubicka at gcc dot gnu.org @ 2010-12-13  1:47 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563

--- Comment #10 from Jan Hubicka <hubicka at gcc dot gnu.org> 2010-12-13 01:46:39 UTC ---
Created attachment 22730
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=22730
Fix for inline cost problem

The attached patch fixes the inliner cost problem, so we converge at -O1.  It
is a bit brute force, but I guess it should work well in practice.  With the
fix, -O1 converges in 90 seconds.
The profile is similar to the one at -O0:
14898     6.7090  libc-2.11.1.so           libc-2.11.1.so           _int_malloc
7981      3.5941  cc1                      cc1                      bitmap_set_bit
5993      2.6988  libc-2.11.1.so           libc-2.11.1.so           memset
5063      2.2800  cc1                      cc1                      htab_delete
4091      1.8423  libc-2.11.1.so           libc-2.11.1.so           _IO_vfscanf
3849      1.7333  no-vmlinux               no-vmlinux               /no-vmlinux
3807      1.7144  libc-2.11.1.so           libc-2.11.1.so           _int_free
3632      1.6356  cc1                      cc1                      df_note_compute
3469      1.5622  libc-2.11.1.so           libc-2.11.1.so           malloc_consolidate
3352      1.5095  libc-2.11.1.so           libc-2.11.1.so           msort_with_tmp
2978      1.3411  cc1                      cc1                      htab_traverse_noresize
2941      1.3244  libc-2.11.1.so           libc-2.11.1.so           free
2824      1.2717  cc1                      cc1                      bitmap_clear_bit
2653      1.1947  cc1                      cc1                      df_ref_create_structure
2429      1.0938  libc-2.11.1.so           libc-2.11.1.so           malloc
2239      1.0083  cc1                      cc1                      df_insn_refs_collect


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug tree-optimization/44563] GCC uses a lot of RAM when compiling a large numbers of functions
       [not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
                   ` (5 preceding siblings ...)
  2010-12-13  1:47 ` hubicka at gcc dot gnu.org
@ 2010-12-13 13:22 ` hubicka at gcc dot gnu.org
  2010-12-17  0:08 ` hubicka at gcc dot gnu.org
                   ` (21 subsequent siblings)
  28 siblings, 0 replies; 29+ messages in thread
From: hubicka at gcc dot gnu.org @ 2010-12-13 13:22 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563

--- Comment #11 from Jan Hubicka <hubicka at gcc dot gnu.org> 2010-12-13 13:22:28 UTC ---
The patched compiler at -O2 now shows:
 integration           : 166.20 (16%) usr   0.19 ( 1%) sys 166.86 (15%) wall   92691 kB ( 4%) ggc
 tree CCP              : 792.75 (74%) usr   0.15 ( 1%) sys 794.63 (73%) wall   66560 kB ( 3%) ggc

Integration is probably the overhead of splitting BBs.  I wonder what makes
tree CCP so slow; it is probably worth investigating.

The profile shows:
2009425  73.3643  cc1                      cc1                      gsi_for_stmt
398058   14.5331  cc1                      cc1                      gimple_set_bb
22230     0.8116  libc-2.11.1.so           libc-2.11.1.so           _int_malloc
14339     0.5235  cc1                      cc1                      gimple_split_block
11727     0.4282  libc-2.11.1.so           libc-2.11.1.so           memset
10411     0.3801  libc-2.11.1.so           libc-2.11.1.so           _IO_vfscanf
9095      0.3321  cc1                      cc1                      htab_delete
6061      0.2213  cc1                      cc1                      bitmap_set_bit
5990      0.2187  no-vmlinux               no-vmlinux               /no-vmlinux
5912      0.2158  libc-2.11.1.so           libc-2.11.1.so           _int_free
5077      0.1854  libc-2.11.1.so           libc-2.11.1.so           malloc_consolidate
4516      0.1649  cc1                      cc1                      htab_find_slot_with_hash
4515      0.1648  opreport                 opreport                 /usr/bin/opreport
4284      0.1564  libc-2.11.1.so           libc-2.11.1.so           free
4234      0.1546  libc-2.11.1.so           libc-2.11.1.so           malloc
4106      0.1499  cc1                      cc1                      htab_traverse_noresize
3737      0.1364  libc-2.11.1.so           libc-2.11.1.so           calloc
3197      0.1167  cc1                      cc1                      eq_node
2996      0.1094  cc1                      cc1                      df_note_compute
2632      0.0961  cc1                      cc1                      ggc_internal_alloc_stat
2476      0.0904  cc1                      cc1                      bitmap_bit_p
Other passes are sub 10s each.

Maybe tree CCP just pays the overhead of merging the large BBs into a single one?

Honza


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug tree-optimization/44563] GCC uses a lot of RAM when compiling a large numbers of functions
       [not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
                   ` (6 preceding siblings ...)
  2010-12-13 13:22 ` hubicka at gcc dot gnu.org
@ 2010-12-17  0:08 ` hubicka at gcc dot gnu.org
  2015-03-07  0:22 ` hubicka at gcc dot gnu.org
                   ` (20 subsequent siblings)
  28 siblings, 0 replies; 29+ messages in thread
From: hubicka at gcc dot gnu.org @ 2010-12-17  0:08 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563

--- Comment #12 from Jan Hubicka <hubicka at gcc dot gnu.org> 2010-12-17 00:08:07 UTC ---
Author: hubicka
Date: Fri Dec 17 00:08:02 2010
New Revision: 167964

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=167964
Log:

    PR middle-end/44563 
    * ipa-inline.c: Update toplevel comment. 
    (cgraph_estimate_size_after_inlining): Remove times attribute. 
    (cgraph_mark_inline_edge): Update. 
    (cgraph_mark_inline): Remove. 
    (cgraph_estimate_growth): Update. 
    (cgraph_check_inline_limits): Remove one_only argument. 
    (cgraph_edge_badness): Update. 
    (cgraph_decide_recursive_inlining): Update. 
    (cgraph_decide_inlining_of_small_function): Fix handling of
tree_can_inline_p 
    and call_stmt_cannot_inline_p. 
    (cgraph_flatten): Likewise. 
    (cgraph_decide_inlining): Update. 
    (cgraph_decide_inlining_incrementally): Fix handling of
call_stmt_cannot_inline_p. 

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/ipa-inline.c


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug tree-optimization/44563] GCC uses a lot of RAM when compiling a large numbers of functions
       [not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
                   ` (7 preceding siblings ...)
  2010-12-17  0:08 ` hubicka at gcc dot gnu.org
@ 2015-03-07  0:22 ` hubicka at gcc dot gnu.org
  2015-03-09  9:36 ` rguenth at gcc dot gnu.org
                   ` (19 subsequent siblings)
  28 siblings, 0 replies; 29+ messages in thread
From: hubicka at gcc dot gnu.org @ 2015-03-07  0:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563

--- Comment #14 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Yeah, this is the old problem that after each inline we recompute the size of
the whole function being inlined into.  This means walking the whole inline
tree and summing the size of all non-inlined call sites.  If very many
functions get inlined into a single caller, the nonlinearity kicks in.  Here
the main() function calls 65536 empty functions, which takes time to process.

My plan is to turn the sizes/times into sreals and then update them
incrementally (subtract the size of the call statement and account for the
changes).  With the old fixed point + capping this did not work well and I
always ended up with too many misaccounting issues.

This is however more intrusive than what I would like to do in stage3.  I did
not complete the sreal conversion because the sreal class came a bit too late
in this development cycle.  I have an unfinished patch for that, but it is
>100K and hits interesting problems like gengtype not understanding the
sreal.h header.

I will perf it and check for micro-optimization possibilities.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug tree-optimization/44563] GCC uses a lot of RAM when compiling a large numbers of functions
       [not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
                   ` (8 preceding siblings ...)
  2015-03-07  0:22 ` hubicka at gcc dot gnu.org
@ 2015-03-09  9:36 ` rguenth at gcc dot gnu.org
  2015-03-09 11:33 ` rguenth at gcc dot gnu.org
                   ` (18 subsequent siblings)
  28 siblings, 0 replies; 29+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-03-09  9:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #15 from Richard Biener <rguenth at gcc dot gnu.org> ---
Meanwhile the compile at -O1 finished:

 ipa inlining heuristics : 155.14 (12%) usr   0.37 ( 2%) sys 155.31 (12%) wall  396289 kB (11%) ggc
 integration             : 958.73 (75%) usr   2.18 (13%) sys 960.89 (74%) wall   86527 kB ( 2%) ggc
 tree CFG cleanup        : 116.57 ( 9%) usr   0.30 ( 2%) sys 116.77 ( 9%) wall       0 kB ( 0%) ggc
 TOTAL                 :1285.25            17.15          1302.17            3514948 kB


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug tree-optimization/44563] GCC uses a lot of RAM when compiling a large numbers of functions
       [not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
                   ` (9 preceding siblings ...)
  2015-03-09  9:36 ` rguenth at gcc dot gnu.org
@ 2015-03-09 11:33 ` rguenth at gcc dot gnu.org
  2015-03-09 15:26 ` rguenth at gcc dot gnu.org
                   ` (17 subsequent siblings)
  28 siblings, 0 replies; 29+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-03-09 11:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563

--- Comment #16 from Richard Biener <rguenth at gcc dot gnu.org> ---
callgrind shows the cgraph_edge_hasher quite high in the profile (via
redirect_all_calls).  I suppose as the large main is a single BB walking
all stmts over-and-over is quite bad.  Also hash_pointer isn't inlined!?
Ah - it's external in libiberty hashtab.c ... - it should transition to
using/inheriting from pointer_hash.

cgraph_edge *
cgraph_node::get_edge (gimple call_stmt)
{
  cgraph_edge *e, *e2;
  int n = 0;

  if (call_site_hash)
    return call_site_hash->find_with_hash (call_stmt,
                                           htab_hash_pointer (call_stmt));


The estimate_calls_size_and_time portion is quite smaller.

cleanup-cfgs main portion is split_bb_on_noreturn_calls.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug tree-optimization/44563] GCC uses a lot of RAM when compiling a large numbers of functions
       [not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
                   ` (10 preceding siblings ...)
  2015-03-09 11:33 ` rguenth at gcc dot gnu.org
@ 2015-03-09 15:26 ` rguenth at gcc dot gnu.org
  2015-03-09 15:36 ` rguenth at gcc dot gnu.org
                   ` (16 subsequent siblings)
  28 siblings, 0 replies; 29+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-03-09 15:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563

--- Comment #17 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #16)
> callgrind shows the cgraph_edge_hasher quite high in the profile (via
> redirect_all_calls).  I suppose as the large main is a single BB walking
> all stmts over-and-over is quite bad.  Also hash_pointer isn't inlined!?
> Ah - it's external in libiberty hashtab.c ... - it should transition to
> using/inheriting from pointer_hash.
> 
> cgraph_edge *
> cgraph_node::get_edge (gimple call_stmt)
> {
>   cgraph_edge *e, *e2;
>   int n = 0;
> 
>   if (call_site_hash)
>     return call_site_hash->find_with_hash (call_stmt,
>                                            htab_hash_pointer (call_stmt));
> 

Btw, for 10000 calls (smaller testcase) we get 100 000 000 calls to
cgraph_edge::redirect_call_stmt_to_callee () (that's from 40000
redirect_all_calls calls which is from 10000 optimize_inline_calls calls).

Ah - we do this also for the ENTRY/EXIT block!

Index: gcc/tree-inline.c
===================================================================
--- gcc/tree-inline.c   (revision 221278)
+++ gcc/tree-inline.c   (working copy)
@@ -2802,11 +2802,13 @@ copy_cfg_body (copy_body_data * id, gcov
        if (need_debug_cleanup
            && bb->index != ENTRY_BLOCK
            && bb->index != EXIT_BLOCK)
-         maybe_move_debug_stmts_to_successors (id, (basic_block) bb->aux);
-       /* Update call edge destinations.  This can not be done before loop
-          info is updated, because we may split basic blocks.  */
-       if (id->transform_call_graph_edges == CB_CGE_DUPLICATE)
-         redirect_all_calls (id, (basic_block)bb->aux);
+         {
+           maybe_move_debug_stmts_to_successors (id, (basic_block) bb->aux);
+           /* Update call edge destinations.  This can not be done before loop
+              info is updated, because we may split basic blocks.  */
+           if (id->transform_call_graph_edges == CB_CGE_DUPLICATE)
+             redirect_all_calls (id, (basic_block)bb->aux);
+         }
        ((basic_block)bb->aux)->aux = NULL;
        bb->aux = NULL;
       }

makes sense?

> The estimate_calls_size_and_time portion is quite smaller.
> 
> cleanup-cfgs main portion is split_bb_on_noreturn_calls.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug tree-optimization/44563] GCC uses a lot of RAM when compiling a large numbers of functions
       [not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
                   ` (11 preceding siblings ...)
  2015-03-09 15:26 ` rguenth at gcc dot gnu.org
@ 2015-03-09 15:36 ` rguenth at gcc dot gnu.org
  2015-03-10  4:55 ` hubicka at ucw dot cz
                   ` (15 subsequent siblings)
  28 siblings, 0 replies; 29+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-03-09 15:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563

--- Comment #18 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #17)
> (In reply to Richard Biener from comment #16)
> > callgrind shows the cgraph_edge_hasher quite high in the profile (via
> > redirect_all_calls).  I suppose as the large main is a single BB walking
> > all stmts over-and-over is quite bad.  Also hash_pointer isn't inlined!?
> > Ah - it's external in libiberty hashtab.c ... - it should transition to
> > using/inheriting from pointer_hash.
> > 
> > cgraph_edge *
> > cgraph_node::get_edge (gimple call_stmt)
> > {
> >   cgraph_edge *e, *e2;
> >   int n = 0;
> > 
> >   if (call_site_hash)
> >     return call_site_hash->find_with_hash (call_stmt,
> >                                            htab_hash_pointer (call_stmt));
> > 
> 
> Btw, for 10000 calls (smaller testcase) we get 100 000 000 calls to
> cgraph_edge::redirect_call_stmt_to_callee () (that's from 40000
> redirect_all_calls calls which is from 10000 optimize_inline_calls calls).
> 
> Ah - we do this also for the ENTRY/EXIT block!
> 
> Index: gcc/tree-inline.c
> ===================================================================
> --- gcc/tree-inline.c   (revision 221278)
> +++ gcc/tree-inline.c   (working copy)
> @@ -2802,11 +2802,13 @@ copy_cfg_body (copy_body_data * id, gcov
>         if (need_debug_cleanup
>             && bb->index != ENTRY_BLOCK
>             && bb->index != EXIT_BLOCK)
> -         maybe_move_debug_stmts_to_successors (id, (basic_block) bb->aux);
> -       /* Update call edge destinations.  This can not be done before loop
> -          info is updated, because we may split basic blocks.  */
> -       if (id->transform_call_graph_edges == CB_CGE_DUPLICATE)
> -         redirect_all_calls (id, (basic_block)bb->aux);
> +         {
> +           maybe_move_debug_stmts_to_successors (id, (basic_block) bb->aux);
> +           /* Update call edge destinations.  This can not be done before
> loop
> +              info is updated, because we may split basic blocks.  */
> +           if (id->transform_call_graph_edges == CB_CGE_DUPLICATE)
> +             redirect_all_calls (id, (basic_block)bb->aux);
> +         }
>         ((basic_block)bb->aux)->aux = NULL;
>         bb->aux = NULL;
>        }
> 
> makes sense?

Fails to bootstrap :/  But it would improve the testcase to have only the
inline heuristics issue.

/space/rguenther/src/svn/trunk/libstdc++-v3/libsupc++/pbase_type_info.cc: In
member function ‘virtual bool __cxxabiv1::__pbase_type_info::__do_catch(const
std::type_info*, void**, unsigned int) const’:
/space/rguenther/src/svn/trunk/libstdc++-v3/libsupc++/pbase_type_info.cc:32:6:
error: reference to dead statement
 bool __pbase_type_info::
      ^
# .MEM = VDEF <.MEM>
_30 = OBJ_TYPE_REF(_28;(const struct __pbase_type_info)this_3(D)->6)
(this_3(D), thr_type_5(D), thr_obj_9(D), outer_29);
_ZNK10__cxxabiv117__pbase_type_info10__do_catchEPKSt9type_infoPPvj/74 (virtual
bool __cxxabiv1::__pbase_type_info::__do_catch(const std::type_info*, void**,
unsigned int) const) @0x2aaaac8d3ab8
  Type: function definition analyzed
  Visibility: externally_visible public visibility_specified virtual
  Address is taken.
  References: _ZNK10__cxxabiv117__pbase_type_info15__pointer_catchEPKS0_PPvj/34
(addr) (speculative)
  Referring: _ZTVN10__cxxabiv117__pbase_type_infoE/77 (addr)
  Availability: overwritable
  First run: 0
  Function flags: body
  Called by: 
  Calls: strcmp/85 (0.39 per call) __cxa_bad_typeid/83 (can throw external)
strcmp/85 (0.61 per call) 
   Indirect call(0.11 per call) (can throw external) 
   Polymorphic indirect call of type const struct __pbase_type_info
token:6(speculative) (0.03 per call) (can throw external)  of param:0
    Outer type (dynamic):struct __pbase_type_info (or a derived type) offset 0
/space/rguenther/src/svn/trunk/libstdc++-v3/libsupc++/pbase_type_info.cc:32:6:
internal compiler error: verify_cgraph_node failed
0xa8eebc cgraph_node::verify_node()
        /space/rguenther/src/svn/trunk/gcc/cgraph.c:3115
0xa8473f symtab_node::verify()
        /space/rguenther/src/svn/trunk/gcc/symtab.c:1103
0x1025861 optimize_inline_calls(tree_node*)
        /space/rguenther/src/svn/trunk/gcc/tree-inline.c:4938

> > The estimate_calls_size_and_time portion is quite smaller.
> > 
> > cleanup-cfgs main portion is split_bb_on_noreturn_calls.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug debug/53927] wrong value for DW_AT_static_link
From: derodat at adacore dot com @ 2015-03-09 15:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53927

--- Comment #21 from Pierre-Marie de Rodat <derodat at adacore dot com> ---
(In reply to Eric Botcazou from comment #18)
> I think this is worth investigating though because it's conceptually
> much simpler than adding yet another indirection.  And we should
> concentrate on -O0 (and -Og), we don't really care about what happens
> with aggressive optimization.

Understood and agreed. Nevertheless...

> I guess the question is: can we arrange to have a constant offset
> between the frame base and the FRAME object, "constant" meaning valid
> for every function but possibly target-dependent?

I started to hack into cfgexpand.c and dwarf2out.c, but I realized this
is not possible in the general case. Consider the following example:

    #include <stdlib.h>

    int
    nestee (void)
    {
      int a __attribute__((aligned(64))) = 1234;

      void
      nested (int b)
      {
        a = b;
      }

      nested (2345);
      return a;
    }

    int
    call_nestee (int n)
    {
      int *v = alloca (sizeof (int) * n);
      v[0] = nestee ();
      return v[0];
    }

    int
    main (void)
    {
      call_nestee (1);
      call_nestee (8);
      return 0;
    }

With a GCC 5.0 built from fairly recent sources, I get the following CFA
information:

    00000090 000000000000002c 00000064 FDE cie=00000030
pc=00000000004004ac..00000000004004eb
      DW_CFA_advance_loc: 5 to 00000000004004b1
      DW_CFA_def_cfa: r10 (r10) ofs 0
      DW_CFA_advance_loc: 9 to 00000000004004ba
      DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0)
      DW_CFA_advance_loc: 5 to 00000000004004bf
      DW_CFA_def_cfa_expression (DW_OP_breg6 (rbp): -8; DW_OP_deref)
      DW_CFA_advance_loc: 38 to 00000000004004e5

And now here is what I get under GDB:

    $ gdb -n -q -ex 'b nestee' ./dyn_frame
    Reading symbols from ./dyn_frame...done.
    Breakpoint 1 at 0x4004c3: file dyn_frame.c, line 6.
    (gdb) r
    [...]

    Breakpoint 1, nestee () at dyn_frame.c:6
    6         int a __attribute__((aligned(64))) = 1234;
    (gdb) p $pc
    $1 = (void (*)()) 0x4004c3 <nestee+23>
    (gdb) x/1xg $rbp - 8
    0x7fffffffdf28: 0x00007fffffffdf60
    (gdb) p/x (char *) 0x00007fffffffdf60 - (char *) &a
    $2 = 0xa0

... so for this frame, the CFA and the FRAME object are 0xa0 bytes from
each other. Now let's resume to see the next "nestee" frame:

    (gdb) c
    Continuing.

    Breakpoint 1, nestee () at dyn_frame.c:6
    6         int a __attribute__((aligned(64))) = 1234;
    (gdb) p $pc
    $3 = (void (*)()) 0x4004c3 <nestee+23>
    (gdb) x/1xg $rbp - 8
    0x7fffffffdf28: 0x00007fffffffdf50
    (gdb) p/x (char *) 0x00007fffffffdf50 - (char *) &a
    $4 = 0x90

The offset between the CFA and the FRAME object is now 0x90 bytes.  So
because of alignment constraints, I think we cannot assume a constant
offset (even a function-dependent one).

This offset is dynamic and the only way to compute it is to use the
frame's context: here, nestee's saved registers, which we don't have
access to in DWARF when computing the static link attribute.
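The alignment invariant behind this can be checked standalone.  The sketch
below (hypothetical helper names, plain GNU C, not part of the testcase
above) shows that a 64-byte over-aligned local is realigned dynamically:
the psABI only guarantees 16-byte stack alignment on entry, so the local's
distance from the frame base cannot be a compile-time constant.

```c
#include <stdint.h>
#include <string.h>

/* Return the address of a 64-byte over-aligned local.  GCC must round
   this address down at run time, so its offset from the frame base
   varies with the caller's stack depth, as the GDB session shows.  */
static uintptr_t
aligned_local_addr (void)
{
  int a __attribute__ ((aligned (64))) = 1234;
  return (uintptr_t) &a;
}

/* Call it through frames of different sizes to vary the stack depth.  */
static uintptr_t
probe (int n)
{
  char *pad = __builtin_alloca (n);   /* perturb the stack pointer */
  memset (pad, 0, (size_t) n);
  return aligned_local_addr ();
}
```

Whatever the stack depth, the returned address is always 64-byte aligned,
which is exactly why the CFA-to-FRAME offset differs between frames.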


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug tree-optimization/44563] GCC uses a lot of RAM when compiling a large numbers of functions
       [not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
                   ` (12 preceding siblings ...)
  2015-03-09 15:36 ` rguenth at gcc dot gnu.org
@ 2015-03-10  4:55 ` hubicka at ucw dot cz
  2015-03-10  8:26 ` rguenth at gcc dot gnu.org
                   ` (14 subsequent siblings)
  28 siblings, 0 replies; 29+ messages in thread
From: hubicka at ucw dot cz @ 2015-03-10  4:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563

--- Comment #19 from Jan Hubicka <hubicka at ucw dot cz> ---
Hmm, it is definitely wasteful to call the call stmt redirection so many
times - it only needs to be called once per statement.  We probably could
call it only on newly introduced BBs, I will take a look.

Honza



* [Bug tree-optimization/44563] GCC uses a lot of RAM when compiling a large numbers of functions
       [not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
                   ` (13 preceding siblings ...)
  2015-03-10  4:55 ` hubicka at ucw dot cz
@ 2015-03-10  8:26 ` rguenth at gcc dot gnu.org
  2015-03-10  8:35 ` rguenther at suse dot de
                   ` (13 subsequent siblings)
  28 siblings, 0 replies; 29+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-03-10  8:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563

--- Comment #20 from Richard Biener <rguenth at gcc dot gnu.org> ---
Author: rguenth
Date: Tue Mar 10 08:25:31 2015
New Revision: 221308

URL: https://gcc.gnu.org/viewcvs?rev=221308&root=gcc&view=rev
Log:
2015-03-10  Richard Biener  <rguenther@suse.de>

    PR middle-end/44563
    * cgraph.h (struct cgraph_edge_hasher): Add hash overload
    for compare_type.
    * cgraph.c (cgraph_edge_hasher::hash): Inline htab_hash_pointer.
    (cgraph_update_edge_in_call_site_hash): Use cgraph_edge_hasher::hash.
    (cgraph_add_edge_to_call_site_hash): Likewise.
    (cgraph_node::get_edge): Likewise.
    (cgraph_edge::set_call_stmt): Likewise.
    (cgraph_edge::remove_caller): Likewise.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/cgraph.c
    trunk/gcc/cgraph.h



* [Bug tree-optimization/44563] GCC uses a lot of RAM when compiling a large numbers of functions
       [not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
                   ` (14 preceding siblings ...)
  2015-03-10  8:26 ` rguenth at gcc dot gnu.org
@ 2015-03-10  8:35 ` rguenther at suse dot de
  2015-03-10 11:54 ` rguenth at gcc dot gnu.org
                   ` (12 subsequent siblings)
  28 siblings, 0 replies; 29+ messages in thread
From: rguenther at suse dot de @ 2015-03-10  8:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563

--- Comment #21 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 10 Mar 2015, hubicka at ucw dot cz wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563
> 
> --- Comment #19 from Jan Hubicka <hubicka at ucw dot cz> ---
> Hmm, it is definitely wasteful to call the call stmt redirection so many times
> - it only needs
> to be called once per statement.  We probably could call it only on newly
> introduced BBs, I will
> take a look.

Ah - stupid error on my part producing the "obvious" patch (and not
seeing the bogus need_debug_cleanup guard...)

Testing the proper patch.



* [Bug tree-optimization/44563] GCC uses a lot of RAM when compiling a large numbers of functions
       [not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
                   ` (15 preceding siblings ...)
  2015-03-10  8:35 ` rguenther at suse dot de
@ 2015-03-10 11:54 ` rguenth at gcc dot gnu.org
  2015-03-10 12:03 ` rguenth at gcc dot gnu.org
                   ` (11 subsequent siblings)
  28 siblings, 0 replies; 29+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-03-10 11:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563

--- Comment #23 from Richard Biener <rguenth at gcc dot gnu.org> ---
Funnily, apart from the IPA inline summary updating issue, the next important
time-hog is the basic-block splitting we do when inlining a call.  This is
because split_block moves the tail of the block to a new one (and we process
inlines from BB head to BB tail).  Thus we hit gimple_set_bb () quite hard
(quadratic in the number of calls in main (), which consists of a single BB).

So in theory it's better to work backwards from the BB - but that doesn't play
well with catching all BBs in gimple_expand_calls_inline and its caller.

We are talking about 8% of compile-time spent in gimple_set_bb here (according
to callgrind).

It's bad that splitting blocks is O(n) (but that stmt -> bb pointer saves us
in other places).

If we knew that we perform multiple inlinings in a block, we could use a
special "split block" function that splits the block in multiple places
at once, avoiding the quadraticness seen here: basically, first split
all calls that we want to inline into separate blocks and then do the
actual inlining run.
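A toy cost model of the statement moves described above (illustrative only,
not GCC code): take a block with n calls, each followed by one non-call
statement, and count how many statements split_block has to re-parent via
gimple_set_bb under each walk order.

```c
/* Forward (head-to-tail) walk: splitting at call i moves every later
   statement -- (n - 1 - i) calls plus (n - i) non-calls -- into the
   freshly created block.  */
static unsigned long
moves_forward_walk (unsigned n)
{
  unsigned long moves = 0;
  for (unsigned i = 0; i < n; i++)
    moves += 2UL * (n - 1 - i) + 1;   /* total: n^2 */
  return moves;
}

/* Backward (tail-to-head) walk: later calls were already split off
   into their own blocks, so each split moves only the single statement
   trailing the current call.  */
static unsigned long
moves_backward_walk (unsigned n)
{
  return n;                            /* total: n */
}
```

For 100 calls this is 10000 re-parented statements versus 100, which is the
quadraticness the pre-splitting (or backward-walk) idea removes.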



* [Bug tree-optimization/44563] GCC uses a lot of RAM when compiling a large numbers of functions
       [not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
                   ` (16 preceding siblings ...)
  2015-03-10 11:54 ` rguenth at gcc dot gnu.org
@ 2015-03-10 12:03 ` rguenth at gcc dot gnu.org
  2015-03-10 12:31 ` jakub at gcc dot gnu.org
                   ` (10 subsequent siblings)
  28 siblings, 0 replies; 29+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-03-10 12:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563

--- Comment #24 from Richard Biener <rguenth at gcc dot gnu.org> ---
So in gimple_expand_calls_inline we could look only at the BB's last stmt for
the actual inlining and for the rest just do the basic-block splitting.  Then
perform that walk backwards.  This should remove the quadraticness that arises
when expanding many calls inline from a single basic block.



* [Bug tree-optimization/44563] GCC uses a lot of RAM when compiling a large numbers of functions
       [not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
                   ` (17 preceding siblings ...)
  2015-03-10 12:03 ` rguenth at gcc dot gnu.org
@ 2015-03-10 12:31 ` jakub at gcc dot gnu.org
  2015-03-10 12:44 ` rguenth at gcc dot gnu.org
                   ` (9 subsequent siblings)
  28 siblings, 0 replies; 29+ messages in thread
From: jakub at gcc dot gnu.org @ 2015-03-10 12:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #25 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Or perhaps add a split_block variant that uses the old bb for the second part
rather than the first one, and use it in the inliner?



* [Bug tree-optimization/44563] GCC uses a lot of RAM when compiling a large numbers of functions
       [not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
                   ` (18 preceding siblings ...)
  2015-03-10 12:31 ` jakub at gcc dot gnu.org
@ 2015-03-10 12:44 ` rguenth at gcc dot gnu.org
  2015-03-10 12:51 ` rguenther at suse dot de
                   ` (8 subsequent siblings)
  28 siblings, 0 replies; 29+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-03-10 12:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563

--- Comment #26 from Richard Biener <rguenth at gcc dot gnu.org> ---
Author: rguenth
Date: Tue Mar 10 12:44:01 2015
New Revision: 221321

URL: https://gcc.gnu.org/viewcvs?rev=221321&root=gcc&view=rev
Log:
2015-03-09  Richard Biener  <rguenther@suse.de>

    PR middle-end/44563
    * tree-inline.c (copy_cfg_body): Skip block mapped to entry/exit
    for redirect_all_calls.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-inline.c



* [Bug tree-optimization/44563] GCC uses a lot of RAM when compiling a large numbers of functions
       [not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
                   ` (19 preceding siblings ...)
  2015-03-10 12:44 ` rguenth at gcc dot gnu.org
@ 2015-03-10 12:51 ` rguenther at suse dot de
  2015-03-10 13:19 ` rguenth at gcc dot gnu.org
                   ` (7 subsequent siblings)
  28 siblings, 0 replies; 29+ messages in thread
From: rguenther at suse dot de @ 2015-03-10 12:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563

--- Comment #27 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 10 Mar 2015, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563
> 
> Jakub Jelinek <jakub at gcc dot gnu.org> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |jakub at gcc dot gnu.org
> 
> --- Comment #25 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> Or perhaps add split_block variant that uses the old bb for the second part
> rather than the first one, and use it in the inliner?

Seems like

Index: gcc/tree-inline.c
===================================================================
--- gcc/tree-inline.c   (revision 221317)
+++ gcc/tree-inline.c   (working copy)
@@ -4777,18 +4781,19 @@ static bool
 gimple_expand_calls_inline (basic_block bb, copy_body_data *id)
 {
   gimple_stmt_iterator gsi;
+  bool inlined = false;

-  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+  for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi);)
     {
       gimple stmt = gsi_stmt (gsi);
+      gsi_prev (&gsi);

       if (is_gimple_call (stmt)
-         && !gimple_call_internal_p (stmt)
-         && expand_call_inline (bb, stmt, id))
-       return true;
+         && !gimple_call_internal_p (stmt))
+       inlined |= expand_call_inline (bb, stmt, id);
     }

-  return false;
+  return inlined;
 }


fixes the issue as well, since the gsi stays valid over inline expansion if
we advance it before that.
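The advance-before-mutate pattern in the loop above can be shown on a plain
doubly linked list (a generic sketch with invented names, not GCC code):
grab the previous position before the current element may be unlinked,
exactly as gsi_prev () runs before expand_call_inline.

```c
#include <stdlib.h>

/* A toy "statement" list; processing a call may unlink the current
   node, which would invalidate a naively held cursor.  */
struct stmt { struct stmt *prev, *next; int is_call; };

/* Append a statement; returns the new list tail.  */
static struct stmt *
push (struct stmt *tail, int is_call)
{
  struct stmt *s = calloc (1, sizeof *s);
  s->is_call = is_call;
  s->prev = tail;
  if (tail)
    tail->next = s;
  return s;
}

/* Walk tail-to-head, advancing the cursor before mutating.  */
static int
process_calls (struct stmt *tail)
{
  int handled = 0;
  for (struct stmt *s = tail; s != NULL;)
    {
      struct stmt *cur = s;
      s = s->prev;                     /* advance first, then mutate */
      if (cur->is_call)
        {
          /* "Expanding" unlinks cur; the cursor is unaffected.  */
          if (cur->prev)
            cur->prev->next = cur->next;
          if (cur->next)
            cur->next->prev = cur->prev;
          free (cur);
          handled++;
        }
    }
  return handled;
}
```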



* [Bug tree-optimization/44563] GCC uses a lot of RAM when compiling a large numbers of functions
       [not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
                   ` (20 preceding siblings ...)
  2015-03-10 12:51 ` rguenther at suse dot de
@ 2015-03-10 13:19 ` rguenth at gcc dot gnu.org
  2015-03-12 15:09 ` rguenth at gcc dot gnu.org
                   ` (6 subsequent siblings)
  28 siblings, 0 replies; 29+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-03-10 13:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563

--- Comment #28 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to rguenther@suse.de from comment #27)
> On Tue, 10 Mar 2015, jakub at gcc dot gnu.org wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563
> > 
> > Jakub Jelinek <jakub at gcc dot gnu.org> changed:
> > 
> >            What    |Removed                     |Added
> > ----------------------------------------------------------------------------
> >                  CC|                            |jakub at gcc dot gnu.org
> > 
> > --- Comment #25 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> > Or perhaps add split_block variant that uses the old bb for the second part
> > rather than the first one, and use it in the inliner?
> 
> Seems like
> 
> Index: gcc/tree-inline.c
> ===================================================================
> --- gcc/tree-inline.c   (revision 221317)
> +++ gcc/tree-inline.c   (working copy)
> @@ -4777,18 +4781,19 @@ static bool
>  gimple_expand_calls_inline (basic_block bb, copy_body_data *id)
>  {
>    gimple_stmt_iterator gsi;
> +  bool inlined = false;
>  
> -  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> +  for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi);)
>      {
>        gimple stmt = gsi_stmt (gsi);
> +      gsi_prev (&gsi);
>  
>        if (is_gimple_call (stmt)
> -         && !gimple_call_internal_p (stmt)
> -         && expand_call_inline (bb, stmt, id))
> -       return true;
> +         && !gimple_call_internal_p (stmt))
> +       inlined |= expand_call_inline (bb, stmt, id);
>      }
>  
> -  return false;
> +  return inlined;
>  }
>  
>  
> fixes the issue as well as gsi stays valid over inline expansion if
> we advance it before that.

Funnily this makes us hit merge_blocks now via cleanup-tree-cfg
walking BBs in block-number order (and the inliner now allocating
blocks in a less "optimal" order...).  This:

676       n = last_basic_block_for_fn (cfun);
677       for (i = NUM_FIXED_BLOCKS; i < n; i++)
678         {
679           bb = BASIC_BLOCK_FOR_FN (cfun, i);
680           if (bb)
681             retval |= cleanup_tree_cfg_bb (bb);
682         }

should work (for merging blocks) from entry to exit, but with the new block
assignment order it now effectively works backwards :/

So the above doesn't really fix the issue but just shifts it elsewhere.



* [Bug tree-optimization/44563] GCC uses a lot of RAM when compiling a large numbers of functions
       [not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
                   ` (21 preceding siblings ...)
  2015-03-10 13:19 ` rguenth at gcc dot gnu.org
@ 2015-03-12 15:09 ` rguenth at gcc dot gnu.org
  2015-03-13  8:43 ` [Bug ipa/44563] " rguenth at gcc dot gnu.org
                   ` (5 subsequent siblings)
  28 siblings, 0 replies; 29+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-03-12 15:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563

--- Comment #29 from Richard Biener <rguenth at gcc dot gnu.org> ---
Sth like

@@ -672,8 +650,18 @@ cleanup_tree_cfg_bb (basic_block bb)
   if (single_succ_p (bb)
       && can_merge_blocks_p (bb, single_succ (bb)))
     {
-      merge_blocks (bb, single_succ (bb));
-      return true;
+      /* If there is a merge opportunity with the predecessor
+         do nothing now but wait until we process the predecessor.
+        This happens when we visit BBs in a non-optimal order and
+        avoids quadratic behavior with adjusting stmts BB pointer.  */
+      if (single_pred_p (bb)
+         && can_merge_blocks_p (single_pred (bb), bb))
+       ;
+      else
+       {
+         merge_blocks (bb, single_succ (bb));
+         return true;
+       }
     }

   return retval;

in addition should do the job.  Iteration on the predecessor should
cover both merges (so we don't actually need to revisit this block itself).
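A toy cost model of the deferral (illustrative, not GCC code): take a chain
of n single-statement blocks and count the statement re-parenting that
merge_blocks performs, visiting the chain tail-first (the bad order from
comment #28) versus merging only at the chain head as the hunk above arranges.

```c
/* merge_blocks (a, b) re-parents every statement of b into a; model
   the cost as the current size of the merged-in successor.  */

/* Tail-first visit order: each merge folds the already-merged,
   growing suffix into the next predecessor.  */
static unsigned long
merge_tail_first (unsigned n)
{
  unsigned long cost = 0, merged = 1;   /* size of the growing suffix */
  for (unsigned i = 1; i < n; i++)
    {
      cost += merged;
      merged += 1;
    }
  return cost;                          /* 1 + 2 + ... + (n - 1) */
}

/* With the deferral, only the chain head merges, absorbing one
   single-statement successor per step.  */
static unsigned long
merge_head_first (unsigned n)
{
  return n ? n - 1 : 0;
}
```

For a chain of 100 blocks that is 4950 re-parented statements versus 99,
i.e. quadratic versus linear in the chain length.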



* [Bug ipa/44563] GCC uses a lot of RAM when compiling a large numbers of functions
       [not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
                   ` (22 preceding siblings ...)
  2015-03-12 15:09 ` rguenth at gcc dot gnu.org
@ 2015-03-13  8:43 ` rguenth at gcc dot gnu.org
  2015-03-13  8:47 ` rguenth at gcc dot gnu.org
                   ` (4 subsequent siblings)
  28 siblings, 0 replies; 29+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-03-13  8:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|tree-optimization           |ipa
      Known to fail|                            |5.0

--- Comment #30 from Richard Biener <rguenth at gcc dot gnu.org> ---
With all the patches I have for now we end up with a pure IPA issue:

 phase opt and generate  : 193.97 (99%) usr  13.82 (93%) sys 207.75 (99%) wall
3311016 kB (94%) ggc
 ipa inlining heuristics : 140.48 (72%) usr   0.44 ( 3%) sys 141.13 (67%) wall 
396289 kB (11%) ggc
 dominance computation   :   2.99 ( 2%) usr   1.00 ( 7%) sys   3.89 ( 2%) wall 
     0 kB ( 0%) ggc
 integrated RA           :   4.05 ( 2%) usr   0.85 ( 6%) sys   5.26 ( 3%) wall
1577496 kB (45%) ggc
 rest of compilation     :   6.53 ( 3%) usr   1.67 (11%) sys   7.91 ( 4%) wall 
155664 kB ( 4%) ggc
 unaccounted todo        :   3.82 ( 2%) usr   1.07 ( 7%) sys   4.98 ( 2%) wall 
     0 kB ( 0%) ggc
 TOTAL                 : 195.46            14.79           210.23           
3514948 kB

everything <= 1% dropped.  I wonder what that unaccounted todo is ;)



* [Bug ipa/44563] GCC uses a lot of RAM when compiling a large numbers of functions
       [not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
                   ` (23 preceding siblings ...)
  2015-03-13  8:43 ` [Bug ipa/44563] " rguenth at gcc dot gnu.org
@ 2015-03-13  8:47 ` rguenth at gcc dot gnu.org
  2015-03-13  8:53 ` rguenth at gcc dot gnu.org
                   ` (3 subsequent siblings)
  28 siblings, 0 replies; 29+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-03-13  8:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563

--- Comment #31 from Richard Biener <rguenth at gcc dot gnu.org> ---
Author: rguenth
Date: Fri Mar 13 08:47:14 2015
New Revision: 221409

URL: https://gcc.gnu.org/viewcvs?rev=221409&root=gcc&view=rev
Log:
2015-03-10  Richard Biener  <rguenther@suse.de>

    PR middle-end/44563
    * tree-cfgcleanup.c (split_bb_on_noreturn_calls): Remove.
    (cleanup_tree_cfg_1): Do not call it.
    (execute_cleanup_cfg_post_optimizing): Fixup the CFG here.
    (fixup_noreturn_call): Mark the stmt as control altering.
    * tree-cfg.c (execute_fixup_cfg): Do not dump the function
    here.
    (pass_data_fixup_cfg): Produce a dump file.
    * tree-ssa-dom.c: Include tree-cfgcleanup.h.
    (need_noreturn_fixup): New global.
    (pass_dominator::execute): Fixup queued noreturn calls.
    (optimize_stmt): Queue calls that became noreturn for fixup.
    * tree-ssa-forwprop.c (pass_forwprop::execute): Likewise.
    * tree-ssa-pre.c: Include tree-cfgcleanup.h.
    (el_to_fixup): New global.
    (eliminate_dom_walker::before_dom_children): Queue calls that
    became noreturn for fixup.
    (eliminate): Fixup queued noreturn calls.
    * tree-ssa-propagate.c: Include tree-cfgcleanup.h.
    (substitute_and_fold_dom_walker): New member stmts_to_fixup.
    (substitute_and_fold_dom_walker::before_dom_children): Queue
    calls that became noreturn for fixup.
    (substitute_and_fold): Fixup queued noreturn calls.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-cfg.c
    trunk/gcc/tree-cfgcleanup.c
    trunk/gcc/tree-ssa-dom.c
    trunk/gcc/tree-ssa-forwprop.c
    trunk/gcc/tree-ssa-pre.c
    trunk/gcc/tree-ssa-propagate.c



* [Bug ipa/44563] GCC uses a lot of RAM when compiling a large numbers of functions
       [not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
                   ` (24 preceding siblings ...)
  2015-03-13  8:47 ` rguenth at gcc dot gnu.org
@ 2015-03-13  8:53 ` rguenth at gcc dot gnu.org
  2015-03-13  8:55 ` rguenth at gcc dot gnu.org
                   ` (2 subsequent siblings)
  28 siblings, 0 replies; 29+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-03-13  8:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563

--- Comment #32 from Richard Biener <rguenth at gcc dot gnu.org> ---
Author: rguenth
Date: Fri Mar 13 08:52:51 2015
New Revision: 221410

URL: https://gcc.gnu.org/viewcvs?rev=221410&root=gcc&view=rev
Log:
2015-03-12  Richard Biener  <rguenther@suse.de>

    PR middle-end/44563
    * tree-inline.c (gimple_expand_calls_inline): Walk BB backwards
    to avoid quadratic behavior with inline expansion splitting blocks.
    * tree-cfgcleanup.c (cleanup_tree_cfg_bb): Do not merge block
    with the successor if the predecessor will be merged with it.
    * tree-cfg.c (gimple_can_merge_blocks_p): We can't merge the
    entry block with its successor.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-cfg.c
    trunk/gcc/tree-cfgcleanup.c
    trunk/gcc/tree-inline.c



* [Bug ipa/44563] GCC uses a lot of RAM when compiling a large numbers of functions
       [not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
                   ` (25 preceding siblings ...)
  2015-03-13  8:53 ` rguenth at gcc dot gnu.org
@ 2015-03-13  8:55 ` rguenth at gcc dot gnu.org
  2015-03-16  0:07 ` hubicka at ucw dot cz
  2024-02-16 13:52 ` rguenth at gcc dot gnu.org
  28 siblings, 0 replies; 29+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-03-13  8:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org      |hubicka at gcc dot gnu.org

--- Comment #33 from Richard Biener <rguenth at gcc dot gnu.org> ---
Assigning to Honza - I wonder if there is any low-hanging fruit to improve
things for GCC 5 still.



* [Bug ipa/44563] GCC uses a lot of RAM when compiling a large numbers of functions
       [not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
                   ` (26 preceding siblings ...)
  2015-03-13  8:55 ` rguenth at gcc dot gnu.org
@ 2015-03-16  0:07 ` hubicka at ucw dot cz
  2024-02-16 13:52 ` rguenth at gcc dot gnu.org
  28 siblings, 0 replies; 29+ messages in thread
From: hubicka at ucw dot cz @ 2015-03-16  0:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563

--- Comment #34 from Jan Hubicka <hubicka at ucw dot cz> ---
The problem is (as described earlier) the fact that we sum the size of all
call stmts in the function after every inline decision.
Most of the time is spent calling estimate_edge_size_and_time:
 79.95%       cc1  cc1                [.]
_ZL28estimate_calls_size_and_timeP11cgraph_nodePiS1_S1_S1_j3vecIP9tree_node7va_heap6vl_ptrES2_I28ipa_polymorphic_call_contextS5_S6_ES2_IP21ipa_agg
  2.21%       cc1  libc-2.13.so       [.] _int_malloc
  0.59%       cc1  libc-2.13.so       [.] _int_free

Updating summaries incrementally will solve it, but at the moment I do not
see any really simple change for GCC 5 (I looked at this code a couple of
times already because of this PR).

Honza



* [Bug ipa/44563] GCC uses a lot of RAM when compiling a large numbers of functions
       [not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
                   ` (27 preceding siblings ...)
  2015-03-16  0:07 ` hubicka at ucw dot cz
@ 2024-02-16 13:52 ` rguenth at gcc dot gnu.org
  28 siblings, 0 replies; 29+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-02-16 13:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|2010-06-17 10:36:48         |2024-2-16

--- Comment #40 from Richard Biener <rguenth at gcc dot gnu.org> ---
Reconfirmed.  For GCC 14 we use about 2GB of RAM on x86_64 with -O0, and 20s.
With -O1 that regresses to 60s and a little less peak memory.

 callgraph ipa passes               :  14.18 ( 23%)
 tree PTA                           :  16.43 ( 27%) 

At -O2, memory usage improves further at about the same compile time.


end of thread, other threads:[~2024-02-16 13:52 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <bug-44563-4@http.gcc.gnu.org/bugzilla/>
2010-12-12 23:55 ` [Bug tree-optimization/44563] GCC uses a lot of RAM when compiling a large numbers of functions hubicka at gcc dot gnu.org
2010-12-13  0:58 ` hubicka at gcc dot gnu.org
2010-12-13  1:00 ` hubicka at gcc dot gnu.org
2010-12-13  1:07 ` hubicka at gcc dot gnu.org
2010-12-13  1:17 ` hubicka at gcc dot gnu.org
2010-12-13  1:47 ` hubicka at gcc dot gnu.org
2010-12-13 13:22 ` hubicka at gcc dot gnu.org
2010-12-17  0:08 ` hubicka at gcc dot gnu.org
2015-03-07  0:22 ` hubicka at gcc dot gnu.org
2015-03-09  9:36 ` rguenth at gcc dot gnu.org
2015-03-09 11:33 ` rguenth at gcc dot gnu.org
2015-03-09 15:26 ` rguenth at gcc dot gnu.org
2015-03-09 15:36 ` rguenth at gcc dot gnu.org
2015-03-10  4:55 ` hubicka at ucw dot cz
2015-03-10  8:26 ` rguenth at gcc dot gnu.org
2015-03-10  8:35 ` rguenther at suse dot de
2015-03-10 11:54 ` rguenth at gcc dot gnu.org
2015-03-10 12:03 ` rguenth at gcc dot gnu.org
2015-03-10 12:31 ` jakub at gcc dot gnu.org
2015-03-10 12:44 ` rguenth at gcc dot gnu.org
2015-03-10 12:51 ` rguenther at suse dot de
2015-03-10 13:19 ` rguenth at gcc dot gnu.org
2015-03-12 15:09 ` rguenth at gcc dot gnu.org
2015-03-13  8:43 ` [Bug ipa/44563] " rguenth at gcc dot gnu.org
2015-03-13  8:47 ` rguenth at gcc dot gnu.org
2015-03-13  8:53 ` rguenth at gcc dot gnu.org
2015-03-13  8:55 ` rguenth at gcc dot gnu.org
2015-03-16  0:07 ` hubicka at ucw dot cz
2024-02-16 13:52 ` rguenth at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).