From: Richard Biener <rguenther@suse.de>
To: Jan Hubicka <hubicka@ucw.cz>
Cc: Martin Jambor <mjambor@suse.cz>,
gary@amperecomputing.com, mliska@suse.cz, jakub@redhat.com,
gcc-patches@gcc.gnu.org
Subject: Re: Materialize clones on demand
Date: Mon, 26 Oct 2020 08:41:34 +0100 (CET)
Message-ID: <nycvar.YFH.7.76.2010260824470.10073@p653.nepu.fhfr.qr>
In-Reply-To: <20201023192748.GB33077@kam.mff.cuni.cz>
On Fri, 23 Oct 2020, Jan Hubicka wrote:
> > Hi,
> >
> > On Thu, Oct 22 2020, Jan Hubicka wrote:
> > > Hi,
> > > this patch removes the pass to materialize all clones; instead this
> > > is now done on demand. The motivation is to reduce the lifetime of
> > > function bodies in ltrans, which should noticeably reduce memory use
> > > for highly parallel compilations of large programs (like Martin does)
> > > or with partitioning reduced/disabled. For cc1 with one partition,
> > > memory use seems to go down from 4GB to circa 1.5GB (judging from top,
> > > so this is not particularly accurate).
> > >
> >
> > Nice.
>
> Sadly this is only true w/o debug info. I collected memory usage stats
> at the end of the ltrans stage and it is as follows:
>
> - after streaming in global stream: 126M GGC and 41M heap
> - after streaming symbol table: 373M GGC and 92M heap
> - after streaming in summaries: 394M GGC and 92M heap
> (the only large summary seems to be the ipa-cp transformation summary)
> - then compilation starts and memory goes slowly up to 3527M at the end
> of compilation
>
> The following passes account for more than 1% of GGC memory:
>
> Time variable                  usr           sys          wall           GGC
> ipa inlining heuristics  :    6.99 (  0%)   4.62 (  1%)   11.17 (  1%)   241M (  1%)
> ipa lto gimple in        :   50.04 (  3%)  29.72 (  7%)   80.22 (  4%)  3129M ( 14%)
> ipa lto decl in          :    0.79 (  0%)   0.36 (  0%)    1.15 (  0%)   135M (  1%)
> ipa lto cgraph I/O       :    0.95 (  0%)   0.20 (  0%)    1.15 (  0%)   269M (  1%)
> cfg cleanup              :   25.83 (  2%)   2.52 (  1%)   28.15 (  1%)   154M (  1%)
> df reg dead/unused notes :   24.08 (  2%)   2.09 (  1%)   26.77 (  1%)   180M (  1%)
> alias analysis           :   16.94 (  1%)   1.05 (  0%)   17.71 (  1%)   383M (  2%)
> integration              :   45.76 (  3%)  44.30 ( 11%)   88.99 (  5%)  2328M ( 10%)
> tree VRP                 :   41.38 (  3%)  15.67 (  4%)   57.71 (  3%)   560M (  2%)
> tree SSA rewrite         :    6.71 (  0%)   2.17 (  1%)    8.96 (  0%)   194M (  1%)
> tree SSA incremental     :   26.99 (  2%)   8.23 (  2%)   34.42 (  2%)   144M (  1%)
> tree operand scan        :   65.34 (  4%)  61.50 ( 15%)  127.02 (  7%)   886M (  4%)
> dominator optimization   :   41.53 (  3%)  13.56 (  3%)   55.78 (  3%)   407M (  2%)
> tree split crit edges    :    1.08 (  0%)   0.65 (  0%)    1.63 (  0%)   127M (  1%)
> tree PRE                 :   34.30 (  2%)  14.52 (  4%)   49.08 (  3%)   337M (  1%)
> tree code sinking        :    2.92 (  0%)   0.58 (  0%)    3.51 (  0%)   122M (  1%)
> tree iv optimization     :    6.71 (  0%)   1.19 (  0%)    8.46 (  0%)   133M (  1%)
> expand                   :   45.56 (  3%)   8.24 (  2%)   55.02 (  3%)  1980M (  9%)
> forward prop             :   11.89 (  1%)   1.39 (  0%)   12.59 (  1%)   130M (  1%)
> dead store elim2         :   10.03 (  1%)   0.70 (  0%)   11.23 (  1%)   138M (  1%)
> loop init                :   11.96 (  1%)   4.95 (  1%)   17.11 (  1%)   378M (  2%)
> CPROP                    :   22.63 (  2%)   2.78 (  1%)   25.19 (  1%)   359M (  2%)
> combiner                 :   41.39 (  3%)   2.57 (  1%)   43.30 (  2%)   558M (  2%)
> reload CSE regs          :   22.38 (  2%)   1.25 (  0%)   23.06 (  1%)   186M (  1%)
> final                    :   32.33 (  2%)   4.28 (  1%)   36.75 (  2%)  1105M (  5%)
> symout                   :   49.04 (  3%)   2.23 (  1%)   52.33 (  3%)  2517M ( 11%)
> var-tracking emit        :   33.26 (  2%)   1.02 (  0%)   34.35 (  2%)   582M (  3%)
> rest of compilation      :   38.05 (  3%)  15.61 (  4%)   52.42 (  3%)   114M (  1%)
> TOTAL                    : 1486.02        408.79        1899.96        22512M
>
> We seem to leak some hashtables:
> dwarf2out.c:28850 (dwarf2out_init) 31M: 23.8% 47M 19 : 0.0% ggc
that one likely keeps quite some memory live...
> cselib.c:3137 (cselib_init) 34M: 25.9% 34M 1514k: 17.3% heap
> tree-scalar-evolution.c:2984 (scev_initialize) 37M: 27.6% 50M 228k: 2.6% ggc
Hmm, so we do
scalar_evolution_info = hash_table<scev_info_hasher>::create_ggc (100);
and
scalar_evolution_info->empty ();
scalar_evolution_info = NULL;
to reclaim. ->empty () will IIRC still allocate at least 7 elements, which
we then eventually should reclaim during a GC walk - I guess the hashtable
statistics do not really account for GC-reclaimed portions?
If there's a friendlier way of releasing a GC allocated hash-tab
we can switch to that. Note that in principle the hash-table doesn't
need to be GC allocated but it needs to be walked since it refers to
trees that might not be referenced in other ways.
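For reference, the lifecycle under discussion looks like this (a sketch
of the existing tree-scalar-evolution.c flow, not a proposed change):

  /* scev_initialize: the table is GC-allocated because its entries
     refer to trees.  */
  scalar_evolution_info = hash_table<scev_info_hasher>::create_ggc (100);

  /* scev_finalize: empty () shrinks the table but IIRC keeps a small
     (7-element) allocation around ...  */
  scalar_evolution_info->empty ();

  /* ... and only once the last reference is dropped can a later
     ggc_collect () walk reclaim that remainder.  */
  scalar_evolution_info = NULL;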
> and hashmaps:
> ipa-reference.c:1133 (ipa_reference_read_optimiz 2047k: 3.0% 3071k 9 : 0.0% heap
> tree-ssa.c:60 (redirect_edge_var_map_add) 4125k: 6.1% 4126k 8190 : 0.1% heap
Similar to SCEV, probably mis-accounting?
> alias.c:1200 (record_alias_subset) 4510k: 6.6% 4510k 4546 : 0.0% ggc
> ipa-prop.h:986 (ipcp_transformation_t) 8191k: 12.0% 11M 16 : 0.0% ggc
> dwarf2out.c:5957 (dwarf2out_register_external_di 47M: 72.2% 71M 12 : 0.0% ggc
>
> and hashsets:
> ipa-devirt.c:3093 (possible_polymorphic_call_tar 15k: 0.9% 23k 8 : 0.0% heap
> ipa-devirt.c:1599 (add_type_duplicate) 412k: 22.2% 412k 4065 : 0.0% heap
> tree-ssa-threadbackward.c:40 (thread_jumps) 1432k: 77.0% 1433k 119k: 0.8% heap
>
> and vectors:
> tree-ssa-structalias.c:5783 (push_fields_onto_fi 8 847k: 0.3% 976k 475621: 0.8% 17k 24k
Huh. It's an auto_vec<>
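Note an auto_vec<> with embedded storage still shows up in the heap
accounting once it outgrows that storage - a minimal sketch (element
type and counts invented for illustration):

  auto_vec<int, 8> v;          /* 8 elements of embedded storage,
                                  no heap allocation yet.  */
  for (int i = 0; i < 9; ++i)
    v.safe_push (i);           /* the 9th push spills to a heap
                                  allocation, which is what the
                                  statistics above record.  */

so an entry in the list doesn't by itself mean a leak.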
> tree-ssa-pre.c:334 (alloc_expression_id) 48 1125k: 0.4% 1187k 198336: 0.3% 23k 34k
> tree-into-ssa.c:1787 (register_new_update_single 8 1196k: 0.5% 1264k 380385: 0.6% 24k 36k
> ggc-page.c:1264 (add_finalizer) 8 1232k: 0.5% 1848k 43: 0.0% 77k 81k
> tree-ssa-structalias.c:1609 (topo_visit) 8 1302k: 0.5% 1328k 892964: 1.4% 27k 33k
> graphds.c:254 (graphds_dfs) 4 1469k: 0.6% 1675k 2101780: 3.4% 30k 34k
> dominance.c:955 (get_dominated_to_depth) 8 2251k: 0.9% 2266k 685140: 1.1% 46k 50k
> tree-ssa-structalias.c:410 (new_var_info) 32 2264k: 0.9% 2341k 330758: 0.5% 47k 63k
> tree-ssa-structalias.c:3104 (process_constraint) 48 2376k: 0.9% 2606k 405451: 0.7% 49k 83k
> symtab.c:612 (create_reference) 8 3314k: 1.3% 4897k 75213: 0.1% 414k 612k
> vec.h:1734 (copy) 48 233M:90.5% 234M 6243163:10.1% 4982k 5003k
Those all look OK to me, not sure why we even think there's a leak?
> However the main problem is
> cfg.c:202 (connect_src) 5745k: 0.2% 271M: 1.9% 1754k: 0.0% 1132k: 0.2% 7026k
> cfg.c:212 (connect_dest) 6307k: 0.2% 281M: 2.0% 10129k: 0.2% 2490k: 0.5% 7172k
> varasm.c:3359 (build_constant_desc) 7387k: 0.2% 0 : 0.0% 0 : 0.0% 0 : 0.0% 51k
> emit-rtl.c:486 (gen_raw_REG) 7799k: 0.2% 215M: 1.5% 96 : 0.0% 0 : 0.0% 9502k
> dwarf2cfi.c:2341 (add_cfis_to_fde) 8027k: 0.2% 0 : 0.0% 4906k: 0.1% 1405k: 0.3% 78k
> emit-rtl.c:4074 (make_jump_insn_raw) 8239k: 0.2% 93M: 0.7% 0 : 0.0% 0 : 0.0% 1442k
> tree-ssanames.c:308 (make_ssa_name_fn) 9130k: 0.2% 456M: 3.3% 0 : 0.0% 0 : 0.0% 6622k
> gimple.c:1808 (gimple_copy) 9508k: 0.3% 524M: 3.7% 8609k: 0.2% 2972k: 0.6% 7135k
> tree-inline.c:4879 (expand_call_inline) 9590k: 0.3% 21M: 0.2% 0 : 0.0% 0 : 0.0% 328k
> dwarf2cfi.c:418 (new_cfi) 10M: 0.3% 0 : 0.0% 0 : 0.0% 0 : 0.0% 444k
> cfg.c:266 (unchecked_make_edge) 10M: 0.3% 60M: 0.4% 355M: 6.8% 0 : 0.0% 9083k
> tree.c:1642 (wide_int_to_tree_1) 10M: 0.3% 2313k: 0.0% 0 : 0.0% 0 : 0.0% 548k
> stringpool.c:41 (stringpool_ggc_alloc) 10M: 0.3% 7055k: 0.0% 0 : 0.0% 2270k: 0.5% 588k
> stringpool.c:63 (alloc_node) 10M: 0.3% 12M: 0.1% 0 : 0.0% 0 : 0.0% 588k
> tree-phinodes.c:119 (allocate_phi_node) 11M: 0.3% 153M: 1.1% 0 : 0.0% 3539k: 0.7% 340k
> cgraph.c:289 (create_empty) 12M: 0.3% 0 : 0.0% 109M: 2.1% 0 : 0.0% 371k
> cfg.c:127 (alloc_block) 14M: 0.4% 705M: 5.0% 0 : 0.0% 0 : 0.0% 7086k
> tree-streamer-in.c:558 (streamer_read_tree_bitfi 22M: 0.6% 13k: 0.0% 0 : 0.0% 22k: 0.0% 64k
> tree-inline.c:834 (remap_block) 28M: 0.8% 159M: 1.1% 0 : 0.0% 0 : 0.0% 2009k
> stringpool.c:79 (ggc_alloc_string) 28M: 0.8% 5619k: 0.0% 0 : 0.0% 6658k: 1.4% 1785k
> dwarf2out.c:11727 (add_ranges_num) 32M: 0.9% 0 : 0.0% 32M: 0.6% 144 : 0.0% 20
> tree-inline.c:5942 (copy_decl_to_var) 39M: 1.1% 51M: 0.4% 0 : 0.0% 0 : 0.0% 646k
> tree-inline.c:5994 (copy_decl_no_change) 78M: 2.1% 270M: 1.9% 0 : 0.0% 0 : 0.0% 2497k
> function.c:4438 (reorder_blocks_1) 96M: 2.6% 101M: 0.7% 0 : 0.0% 0 : 0.0% 2109k
> hash-table.h:802 (expand) 142M: 3.9% 18M: 0.1% 198M: 3.8% 32M: 6.9% 38k
> dwarf2out.c:10086 (new_loc_list) 219M: 6.0% 11M: 0.1% 0 : 0.0% 0 : 0.0% 2955k
> tree-streamer-in.c:637 (streamer_alloc_tree) 379M: 10.3% 426M: 3.0% 0 : 0.0% 4201k: 0.9% 9828k
> dwarf2out.c:5702 (new_die_raw) 434M: 11.8% 0 : 0.0% 0 : 0.0% 0 : 0.0% 5556k
> dwarf2out.c:1383 (new_loc_descr) 519M: 14.1% 12M: 0.1% 2880 : 0.0% 0 : 0.0% 6812k
> dwarf2out.c:4420 (add_dwarf_attr) 640M: 17.4% 0 : 0.0% 94M: 1.8% 4584k: 1.0% 3877k
> toplev.c:906 (realloc_for_line_map) 768M: 20.8% 0 : 0.0% 767M: 14.6% 255M: 54.4% 33
> --------------------------------------------------------------------------------------------------------------------------------------------
> GGC memory Leak Garbage Freed Overhead Times
> --------------------------------------------------------------------------------------------------------------------------------------------
> Total 3689M:100.0% 14039M:100.0% 5254M:100.0% 470M:100.0% 391M
> --------------------------------------------------------------------------------------------------------------------------------------------
>
> Clearly some function bodies leak - I will try to figure out which. But
> the main problem is debug info.
> I guess debug info for the whole cc1plus is large, but it would be nice
> if it was not in the garbage collector, for example :)
Well, we're building a DIE tree for the whole unit here so I'm not sure
what parts we can optimize. The structures may keep quite some stuff
on the tree side live through the decl -> DIE and block -> DIE maps
and the external_die_map used for LTO streaming (but if we lazily stream
bodies we do need to keep this map ... unless we add some
start/end-stream-body hooks and make the map per function. But then
we build the DIEs lazily as well, so the query of the map is lazy :/)
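A purely hypothetical shape for such hooks (none of this exists today,
names invented):

  /* Bracket the streaming of one function body so the DIE maps do
     not have to stay live for the whole unit.  */
  void
  dwarf2out_start_stream_body (tree fndecl)
  {
    /* (Re)create a per-function map instead of relying on the
       unit-wide external_die_map.  */
  }

  void
  dwarf2out_end_stream_body (tree fndecl)
  {
    /* Release the per-function map - but since DIEs are created
       lazily, queries can still arrive after this point, which is
       exactly the problem above.  */
  }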
Richard.
--
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer