Date: Fri, 23 Oct 2020 21:27:48 +0200
From: Jan Hubicka
To: Martin Jambor
Cc: gary@amperecomputing.com, mliska@suse.cz, jakub@redhat.com, gcc-patches@gcc.gnu.org, rguenther@suse.de
Subject: Re: Materialize clones on demand
Message-ID: <20201023192748.GB33077@kam.mff.cuni.cz>
References: <20201022094820.GB97578@kam.mff.cuni.cz>

> Hi,
>
> On Thu, Oct 22 2020, Jan Hubicka wrote:
> > Hi,
> > this patch removes the pass to materialize all clones and instead this
> > is now done on demand.
> > The motivation is to reduce lifetime of function
> > bodies in ltrans that should noticeably reduce memory use for highly
> > parallel compilations of large programs (like Martin does) or with
> > partitioning reduced/disabled. For cc1 with one partition the memory use
> > seems to go down from 4gb to cca 1.5gb (seeing from top, so this is not
> > particularly accurate).
> >
>
> Nice.

Sadly this is only true w/o debug info. I collected memory usage stats
at the end of the ltrans stage and they are as follows:

 - after streaming in global stream: 126M GGC and 41M heap
 - after streaming symbol table: 373M GGC and 92M heap
 - after streaming in summaries: 394M GGC and 92M heap
   (the only large summary seems to be the ipa-cp transformation summary)
 - then compilation starts and memory goes slowly up to 3527M at the end
   of compilation

The following account for more than 1% GGC:

Time variable                  usr            sys           wall            GGC
ipa inlining heuristics  :    6.99 (  0%)    4.62 (  1%)   11.17 (  1%)    241M (  1%)
ipa lto gimple in        :   50.04 (  3%)   29.72 (  7%)   80.22 (  4%)   3129M ( 14%)
ipa lto decl in          :    0.79 (  0%)    0.36 (  0%)    1.15 (  0%)    135M (  1%)
ipa lto cgraph I/O       :    0.95 (  0%)    0.20 (  0%)    1.15 (  0%)    269M (  1%)
cfg cleanup              :   25.83 (  2%)    2.52 (  1%)   28.15 (  1%)    154M (  1%)
df reg dead/unused notes :   24.08 (  2%)    2.09 (  1%)   26.77 (  1%)    180M (  1%)
alias analysis           :   16.94 (  1%)    1.05 (  0%)   17.71 (  1%)    383M (  2%)
integration              :   45.76 (  3%)   44.30 ( 11%)   88.99 (  5%)   2328M ( 10%)
tree VRP                 :   41.38 (  3%)   15.67 (  4%)   57.71 (  3%)    560M (  2%)
tree SSA rewrite         :    6.71 (  0%)    2.17 (  1%)    8.96 (  0%)    194M (  1%)
tree SSA incremental     :   26.99 (  2%)    8.23 (  2%)   34.42 (  2%)    144M (  1%)
tree operand scan        :   65.34 (  4%)   61.50 ( 15%)  127.02 (  7%)    886M (  4%)
dominator optimization   :   41.53 (  3%)   13.56 (  3%)   55.78 (  3%)    407M (  2%)
tree split crit edges    :    1.08 (  0%)    0.65 (  0%)    1.63 (  0%)    127M (  1%)
tree PRE                 :   34.30 (  2%)   14.52 (  4%)   49.08 (  3%)    337M (  1%)
tree code sinking        :    2.92 (  0%)    0.58 (  0%)    3.51 (  0%)    122M (  1%)
tree iv optimization     :    6.71 (  0%)    1.19 (  0%)    8.46 (  0%)    133M (  1%)
expand                   :   45.56 (  3%)    8.24 (  2%)   55.02 (  3%)   1980M (  9%)
forward prop             :   11.89 (  1%)    1.39 (  0%)   12.59 (  1%)    130M (  1%)
dead store elim2         :   10.03 (  1%)    0.70 (  0%)   11.23 (  1%)    138M (  1%)
loop init                :   11.96 (  1%)    4.95 (  1%)   17.11 (  1%)    378M (  2%)
CPROP                    :   22.63 (  2%)    2.78 (  1%)   25.19 (  1%)    359M (  2%)
combiner                 :   41.39 (  3%)    2.57 (  1%)   43.30 (  2%)    558M (  2%)
reload CSE regs          :   22.38 (  2%)    1.25 (  0%)   23.06 (  1%)    186M (  1%)
final                    :   32.33 (  2%)    4.28 (  1%)   36.75 (  2%)   1105M (  5%)
symout                   :   49.04 (  3%)    2.23 (  1%)   52.33 (  3%)   2517M ( 11%)
var-tracking emit        :   33.26 (  2%)    1.02 (  0%)   34.35 (  2%)    582M (  3%)
rest of compilation      :   38.05 (  3%)   15.61 (  4%)   52.42 (  3%)    114M (  1%)
TOTAL                    : 1486.02         408.79        1899.96         22512M

We seem to leak some hashtables:

dwarf2out.c:28850 (dwarf2out_init)                   31M: 23.8%     47M     19 :  0.0%  ggc
cselib.c:3137 (cselib_init)                          34M: 25.9%     34M   1514k: 17.3%  heap
tree-scalar-evolution.c:2984 (scev_initialize)       37M: 27.6%     50M    228k:  2.6%  ggc

and hashmaps:

ipa-reference.c:1133 (ipa_reference_read_optimiz   2047k:  3.0%   3071k      9 :  0.0%  heap
tree-ssa.c:60 (redirect_edge_var_map_add)          4125k:  6.1%   4126k   8190 :  0.1%  heap
alias.c:1200 (record_alias_subset)                 4510k:  6.6%   4510k   4546 :  0.0%  ggc
ipa-prop.h:986 (ipcp_transformation_t)             8191k: 12.0%     11M     16 :  0.0%  ggc
dwarf2out.c:5957 (dwarf2out_register_external_di     47M: 72.2%     71M     12 :  0.0%  ggc

and hashsets:

ipa-devirt.c:3093 (possible_polymorphic_call_tar     15k:  0.9%     23k      8 :  0.0%  heap
ipa-devirt.c:1599 (add_type_duplicate)              412k: 22.2%    412k   4065 :  0.0%  heap
tree-ssa-threadbackward.c:40 (thread_jumps)        1432k: 77.0%   1433k    119k:  0.8%  heap

and vectors:

tree-ssa-structalias.c:5783 (push_fields_onto_fi   8    847k: 0.3%    976k    475621: 0.8%     17k     24k
tree-ssa-pre.c:334 (alloc_expression_id)          48   1125k: 0.4%   1187k    198336: 0.3%     23k     34k
tree-into-ssa.c:1787 (register_new_update_single   8   1196k: 0.5%   1264k    380385: 0.6%     24k     36k
ggc-page.c:1264 (add_finalizer)                    8   1232k: 0.5%   1848k        43: 0.0%     77k     81k
tree-ssa-structalias.c:1609 (topo_visit)           8   1302k: 0.5%   1328k    892964: 1.4%     27k     33k
graphds.c:254 (graphds_dfs)                        4   1469k: 0.6%   1675k   2101780: 3.4%     30k     34k
dominance.c:955 (get_dominated_to_depth)           8   2251k: 0.9%   2266k    685140: 1.1%     46k     50k
tree-ssa-structalias.c:410 (new_var_info)         32   2264k: 0.9%   2341k    330758: 0.5%     47k     63k
tree-ssa-structalias.c:3104 (process_constraint)  48   2376k: 0.9%   2606k    405451: 0.7%     49k     83k
symtab.c:612 (create_reference)                    8   3314k: 1.3%   4897k     75213: 0.1%    414k    612k
vec.h:1734 (copy)                                 48    233M:90.5%    234M   6243163:10.1%   4982k   5003k

However, the main problem is:

cfg.c:202 (connect_src)                             5745k:  0.2%     271M:  1.9%    1754k:  0.0%    1132k:  0.2%   7026k
cfg.c:212 (connect_dest)                            6307k:  0.2%     281M:  2.0%   10129k:  0.2%    2490k:  0.5%   7172k
varasm.c:3359 (build_constant_desc)                 7387k:  0.2%       0 :  0.0%       0 :  0.0%       0 :  0.0%     51k
emit-rtl.c:486 (gen_raw_REG)                        7799k:  0.2%     215M:  1.5%      96 :  0.0%       0 :  0.0%   9502k
dwarf2cfi.c:2341 (add_cfis_to_fde)                  8027k:  0.2%       0 :  0.0%    4906k:  0.1%    1405k:  0.3%     78k
emit-rtl.c:4074 (make_jump_insn_raw)                8239k:  0.2%      93M:  0.7%       0 :  0.0%       0 :  0.0%   1442k
tree-ssanames.c:308 (make_ssa_name_fn)              9130k:  0.2%     456M:  3.3%       0 :  0.0%       0 :  0.0%   6622k
gimple.c:1808 (gimple_copy)                         9508k:  0.3%     524M:  3.7%    8609k:  0.2%    2972k:  0.6%   7135k
tree-inline.c:4879 (expand_call_inline)             9590k:  0.3%      21M:  0.2%       0 :  0.0%       0 :  0.0%    328k
dwarf2cfi.c:418 (new_cfi)                             10M:  0.3%       0 :  0.0%       0 :  0.0%       0 :  0.0%    444k
cfg.c:266 (unchecked_make_edge)                       10M:  0.3%      60M:  0.4%     355M:  6.8%       0 :  0.0%   9083k
tree.c:1642 (wide_int_to_tree_1)                      10M:  0.3%    2313k:  0.0%       0 :  0.0%       0 :  0.0%    548k
stringpool.c:41 (stringpool_ggc_alloc)                10M:  0.3%    7055k:  0.0%       0 :  0.0%    2270k:  0.5%    588k
stringpool.c:63 (alloc_node)                          10M:  0.3%      12M:  0.1%       0 :  0.0%       0 :  0.0%    588k
tree-phinodes.c:119 (allocate_phi_node)               11M:  0.3%     153M:  1.1%       0 :  0.0%    3539k:  0.7%    340k
cgraph.c:289 (create_empty)                           12M:  0.3%       0 :  0.0%     109M:  2.1%       0 :  0.0%    371k
cfg.c:127 (alloc_block)                               14M:  0.4%     705M:  5.0%       0 :  0.0%       0 :  0.0%   7086k
tree-streamer-in.c:558 (streamer_read_tree_bitfi      22M:  0.6%      13k:  0.0%       0 :  0.0%      22k:  0.0%     64k
tree-inline.c:834 (remap_block)                       28M:  0.8%     159M:  1.1%       0 :  0.0%       0 :  0.0%   2009k
stringpool.c:79 (ggc_alloc_string)                    28M:  0.8%    5619k:  0.0%       0 :  0.0%    6658k:  1.4%   1785k
dwarf2out.c:11727 (add_ranges_num)                    32M:  0.9%       0 :  0.0%      32M:  0.6%     144 :  0.0%     20
tree-inline.c:5942 (copy_decl_to_var)                 39M:  1.1%      51M:  0.4%       0 :  0.0%       0 :  0.0%    646k
tree-inline.c:5994 (copy_decl_no_change)              78M:  2.1%     270M:  1.9%       0 :  0.0%       0 :  0.0%   2497k
function.c:4438 (reorder_blocks_1)                    96M:  2.6%     101M:  0.7%       0 :  0.0%       0 :  0.0%   2109k
hash-table.h:802 (expand)                            142M:  3.9%      18M:  0.1%     198M:  3.8%      32M:  6.9%     38k
dwarf2out.c:10086 (new_loc_list)                     219M:  6.0%      11M:  0.1%       0 :  0.0%       0 :  0.0%   2955k
tree-streamer-in.c:637 (streamer_alloc_tree)         379M: 10.3%     426M:  3.0%       0 :  0.0%    4201k:  0.9%   9828k
dwarf2out.c:5702 (new_die_raw)                       434M: 11.8%       0 :  0.0%       0 :  0.0%       0 :  0.0%   5556k
dwarf2out.c:1383 (new_loc_descr)                     519M: 14.1%      12M:  0.1%    2880 :  0.0%       0 :  0.0%   6812k
dwarf2out.c:4420 (add_dwarf_attr)                    640M: 17.4%       0 :  0.0%      94M:  1.8%    4584k:  1.0%   3877k
toplev.c:906 (realloc_for_line_map)                  768M: 20.8%       0 :  0.0%     767M: 14.6%     255M: 54.4%     33
------------------------------------------------------------------------------------------------------------------------
GGC memory                                                  Leak         Garbage           Freed        Overhead   Times
------------------------------------------------------------------------------------------------------------------------
Total                                               3689M:100.0%   14039M:100.0%    5254M:100.0%     470M:100.0%    391M
------------------------------------------------------------------------------------------------------------------------

Clearly some function bodies leak - I will try to figure out which.
But the main problem is debug info. I guess debug info for the whole
cc1plus is large, but it would be nice if it was not in the garbage
collector, for example :)

Honza