Date: Fri, 11 Dec 2015 01:19:00 -0000
From: Jan Hubicka
To: gcc-patches@gcc.gnu.org, rguenther@suse.de
Subject: Do not decompress functions sections when copying them to ltrans
Message-ID: <20151211011916.GA5527@kam.mff.cuni.cz>

Hi,
this patch makes WPA copy sections w/o decompressing them.  This leads
to a nice reduction in /tmp usage for GCC bootstrap (about 70%) and a
small one for Firefox.  In GCC about 5% of the ltrans object file is the
global decl section, while for Firefox it is 85%.  I will try to figure
out if there is something terribly stupid pickled there.

The patch simply adds raw section i/o to lto-section-in.c and
lto-section-out.c which is used by copy_function_or_variable.  The catch
is that WPA->ltrans streaming is not compressed and this fact is not
represented in the object file at all.  We simply test flag_wpa and
flag_ltrans.
Now function sections born at WPA time are uncompressed, while function
sections just copied are compressed, and we do not know how to read
them.  I tried to simply turn off the non-compressed path and set the
compression level to minimal and then to none (which works despite the
apparently outdated FIXME comments I removed).  Sadly zlib manages to
burn about 16% of WPA time at minimal level and about 7% at none because
it computes the checksum.  Clearly next stage1 it is time to switch to a
better compression backend.

For now I added the information whether a section is compressed into
decl_state.  I am not thrilled by this but it is the only way I found
w/o wasting 4 bytes per every lto section (because the lto header is not
really extensible and the stream is assumed to be aligned).  The whole
low-level lto streaming code is a grand mess; I hope we will clean this
up and get saner headers in the foreseeable future.  Until that time
this solution does not waste extra space, as it is easy to pickle the
flag as part of the reference.

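To illustrate the "pickle the flag as part of the reference" idea, here is a minimal standalone sketch of the bit packing. It mirrors the arithmetic the patch uses in lto_output_decl_state_refs and lto_read_in_decl_state, but the helper names pack_ref and unpack_ref are hypothetical and do not exist in GCC, and the range assertion is an added assumption rather than something the patch checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not a GCC function): fold the "compressed" bit into
   the low bit of a 32-bit tree-cache reference, as in
   "ref = ref * 2 + (state->compressed ? 1 : 0)".  */
static uint32_t
pack_ref (uint32_t ref, bool compressed)
{
  /* Assumption for this sketch: REF fits in 31 bits, otherwise the
     multiplication would drop its top bit.  */
  assert (ref < (UINT32_C (1) << 31));
  return ref * 2 + (compressed ? 1 : 0);
}

/* Hypothetical helper (not a GCC function): split a packed word back into
   the reference and the flag, as in "compressed = ix & 1; ix /= 2".  */
static void
unpack_ref (uint32_t word, uint32_t *ref, bool *compressed)
{
  *compressed = (word & 1) != 0;
  *ref = word / 2;
}
```

Because the reference word is already written for every decl state, the flag costs no extra bytes in the stream; the price is one bit of range in the reference index.
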
The patch saves about 7% of WPA time for firefox:

 phase opt and generate  :  75.66 (39%) usr   1.78 (14%) sys  77.44 (37%) wall  855644 kB (21%) ggc
 phase stream in         :  34.62 (18%) usr   1.95 (16%) sys  36.57 (18%) wall 3245604 kB (79%) ggc
 phase stream out        :  81.89 (42%) usr   8.49 (69%) sys  90.37 (44%) wall      50 kB ( 0%) ggc
 ipa dead code removal   :   4.33 ( 2%) usr   0.06 ( 0%) sys   4.24 ( 2%) wall       0 kB ( 0%) ggc
 ipa virtual call target :  25.15 (13%) usr   0.14 ( 1%) sys  25.42 (12%) wall       0 kB ( 0%) ggc
 ipa cp                  :   3.92 ( 2%) usr   0.21 ( 2%) sys   4.18 ( 2%) wall  340698 kB ( 8%) ggc
 ipa inlining heuristics :  24.12 (12%) usr   0.38 ( 3%) sys  24.37 (12%) wall  500427 kB (12%) ggc
 lto stream inflate      :   7.07 ( 4%) usr   0.38 ( 3%) sys   7.33 ( 4%) wall       0 kB ( 0%) ggc
 ipa lto gimple in       :   1.95 ( 1%) usr   0.61 ( 5%) sys   2.42 ( 1%) wall  324875 kB ( 8%) ggc
 ipa lto gimple out      :   9.16 ( 5%) usr   1.64 (13%) sys  10.49 ( 5%) wall      50 kB ( 0%) ggc
 ipa lto decl in         :  21.25 (11%) usr   1.01 ( 8%) sys  22.37 (11%) wall 2348869 kB (57%) ggc
 ipa lto decl out        :  67.33 (34%) usr   1.66 (13%) sys  68.96 (33%) wall       0 kB ( 0%) ggc
 ipa lto constructors out:   1.39 ( 1%) usr   0.38 ( 3%) sys   2.18 ( 1%) wall       0 kB ( 0%) ggc
 ipa lto decl merge      :   2.12 ( 2%) usr   0.00 ( 0%) sys   2.12 ( 2%) wall   13737 kB ( 0%) ggc
 ipa reference           :   2.14 ( 2%) usr   0.00 ( 0%) sys   2.13 ( 2%) wall       0 kB ( 0%) ggc
 ipa pure const          :   2.29 ( 2%) usr   0.01 ( 0%) sys   2.35 ( 2%) wall       0 kB ( 0%) ggc
 ipa icf                 :   9.02 ( 7%) usr   0.18 ( 2%) sys   9.72 ( 7%) wall   19203 kB ( 0%) ggc
 TOTAL                   : 195.27            12.37           207.64            4103297 kB

 phase opt and generate  :  79.00 (38%) usr   1.61 (13%) sys  80.61 (36%) wall 1000597 kB (24%) ggc
 phase stream in         :  33.93 (16%) usr   1.91 (15%) sys  35.83 (16%) wall 3242293 kB (76%) ggc
 phase stream out        :  96.90 (46%) usr   9.19 (72%) sys 106.09 (48%) wall      52 kB ( 0%) ggc
 garbage collection      :   2.94 ( 1%) usr   0.00 ( 0%) sys   2.93 ( 1%) wall       0 kB ( 0%) ggc
 ipa dead code removal   :   4.60 ( 2%) usr   0.04 ( 0%) sys   4.53 ( 2%) wall       0 kB ( 0%) ggc
 ipa virtual call target :  24.48 (12%) usr   0.14 ( 1%) sys  24.76 (11%) wall       0 kB ( 0%) ggc
 ipa cp                  :   4.92 ( 2%) usr   0.41 ( 3%) sys   5.31 ( 2%) wall  502843 kB (12%) ggc
 ipa inlining heuristics :  23.72 (11%) usr   0.23 ( 2%) sys  23.92 (11%) wall  490927 kB (12%) ggc
 lto stream inflate      :  14.35 ( 7%) usr   0.35 ( 3%) sys  15.22 ( 7%) wall       0 kB ( 0%) ggc
 ipa lto gimple in       :   1.79 ( 1%) usr   0.57 ( 4%) sys   2.46 ( 1%) wall  324857 kB ( 8%) ggc
 ipa lto gimple out      :   9.98 ( 5%) usr   1.45 (11%) sys  11.05 ( 5%) wall      52 kB ( 0%) ggc
 ipa lto decl in         :  21.01 (10%) usr   0.91 ( 7%) sys  21.90 (10%) wall 2345561 kB (55%) ggc
 ipa lto decl out        :  73.55 (35%) usr   2.09 (16%) sys  75.67 (34%) wall       0 kB ( 0%) ggc
 ipa lto constructors out:   1.87 ( 1%) usr   0.32 ( 3%) sys   2.18 ( 1%) wall       0 kB ( 0%) ggc
 ipa lto decl merge      :   2.06 ( 1%) usr   0.00 ( 0%) sys   2.05 ( 1%) wall   13737 kB ( 0%) ggc
 whopr wpa I/O           :   2.84 ( 1%) usr   5.14 (40%) sys   7.96 ( 4%) wall       0 kB ( 0%) ggc
 whopr partitioning      :   3.83 ( 2%) usr   0.01 ( 0%) sys   3.84 ( 2%) wall    5958 kB ( 0%) ggc
 ipa reference           :   2.63 ( 1%) usr   0.00 ( 0%) sys   2.64 ( 1%) wall       0 kB ( 0%) ggc
 ipa icf                 :   8.23 ( 4%) usr   0.12 ( 1%) sys   8.32 ( 4%) wall   19203 kB ( 0%) ggc
 TOTAL                   : 209.83            12.71           222.54            4244939 kB

This now compares well to 5.3:

Execution times (seconds)
 phase setup             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall    1989 kB ( 0%) ggc
 phase opt and generate  :  68.61 (31%) usr   2.41 (14%) sys  77.67 (29%) wall 1189579 kB (27%) ggc
 phase stream in         :  36.38 (16%) usr   2.32 (14%) sys  56.20 (21%) wall 3168787 kB (73%) ggc
 phase stream out        : 113.37 (51%) usr  11.90 (71%) sys 130.49 (49%) wall     112 kB ( 0%) ggc
 phase finalize          :   3.40 ( 2%) usr   0.13 ( 1%) sys   3.55 ( 1%) wall       0 kB ( 0%) ggc
 garbage collection      :   6.13 ( 3%) usr   0.01 ( 0%) sys   6.18 ( 2%) wall       0 kB ( 0%) ggc
 ipa dead code removal   :   4.74 ( 2%) usr   0.05 ( 0%) sys   5.09 ( 2%) wall       0 kB ( 0%) ggc
 ipa virtual call target :  11.29 ( 5%) usr   0.15 ( 1%) sys  11.20 ( 4%) wall       1 kB ( 0%) ggc
 ipa cp                  :   5.22 ( 2%) usr   0.21 ( 1%) sys   5.51 ( 2%) wall  507623 kB (12%) ggc
 ipa inlining heuristics :  24.11 (11%) usr   0.33 ( 2%) sys  24.67 ( 9%) wall  497487 kB (11%) ggc
 ipa lto gimple in       :   4.20 ( 2%) usr   1.08 ( 6%) sys  10.73 ( 4%) wall  467276 kB (11%) ggc
 ipa lto gimple out      :  17.57 ( 8%) usr   1.92 (11%) sys  23.61 ( 9%) wall     112 kB ( 0%) ggc
 ipa lto decl in         :  26.19 (12%) usr   1.20 ( 7%) sys  31.62 (12%) wall 2242394 kB (51%) ggc
 ipa lto decl out        :  89.09 (40%) usr   3.64 (22%) sys  92.79 (35%) wall       0 kB ( 0%) ggc
 ipa lto constructors in :   0.79 ( 0%) usr   0.28 ( 2%) sys  14.33 ( 5%) wall   17992 kB ( 0%) ggc
 ipa lto constructors out:   2.57 ( 1%) usr   0.41 ( 2%) sys   4.02 ( 2%) wall       0 kB ( 0%) ggc
 ipa lto cgraph I/O      :   1.11 ( 1%) usr   0.33 ( 2%) sys   1.81 ( 1%) wall  432544 kB (10%) ggc
 ipa lto decl merge      :   2.47 ( 1%) usr   0.00 ( 0%) sys   2.47 ( 1%) wall    8191 kB ( 0%) ggc
 ipa lto cgraph merge    :   1.91 ( 1%) usr   0.01 ( 0%) sys   1.97 ( 1%) wall   14717 kB ( 0%) ggc
 whopr wpa I/O           :   2.92 ( 1%) usr   5.93 (35%) sys   8.84 ( 3%) wall       0 kB ( 0%) ggc
 whopr partitioning      :   3.91 ( 2%) usr   0.02 ( 0%) sys   3.93 ( 1%) wall    6001 kB ( 0%) ggc
 ipa icf                 :   7.77 ( 4%) usr   0.19 ( 1%) sys   8.05 ( 3%) wall   22534 kB ( 1%) ggc
 TOTAL                   : 221.76            16.76           267.92            4360470 kB

Except that I really need to do something with virtual call targets.
As the quality of information improved with the improved TBAA, we now do
more walks.

The savings for the cc1 build are bigger and the incremental linking
improvements even bigger (about 50%), but I accidentally removed the
logs...

lto-bootstrapped/regtested x86_64-linux, OK?

	* cgraph.c (cgraph_node::get_untransformed_body): Pass compressed
	flag to lto_get_section_data.
	* varpool.c (varpool_node::get_constructor): Likewise.
	* lto-section-in.c (lto_get_section_data): Add new flag decompress.
	(lto_free_section_data): Likewise.
	(lto_get_raw_section_data): New function.
	(lto_free_raw_section_data): New function.
	* lto-streamer-out.c (copy_function_or_variable): Copy sections w/o
	decompressing.
	(lto_output_decl_state_refs): Pickle compressed bit.
	* lto-streamer.h (lto_in_decl_state): New flag compressed.
	(lto_out_decl_state): Likewise.
	(lto_get_section_data, lto_free_section_data): Update prototypes.
	(lto_get_raw_section_data, lto_free_raw_section_data): Declare.
	(lto_write_raw_data): Declare.
	* lto-section-out.c (lto_begin_section): Remove FIXME.
	(lto_write_raw_data): New function.
	(lto_write_stream): Remove FIXME.
	(lto_new_out_decl_state): Set compressed flag.
	* lto.c (lto_read_in_decl_state): Unpickle compressed bit.

Index: cgraph.c
===================================================================
--- cgraph.c	(revision 231546)
+++ cgraph.c	(working copy)
@@ -3251,9 +3251,11 @@ cgraph_node::get_untransformed_body (voi
 
   /* We may have renamed the declaration, e.g., a static function.  */
   name = lto_get_decl_name_mapping (file_data, name);
+  struct lto_in_decl_state *decl_state
+	 = lto_get_function_in_decl_state (file_data, decl);
 
   data = lto_get_section_data (file_data, LTO_section_function_body,
-			       name, &len);
+			       name, &len, decl_state->compressed);
   if (!data)
     fatal_error (input_location, "%s: section %s is missing",
 		 file_data->file_name,
@@ -3264,7 +3266,7 @@ cgraph_node::get_untransformed_body (voi
   lto_input_function_body (file_data, this, data);
   lto_stats.num_function_bodies++;
   lto_free_section_data (file_data, LTO_section_function_body, name,
-			 data, len);
+			 data, len, decl_state->compressed);
   lto_free_function_in_decl_state_for_node (this);
   /* Keep lto file data so ipa-inline-analysis knows about cross module
      inlining.  */
Index: lto-section-in.c
===================================================================
--- lto-section-in.c	(revision 231546)
+++ lto-section-in.c	(working copy)
@@ -130,7 +130,7 @@ const char *
 lto_get_section_data (struct lto_file_decl_data *file_data,
 		      enum lto_section_type section_type,
 		      const char *name,
-		      size_t *len)
+		      size_t *len, bool decompress)
 {
   const char *data = (get_section_f) (file_data, section_type, name, len);
   const size_t header_length = sizeof (struct lto_data_header);
@@ -142,9 +142,10 @@ lto_get_section_data (struct lto_file_de
   if (data == NULL)
     return NULL;
 
-  /* FIXME lto: WPA mode does not write compressed sections, so for now
-     suppress uncompression if flag_ltrans.  */
-  if (!flag_ltrans)
+  /* WPA->ltrans streams are not compressed with exception of function bodies
+     and variable initializers that have been verbatim copied from earlier
+     compilations.  */
+  if (!flag_ltrans || decompress)
     {
       /* Create a mapping header containing the underlying data and length,
 	 and prepend this to the uncompression buffer.  The uncompressed data
@@ -170,6 +171,16 @@ lto_get_section_data (struct lto_file_de
   return data;
 }
 
+/* Get the section data without any header parsing or uncompression.  */
+
+const char *
+lto_get_raw_section_data (struct lto_file_decl_data *file_data,
+			  enum lto_section_type section_type,
+			  const char *name,
+			  size_t *len)
+{
+  return (get_section_f) (file_data, section_type, name, len);
+}
 
 /* Free the data found from the above call.  The first three
    parameters are the same as above.  DATA is the data to be freed and
@@ -180,7 +191,7 @@ lto_free_section_data (struct lto_file_d
 		       enum lto_section_type section_type,
 		       const char *name,
 		       const char *data,
-		       size_t len)
+		       size_t len, bool decompress)
 {
   const size_t header_length = sizeof (struct lto_data_header);
   const char *real_data = data - header_length;
@@ -189,9 +200,7 @@ lto_free_section_data (struct lto_file_d
 
   gcc_assert (free_section_f);
 
-  /* FIXME lto: WPA mode does not write compressed sections, so for now
-     suppress uncompression mapping if flag_ltrans.  */
-  if (flag_ltrans)
+  if (flag_ltrans && !decompress)
     {
       (free_section_f) (file_data, section_type, name, data, len);
       return;
@@ -203,6 +212,17 @@ lto_free_section_data (struct lto_file_d
   free (CONST_CAST (char *, real_data));
 }
 
+/* Free data allocated by lto_get_raw_section_data.  */
+
+void
+lto_free_raw_section_data (struct lto_file_decl_data *file_data,
+			   enum lto_section_type section_type,
+			   const char *name,
+			   const char *data,
+			   size_t len)
+{
+  (free_section_f) (file_data, section_type, name, data, len);
+}
 
 /* Load a section of type SECTION_TYPE from FILE_DATA, parse the
    header and then return an input block pointing to the section.  The
Index: varpool.c
===================================================================
--- varpool.c	(revision 231546)
+++ varpool.c	(working copy)
@@ -296,9 +303,11 @@ varpool_node::get_constructor (void)
 
   /* We may have renamed the declaration, e.g., a static function.  */
   name = lto_get_decl_name_mapping (file_data, name);
+  struct lto_in_decl_state *decl_state
+	 = lto_get_function_in_decl_state (file_data, decl);
 
   data = lto_get_section_data (file_data, LTO_section_function_body,
-			       name, &len);
+			       name, &len, decl_state->compressed);
   if (!data)
     fatal_error (input_location, "%s: section %s is missing",
 		 file_data->file_name,
@@ -308,7 +317,7 @@ varpool_node::get_constructor (void)
   gcc_assert (DECL_INITIAL (decl) != error_mark_node);
   lto_stats.num_function_bodies++;
   lto_free_section_data (file_data, LTO_section_function_body, name,
-			 data, len);
+			 data, len, decl_state->compressed);
   lto_free_function_in_decl_state_for_node (this);
   timevar_pop (TV_IPA_LTO_CTORS_IN);
   return DECL_INITIAL (decl);
Index: lto-streamer-out.c
===================================================================
--- lto-streamer-out.c	(revision 231546)
+++ lto-streamer-out.c	(working copy)
@@ -2191,22 +2224,23 @@ copy_function_or_variable (struct symtab
   struct lto_in_decl_state *in_state;
   struct lto_out_decl_state *out_state = lto_get_out_decl_state ();
 
-  lto_begin_section (section_name, !flag_wpa);
+  lto_begin_section (section_name, false);
   free (section_name);
 
   /* We may have renamed the declaration, e.g., a static function.  */
   name = lto_get_decl_name_mapping (file_data, name);
 
-  data = lto_get_section_data (file_data, LTO_section_function_body,
-                               name, &len);
+  data = lto_get_raw_section_data (file_data, LTO_section_function_body,
+                                   name, &len);
   gcc_assert (data);
 
   /* Do a bit copy of the function body.  */
-  lto_write_data (data, len);
+  lto_write_raw_data (data, len);
 
   /* Copy decls. */
   in_state =
     lto_get_function_in_decl_state (node->lto_file_data, function);
+  out_state->compressed = in_state->compressed;
   gcc_assert (in_state);
 
   for (i = 0; i < LTO_N_DECL_STREAMS; i++)
@@ -2224,8 +2258,8 @@ copy_function_or_variable (struct symtab
 	encoder->trees.safe_push ((*trees)[j]);
     }
 
-  lto_free_section_data (file_data, LTO_section_function_body, name,
-			 data, len);
+  lto_free_raw_section_data (file_data, LTO_section_function_body, name,
+			     data, len);
   lto_end_section ();
 }
 
@@ -2431,6 +2465,7 @@ lto_output_decl_state_refs (struct outpu
   decl = (state->fn_decl) ? state->fn_decl : void_type_node;
   streamer_tree_cache_lookup (ob->writer_cache, decl, &ref);
   gcc_assert (ref != (unsigned)-1);
+  ref = ref * 2 + (state->compressed ? 1 : 0);
   lto_write_data (&ref, sizeof (uint32_t));
 
   for (i = 0; i < LTO_N_DECL_STREAMS; i++)
Index: lto/lto-symtab.c
===================================================================
--- lto/lto-symtab.c	(revision 231548)
+++ lto/lto-symtab.c	(working copy)
@@ -883,6 +883,11 @@ lto_symtab_merge_symbols_1 (symtab_node
 	  else
 	    {
 	      DECL_INITIAL (e->decl) = error_mark_node;
+	      if (e->lto_file_data)
+		{
+		  lto_free_function_in_decl_state_for_node (e);
+		  e->lto_file_data = NULL;
+		}
 	      symtab->call_varpool_removal_hooks (dyn_cast<varpool_node *> (e));
 	    }
 	  e->remove_all_references ();
Index: lto/lto.c
===================================================================
--- lto/lto.c	(revision 231546)
+++ lto/lto.c	(working copy)
@@ -234,6 +234,8 @@ lto_read_in_decl_state (struct data_in *
   uint32_t i, j;
 
   ix = *data++;
+  state->compressed = ix & 1;
+  ix /= 2;
   decl = streamer_tree_cache_get_tree (data_in->reader_cache, ix);
   if (!VAR_OR_FUNCTION_DECL_P (decl))
     {
Index: lto-streamer.h
===================================================================
--- lto-streamer.h	(revision 231546)
+++ lto-streamer.h	(working copy)
@@ -504,6 +505,9 @@ struct GTY((for_user)) lto_in_decl_state
   /* If this in-decl state is associated with a function. FN_DECL
      point to the FUNCTION_DECL. */
   tree fn_decl;
+
+  /* True if decl state is compressed.  */
+  bool compressed;
 };
 
 typedef struct lto_in_decl_state *lto_in_decl_state_ptr;
@@ -537,6 +541,9 @@ struct lto_out_decl_state
   /* If this out-decl state belongs to a function, fn_decl points to that
      function.  Otherwise, it is NULL. */
   tree fn_decl;
+
+  /* True if decl state is compressed.  */
+  bool compressed;
 };
 
 typedef struct lto_out_decl_state *lto_out_decl_state_ptr;
@@ -761,10 +768,18 @@ extern void lto_set_in_hooks (struct lto
 extern struct lto_file_decl_data **lto_get_file_decl_data (void);
 extern const char *lto_get_section_data (struct lto_file_decl_data *,
 					 enum lto_section_type,
-					 const char *, size_t *);
+					 const char *, size_t *,
+					 bool decompress = false);
+extern const char *lto_get_raw_section_data (struct lto_file_decl_data *,
+					     enum lto_section_type,
+					     const char *, size_t *);
 extern void lto_free_section_data (struct lto_file_decl_data *,
-				   enum lto_section_type,
-				   const char *, const char *, size_t);
+				   enum lto_section_type,
+				   const char *, const char *, size_t,
+				   bool decompress = false);
+extern void lto_free_raw_section_data (struct lto_file_decl_data *,
+				       enum lto_section_type,
+				       const char *, const char *, size_t);
 extern htab_t lto_create_renaming_table (void);
 extern void lto_record_renamed_decl (struct lto_file_decl_data *,
 				     const char *, const char *);
@@ -785,6 +800,7 @@ extern void lto_value_range_error (const
 extern void lto_begin_section (const char *, bool);
 extern void lto_end_section (void);
 extern void lto_write_data (const void *, unsigned int);
+extern void lto_write_raw_data (const void *, unsigned int);
 extern void lto_write_stream (struct lto_output_stream *);
 extern bool lto_output_decl_index (struct lto_output_stream *,
 			    struct lto_tree_ref_encoder *,
Index: lto-section-out.c
===================================================================
--- lto-section-out.c	(revision 231546)
+++ lto-section-out.c	(working copy)
@@ -66,9 +66,6 @@ lto_begin_section (const char *name, boo
 {
   lang_hooks.lto.begin_section (name);
 
-  /* FIXME lto: for now, suppress compression if the lang_hook that appends
-     data is anything other than assembler output.  The effect here is that
-     we get compression of IL only in non-ltrans object files.  */
   gcc_assert (compression_stream == NULL);
   if (compress)
     compression_stream = lto_start_compression (lto_append_data, NULL);
@@ -99,6 +96,14 @@ lto_write_data (const void *data, unsign
   lang_hooks.lto.append_data ((const char *)data, size, NULL);
 }
 
+/* Write SIZE bytes starting at DATA to the assembler.  */
+
+void
+lto_write_raw_data (const void *data, unsigned int size)
+{
+  lang_hooks.lto.append_data ((const char *)data, size, NULL);
+}
+
 /* Write all of the chars in OBS to the assembler.  Recycle the blocks
    in obs as this is being done.  */
 
@@ -123,10 +128,6 @@ lto_write_stream (struct lto_output_stre
       if (!next_block)
 	num_chars -= obs->left_in_block;
 
-      /* FIXME lto: WPA mode uses an ELF function as a lang_hook to append
-	 output data.  This hook is not happy with the way that compression
-	 blocks up output differently to the way it's blocked here.  So for
-	 now, we don't compress WPA output.  */
       if (compression_stream)
 	lto_compress_block (compression_stream, base, num_chars);
       else
@@ -295,6 +296,9 @@ lto_new_out_decl_state (void)
   for (i = 0; i < LTO_N_DECL_STREAMS; i++)
     lto_init_tree_ref_encoder (&state->streams[i]);
 
+  /* At WPA time we do not compress sections by default.  */
+  state->compressed = !flag_wpa;
+
   return state;
 }
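
For readers following the lto_get_section_data and lto_free_section_data hunks, the read-side decision can be summarized in a small standalone sketch. The two predicates below only restate the conditions "!flag_ltrans || decompress" and "flag_ltrans && !decompress" from the patch; the function names and the plain bool parameters are hypothetical stand-ins for illustration, not GCC code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical predicate (not a GCC function): true when a section read
   must be run through the decompressor.  IN_LTRANS stands in for
   flag_ltrans, DECL_STATE_COMPRESSED for decl_state->compressed.  */
static bool
section_needs_inflate (bool in_ltrans, bool decl_state_compressed)
{
  return !in_ltrans || decl_state_compressed;
}

/* Hypothetical predicate (not a GCC function): true when the buffer came
   straight from the section reader, so it is released through the reader's
   free hook instead of freeing a separately allocated inflate buffer.  */
static bool
section_freed_directly (bool in_ltrans, bool decl_state_compressed)
{
  return in_ltrans && !decl_state_compressed;
}
```

The two predicates are exact complements, which is why the get and free calls in cgraph.c and varpool.c have to pass the same decl_state->compressed value: a mismatch would free the wrong kind of buffer.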