public inbox for gcc-patches@gcc.gnu.org
* Use separate sections to stream non-trivial constructors
@ 2014-07-11  9:18 Jan Hubicka
  2014-07-11 11:32 ` Richard Biener
  0 siblings, 1 reply; 7+ messages in thread
From: Jan Hubicka @ 2014-07-11  9:18 UTC (permalink / raw)
  To: gcc-patches, rguenther

Hi,
since we both agreed that offlining constructors from the global decl stream is
a good idea, I went ahead and implemented it.  I would like to follow up with
cleanups; for example, the sections are still tagged as function sections, but I
would like to do that incrementally.  There is quite some ugliness in the way we
handle function sections and the patch started to snowball very quickly.

The patch conceptually copies what we do for functions and re-uses most of the
infrastructure.  varpool_get_constructor is the analogue of cgraph_get_body
(i.e. the means of reading a function body in).  It is used by the output
machinery, by ipa-visibility while rewriting the constructor, and by
ctor_for_folding (which makes us load the ctor whenever it is needed by ipa-cp
or ipa-devirt).
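
To show the lazy-loading idea in isolation, here is a minimal standalone
sketch.  It is not GCC code: toy_var, toy_get_constructor and the ERROR_MARK
sentinel are hypothetical stand-ins for varpool_node, varpool_get_constructor
and error_mark_node, and the "section" is just a string in memory.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sentinel playing the role of error_mark_node: it means
   "an initializer exists but has not been read in yet".  */
static const char ERROR_MARK[] = "<not loaded>";

/* Toy stand-in for a varpool node; not the GCC structure.  */
struct toy_var
{
  const char *name;
  const char *initial;   /* NULL, ERROR_MARK, or the loaded initializer.  */
  const char *section;   /* Pretend on-disk section holding the initializer.  */
  int loads;             /* How many times the section was read.  */
};

/* Return the initializer of VAR, reading it from its section on first use.
   Later calls return the already loaded value without touching the
   section again.  */
static const char *
toy_get_constructor (struct toy_var *var)
{
  assert (var != NULL);

  /* Either there is no initializer at all, or it is already in memory.  */
  if (var->initial != ERROR_MARK)
    return var->initial;

  /* First request: materialize the initializer and remember it.  */
  var->initial = var->section;
  var->loads++;
  return var->initial;
}
```

Callers such as the folding or output code in this toy model would simply call
toy_get_constructor whenever they need the value; a variable nobody asks about
never has its section read.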

I kept get_symbol_initial_value as the authority that decides whether we want
to encode a given constructor or not.  The section itself for a trivial ctor is
about 25 bytes, and with the header it is probably close to double that.
Currently the heuristic is to offline only initializers that are CONSTRUCTOR
nodes and to keep simple expressions inline.  We may want to tweak it.
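
The heuristic can be sketched outside of GCC as follows.  This is only an
illustration: the enum, toy_offline_initializer_p and
toy_section_overhead_bytes are hypothetical names, and the 50-byte figure is
just the rough estimate given above (about 25 bytes, roughly doubled by the
header), not a measured constant.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified kinds of initializer; GCC has many more
   tree codes than this.  */
enum toy_init_kind
{
  TOY_NO_INITIALIZER,   /* Variable has no initializer at all.  */
  TOY_SCALAR_CONSTANT,  /* E.g. "int x = 42;".  */
  TOY_ADDRESS,          /* E.g. "int *p = &x;".  */
  TOY_CONSTRUCTOR       /* Aggregate initializer, e.g. "struct X a = { 1 };".  */
};

/* Assumed per-section cost in bytes, taken from the rough estimate in
   the text, not from a measurement.  */
enum { TOY_SECTION_OVERHEAD_BYTES = 50 };

/* Return true if an initializer of kind KIND should go into its own
   section rather than stay inline in the global decl stream.  */
static bool
toy_offline_initializer_p (enum toy_init_kind kind)
{
  return kind == TOY_CONSTRUCTOR;
}

/* Estimated extra bytes spent on section overhead when N_CTORS
   initializers are offlined.  */
static unsigned long
toy_section_overhead_bytes (unsigned long n_ctors)
{
  return n_ctors * TOY_SECTION_OVERHEAD_BYTES;
}
```

Under these assumptions, offlining a thousand tiny aggregate initializers would
cost on the order of 50000 bytes of pure section overhead, which is why the
threshold may deserve tuning.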

The patch does not bring miraculous savings to firefox WPA, but it does bring some:

GGC memory after global stream is read goes from 1376898k to 1250533k
overall GGC allocations from 4156478 kB to 4012462 kB
read 11006599 SCCs of average size 1.907692 -> read 9119433 SCCs of average size 2.037867
20997206 tree bodies read in total -> 18584194 tree bodies read in total
Size of mmap'd section decls: 299540188 bytes -> Size of mmap'd section decls: 271557265 bytes
Size of mmap'd section function_body: 5711078 bytes -> Size of mmap'd section function_body: 7548680 bytes 
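
For readers who want the differences spelled out, the small helper below
tabulates the before/after numbers quoted above.  The struct and function
names are made up for this illustration; only the figures come from the
measurements in this mail.

```c
#include <assert.h>
#include <stdio.h>

/* One before/after measurement copied from the figures above.  */
struct measurement
{
  const char *what;
  long long before;
  long long after;
};

static const struct measurement measurements[] = {
  { "GGC memory after global stream (k)", 1376898, 1250533 },
  { "overall GGC allocations (kB)", 4156478, 4012462 },
  { "SCCs read", 11006599, 9119433 },
  { "tree bodies read", 20997206, 18584194 },
  { "decls section (bytes)", 299540188, 271557265 },
  { "function_body section (bytes)", 5711078, 7548680 },
};

enum { N_MEASUREMENTS = sizeof measurements / sizeof measurements[0] };

/* Signed change of M; negative means the number went down.  */
static long long
change (const struct measurement *m)
{
  assert (m != NULL);
  return m->after - m->before;
}

/* Print one line per measurement with its signed change to OUT.  */
static void
print_changes (FILE *out)
{
  for (int i = 0; i < N_MEASUREMENTS; i++)
    fprintf (out, "%-40s %+lld\n", measurements[i].what,
	     change (&measurements[i]));
}
```

With these inputs the decls section shrinks by 27982923 bytes while the
function_body section grows by 1837602 bytes, a net saving of 26145321 bytes
of mmap'd section data.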

Things would be better if ipa-visibility and ipa-devirt did not load most of
the virtual tables into memory (though this is still better than loading each
into memory 20 times on average).  I will work on that incrementally.  We load
10311 ctors into memory at WPA time.

Note that firefox seems to feature a really huge data segment these days.
http://hubicka.blogspot.ca/2014/04/linktime-optimization-in-gcc-2-firefox.html

Bootstrapped/regtested x86_64-linux, tested with firefox, lto bootstrap in progress, OK?

	* varpool.c: Include tree-ssa-alias.h, gimple.h and lto-streamer.h.
	(varpool_get_constructor): New function.
	(ctor_for_folding): Use it.
	(varpool_assemble_decl): Likewise.
	* lto-streamer.h (struct output_block): Turn cgraph_node
	into symbol field.
	(lto_input_variable_constructor): Declare.
	* ipa-visibility.c (function_and_variable_visibility): Use
	varpool_get_constructor.
	* cgraph.h (varpool_get_constructor): Declare.
	* lto-streamer-out.c (get_symbol_initial_value): Take encoder
	parameter; return error_mark_node for non-trivial constructors.
	(lto_write_tree_1, DFS_write_tree): Update use of
	get_symbol_initial_value.
	(output_function): Update initialization of symbol.
	(output_constructor): New function.
	(copy_function): Rename to ...
	(copy_function_or_variable): ... this one; handle vars too.
	(lto_output): Output variable sections.
	* lto-streamer-in.c (input_constructor): New function.
	(lto_read_body): Rename to ...
	(lto_read_body_or_constructor): ... this one; handle vars
	too.
	(lto_input_variable_constructor): New function.
	* ipa-prop.c (ipa_prop_write_jump_functions,
	ipa_prop_write_all_agg_replacement): Update.
Index: varpool.c
===================================================================
--- varpool.c	(revision 212426)
+++ varpool.c	(working copy)
@@ -35,6 +35,9 @@ along with GCC; see the file COPYING3.
 #include "gimple-expr.h"
 #include "flags.h"
 #include "pointer-set.h"
+#include "tree-ssa-alias.h"
+#include "gimple.h"
+#include "lto-streamer.h"
 
 const char * const tls_model_names[]={"none", "tls-emulated", "tls-real",
 				      "tls-global-dynamic", "tls-local-dynamic",
@@ -253,6 +256,41 @@ varpool_node_for_asm (tree asmname)
     return NULL;
 }
 
+/* When doing LTO, read NODE's constructor from disk if it is not already present.  */
+
+tree
+varpool_get_constructor (struct varpool_node *node)
+{
+  struct lto_file_decl_data *file_data;
+  const char *data, *name;
+  size_t len;
+  tree decl = node->decl;
+
+  if (DECL_INITIAL (node->decl) != error_mark_node
+      || !in_lto_p)
+    return DECL_INITIAL (node->decl);
+
+  file_data = node->lto_file_data;
+  name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
+
+  /* We may have renamed the declaration, e.g., a static variable.  */
+  name = lto_get_decl_name_mapping (file_data, name);
+
+  data = lto_get_section_data (file_data, LTO_section_function_body,
+			       name, &len);
+  if (!data)
+    fatal_error ("%s: section %s is missing",
+		 file_data->file_name,
+		 name);
+
+  lto_input_variable_constructor (file_data, node, data);
+  lto_stats.num_function_bodies++;
+  lto_free_section_data (file_data, LTO_section_function_body, name,
+			 data, len);
+  lto_free_function_in_decl_state_for_node (node);
+  return DECL_INITIAL (node->decl);
+}
+
 /* Return if DECL is constant and its initial value is known (so we can do
    constant folding using DECL_INITIAL (decl)).
    Return ERROR_MARK_NODE when value is unknown.  */
@@ -314,6 +352,9 @@ ctor_for_folding (tree decl)
   if (DECL_VIRTUAL_P (real_decl))
     {
       gcc_checking_assert (TREE_READONLY (real_decl));
+      if (DECL_INITIAL (real_decl) == error_mark_node
+	  && (node = varpool_get_node (real_decl)))
+	return varpool_get_constructor (node);
       if (DECL_INITIAL (real_decl))
 	return DECL_INITIAL (real_decl);
       else
@@ -349,6 +390,9 @@ ctor_for_folding (tree decl)
 
      ??? Previously we behaved so for scalar variables but not for array
      accesses.  */
+  if (DECL_INITIAL (real_decl) == error_mark_node
+      && (node = varpool_get_node (real_decl)))
+    return varpool_get_constructor (node);
   return DECL_INITIAL (real_decl);
 }
 
@@ -471,6 +515,7 @@ varpool_assemble_decl (varpool_node *nod
   if (!node->in_other_partition
       && !DECL_EXTERNAL (decl))
     {
+      varpool_get_constructor (node);
       assemble_variable (decl, 0, 1, 0);
       gcc_assert (TREE_ASM_WRITTEN (decl));
       node->definition = true;
Index: lto-streamer.h
===================================================================
--- lto-streamer.h	(revision 212426)
+++ lto-streamer.h	(working copy)
@@ -685,9 +685,9 @@ struct output_block
      far and the indexes assigned to them.  */
   hash_table<string_slot_hasher> *string_hash_table;
 
-  /* The current cgraph_node that we are currently serializing.  Null
+  /* The current symbol that we are currently serializing.  Null
      if we are serializing something else.  */
-  struct cgraph_node *cgraph_node;
+  struct symtab_node *symbol;
 
   /* These are the last file and line that were seen in the stream.
      If the current node differs from these, it needs to insert
@@ -830,6 +830,9 @@ extern void lto_reader_init (void);
 extern void lto_input_function_body (struct lto_file_decl_data *,
 				     struct cgraph_node *,
 				     const char *);
+extern void lto_input_variable_constructor (struct lto_file_decl_data *,
+					    struct varpool_node *,
+					    const char *);
 extern void lto_input_constructors_and_inits (struct lto_file_decl_data *,
 					      const char *);
 extern void lto_input_toplevel_asms (struct lto_file_decl_data *, int);
Index: ipa-visibility.c
===================================================================
--- ipa-visibility.c	(revision 212426)
+++ ipa-visibility.c	(working copy)
@@ -686,6 +686,8 @@ function_and_variable_visibility (bool w
 	  if (found)
 	    {
 	      struct pointer_set_t *visited_nodes = pointer_set_create ();
+
+	      varpool_get_constructor (vnode);
 	      walk_tree (&DECL_INITIAL (vnode->decl),
 			 update_vtable_references, NULL, visited_nodes);
 	      pointer_set_destroy (visited_nodes);
Index: cgraph.h
===================================================================
--- cgraph.h	(revision 212426)
+++ cgraph.h	(working copy)
@@ -1142,6 +1142,7 @@ void varpool_add_new_variable (tree);
 void symtab_initialize_asm_name_hash (void);
 void symtab_prevail_in_asm_name_hash (symtab_node *node);
 void varpool_remove_initializer (varpool_node *);
+tree varpool_get_constructor (struct varpool_node *node);
 
 /* In cgraph.c */
 extern void change_decl_assembler_name (tree, tree);
Index: lto-streamer-out.c
===================================================================
--- lto-streamer-out.c	(revision 212426)
+++ lto-streamer-out.c	(working copy)
@@ -318,7 +319,7 @@ lto_is_streamable (tree expr)
 /* For EXPR lookup and return what we want to stream to OB as DECL_INITIAL.  */
 
 static tree
-get_symbol_initial_value (struct output_block *ob, tree expr)
+get_symbol_initial_value (lto_symtab_encoder_t encoder, tree expr)
 {
   gcc_checking_assert (DECL_P (expr)
 		       && TREE_CODE (expr) != FUNCTION_DECL
@@ -331,15 +332,13 @@ get_symbol_initial_value (struct output_
       && !DECL_IN_CONSTANT_POOL (expr)
       && initial)
     {
-      lto_symtab_encoder_t encoder;
       varpool_node *vnode;
-
-      encoder = ob->decl_state->symtab_node_encoder;
-      vnode = varpool_get_node (expr);
-      if (!vnode
-	  || !lto_symtab_encoder_encode_initializer_p (encoder,
-						       vnode))
-	initial = error_mark_node;
+      /* Extra section needs about 30 bytes; do not produce it for simple
+	 scalar values.  */
+      if (TREE_CODE (DECL_INITIAL (expr)) == CONSTRUCTOR
+	  || !(vnode = varpool_get_node (expr))
+	  || !lto_symtab_encoder_encode_initializer_p (encoder, vnode))
+        initial = error_mark_node;
     }
 
   return initial;
@@ -369,7 +368,8 @@ lto_write_tree_1 (struct output_block *o
       && TREE_CODE (expr) != TRANSLATION_UNIT_DECL)
     {
       /* Handle DECL_INITIAL for symbols.  */
-      tree initial = get_symbol_initial_value (ob, expr);
+      tree initial = get_symbol_initial_value
+			 (ob->decl_state->symtab_node_encoder, expr);
       stream_write_tree (ob, initial, ref_p);
     }
 }
@@ -1195,7 +1286,8 @@ DFS_write_tree (struct output_block *ob,
 	      && TREE_CODE (expr) != TRANSLATION_UNIT_DECL)
 	    {
 	      /* Handle DECL_INITIAL for symbols.  */
-	      tree initial = get_symbol_initial_value (ob, expr);
+	      tree initial = get_symbol_initial_value (ob->decl_state->symtab_node_encoder,
+						       expr);
 	      DFS_write_tree (ob, cstate, initial, ref_p, ref_p);
 	    }
 	}
@@ -1808,7 +1900,7 @@ output_function (struct cgraph_node *nod
   ob = create_output_block (LTO_section_function_body);
 
   clear_line_info (ob);
-  ob->cgraph_node = node;
+  ob->symbol = node;
 
   gcc_assert (current_function_decl == NULL_TREE && cfun == NULL);
 
@@ -1899,6 +1991,32 @@ output_function (struct cgraph_node *nod
   destroy_output_block (ob);
 }
 
+/* Output the constructor of variable NODE->DECL.  */
+
+static void
+output_constructor (struct varpool_node *node)
+{
+  tree var = node->decl;
+  struct output_block *ob;
+
+  ob = create_output_block (LTO_section_function_body);
+
+  clear_line_info (ob);
+  ob->symbol = node;
+
+  /* Make string 0 be a NULL string.  */
+  streamer_write_char_stream (ob->string_stream, 0);
+
+  /* Output DECL_INITIAL of the variable, which holds its
+     constructor.  */
+  stream_write_tree (ob, DECL_INITIAL (var), true);
+
+  /* Create a section to hold the pickled output of this constructor.  */
+  produce_asm (ob, var);
+
+  destroy_output_block (ob);
+}
+
 
 /* Emit toplevel asms.  */
 
@@ -1957,10 +2075,10 @@ lto_output_toplevel_asms (void)
 }
 
 
-/* Copy the function body of NODE without deserializing. */
+/* Copy the function body or variable constructor of NODE without deserializing. */
 
 static void
-copy_function (struct cgraph_node *node)
+copy_function_or_variable (struct symtab_node *node)
 {
   tree function = node->decl;
   struct lto_file_decl_data *file_data = node->lto_file_data;
@@ -2072,7 +2190,7 @@ lto_output (void)
 	      if (gimple_has_body_p (node->decl) || !flag_wpa)
 		output_function (node);
 	      else
-		copy_function (node);
+		copy_function_or_variable (node);
 	      gcc_assert (lto_get_out_decl_state () == decl_state);
 	      lto_pop_out_decl_state ();
 	      lto_record_function_out_decl_state (node->decl, decl_state);
@@ -2085,6 +2203,25 @@ lto_output (void)
 	  tree ctor = DECL_INITIAL (node->decl);
 	  if (ctor && !in_lto_p)
 	    walk_tree (&ctor, wrap_refs, NULL, NULL);
+	  if (get_symbol_initial_value (encoder, node->decl) == error_mark_node
+	      && lto_symtab_encoder_encode_initializer_p (encoder, node)
+	      && !node->alias)
+	    {
+#ifdef ENABLE_CHECKING
+	      gcc_assert (!bitmap_bit_p (output, DECL_UID (node->decl)));
+	      bitmap_set_bit (output, DECL_UID (node->decl));
+#endif
+	      decl_state = lto_new_out_decl_state ();
+	      lto_push_out_decl_state (decl_state);
+	      if (DECL_INITIAL (node->decl) != error_mark_node
+		  || !flag_wpa)
+		output_constructor (node);
+	      else
+		copy_function_or_variable (node);
+	      gcc_assert (lto_get_out_decl_state () == decl_state);
+	      lto_pop_out_decl_state ();
+	      lto_record_function_out_decl_state (node->decl, decl_state);
+	    }
 	}
     }
 
Index: lto-streamer-in.c
===================================================================
--- lto-streamer-in.c	(revision 212426)
+++ lto-streamer-in.c	(working copy)
@@ -1029,6 +1029,15 @@ input_function (tree fn_decl, struct dat
   pop_cfun ();
 }
 
+/* Read the constructor of variable VAR from DATA_IN using input block IB.  */
+
+static void
+input_constructor (tree var, struct data_in *data_in,
+		   struct lto_input_block *ib)
+{
+  DECL_INITIAL (var) = stream_read_tree (ib, data_in);
+}
+
 
 /* Read the body from DATA for function NODE and fill it in.
    FILE_DATA are the global decls and types.  SECTION_TYPE is either
@@ -1037,8 +1046,8 @@ input_function (tree fn_decl, struct dat
    that function.  */
 
 static void
-lto_read_body (struct lto_file_decl_data *file_data, struct cgraph_node *node,
-	       const char *data, enum lto_section_type section_type)
+lto_read_body_or_constructor (struct lto_file_decl_data *file_data, struct symtab_node *node,
+			      const char *data, enum lto_section_type section_type)
 {
   const struct lto_function_header *header;
   struct data_in *data_in;
@@ -1050,19 +1059,32 @@ lto_read_body (struct lto_file_decl_data
   tree fn_decl = node->decl;
 
   header = (const struct lto_function_header *) data;
-  cfg_offset = sizeof (struct lto_function_header);
-  main_offset = cfg_offset + header->cfg_size;
-  string_offset = main_offset + header->main_size;
-
-  LTO_INIT_INPUT_BLOCK (ib_cfg,
-		        data + cfg_offset,
-			0,
-			header->cfg_size);
-
-  LTO_INIT_INPUT_BLOCK (ib_main,
-			data + main_offset,
-			0,
-			header->main_size);
+  if (TREE_CODE (node->decl) == FUNCTION_DECL)
+    {
+      cfg_offset = sizeof (struct lto_function_header);
+      main_offset = cfg_offset + header->cfg_size;
+      string_offset = main_offset + header->main_size;
+
+      LTO_INIT_INPUT_BLOCK (ib_cfg,
+			    data + cfg_offset,
+			    0,
+			    header->cfg_size);
+
+      LTO_INIT_INPUT_BLOCK (ib_main,
+			    data + main_offset,
+			    0,
+			    header->main_size);
+    }
+  else
+    {
+      main_offset = sizeof (struct lto_function_header);
+      string_offset = main_offset + header->main_size;
+
+      LTO_INIT_INPUT_BLOCK (ib_main,
+			    data + main_offset,
+			    0,
+			    header->main_size);
+    }
 
   data_in = lto_data_in_create (file_data, data + string_offset,
 			      header->string_size, vNULL);
@@ -1082,7 +1104,10 @@ lto_read_body (struct lto_file_decl_data
 
       /* Set up the struct function.  */
       from = data_in->reader_cache->nodes.length ();
-      input_function (fn_decl, data_in, &ib_main, &ib_cfg);
+      if (TREE_CODE (node->decl) == FUNCTION_DECL)
+        input_function (fn_decl, data_in, &ib_main, &ib_cfg);
+      else
+        input_constructor (fn_decl, data_in, &ib_main);
       /* And fixup types we streamed locally.  */
 	{
 	  struct streamer_tree_cache_d *cache = data_in->reader_cache;
@@ -1124,7 +1149,17 @@ void
 lto_input_function_body (struct lto_file_decl_data *file_data,
 			 struct cgraph_node *node, const char *data)
 {
-  lto_read_body (file_data, node, data, LTO_section_function_body);
+  lto_read_body_or_constructor (file_data, node, data, LTO_section_function_body);
+}
+
+/* Read the constructor of NODE using DATA.  FILE_DATA holds the global
+   decls and types.  */
+
+void
+lto_input_variable_constructor (struct lto_file_decl_data *file_data,
+				struct varpool_node *node, const char *data)
+{
+  lto_read_body_or_constructor (file_data, node, data, LTO_section_function_body);
 }
 
 
Index: ipa-prop.c
===================================================================
--- ipa-prop.c	(revision 212426)
+++ ipa-prop.c	(working copy)
@@ -4835,7 +4864,7 @@ ipa_prop_write_jump_functions (void)
 
   ob = create_output_block (LTO_section_jump_functions);
   encoder = ob->decl_state->symtab_node_encoder;
-  ob->cgraph_node = NULL;
+  ob->symbol = NULL;
   for (lsei = lsei_start_function_in_partition (encoder); !lsei_end_p (lsei);
        lsei_next_function_in_partition (&lsei))
     {
@@ -5011,7 +5040,7 @@ ipa_prop_write_all_agg_replacement (void
 
   ob = create_output_block (LTO_section_ipcp_transform);
   encoder = ob->decl_state->symtab_node_encoder;
-  ob->cgraph_node = NULL;
+  ob->symbol = NULL;
   for (lsei = lsei_start_function_in_partition (encoder); !lsei_end_p (lsei);
        lsei_next_function_in_partition (&lsei))
     {


* Re: Use separate sections to stream non-trivial constructors
  2014-07-11  9:18 Use separate sections to stream non-trivial constructors Jan Hubicka
@ 2014-07-11 11:32 ` Richard Biener
  2014-07-11 11:53   ` Jan Hubicka
  0 siblings, 1 reply; 7+ messages in thread
From: Richard Biener @ 2014-07-11 11:32 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc-patches

On Fri, 11 Jul 2014, Jan Hubicka wrote:

> Hi,
> since we both agreed that offlining constructors from the global decl stream is
> a good idea, I went ahead and implemented it.  I would like to follow up with
> cleanups; for example, the sections are still tagged as function sections, but I
> would like to do that incrementally.  There is quite some ugliness in the way we
> handle function sections and the patch started to snowball very quickly.
> 
> The patch conceptually copies what we do for functions and re-uses most of the
> infrastructure.  varpool_get_constructor is the analogue of cgraph_get_body
> (i.e. the means of reading a function body in).  It is used by the output
> machinery, by ipa-visibility while rewriting the constructor, and by
> ctor_for_folding (which makes us load the ctor whenever it is needed by ipa-cp
> or ipa-devirt).
> 
> I kept get_symbol_initial_value as the authority that decides whether we want
> to encode a given constructor or not.  The section itself for a trivial ctor is
> about 25 bytes, and with the header it is probably close to double that.
> Currently the heuristic is to offline only initializers that are CONSTRUCTOR
> nodes and to keep simple expressions inline.  We may want to tweak it.

Hmm, so what about an artificial testcase with gazillions of

struct X { int i; };

struct X a0001 = { 1 };
struct X a0002 = { 2 };
....

how does it explode LTO IL size and streaming time (compile-out and
LTRANS in)?  I suppose it still helps WPA stage.
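
A testcase of this shape is easy to generate mechanically.  The following is
a minimal sketch of such a generator; format_decl and emit_testcase are
hypothetical helper names, and the variable count is left to the caller.

```c
#include <assert.h>
#include <stdio.h>

/* Write the declaration of the INDEX-th variable into BUF (of size SIZE).
   Returns the number of characters written, as snprintf does.  */
static int
format_decl (char *buf, size_t size, unsigned index)
{
  return snprintf (buf, size, "struct X a%04u = { %u };\n", index, index);
}

/* Emit to OUT a translation unit with COUNT one-element struct
   initializers, numbered from 1.  */
static void
emit_testcase (FILE *out, unsigned count)
{
  char line[64];

  fputs ("struct X { int i; };\n\n", out);
  for (unsigned i = 1; i <= count; i++)
    {
      int len = format_decl (line, sizeof line, i);
      /* Every declaration must fit into the line buffer.  */
      assert (len > 0 && (size_t) len < sizeof line);
      fputs (line, out);
    }
}
```

Calling emit_testcase (stdout, N) for a large N and compiling the result with
-flto, with and without the patch, would give the object size and streaming
time to compare.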

Also what we desperately miss is to put CONST_DECLs into the symbol 
table (and thus eventually move the constant pool to symtab).  That
and no longer allowing STRING_CSTs in the IL but only CONST_DECLs
with STRING_CST initializers (to fix PR50199).

> The patch does not bring miraculous savings to firefox WPA, but it does bring some:
> 
> GGC memory after global stream is read goes from 1376898k to 1250533k
> overall GGC allocations from 4156478 kB to 4012462 kB
> read 11006599 SCCs of average size 1.907692 -> read 9119433 SCCs of average size 2.037867
> 20997206 tree bodies read in total -> 18584194 tree bodies read in total
> Size of mmap'd section decls: 299540188 bytes -> Size of mmap'd section decls: 271557265 bytes
> Size of mmap'd section function_body: 5711078 bytes -> Size of mmap'd section function_body: 7548680 bytes 
> 
> Things would be better if ipa-visibility and ipa-devirt did not load most of
> the virtual tables into memory (though this is still better than loading each
> into memory 20 times on average).  I will work on that incrementally.  We load
> 10311 ctors into memory at WPA time.
> 
> Note that firefox seems to feature a really huge data segment these days.
> http://hubicka.blogspot.ca/2014/04/linktime-optimization-in-gcc-2-firefox.html
> 
> Bootstrapped/regtested x86_64-linux, tested with firefox, lto bootstrap 
> in progress, OK?

The patch looks ok to me.  How about simply doing 
s/LTO_section_function_body/LTO_section_symbol_content/ instead of
adding LTO_section_variable_initializer?

Thanks,
Richard.

> 	* varpool.c: Include tree-ssa-alias.h, gimple.h and lto-streamer.h.
> 	(varpool_get_constructor): New function.
> 	(ctor_for_folding): Use it.
> 	(varpool_assemble_decl): Likewise.
> 	* lto-streamer.h (struct output_block): Turn cgraph_node
> 	into symbol field.
> 	(lto_input_variable_constructor): Declare.
> 	* ipa-visibility.c (function_and_variable_visibility): Use
> 	varpool_get_constructor.
> 	* cgraph.h (varpool_get_constructor): Declare.
> 	* lto-streamer-out.c (get_symbol_initial_value): Take encoder
> 	parameter; return error_mark_node for non-trivial constructors.
> 	(lto_write_tree_1, DFS_write_tree): Update use of
> 	get_symbol_initial_value.
> 	(output_function): Update initialization of symbol.
> 	(output_constructor): New function.
> 	(copy_function): Rename to ...
> 	(copy_function_or_variable): ... this one; handle vars too.
> 	(lto_output): Output variable sections.
> 	* lto-streamer-in.c (input_constructor): New function.
> 	(lto_read_body): Rename to ...
> 	(lto_read_body_or_constructor): ... this one; handle vars
> 	too.
> 	(lto_input_variable_constructor): New function.
> 	* ipa-prop.c (ipa_prop_write_jump_functions,
> 	ipa_prop_write_all_agg_replacement): Update.
> Index: varpool.c
> ===================================================================
> --- varpool.c	(revision 212426)
> +++ varpool.c	(working copy)
> @@ -35,6 +35,9 @@ along with GCC; see the file COPYING3.
>  #include "gimple-expr.h"
>  #include "flags.h"
>  #include "pointer-set.h"
> +#include "tree-ssa-alias.h"
> +#include "gimple.h"
> +#include "lto-streamer.h"
>  
>  const char * const tls_model_names[]={"none", "tls-emulated", "tls-real",
>  				      "tls-global-dynamic", "tls-local-dynamic",
> @@ -253,6 +256,41 @@ varpool_node_for_asm (tree asmname)
>      return NULL;
>  }
>  
> +/* When doing LTO, read NODE's constructor from disk if it is not already present.  */
> +
> +tree
> +varpool_get_constructor (struct varpool_node *node)
> +{
> +  struct lto_file_decl_data *file_data;
> +  const char *data, *name;
> +  size_t len;
> +  tree decl = node->decl;
> +
> +  if (DECL_INITIAL (node->decl) != error_mark_node
> +      || !in_lto_p)
> +    return DECL_INITIAL (node->decl);
> +
> +  file_data = node->lto_file_data;
> +  name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
> +
> +  /* We may have renamed the declaration, e.g., a static variable.  */
> +  name = lto_get_decl_name_mapping (file_data, name);
> +
> +  data = lto_get_section_data (file_data, LTO_section_function_body,
> +			       name, &len);
> +  if (!data)
> +    fatal_error ("%s: section %s is missing",
> +		 file_data->file_name,
> +		 name);
> +
> +  lto_input_variable_constructor (file_data, node, data);
> +  lto_stats.num_function_bodies++;
> +  lto_free_section_data (file_data, LTO_section_function_body, name,
> +			 data, len);
> +  lto_free_function_in_decl_state_for_node (node);
> +  return DECL_INITIAL (node->decl);
> +}
> +
>  /* Return if DECL is constant and its initial value is known (so we can do
>     constant folding using DECL_INITIAL (decl)).
>     Return ERROR_MARK_NODE when value is unknown.  */
> @@ -314,6 +352,9 @@ ctor_for_folding (tree decl)
>    if (DECL_VIRTUAL_P (real_decl))
>      {
>        gcc_checking_assert (TREE_READONLY (real_decl));
> +      if (DECL_INITIAL (real_decl) == error_mark_node
> +	  && (node = varpool_get_node (real_decl)))
> +	return varpool_get_constructor (node);
>        if (DECL_INITIAL (real_decl))
>  	return DECL_INITIAL (real_decl);
>        else
> @@ -349,6 +390,9 @@ ctor_for_folding (tree decl)
>  
>       ??? Previously we behaved so for scalar variables but not for array
>       accesses.  */
> +  if (DECL_INITIAL (real_decl) == error_mark_node
> +      && (node = varpool_get_node (real_decl)))
> +    return varpool_get_constructor (node);
>    return DECL_INITIAL (real_decl);
>  }
>  
> @@ -471,6 +515,7 @@ varpool_assemble_decl (varpool_node *nod
>    if (!node->in_other_partition
>        && !DECL_EXTERNAL (decl))
>      {
> +      varpool_get_constructor (node);
>        assemble_variable (decl, 0, 1, 0);
>        gcc_assert (TREE_ASM_WRITTEN (decl));
>        node->definition = true;
> Index: lto-streamer.h
> ===================================================================
> --- lto-streamer.h	(revision 212426)
> +++ lto-streamer.h	(working copy)
> @@ -685,9 +685,9 @@ struct output_block
>       far and the indexes assigned to them.  */
>    hash_table<string_slot_hasher> *string_hash_table;
>  
> -  /* The current cgraph_node that we are currently serializing.  Null
> +  /* The current symbol that we are currently serializing.  Null
>       if we are serializing something else.  */
> -  struct cgraph_node *cgraph_node;
> +  struct symtab_node *symbol;
>  
>    /* These are the last file and line that were seen in the stream.
>       If the current node differs from these, it needs to insert
> @@ -830,6 +830,9 @@ extern void lto_reader_init (void);
>  extern void lto_input_function_body (struct lto_file_decl_data *,
>  				     struct cgraph_node *,
>  				     const char *);
> +extern void lto_input_variable_constructor (struct lto_file_decl_data *,
> +					    struct varpool_node *,
> +					    const char *);
>  extern void lto_input_constructors_and_inits (struct lto_file_decl_data *,
>  					      const char *);
>  extern void lto_input_toplevel_asms (struct lto_file_decl_data *, int);
> Index: ipa-visibility.c
> ===================================================================
> --- ipa-visibility.c	(revision 212426)
> +++ ipa-visibility.c	(working copy)
> @@ -686,6 +686,8 @@ function_and_variable_visibility (bool w
>  	  if (found)
>  	    {
>  	      struct pointer_set_t *visited_nodes = pointer_set_create ();
> +
> +	      varpool_get_constructor (vnode);
>  	      walk_tree (&DECL_INITIAL (vnode->decl),
>  			 update_vtable_references, NULL, visited_nodes);
>  	      pointer_set_destroy (visited_nodes);
> Index: cgraph.h
> ===================================================================
> --- cgraph.h	(revision 212426)
> +++ cgraph.h	(working copy)
> @@ -1142,6 +1142,7 @@ void varpool_add_new_variable (tree);
>  void symtab_initialize_asm_name_hash (void);
>  void symtab_prevail_in_asm_name_hash (symtab_node *node);
>  void varpool_remove_initializer (varpool_node *);
> +tree varpool_get_constructor (struct varpool_node *node);
>  
>  /* In cgraph.c */
>  extern void change_decl_assembler_name (tree, tree);
> Index: lto-streamer-out.c
> ===================================================================
> --- lto-streamer-out.c	(revision 212426)
> +++ lto-streamer-out.c	(working copy)
> @@ -318,7 +319,7 @@ lto_is_streamable (tree expr)
>  /* For EXPR lookup and return what we want to stream to OB as DECL_INITIAL.  */
>  
>  static tree
> -get_symbol_initial_value (struct output_block *ob, tree expr)
> +get_symbol_initial_value (lto_symtab_encoder_t encoder, tree expr)
>  {
>    gcc_checking_assert (DECL_P (expr)
>  		       && TREE_CODE (expr) != FUNCTION_DECL
> @@ -331,15 +332,13 @@ get_symbol_initial_value (struct output_
>        && !DECL_IN_CONSTANT_POOL (expr)
>        && initial)
>      {
> -      lto_symtab_encoder_t encoder;
>        varpool_node *vnode;
> -
> -      encoder = ob->decl_state->symtab_node_encoder;
> -      vnode = varpool_get_node (expr);
> -      if (!vnode
> -	  || !lto_symtab_encoder_encode_initializer_p (encoder,
> -						       vnode))
> -	initial = error_mark_node;
> +      /* Extra section needs about 30 bytes; do not produce it for simple
> +	 scalar values.  */
> +      if (TREE_CODE (DECL_INITIAL (expr)) == CONSTRUCTOR
> +	  || !(vnode = varpool_get_node (expr))
> +	  || !lto_symtab_encoder_encode_initializer_p (encoder, vnode))
> +        initial = error_mark_node;
>      }
>  
>    return initial;
> @@ -369,7 +368,8 @@ lto_write_tree_1 (struct output_block *o
>        && TREE_CODE (expr) != TRANSLATION_UNIT_DECL)
>      {
>        /* Handle DECL_INITIAL for symbols.  */
> -      tree initial = get_symbol_initial_value (ob, expr);
> +      tree initial = get_symbol_initial_value
> +			 (ob->decl_state->symtab_node_encoder, expr);
>        stream_write_tree (ob, initial, ref_p);
>      }
>  }
> @@ -1195,7 +1286,8 @@ DFS_write_tree (struct output_block *ob,
>  	      && TREE_CODE (expr) != TRANSLATION_UNIT_DECL)
>  	    {
>  	      /* Handle DECL_INITIAL for symbols.  */
> -	      tree initial = get_symbol_initial_value (ob, expr);
> +	      tree initial = get_symbol_initial_value (ob->decl_state->symtab_node_encoder,
> +						       expr);
>  	      DFS_write_tree (ob, cstate, initial, ref_p, ref_p);
>  	    }
>  	}
> @@ -1808,7 +1900,7 @@ output_function (struct cgraph_node *nod
>    ob = create_output_block (LTO_section_function_body);
>  
>    clear_line_info (ob);
> -  ob->cgraph_node = node;
> +  ob->symbol = node;
>  
>    gcc_assert (current_function_decl == NULL_TREE && cfun == NULL);
>  
> @@ -1899,6 +1991,32 @@ output_function (struct cgraph_node *nod
>    destroy_output_block (ob);
>  }
>  
> +/* Output the constructor of variable NODE->DECL.  */
> +
> +static void
> +output_constructor (struct varpool_node *node)
> +{
> +  tree var = node->decl;
> +  struct output_block *ob;
> +
> +  ob = create_output_block (LTO_section_function_body);
> +
> +  clear_line_info (ob);
> +  ob->symbol = node;
> +
> +  /* Make string 0 be a NULL string.  */
> +  streamer_write_char_stream (ob->string_stream, 0);
> +
> +  /* Output DECL_INITIAL of the variable, which holds its
> +     constructor.  */
> +  stream_write_tree (ob, DECL_INITIAL (var), true);
> +
> +  /* Create a section to hold the pickled output of this constructor.  */
> +  produce_asm (ob, var);
> +
> +  destroy_output_block (ob);
> +}
> +
>  
>  /* Emit toplevel asms.  */
>  
> @@ -1957,10 +2075,10 @@ lto_output_toplevel_asms (void)
>  }
>  
>  
> -/* Copy the function body of NODE without deserializing. */
> +/* Copy the function body or variable constructor of NODE without deserializing. */
>  
>  static void
> -copy_function (struct cgraph_node *node)
> +copy_function_or_variable (struct symtab_node *node)
>  {
>    tree function = node->decl;
>    struct lto_file_decl_data *file_data = node->lto_file_data;
> @@ -2072,7 +2190,7 @@ lto_output (void)
>  	      if (gimple_has_body_p (node->decl) || !flag_wpa)
>  		output_function (node);
>  	      else
> -		copy_function (node);
> +		copy_function_or_variable (node);
>  	      gcc_assert (lto_get_out_decl_state () == decl_state);
>  	      lto_pop_out_decl_state ();
>  	      lto_record_function_out_decl_state (node->decl, decl_state);
> @@ -2085,6 +2203,25 @@ lto_output (void)
>  	  tree ctor = DECL_INITIAL (node->decl);
>  	  if (ctor && !in_lto_p)
>  	    walk_tree (&ctor, wrap_refs, NULL, NULL);
> +	  if (get_symbol_initial_value (encoder, node->decl) == error_mark_node
> +	      && lto_symtab_encoder_encode_initializer_p (encoder, node)
> +	      && !node->alias)
> +	    {
> +#ifdef ENABLE_CHECKING
> +	      gcc_assert (!bitmap_bit_p (output, DECL_UID (node->decl)));
> +	      bitmap_set_bit (output, DECL_UID (node->decl));
> +#endif
> +	      decl_state = lto_new_out_decl_state ();
> +	      lto_push_out_decl_state (decl_state);
> +	      if (DECL_INITIAL (node->decl) != error_mark_node
> +		  || !flag_wpa)
> +		output_constructor (node);
> +	      else
> +		copy_function_or_variable (node);
> +	      gcc_assert (lto_get_out_decl_state () == decl_state);
> +	      lto_pop_out_decl_state ();
> +	      lto_record_function_out_decl_state (node->decl, decl_state);
> +	    }
>  	}
>      }
>  
> Index: lto-streamer-in.c
> ===================================================================
> --- lto-streamer-in.c	(revision 212426)
> +++ lto-streamer-in.c	(working copy)
> @@ -1029,6 +1029,15 @@ input_function (tree fn_decl, struct dat
>    pop_cfun ();
>  }
>  
> +/* Read the constructor of variable VAR from DATA_IN using input block IB.  */
> +
> +static void
> +input_constructor (tree var, struct data_in *data_in,
> +		   struct lto_input_block *ib)
> +{
> +  DECL_INITIAL (var) = stream_read_tree (ib, data_in);
> +}
> +
>  
>  /* Read the body from DATA for function NODE and fill it in.
>     FILE_DATA are the global decls and types.  SECTION_TYPE is either
> @@ -1037,8 +1046,8 @@ input_function (tree fn_decl, struct dat
>     that function.  */
>  
>  static void
> -lto_read_body (struct lto_file_decl_data *file_data, struct cgraph_node *node,
> -	       const char *data, enum lto_section_type section_type)
> +lto_read_body_or_constructor (struct lto_file_decl_data *file_data, struct symtab_node *node,
> +			      const char *data, enum lto_section_type section_type)
>  {
>    const struct lto_function_header *header;
>    struct data_in *data_in;
> @@ -1050,19 +1059,32 @@ lto_read_body (struct lto_file_decl_data
>    tree fn_decl = node->decl;
>  
>    header = (const struct lto_function_header *) data;
> -  cfg_offset = sizeof (struct lto_function_header);
> -  main_offset = cfg_offset + header->cfg_size;
> -  string_offset = main_offset + header->main_size;
> -
> -  LTO_INIT_INPUT_BLOCK (ib_cfg,
> -		        data + cfg_offset,
> -			0,
> -			header->cfg_size);
> -
> -  LTO_INIT_INPUT_BLOCK (ib_main,
> -			data + main_offset,
> -			0,
> -			header->main_size);
> +  if (TREE_CODE (node->decl) == FUNCTION_DECL)
> +    {
> +      cfg_offset = sizeof (struct lto_function_header);
> +      main_offset = cfg_offset + header->cfg_size;
> +      string_offset = main_offset + header->main_size;
> +
> +      LTO_INIT_INPUT_BLOCK (ib_cfg,
> +			    data + cfg_offset,
> +			    0,
> +			    header->cfg_size);
> +
> +      LTO_INIT_INPUT_BLOCK (ib_main,
> +			    data + main_offset,
> +			    0,
> +			    header->main_size);
> +    }
> +  else
> +    {
> +      main_offset = sizeof (struct lto_function_header);
> +      string_offset = main_offset + header->main_size;
> +
> +      LTO_INIT_INPUT_BLOCK (ib_main,
> +			    data + main_offset,
> +			    0,
> +			    header->main_size);
> +    }
>  
>    data_in = lto_data_in_create (file_data, data + string_offset,
>  			      header->string_size, vNULL);
> @@ -1082,7 +1104,10 @@ lto_read_body (struct lto_file_decl_data
>  
>        /* Set up the struct function.  */
>        from = data_in->reader_cache->nodes.length ();
> -      input_function (fn_decl, data_in, &ib_main, &ib_cfg);
> +      if (TREE_CODE (node->decl) == FUNCTION_DECL)
> +        input_function (fn_decl, data_in, &ib_main, &ib_cfg);
> +      else
> +        input_constructor (fn_decl, data_in, &ib_main);
>        /* And fixup types we streamed locally.  */
>  	{
>  	  struct streamer_tree_cache_d *cache = data_in->reader_cache;
> @@ -1124,7 +1149,17 @@ void
>  lto_input_function_body (struct lto_file_decl_data *file_data,
>  			 struct cgraph_node *node, const char *data)
>  {
> -  lto_read_body (file_data, node, data, LTO_section_function_body);
> +  lto_read_body_or_constructor (file_data, node, data, LTO_section_function_body);
> +}
> +
> +/* Read the constructor of NODE using DATA.  FILE_DATA holds the global
> +   decls and types.  */
> +
> +void
> +lto_input_variable_constructor (struct lto_file_decl_data *file_data,
> +				struct varpool_node *node, const char *data)
> +{
> +  lto_read_body_or_constructor (file_data, node, data, LTO_section_function_body);
>  }
>  
>  
> Index: ipa-prop.c
> ===================================================================
> --- ipa-prop.c	(revision 212426)
> +++ ipa-prop.c	(working copy)
> @@ -4835,7 +4864,7 @@ ipa_prop_write_jump_functions (void)
>  
>    ob = create_output_block (LTO_section_jump_functions);
>    encoder = ob->decl_state->symtab_node_encoder;
> -  ob->cgraph_node = NULL;
> +  ob->symbol = NULL;
>    for (lsei = lsei_start_function_in_partition (encoder); !lsei_end_p (lsei);
>         lsei_next_function_in_partition (&lsei))
>      {
> @@ -5011,7 +5040,7 @@ ipa_prop_write_all_agg_replacement (void
>  
>    ob = create_output_block (LTO_section_ipcp_transform);
>    encoder = ob->decl_state->symtab_node_encoder;
> -  ob->cgraph_node = NULL;
> +  ob->symbol = NULL;
>    for (lsei = lsei_start_function_in_partition (encoder); !lsei_end_p (lsei);
>         lsei_next_function_in_partition (&lsei))
>      {
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer

* Re: Use separate sections to stream non-trivial constructors
  2014-07-11 11:32 ` Richard Biener
@ 2014-07-11 11:53   ` Jan Hubicka
  2014-07-11 12:00     ` Richard Biener
  0 siblings, 1 reply; 7+ messages in thread
From: Jan Hubicka @ 2014-07-11 11:53 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jan Hubicka, gcc-patches

> On Fri, 11 Jul 2014, Jan Hubicka wrote:
> 
> > Hi,
> > since we both agreed offlining constructors from global decl stream is a good
> > idea, I went ahead and implemented it.  I would like to followup by an
> > cleanups; for example the sections are still tagged as function sections, but I
> > would like to do it incrementally. There is quite some uglyness in the way we
> > handle function sections and the patch started to snowball very quickly.
> > 
> > The patch conceptually copies what we do for functions and re-uses most of
> > infrastructure. varpool_get_constructor is cgraph_get_body (i.e. mean of
> > getting function in) and it is used by output machinery, by ipa-visibility
> > while rewritting the constructor and by ctor_for_folding (which makes us to
> > load the ctor whenever it is needed by ipa-cp or ipa-devirt).
> > 
> > I kept get_symbol_initial_value as an authority to decide if we want to encode
> > given constructor or not.  The section itself for trivial ctor is about 25
> > bytes and with header it is probably close to double of it. Currently the heuristic
> > is to offline only constructors that are CONSTRUCTOR and keep simple expressions
> > inline.  We may want to tweak it.
> 
> Hmm, so what about artificial testcase with gazillions of
> 
> struct X { int i; };
> 
> struct X a0001 = { 1 };
> struct X a0002 = { 2 };
> ....
> 
> how does it explode LTO IL size and streaming time (compile-out and
> LTRANS in)?  I suppose it still helps WPA stage.

Well, nothing really artificial, except that gazillions of static variables
called a0001 to a000gazillion are ugly :))

I just put the CONSTRUCTOR bits in the initial version so that the path is not
entirely unused.  Either we can base our decision on the size of the variable or
do a simple walk to see if it needs more than, say, 8 trees.
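
To make the "simple walk" idea concrete, here is a minimal sketch in plain C.
It is not GCC code: struct toy_tree, count_up_to and worth_offlining_p are
hypothetical names standing in for the real tree type and for whatever
predicate the streamer would end up using.  The walk gives up as soon as the
limit is exceeded, so the cost stays bounded even for huge initializers.  It
also assumes the initializer is a proper tree (a shared subtree would be
counted twice).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a GCC tree node; NOT the real `tree' type.  */
struct toy_tree
{
  size_t n_ops;             /* Number of operands in use (at most 4).  */
  struct toy_tree *ops[4];  /* Operands; NULL entries are skipped.  */
};

/* Count nodes reachable from T, starting from SEEN, and stop descending
   as soon as the count exceeds LIMIT.  Return the count reached.  */
static size_t
count_up_to (const struct toy_tree *t, size_t limit, size_t seen)
{
  if (!t || seen > limit)
    return seen;
  assert (t->n_ops <= 4);
  seen++;
  for (size_t i = 0; i < t->n_ops && seen <= limit; i++)
    seen = count_up_to (t->ops[i], limit, seen);
  return seen;
}

/* Hypothetical policy: give the initializer its own section only when
   it needs more than LIMIT nodes; smaller ones stay inline.  */
bool
worth_offlining_p (const struct toy_tree *init, size_t limit)
{
  return count_up_to (init, limit, 0) > limit;
}
```

With a limit of 8, a single-node initializer stays inline while a chain of ten
nodes is offlined; the walk over the chain stops after visiting the ninth node.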

I will play with this incrementally after cleaning up the headers (as those
account for the overhead).
> 
> Also what we desparately miss is to put CONST_DECLs into the symbol 
> table (and thus eventually move the constant pool to symtab).  That
> and no longer allowing STRING_CSTs in the IL but only CONST_DECLs
> with STRING_CST initializers (to fix PR50199).

Yep, I have a patch for putting CONST_DECLs into the symbol table. It however
does not help partitionability because at the moment the output machinery does
not expect const decls to have visibilities.

I will push out that change (and LABEL_DECL, too) after Martin's renaming
patches land in mainline.
> 
> > The patch does not bring miraculous savings to firefox WPA, but it does some:
> > 
> > GGC memory after global stream is read goes from 1376898k to 1250533k
> > overall GGC allocations from 4156478 kB to 4012462 kB
> > read 11006599 SCCs of average size 1.907692 -> read 9119433 SCCs of average size 2.037867
> > 20997206 tree bodies read in total -> 18584194 tree bodies read in total
> > Size of mmap'd section decls: 299540188 bytes -> Size of mmap'd section decls: 271557265 bytes
> > Size of mmap'd section function_body: 5711078 bytes -> Size of mmap'd section function_body: 7548680 bytes 
> > 
> > Things would be better if ipa-visibility and ipa-devirt did not load most of
> > the virtual tables into memory (still better than loading each into memory 20
> > times at average).  I will work on that incrementally. We load 10311 ctors into
> > memory at WPA time.
> > 
> > Note that firefox seems to feature really huge data segment these days.
> > http://hubicka.blogspot.ca/2014/04/linktime-optimization-in-gcc-2-firefox.html
> > 
> > Bootstrapped/regtested x86_64-linux, tested with firefox, lto bootstrap 
> > in progress, OK?
> 
> The patch looks ok to me.  How about simply doing 
> s/LTO_section_function_body/LTO_section_symbol_content/ instead of
> adding LTO_section_variable_initializer?

Yeah, I was thinking about it, too.
I think function and constructor sections may differ in their headers, however,
since we do not need the CFG stream for variables.
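
A rough sketch of what the differing layout means for the reader, mirroring
the two branches the patch adds to lto_read_body_or_constructor.  This is plain
C with made-up types: struct toy_header only models the three size fields used
here, and section_layout is a hypothetical helper, not something in the tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for lto_function_header; only the sizes used
   below are modelled.  */
struct toy_header
{
  size_t cfg_size;
  size_t main_size;
  size_t string_size;
};

struct toy_layout
{
  bool has_cfg;          /* True for function sections only.  */
  size_t cfg_offset;     /* Meaningful only when has_cfg is true.  */
  size_t main_offset;
  size_t string_offset;
};

/* Compute where each stream starts inside a section.  Function sections
   are header + cfg + main + strings; variable sections skip the cfg
   stream, so main follows the header directly.  */
struct toy_layout
section_layout (const struct toy_header *h, bool is_function)
{
  struct toy_layout l = { is_function, 0, 0, 0 };
  size_t pos = sizeof (struct toy_header);

  assert (h != NULL);
  if (is_function)
    {
      l.cfg_offset = pos;
      pos += h->cfg_size;
    }
  l.main_offset = pos;
  l.string_offset = pos + h->main_size;
  return l;
}
```

For the same header, the main stream of a variable section starts cfg_size
bytes earlier than that of a function section, which is the space a dedicated
variable header could save.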

Thanks!
Honza
> 
> Thanks,
> Richard.

* Re: Use separate sections to stream non-trivial constructors
  2014-07-11 11:53   ` Jan Hubicka
@ 2014-07-11 12:00     ` Richard Biener
  2014-07-11 12:09       ` Jan Hubicka
  0 siblings, 1 reply; 7+ messages in thread
From: Richard Biener @ 2014-07-11 12:00 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc-patches

On Fri, 11 Jul 2014, Jan Hubicka wrote:

> > On Fri, 11 Jul 2014, Jan Hubicka wrote:
> > 
> > > Hi,
> > > since we both agreed offlining constructors from global decl stream is a good
> > > idea, I went ahead and implemented it.  I would like to followup by an
> > > cleanups; for example the sections are still tagged as function sections, but I
> > > would like to do it incrementally. There is quite some uglyness in the way we
> > > handle function sections and the patch started to snowball very quickly.
> > > 
> > > The patch conceptually copies what we do for functions and re-uses most of
> > > infrastructure. varpool_get_constructor is cgraph_get_body (i.e. mean of
> > > getting function in) and it is used by output machinery, by ipa-visibility
> > > while rewritting the constructor and by ctor_for_folding (which makes us to
> > > load the ctor whenever it is needed by ipa-cp or ipa-devirt).
> > > 
> > > I kept get_symbol_initial_value as an authority to decide if we want to encode
> > > given constructor or not.  The section itself for trivial ctor is about 25
> > > bytes and with header it is probably close to double of it. Currently the heuristic
> > > is to offline only constructors that are CONSTRUCTOR and keep simple expressions
> > > inline.  We may want to tweak it.
> > 
> > Hmm, so what about artificial testcase with gazillions of
> > 
> > struct X { int i; };
> > 
> > struct X a0001 = { 1 };
> > struct X a0002 = { 2 };
> > ....
> > 
> > how does it explode LTO IL size and streaming time (compile-out and
> > LTRANS in)?  I suppose it still helps WPA stage.
> 
> Well, nothing really artificial, except that gazillions of static variables
> called a0001 to a000gazillion are ugly :))
> 
> I just put the CONSTRUCTOR bits in the initial version so that the path is not
> entirely unused.  Either we can base our decision on the size of the variable or
> do a simple walk to see if it needs more than, say, 8 trees.

Hum, probably not worth special-casing.

> I will play with this incrementally after cleaning up the headers (as those
> account for the overhead).
> > 
> > Also what we desparately miss is to put CONST_DECLs into the symbol 
> > table (and thus eventually move the constant pool to symtab).  That
> > and no longer allowing STRING_CSTs in the IL but only CONST_DECLs
> > with STRING_CST initializers (to fix PR50199).
> 
> Yep, I have a patch for putting CONST_DECLs into the symbol table. It however
> does not help partitionability because at the moment the output machinery does
> not expect const decls to have visibilities.

Well, just make them regular (anonymous) VAR_DECLs then ... (the fact
that a CONST_DECL is anonymous is probably the only real difference - 
and that they are mergeable by content).

> I will push out that change (and LABEL_DECL, too) after Martin's renaming
> patches land in mainline.

Thanks.

> > 
> > > The patch does not bring miraculous savings to firefox WPA, but it does some:
> > > 
> > > GGC memory after global stream is read goes from 1376898k to 1250533k
> > > overall GGC allocations from 4156478 kB to 4012462 kB
> > > read 11006599 SCCs of average size 1.907692 -> read 9119433 SCCs of average size 2.037867
> > > 20997206 tree bodies read in total -> 18584194 tree bodies read in total
> > > Size of mmap'd section decls: 299540188 bytes -> Size of mmap'd section decls: 271557265 bytes
> > > Size of mmap'd section function_body: 5711078 bytes -> Size of mmap'd section function_body: 7548680 bytes 
> > > 
> > > Things would be better if ipa-visibility and ipa-devirt did not load most of
> > > the virtual tables into memory (still better than loading each into memory 20
> > > times at average).  I will work on that incrementally. We load 10311 ctors into
> > > memory at WPA time.
> > > 
> > > Note that firefox seems to feature really huge data segment these days.
> > > http://hubicka.blogspot.ca/2014/04/linktime-optimization-in-gcc-2-firefox.html
> > > 
> > > Bootstrapped/regtested x86_64-linux, tested with firefox, lto bootstrap 
> > > in progress, OK?
> > 
> > The patch looks ok to me.  How about simply doing 
> > s/LTO_section_function_body/LTO_section_symbol_content/ instead of
> > adding LTO_section_variable_initializer?
> 
> Yeah, I was thinking about it, too.
> I think function and constructor sections may differ in their headers, however,
> since we do not need the CFG stream for variables.
> 
> Thanks!
> Honza
> > 
> > Thanks,
> > Richard.
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer

* Re: Use separate sections to stream non-trivial constructors
  2014-07-11 12:00     ` Richard Biener
@ 2014-07-11 12:09       ` Jan Hubicka
  2014-07-11 17:47         ` Jan Hubicka
  0 siblings, 1 reply; 7+ messages in thread
From: Jan Hubicka @ 2014-07-11 12:09 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jan Hubicka, gcc-patches

> 
> Well, just make them regular (anonymous) VAR_DECLs then ... (the fact
> that a CONST_DECL is anonymous is probably the only real difference - 
> and that they are mergeable by content).

Something like that, perhaps. I plan to do that incrementally; having them in
the symbol table first is an important step. There is also an option to turn a
CONST_DECL into a VAR_DECL when it is being made hidden.

Currently things are a bit inflexible because we still distinguish between
const decls and var decls in the tree representation.  Once I finish my transition
to push out the DECL_WITH_VIS and DECL_NON_COMMON fields, we can turn CONST_DECL
into VAR_DECL with a special flag saying that the symbol name/address value doesn't
matter.

This and, of course, cleaning up the constpool mess can keep me occupied for months
;)

Honza

* Re: Use separate sections to stream non-trivial constructors
  2014-07-11 12:09       ` Jan Hubicka
@ 2014-07-11 17:47         ` Jan Hubicka
  2014-07-11 18:42           ` Jan Hubicka
  0 siblings, 1 reply; 7+ messages in thread
From: Jan Hubicka @ 2014-07-11 17:47 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Richard Biener, gcc-patches

Hi,
this is the variant of the patch I committed. I noticed that partitioning actually
calls ctor_for_folding just to figure out whether the constant value may be used,
which drags every readonly variable ctor into memory at WPA.
So now we have a separate predicate, varpool_ctor_useable_for_folding_p, to check
whether the ctor can be used, and ctor_for_folding to drag in the actual value.
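
The split can be illustrated with a toy model in plain C.  Nothing below is
GCC API: struct toy_var, toy_ctor_useable_p and toy_ctor_for_folding are
hypothetical stand-ins, and "reading from disk" is simulated by copying a
pointer.  The point is only that the predicate answers without loading
anything, while the getter loads at most once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a variable whose initializer may still be on disk.  */
struct toy_var
{
  bool readonly;
  bool on_disk;           /* Initializer not yet read in.  */
  const int *init;        /* In-memory initializer; NULL until loaded.  */
  const int *disk_image;  /* What reading the section would produce.  */
  unsigned loads;         /* How many times we went to "disk".  */
};

/* Cheap check: may the initializer be used for folding?  Never loads.  */
bool
toy_ctor_useable_p (const struct toy_var *v)
{
  return v->readonly && (v->init != NULL || v->on_disk);
}

/* Return the initializer, reading it in on first use.  Return NULL when
   it may not be used for folding.  */
const int *
toy_ctor_for_folding (struct toy_var *v)
{
  if (!toy_ctor_useable_p (v))
    return NULL;
  if (!v->init)
    {
      assert (v->on_disk && v->disk_image != NULL);
      v->init = v->disk_image;
      v->on_disk = false;
      v->loads++;
    }
  return v->init;
}
```

A caller such as the partitioner that only needs a yes/no answer would use the
predicate and leave the load counter at zero; only code that actually folds
would call the getter.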

This actually happens too late to show in the LTO report.  In a followup patch I
will add a timevar for the delayed streaming so we have some idea how often it
triggers and how much memory it takes.

Bootstrapped/regtested x86_64-linux and committed.

Honza

	* varpool.c: Include tree-ssa-alias.h, gimple.h and lto-streamer.h.
	(varpool_get_constructor): New function.
	(varpool_ctor_useable_for_folding_p): Break out from ...
	(ctor_for_folding): ... here; use varpool_get_constructor.
	(varpool_assemble_decl): Likewise.
	* lto-streamer.h (struct output_block): Turn cgraph_node
	to symbol field.
	(lto_input_variable_constructor): Declare.
	* ipa-visibility.c (function_and_variable_visibility): Use
	varpool_get_constructor.
	* cgraph.h (varpool_get_constructor): Declare.
	(varpool_ctor_useable_for_folding_p): Declare.
	* lto-streamer-out.c (get_symbol_initial_value): Take encoder
	parameter; return error_mark_node for non-trivial constructors.
	(lto_write_tree_1, DFS_write_tree): Update use of
	get_symbol_initial_value.
	(output_function): Update initialization of symbol.
	(output_constructor): New function.
	(copy_function): Rename to ...
	(copy_function_or_variable): ... this one; handle vars too.
	(lto_output): Output variable sections.
	* lto-streamer-in.c (input_constructor): New function.
	(lto_read_body): Rename to ...
	(lto_read_body_or_constructor): ... this one; handle vars
	too.
	(lto_input_variable_constructor): New function.
	* ipa-prop.c (ipa_prop_write_jump_functions,
	ipa_prop_write_all_agg_replacement): Update.
	* lto-cgraph.c (compute_ltrans_boundary): Use it.
	(output_cgraph_opt_summary): Set symbol to NULL.

	* lto-partition.c (add_references_to_partition): Use 
	varpool_ctor_useable_for_folding_p.
	* lto.c (lto_read_in_decl_state): Update sanity check.
Index: ipa-visibility.c
===================================================================
--- ipa-visibility.c	(revision 212457)
+++ ipa-visibility.c	(working copy)
@@ -686,6 +686,8 @@ function_and_variable_visibility (bool w
 	  if (found)
 	    {
 	      struct pointer_set_t *visited_nodes = pointer_set_create ();
+
+	      varpool_get_constructor (vnode);
 	      walk_tree (&DECL_INITIAL (vnode->decl),
 			 update_vtable_references, NULL, visited_nodes);
 	      pointer_set_destroy (visited_nodes);
Index: lto/lto.c
===================================================================
--- lto/lto.c	(revision 212457)
+++ lto/lto.c	(working copy)
@@ -236,7 +236,7 @@ lto_read_in_decl_state (struct data_in *
 
   ix = *data++;
   decl = streamer_tree_cache_get_tree (data_in->reader_cache, ix);
-  if (TREE_CODE (decl) != FUNCTION_DECL)
+  if (!VAR_OR_FUNCTION_DECL_P (decl))
     {
       gcc_assert (decl == void_type_node);
       decl = NULL_TREE;
Index: lto/lto-partition.c
===================================================================
--- lto/lto-partition.c	(revision 212457)
+++ lto/lto-partition.c	(working copy)
@@ -96,7 +96,7 @@ add_references_to_partition (ltrans_part
        Recursively look into the initializers of the constant variable and add
        references, too.  */
     else if (is_a <varpool_node *> (ref->referred)
-	     && ctor_for_folding (ref->referred->decl) != error_mark_node
+	     && varpool_ctor_useable_for_folding_p (varpool (ref->referred))
 	     && !lto_symtab_encoder_in_partition_p (part->encoder, ref->referred))
       {
 	if (!part->initializers_visited)
Index: lto-cgraph.c
===================================================================
--- lto-cgraph.c	(revision 212457)
+++ lto-cgraph.c	(working copy)
@@ -867,7 +867,7 @@ compute_ltrans_boundary (lto_symtab_enco
 	{
 	  if (!lto_symtab_encoder_encode_initializer_p (encoder,
 							vnode)
-	      && ctor_for_folding (vnode->decl) != error_mark_node)
+	      && varpool_ctor_useable_for_folding_p (vnode))
 	    {
 	      lto_set_symtab_encoder_encode_initializer (encoder, vnode);
 	      add_references (encoder, vnode);
@@ -1808,7 +1808,7 @@ output_cgraph_opt_summary (void)
   struct output_block *ob = create_output_block (LTO_section_cgraph_opt_sum);
   unsigned count = 0;
 
-  ob->cgraph_node = NULL;
+  ob->symbol = NULL;
   encoder = ob->decl_state->symtab_node_encoder;
   n_nodes = lto_symtab_encoder_size (encoder);
   for (i = 0; i < n_nodes; i++)
Index: ipa-prop.c
===================================================================
--- ipa-prop.c	(revision 212466)
+++ ipa-prop.c	(working copy)
@@ -4848,7 +4848,7 @@ ipa_prop_write_jump_functions (void)
 
   ob = create_output_block (LTO_section_jump_functions);
   encoder = ob->decl_state->symtab_node_encoder;
-  ob->cgraph_node = NULL;
+  ob->symbol = NULL;
   for (lsei = lsei_start_function_in_partition (encoder); !lsei_end_p (lsei);
        lsei_next_function_in_partition (&lsei))
     {
@@ -5024,7 +5024,7 @@ ipa_prop_write_all_agg_replacement (void
 
   ob = create_output_block (LTO_section_ipcp_transform);
   encoder = ob->decl_state->symtab_node_encoder;
-  ob->cgraph_node = NULL;
+  ob->symbol = NULL;
   for (lsei = lsei_start_function_in_partition (encoder); !lsei_end_p (lsei);
        lsei_next_function_in_partition (&lsei))
     {
Index: cgraph.h
===================================================================
--- cgraph.h	(revision 212457)
+++ cgraph.h	(working copy)
@@ -1134,6 +1134,7 @@ void varpool_analyze_node (varpool_node
 varpool_node * varpool_extra_name_alias (tree, tree);
 varpool_node * varpool_create_variable_alias (tree, tree);
 void varpool_reset_queue (void);
+bool varpool_ctor_useable_for_folding_p (varpool_node *);
 tree ctor_for_folding (tree);
 bool varpool_for_node_and_aliases (varpool_node *,
 		                   bool (*) (varpool_node *, void *),
@@ -1142,6 +1143,7 @@ void varpool_add_new_variable (tree);
 void symtab_initialize_asm_name_hash (void);
 void symtab_prevail_in_asm_name_hash (symtab_node *node);
 void varpool_remove_initializer (varpool_node *);
+tree varpool_get_constructor (struct varpool_node *node);
 
 /* In cgraph.c */
 extern void change_decl_assembler_name (tree, tree);
Index: lto-streamer.h
===================================================================
--- lto-streamer.h	(revision 212457)
+++ lto-streamer.h	(working copy)
@@ -685,9 +685,9 @@ struct output_block
      far and the indexes assigned to them.  */
   hash_table<string_slot_hasher> *string_hash_table;
 
-  /* The current cgraph_node that we are currently serializing.  Null
+  /* The current symbol that we are currently serializing.  Null
      if we are serializing something else.  */
-  struct cgraph_node *cgraph_node;
+  struct symtab_node *symbol;
 
   /* These are the last file and line that were seen in the stream.
      If the current node differs from these, it needs to insert
@@ -830,6 +830,9 @@ extern void lto_reader_init (void);
 extern void lto_input_function_body (struct lto_file_decl_data *,
 				     struct cgraph_node *,
 				     const char *);
+extern void lto_input_variable_constructor (struct lto_file_decl_data *,
+					    struct varpool_node *,
+					    const char *);
 extern void lto_input_constructors_and_inits (struct lto_file_decl_data *,
 					      const char *);
 extern void lto_input_toplevel_asms (struct lto_file_decl_data *, int);
Index: varpool.c
===================================================================
--- varpool.c	(revision 212457)
+++ varpool.c	(working copy)
@@ -35,6 +35,9 @@ along with GCC; see the file COPYING3.
 #include "gimple-expr.h"
 #include "flags.h"
 #include "pointer-set.h"
+#include "tree-ssa-alias.h"
+#include "gimple.h"
+#include "lto-streamer.h"
 
 const char * const tls_model_names[]={"none", "tls-emulated", "tls-real",
 				      "tls-global-dynamic", "tls-local-dynamic",
@@ -163,19 +166,17 @@ varpool_node_for_decl (tree decl)
 void
 varpool_remove_node (varpool_node *node)
 {
-  tree init;
   varpool_call_node_removal_hooks (node);
   symtab_unregister_node (node);
 
-  /* Because we remove references from external functions before final compilation,
-     we may end up removing useful constructors.
-     FIXME: We probably want to trace boundaries better.  */
+  /* When streaming we can have multiple nodes associated with decl.  */
   if (cgraph_state == CGRAPH_LTO_STREAMING)
     ;
-  else if ((init = ctor_for_folding (node->decl)) == error_mark_node)
+  /* Keep constructor when it may be used for folding. We remove
+     references to external variables before final compilation.  */
+  else if (DECL_INITIAL (node->decl) && DECL_INITIAL (node->decl) != error_mark_node
+	   && !varpool_ctor_useable_for_folding_p (node))
     varpool_remove_initializer (node);
-  else
-    DECL_INITIAL (node->decl) = init;
   ggc_free (node);
 }
 
@@ -215,7 +216,7 @@ dump_varpool_node (FILE *f, varpool_node
     fprintf (f, " used-by-single-function");
   if (TREE_READONLY (node->decl))
     fprintf (f, " read-only");
-  if (ctor_for_folding (node->decl) != error_mark_node)
+  if (varpool_ctor_useable_for_folding_p (node))
     fprintf (f, " const-value-known");
   if (node->writeonly)
     fprintf (f, " write-only");
@@ -253,9 +254,101 @@ varpool_node_for_asm (tree asmname)
     return NULL;
 }
 
-/* Return if DECL is constant and its initial value is known (so we can do
-   constant folding using DECL_INITIAL (decl)).
-   Return ERROR_MARK_NODE when value is unknown.  */
+/* When doing LTO, read NODE's constructor from disk if it is not already present.  */
+
+tree
+varpool_get_constructor (struct varpool_node *node)
+{
+  struct lto_file_decl_data *file_data;
+  const char *data, *name;
+  size_t len;
+  tree decl = node->decl;
+
+  if (DECL_INITIAL (node->decl) != error_mark_node
+      || !in_lto_p)
+    return DECL_INITIAL (node->decl);
+
+  file_data = node->lto_file_data;
+  name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
+
+  /* We may have renamed the declaration, e.g., a static function.  */
+  name = lto_get_decl_name_mapping (file_data, name);
+
+  data = lto_get_section_data (file_data, LTO_section_function_body,
+			       name, &len);
+  if (!data)
+    fatal_error ("%s: section %s is missing",
+		 file_data->file_name,
+		 name);
+
+  lto_input_variable_constructor (file_data, node, data);
+  lto_stats.num_function_bodies++;
+  lto_free_section_data (file_data, LTO_section_function_body, name,
+			 data, len);
+  lto_free_function_in_decl_state_for_node (node);
+  return DECL_INITIAL (node->decl);
+}
+
+/* Return true if NODE has constructor that can be used for folding.  */
+
+bool
+varpool_ctor_useable_for_folding_p (varpool_node *node)
+{
+  varpool_node *real_node = node;
+
+  if (real_node->alias && real_node->definition)
+    real_node = varpool_variable_node (node);
+
+  if (TREE_CODE (node->decl) == CONST_DECL
+      || DECL_IN_CONSTANT_POOL (node->decl))
+    return true;
+  if (TREE_THIS_VOLATILE (node->decl))
+    return false;
+
+  /* If we do not have a constructor, we can't use it.  */
+  if (DECL_INITIAL (real_node->decl) == error_mark_node
+      && !real_node->lto_file_data)
+    return false;
+
+  /* Vtables are defined by their types and must match no matter of interposition
+     rules.  */
+  if (DECL_VIRTUAL_P (node->decl))
+    {
+      /* The C++ front end creates VAR_DECLs for vtables of typeinfo
+	 classes not defined in the current TU so that it can refer
+	 to them from typeinfo objects.  Avoid returning NULL_TREE.  */
+      return DECL_INITIAL (real_node->decl) != NULL;
+    }
+
+  /* Alias of readonly variable is also readonly, since the variable is stored
+     in readonly memory.  We also accept readonly aliases of non-readonly
+     locations assuming that user knows what he is asking for.  */
+  if (!TREE_READONLY (node->decl) && !TREE_READONLY (real_node->decl))
+    return false;
+
+  /* Variables declared 'const' without an initializer
+     have zero as the initializer if they may not be
+     overridden at link or run time.  */
+  if (!DECL_INITIAL (real_node->decl)
+      && (DECL_EXTERNAL (node->decl) || decl_replaceable_p (node->decl)))
+    return false;
+
+  /* Variables declared `const' with an initializer are considered
+     to not be overwritable with different initializer by default. 
+
+     ??? Previously we behaved so for scalar variables but not for array
+     accesses.  */
+  return true;
+}
+
+/* If DECL is constant variable and its initial value is known (so we can
+   do constant folding), return its constructor (DECL_INITIAL). This may
+   be an expression or NULL when DECL is initialized to 0.
+   Return ERROR_MARK_NODE otherwise.
+
+   In LTO this may actually trigger reading the constructor from disk.
+   For this reason varpool_ctor_useable_for_folding_p should be used when
+   the actual constructor value is not needed.  */
 
 tree
 ctor_for_folding (tree decl)
@@ -284,7 +377,7 @@ ctor_for_folding (tree decl)
 
   gcc_assert (TREE_CODE (decl) == VAR_DECL);
 
-  node = varpool_get_node (decl);
+  real_node = node = varpool_get_node (decl);
   if (node)
     {
       real_node = varpool_variable_node (node);
@@ -302,54 +395,25 @@ ctor_for_folding (tree decl)
     {
       gcc_assert (!DECL_INITIAL (decl)
 		  || DECL_INITIAL (decl) == error_mark_node);
-      if (lookup_attribute ("weakref", DECL_ATTRIBUTES (decl)))
+      if (node->weakref)
 	{
 	  node = varpool_alias_target (node);
 	  decl = node->decl;
 	}
     }
 
-  /* Vtables are defined by their types and must match no matter of interposition
-     rules.  */
-  if (DECL_VIRTUAL_P (real_decl))
-    {
-      gcc_checking_assert (TREE_READONLY (real_decl));
-      if (DECL_INITIAL (real_decl))
-	return DECL_INITIAL (real_decl);
-      else
-	{
-	  /* The C++ front end creates VAR_DECLs for vtables of typeinfo
-	     classes not defined in the current TU so that it can refer
-	     to them from typeinfo objects.  Avoid returning NULL_TREE.  */
-	  gcc_checking_assert (!COMPLETE_TYPE_P (DECL_CONTEXT (real_decl)));
-	  return error_mark_node;
-	}
-    }
-
-  /* If there is no constructor, we have nothing to do.  */
-  if (DECL_INITIAL (real_decl) == error_mark_node)
-    return error_mark_node;
-
-  /* Non-readonly alias of readonly variable is also de-facto readonly,
-     because the variable itself is in readonly section.  
-     We also honnor READONLY flag on alias assuming that user knows
-     what he is doing.  */
-  if (!TREE_READONLY (decl) && !TREE_READONLY (real_decl))
-    return error_mark_node;
-
-  /* Variables declared 'const' without an initializer
-     have zero as the initializer if they may not be
-     overridden at link or run time.  */
-  if (!DECL_INITIAL (real_decl)
-      && (DECL_EXTERNAL (decl) || decl_replaceable_p (decl)))
+  if ((!DECL_VIRTUAL_P (real_decl)
+       || DECL_INITIAL (real_decl) == error_mark_node
+       || !DECL_INITIAL (real_decl))
+      && (!node || !varpool_ctor_useable_for_folding_p (node)))
     return error_mark_node;
 
-  /* Variables declared `const' with an initializer are considered
-     to not be overwritable with different initializer by default. 
-
-     ??? Previously we behaved so for scalar variables but not for array
-     accesses.  */
-  return DECL_INITIAL (real_decl);
+  /* OK, we can return constructor.  See if we need to fetch it from disk
+     in LTO mode.  */
+  if (DECL_INITIAL (real_decl) != error_mark_node
+      || !in_lto_p)
+    return DECL_INITIAL (real_decl);
+  return varpool_get_constructor (real_node);
 }
 
 /* Add the variable DECL to the varpool.
@@ -471,6 +535,7 @@ varpool_assemble_decl (varpool_node *nod
   if (!node->in_other_partition
       && !DECL_EXTERNAL (decl))
     {
+      varpool_get_constructor (node);
       assemble_variable (decl, 0, 1, 0);
       gcc_assert (TREE_ASM_WRITTEN (decl));
       node->definition = true;
Index: lto-streamer-in.c
===================================================================
--- lto-streamer-in.c	(revision 212457)
+++ lto-streamer-in.c	(working copy)
@@ -1029,6 +1029,15 @@ input_function (tree fn_decl, struct dat
   pop_cfun ();
 }
 
+/* Read the constructor of variable VAR from DATA_IN using input block IB.  */
+
+static void
+input_constructor (tree var, struct data_in *data_in,
+		   struct lto_input_block *ib)
+{
+  DECL_INITIAL (var) = stream_read_tree (ib, data_in);
+}
+
 
 /* Read the body from DATA for function NODE and fill it in.
    FILE_DATA are the global decls and types.  SECTION_TYPE is either
@@ -1037,8 +1046,8 @@ input_function (tree fn_decl, struct dat
    that function.  */
 
 static void
-lto_read_body (struct lto_file_decl_data *file_data, struct cgraph_node *node,
-	       const char *data, enum lto_section_type section_type)
+lto_read_body_or_constructor (struct lto_file_decl_data *file_data, struct symtab_node *node,
+			      const char *data, enum lto_section_type section_type)
 {
   const struct lto_function_header *header;
   struct data_in *data_in;
@@ -1050,19 +1059,32 @@ lto_read_body (struct lto_file_decl_data
   tree fn_decl = node->decl;
 
   header = (const struct lto_function_header *) data;
-  cfg_offset = sizeof (struct lto_function_header);
-  main_offset = cfg_offset + header->cfg_size;
-  string_offset = main_offset + header->main_size;
-
-  LTO_INIT_INPUT_BLOCK (ib_cfg,
-		        data + cfg_offset,
-			0,
-			header->cfg_size);
-
-  LTO_INIT_INPUT_BLOCK (ib_main,
-			data + main_offset,
-			0,
-			header->main_size);
+  if (TREE_CODE (node->decl) == FUNCTION_DECL)
+    {
+      cfg_offset = sizeof (struct lto_function_header);
+      main_offset = cfg_offset + header->cfg_size;
+      string_offset = main_offset + header->main_size;
+
+      LTO_INIT_INPUT_BLOCK (ib_cfg,
+			    data + cfg_offset,
+			    0,
+			    header->cfg_size);
+
+      LTO_INIT_INPUT_BLOCK (ib_main,
+			    data + main_offset,
+			    0,
+			    header->main_size);
+    }
+  else
+    {
+      main_offset = sizeof (struct lto_function_header);
+      string_offset = main_offset + header->main_size;
+
+      LTO_INIT_INPUT_BLOCK (ib_main,
+			    data + main_offset,
+			    0,
+			    header->main_size);
+    }
 
   data_in = lto_data_in_create (file_data, data + string_offset,
 			      header->string_size, vNULL);
@@ -1082,7 +1104,10 @@ lto_read_body (struct lto_file_decl_data
 
       /* Set up the struct function.  */
       from = data_in->reader_cache->nodes.length ();
-      input_function (fn_decl, data_in, &ib_main, &ib_cfg);
+      if (TREE_CODE (node->decl) == FUNCTION_DECL)
+        input_function (fn_decl, data_in, &ib_main, &ib_cfg);
+      else
+        input_constructor (fn_decl, data_in, &ib_main);
       /* And fixup types we streamed locally.  */
 	{
 	  struct streamer_tree_cache_d *cache = data_in->reader_cache;
@@ -1124,7 +1149,17 @@ void
 lto_input_function_body (struct lto_file_decl_data *file_data,
 			 struct cgraph_node *node, const char *data)
 {
-  lto_read_body (file_data, node, data, LTO_section_function_body);
+  lto_read_body_or_constructor (file_data, node, data, LTO_section_function_body);
+}
+
+/* Read the constructor of variable NODE using DATA.  FILE_DATA holds the
+   global decls and types.  */
+
+void
+lto_input_variable_constructor (struct lto_file_decl_data *file_data,
+				struct varpool_node *node, const char *data)
+{
+  lto_read_body_or_constructor (file_data, node, data, LTO_section_function_body);
 }
 
 
Index: lto-streamer-out.c
===================================================================
--- lto-streamer-out.c	(revision 212457)
+++ lto-streamer-out.c	(working copy)
@@ -318,7 +318,7 @@ lto_is_streamable (tree expr)
 /* For EXPR lookup and return what we want to stream to OB as DECL_INITIAL.  */
 
 static tree
-get_symbol_initial_value (struct output_block *ob, tree expr)
+get_symbol_initial_value (lto_symtab_encoder_t encoder, tree expr)
 {
   gcc_checking_assert (DECL_P (expr)
 		       && TREE_CODE (expr) != FUNCTION_DECL
@@ -331,15 +331,13 @@ get_symbol_initial_value (struct output_
       && !DECL_IN_CONSTANT_POOL (expr)
       && initial)
     {
-      lto_symtab_encoder_t encoder;
       varpool_node *vnode;
-
-      encoder = ob->decl_state->symtab_node_encoder;
-      vnode = varpool_get_node (expr);
-      if (!vnode
-	  || !lto_symtab_encoder_encode_initializer_p (encoder,
-						       vnode))
-	initial = error_mark_node;
+      /* Extra section needs about 30 bytes; do not produce it for simple
+	 scalar values.  */
+      if (TREE_CODE (DECL_INITIAL (expr)) == CONSTRUCTOR
+	  || !(vnode = varpool_get_node (expr))
+	  || !lto_symtab_encoder_encode_initializer_p (encoder, vnode))
+        initial = error_mark_node;
     }
 
   return initial;
@@ -369,7 +367,8 @@ lto_write_tree_1 (struct output_block *o
       && TREE_CODE (expr) != TRANSLATION_UNIT_DECL)
     {
       /* Handle DECL_INITIAL for symbols.  */
-      tree initial = get_symbol_initial_value (ob, expr);
+      tree initial = get_symbol_initial_value
+			 (ob->decl_state->symtab_node_encoder, expr);
       stream_write_tree (ob, initial, ref_p);
     }
 }
@@ -1195,7 +1194,8 @@ DFS_write_tree (struct output_block *ob,
 	      && TREE_CODE (expr) != TRANSLATION_UNIT_DECL)
 	    {
 	      /* Handle DECL_INITIAL for symbols.  */
-	      tree initial = get_symbol_initial_value (ob, expr);
+	      tree initial = get_symbol_initial_value (ob->decl_state->symtab_node_encoder,
+						       expr);
 	      DFS_write_tree (ob, cstate, initial, ref_p, ref_p);
 	    }
 	}
@@ -1808,7 +1808,7 @@ output_function (struct cgraph_node *nod
   ob = create_output_block (LTO_section_function_body);
 
   clear_line_info (ob);
-  ob->cgraph_node = node;
+  ob->symbol = node;
 
   gcc_assert (current_function_decl == NULL_TREE && cfun == NULL);
 
@@ -1899,6 +1899,32 @@ output_function (struct cgraph_node *nod
   destroy_output_block (ob);
 }
 
+/* Output the constructor of variable NODE->DECL.  */
+
+static void
+output_constructor (struct varpool_node *node)
+{
+  tree var = node->decl;
+  struct output_block *ob;
+
+  ob = create_output_block (LTO_section_function_body);
+
+  clear_line_info (ob);
+  ob->symbol = node;
+
+  /* Make string 0 be a NULL string.  */
+  streamer_write_char_stream (ob->string_stream, 0);
+
+  /* Output DECL_INITIAL of the variable, which holds its
+     constructor.  */
+  stream_write_tree (ob, DECL_INITIAL (var), true);
+
+  /* Create a section to hold the pickled output of this constructor.  */
+  produce_asm (ob, var);
+
+  destroy_output_block (ob);
+}
+
 
 /* Emit toplevel asms.  */
 
@@ -1957,10 +1983,10 @@ lto_output_toplevel_asms (void)
 }
 
 
-/* Copy the function body of NODE without deserializing. */
+/* Copy the function body or variable constructor of NODE without deserializing. */
 
 static void
-copy_function (struct cgraph_node *node)
+copy_function_or_variable (struct symtab_node *node)
 {
   tree function = node->decl;
   struct lto_file_decl_data *file_data = node->lto_file_data;
@@ -2072,7 +2098,7 @@ lto_output (void)
 	      if (gimple_has_body_p (node->decl) || !flag_wpa)
 		output_function (node);
 	      else
-		copy_function (node);
+		copy_function_or_variable (node);
 	      gcc_assert (lto_get_out_decl_state () == decl_state);
 	      lto_pop_out_decl_state ();
 	      lto_record_function_out_decl_state (node->decl, decl_state);
@@ -2085,6 +2111,25 @@ lto_output (void)
 	  tree ctor = DECL_INITIAL (node->decl);
 	  if (ctor && !in_lto_p)
 	    walk_tree (&ctor, wrap_refs, NULL, NULL);
+	  if (get_symbol_initial_value (encoder, node->decl) == error_mark_node
+	      && lto_symtab_encoder_encode_initializer_p (encoder, node)
+	      && !node->alias)
+	    {
+#ifdef ENABLE_CHECKING
+	      gcc_assert (!bitmap_bit_p (output, DECL_UID (node->decl)));
+	      bitmap_set_bit (output, DECL_UID (node->decl));
+#endif
+	      decl_state = lto_new_out_decl_state ();
+	      lto_push_out_decl_state (decl_state);
+	      if (DECL_INITIAL (node->decl) != error_mark_node
+		  || !flag_wpa)
+		output_constructor (node);
+	      else
+		copy_function_or_variable (node);
+	      gcc_assert (lto_get_out_decl_state () == decl_state);
+	      lto_pop_out_decl_state ();
+	      lto_record_function_out_decl_state (node->decl, decl_state);
+	    }
 	}
     }
 


* Re: Use separate sections to stream non-trivial constructors
  2014-07-11 17:47         ` Jan Hubicka
@ 2014-07-11 18:42           ` Jan Hubicka
  0 siblings, 0 replies; 7+ messages in thread
From: Jan Hubicka @ 2014-07-11 18:42 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Richard Biener, gcc-patches

Hi,
this is the patch I am going to commit after testing.  It removes the
DECL_INIT_IO timevar, which guards only a single variable assignment (so it
hardly measures anything), and moves GIMPLE_IN to its proper place.  It also
adds CTORS_IN and CTORS_OUT.
I get:
 ipa lto gimple out      :   0.37 ( 0%) usr   0.21 ( 3%) sys   0.64 ( 1%) wall       0 kB ( 0%) ggc
 ipa lto decl in         :  23.56 (26%) usr   1.24 (15%) sys  24.81 (23%) wall 2429174 kB (60%) ggc
 ipa lto decl out        :   5.58 ( 6%) usr   0.34 ( 4%) sys   5.94 ( 5%) wall       0 kB ( 0%) ggc
 ipa lto constructors in :   0.34 ( 0%) usr   0.10 ( 1%) sys   0.47 ( 0%) wall   14864 kB ( 0%) ggc
 ipa lto constructors out:   0.06 ( 0%) usr   0.01 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 ipa lto cgraph I/O      :   1.20 ( 1%) usr   0.25 ( 3%) sys   1.45 ( 1%) wall  437317 kB (11%) ggc

for Firefox WPA, which is surprisingly good.  We traded about 400MB for 14MB,
so perhaps I do not need to worry about avoiding reading in all the vtables
needed by the devirt machinery.

honza
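
A minimal sketch of why the "constructors in" timer stays small, assuming a
toy model rather than the real GCC types: the timer is pushed only after the
early return, so it is charged only when a constructor actually has to be
read from its section.  All sketch_* names are hypothetical, the string
sentinel stands in for error_mark_node, and the "read" is a pointer copy.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical timer ids; the patch names the real one
   TV_IPA_LTO_CTORS_IN.  */
enum sketch_timevar { SKETCH_TV_CTORS_IN, SKETCH_TV_COUNT };

static unsigned sketch_pushes[SKETCH_TV_COUNT];
static int sketch_depth;

static void
sketch_timevar_push (enum sketch_timevar tv)
{
  sketch_pushes[tv]++;
  sketch_depth++;
}

static void
sketch_timevar_pop (enum sketch_timevar tv)
{
  (void) tv;
  assert (sketch_depth > 0);
  sketch_depth--;
}

/* Sentinel playing the role of error_mark_node: "not loaded yet".  */
static const char sketch_error_mark[] = "<error_mark>";

struct sketch_var
{
  const char *initial;       /* In-memory initializer or the sentinel.  */
  const char *on_disk_ctor;  /* What reading the section would yield.  */
};

static int sketch_in_lto = 1;

/* Same shape as the patched varpool_get_constructor: return early when
   the initializer is already in memory, otherwise time the load.  */
const char *
sketch_get_constructor (struct sketch_var *var)
{
  if (var->initial != sketch_error_mark || !sketch_in_lto)
    return var->initial;

  sketch_timevar_push (SKETCH_TV_CTORS_IN);
  var->initial = var->on_disk_ctor;   /* Stands in for the section read.  */
  sketch_timevar_pop (SKETCH_TV_CTORS_IN);
  return var->initial;
}
```

A second request for the same variable takes the early return and does not
touch the timer, which is the behaviour the numbers above depend on.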

	* lto.c (read_cgraph_and_symbols): Do not push DECL_INIT_IO
	timevar.
	(materialize_cgraph): Do not push GIMPLE_IN timevar.

	* timevar.def (TV_IPA_LTO_DECL_INIT_IO): Remove.
	(TV_IPA_LTO_CTORS_IN, TV_IPA_LTO_CTORS_OUT): New timevars.
	* cgraph.c (cgraph_get_body): Push GIMPLE_IN timevar.
	(varpool_get_constructor): Push CTORS_IN timevar.
	* lto-streamer-out.c (lto_output): Push TV_IPA_LTO_CTORS_OUT
	timevar.
Index: lto/lto.c
===================================================================
--- lto/lto.c	(revision 212467)
+++ lto/lto.c	(working copy)
@@ -3094,12 +3094,9 @@ read_cgraph_and_symbols (unsigned nfiles
 
   timevar_pop (TV_IPA_LTO_CGRAPH_MERGE);
 
-  timevar_push (TV_IPA_LTO_DECL_INIT_IO);
-
   /* Indicate that the cgraph is built and ready.  */
   cgraph_function_flags_ready = true;
 
-  timevar_pop (TV_IPA_LTO_DECL_INIT_IO);
   ggc_free (all_file_decl_data);
   all_file_decl_data = NULL;
 }
@@ -3117,9 +3114,6 @@ materialize_cgraph (void)
     fprintf (stderr,
 	     flag_wpa ? "Materializing decls:" : "Reading function bodies:");
 
-  /* Now that we have input the cgraph, we need to clear all of the aux
-     nodes and read the functions if we are not running in WPA mode.  */
-  timevar_push (TV_IPA_LTO_GIMPLE_IN);
 
   FOR_EACH_FUNCTION (node)
     {
@@ -3130,7 +3124,6 @@ materialize_cgraph (void)
 	}
     }
 
-  timevar_pop (TV_IPA_LTO_GIMPLE_IN);
 
   /* Start the appropriate timer depending on the mode that we are
      operating in.  */
Index: timevar.def
===================================================================
--- timevar.def	(revision 212457)
+++ timevar.def	(working copy)
@@ -77,7 +77,8 @@ DEFTIMEVAR (TV_IPA_LTO_GIMPLE_IN     , "
 DEFTIMEVAR (TV_IPA_LTO_GIMPLE_OUT    , "ipa lto gimple out")
 DEFTIMEVAR (TV_IPA_LTO_DECL_IN       , "ipa lto decl in")
 DEFTIMEVAR (TV_IPA_LTO_DECL_OUT      , "ipa lto decl out")
-DEFTIMEVAR (TV_IPA_LTO_DECL_INIT_IO  , "ipa lto decl init I/O")
+DEFTIMEVAR (TV_IPA_LTO_CTORS_IN      , "ipa lto constructors in")
+DEFTIMEVAR (TV_IPA_LTO_CTORS_OUT     , "ipa lto constructors out")
 DEFTIMEVAR (TV_IPA_LTO_CGRAPH_IO     , "ipa lto cgraph I/O")
 DEFTIMEVAR (TV_IPA_LTO_DECL_MERGE    , "ipa lto decl merge")
 DEFTIMEVAR (TV_IPA_LTO_CGRAPH_MERGE  , "ipa lto cgraph merge")
Index: cgraph.c
===================================================================
--- cgraph.c	(revision 212457)
+++ cgraph.c	(working copy)
@@ -3053,6 +3053,8 @@ cgraph_get_body (struct cgraph_node *nod
 
   gcc_assert (in_lto_p);
 
+  timevar_push (TV_IPA_LTO_GIMPLE_IN);
+
   file_data = node->lto_file_data;
   name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
 
@@ -3076,6 +3078,9 @@ cgraph_get_body (struct cgraph_node *nod
   lto_free_section_data (file_data, LTO_section_function_body, name,
 			 data, len);
   lto_free_function_in_decl_state_for_node (node);
+
+  timevar_pop (TV_IPA_LTO_GIMPLE_IN);
+
   return true;
 }
 
Index: varpool.c
===================================================================
--- varpool.c	(revision 212467)
+++ varpool.c	(working copy)
@@ -268,6 +268,8 @@ varpool_get_constructor (struct varpool_
       || !in_lto_p)
     return DECL_INITIAL (node->decl);
 
+  timevar_push (TV_IPA_LTO_CTORS_IN);
+
   file_data = node->lto_file_data;
   name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
 
@@ -286,6 +288,7 @@ varpool_get_constructor (struct varpool_
   lto_free_section_data (file_data, LTO_section_function_body, name,
 			 data, len);
   lto_free_function_in_decl_state_for_node (node);
+  timevar_pop (TV_IPA_LTO_CTORS_IN);
   return DECL_INITIAL (node->decl);
 }
 
Index: lto-streamer-out.c
===================================================================
--- lto-streamer-out.c	(revision 212467)
+++ lto-streamer-out.c	(working copy)
@@ -2115,6 +2115,7 @@ lto_output (void)
 	      && lto_symtab_encoder_encode_initializer_p (encoder, node)
 	      && !node->alias)
 	    {
+	      timevar_push (TV_IPA_LTO_CTORS_OUT);
 #ifdef ENABLE_CHECKING
 	      gcc_assert (!bitmap_bit_p (output, DECL_UID (node->decl)));
 	      bitmap_set_bit (output, DECL_UID (node->decl));
@@ -2129,6 +2130,7 @@ lto_output (void)
 	      gcc_assert (lto_get_out_decl_state () == decl_state);
 	      lto_pop_out_decl_state ();
 	      lto_record_function_out_decl_state (node->decl, decl_state);
+	      timevar_pop (TV_IPA_LTO_CTORS_OUT);
 	    }
 	}
     }


