public inbox for gcc-patches@gcc.gnu.org
* [RFC] Getting LTO incremental linking work
@ 2015-11-25  9:04 Jan Hubicka
  2015-11-25 11:19 ` Richard Biener
                   ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Jan Hubicka @ 2015-11-25  9:04 UTC
  To: gcc-patches, rguenther, ak, hongjiu.lu, ccoutant, iant

Hi,
PR 67548 is about LTO not supporting incremental linking.  I never really
considered our current incremental linking very useful, because it triggers
code generation at incremental link time, which basically nullifies any
benefit of whole program optimization.  In fact I think it is harmful,
because it sort of works and, without any warning, produces poorly
optimized code.

Basically there are three schemes for making incremental linking work:
 1) Turn LTO objects into non-LTO objects, as we do now
 2) Concatenate LTO sections, as implemented by Andi and HJ
 3) Do actual linking of LTO sections

The problem with the current implementation of 1) is that GCC thinks the
resulting object file will not be used for static linking and thus assumes
that hidden symbols can be turned into static ones.
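
To make the problem concrete, here is a minimal model of that decision.  It
is not GCC's actual code: the enum, the struct and the function name are
hypothetical, and only the rule itself (hidden symbols become local at LTO
time unless the output will be linked again) follows the text above.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for GCC's internal state.  */
enum visibility { VIS_DEFAULT, VIS_HIDDEN, VIS_INTERNAL };

struct link_state
{
  bool in_lto;           /* Running inside the LTO front end.  */
  int incremental_link;  /* 0: final link, nonzero: output is linked again.  */
};

/* Return true if a symbol with visibility VIS may be turned into a
   static (local) symbol.  After a final link nothing outside the unit
   can refer to a hidden symbol; after an incremental link a later
   static link still can, so the symbol has to stay global.  */
bool
can_privatize_hidden_symbol (const struct link_state *st, enum visibility vis)
{
  if (!st->in_lto || st->incremental_link)
    return false;
  return vis == VIS_HIDDEN || vis == VIS_INTERNAL;
}
```

With in_lto set and incremental_link at 0 a hidden symbol is privatized; any
nonzero incremental_link keeps it global, which is the behaviour scheme 1)
currently lacks.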

In the log of PR67548 HJ actually pointed out that we do have an API on the
linker plugin side which says what type of output is being produced.  This is
cool because we can also use it to drop -fpic when building a static binary.
This is common in Firefox, where some objects are built with -fpic and linked
into both binaries and libraries.
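
As a rough illustration of dropping -fpic based on the output type, the
sketch below reduces the idea to a standalone function.  The struct and the
function name are hypothetical; the flag adjustments mirror the switch in the
lto-lang.c hunk of the patch further down, with the output type given as the
plain strings "dyn", "pie" and "exec".

```c
#include <string.h>

/* Hypothetical container for the two code generation flags discussed
   above; in GCC these are the globals flag_pic and flag_pie.  */
struct pic_flags
{
  int pic;  /* 0: off, 1: -fpic, 2: -fPIC.  */
  int pie;  /* 0: off, 1: -fpie, 2: -fPIE.  */
};

/* Adjust FLAGS for the linker output type named by OUTPUT.  Unknown
   strings leave FLAGS untouched.  */
void
adjust_pic_for_output (struct pic_flags *flags, const char *output)
{
  if (strcmp (output, "exec") == 0)
    {
      /* Normal executable: position-independent code is not needed.  */
      flags->pic = 0;
      flags->pie = 0;
    }
  else if (strcmp (output, "pie") == 0)
    {
      /* PIE binary: code built as PIC is compiled with the PIE model.  */
      if (!flags->pie && flags->pic)
        flags->pie = flags->pic;
      flags->pic = 0;
    }
  /* "dyn": shared library, keep whatever the objects were built with.  */
}
```

An object built with -fPIC (pic == 2) therefore ends up with both flags
cleared for "exec", with pie == 2 and pic == 0 for "pie", and unchanged for
"dyn".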

Moreover we do have all the infrastructure ready to implement 3).  Our tree
merging and symbol table handling are fully incremental, and I think I managed
to make a patch implementing it today.  The scheme is easy:

 1) The linker plugin is modified to pass -flinker-output to the lto wrapper.
    linker-output is either dyn (.so), pie or exec;
    for incremental linking I added rel for 3) and noltorel for 1).

    Currently it defaults to rel.  However, 3) (and also 2) cannot be done
    when incremental linking is done on both LTO and non-LTO objects.  In
    this case the linker plugin outputs a warning about code quality loss
    and switches to noltorel.
 2) With -flinker-output the lto wrapper behaves the same way as with
    -flto-partition=none.
 3) The lto frontend parses -flinker-output and sets our internal flags
    accordingly.  I added a new flag_incremental_link to inform the
    middle-end that the output is going to be statically linked again.  This
    disables the privatization of hidden symbols, and if set to 2 it also
    triggers the LTO IL streaming.
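
The fallback in step 1 can be sketched as a small pure function.  This is
only a model under stated assumptions: the function name and the warn
out-parameter are hypothetical, and the real plugin works on full
"-flinker-output=..." option strings rather than bare mode names.

```c
#include <string.h>

/* Pick the linker output mode to hand to the lto wrapper.  REQUESTED
   is the mode derived from the linker ("rel", "dyn", "pie" or "exec");
   NON_LTO_FILES counts input objects that carry no LTO IL.  *WARN is
   set to 1 when the choice gives up whole program optimization.  */
const char *
choose_linker_output (const char *requested, unsigned non_lto_files,
                      int *warn)
{
  *warn = 0;
  if (strcmp (requested, "rel") == 0 && non_lto_files > 0)
    {
      /* Mixed LTO and non-LTO input: the LTO IL cannot be merged
         and re-emitted, so fall back to generating final code.  */
      *warn = 1;
      return "noltorel";
    }
  return requested;
}
```

Only the "rel" mode is affected; the other modes pass through unchanged
regardless of how many non-LTO objects take part in the link.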

Incremental linking in rel mode now streams in all global streams, merges
trees, merges the symbol table, removes unreachable symbols (which are a
result of merging) and streams everything out to a .s file.

I only tested the patch by incrementally linking libbackend.o.  The linking
time is 46 seconds:

Execution times (seconds)
 phase opt and generate  :  35.75 (81%) usr   0.90 (76%) sys  36.63 (81%) wall    5008 kB ( 1%) ggc
 phase stream in         :   8.57 (19%) usr   0.28 (24%) sys   8.86 (19%) wall  700851 kB (99%) ggc
 callgraph optimization  :   0.08 ( 0%) usr   0.01 ( 1%) sys   0.08 ( 0%) wall       0 kB ( 0%) ggc
 ipa dead code removal   :   0.09 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall       0 kB ( 0%) ggc
 ipa cp                  :   0.36 ( 1%) usr   0.04 ( 3%) sys   0.41 ( 1%) wall   42862 kB ( 6%) ggc
 ipa inlining heuristics :   0.18 ( 0%) usr   0.02 ( 2%) sys   0.19 ( 0%) wall   26771 kB ( 4%) ggc
 lto stream inflate      :   3.57 ( 8%) usr   0.14 (12%) sys   3.70 ( 8%) wall       0 kB ( 0%) ggc
 lto stream deflate      :  20.13 (45%) usr   0.05 ( 4%) sys  19.42 (43%) wall       0 kB ( 0%) ggc
 lto stream output       :   9.70 (22%) usr   0.32 (27%) sys  10.50 (23%) wall       0 kB ( 0%) ggc
 ipa lto gimple out      :   0.66 ( 1%) usr   0.24 (20%) sys   1.09 ( 2%) wall    4655 kB ( 1%) ggc
 ipa lto decl in         :   5.87 (13%) usr   0.11 ( 9%) sys   6.10 (13%) wall  552108 kB (78%) ggc
 ipa lto decl out        :   2.91 ( 7%) usr   0.16 (14%) sys   3.07 ( 7%) wall       0 kB ( 0%) ggc
 ipa lto constructors in :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     108 kB ( 0%) ggc
 ipa lto constructors out:   0.12 ( 0%) usr   0.03 ( 3%) sys   0.13 ( 0%) wall     178 kB ( 0%) ggc
 ipa lto cgraph I/O      :   0.12 ( 0%) usr   0.02 ( 2%) sys   0.15 ( 0%) wall   70005 kB (10%) ggc
 ipa lto decl merge      :   0.31 ( 1%) usr   0.00 ( 0%) sys   0.30 ( 1%) wall    1023 kB ( 0%) ggc
 ipa lto cgraph merge    :   0.11 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall    7972 kB ( 1%) ggc
 ipa profile             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 ipa pure const          :   0.01 ( 0%) usr   0.01 ( 1%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
 ipa icf                 :   0.04 ( 0%) usr   0.01 ( 1%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
 varconst                :   0.02 ( 0%) usr   0.01 ( 1%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
 TOTAL                 :  44.32             1.18            45.49             707846 kB

There is some low-hanging fruit.  First, streaming LTO files is slow because of vprintf:
        case 1:
          /* TODO: Print in hex with fast function, important for -flto. */
          fprintf (f, "\\%03o", c);
          break;
This is a trivial bug to fix; I will send a separate patch for it.
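
One possible way to avoid printf-style formatting for this escape is to
compute the three octal digits by hand.  The sketch below is an assumption
about what a fast path could look like, not the fix itself; the function
names are hypothetical, and unlike the TODO (which suggests hex) it keeps the
octal form, so for byte values 0..255 the text is the same as what the
fprintf call produces.

```c
#include <stdio.h>

/* Store the escape for byte C in OUT as a backslash followed by
   exactly three octal digits.  OUT is not NUL-terminated.  */
void
format_octal_escape (unsigned char c, char out[4])
{
  out[0] = '\\';
  out[1] = (char) ('0' + ((c >> 6) & 7));
  out[2] = (char) ('0' + ((c >> 3) & 7));
  out[3] = (char) ('0' + (c & 7));
}

/* Write the escape for byte C to F without parsing a format string.  */
void
put_octal_escape (FILE *f, unsigned char c)
{
  char buf[4];

  format_octal_escape (c, buf);
  fwrite (buf, 1, sizeof buf, f);
}
```

Since a byte needs at most three octal digits, the buffer size is fixed and
no length computation is needed.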

Second, most of the inflate/deflate time goes to compressing and uncompressing
sections that are merely being copied.  This is also trivial to fix and I will
do that in a separate patch; it also affects WPA and /tmp space usage.

The size of the library is cut to a bit more than a half (about 58%):
-rw-r--r-- 1 hubicka _cvsadmin 211854942 Nov 25 09:18 libbackend.a
-rw-r--r-- 1 hubicka _cvsadmin 121986816 Nov 25 09:16 libbackend.o

Linking of the cc1 binary goes from 1m31s to 1m20s.  Because we link
libbackend.a more than 4 times, it would actually pay back even in the GCC
setting, though I suppose the main utility would be in parallelizing builds
(like the kernel does).
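
The pay-back claim can be checked with a one-line break-even computation.
The helper below is hypothetical and deliberately naive: it treats the 46
second incremental link as a one-time cost, takes the cc1 numbers above as
representative of every final link, and ignores parallelism.

```c
/* Number of final links after which one incremental pre-link costing
   PRELINK_S seconds has paid for itself, given the final link time
   without (BEFORE_S) and with (AFTER_S) the pre-linked object.  */
double
break_even_links (double prelink_s, double before_s, double after_s)
{
  return prelink_s / (before_s - after_s);
}
```

With the numbers from this mail, break_even_links (46.0, 91.0, 80.0) is
46 / 11, or about 4.2 links, which matches the "more than 4 times" argument.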

WPA stage times, first with the incrementally linked libbackend.o and then
without it, are:
Execution times (seconds)                                                       
 phase opt and generate  :   3.76 (52%) usr   0.07 ( 6%) sys   3.83 (41%) wall   53777 kB (13%) ggc
 phase stream in         :   3.04 (42%) usr   0.33 (28%) sys   3.37 (36%) wall  346427 kB (86%) ggc
 phase stream out        :   0.40 ( 6%) usr   0.78 (66%) sys   2.18 (23%) wall       0 kB ( 0%) ggc
 callgraph optimization  :   0.05 ( 1%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall      18 kB ( 0%) ggc
 ipa dead code removal   :   0.46 ( 6%) usr   0.00 ( 0%) sys   0.44 ( 5%) wall       0 kB ( 0%) ggc
 ipa cp                  :   0.40 ( 6%) usr   0.05 ( 4%) sys   0.47 ( 5%) wall   55439 kB (14%) ggc
 ipa inlining heuristics :   1.95 (27%) usr   0.02 ( 2%) sys   1.97 (21%) wall   65871 kB (16%) ggc
 lto stream inflate      :   0.60 ( 8%) usr   0.11 ( 9%) sys   0.67 ( 7%) wall       0 kB ( 0%) ggc
 ipa lto decl in         :   1.93 (27%) usr   0.18 (15%) sys   2.10 (22%) wall  205593 kB (51%) ggc
 ipa lto decl out        :   0.28 ( 4%) usr   0.02 ( 2%) sys   0.29 ( 3%) wall       0 kB ( 0%) ggc
 ipa lto cgraph I/O      :   0.09 ( 1%) usr   0.02 ( 2%) sys   0.12 ( 1%) wall   62797 kB (16%) ggc
 ipa lto decl merge      :   0.20 ( 3%) usr   0.00 ( 0%) sys   0.20 ( 2%) wall    1023 kB ( 0%) ggc
 whopr partitioning      :   0.56 ( 8%) usr   0.00 ( 0%) sys   0.56 ( 6%) wall    1419 kB ( 0%) ggc
 ipa reference           :   0.17 ( 2%) usr   0.00 ( 0%) sys   0.17 ( 2%) wall       0 kB ( 0%) ggc
 ipa pure const          :   0.17 ( 2%) usr   0.00 ( 0%) sys   0.16 ( 2%) wall       0 kB ( 0%) ggc
 ipa icf                 :   0.07 ( 1%) usr   0.00 ( 0%) sys   0.07 ( 1%) wall     485 kB ( 0%) ggc
 unaccounted todo        :   0.06 ( 1%) usr   0.00 ( 0%) sys   0.06 ( 1%) wall       0 kB ( 0%) ggc
 TOTAL                 :   7.20             1.18             9.39             402192 kB


Execution times (seconds)                                                       
 phase setup             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall    1986 kB ( 0%) ggc
 phase opt and generate  :   6.66 (39%) usr   0.38 (22%) sys   7.03 (36%) wall  199143 kB (21%) ggc
 phase stream in         :   9.33 (54%) usr   0.38 (22%) sys   9.71 (50%) wall  764698 kB (79%) ggc
 phase stream out        :   0.82 ( 5%) usr   0.97 (55%) sys   2.23 (11%) wall       2 kB ( 0%) ggc
 phase finalize          :   0.40 ( 2%) usr   0.03 ( 2%) sys   0.43 ( 2%) wall       0 kB ( 0%) ggc
 garbage collection      :   0.79 ( 5%) usr   0.01 ( 1%) sys   0.80 ( 4%) wall       0 kB ( 0%) ggc
 ipa dead code removal   :   0.41 ( 2%) usr   0.00 ( 0%) sys   0.45 ( 2%) wall       0 kB ( 0%) ggc
 ipa cp                  :   0.33 ( 2%) usr   0.05 ( 3%) sys   0.41 ( 2%) wall   56753 kB ( 6%) ggc
 ipa inlining heuristics :   1.74 (10%) usr   0.02 ( 1%) sys   1.80 ( 9%) wall   55600 kB ( 6%) ggc
 lto stream inflate      :   2.18 (13%) usr   0.12 ( 7%) sys   2.28 (12%) wall       0 kB ( 0%) ggc
 ipa lto gimple in       :   0.62 ( 4%) usr   0.23 (13%) sys   0.96 ( 5%) wall  135317 kB (14%) ggc
 ipa lto decl in         :   6.63 (39%) usr   0.15 ( 9%) sys   6.70 (35%) wall  598144 kB (62%) ggc
 ipa lto decl out        :   0.55 ( 3%) usr   0.01 ( 1%) sys   0.57 ( 3%) wall       0 kB ( 0%) ggc
 ipa lto cgraph I/O      :   0.14 ( 1%) usr   0.03 ( 2%) sys   0.15 ( 1%) wall   76843 kB ( 8%) ggc
 ipa lto decl merge      :   0.35 ( 2%) usr   0.00 ( 0%) sys   0.34 ( 2%) wall    1023 kB ( 0%) ggc
 ipa lto cgraph merge    :   0.13 ( 1%) usr   0.00 ( 0%) sys   0.13 ( 1%) wall    9284 kB ( 1%) ggc
 whopr partitioning      :   0.51 ( 3%) usr   0.00 ( 0%) sys   0.50 ( 3%) wall    1496 kB ( 0%) ggc
 ipa reference           :   0.18 ( 1%) usr   0.00 ( 0%) sys   0.19 ( 1%) wall       0 kB ( 0%) ggc
 ipa pure const          :   0.20 ( 1%) usr   0.01 ( 1%) sys   0.20 ( 1%) wall       0 kB ( 0%) ggc
 ipa icf                 :   1.82 (11%) usr   0.05 ( 3%) sys   1.85 (10%) wall    2138 kB ( 0%) ggc
 tree operand scan       :   0.13 ( 1%) usr   0.06 ( 3%) sys   0.17 ( 1%) wall   21674 kB ( 2%) ggc
 TOTAL                 :  17.21             1.76            19.41             965830 kB

So that is a more than 50% cut in memory use (965830 kB to 402192 kB) and a
reasonable speedup (19.41s to 9.39s wall).  I need to check what happens
with ICF.

The WPA stats are as follows:
WPA statistics
[WPA] read 891308 SCCs of average size 1.972195
[WPA] 1757833 tree bodies read in total
[WPA] tree SCC table: size 524287, 230881 elements, collision ratio: 1.107788
[WPA] tree SCC max chain length 39 (size 1)
[WPA] Compared 73318 SCCs, 81315 collisions (1.109073)
[WPA] Merged 52578 SCCs
[WPA] Merged 502850 tree bodies
[WPA] Merged 36730 types
[WPA] 205971 types prevailed (565069 associated trees)
[WPA] GIMPLE canonical type table: size 16381, 1251 elements, 28138 searches, 444 collisions (ratio: 0.015779)
[WPA] GIMPLE canonical type pointer-map: 1251 elements, 99917 searches
[WPA] # of input files: 125
[WPA] Compression: 23123694 input bytes, 79799028 uncompressed bytes (ratio: 3.450963)
[WPA] Size of mmap'd section decls: 23123694 bytes

compared to
WPA statistics
[WPA] read 3633234 SCCs of average size 2.539347
[WPA] 9226041 tree bodies read in total
[WPA] tree SCC table: size 524287, 257562 elements, collision ratio: 0.673833
[WPA] tree SCC max chain length 39 (size 1)
[WPA] Compared 500618 SCCs, 646007 collisions (1.290419)
[WPA] Merged 478513 SCCs
[WPA] Merged 5659960 tree bodies
[WPA] Merged 326141 types
[WPA] 207806 types prevailed (562649 associated trees)
[WPA] GIMPLE canonical type table: size 16381, 1246 elements, 27925 searches, 437 collisions (ratio: 0.015649)
[WPA] GIMPLE canonical type pointer-map: 1246 elements, 97858 searches
[WPA] # of input files: 461
[WPA] Compression: 95695388 input bytes, 303240971 uncompressed bytes (ratio: 3.168815)
[WPA] Size of mmap'd section decls: 95695388 bytes

So about a 5-fold improvement in the number of trees and decls read.  By the end of WPA:

[WPA] 1757833 tree bodies read in total
[WPA] # of input files: 125
[WPA] # of input cgraph nodes: 36977
[WPA] # of function bodies: 651
[WPA] # of output files: 31
[WPA] # of output symtab nodes: 185336
[WPA] # of output tree pickle references: 629336
[WPA] # of output tree bodies: 129898
[WPA] # callgraph partitions: 31
[WPA] Compression: 30134544 input bytes, 100590102 uncompressed bytes (ratio: 3.338033)
[WPA] Size of mmap'd section decls: 23123694 bytes
[WPA] Size of mmap'd section function_body: 2641029 bytes
[WPA] Size of mmap'd section statics: 0 bytes
[WPA] Size of mmap'd section symtab: 0 bytes
[WPA] Size of mmap'd section refs: 408500 bytes
[WPA] Size of mmap'd section asm: 0 bytes
[WPA] Size of mmap'd section jmpfuncs: 1432063 bytes
[WPA] Size of mmap'd section pureconst: 80213 bytes
[WPA] Size of mmap'd section reference: 0 bytes
[WPA] Size of mmap'd section profile: 2439 bytes
[WPA] Size of mmap'd section symbol_nodes: 1413364 bytes
[WPA] Size of mmap'd section opts: 0 bytes
[WPA] Size of mmap'd section cgraphopt: 0 bytes
[WPA] Size of mmap'd section inline: 1005113 bytes
[WPA] Size of mmap'd section ipcp_trans: 0 bytes
[WPA] Size of mmap'd section icf: 28129 bytes
[WPA] Size of mmap'd section offload_table: 0 bytes
[WPA] Size of mmap'd section mode_table: 0 bytes

[WPA] 9226041 tree bodies read in total
[WPA] # of input files: 461
[WPA] # of input cgraph nodes: 36888
[WPA] # of function bodies: 7690
[WPA] # of output files: 31
[WPA] # of output symtab nodes: 191489
[WPA] # of output tree pickle references: 1444221
[WPA] # of output tree bodies: 261141
[WPA] # callgraph partitions: 31
[WPA] Compression: 112942159 input bytes, 347530231 uncompressed bytes (ratio: 3.077064)
[WPA] Size of mmap'd section decls: 95695388 bytes
[WPA] Size of mmap'd section function_body: 11747200 bytes
[WPA] Size of mmap'd section statics: 0 bytes
[WPA] Size of mmap'd section symtab: 0 bytes
[WPA] Size of mmap'd section refs: 395831 bytes
[WPA] Size of mmap'd section asm: 0 bytes
[WPA] Size of mmap'd section jmpfuncs: 1666954 bytes
[WPA] Size of mmap'd section pureconst: 94608 bytes
[WPA] Size of mmap'd section reference: 0 bytes
[WPA] Size of mmap'd section profile: 9259 bytes
[WPA] Size of mmap'd section symbol_nodes: 1769069 bytes
[WPA] Size of mmap'd section opts: 0 bytes
[WPA] Size of mmap'd section cgraphopt: 0 bytes
[WPA] Size of mmap'd section inline: 1266586 bytes
[WPA] Size of mmap'd section ipcp_trans: 0 bytes
[WPA] Size of mmap'd section icf: 297264 bytes
[WPA] Size of mmap'd section offload_table: 0 bytes
[WPA] Size of mmap'd section mode_table: 0 bytes

Does anyone see problems with this approach?  I think this is easy enough
and fixes PR67548, so might it still get into mainline?
I need to do more testing, but in general I think the implementation is OK
as it is.  We need a way to force the noltorel model for the testsuite, as the
new default will bypass codegen for all our -r -nostdlib testcases.

BTW ltrans now dies with -ftime-report. Any ideas why?


Honza

Index: gcc/common.opt
===================================================================
--- gcc/common.opt	(revision 230847)
+++ gcc/common.opt	(working copy)
@@ -46,6 +46,13 @@ int optimize_fast
 Variable
 bool in_lto_p = false
 
+; This variable is set to non-0 only by LTO front-end.  1 indicates that
+; the output produced will be used for incremental linking (thus weak symbols
+; can still be bound) and 2 indicates that the IL is going to be linked and
+; output to LTO object file.
+Variable
+int flag_incremental_link = 0
+
 ; 0 means straightforward implementation of complex divide acceptable.
 ; 1 means wide ranges of inputs must work for complex divide.
 ; 2 means C99-like requirements for complex multiply and divide.
Index: gcc/lto-streamer-out.c
===================================================================
--- gcc/lto-streamer-out.c	(revision 230847)
+++ gcc/lto-streamer-out.c	(working copy)
@@ -2286,13 +2286,16 @@ lto_output (void)
 		}
 	      decl_state = lto_new_out_decl_state ();
 	      lto_push_out_decl_state (decl_state);
-	      if (gimple_has_body_p (node->decl) || !flag_wpa
+	      if (gimple_has_body_p (node->decl)
 		  /* Thunks have no body but they may be synthetized
 		     at WPA time.  */
 		  || DECL_ARGUMENTS (node->decl))
 		output_function (node);
 	      else
-		copy_function_or_variable (node);
+		{
+		  gcc_checking_assert (flag_wpa || flag_incremental_link == 2);
+		  copy_function_or_variable (node);
+		}
 	      gcc_assert (lto_get_out_decl_state () == decl_state);
 	      lto_pop_out_decl_state ();
 	      lto_record_function_out_decl_state (node->decl, decl_state);
@@ -2318,7 +2321,7 @@ lto_output (void)
 	      decl_state = lto_new_out_decl_state ();
 	      lto_push_out_decl_state (decl_state);
 	      if (DECL_INITIAL (node->decl) != error_mark_node
-		  || !flag_wpa)
+		  || (!flag_wpa && flag_incremental_link != 2))
 		output_constructor (node);
 	      else
 		copy_function_or_variable (node);
Index: gcc/passes.c
===================================================================
--- gcc/passes.c	(revision 230847)
+++ gcc/passes.c	(working copy)
@@ -2530,7 +2530,7 @@ ipa_write_summaries (void)
     {
       struct cgraph_node *node = order[i];
 
-      if (node->has_gimple_body_p ())
+      if (gimple_has_body_p (node->decl))
 	{
 	  /* When streaming out references to statements as part of some IPA
 	     pass summary, the statements need to have uids assigned and the
Index: gcc/cgraphunit.c
===================================================================
--- gcc/cgraphunit.c	(revision 230847)
+++ gcc/cgraphunit.c	(working copy)
@@ -2270,8 +2270,10 @@ ipa_passes (void)
   if (flag_generate_lto || flag_generate_offload)
     targetm.asm_out.lto_start ();
 
-  if (!in_lto_p)
+  if (!in_lto_p || flag_incremental_link == 2)
     {
+      if (!quiet_flag)
+	fprintf (stderr, "Streaming LTO\n");
       if (g->have_offload)
 	{
 	  section_name_prefix = OFFLOAD_SECTION_NAME_PREFIX;
@@ -2290,7 +2292,9 @@ ipa_passes (void)
   if (flag_generate_lto || flag_generate_offload)
     targetm.asm_out.lto_end ();
 
-  if (!flag_ltrans && (in_lto_p || !flag_lto || flag_fat_lto_objects))
+  if (!flag_ltrans
+      && ((in_lto_p && flag_incremental_link != 2)
+	  || !flag_lto || flag_fat_lto_objects))
     execute_ipa_pass_list (passes->all_regular_ipa_passes);
   invoke_plugin_callbacks (PLUGIN_ALL_IPA_PASSES_END, NULL);
 
@@ -2381,7 +2385,8 @@ symbol_table::compile (void)
 
   /* Do nothing else if any IPA pass found errors or if we are just streaming LTO.  */
   if (seen_error ()
-      || (!in_lto_p && flag_lto && !flag_fat_lto_objects))
+      || ((!in_lto_p || flag_incremental_link == 2)
+	  && flag_lto && !flag_fat_lto_objects))
     {
       timevar_pop (TV_CGRAPHOPT);
       return;
Index: gcc/lto-cgraph.c
===================================================================
--- gcc/lto-cgraph.c	(revision 230847)
+++ gcc/lto-cgraph.c	(working copy)
@@ -534,7 +534,10 @@ lto_output_node (struct lto_simple_outpu
   bp_pack_value (&bp, node->thunk.thunk_p, 1);
   bp_pack_value (&bp, node->parallelized_function, 1);
   bp_pack_enum (&bp, ld_plugin_symbol_resolution,
-	        LDPR_NUM_KNOWN, node->resolution);
+	        LDPR_NUM_KNOWN,
+		/* When doing incremental link, we will get new resolution
+		   info next time we process the file.  */
+		flag_incremental_link ? LDPR_UNKNOWN : node->resolution);
   bp_pack_value (&bp, node->instrumentation_clone, 1);
   bp_pack_value (&bp, node->split_part, 1);
   streamer_write_bitpack (&bp);
Index: gcc/toplev.c
===================================================================
--- gcc/toplev.c	(revision 230847)
+++ gcc/toplev.c	(working copy)
@@ -504,7 +504,8 @@ compile_file (void)
 
   /* Compilation unit is finalized.  When producing non-fat LTO object, we are
      basically finished.  */
-  if (in_lto_p || !flag_lto || flag_fat_lto_objects)
+  if ((in_lto_p && flag_incremental_link != 2)
+      || !flag_lto || flag_fat_lto_objects)
     {
       /* File-scope initialization for AddressSanitizer.  */
       if (flag_sanitize & SANITIZE_ADDRESS)
Index: gcc/flag-types.h
===================================================================
--- gcc/flag-types.h	(revision 230847)
+++ gcc/flag-types.h	(working copy)
@@ -265,6 +265,15 @@ enum lto_partition_model {
   LTO_PARTITION_MAX = 4
 };
 
+/* flag_lto_linker_output initialization values.  */
+enum lto_linker_output {
+  LTO_LINKER_OUTPUT_UNKNOWN,
+  LTO_LINKER_OUTPUT_REL,
+  LTO_LINKER_OUTPUT_NOLTOREL,
+  LTO_LINKER_OUTPUT_DYN,
+  LTO_LINKER_OUTPUT_PIE,
+  LTO_LINKER_OUTPUT_EXEC
+};
 
 /* gfortran -finit-real= values.  */
 
Index: gcc/lto/lto.c
===================================================================
--- gcc/lto/lto.c	(revision 230847)
+++ gcc/lto/lto.c	(working copy)
@@ -3188,6 +3188,8 @@ lto_eh_personality (void)
 static void 
 lto_process_name (void)
 {
+  if (flag_incremental_link == 2)
+    setproctitle ("lto1-incremental-link");
   if (flag_lto)
     setproctitle ("lto1-lto");
   if (flag_wpa)
Index: gcc/lto/lang.opt
===================================================================
--- gcc/lto/lang.opt	(revision 230847)
+++ gcc/lto/lang.opt	(working copy)
@@ -24,6 +24,32 @@
 Language
 LTO
 
+Enum
+Name(lto_linker_output) Type(enum lto_linker_output) UnknownError(unknown linker output %qs)
+
+EnumValue
+Enum(lto_linker_output) String(unknown) Value(LTO_LINKER_OUTPUT_UNKNOWN)
+
+EnumValue
+Enum(lto_linker_output) String(rel) Value(LTO_LINKER_OUTPUT_REL)
+
+EnumValue
+Enum(lto_linker_output) String(noltorel) Value(LTO_LINKER_OUTPUT_NOLTOREL)
+
+EnumValue
+Enum(lto_linker_output) String(dyn) Value(LTO_LINKER_OUTPUT_DYN)
+
+EnumValue
+Enum(lto_linker_output) String(pie) Value(LTO_LINKER_OUTPUT_PIE)
+
+EnumValue
+Enum(lto_linker_output) String(exec) Value(LTO_LINKER_OUTPUT_EXEC)
+
+flinker-output=
+LTO Report Driver Joined RejectNegative Enum(lto_linker_output) Var(flag_lto_linker_output) Init(LTO_LINKER_OUTPUT_UNKNOWN)
+Set linker output type (used internally during LTO optimization)
+
+
 fltrans
 LTO Report Var(flag_ltrans)
 Run the link-time optimizer in local transformation (LTRANS) mode.
Index: gcc/lto/lto-lang.c
===================================================================
--- gcc/lto/lto-lang.c	(revision 230847)
+++ gcc/lto/lto-lang.c	(working copy)
@@ -819,6 +819,56 @@ lto_post_options (const char **pfilename
   if (flag_wpa)
     flag_generate_lto = 1;
 
+  /* Initialize the codegen flags according to the output type.  */
+  switch (flag_lto_linker_output)
+    {
+    case LTO_LINKER_OUTPUT_REL: /* .o: incremental link producing LTO IL  */
+      /* Configure compiler same way as normal frontend would do with -flto:
+	 this way we read the trees (declarations & types), symbol table,
+	 optimization summaries and link them. Subsequently we output new LTO
+	 file.  */
+      flag_lto = "";
+      flag_incremental_link = 2;
+      flag_whole_program = 0;
+      flag_wpa = 0;
+      flag_generate_lto = 1;
+      /* It would be cool to produce .o file directly, but our current
+	 simple objects do not contain the lto symbol markers.  Go the slow
+	 way through the asm file.  */
+      lang_hooks.lto.begin_section = lhd_begin_section;
+      lang_hooks.lto.append_data = lhd_append_data;
+      lang_hooks.lto.end_section = lhd_end_section;
+      if (flag_ltrans)
+	error ("-flinker-output=rel and -fltrans are mutually exclusive");
+      break;
+
+    case LTO_LINKER_OUTPUT_NOLTOREL: /* .o: incremental link producing asm  */
+      flag_whole_program = 0;
+      flag_incremental_link = 1;
+      break;
+
+    case LTO_LINKER_OUTPUT_DYN: /* .so: PIC library */
+      /* On some targets, like i386 it makes sense to build PIC library without
+	 -fpic for performance reasons.  So no need to adjust flags.  */
+      break;
+
+    case LTO_LINKER_OUTPUT_PIE: /* PIE binary */
+      /* If -fPIC or -fPIE was used at compile time, be sure that
+         flag_pie is 2.  */
+      if (!flag_pie && flag_pic)
+	flag_pie = flag_pic;
+      flag_pic = 0;
+      break;
+
+    case LTO_LINKER_OUTPUT_EXEC: /* Normal executable */
+      flag_pic = 0;
+      flag_pie = 0;
+      break;
+
+    case LTO_LINKER_OUTPUT_UNKNOWN:
+      break;
+    }
+
   /* Excess precision other than "fast" requires front-end
      support.  */
   flag_excess_precision_cmdline = EXCESS_PRECISION_FAST;
@@ -1214,7 +1264,7 @@ lto_init (void)
   int i;
 
   /* We need to generate LTO if running in WPA mode.  */
-  flag_generate_lto = (flag_wpa != NULL);
+  flag_generate_lto = (flag_incremental_link == 2 || flag_wpa != NULL);
 
   /* Create the basic integer types.  */
   build_common_tree_nodes (flag_signed_char, flag_short_double);
Index: gcc/ipa-visibility.c
===================================================================
--- gcc/ipa-visibility.c	(revision 230847)
+++ gcc/ipa-visibility.c	(working copy)
@@ -217,13 +217,13 @@ cgraph_externally_visible_p (struct cgra
      This improves code quality and we know we will duplicate them at most twice
      (in the case that we are not using plugin and link with object file
       implementing same COMDAT)  */
-  if ((in_lto_p || whole_program)
+  if (((in_lto_p || whole_program) && !flag_incremental_link)
       && DECL_COMDAT (node->decl)
       && comdat_can_be_unshared_p (node))
     return false;
 
   /* When doing link time optimizations, hidden symbols become local.  */
-  if (in_lto_p
+  if ((in_lto_p && !flag_incremental_link)
       && (DECL_VISIBILITY (node->decl) == VISIBILITY_HIDDEN
 	  || DECL_VISIBILITY (node->decl) == VISIBILITY_INTERNAL)
       /* Be sure that node is defined in IR file, not in other object
@@ -293,13 +293,13 @@ varpool_node::externally_visible_p (void
      so this does not enable more optimization, but referring static var
      is faster for dynamic linking.  Also this match logic hidding vtables
      from LTO symbol tables.  */
-  if ((in_lto_p || flag_whole_program)
+  if (((in_lto_p || flag_whole_program) && !flag_incremental_link)
       && DECL_COMDAT (decl)
       && comdat_can_be_unshared_p (this))
     return false;
 
   /* When doing link time optimizations, hidden symbols become local.  */
-  if (in_lto_p
+  if (in_lto_p && !flag_incremental_link
       && (DECL_VISIBILITY (decl) == VISIBILITY_HIDDEN
 	  || DECL_VISIBILITY (decl) == VISIBILITY_INTERNAL)
       /* Be sure that node is defined in IR file, not in other object
Index: gcc/lto-wrapper.c
===================================================================
--- gcc/lto-wrapper.c	(revision 230847)
+++ gcc/lto-wrapper.c	(working copy)
@@ -953,9 +953,15 @@ run_gcc (unsigned argc, char *argv[])
 	  file_offset = (off_t) loffset;
 	}
       fd = open (filename, O_RDONLY | O_BINARY);
+      /* Linker plugin passes -fresolution and -flinker-output options.  */
       if (fd == -1)
 	{
 	  lto_argv[lto_argc++] = argv[i];
+	  if (strcmp (argv[i], "-flinker-output=rel") == 0)
+	    {
+	       no_partition = true;
+	       lto_mode = LTO_MODE_LTO;
+	    }
 	  continue;
 	}
 
Index: lto-plugin/lto-plugin.c
===================================================================
--- lto-plugin/lto-plugin.c	(revision 230847)
+++ lto-plugin/lto-plugin.c	(working copy)
@@ -151,6 +151,7 @@ static ld_plugin_add_symbols add_symbols
 
 static struct plugin_file_info *claimed_files = NULL;
 static unsigned int num_claimed_files = 0;
+static unsigned int non_claimed_files = 0;
 
 static struct plugin_file_info *offload_files = NULL;
 static unsigned int num_offload_files = 0;
@@ -167,6 +168,7 @@ static unsigned int num_pass_through_ite
 static char debug;
 static char nop;
 static char *resolution_file = NULL;
+static const char *linker_output = NULL;
 
 /* The version of gold being used, or -1 if not gold.  The number is
    MAJOR * 100 + MINOR.  */
@@ -624,7 +626,7 @@ all_symbols_read_handler (void)
 {
   unsigned i;
   unsigned num_lto_args
-    = num_claimed_files + num_offload_files + lto_wrapper_num_args + 1;
+    = num_claimed_files + num_offload_files + lto_wrapper_num_args + 2;
   char **lto_argv;
   const char **lto_arg_ptr;
   if (num_claimed_files + num_offload_files == 0)
@@ -648,6 +650,15 @@ all_symbols_read_handler (void)
   for (i = 0; i < lto_wrapper_num_args; i++)
     *lto_arg_ptr++ = lto_wrapper_argv[i];
 
+  assert (linker_output);
+  if (non_claimed_files && !strcmp (linker_output, "-flinker-output=rel"))
+    {
+      linker_output = "-flinker-output=noltorel";
+      message (LDPL_WARNING, "incremental linking of LTO and non-LTO "
+	       "objects will produce final assembly for LTO objects and "
+	       "bypass whole program optimization");
+    }
+  *lto_arg_ptr++ = xstrdup (linker_output);
   for (i = 0; i < num_claimed_files; i++)
     {
       struct plugin_file_info *info = &claimed_files[i];
@@ -985,6 +996,8 @@ claim_file_handler (const struct ld_plug
 		  num_claimed_files * sizeof (struct plugin_file_info));
       claimed_files[num_claimed_files - 1] = lto_file;
     }
+  else
+    non_claimed_files++;
 
   if (obj.found == 0 && obj.offload == 1)
     {
@@ -1054,6 +1067,31 @@ process_option (const char *option)
     }
 }
 
+/* Pass -flinker-output to the wrapper.  */
+
+void
+add_linker_output_option (int val)
+{
+  switch (val)
+    {
+    case LDPO_REL:
+      linker_output = "-flinker-output=rel";
+      break;
+    case LDPO_DYN:
+      linker_output = "-flinker-output=dyn";
+      break;
+    case LDPO_PIE:
+      linker_output = "-flinker-output=pie";
+      break;
+    case LDPO_EXEC:
+      linker_output = "-flinker-output=exec";
+      break;
+    default:
+      message (LDPL_FATAL, "unsupported linker output %i", val);
+      break;
+    }
+}
+
 /* Called by gold after loading the plugin. TV is the transfer vector. */
 
 enum ld_plugin_status
@@ -1100,6 +1138,9 @@ onload (struct ld_plugin_tv *tv)
 	case LDPT_GOLD_VERSION:
 	  gold_version = p->tv_u.tv_val;
 	  break;
+	case LDPT_LINKER_OUTPUT:
+	  add_linker_output_option (p->tv_u.tv_val);
+	  break;
 	default:
 	  break;
 	}


* Re: [RFC] Getting LTO incremental linking work
  2015-11-25  9:04 [RFC] Getting LTO incremental linking work Jan Hubicka
@ 2015-11-25 11:19 ` Richard Biener
  2015-11-25 15:45   ` H.J. Lu
                     ` (2 more replies)
  2015-11-26  0:24 ` Andi Kleen
  2016-03-16 17:33 ` H.J. Lu
  2 siblings, 3 replies; 23+ messages in thread
From: Richard Biener @ 2015-11-25 11:19 UTC
  To: Jan Hubicka; +Cc: gcc-patches, ak, hongjiu.lu, ccoutant, iant

On Wed, 25 Nov 2015, Jan Hubicka wrote:

> Hi,
> PR 67548 is about LTO not supporting incremental linking.  I never really
> considered our current incremental linking very useful, because it triggers
> code generation at the incremental link time basically nullifying any
> benefits of whole program optimization and in fact I think it is harmful,
> because it sort of works and w/o any warning produce not very optimized code.
> 
> Basically there are 3 schemes how to make incremental link work
>  1) Turn LTO objects to non-LTO as we do now
>  2) concatenate LTO sections as implemented by Andi and Hj
>  3) Do actual linking of LTO sections
> 
> The problem of current implementation of 1) is that GCC thinks the resulting
> object file will not be used for static linking and thus assume that hidden
> symbols can be turned to static.
> 
> In the log of PR67548 HJ actually pointed out that we do have API at linker
> plugin side which says what type of output is done.  This is cool because we
> can also use it to drop -fpic when building static binary. This is common in
> Firefox, where some objects are built with -fpic and linked to both binaries
> and libraries.
> 
> Moreover we do have all infrastructure ready to implement 3).  Our tree merging
> and symbol table handling is fully incremental and I think I made a patch to
> implement it today.   The scheme is easy:
> 
>  1) linker plugin is modified to pass -flinker-output to lto wrapper
>     linker-output is either dyn (.so), pie or exec
>     for incremental linking I added .rel for 3) and noltorel for 1)
> 
>     currently it does rel because 3) (nor 2) can not be done when incremental
>     linking is done on both LTO and non-LTO objects.

That's because the result would be a "fat" object where both pieces
would be needed.  Btw, I wonder why you are not running into the
same issues as me when producing linker plugin output (the "merged"
LTO IL) that is LTO IL.  Ah, possibly because the link is incremental,
and thus all special-handling of LTO sections is disabled.

>     In this case linker
>     plugin outputs warnings about code quality loss and switches to
>     noltorel.
>  2) with -flinker-output the lto wrapper behaves same way as with
>     -flto-partition=none.
>  3) lto frontend parses -flinker-output and sets our internal flags accordingly.
>     I added new flag_incremental_linking to inform middle-end about the fact
>     that the output is going to be statically linked again.  This disables
>     the privatization of hidden symbols and if set to 2 it also triggers
>     the LTO IL streaming

I wonder why it behaves like -flto-partition=none in the case it does
not need to do LTO IL streaming (which I hope does LTO IL streaming
only?  or does this implement fat objects "correctly"?).  Can't
we still parallelize the build via LTRANS and then incrementally
link the result (I suppose the linker will do that for us with the
linker plugin outputs already?)?

-flto-partition=none itself isn't more memory intensive than
WPA these days; it only costs compile time, correct?

Your patch means that Andi's/HJ's work is no longer needed and we can
drop the section suffixes again?
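
To make the rel/noltorel fallback from the quoted scheme concrete, here is
a minimal sketch in C.  The enum and the function name
select_linker_output are hypothetical; they only mirror the LDPO_* output
kinds the plugin receives from the linker.  This illustrates the decision
as described in the mail and is not the code from the patch.

```c
#include <stddef.h>

/* Hypothetical stand-in for the linker's output kinds (LDPO_REL,
   LDPO_EXEC, LDPO_DYN, LDPO_PIE in the plugin API).  */
enum output_kind { OUT_REL, OUT_EXEC, OUT_DYN, OUT_PIE };

/* Return the value to pass as -flinker-output=.  NON_CLAIMED_FILES is
   the number of input objects that carry no LTO IL.  An incremental link
   that mixes LTO and non-LTO objects cannot stream merged IL, so it
   falls back to generating final code ("noltorel").  */
const char *
select_linker_output (enum output_kind kind, unsigned non_claimed_files)
{
  switch (kind)
    {
    case OUT_REL:
      return non_claimed_files ? "noltorel" : "rel";
    case OUT_DYN:
      return "dyn";
    case OUT_PIE:
      return "pie";
    case OUT_EXEC:
      return "exec";
    }
  return NULL;
}
```

With this shape only the OUT_REL case depends on the input mix; the other
output kinds are passed through unchanged.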

> The incremental linking with rel mode now streams in all global streams,
> merges trees, merges symbol table, removes unreachable symbols (which are
> result of merging) and streams everything out to .s file.
> 
> I only tested the patch on incremental linking libbackend.o.  The linking
> time is 46 seconds:
> 
> Execution times (seconds)
>  phase opt and generate  :  35.75 (81%) usr   0.90 (76%) sys  36.63 (81%) wall    5008 kB ( 1%) ggc
>  phase stream in         :   8.57 (19%) usr   0.28 (24%) sys   8.86 (19%) wall  700851 kB (99%) ggc
>  callgraph optimization  :   0.08 ( 0%) usr   0.01 ( 1%) sys   0.08 ( 0%) wall       0 kB ( 0%) ggc
>  ipa dead code removal   :   0.09 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall       0 kB ( 0%) ggc
>  ipa cp                  :   0.36 ( 1%) usr   0.04 ( 3%) sys   0.41 ( 1%) wall   42862 kB ( 6%) ggc
>  ipa inlining heuristics :   0.18 ( 0%) usr   0.02 ( 2%) sys   0.19 ( 0%) wall   26771 kB ( 4%) ggc
>  lto stream inflate      :   3.57 ( 8%) usr   0.14 (12%) sys   3.70 ( 8%) wall       0 kB ( 0%) ggc
>  lto stream deflate      :  20.13 (45%) usr   0.05 ( 4%) sys  19.42 (43%) wall       0 kB ( 0%) ggc
>  lto stream output       :   9.70 (22%) usr   0.32 (27%) sys  10.50 (23%) wall       0 kB ( 0%) ggc
>  ipa lto gimple out      :   0.66 ( 1%) usr   0.24 (20%) sys   1.09 ( 2%) wall    4655 kB ( 1%) ggc
>  ipa lto decl in         :   5.87 (13%) usr   0.11 ( 9%) sys   6.10 (13%) wall  552108 kB (78%) ggc
>  ipa lto decl out        :   2.91 ( 7%) usr   0.16 (14%) sys   3.07 ( 7%) wall       0 kB ( 0%) ggc
>  ipa lto constructors in :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     108 kB ( 0%) ggc
>  ipa lto constructors out:   0.12 ( 0%) usr   0.03 ( 3%) sys   0.13 ( 0%) wall     178 kB ( 0%) ggc
>  ipa lto cgraph I/O      :   0.12 ( 0%) usr   0.02 ( 2%) sys   0.15 ( 0%) wall   70005 kB (10%) ggc
>  ipa lto decl merge      :   0.31 ( 1%) usr   0.00 ( 0%) sys   0.30 ( 1%) wall    1023 kB ( 0%) ggc
>  ipa lto cgraph merge    :   0.11 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall    7972 kB ( 1%) ggc
>  ipa profile             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
>  ipa pure const          :   0.01 ( 0%) usr   0.01 ( 1%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
>  ipa icf                 :   0.04 ( 0%) usr   0.01 ( 1%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
>  varconst                :   0.02 ( 0%) usr   0.01 ( 1%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
>  TOTAL                 :  44.32             1.18            45.49             707846 kB
> 
> There are few low hanging fruits.  First streaming LTO files is slow because of vprintf:
>         case 1:
>           /* TODO: Print in hex with fast function, important for -flto. */
>           fprintf (f, "\\%03o", c);
>           break;
> a trivial bug to fix, will send separate patch for this.
> 
> Second most of inflate/deflate time goes to compressing and uncompressing
> sections that are being copied. Also something that is trivial to fix, will
> do that in separate patch - this also affects WPA and /tmp space usage.
> 
> The size of library is cut to about a half.
> -rw-r--r-- 1 hubicka _cvsadmin 211854942 Nov 25 09:18 libbackend.a
> -rw-r--r-- 1 hubicka _cvsadmin 121986816 Nov 25 09:16 libbackend.o
> 
> and linking of cc1 binary goes from 1m31s to 1m20s. Because we link
> libbackend.a more than 4 times, it would actually pay back even in GCC setting,
> though i suppose the main utility would be in parallelizing the builds (like
> kernel does).
> 
> WPA stage times are:
> Execution times (seconds)                                                       
>  phase opt and generate  :   3.76 (52%) usr   0.07 ( 6%) sys   3.83 (41%) wall   53777 kB (13%) ggc
>  phase stream in         :   3.04 (42%) usr   0.33 (28%) sys   3.37 (36%) wall  346427 kB (86%) ggc
>  phase stream out        :   0.40 ( 6%) usr   0.78 (66%) sys   2.18 (23%) wall       0 kB ( 0%) ggc
>  callgraph optimization  :   0.05 ( 1%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall      18 kB ( 0%) ggc
>  ipa dead code removal   :   0.46 ( 6%) usr   0.00 ( 0%) sys   0.44 ( 5%) wall       0 kB ( 0%) ggc
>  ipa cp                  :   0.40 ( 6%) usr   0.05 ( 4%) sys   0.47 ( 5%) wall   55439 kB (14%) ggc
>  ipa inlining heuristics :   1.95 (27%) usr   0.02 ( 2%) sys   1.97 (21%) wall   65871 kB (16%) ggc
>  lto stream inflate      :   0.60 ( 8%) usr   0.11 ( 9%) sys   0.67 ( 7%) wall       0 kB ( 0%) ggc
>  ipa lto decl in         :   1.93 (27%) usr   0.18 (15%) sys   2.10 (22%) wall  205593 kB (51%) ggc
>  ipa lto decl out        :   0.28 ( 4%) usr   0.02 ( 2%) sys   0.29 ( 3%) wall       0 kB ( 0%) ggc
>  ipa lto cgraph I/O      :   0.09 ( 1%) usr   0.02 ( 2%) sys   0.12 ( 1%) wall   62797 kB (16%) ggc
>  ipa lto decl merge      :   0.20 ( 3%) usr   0.00 ( 0%) sys   0.20 ( 2%) wall    1023 kB ( 0%) ggc
>  whopr partitioning      :   0.56 ( 8%) usr   0.00 ( 0%) sys   0.56 ( 6%) wall    1419 kB ( 0%) ggc
>  ipa reference           :   0.17 ( 2%) usr   0.00 ( 0%) sys   0.17 ( 2%) wall       0 kB ( 0%) ggc
>  ipa pure const          :   0.17 ( 2%) usr   0.00 ( 0%) sys   0.16 ( 2%) wall       0 kB ( 0%) ggc
>  ipa icf                 :   0.07 ( 1%) usr   0.00 ( 0%) sys   0.07 ( 1%) wall     485 kB ( 0%) ggc
>  unaccounted todo        :   0.06 ( 1%) usr   0.00 ( 0%) sys   0.06 ( 1%) wall       0 kB ( 0%) ggc
>  TOTAL                 :   7.20             1.18             9.39             402192 kB
> 
> 
> Execution times (seconds)                                                       
>  phase setup             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall    1986 kB ( 0%) ggc
>  phase opt and generate  :   6.66 (39%) usr   0.38 (22%) sys   7.03 (36%) wall  199143 kB (21%) ggc
>  phase stream in         :   9.33 (54%) usr   0.38 (22%) sys   9.71 (50%) wall  764698 kB (79%) ggc
>  phase stream out        :   0.82 ( 5%) usr   0.97 (55%) sys   2.23 (11%) wall       2 kB ( 0%) ggc
>  phase finalize          :   0.40 ( 2%) usr   0.03 ( 2%) sys   0.43 ( 2%) wall       0 kB ( 0%) ggc
>  garbage collection      :   0.79 ( 5%) usr   0.01 ( 1%) sys   0.80 ( 4%) wall       0 kB ( 0%) ggc
>  ipa dead code removal   :   0.41 ( 2%) usr   0.00 ( 0%) sys   0.45 ( 2%) wall       0 kB ( 0%) ggc
>  ipa cp                  :   0.33 ( 2%) usr   0.05 ( 3%) sys   0.41 ( 2%) wall   56753 kB ( 6%) ggc
>  ipa inlining heuristics :   1.74 (10%) usr   0.02 ( 1%) sys   1.80 ( 9%) wall   55600 kB ( 6%) ggc
>  lto stream inflate      :   2.18 (13%) usr   0.12 ( 7%) sys   2.28 (12%) wall       0 kB ( 0%) ggc
>  ipa lto gimple in       :   0.62 ( 4%) usr   0.23 (13%) sys   0.96 ( 5%) wall  135317 kB (14%) ggc
>  ipa lto decl in         :   6.63 (39%) usr   0.15 ( 9%) sys   6.70 (35%) wall  598144 kB (62%) ggc
>  ipa lto decl out        :   0.55 ( 3%) usr   0.01 ( 1%) sys   0.57 ( 3%) wall       0 kB ( 0%) ggc
>  ipa lto cgraph I/O      :   0.14 ( 1%) usr   0.03 ( 2%) sys   0.15 ( 1%) wall   76843 kB ( 8%) ggc
>  ipa lto decl merge      :   0.35 ( 2%) usr   0.00 ( 0%) sys   0.34 ( 2%) wall    1023 kB ( 0%) ggc
>  ipa lto cgraph merge    :   0.13 ( 1%) usr   0.00 ( 0%) sys   0.13 ( 1%) wall    9284 kB ( 1%) ggc
>  whopr partitioning      :   0.51 ( 3%) usr   0.00 ( 0%) sys   0.50 ( 3%) wall    1496 kB ( 0%) ggc
>  ipa reference           :   0.18 ( 1%) usr   0.00 ( 0%) sys   0.19 ( 1%) wall       0 kB ( 0%) ggc
>  ipa pure const          :   0.20 ( 1%) usr   0.01 ( 1%) sys   0.20 ( 1%) wall       0 kB ( 0%) ggc
>  ipa icf                 :   1.82 (11%) usr   0.05 ( 3%) sys   1.85 (10%) wall    2138 kB ( 0%) ggc
>  tree operand scan       :   0.13 ( 1%) usr   0.06 ( 3%) sys   0.17 ( 1%) wall   21674 kB ( 2%) ggc
>  TOTAL                 :  17.21             1.76            19.41             965830 kB
> 
> so 50% cut in memory use and reasonable speedup. I need to check what happens
> with ICF.
> 
> The WPA stats are as follows:
> WPA statistics
> [WPA] read 891308 SCCs of average size 1.972195
> [WPA] 1757833 tree bodies read in total
> [WPA] tree SCC table: size 524287, 230881 elements, collision ratio: 1.107788
> [WPA] tree SCC max chain length 39 (size 1)
> [WPA] Compared 73318 SCCs, 81315 collisions (1.109073)
> [WPA] Merged 52578 SCCs
> [WPA] Merged 502850 tree bodies
> [WPA] Merged 36730 types
> [WPA] 205971 types prevailed (565069 associated trees)
> [WPA] GIMPLE canonical type table: size 16381, 1251 elements, 28138 searches, 444 collisions (ratio: 0.015779)
> [WPA] GIMPLE canonical type pointer-map: 1251 elements, 99917 searches
> [WPA] # of input files: 125
> [WPA] Compression: 23123694 input bytes, 79799028 uncompressed bytes (ratio: 3.450963)
> [WPA] Size of mmap'd section decls: 23123694 bytes
> 
> compared to
> WPA statistics
> [WPA] read 3633234 SCCs of average size 2.539347
> [WPA] 9226041 tree bodies read in total
> [WPA] tree SCC table: size 524287, 257562 elements, collision ratio: 0.673833
> [WPA] tree SCC max chain length 39 (size 1)
> [WPA] Compared 500618 SCCs, 646007 collisions (1.290419)
> [WPA] Merged 478513 SCCs
> [WPA] Merged 5659960 tree bodies
> [WPA] Merged 326141 types
> [WPA] 207806 types prevailed (562649 associated trees)
> [WPA] GIMPLE canonical type table: size 16381, 1246 elements, 27925 searches, 437 collisions (ratio: 0.015649)
> [WPA] GIMPLE canonical type pointer-map: 1246 elements, 97858 searches
> [WPA] # of input files: 461
> [WPA] Compression: 95695388 input bytes, 303240971 uncompressed bytes (ratio: 3.168815)
> [WPA] Size of mmap'd section decls: 95695388 bytes
> 
> So about 5fold improvement in number of trees and decls read. By end of WPA:
> 
> [WPA] 1757833 tree bodies read in total
> [WPA] # of input files: 125
> [WPA] # of input cgraph nodes: 36977
> [WPA] # of function bodies: 651
> [WPA] # of output files: 31
> [WPA] # of output symtab nodes: 185336
> [WPA] # of output tree pickle references: 629336
> [WPA] # of output tree bodies: 129898
> [WPA] # callgraph partitions: 31
> [WPA] Compression: 30134544 input bytes, 100590102 uncompressed bytes (ratio: 3.338033)
> [WPA] Size of mmap'd section decls: 23123694 bytes
> [WPA] Size of mmap'd section function_body: 2641029 bytes
> [WPA] Size of mmap'd section statics: 0 bytes
> [WPA] Size of mmap'd section symtab: 0 bytes
> [WPA] Size of mmap'd section refs: 408500 bytes
> [WPA] Size of mmap'd section asm: 0 bytes
> [WPA] Size of mmap'd section jmpfuncs: 1432063 bytes
> [WPA] Size of mmap'd section pureconst: 80213 bytes
> [WPA] Size of mmap'd section reference: 0 bytes
> [WPA] Size of mmap'd section profile: 2439 bytes
> [WPA] Size of mmap'd section symbol_nodes: 1413364 bytes
> [WPA] Size of mmap'd section opts: 0 bytes
> [WPA] Size of mmap'd section cgraphopt: 0 bytes
> [WPA] Size of mmap'd section inline: 1005113 bytes
> [WPA] Size of mmap'd section ipcp_trans: 0 bytes
> [WPA] Size of mmap'd section icf: 28129 bytes
> [WPA] Size of mmap'd section offload_table: 0 bytes
> [WPA] Size of mmap'd section mode_table: 0 bytes
> 
> [WPA] 9226041 tree bodies read in total
> [WPA] # of input files: 461
> [WPA] # of input cgraph nodes: 36888
> [WPA] # of function bodies: 7690
> [WPA] # of output files: 31
> [WPA] # of output symtab nodes: 191489
> [WPA] # of output tree pickle references: 1444221
> [WPA] # of output tree bodies: 261141
> [WPA] # callgraph partitions: 31
> [WPA] Compression: 112942159 input bytes, 347530231 uncompressed bytes (ratio: 3.077064)
> [WPA] Size of mmap'd section decls: 95695388 bytes
> [WPA] Size of mmap'd section function_body: 11747200 bytes
> [WPA] Size of mmap'd section statics: 0 bytes
> [WPA] Size of mmap'd section symtab: 0 bytes
> [WPA] Size of mmap'd section refs: 395831 bytes
> [WPA] Size of mmap'd section asm: 0 bytes
> [WPA] Size of mmap'd section jmpfuncs: 1666954 bytes
> [WPA] Size of mmap'd section pureconst: 94608 bytes
> [WPA] Size of mmap'd section reference: 0 bytes
> [WPA] Size of mmap'd section profile: 9259 bytes
> [WPA] Size of mmap'd section symbol_nodes: 1769069 bytes
> [WPA] Size of mmap'd section opts: 0 bytes
> [WPA] Size of mmap'd section cgraphopt: 0 bytes
> [WPA] Size of mmap'd section inline: 1266586 bytes
> [WPA] Size of mmap'd section ipcp_trans: 0 bytes
> [WPA] Size of mmap'd section icf: 297264 bytes
> [WPA] Size of mmap'd section offload_table: 0 bytes
> [WPA] Size of mmap'd section mode_table: 0 bytes
> 
> Does anyone see problems with this approach? I think this is easy enough 
> and fixes PR67548 so it may still get to mainline?

Yes, it would be a very nice feature to have indeed.

I don't see anything trying to change things with the collect2 path?

> I need to do more testing, but in general I think the implementation is OK 
> as it is.  We need a way to force noltorel model for testsuite, as the
> new default will bypass codegen for all our -r -nostdlib testcases.

Maybe we can turn most of them to -shared?

> BTW ltrans now dies with -ftime-report. Any ideas why?

It works for me.

Some comments below.

Richard.

> 
> Honza
> 
> Index: gcc/common.opt
> ===================================================================
> --- gcc/common.opt	(revision 230847)
> +++ gcc/common.opt	(working copy)
> @@ -46,6 +46,13 @@ int optimize_fast
>  Variable
>  bool in_lto_p = false
>  
> +; This variable is set to non-0 only by LTO front-end.  1 indicates that
> +; the output produced will be used for incremental linking (thus weak symbols
> +; can still be bound) and 2 indicates that the IL is going to be linked
> +; and output to LTO object file.
> +Variable
> +int flag_incremental_link = 0
> +
>  ; 0 means straightforward implementation of complex divide acceptable.
>  ; 1 means wide ranges of inputs must work for complex divide.
>  ; 2 means C99-like requirements for complex multiply and divide.
> Index: gcc/lto-streamer-out.c
> ===================================================================
> --- gcc/lto-streamer-out.c	(revision 230847)
> +++ gcc/lto-streamer-out.c	(working copy)
> @@ -2286,13 +2286,16 @@ lto_output (void)
>  		}
>  	      decl_state = lto_new_out_decl_state ();
>  	      lto_push_out_decl_state (decl_state);
> -	      if (gimple_has_body_p (node->decl) || !flag_wpa
> +	      if (gimple_has_body_p (node->decl)
>  		  /* Thunks have no body but they may be synthetized
>  		     at WPA time.  */
>  		  || DECL_ARGUMENTS (node->decl))
>  		output_function (node);
>  	      else
> -		copy_function_or_variable (node);
> +		{
> +		  gcc_checking_assert (flag_wpa || flag_incremental_link == 2);
> +		  copy_function_or_variable (node);
> +		}
>  	      gcc_assert (lto_get_out_decl_state () == decl_state);
>  	      lto_pop_out_decl_state ();
>  	      lto_record_function_out_decl_state (node->decl, decl_state);
> @@ -2318,7 +2321,7 @@ lto_output (void)
>  	      decl_state = lto_new_out_decl_state ();
>  	      lto_push_out_decl_state (decl_state);
>  	      if (DECL_INITIAL (node->decl) != error_mark_node
> -		  || !flag_wpa)
> +		  || (!flag_wpa && flag_incremental_link != 2))
>  		output_constructor (node);
>  	      else
>  		copy_function_or_variable (node);
> Index: gcc/passes.c
> ===================================================================
> --- gcc/passes.c	(revision 230847)
> +++ gcc/passes.c	(working copy)
> @@ -2530,7 +2530,7 @@ ipa_write_summaries (void)
>      {
>        struct cgraph_node *node = order[i];
>  
> -      if (node->has_gimple_body_p ())
> +      if (gimple_has_body_p (node->decl))

?

>  	{
>  	  /* When streaming out references to statements as part of some IPA
>  	     pass summary, the statements need to have uids assigned and the
> Index: gcc/cgraphunit.c
> ===================================================================
> --- gcc/cgraphunit.c	(revision 230847)
> +++ gcc/cgraphunit.c	(working copy)
> @@ -2270,8 +2270,10 @@ ipa_passes (void)
>    if (flag_generate_lto || flag_generate_offload)
>      targetm.asm_out.lto_start ();
>  
> -  if (!in_lto_p)
> +  if (!in_lto_p || flag_incremental_link == 2)
>      {
> +      if (!quiet_flag)
> +	fprintf (stderr, "Streaming LTO\n");
>        if (g->have_offload)
>  	{
>  	  section_name_prefix = OFFLOAD_SECTION_NAME_PREFIX;
> @@ -2290,7 +2292,9 @@ ipa_passes (void)
>    if (flag_generate_lto || flag_generate_offload)
>      targetm.asm_out.lto_end ();
>  
> -  if (!flag_ltrans && (in_lto_p || !flag_lto || flag_fat_lto_objects))
> +  if (!flag_ltrans
> +      && ((in_lto_p && flag_incremental_link != 2)
> +	  || !flag_lto || flag_fat_lto_objects))
>      execute_ipa_pass_list (passes->all_regular_ipa_passes);
>    invoke_plugin_callbacks (PLUGIN_ALL_IPA_PASSES_END, NULL);
>  
> @@ -2381,7 +2385,8 @@ symbol_table::compile (void)
>  
>    /* Do nothing else if any IPA pass found errors or if we are just streaming LTO.  */
>    if (seen_error ()
> -      || (!in_lto_p && flag_lto && !flag_fat_lto_objects))
> +      || ((!in_lto_p || flag_incremental_link == 2)
> +	  && flag_lto && !flag_fat_lto_objects))
>      {
>        timevar_pop (TV_CGRAPHOPT);
>        return;
> Index: gcc/lto-cgraph.c
> ===================================================================
> --- gcc/lto-cgraph.c	(revision 230847)
> +++ gcc/lto-cgraph.c	(working copy)
> @@ -534,7 +534,10 @@ lto_output_node (struct lto_simple_outpu
>    bp_pack_value (&bp, node->thunk.thunk_p, 1);
>    bp_pack_value (&bp, node->parallelized_function, 1);
>    bp_pack_enum (&bp, ld_plugin_symbol_resolution,
> -	        LDPR_NUM_KNOWN, node->resolution);
> +	        LDPR_NUM_KNOWN,
> +		/* When doing incremental link, we will get new resolution
> +		   info next time we process the file.  */
> +		flag_incremental_link ? LDPR_UNKNOWN : node->resolution);
>    bp_pack_value (&bp, node->instrumentation_clone, 1);
>    bp_pack_value (&bp, node->split_part, 1);
>    streamer_write_bitpack (&bp);
> Index: gcc/toplev.c
> ===================================================================
> --- gcc/toplev.c	(revision 230847)
> +++ gcc/toplev.c	(working copy)
> @@ -504,7 +504,8 @@ compile_file (void)
>  
>    /* Compilation unit is finalized.  When producing non-fat LTO object, we are
>       basically finished.  */
> -  if (in_lto_p || !flag_lto || flag_fat_lto_objects)
> +  if ((in_lto_p && flag_incremental_link != 2)
> +      || !flag_lto || flag_fat_lto_objects)
>      {
>        /* File-scope initialization for AddressSanitizer.  */
>        if (flag_sanitize & SANITIZE_ADDRESS)
> Index: gcc/flag-types.h
> ===================================================================
> --- gcc/flag-types.h	(revision 230847)
> +++ gcc/flag-types.h	(working copy)
> @@ -265,6 +265,15 @@ enum lto_partition_model {
>    LTO_PARTITION_MAX = 4
>  };
>  
> +/* flag_lto_linker_output initialization values.  */
> +enum lto_linker_output {
> +  LTO_LINKER_OUTPUT_UNKNOWN,
> +  LTO_LINKER_OUTPUT_REL,
> +  LTO_LINKER_OUTPUT_NOLTOREL,
> +  LTO_LINKER_OUTPUT_DYN,
> +  LTO_LINKER_OUTPUT_PIE,
> +  LTO_LINKER_OUTPUT_EXEC
> +};
>  
>  /* gfortran -finit-real= values.  */
>  
> Index: gcc/lto/lto.c
> ===================================================================
> --- gcc/lto/lto.c	(revision 230847)
> +++ gcc/lto/lto.c	(working copy)
> @@ -3188,6 +3188,8 @@ lto_eh_personality (void)
>  static void 
>  lto_process_name (void)
>  {
> +  if (flag_incremental_link == 2)
> +    setproctitle ("lto1-incremental-link");
>    if (flag_lto)
>      setproctitle ("lto1-lto");
>    if (flag_wpa)
> Index: gcc/lto/lang.opt
> ===================================================================
> --- gcc/lto/lang.opt	(revision 230847)
> +++ gcc/lto/lang.opt	(working copy)
> @@ -24,6 +24,32 @@
>  Language
>  LTO
>  
> +Enum
> +Name(lto_linker_output) Type(enum lto_linker_output) UnknownError(unknown linker output %qs)
> +
> +EnumValue
> +Enum(lto_linker_output) String(unknown) Value(LTO_LINKER_OUTPUT_UNKNOWN)
> +
> +EnumValue
> +Enum(lto_linker_output) String(rel) Value(LTO_LINKER_OUTPUT_REL)
> +
> +EnumValue
> +Enum(lto_linker_output) String(noltorel) Value(LTO_LINKER_OUTPUT_NOLTOREL)
> +
> +EnumValue
> +Enum(lto_linker_output) String(dyn) Value(LTO_LINKER_OUTPUT_DYN)
> +
> +EnumValue
> +Enum(lto_linker_output) String(pie) Value(LTO_LINKER_OUTPUT_PIE)
> +
> +EnumValue
> +Enum(lto_linker_output) String(exec) Value(LTO_LINKER_OUTPUT_EXEC)
> +
> +flinker-output=
> +LTO Report Driver Joined RejectNegative Enum(lto_linker_output) Var(flag_lto_linker_output) Init(LTO_LINKER_OUTPUT_UNKNOWN)
> +Set linker output type (used internally during LTO optimization)
> +
> +
>  fltrans
>  LTO Report Var(flag_ltrans)
>  Run the link-time optimizer in local transformation (LTRANS) mode.
> Index: gcc/lto/lto-lang.c
> ===================================================================
> --- gcc/lto/lto-lang.c	(revision 230847)
> +++ gcc/lto/lto-lang.c	(working copy)
> @@ -819,6 +819,56 @@ lto_post_options (const char **pfilename
>    if (flag_wpa)
>      flag_generate_lto = 1;
>  
> +  /* Initialize the codegen flags according to the output type.  */
> +  switch (flag_lto_linker_output)
> +    {
> +    case LTO_LINKER_OUTPUT_REL: /* .o: incremental link producing LTO IL  */
> +      /* Configure compiler same way as normal frontend would do with -flto:
> +	 this way we read the trees (declarations & types), symbol table,
> +	 optimization summaries and link them. Subsequently we output new LTO
> +	 file.  */
> +      flag_lto = "";
> +      flag_incremental_link = 2;
> +      flag_whole_program = 0;
> +      flag_wpa = 0;
> +      flag_generate_lto = 1;
> +      /* It would be cool to produce .o file directly, but our current
> +	 simple objects does not contain the lto symbol markers.  Go the slow
> +	 way through the asm file.  */

We should get away from the symbol markers and instead rely on section
names.  Not in this patch of course.
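
As a rough illustration of relying on section names, a check could look
like the sketch below.  The prefix ".gnu.lto_" is assumed here to be the
one used for LTO sections, and has_lto_section_name is a hypothetical
helper, not an existing GCC or simple-object function.

```c
#include <string.h>

/* Assumed prefix of sections holding LTO IL.  */
#define LTO_SECTION_PREFIX ".gnu.lto_"

/* Return nonzero if NAME looks like the name of an LTO IL section.  */
int
has_lto_section_name (const char *name)
{
  return strncmp (name, LTO_SECTION_PREFIX,
		  strlen (LTO_SECTION_PREFIX)) == 0;
}
```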

> +      lang_hooks.lto.begin_section = lhd_begin_section;
> +      lang_hooks.lto.append_data = lhd_append_data;
> +      lang_hooks.lto.end_section = lhd_end_section;
> +      if (flag_ltrans)
> +	error ("-flinker-output=rel and -fltrans are mutually exclusive");
> +      break;
> +
> +    case LTO_LINKER_OUTPUT_NOLTOREL: /* .o: incremental link producing asm  */
> +      flag_whole_program = 0;
> +      flag_incremental_link = 1;
> +      break;
> +
> +    case LTO_LINKER_OUTPUT_DYN: /* .so: PIC library */
> +      /* On some targets, like i386 it makes sense to build PIC library without
> +	 -fpic for performance reasons.  So no need to adjust flags.  */
> +      break;
> +
> +    case LTO_LINKER_OUTPUT_PIE: /* PIE binary */
> +      /* If -fPIC or -fPIE was used at compile time, be sure that
> +         flag_pie is 2.  */
> +      if (!flag_pie && flag_pic)
> +	flag_pie = flag_pic;
> +      flag_pic = 0;

The code doesn't seem to do what the comment says...
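
A small stand-alone sketch of the mismatch, assuming the usual convention
that -fpic/-fpie store 1 and -fPIC/-fPIE store 2 in flag_pic/flag_pie.
The struct and the function adjust_for_pie are hypothetical; the body
repeats the three statements from the hunk above.

```c
/* Hypothetical container for the two flags touched by the hunk.  */
struct pic_flags
{
  int pic;
  int pie;
};

/* Apply the same statements as the LTO_LINKER_OUTPUT_PIE case.  */
struct pic_flags
adjust_for_pie (struct pic_flags f)
{
  if (!f.pie && f.pic)
    f.pie = f.pic;
  f.pic = 0;
  return f;
}
```

With pic == 1 (small -fpic) and pie == 0 the result is pie == 1, not 2,
and pic is always cleared, which the comment does not mention.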

> +      break;
> +
> +    case LTO_LINKER_OUTPUT_EXEC: /* Normal executable */
> +      flag_pic = 0;
> +      flag_pie = 0;
> +      break;
> +
> +    case LTO_LINKER_OUTPUT_UNKNOWN:
> +      break;
> +    }
> +
>    /* Excess precision other than "fast" requires front-end
>       support.  */
>    flag_excess_precision_cmdline = EXCESS_PRECISION_FAST;
> @@ -1214,7 +1264,7 @@ lto_init (void)
>    int i;
>  
>    /* We need to generate LTO if running in WPA mode.  */
> -  flag_generate_lto = (flag_wpa != NULL);
> +  flag_generate_lto = (flag_incremental_link == 2 || flag_wpa != NULL);
>  
>    /* Create the basic integer types.  */
>    build_common_tree_nodes (flag_signed_char, flag_short_double);
> Index: gcc/ipa-visibility.c
> ===================================================================
> --- gcc/ipa-visibility.c	(revision 230847)
> +++ gcc/ipa-visibility.c	(working copy)
> @@ -217,13 +217,13 @@ cgraph_externally_visible_p (struct cgra
>       This improves code quality and we know we will duplicate them at most twice
>       (in the case that we are not using plugin and link with object file
>        implementing same COMDAT)  */
> -  if ((in_lto_p || whole_program)
> +  if (((in_lto_p || whole_program) && !flag_incremental_link)
>        && DECL_COMDAT (node->decl)
>        && comdat_can_be_unshared_p (node))
>      return false;
>  
>    /* When doing link time optimizations, hidden symbols become local.  */
> -  if (in_lto_p
> +  if ((in_lto_p && !flag_incremental_link)
>        && (DECL_VISIBILITY (node->decl) == VISIBILITY_HIDDEN
>  	  || DECL_VISIBILITY (node->decl) == VISIBILITY_INTERNAL)
>        /* Be sure that node is defined in IR file, not in other object
> @@ -293,13 +293,13 @@ varpool_node::externally_visible_p (void
>       so this does not enable more optimization, but referring static var
>       is faster for dynamic linking.  Also this match logic hidding vtables
>       from LTO symbol tables.  */
> -  if ((in_lto_p || flag_whole_program)
> +  if (((in_lto_p || flag_whole_program) && !flag_incremental_link)
>        && DECL_COMDAT (decl)
>        && comdat_can_be_unshared_p (this))
>      return false;
>  
>    /* When doing link time optimizations, hidden symbols become local.  */
> -  if (in_lto_p
> +  if (in_lto_p && !flag_incremental_link
>        && (DECL_VISIBILITY (decl) == VISIBILITY_HIDDEN
>  	  || DECL_VISIBILITY (decl) == VISIBILITY_INTERNAL)
>        /* Be sure that node is defined in IR file, not in other object
> Index: gcc/lto-wrapper.c
> ===================================================================
> --- gcc/lto-wrapper.c	(revision 230847)
> +++ gcc/lto-wrapper.c	(working copy)
> @@ -953,9 +953,15 @@ run_gcc (unsigned argc, char *argv[])
>  	  file_offset = (off_t) loffset;
>  	}
>        fd = open (filename, O_RDONLY | O_BINARY);
> +      /* Linker plugin passes -fresolution and -flinker-output options.  */
>        if (fd == -1)
>  	{
>  	  lto_argv[lto_argc++] = argv[i];
> +	  if (strcmp (argv[i], "-flinker-output=rel") == 0)
> +	    {
> +	       no_partition = true;
> +	       lto_mode = LTO_MODE_LTO;
> +	    }
>  	  continue;
>  	}
>  
> Index: lto-plugin/lto-plugin.c
> ===================================================================
> --- lto-plugin/lto-plugin.c	(revision 230847)
> +++ lto-plugin/lto-plugin.c	(working copy)
> @@ -151,6 +151,7 @@ static ld_plugin_add_symbols add_symbols
>  
>  static struct plugin_file_info *claimed_files = NULL;
>  static unsigned int num_claimed_files = 0;
> +static unsigned int non_claimed_files = 0;
>  
>  static struct plugin_file_info *offload_files = NULL;
>  static unsigned int num_offload_files = 0;
> @@ -167,6 +168,7 @@ static unsigned int num_pass_through_ite
>  static char debug;
>  static char nop;
>  static char *resolution_file = NULL;
> +static const char *linker_output = NULL;
>  
>  /* The version of gold being used, or -1 if not gold.  The number is
>     MAJOR * 100 + MINOR.  */
> @@ -624,7 +626,7 @@ all_symbols_read_handler (void)
>  {
>    unsigned i;
>    unsigned num_lto_args
> -    = num_claimed_files + num_offload_files + lto_wrapper_num_args + 1;
> +    = num_claimed_files + num_offload_files + lto_wrapper_num_args + 2;
>    char **lto_argv;
>    const char **lto_arg_ptr;
>    if (num_claimed_files + num_offload_files == 0)
> @@ -648,6 +650,15 @@ all_symbols_read_handler (void)
>    for (i = 0; i < lto_wrapper_num_args; i++)
>      *lto_arg_ptr++ = lto_wrapper_argv[i];
>  
> +  assert (linker_output);
> +  if (non_claimed_files && !strcmp (linker_output, "-flinker-output=rel"))
> +    {
> +      linker_output = "-flinker-output=noltorel";
> +      message (LDPL_WARNING, "incremental linking of LTO and non-LTO "
> +	       "objects will produce final assembly for LTO objects and "
> +	       "bypass whole program optimization");
> +    }
> +  *lto_arg_ptr++ = xstrdup (linker_output);
>    for (i = 0; i < num_claimed_files; i++)
>      {
>        struct plugin_file_info *info = &claimed_files[i];
> @@ -985,6 +996,8 @@ claim_file_handler (const struct ld_plug
>  		  num_claimed_files * sizeof (struct plugin_file_info));
>        claimed_files[num_claimed_files - 1] = lto_file;
>      }
> +  else
> +    non_claimed_files++;
>  
>    if (obj.found == 0 && obj.offload == 1)
>      {
> @@ -1054,6 +1067,31 @@ process_option (const char *option)
>      }
>  }
>  
> +/* Pass -flinker-output to the wrapper.  */
> +
> +void
> +add_linker_output_option (int val)
> +{
> +  switch (val)
> +    {
> +    case LDPO_REL:
> +      linker_output = "-flinker-output=rel";
> +      break;
> +    case LDPO_DYN:
> +      linker_output = "-flinker-output=dyn";
> +      break;
> +    case LDPO_PIE:
> +      linker_output = "-flinker-output=pie";
> +      break;
> +    case LDPO_EXEC:
> +      linker_output = "-flinker-output=exec";
> +      break;
> +    default:
> +      message (LDPL_FATAL, "unsupported linker output %i", val);
> +      break;
> +    }
> +}
> +
>  /* Called by gold after loading the plugin. TV is the transfer vector. */
>  
>  enum ld_plugin_status
> @@ -1100,6 +1138,9 @@ onload (struct ld_plugin_tv *tv)
>  	case LDPT_GOLD_VERSION:
>  	  gold_version = p->tv_u.tv_val;
>  	  break;
> +	case LDPT_LINKER_OUTPUT:
> +	  add_linker_output_option (p->tv_u.tv_val);
> +	  break;
>  	default:
>  	  break;
>  	}

I wonder what this does to old toolchains using the linker plugin
with this change.  I suppose it will fail with an "unknown option"
error.

Not sure what to do about this though given the plugin doesn't
really know which GCC it is targeting.  An idea would be to
export another environment variable from the driver, like
COLLECT_GCC_LTO_WRAPPER_VER=2, and only add this option if
that is present and >= 2?
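
A minimal sketch of such a check, assuming the variable name suggested
above (COLLECT_GCC_LTO_WRAPPER_VER does not exist today) and a
hypothetical helper wrapper_accepts_linker_output.  The caller would pass
the result of getenv ("COLLECT_GCC_LTO_WRAPPER_VER").

```c
#include <stddef.h>
#include <stdlib.h>

/* Return 1 if VER, the value of the proposed version variable, names a
   wrapper version of at least 2, else 0.  A missing variable (NULL), an
   empty string or trailing garbage all count as "too old".  */
int
wrapper_accepts_linker_output (const char *ver)
{
  char *end;
  long v;

  if (ver == NULL)
    return 0;
  v = strtol (ver, &end, 10);
  return end != ver && *end == '\0' && v >= 2;
}
```

The plugin would then append -flinker-output= only when this returns 1,
so an older lto-wrapper never sees the unknown option.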

I don't think LTO will work properly when you invoke ld directly
as lto-wrapper expects COLLECT_GCC[_OPTIONS] to be set.

Otherwise the patch looks straightforward to me...

Thanks,
Richard.

* Re: [RFC] Getting LTO incremental linking work
  2015-11-25 11:19 ` Richard Biener
@ 2015-11-25 15:45   ` H.J. Lu
  2015-11-25 19:21     ` Jan Hubicka
  2015-11-25 18:54   ` Jan Hubicka
  2015-11-25 23:59   ` Andi Kleen
  2 siblings, 1 reply; 23+ messages in thread
From: H.J. Lu @ 2015-11-25 15:45 UTC (permalink / raw)
  To: Richard Biener
  Cc: Jan Hubicka, GCC Patches, Andi Kleen, Cary Coutant, Ian Lance Taylor

On Wed, Nov 25, 2015 at 3:15 AM, Richard Biener <rguenther@suse.de> wrote:
> On Wed, 25 Nov 2015, Jan Hubicka wrote:
>
>> Hi,
>> PR 67548 is about LTO not supporting incremental linking.  I never really
>> considered our current incremental linking very useful, because it triggers
>> code generation at the incremental link time basically nullifying any
>> benefits of whole program optimization and in fact I think it is harmful,
>> because it sort of works and w/o any warning produce not very optimized code.
>>
>> Basically there are 3 schemes how to make incremental link work
>>  1) Turn LTO objects to non-LTO as we do now
>>  2) concatenate LTO sections as implemented by Andi and Hj
>>  3) Do actual linking of LTO sections
>>
>> The problem of current implementation of 1) is that GCC thinks the resulting
>> object file will not be used for static linking and thus assume that hidden
>> symbols can be turned to static.
>>
>> In the log of PR67548 HJ actually pointed out that we do have API at linker
>> plugin side which says what type of output is done.  This is cool because we
>> can also use it to drop -fpic when building static binary. This is common in
>> Firefox, where some objects are built with -fpic and linked to both binaries
>> and libraries.
>>
>> Moreover we do have all infrastructure ready to implement 3).  Our tree merging
>> and symbol table handling is fuly incremental and I think made a patch to
>> implement it today.   The scheme is easy:
>>
>>  1) linker plugin is modified to pass -flinker-output to lto wrapper
>>     linker-output is either dyn (.so), pie or exec
>>     for incremental linking I added .rel for 3) and noltorel for 1)
>>
>>     currently it does rel because 3) (nor 2) can not be done when incremnetal
>>     linking is done on both LTO and non-LTO objects.
>
> That's because the result would be a "fat" object where both pieces
> would be needed.  Btw, I wonder why you are not running into the
> same issues as me when producing linker plugin output (the "merged"
> LTO IL) that is LTO IL.  Ah, possibly because the link is incremental,
> and thus all special-handling of LTO sections is disabled.
>
>>     In this case linker
>>     plugin output warings about code quality loss and switch to
>>     noltorel.
>>  2) with -flinker-ouptut the lto wrapper behaves same way as with
>>     -flto-partition=none.
>>  3) lto frontend parses -flinker-output and sets our internal flags accordingly.
>>     I added new flag_incremental_linking to inform middle-end about the fact
>>     that the output is going to be statically linked again.  This disables
>>     the privatization of hidden symbols and if set to 2 it also triggers
>>     the LTO IL streaming
>
> I wonder why it behaves like -flto-partition=none in the case it does
> not need to do LTO IL streaming (which I hope does LTO IL streaming
> only?  or does this implement fat objects "correctly"?).  Can't
> we still parallelize the build via LTRANS and then incrementally
> link the result (I suppose the linker will do that for us with the
> linker plugin outputs already?)?
>
> -flto-partition=none itself isn't more memory intensive than
> WPA in these days, it's only about compile-time, correct?
>
> Your patch means that Andis/HJs work is no longer needed and we can
> drop the section suffixes again?
>
>

There is a difference between "ld -r " and "gcc -r". "ld -r" may not
perform any LTO.

-- 
H.J.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] Getting LTO incremental linking work
  2015-11-25 11:19 ` Richard Biener
  2015-11-25 15:45   ` H.J. Lu
@ 2015-11-25 18:54   ` Jan Hubicka
  2015-11-26 10:15     ` Richard Biener
  2015-11-25 23:59   ` Andi Kleen
  2 siblings, 1 reply; 23+ messages in thread
From: Jan Hubicka @ 2015-11-25 18:54 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jan Hubicka, gcc-patches, ak, hongjiu.lu, ccoutant, iant

> > 
> >  1) linker plugin is modified to pass -flinker-output to lto wrapper
> >     linker-output is either dyn (.so), pie or exec
> >     for incremental linking I added .rel for 3) and noltorel for 1)
> > 
> >     currently it does rel because 3) (nor 2) can not be done when incremnetal
> >     linking is done on both LTO and non-LTO objects.
> 
> That's because the result would be a "fat" object where both pieces
> would be needed.  Btw, I wonder why you are not running into the

Yep, we would end up with both LTO and non-LTO in one object and, because
we have no way to claim just part of it in the next link, the non-LTO part
will be ignored (just as is the case with fat objects).

> same issues as me when producing linker plugin output (the "merged"
> LTO IL) that is LTO IL.  Ah, possibly because the link is incremental,
> and thus all special-handling of LTO sections is disabled.

Yep, I just throw in the LTO IL and the linker passes it through.
> 
> >     In this case linker
> >     plugin output warings about code quality loss and switch to
> >     noltorel.
> >  2) with -flinker-ouptut the lto wrapper behaves same way as with
> >     -flto-partition=none.
> >  3) lto frontend parses -flinker-output and sets our internal flags accordingly.
> >     I added new flag_incremental_linking to inform middle-end about the fact
> >     that the output is going to be statically linked again.  This disables
> >     the privatization of hidden symbols and if set to 2 it also triggers
> >     the LTO IL streaming
> 
> I wonder why it behaves like -flto-partition=none in the case it does
> not need to do LTO IL streaming (which I hope does LTO IL streaming
> only?  or does this implement fat objects "correctly"?).  Can't

Yes, I do stream LTO IL into the assembler file, like a normal -flto build would do
for a non-lto1 frontend.  So I produce one .s file that I need the assembler to be
called on.  By default lto-wrapper thinks we do WPA, so it would look for the list
of ltrans partitions and execute ltranses, which I do not want to happen.
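
Roughly, the wrapper has to spot the option among its arguments and fall
back to a single-unit run.  A minimal sketch, where the enum and function
names are hypothetical stand-ins and not the actual lto-wrapper code:

```c
#include <string.h>

/* Hypothetical stand-in for the wrapper's mode decision.  */
enum wrapper_mode { MODE_WPA, MODE_SINGLE_UNIT };

/* Scan the arguments passed by the linker plugin.  If the incremental
   link option is present, avoid WPA so that no ltrans partitions are
   expected; otherwise keep the assumed default of WPA.  */
enum wrapper_mode
pick_wrapper_mode (int argc, const char *const *argv)
{
  int i;

  for (i = 0; i < argc; i++)
    if (strcmp (argv[i], "-flinker-output=rel") == 0)
      return MODE_SINGLE_UNIT;
  return MODE_WPA;
}
```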

Since no codegen is done we have no use for ltranses.  It would be nice to emit
the .o file through the simple-object interface.  Sadly we can't do that because
simple-object won't put the LTO marker symbols in.  This is something I want to
track, and to drop the assembler stage from LTO generation in general:
https://gcc.gnu.org/ml/gcc/2014-09/msg00340.html

Well, one case where WPA would help is production of fat-objects.  Currently
it works (by compiling the LTO data into assembly again) but it is not done
in parallel.  I suppose we could deal with this later - it is non-critical.
My longer term plan is to make WPA parallelization independent of LTO - it
makes sense when you build one large non-LTO object, too.

> we still parallelize the build via LTRANS and then incrementally
> link the result (I suppose the linker will do that for us with the
> linker plugin outputs already?)?
> 
> -flto-partition=none itself isn't more memory intensive than
> WPA in these days, it's only about compile-time, correct?

It is more memory intensive.  Just by streaming everything in and out we "compress"
the memory layout noticeably.  -flto-partition=one has a smaller peak than
-flto-partition=none.
But again, here all this triggers with -ffat-lto-objects only.
> 
> Your patch means that Andis/HJs work is no longer needed and we can
> drop the section suffixes again?

Maybe.  It is a different implementation of the same thing.  Both can be used,
though I suppose real incremental linking is better in the longer term than
section merging.
> > 
> > Does anyone see problems with this approach? I think this is easy enough 
> > and fixes PR67548 so it may still get to mainline?
> 
> Yes, it would be a very nice feature to have indeed.
> 
> I don't see anything trying to change things with the collect2 path?

Hmm, with collect2 we don't even support static libraries, so do we need to support
incremental linking there?  I suppose collect2 can recognize -r and LTO objects and
spawn the linker the same way.
> 
> > I need to do more testing, but in general I think the implemntation is OK 
> > as it is.  We need a way to force noltorel model for testsuite, as the
> > new default will bypass codegen for all our -r -nostdlib testcases.
> 
> Maybe we can turn most of them to -shared?

Would that work on all targets (e.g. mingw)?
For testing purposes I suppose I will add a flag.  It should also silence the linker
plugin warning about generating assembly early.  -rno-lto perhaps?
> >        struct cgraph_node *node = order[i];
> >  
> > -      if (node->has_gimple_body_p ())
> > +      if (gimple_has_body_p (node->decl))
> 
> ?

node->has_gimple_body_p returns true if a gimple body is available, but not necessarily
read into memory (in WPA), while gimple_has_body_p returns true only when the body is
in memory.  The statement renumbering that this guards is not needed if we only shuffle
the sections (and it would ICE).
> > +      /* It would be cool to produce .o file directly, but our current
> > +	 simple objects does not contain the lto symbol markers.  Go the slow
> > +	 way through the asm file.  */
> 
> We should get away from the symbol markers and instead rely on section
> names.  Not in this patch of course.

Yes, we need to get the simple-object interface somehow working here.  The symbol
markers are documented by the LTO specification.  I would not mind changing
that.
For your debug work, I think simple-object will need quite some work to output
DWARF anyway.  Perhaps something that can be done as part of SoC?
> 
> > +      lang_hooks.lto.begin_section = lhd_begin_section;
> > +      lang_hooks.lto.append_data = lhd_append_data;
> > +      lang_hooks.lto.end_section = lhd_end_section;
> > +      if (flag_ltrans)
> > +	error ("-flinker-output=rel and -fltrans are mutually exclussive");
> > +      break;
> > +
> > +    case LTO_LINKER_OUTPUT_NOLTOREL: /* .o: incremental link producing asm  */
> > +      flag_whole_program = 0;
> > +      flag_incremental_link = 1;
> > +      break;
> > +
> > +    case LTO_LINKER_OUTPUT_DYN: /* .so: PID library */
> > +      /* On some targets, like i386 it makes sense to build PIC library wihout
> > +	 -fpic for performance reasons.  So no need to adjust flags.  */
> > +      break;
> > +
> > +    case LTO_LINKER_OUTPUT_PIE: /* PIE binary */
> > +      /* If -fPIC or -fPIE was used at compile time, be sure that
> > +         flag_pie is 2.  */
> > +      if (!flag_pie && flag_pic)
> > +	flag_pie = flag_pic;
> > +      flag_pic = 0;
> 
> The code doesn't seem to do what the comment says...

Hmm, indeed we want flag_pie = MAX (flag_pie, flag_pic)
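
To illustrate the difference, here is a small standalone sketch.  The
function names are hypothetical, and it assumes both flags can
independently take the values 0, 1 and 2; whether the mixed combination
arises in practice is not established here:

```c
#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* Conditional form from the quoted hunk: flag_pie is only updated when
   it was 0, so a lower non-zero level is left as it is.  */
int
pie_level_conditional (int flag_pie, int flag_pic)
{
  if (!flag_pie && flag_pic)
    flag_pie = flag_pic;
  return flag_pie;
}

/* MAX form: always ends up with the higher of the two levels.  */
int
pie_level_max (int flag_pie, int flag_pic)
{
  return MAX (flag_pie, flag_pic);
}
```

With flag_pie == 1 and flag_pic == 2 the conditional form returns 1 while
the MAX form returns 2; for all other inputs in 0..2 the two agree.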

> >  enum ld_plugin_status
> > @@ -1100,6 +1138,9 @@ onload (struct ld_plugin_tv *tv)
> >  	case LDPT_GOLD_VERSION:
> >  	  gold_version = p->tv_u.tv_val;
> >  	  break;
> > +	case LDPT_LINKER_OUTPUT:
> > +	  add_linker_output_option (p->tv_u.tv_val);
> > +	  break;
> >  	default:
> >  	  break;
> >  	}
> 
> I wonder what this does to old toolchains using the linker plugin
> with this change.  I suppose it will fail with an "unknown option"
> error.
> 
> Not sure what to do about this though given the plugin doesn't
> really know which GCC it is targeting.  An idea would be to
> spawn another enviroment from the driver like
> COLLECT_GCC_LTO_WRAPPER_VER=2 and only adding this option if
> that is present and >= 2?

I thought every GCC version ships its own linker plugin, so there should
be no conflicts?
> 
> I don't think LTO will work properly when you invoke ld directly
> as lto-wrapper expects COLLECT_GCC[_OPTIONS] to be set.

I will look into this incrementally. Linker plugin should be able to execute
GCC itself. We do not need to pass any options around and all we need to know
is where to find the wrapper.

Honza
> 
> Otherwise the patch looks straight-forward to me...
> 
> Thanks,
> Richard.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] Getting LTO incremental linking work
  2015-11-25 15:45   ` H.J. Lu
@ 2015-11-25 19:21     ` Jan Hubicka
  2015-11-25 23:09       ` Jan Hubicka
  0 siblings, 1 reply; 23+ messages in thread
From: Jan Hubicka @ 2015-11-25 19:21 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Richard Biener, Jan Hubicka, GCC Patches, Andi Kleen,
	Cary Coutant, Ian Lance Taylor

> >
> > Your patch means that Andis/HJs work is no longer needed and we can
> > drop the section suffixes again?
> >
> >
> 
> There is a difference between "ld -r " and "gcc -r". "ld -r" may not
> perform any LTO.

Theoretically ld -r may look up the linker plugin on its search path, which will
in turn execute GCC to link the IL.  It is not implemented though.
I am not proposing to drop the section-based incremental linking code, however.
In fact I never paid too much attention to incremental linking.  What are the main
use cases?  I know some build systems use it to reduce final linking time.  What
are the other uses?

It is out of scope of this patch, but eventually it would be nice to teach LTO
optimizers to work incrementally: it is perfectly possible for optimizers to
run their execute methods, just not make any decisions that need the whole program
(e.g. inlining when size increases), then apply the changes to the IL, re-run early
optimization on the changed functions, and stream the result.

Honza
> 
> -- 
> H.J.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] Getting LTO incremental linking work
  2015-11-25 19:21     ` Jan Hubicka
@ 2015-11-25 23:09       ` Jan Hubicka
  2015-11-25 23:56         ` Jan Hubicka
  2015-11-28 10:35         ` Tom de Vries
  0 siblings, 2 replies; 23+ messages in thread
From: Jan Hubicka @ 2015-11-25 23:09 UTC (permalink / raw)
  To: Jan Hubicka
  Cc: H.J. Lu, Richard Biener, GCC Patches, Andi Kleen, Cary Coutant,
	Ian Lance Taylor

Hi,
this is the first part of the patch, which adds the -flinker-output flags and gets
symbol visibility right.  It makes the testcase in the PR pass, but I do not know
how to turn it into a testsuite-ready version.
I remember there were other PRs related to incremental linking and symbol visibility,
I will try to find them and see if any of those can be turned into a testcase.

I think this may even be backportable to GCC 5 if it does not cause any new
issues.  Also the collect2 path can probably indeed be updated, it has info
about what type of binary it produces.  I will try to look into it incrementally.
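
For readers skimming the patch below, the plugin side boils down to a
mapping like the following sketch.  The enum here is a local, hypothetical
stand-in for the plugin API's output type, and returning NULL for an
unknown value is my choice for the sketch, not what the patch does:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical local mirror of the linker's output kinds.  */
enum output_kind { OUT_REL, OUT_EXEC, OUT_DYN, OUT_PIE };

/* Map an output kind to the option handed to the LTO wrapper.
   Returns NULL for a kind this sketch does not know about, so the
   caller never passes an uninitialized string along.  */
const char *
linker_output_option (enum output_kind kind)
{
  switch (kind)
    {
    case OUT_REL:
      return "-flinker-output=rel";
    case OUT_DYN:
      return "-flinker-output=dyn";
    case OUT_PIE:
      return "-flinker-output=pie";
    case OUT_EXEC:
      return "-flinker-output=exec";
    }
  return NULL;
}
```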

Bootstrapped/regtested x86_64-linux, committed.

Honza

	PR lto/67548
	* lto-plugin.c (linker_output, linker_output_set): New statics.
	(all_symbols_read_handler): Add -flinker-output option.
	(onload): Record linker_output info.

	* ipa-visibility.c (cgraph_externally_visible_p,
	varpool_node::externally_visible_p): When doing incremental linking,
	hidden symbols may be still used later.
	(update_visibility_by_resolution_info): Do not drop weak during
	incremental link.
	(function_and_variable_visibility): Fix formatting.
	* flag-types.h (lto_linker_output): Declare.
	* common.opt (flag_incremental_link): New flag.

	* lto-lang.c (lto_post_options): Process flag_lto_linker_output.
	* lang.opt (lto_linker_output): New enum.
	(flinker_output): New flag.

Index: lto-plugin/lto-plugin.c
===================================================================
--- lto-plugin/lto-plugin.c	(revision 230897)
+++ lto-plugin/lto-plugin.c	(working copy)
@@ -167,6 +167,8 @@ static unsigned int num_pass_through_ite
 static char debug;
 static char nop;
 static char *resolution_file = NULL;
+static enum ld_plugin_output_file_type linker_output;
+static int linker_output_set;
 
 /* The version of gold being used, or -1 if not gold.  The number is
    MAJOR * 100 + MINOR.  */
@@ -624,8 +626,9 @@ all_symbols_read_handler (void)
 {
   unsigned i;
   unsigned num_lto_args
-    = num_claimed_files + num_offload_files + lto_wrapper_num_args + 1;
+    = num_claimed_files + num_offload_files + lto_wrapper_num_args + 2;
   char **lto_argv;
+  const char *linker_output_str;
   const char **lto_arg_ptr;
   if (num_claimed_files + num_offload_files == 0)
     return LDPS_OK;
@@ -648,6 +651,26 @@ all_symbols_read_handler (void)
   for (i = 0; i < lto_wrapper_num_args; i++)
     *lto_arg_ptr++ = lto_wrapper_argv[i];
 
+  assert (linker_output_set);
+  switch (linker_output)
+    {
+    case LDPO_REL:
+      linker_output_str = "-flinker-output=rel";
+      break;
+    case LDPO_DYN:
+      linker_output_str = "-flinker-output=dyn";
+      break;
+    case LDPO_PIE:
+      linker_output_str = "-flinker-output=pie";
+      break;
+    case LDPO_EXEC:
+      linker_output_str = "-flinker-output=exec";
+      break;
+    default:
+      message (LDPL_FATAL, "unsupported linker output %i", linker_output);
+      break;
+    }
+  *lto_arg_ptr++ = xstrdup (linker_output_str);
   for (i = 0; i < num_claimed_files; i++)
     {
       struct plugin_file_info *info = &claimed_files[i];
@@ -1100,6 +1123,10 @@ onload (struct ld_plugin_tv *tv)
 	case LDPT_GOLD_VERSION:
 	  gold_version = p->tv_u.tv_val;
 	  break;
+	case LDPT_LINKER_OUTPUT:
+	  linker_output = (enum ld_plugin_output_file_type) p->tv_u.tv_val;
+	  linker_output_set = 1;
+	  break;
 	default:
 	  break;
 	}
Index: gcc/lto/lto-lang.c
===================================================================
--- gcc/lto/lto-lang.c	(revision 230902)
+++ gcc/lto/lto-lang.c	(working copy)
@@ -819,6 +819,35 @@ lto_post_options (const char **pfilename
   if (flag_wpa)
     flag_generate_lto = 1;
 
+  /* Initialize the codegen flags according to the output type.  */
+  switch (flag_lto_linker_output)
+    {
+    case LTO_LINKER_OUTPUT_REL: /* .o: incremental link producing LTO IL  */
+      flag_whole_program = 0;
+      flag_incremental_link = 1;
+      break;
+
+    case LTO_LINKER_OUTPUT_DYN: /* .so: PID library */
+      /* On some targets, like i386 it makes sense to build PIC library wihout
+	 -fpic for performance reasons.  So no need to adjust flags.  */
+      break;
+
+    case LTO_LINKER_OUTPUT_PIE: /* PIE binary */
+      /* If -fPIC or -fPIE was used at compile time, be sure that
+         flag_pie is 2.  */
+      flag_pie = MAX (flag_pie, flag_pic);
+      flag_pic = 0;
+      break;
+
+    case LTO_LINKER_OUTPUT_EXEC: /* Normal executable */
+      flag_pic = 0;
+      flag_pie = 0;
+      break;
+
+    case LTO_LINKER_OUTPUT_UNKNOWN:
+      break;
+    }
+
   /* Excess precision other than "fast" requires front-end
      support.  */
   flag_excess_precision_cmdline = EXCESS_PRECISION_FAST;
Index: gcc/lto/lang.opt
===================================================================
--- gcc/lto/lang.opt	(revision 230902)
+++ gcc/lto/lang.opt	(working copy)
@@ -24,6 +24,29 @@
 Language
 LTO
 
+Enum
+Name(lto_linker_output) Type(enum lto_linker_output) UnknownError(unknown linker output %qs)
+
+EnumValue
+Enum(lto_linker_output) String(unknown) Value(LTO_LINKER_OUTPUT_UNKNOWN)
+
+EnumValue
+Enum(lto_linker_output) String(rel) Value(LTO_LINKER_OUTPUT_REL)
+
+EnumValue
+Enum(lto_linker_output) String(dyn) Value(LTO_LINKER_OUTPUT_DYN)
+
+EnumValue
+Enum(lto_linker_output) String(pie) Value(LTO_LINKER_OUTPUT_PIE)
+
+EnumValue
+Enum(lto_linker_output) String(exec) Value(LTO_LINKER_OUTPUT_EXEC)
+
+flinker-output=
+LTO Report Driver Joined RejectNegative Enum(lto_linker_output) Var(flag_lto_linker_output) Init(LTO_LINKER_OUTPUT_UNKNOWN)
+Set linker output type (used internally during LTO optimization)
+
+
 fltrans
 LTO Report Var(flag_ltrans)
 Run the link-time optimizer in local transformation (LTRANS) mode.
Index: gcc/flag-types.h
===================================================================
--- gcc/flag-types.h	(revision 230902)
+++ gcc/flag-types.h	(working copy)
@@ -265,6 +265,14 @@ enum lto_partition_model {
   LTO_PARTITION_MAX = 4
 };
 
+/* flag_lto_linker_output initialization values.  */
+enum lto_linker_output {
+  LTO_LINKER_OUTPUT_UNKNOWN,
+  LTO_LINKER_OUTPUT_REL,
+  LTO_LINKER_OUTPUT_DYN,
+  LTO_LINKER_OUTPUT_PIE,
+  LTO_LINKER_OUTPUT_EXEC
+};
 
 /* gfortran -finit-real= values.  */
 
Index: gcc/common.opt
===================================================================
--- gcc/common.opt	(revision 230902)
+++ gcc/common.opt	(working copy)
@@ -46,6 +46,12 @@ int optimize_fast
 Variable
 bool in_lto_p = false
 
+; This variable is set to non-0 only by LTO front-end.  1 indicates that
+; the output produced will be used for incrmeental linking (thus weak symbols
+; can still be bound).
+Variable
+int flag_incremental_link = 0
+
 ; 0 means straightforward implementation of complex divide acceptable.
 ; 1 means wide ranges of inputs must work for complex divide.
 ; 2 means C99-like requirements for complex multiply and divide.
Index: ipa-visibility.c
===================================================================
--- ipa-visibility.c	(revision 230902)
+++ ipa-visibility.c	(working copy)
@@ -217,13 +217,13 @@ cgraph_externally_visible_p (struct cgra
      This improves code quality and we know we will duplicate them at most twice
      (in the case that we are not using plugin and link with object file
       implementing same COMDAT)  */
-  if ((in_lto_p || whole_program)
+  if (((in_lto_p || whole_program) && !flag_incremental_link)
       && DECL_COMDAT (node->decl)
       && comdat_can_be_unshared_p (node))
     return false;
 
   /* When doing link time optimizations, hidden symbols become local.  */
-  if (in_lto_p
+  if ((in_lto_p && !flag_incremental_link)
       && (DECL_VISIBILITY (node->decl) == VISIBILITY_HIDDEN
 	  || DECL_VISIBILITY (node->decl) == VISIBILITY_INTERNAL)
       /* Be sure that node is defined in IR file, not in other object
@@ -293,13 +293,13 @@ varpool_node::externally_visible_p (void
      so this does not enable more optimization, but referring static var
      is faster for dynamic linking.  Also this match logic hidding vtables
      from LTO symbol tables.  */
-  if ((in_lto_p || flag_whole_program)
+  if (((in_lto_p || flag_whole_program) && !flag_incremental_link)
       && DECL_COMDAT (decl)
       && comdat_can_be_unshared_p (this))
     return false;
 
   /* When doing link time optimizations, hidden symbols become local.  */
-  if (in_lto_p
+  if (in_lto_p && !flag_incremental_link
       && (DECL_VISIBILITY (decl) == VISIBILITY_HIDDEN
 	  || DECL_VISIBILITY (decl) == VISIBILITY_INTERNAL)
       /* Be sure that node is defined in IR file, not in other object
@@ -405,17 +405,36 @@ update_visibility_by_resolution_info (sy
     for (symtab_node *next = node->same_comdat_group;
 	 next != node; next = next->same_comdat_group)
       {
-	next->set_comdat_group (NULL);
-	DECL_WEAK (next->decl) = false;
+	/* During incremental linking we need to keep symbol weak for future
+	   linking.  We can still drop definition if we know non-LTO world
+	   prevails.  */
+	if (!flag_incremental_link)
+	  {
+	    DECL_WEAK (next->decl) = false;
+	    next->set_comdat_group (NULL);
+	  }
 	if (next->externally_visible
 	    && !define)
-	  DECL_EXTERNAL (next->decl) = true;
+	  {
+	    DECL_EXTERNAL (next->decl) = true;
+	    next->set_comdat_group (NULL);
+	  }
       }
-  node->set_comdat_group (NULL);
-  DECL_WEAK (node->decl) = false;
+
+  /* During incremental linking we need to keep symbol weak for future
+     linking.  We can still drop definition if we know non-LTO world prevails.  */
+  if (!flag_incremental_link)
+    {
+      DECL_WEAK (node->decl) = false;
+      node->set_comdat_group (NULL);
+      node->dissolve_same_comdat_group_list ();
+    }
   if (!define)
-    DECL_EXTERNAL (node->decl) = true;
-  node->dissolve_same_comdat_group_list ();
+    {
+      DECL_EXTERNAL (node->decl) = true;
+      node->set_comdat_group (NULL);
+      node->dissolve_same_comdat_group_list ();
+    }
 }
 
 /* Decide on visibility of all symbols.  */
@@ -639,8 +658,9 @@ function_and_variable_visibility (bool w
 	{
 	  gcc_assert (in_lto_p || whole_program || !TREE_PUBLIC (vnode->decl));
 	  vnode->unique_name = ((vnode->resolution == LDPR_PREVAILING_DEF_IRONLY
-				       || vnode->resolution == LDPR_PREVAILING_DEF_IRONLY_EXP)
-				       && TREE_PUBLIC (vnode->decl));
+			         || vnode->resolution
+				      == LDPR_PREVAILING_DEF_IRONLY_EXP)
+			        && TREE_PUBLIC (vnode->decl));
 	  if (vnode->same_comdat_group && TREE_PUBLIC (vnode->decl))
 	    {
 	      symtab_node *next = vnode;

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] Getting LTO incremental linking work
  2015-11-25 23:09       ` Jan Hubicka
@ 2015-11-25 23:56         ` Jan Hubicka
  2015-11-28 10:35         ` Tom de Vries
  1 sibling, 0 replies; 23+ messages in thread
From: Jan Hubicka @ 2015-11-25 23:56 UTC (permalink / raw)
  To: Jan Hubicka
  Cc: H.J. Lu, Richard Biener, GCC Patches, Andi Kleen, Cary Coutant,
	Ian Lance Taylor

Hi,
here is the patch that implements incremental LTO linking.  I will wait a few days
for feedback.  gcc -r now does LTO IL linking only.  To force codegen, one can
use -Wl,-rnolto, which I found no right place to document.  We may want a -rnolto
flag supported by the GCC driver, so the testsuite can be updated to use -rnolto
wherever it currently uses -r and it won't fail on an unrecognized option with
non-GNU linkers.  I got lost in gcc.c and I do not know where -rdynamic is parsed
(I suppose at the same spot I can handle -rnolto and turn it into -r -Wl,-rnolto on
plugin-enabled setups and -r on others).

There are still a few bugs to track.  Most notably WPA will produce hidden symbols
with non-obfuscated names, which may conflict with later static linking.  I will
look into that separately and also try to find more testcases.

	* lto-streamer-out.c: Also copy sections when
	flag_incremental_link == 2.
	* flag-types.h (lto_linker_output): Add LTO_LINKER_OUTPUT_NOLTOREL.
	* common.opt (flag_incremental_link): Update docs.
	* passes.c (ipa_write_summaries): Only renumber statements when
	the body is really in memory.
	* lto-wrapper.c (run_gcc): Parse -flinker-output and turn into
	non-WPA mode at -flinker-output=rel.

	* lang.opt (lto_linker_output): New value noltorel.
	* lto-lang.c (lto_post_options): Handle LTO_LINKER_OUTPUT_REL
	and LTO_LINKER_OUTPUT_NOLTOREL.
	(lto_init): We also generate LTO during incremental_link.

	* toplev.c (compile_file): Cut compilation when doing incremental
	link.
	* cgraphunit.c (ipa_passes): Support incremental link.
	(symbol_table::compile): Likewise.
	* lto-cgraph.c (lto_output_node): Do not propagate resolution info
	when linking incrementally.

	* lto-plugin.c: Document flags; add new flag -rnolto
	(non_claimed_files, rnolto): New statics.
	(all_symbols_read_handler): Decide on rel model.
	(claim_file_handler): Count non_claimed_files.
	(process_option): Process rnolto.
Index: gcc/lto-streamer-out.c
===================================================================
--- gcc/lto-streamer-out.c	(revision 230915)
+++ gcc/lto-streamer-out.c	(working copy)
@@ -2286,13 +2286,16 @@ lto_output (void)
 		}
 	      decl_state = lto_new_out_decl_state ();
 	      lto_push_out_decl_state (decl_state);
-	      if (gimple_has_body_p (node->decl) || !flag_wpa
+	      if (gimple_has_body_p (node->decl)
 		  /* Thunks have no body but they may be synthetized
 		     at WPA time.  */
 		  || DECL_ARGUMENTS (node->decl))
 		output_function (node);
 	      else
-		copy_function_or_variable (node);
+		{
+		  gcc_checking_assert (flag_wpa || flag_incremental_link == 2);
+		  copy_function_or_variable (node);
+		}
 	      gcc_assert (lto_get_out_decl_state () == decl_state);
 	      lto_pop_out_decl_state ();
 	      lto_record_function_out_decl_state (node->decl, decl_state);
@@ -2318,7 +2321,7 @@ lto_output (void)
 	      decl_state = lto_new_out_decl_state ();
 	      lto_push_out_decl_state (decl_state);
 	      if (DECL_INITIAL (node->decl) != error_mark_node
-		  || !flag_wpa)
+		  || (!flag_wpa && flag_incremental_link != 2))
 		output_constructor (node);
 	      else
 		copy_function_or_variable (node);
Index: gcc/flag-types.h
===================================================================
--- gcc/flag-types.h	(revision 230915)
+++ gcc/flag-types.h	(working copy)
@@ -269,6 +269,7 @@ enum lto_partition_model {
 enum lto_linker_output {
   LTO_LINKER_OUTPUT_UNKNOWN,
   LTO_LINKER_OUTPUT_REL,
+  LTO_LINKER_OUTPUT_NOLTOREL,
   LTO_LINKER_OUTPUT_DYN,
   LTO_LINKER_OUTPUT_PIE,
   LTO_LINKER_OUTPUT_EXEC
Index: gcc/common.opt
===================================================================
--- gcc/common.opt	(revision 230915)
+++ gcc/common.opt	(working copy)
@@ -48,7 +48,8 @@ bool in_lto_p = false
 
 ; This variable is set to non-0 only by LTO front-end.  1 indicates that
 ; the output produced will be used for incrmeental linking (thus weak symbols
-; can still be bound).
+; can still be bound) and 2 indicates that the IL is going to be linked and
+; and output to LTO object file.
 Variable
 int flag_incremental_link = 0
 
Index: gcc/passes.c
===================================================================
--- gcc/passes.c	(revision 230915)
+++ gcc/passes.c	(working copy)
@@ -2530,7 +2530,7 @@ ipa_write_summaries (void)
     {
       struct cgraph_node *node = order[i];
 
-      if (node->has_gimple_body_p ())
+      if (gimple_has_body_p (node->decl))
 	{
 	  /* When streaming out references to statements as part of some IPA
 	     pass summary, the statements need to have uids assigned and the
Index: gcc/lto-wrapper.c
===================================================================
--- gcc/lto-wrapper.c	(revision 230915)
+++ gcc/lto-wrapper.c	(working copy)
@@ -953,9 +953,15 @@ run_gcc (unsigned argc, char *argv[])
 	  file_offset = (off_t) loffset;
 	}
       fd = open (filename, O_RDONLY | O_BINARY);
+      /* Linker plugin passes -fresolution and -flinker-output options.  */
       if (fd == -1)
 	{
 	  lto_argv[lto_argc++] = argv[i];
+	  if (strcmp (argv[i], "-flinker-output=rel") == 0)
+	    {
+	       no_partition = true;
+	       lto_mode = LTO_MODE_LTO;
+	    }
 	  continue;
 	}
 
Index: gcc/lto/lang.opt
===================================================================
--- gcc/lto/lang.opt	(revision 230915)
+++ gcc/lto/lang.opt	(working copy)
@@ -34,6 +34,9 @@ EnumValue
 Enum(lto_linker_output) String(rel) Value(LTO_LINKER_OUTPUT_REL)
 
 EnumValue
+Enum(lto_linker_output) String(noltorel) Value(LTO_LINKER_OUTPUT_NOLTOREL)
+
+EnumValue
 Enum(lto_linker_output) String(dyn) Value(LTO_LINKER_OUTPUT_DYN)
 
 EnumValue
Index: gcc/lto/lto-lang.c
===================================================================
--- gcc/lto/lto-lang.c	(revision 230915)
+++ gcc/lto/lto-lang.c	(working copy)
@@ -823,6 +823,26 @@ lto_post_options (const char **pfilename
   switch (flag_lto_linker_output)
     {
     case LTO_LINKER_OUTPUT_REL: /* .o: incremental link producing LTO IL  */
+      /* Configure compiler same way as normal frontend would do with -flto:
+	 this way we read the trees (declarations & types), symbol table,
+	 optimization summaries and link them. Subsequently we output new LTO
+	 file.  */
+      flag_lto = "";
+      flag_incremental_link = 2;
+      flag_whole_program = 0;
+      flag_wpa = 0;
+      flag_generate_lto = 1;
+      /* It would be cool to produce .o file directly, but our current
+	 simple objects does not contain the lto symbol markers.  Go the slow
+	 way through the asm file.  */
+      lang_hooks.lto.begin_section = lhd_begin_section;
+      lang_hooks.lto.append_data = lhd_append_data;
+      lang_hooks.lto.end_section = lhd_end_section;
+      if (flag_ltrans)
+	error ("-flinker-output=rel and -fltrans are mutually exclussive");
+      break;
+
+    case LTO_LINKER_OUTPUT_NOLTOREL: /* .o: incremental link producing asm  */
       flag_whole_program = 0;
       flag_incremental_link = 1;
       break;
@@ -1243,7 +1263,7 @@ lto_init (void)
   int i;
 
   /* We need to generate LTO if running in WPA mode.  */
-  flag_generate_lto = (flag_wpa != NULL);
+  flag_generate_lto = (flag_incremental_link == 2 || flag_wpa != NULL);
 
   /* Create the basic integer types.  */
   build_common_tree_nodes (flag_signed_char, flag_short_double);
Index: gcc/toplev.c
===================================================================
--- gcc/toplev.c	(revision 230915)
+++ gcc/toplev.c	(working copy)
@@ -504,7 +504,8 @@ compile_file (void)
 
   /* Compilation unit is finalized.  When producing non-fat LTO object, we are
      basically finished.  */
-  if (in_lto_p || !flag_lto || flag_fat_lto_objects)
+  if ((in_lto_p && flag_incremental_link != 2)
+      || !flag_lto || flag_fat_lto_objects)
     {
       /* File-scope initialization for AddressSanitizer.  */
       if (flag_sanitize & SANITIZE_ADDRESS)
Index: gcc/cgraphunit.c
===================================================================
--- gcc/cgraphunit.c	(revision 230915)
+++ gcc/cgraphunit.c	(working copy)
@@ -2270,8 +2270,10 @@ ipa_passes (void)
   if (flag_generate_lto || flag_generate_offload)
     targetm.asm_out.lto_start ();
 
-  if (!in_lto_p)
+  if (!in_lto_p || flag_incremental_link == 2)
     {
+      if (!quiet_flag)
+	fprintf (stderr, "Streaming LTO\n");
       if (g->have_offload)
 	{
 	  section_name_prefix = OFFLOAD_SECTION_NAME_PREFIX;
@@ -2290,7 +2292,9 @@ ipa_passes (void)
   if (flag_generate_lto || flag_generate_offload)
     targetm.asm_out.lto_end ();
 
-  if (!flag_ltrans && (in_lto_p || !flag_lto || flag_fat_lto_objects))
+  if (!flag_ltrans
+      && ((in_lto_p && flag_incremental_link != 2)
+	  || !flag_lto || flag_fat_lto_objects))
     execute_ipa_pass_list (passes->all_regular_ipa_passes);
   invoke_plugin_callbacks (PLUGIN_ALL_IPA_PASSES_END, NULL);
 
@@ -2381,7 +2385,8 @@ symbol_table::compile (void)
 
   /* Do nothing else if any IPA pass found errors or if we are just streaming LTO.  */
   if (seen_error ()
-      || (!in_lto_p && flag_lto && !flag_fat_lto_objects))
+      || ((!in_lto_p || flag_incremental_link == 2)
+	  && flag_lto && !flag_fat_lto_objects))
     {
       timevar_pop (TV_CGRAPHOPT);
       return;
Index: gcc/lto-cgraph.c
===================================================================
--- gcc/lto-cgraph.c	(revision 230915)
+++ gcc/lto-cgraph.c	(working copy)
@@ -534,7 +534,10 @@ lto_output_node (struct lto_simple_outpu
   bp_pack_value (&bp, node->thunk.thunk_p, 1);
   bp_pack_value (&bp, node->parallelized_function, 1);
   bp_pack_enum (&bp, ld_plugin_symbol_resolution,
-	        LDPR_NUM_KNOWN, node->resolution);
+	        LDPR_NUM_KNOWN,
+		/* When doing incremental link, we will get new resolution
+		   info next time we process the file.  */
+		flag_incremental_link ? LDPR_UNKNOWN : node->resolution);
   bp_pack_value (&bp, node->instrumentation_clone, 1);
   bp_pack_value (&bp, node->split_part, 1);
   streamer_write_bitpack (&bp);
Index: lto-plugin/lto-plugin.c
===================================================================
--- lto-plugin/lto-plugin.c	(revision 230915)
+++ lto-plugin/lto-plugin.c	(working copy)
@@ -27,10 +27,13 @@ along with this program; see the file CO
    More information at http://gcc.gnu.org/wiki/whopr/driver.
 
    This plugin should be passed the lto-wrapper options and will forward them.
-   It also has 2 options of its own:
+   It also has options of its own:
    -debug: Print the command line used to run lto-wrapper.
    -nop: Instead of running lto-wrapper, pass the original to the plugin. This
-   only works if the input files are hybrid.  */
+   only works if the input files are hybrid.
+   -rnolto: When doing incremental linking, turn the result into an actual binary.
+   -sym-style={none,win32,underscore|uscore}
+   -pass-through  */
 
 #ifdef HAVE_CONFIG_H
 #include "config.h"
@@ -151,6 +154,7 @@ static ld_plugin_add_symbols add_symbols
 
 static struct plugin_file_info *claimed_files = NULL;
 static unsigned int num_claimed_files = 0;
+static unsigned int non_claimed_files = 0;
 
 static struct plugin_file_info *offload_files = NULL;
 static unsigned int num_offload_files = 0;
@@ -169,6 +173,7 @@ static char nop;
 static char *resolution_file = NULL;
 static enum ld_plugin_output_file_type linker_output;
 static int linker_output_set;
+static int rnolto;
 
 /* The version of gold being used, or -1 if not gold.  The number is
    MAJOR * 100 + MINOR.  */
@@ -655,7 +660,17 @@ all_symbols_read_handler (void)
   switch (linker_output)
     {
     case LDPO_REL:
-      linker_output_str = "-flinker-output=rel";
+      if (non_claimed_files)
+	{
+	  rnolto = 1;
+	  message (LDPL_WARNING, "incremental linking of LTO and non-LTO "
+		   "objects will produce final assembly for LTO objects and "
+		   "bypass whole program optimization");
+	}
+      if (rnolto)
+	linker_output_str = "-flinker-output=noltorel";
+      else
+        linker_output_str = "-flinker-output=rel";
       break;
     case LDPO_DYN:
       linker_output_str = "-flinker-output=dyn";
@@ -1008,6 +1023,8 @@ claim_file_handler (const struct ld_plug
 		  num_claimed_files * sizeof (struct plugin_file_info));
       claimed_files[num_claimed_files - 1] = lto_file;
     }
+  else
+    non_claimed_files++;
 
   if (obj.found == 0 && obj.offload == 1)
     {
@@ -1037,6 +1054,8 @@ claim_file_handler (const struct ld_plug
 static void
 process_option (const char *option)
 {
+  if (strcmp (option, "-rnolto") == 0)
+    rnolto = 1;
-  if (strcmp (option, "-debug") == 0)
+  else if (strcmp (option, "-debug") == 0)
     debug = 1;
   else if (strcmp (option, "-nop") == 0)
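
For readers following the toplev.c and cgraphunit.c hunks above, the three changed conditions can be modelled in isolation. This is only a minimal sketch under stated assumptions: struct lto_flags and the three helper names are hypothetical stand-ins for the GCC globals (in_lto_p, flag_incremental_link, flag_lto, flag_fat_lto_objects, flag_ltrans) and are not part of the patch. The seen_error () check in symbol_table::compile is left out.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the GCC globals tested in the patch.  */
struct lto_flags
{
  bool in_lto_p;          /* true when running inside lto1 */
  int incremental_link;   /* 0: none, 1: noltorel, 2: rel (emit LTO IL) */
  bool lto;               /* flag_lto is set */
  bool fat_lto_objects;   /* flag_fat_lto_objects */
  bool ltrans;            /* flag_ltrans */
};

/* Models the new guard in ipa_passes: the streaming block is entered by
   a regular frontend, or by lto1 when the incremental link emits IL.  */
static bool
streams_lto_il (const struct lto_flags *f)
{
  return !f->in_lto_p || f->incremental_link == 2;
}

/* Models the new guard around execute_ipa_pass_list: regular IPA passes
   are skipped when lto1 only merges and re-streams the IL.  */
static bool
runs_regular_ipa_passes (const struct lto_flags *f)
{
  return !f->ltrans
	 && ((f->in_lto_p && f->incremental_link != 2)
	     || !f->lto || f->fat_lto_objects);
}

/* Models the early return in symbol_table::compile (error check omitted):
   nothing is compiled further when we are only streaming slim LTO.  */
static bool
stops_after_streaming (const struct lto_flags *f)
{
  return (!f->in_lto_p || f->incremental_link == 2)
	 && f->lto && !f->fat_lto_objects;
}
```

Under this model, lto1 with flag_incremental_link == 2 streams IL and returns early, while flag_incremental_link == 1 runs the regular IPA passes and goes on to code generation.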

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] Getting LTO incremental linking work
  2015-11-25 11:19 ` Richard Biener
  2015-11-25 15:45   ` H.J. Lu
  2015-11-25 18:54   ` Jan Hubicka
@ 2015-11-25 23:59   ` Andi Kleen
  2 siblings, 0 replies; 23+ messages in thread
From: Andi Kleen @ 2015-11-25 23:59 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jan Hubicka, gcc-patches, hongjiu.lu, ccoutant, iant

> Your patch means that Andis/HJs work is no longer needed and we can
> drop the section suffixes again?

Doing that would break existing setups that use ld -r instead of gcc -r.
Maybe in the longer term.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] Getting LTO incremental linking work
  2015-11-25  9:04 [RFC] Getting LTO incremental linking work Jan Hubicka
  2015-11-25 11:19 ` Richard Biener
@ 2015-11-26  0:24 ` Andi Kleen
  2015-11-26  0:54   ` Jan Hubicka
  2016-03-16 17:33 ` H.J. Lu
  2 siblings, 1 reply; 23+ messages in thread
From: Andi Kleen @ 2015-11-26  0:24 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc-patches, rguenther, hongjiu.lu, ccoutant, iant

> Moreover we do have all infrastructure ready to implement 3).  Our tree merging
> and symbol table handling is fuly incremental and I think made a patch to 
> implement it today.   The scheme is easy:

What happens when .S (assembler) files are part of the incremental object?
The kernel does that. Your patch would do the final code generation in this
case, right?

In theory we could change the build system to avoid that case though, but
it would need some changes.

It would be better if that could be handled somehow.

-Andi

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] Getting LTO incremental linking work
  2015-11-26  0:24 ` Andi Kleen
@ 2015-11-26  0:54   ` Jan Hubicka
  2015-11-26  1:55     ` Andi Kleen
  2015-11-26 10:33     ` Richard Biener
  0 siblings, 2 replies; 23+ messages in thread
From: Jan Hubicka @ 2015-11-26  0:54 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Jan Hubicka, gcc-patches, rguenther, hongjiu.lu, ccoutant, iant

> > Moreover we do have all infrastructure ready to implement 3).  Our tree merging
> > and symbol table handling is fuly incremental and I think made a patch to 
> > implement it today.   The scheme is easy:
> 
> What happens when .S (assembler) files are part of the incremential object?
> The kernel does that. Your patch would do the final generation in this case,
> right?

Yes, it will spit out a warning (which can be silenced if -Wl,-rnolto is used)
and turn the whole object into a non-LTO one.
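
To make the plugin-side decision concrete, here is a minimal sketch of the choice made for ld -r. It is an illustration only, not code from lto-plugin.c: struct rel_decision and choose_rel_output are hypothetical names, the option spelling noltorel follows the description of the scheme, and the sketch assumes that an explicit -rnolto suppresses the warning (the posted hunk warns whenever non-LTO files are seen).

```c
#include <stdbool.h>

/* Hypothetical model of the LDPO_REL decision; not lto-plugin.c code.  */
struct rel_decision
{
  const char *option;  /* option forwarded to lto-wrapper */
  bool warn;           /* warn that whole program optimization is lost */
};

/* NON_CLAIMED_FILES counts inputs without LTO IL (for example objects
   built from .S files); RNOLTO is true when -rnolto was given.  */
static struct rel_decision
choose_rel_output (unsigned int non_claimed_files, bool rnolto)
{
  struct rel_decision d;

  /* Assumption: an explicit -rnolto silences the warning.  */
  d.warn = non_claimed_files != 0 && !rnolto;

  if (non_claimed_files != 0 || rnolto)
    /* Early code generation: the result is an ordinary object.  */
    d.option = "-flinker-output=noltorel";
  else
    /* Only LTO inputs: merge the IL and keep it for the final link.  */
    d.option = "-flinker-output=rel";
  return d;
}
```

So a kernel-style link that mixes LTO objects with objects assembled from .S files ends up on the noltorel path, and only a link consisting purely of LTO objects keeps the IL.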
> 
> In theory we could change the build system to avoid that case though, but
> it would need some changes.
> 
> It would be better if that could be handled somehow.

How does this work with your patchset?  Ideally we should have a way to claim
only portions of object files, but we don't have that. If we claim the file,
the symbols in the real symbol table are not visible.

I suppose we could play games here with slim LTO: claim the file, see if
there are any symbols defined in the non-LTO symbol table and if so,
read the symbol table and tell the linker about the symbols and at the very
end include the offending object file in the list of objects returned back to
the linker.

The linker then should take the symbols it wants.  There would be some fun
involved, because the resolution info we get will consider the symbols
defined in that object file to be IR which would need to be compensated for.

Honza

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] Getting LTO incremental linking work
  2015-11-26  0:54   ` Jan Hubicka
@ 2015-11-26  1:55     ` Andi Kleen
  2015-11-26  2:02       ` Jan Hubicka
  2015-11-26 10:33     ` Richard Biener
  1 sibling, 1 reply; 23+ messages in thread
From: Andi Kleen @ 2015-11-26  1:55 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc-patches, rguenther, hongjiu.lu, ccoutant, iant

> > In theory we could change the build system to avoid that case though, but
> > it would need some changes.
> > 
> > It would be better if that could be handled somehow.
> 
> How does this work with your patchset?  Ideally we should have way to claim
> only portions of object files, but we don't have that. If we claim the file,
> the symbols in real symbol table are not visible.

It works with HJ's Linux binutils. It handles LTO and non-LTO separately.

> I suppose we could play a games here with slim LTO: claim the file, see if
> there are any symbols defined in the non-LTO symbol table and if so, interpret
> read the symbol table and tell linker about the symbols and at the very end
> include the offending object file in the list of objects returned back to
> linker.
> 
> The linker then should take the symbols it wants.  There would be some fun
> involved, because the resolution info we get will consider the symbols
> defined in that object file to be IR which would need to be compensated for.

Yes something like that would be needed.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] Getting LTO incremental linking work
  2015-11-26  1:55     ` Andi Kleen
@ 2015-11-26  2:02       ` Jan Hubicka
  2015-11-26  2:12         ` Andi Kleen
  0 siblings, 1 reply; 23+ messages in thread
From: Jan Hubicka @ 2015-11-26  2:02 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Jan Hubicka, gcc-patches, rguenther, hongjiu.lu, ccoutant, iant

> > > In theory we could change the build system to avoid that case though, but
> > > it would need some changes.
> > > 
> > > It would be better if that could be handled somehow.
> > 
> > How does this work with your patchset?  Ideally we should have way to claim
> > only portions of object files, but we don't have that. If we claim the file,
> > the symbols in real symbol table are not visible.
> 
> It works with HJ's Linux binutils. It handles LTO and non LTO separately.
> 
> > I suppose we could play a games here with slim LTO: claim the file, see if
> > there are any symbols defined in the non-LTO symbol table and if so, interpret
> > read the symbol table and tell linker about the symbols and at the very end
> > include the offending object file in the list of objects returned back to
> > linker.
> > 
> > The linker then should take the symbols it wants.  There would be some fun
> > involved, because the resolution info we get will consider the symbols
> > defined in that object file to be IR which would need to be compensated for.
> 
> Yes something like that would be needed.

Actually I think it is harder than that, because we need to strip LTO data
from the object files, so we do not end up with duplicated LTO if the object
file already had both LTO and non-LTO stuff in it.

I am not sure we can/want to implement this w/o some sort of support from the
plugin side. It would basically mean doing another incremental linker in the
plugin.

How does HJ's binutils work for fat LTO?

Honza

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] Getting LTO incremental linking work
  2015-11-26  2:02       ` Jan Hubicka
@ 2015-11-26  2:12         ` Andi Kleen
  2015-11-26  6:33           ` Jan Hubicka
  0 siblings, 1 reply; 23+ messages in thread
From: Andi Kleen @ 2015-11-26  2:12 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc-patches, rguenther, hongjiu.lu, ccoutant, iant

On Thu, Nov 26, 2015 at 02:55:04AM +0100, Jan Hubicka wrote:
> > 
> > > I suppose we could play a games here with slim LTO: claim the file, see if
> > > there are any symbols defined in the non-LTO symbol table and if so, interpret
> > > read the symbol table and tell linker about the symbols and at the very end
> > > include the offending object file in the list of objects returned back to
> > > linker.
> > > 
> > > The linker then should take the symbols it wants.  There would be some fun
> > > involved, because the resolution info we get will consider the symbols
> > > defined in that object file to be IR which would need to be compensated for.
> > 
> > Yes something like that would be needed.
> 
> Actually I think it is harder than that, because we need to strip LTO data
> from the object files, so we do not end up with duplicated LTO if the object
> file was already having both LTO and non-LTO stuff in it.

When I started with LTO I was looking into that, and that is why I originally
implemented slim LTO as a first step. But then I realized that just adding
the postfixes is much easier, after HJ proposed his linker based solution.

Anyway, we can stay with the special binutils for the kernel for now, but it's
a bit of a pain for users to install them (user feedback is generally that
this is the hardest part).

I'm a bit surprised that the programs you test (Firefox, LibreOffice etc.)
don't have .S files.

> 
> I am not sure we can/want to implement this w/o some sort of support from 
> plugin side. It would basically mean doing another incremnetal linker in the
> plugin.
> 
> How does HJ's binutils work for fat LTO?

I believe it works too (pretty sure I tested it at some point)

Here's the original design spec

https://sourceware.org/ml/binutils/2011-04/msg00404.html


-Andi
-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] Getting LTO incremental linking work
  2015-11-26  2:12         ` Andi Kleen
@ 2015-11-26  6:33           ` Jan Hubicka
  0 siblings, 0 replies; 23+ messages in thread
From: Jan Hubicka @ 2015-11-26  6:33 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Jan Hubicka, gcc-patches, rguenther, hongjiu.lu, ccoutant, iant

> > Actually I think it is harder than that, because we need to strip LTO data
> > from the object files, so we do not end up with duplicated LTO if the object
> > file was already having both LTO and non-LTO stuff in it.
> 
> When I started with LTO I was looking into that, and that is why I originally
> implemented slim LTO as a first step. But then I realized that that just adding
> the postfixes is much easier, after HJ proposed his linker based solution.
> 
> Anyways can stay with the special binutils for the kernel for now, but it's 
> a bit of a pain for users to install them (user feedback is generally that 
> this is the hardest part)
> 
> I'm a bit surprised that the programs you test (Firefox, LibreOffice etc.)
> don't have .S files.

They don't do incremental linking. They build static libraries, which work just fine.
Indeed it would be nice to have things working in general.
> 
> > 
> > I am not sure we can/want to implement this w/o some sort of support from 
> > plugin side. It would basically mean doing another incremnetal linker in the
> > plugin.
> > 
> > How does HJ's binutils work for fat LTO?
> 
> I believe it works too (pretty sure I tested it at some point)
> 
> Here's the original design spec
> 
> https://sourceware.org/ml/binutils/2011-04/msg00404.html
> 
Yep, I saw this a while ago, but forgot how to find it.  Thanks!
Now I remember that HJ's binutils has IR objects (which are slim or fat
LTO) and mixed objects which are essentially two objects together.

I suppose the IR linking I implemented should work just fine with HJ's
approach and we could make lto-plugin skip the path switching to
early codegen.  The advantage over HJ's current implementation is
that you will get faster WPA at kernel link time.

It would be nice to arrive at a solution for mainline binutils.

Honza

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] Getting LTO incremental linking work
  2015-11-25 18:54   ` Jan Hubicka
@ 2015-11-26 10:15     ` Richard Biener
  2015-11-26 20:30       ` Jan Hubicka
  0 siblings, 1 reply; 23+ messages in thread
From: Richard Biener @ 2015-11-26 10:15 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc-patches, ak, hongjiu.lu, ccoutant, iant

On Wed, 25 Nov 2015, Jan Hubicka wrote:

> > > 
> > >  1) linker plugin is modified to pass -flinker-output to lto wrapper
> > >     linker-output is either dyn (.so), pie or exec
> > >     for incremental linking I added .rel for 3) and noltorel for 1)
> > > 
> > >     currently it does rel because 3) (nor 2) can not be done when incremnetal
> > >     linking is done on both LTO and non-LTO objects.
> > 
> > That's because the result would be a "fat" object where both pieces
> > would be needed.  Btw, I wonder why you are not running into the
> 
> Yep, we would end up with both LTO and non-LTO in one object and because
> we have no way to claim just part of it in the next link, the non-LTO part
> will be ignored (just as is the case with fat objects)
> 
> > same issues as me when producing linker plugin output (the "merged"
> > LTO IL) that is LTO IL.  Ah, possibly because the link is incremental,
> > and thus all special-handling of LTO sections is disabled.
> 
> Yep, I just throw in the LTO IL and the linker passes it through.
> > 
> > >     In this case linker
> > >     plugin output warings about code quality loss and switch to
> > >     noltorel.
> > >  2) with -flinker-ouptut the lto wrapper behaves same way as with
> > >     -flto-partition=none.
> > >  3) lto frontend parses -flinker-output and sets our internal flags accordingly.
> > >     I added new flag_incremental_linking to inform middle-end about the fact
> > >     that the output is going to be statically linked again.  This disables
> > >     the privatization of hidden symbols and if set to 2 it also triggers
> > >     the LTO IL streaming
> > 
> > I wonder why it behaves like -flto-partition=none in the case it does
> > not need to do LTO IL streaming (which I hope does LTO IL streaming
> > only?  or does this implement fat objects "correctly"?).  Can't
> 
> Yes, I do stream LTO il into assembler file, like normal -flto build would do
> for non-lto1 frontend.  So I produce one .s file that I need assembler to be
> called on.  By default lto-wrapper thinks we do WPA and it would look for list
> of ltrans partitions and execute ltranses that I do not want to happen.
> 
> Since no codegen is done we have no use for ltranses.  It would be nice to spit
> the .o file out through the simple-object interface.  Sadly we can't do that
> because simple-object won't put the LTO marker symbols in.  This is something
> I want to track, dropping the assembler stage from LTO generation in general:
> https://gcc.gnu.org/ml/gcc/2014-09/msg00340.html
> 
> Well, one case where WPA would help is production of fat-objects.  Currently
> it works (by compiling the LTO data into assembly again) but it is not done
> in parallel.  I suppose we could deal with this later - it is non-critical.
> My longer term plan is to make WPA parallelization independent of LTO - it
> makes sense when you build one large non-LTO object, too.
> 
> > we still parallelize the build via LTRANS and then incrementally
> > link the result (I suppose the linker will do that for us with the
> > linker plugin outputs already?)?
> > 
> > -flto-partition=none itself isn't more memory intensive than
> > WPA in these days, it's only about compile-time, correct?
> 
> It is.  Just by streaming everything in and out we "compress" the memory layout
> noticeably.  -flto-partition=one has a smaller peak than -flto-partition=none.
> But again, here all this triggers with -ffat-lto-objects only.
> > 
> > Your patch means that Andis/HJs work is no longer needed and we can
> > drop the section suffixes again?
> 
> Maybe. It is different implementation of same thing. They can be both used,
> though I suppose real incremental linking is better in longer term than
> section merging.
> > > 
> > > Does anyone see problems with this approach? I think this is easy enough 
> > > and fixes PR67548 so it may still get to mainline?
> > 
> > Yes, it would be a very nice feature to have indeed.
> > 
> > I don't see anything trying to change things with the collect2 path?
> 
> Hmm, with collect2 we don't even support static libraries, do we need to support
> incremental link?  I suppose collect2 can recognize -r and LTO objects and spawn
> the linker same way.
> > 
> > > I need to do more testing, but in general I think the implemntation is OK 
> > > as it is.  We need a way to force noltorel model for testsuite, as the
> > > new default will bypass codegen for all our -r -nostdlib testcases.
> > 
> > Maybe we can turn most of them to -shared?
> 
> Would that work on all targets? (i.e. mingw?).

We do have some testcases using -shared already, they require us to use
PIC flags though AFAICS.  -shared isn't 1:1 equivalent to -r -nostdlib...

> For testing purposes I suppose I will add a flag. It should also silence 
> the linker plugin warning about generating assembly early. -rno-lto 
> perhaps?

What about allowing -flinker-output=XXX at link time as a driver option
and not overriding it if it is already present?

> > >        struct cgraph_node *node = order[i];
> > >  
> > > -      if (node->has_gimple_body_p ())
> > > +      if (gimple_has_body_p (node->decl))
> > 
> > ?
> 
> node->has_gimple_body_p returns true if the gimple body is available, but not necessarily
> read into memory (in WPA), while gimple_has_body_p returns true only when the body is in memory.
> The statement renumbering that this guards is not needed if we only shuffle the sections
> (and it will ICE)
> > > +      /* It would be cool to produce .o file directly, but our current
> > > +	 simple objects does not contain the lto symbol markers.  Go the slow
> > > +	 way through the asm file.  */
> > 
> > We should get away from the symbol markers and instead rely on section
> > names.  Not in this patch of course.
> 
> Yes, we need to get the simple-object interface somehow working here.  The symbol
> markers are documented by the LTO specification.  I do not mind changing
> them that much.
> For your debug work, I think simple-object will need quite some work to output
> dwarf anyway.  Perhaps something that can be done as part of SoC?

Well, my plan is still to not rely on simple-object but have binutils
fixed for the issues I encounter.

But yes, if we go the simple-object route we need to handle adding
symbols and parsing and rewriting relocations (ugh).  Basically
simple-object needs to handle full partial linking under the
constraint that all relocations involved need no resolving
but only (offset) rewriting.

Thus it needs a relocation representation and parsing and generating
code for them as well as the same for symbols.

> > 
> > > +      lang_hooks.lto.begin_section = lhd_begin_section;
> > > +      lang_hooks.lto.append_data = lhd_append_data;
> > > +      lang_hooks.lto.end_section = lhd_end_section;
> > > +      if (flag_ltrans)
> > > +	error ("-flinker-output=rel and -fltrans are mutually exclusive");
> > > +      break;
> > > +
> > > +    case LTO_LINKER_OUTPUT_NOLTOREL: /* .o: incremental link producing asm  */
> > > +      flag_whole_program = 0;
> > > +      flag_incremental_link = 1;
> > > +      break;
> > > +
> > > +    case LTO_LINKER_OUTPUT_DYN: /* .so: PIC library */
> > > +      /* On some targets, like i386 it makes sense to build PIC library without
> > > +	 -fpic for performance reasons.  So no need to adjust flags.  */
> > > +      break;
> > > +
> > > +    case LTO_LINKER_OUTPUT_PIE: /* PIE binary */
> > > +      /* If -fPIC or -fPIE was used at compile time, be sure that
> > > +         flag_pie is 2.  */
> > > +      if (!flag_pie && flag_pic)
> > > +	flag_pie = flag_pic;
> > > +      flag_pic = 0;
> > 
> > The code doesn't seem to do what the comment says...
> 
> Hmm, indeed we want flag_pie = MAX (flag_pie, flag_pic)
> 
> > >  enum ld_plugin_status
> > > @@ -1100,6 +1138,9 @@ onload (struct ld_plugin_tv *tv)
> > >  	case LDPT_GOLD_VERSION:
> > >  	  gold_version = p->tv_u.tv_val;
> > >  	  break;
> > > +	case LDPT_LINKER_OUTPUT:
> > > +	  add_linker_output_option (p->tv_u.tv_val);
> > > +	  break;
> > >  	default:
> > >  	  break;
> > >  	}
> > 
> > I wonder what this does to old toolchains using the linker plugin
> > with this change.  I suppose it will fail with an "unknown option"
> > error.
> > 
> > Not sure what to do about this though given the plugin doesn't
> > really know which GCC it is targeting.  An idea would be to
> > spawn another enviroment from the driver like
> > COLLECT_GCC_LTO_WRAPPER_VER=2 and only adding this option if
> > that is present and >= 2?
> 
> I thought every GCC version ships its own linker plugin, so there should
> be no conflicts?

Well, "ships", yes.  But with plugin auto-loading in ld we end up
with a single lto-plugin.so file in the auto-load path and choosing
the "newest" is supposed to work with older GCC as well ...

Of course plugin auto-loading is used for ar and friends which
might not be affected here.  auto-loading isn't important for
ld itself (it won't work without using the GCC driver and the
GCC driver indeed explicitly loads its own plugin).

So maybe it's a non-issue...

Richard.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] Getting LTO incremental linking work
  2015-11-26  0:54   ` Jan Hubicka
  2015-11-26  1:55     ` Andi Kleen
@ 2015-11-26 10:33     ` Richard Biener
  1 sibling, 0 replies; 23+ messages in thread
From: Richard Biener @ 2015-11-26 10:33 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Andi Kleen, gcc-patches, hongjiu.lu, ccoutant, iant

On Thu, 26 Nov 2015, Jan Hubicka wrote:

> > > Moreover we do have all infrastructure ready to implement 3).  Our tree merging
> > > and symbol table handling is fuly incremental and I think made a patch to 
> > > implement it today.   The scheme is easy:
> > 
> > What happens when .S (assembler) files are part of the incremential object?
> > The kernel does that. Your patch would do the final generation in this case,
> > right?
> 
> Yes, it will spit out warning (which can be silenced -Wl,-rnolto is used) and turn
> the whole object into non-LTO one.
> > 
> > In theory we could change the build system to avoid that case though, but
> > it would need some changes.
> > 
> > It would be better if that could be handled somehow.

The final output of the incremental link would need to be two objects,
one with the LTO IL and one with the incrementally linked non-LTO
objects.  The only way to make it "one" object is a static archive?
Or extend ELF to behave as a "container" for multiple sub-objects...

> How does this work with your patchset?  Ideally we should have way to claim
> only portions of object files, but we don't have that. If we claim the file,
> the symbols in real symbol table are not visible.
> 
> I suppose we could play a games here with slim LTO: claim the file, see if
> there are any symbols defined in the non-LTO symbol table and if so, interpret
> read the symbol table and tell linker about the symbols and at the very end
> include the offending object file in the list of objects returned back to
> linker.

This is what I was trying with early-LTO-debug btw... the slim object
also contains early debug sections which I don't "claim" and I feed
the objects back to the linker (as plugin output), expecting it to
drop the LTO IL and take the early debug sections...

> The linker then should take the symbols it wants.  There would be some fun
> involved, because the resolution info we get will consider the symbols
> defined in that object file to be IR which would need to be compensated for.

A sensible option might be to simply error on incrementally linking
slim-LTO with non-LTO objects.  For fat objects we could either
drop LTO or error as well.

Fixing this on the user (Makefile) side would be easiest.  But it
has to use two incrementally linked objects in this case of course
so it wouldn't be very transparent.

Richard.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] Getting LTO incremental linking work
  2015-11-26 10:15     ` Richard Biener
@ 2015-11-26 20:30       ` Jan Hubicka
  0 siblings, 0 replies; 23+ messages in thread
From: Jan Hubicka @ 2015-11-26 20:30 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jan Hubicka, gcc-patches, ak, hongjiu.lu, ccoutant, iant

> 
> what about allowing -flinker-output=XXX at link time as a driver option
> and avoiding to override it if already present?

That sounds good.  I will implement that.
> 
> > > >        struct cgraph_node *node = order[i];
> > > >  
> > > > -      if (node->has_gimple_body_p ())
> > > > +      if (gimple_has_body_p (node->decl))
> > > 
> > > ?
> > 
> > node->has_gimple_body_p returns true for if gimple body is available, but not neccesarily
> > read to memory (in WPA), while gimple_has_body_p returns true only when body is in memory.
> > The statement renumbering which is guarded is not needed if we only shuffle the sections
> > (and will ICE)
> > > > +      /* It would be cool to produce .o file directly, but our current
> > > > +	 simple objects does not contain the lto symbol markers.  Go the slow
> > > > +	 way through the asm file.  */
> > > 
> > > We should get away from the symbol markers and instead rely on section
> > > names.  Not in this patch of course.
> > 
> > Yes, we need to get simple-object interface somehow working here.  The symbols
> > markers are documented by the LTO specification.  I do not mind that much of changing
> > it. 
> > For your debug work, I think simple-object will need quite some work to output
> > dwarf anyway.  Perhaps something that can be done as part of SoC?
> 
> Well, my plan is still to not rely on simple-object but have binutils
> fixed for the issues I encounter.

Well, I still hope we will be able to bypass the useless asm step with slim LTO.
For that we need simple-object to be capable of outputting an object file with
those symbols and the debug info once early debug lands.
> 
> But yes, if we go the simple-object route we need to handle adding
> symbols and parsing and rewriting relocations (ugh).  Basically
> simple-object needs to handle full partial linking under the
> constraint of all relocations being involed not needing resolving
> but only (offset) rewriting.

Yep, that sounds a bit too involved.  See the other email about HJ's binutils.
> 
> Well, "ships", yes.  But with plugin auto-loading in ld we end up
> with a single lto-plugin.so file in the auto-load path and choosing
> the "newest" is supposed to work with older GCC as well ...
> 
> Of course plugin auto-loading is used for ar and friends which
> might not be affected here.  auto-loading isn't important for
> ld itself (it won't work without using the GCC driver and the
> GCC driver indeed explicitely loads its own plugin).
> 
> So maybe it's an on-issue...

Yep, I think the lto-plugin currently won't start GCC unless called from
a driver that chose the proper plugin. So I hope we are safe here.

Honza
> 
> Richard.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] Getting LTO incremental linking work
  2015-11-25 23:09       ` Jan Hubicka
  2015-11-25 23:56         ` Jan Hubicka
@ 2015-11-28 10:35         ` Tom de Vries
  2015-11-28 12:03           ` Tom de Vries
  1 sibling, 1 reply; 23+ messages in thread
From: Tom de Vries @ 2015-11-28 10:35 UTC (permalink / raw)
  To: Jan Hubicka
  Cc: H.J. Lu, Richard Biener, GCC Patches, Andi Kleen, Cary Coutant,
	Ian Lance Taylor, Kirill Yukhin

On 26/11/15 00:07, Jan Hubicka wrote:
> (flinker_output): New flag.

Hi,

this seems to have caused a regression when using a compiler configured 
for offloading (giving ~1000 fails in libgomp testing).

For test-case libgomp.c/examples-4/array_sections-3.c, we enter run_gcc 
in lto-wrapper with args:
...
Breakpoint 1, run_gcc (argc=4, argv=0x7fffffffde68) at 
src/gcc-gomp-4_0-branch/gcc/lto-wrapper.c:897
897       char *list_option_full = NULL;
(gdb) p argv[0]
$8 = 0x7fffffffe104 "lto-wrapper"
(gdb) p argv[1]
$9 = 0x7fffffffe1af "-fresolution=array_sections-3.res"
(gdb) p argv[2]
$10 = 0x7fffffffe1d1 "-flinker-output=exec"
(gdb) p argv[3]
$11 = 0x7fffffffe1e6 "array_sections-3.o"
...

And here (cc-ing author of this bit) we decide that -flinker-output=exec 
is a file:
...
/* If object files contain offload sections, but do not contain LTO
    sections,
    then there is no need to perform a link-time recompilation, i.e.
    lto-wrapper is used only for a compilation of offload images. */
if (have_offload && !have_lto)
   {
     for (i = 1; i < argc; ++i)
       if (strncmp (argv[i], "-fresolution=",
		   sizeof ("-fresolution=") - 1))
	{
	  char *out_file;
	  /* Can be ".o" or ".so". */
	  char *ext = strrchr (argv[i], '.');
	  if (ext == NULL)
	    out_file = make_temp_file ("");
	  else
	    out_file = make_temp_file (ext);
	  /* The linker will delete the files we give it, so make
	     copies. */
	  copy_file (out_file, argv[i]);
	  printf ("%s\n", out_file);
	}
     goto finish;
   }
...

And try to copy it:
...
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff783d7e0 in feof () from /lib/libc.so.6
(gdb) bt
#0  0x00007ffff783d7e0 in feof () from /lib/libc.so.6
#1  0x0000000000406ff5 in copy_file (dest=0x71cdd0 "/tmp/ccL6HCCe", 
src=0x7fffffffe1d1 "-flinker-output=exec")
     at lto-wrapper.c:769
#2  0x00000000004080b7 in run_gcc (argc=4, argv=0x7fffffffde68) at 
gcc/lto-wrapper.c:1109
#3  0x0000000000409873 in main (argc=4, argv=0x7fffffffde68) at 
gcc/lto-wrapper.c:1396
...
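
The misclassification can be reproduced outside lto-wrapper with a minimal
sketch.  The helper name treated_as_file is hypothetical and does not exist
in lto-wrapper.c; it only mirrors the strncmp test quoted above, under the
assumption that nothing else filters the arguments on this path.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper mirroring the quoted condition in the
   offload-only path: any argument that does not start with
   "-fresolution=" is taken to be an input file.  */
static int
treated_as_file (const char *arg)
{
  assert (arg != NULL);
  return strncmp (arg, "-fresolution=",
		  sizeof ("-fresolution=") - 1) != 0;
}
```

Applied to the argv from the gdb session, treated_as_file returns 0 for
argv[1] and nonzero for both argv[2] ("-flinker-output=exec") and argv[3],
although only argv[3] names a real file.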

Thanks,
- Tom


* Re: [RFC] Getting LTO incremental linking work
  2015-11-28 10:35         ` Tom de Vries
@ 2015-11-28 12:03           ` Tom de Vries
  2015-11-28 16:05             ` Ilya Verbin
  0 siblings, 1 reply; 23+ messages in thread
From: Tom de Vries @ 2015-11-28 12:03 UTC (permalink / raw)
  To: Jan Hubicka
  Cc: H.J. Lu, Richard Biener, GCC Patches, Andi Kleen, Cary Coutant,
	Ian Lance Taylor, Kirill Yukhin

[-- Attachment #1: Type: text/plain, Size: 2360 bytes --]

On 28/11/15 10:35, Tom de Vries wrote:
> On 26/11/15 00:07, Jan Hubicka wrote:
>> (flinker_output): New flag.
>
> Hi,
>
> this seems to have caused a regression when using a compiler configured
> for offloading (giving ~1000 fails in libgomp testing).
>
> For test-case libgomp.c/examples-4/array_sections-3.c, we enter run_gcc
> in lto-wrapper with args:
> ...
> Breakpoint 1, run_gcc (argc=4, argv=0x7fffffffde68) at
> src/gcc-gomp-4_0-branch/gcc/lto-wrapper.c:897
> 897       char *list_option_full = NULL;
> (gdb) p argv[0]
> $8 = 0x7fffffffe104 "lto-wrapper"
> (gdb) p argv[1]
> $9 = 0x7fffffffe1af "-fresolution=array_sections-3.res"
> (gdb) p argv[2]
> $10 = 0x7fffffffe1d1 "-flinker-output=exec"
> (gdb) p argv[3]
> $11 = 0x7fffffffe1e6 "array_sections-3.o"
> ...
>
> And here (cc-ing author of this bit) we decide that -flinker-output=exec
> is a file:
> ...
> /* If object files contain offload sections, but do not contain LTO
>     sections,
>     then there is no need to perform a link-time recompilation, i.e.
>     lto-wrapper is used only for a compilation of offload images. */
> if (have_offload && !have_lto)
>    {
>      for (i = 1; i < argc; ++i)
>        if (strncmp (argv[i], "-fresolution=",
>             sizeof ("-fresolution=") - 1))
>      {
>        char *out_file;
>        /* Can be ".o" or ".so". */
>        char *ext = strrchr (argv[i], '.');
>        if (ext == NULL)
>          out_file = make_temp_file ("");
>        else
>          out_file = make_temp_file (ext);
>        /* The linker will delete the files we give it, so make
>           copies. */
>        copy_file (out_file, argv[i]);
>        printf ("%s\n", out_file);
>      }
>      goto finish;
>    }
> ...
>
> And try to copy it:
> ...
> Program received signal SIGSEGV, Segmentation fault.
> 0x00007ffff783d7e0 in feof () from /lib/libc.so.6
> (gdb) bt
> #0  0x00007ffff783d7e0 in feof () from /lib/libc.so.6
> #1  0x0000000000406ff5 in copy_file (dest=0x71cdd0 "/tmp/ccL6HCCe",
> src=0x7fffffffe1d1 "-flinker-output=exec")
>      at lto-wrapper.c:769
> #2  0x00000000004080b7 in run_gcc (argc=4, argv=0x7fffffffde68) at
> gcc/lto-wrapper.c:1109
> #3  0x0000000000409873 in main (argc=4, argv=0x7fffffffde68) at
> gcc/lto-wrapper.c:1396
> ...
>

This patch fixes the failures. I'm not sure if this is the right or 
complete fix though.

Thanks,
- Tom




[-- Attachment #2: tmp.patch --]
[-- Type: text/x-patch, Size: 574 bytes --]

diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
index b9ac535..e4772d1 100644
--- a/gcc/lto-wrapper.c
+++ b/gcc/lto-wrapper.c
@@ -1096,7 +1096,10 @@ run_gcc (unsigned argc, char *argv[])
   if (have_offload && !have_lto)
     {
       for (i = 1; i < argc; ++i)
-	if (strncmp (argv[i], "-fresolution=", sizeof ("-fresolution=") - 1))
+	if (strncmp (argv[i], "-fresolution=",
+		     sizeof ("-fresolution=") - 1) != 0
+	    && strncmp (argv[i], "-flinker-output=",
+			sizeof ("-flinker-output=") - 1) != 0)
 	  {
 	    char *out_file;
 	    /* Can be ".o" or ".so".  */


* Re: [RFC] Getting LTO incremental linking work
  2015-11-28 12:03           ` Tom de Vries
@ 2015-11-28 16:05             ` Ilya Verbin
  2015-11-28 17:41               ` Tom de Vries
  0 siblings, 1 reply; 23+ messages in thread
From: Ilya Verbin @ 2015-11-28 16:05 UTC (permalink / raw)
  To: Tom de Vries
  Cc: Jan Hubicka, H.J. Lu, Richard Biener, GCC Patches, Andi Kleen,
	Cary Coutant, Ian Lance Taylor, Kirill Yukhin

2015-11-28 14:01 GMT+03:00 Tom de Vries <Tom_deVries@mentor.com>:
> This patch fixes the failures. I'm not sure if this is the right or complete
> fix though.

I think it's ok, at least until we decide how to rework the offloading
stuff in lto-wrapper (see PR68463).

Thanks,
  -- Ilya


* Re: [RFC] Getting LTO incremental linking work
  2015-11-28 16:05             ` Ilya Verbin
@ 2015-11-28 17:41               ` Tom de Vries
  2015-11-29 21:15                 ` Jan Hubicka
  0 siblings, 1 reply; 23+ messages in thread
From: Tom de Vries @ 2015-11-28 17:41 UTC (permalink / raw)
  To: Ilya Verbin
  Cc: Jan Hubicka, H.J. Lu, Richard Biener, GCC Patches, Andi Kleen,
	Cary Coutant, Ian Lance Taylor, Kirill Yukhin

[-- Attachment #1: Type: text/plain, Size: 405 bytes --]

On 28/11/15 13:02, Ilya Verbin wrote:
> 2015-11-28 14:01 GMT+03:00 Tom de Vries <Tom_deVries@mentor.com>:
>> This patch fixes the failures. I'm not sure if this is the right or complete
>> fix though.
>
> I think it's ok, at least until we decide how to rework the offloading
> stuff in lto-wrapper (see PR68463).
>

Bootstrapped and reg-tested on x86_64.

Committed to trunk as attached.

Thanks,
- Tom


[-- Attachment #2: 0001-Handle-flinker-output-in-lto-wrapper.patch --]
[-- Type: text/x-patch, Size: 806 bytes --]

Handle flinker-output in lto-wrapper

2015-11-28  Tom de Vries  <tom@codesourcery.com>

	* lto-wrapper.c (run_gcc): Handle -flinker-output argument.

---
 gcc/lto-wrapper.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
index b9ac535..e4772d1 100644
--- a/gcc/lto-wrapper.c
+++ b/gcc/lto-wrapper.c
@@ -1096,7 +1096,10 @@ run_gcc (unsigned argc, char *argv[])
   if (have_offload && !have_lto)
     {
       for (i = 1; i < argc; ++i)
-	if (strncmp (argv[i], "-fresolution=", sizeof ("-fresolution=") - 1))
+	if (strncmp (argv[i], "-fresolution=",
+		     sizeof ("-fresolution=") - 1) != 0
+	    && strncmp (argv[i], "-flinker-output=",
+			sizeof ("-flinker-output=") - 1) != 0)
 	  {
 	    char *out_file;
 	    /* Can be ".o" or ".so".  */


* Re: [RFC] Getting LTO incremental linking work
  2015-11-28 17:41               ` Tom de Vries
@ 2015-11-29 21:15                 ` Jan Hubicka
  0 siblings, 0 replies; 23+ messages in thread
From: Jan Hubicka @ 2015-11-29 21:15 UTC (permalink / raw)
  To: Tom de Vries
  Cc: Ilya Verbin, Jan Hubicka, H.J. Lu, Richard Biener, GCC Patches,
	Andi Kleen, Cary Coutant, Ian Lance Taylor, Kirill Yukhin

> 
> 2015-11-28  Tom de Vries  <tom@codesourcery.com>
> 
> 	* lto-wrapper.c (run_gcc): Handle -flinker-output argument.

Thanks, this looks fine to me.  The way options are handled in lto-wrapper
seems like a gross hack.  Hopefully we will manage to clean this up eventually.
What happens when I call one of input objects -fresolution=.o?
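
To make the question concrete, here is a minimal sketch, assuming the
committed prefix test is the only thing separating options from input files
on this path.  Both has_prefix and skipped_as_option are hypothetical names
for illustration, not functions in lto-wrapper.c.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: nonzero when ARG starts with PREFIX.  */
static int
has_prefix (const char *arg, const char *prefix)
{
  assert (arg != NULL && prefix != NULL);
  return strncmp (arg, prefix, strlen (prefix)) == 0;
}

/* Mirrors the committed condition: an argument is skipped, i.e. not
   copied as an input file, when it starts with either option prefix.  */
static int
skipped_as_option (const char *arg)
{
  return has_prefix (arg, "-fresolution=")
	 || has_prefix (arg, "-flinker-output=");
}
```

Under that assumption skipped_as_option ("-fresolution=.o") is nonzero, so an
object file with that name would not be copied, while spelling it
"./-fresolution=.o" would not match either prefix.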

Honza
> 
> ---
>  gcc/lto-wrapper.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
> index b9ac535..e4772d1 100644
> --- a/gcc/lto-wrapper.c
> +++ b/gcc/lto-wrapper.c
> @@ -1096,7 +1096,10 @@ run_gcc (unsigned argc, char *argv[])
>    if (have_offload && !have_lto)
>      {
>        for (i = 1; i < argc; ++i)
> -	if (strncmp (argv[i], "-fresolution=", sizeof ("-fresolution=") - 1))
> +	if (strncmp (argv[i], "-fresolution=",
> +		     sizeof ("-fresolution=") - 1) != 0
> +	    && strncmp (argv[i], "-flinker-output=",
> +			sizeof ("-flinker-output=") - 1) != 0)
>  	  {
>  	    char *out_file;
>  	    /* Can be ".o" or ".so".  */


* Re: [RFC] Getting LTO incremental linking work
  2015-11-25  9:04 [RFC] Getting LTO incremental linking work Jan Hubicka
  2015-11-25 11:19 ` Richard Biener
  2015-11-26  0:24 ` Andi Kleen
@ 2016-03-16 17:33 ` H.J. Lu
  2 siblings, 0 replies; 23+ messages in thread
From: H.J. Lu @ 2016-03-16 17:33 UTC (permalink / raw)
  To: Jan Hubicka
  Cc: GCC Patches, Richard Guenther, Andi Kleen, Lu, Hongjiu,
	Cary Coutant, Ian Lance Taylor

On Wed, Nov 25, 2015 at 12:59 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
> Hi,
> PR 67548 is about LTO not supporting incremental linking.  I never really
> considered our current incremental linking very useful, because it triggers
> code generation at the incremental link time basically nullifying any
> benefits of whole program optimization and in fact I think it is harmful,
> because it sort of works and w/o any warning produce not very optimized code.
>

> --- gcc/lto/lto-lang.c  (revision 230847)
> +++ gcc/lto/lto-lang.c  (working copy)
> @@ -819,6 +819,56 @@ lto_post_options (const char **pfilename
>    if (flag_wpa)
>      flag_generate_lto = 1;
>
> +  /* Initialize the codegen flags according to the output type.  */
> +  switch (flag_lto_linker_output)
> +    {
> +    case LTO_LINKER_OUTPUT_REL: /* .o: incremental link producing LTO IL  */
> +      /* Configure compiler same way as normal frontend would do with -flto:
> +        this way we read the trees (declarations & types), symbol table,
> +        optimization summaries and link them. Subsequently we output new LTO
> +        file.  */
> +      flag_lto = "";
> +      flag_incremental_link = 2;
> +      flag_whole_program = 0;
> +      flag_wpa = 0;
> +      flag_generate_lto = 1;
> +      /* It would be cool to produce .o file directly, but our current
> +        simple objects does not contain the lto symbol markers.  Go the slow
> +        way through the asm file.  */
> +      lang_hooks.lto.begin_section = lhd_begin_section;
> +      lang_hooks.lto.append_data = lhd_append_data;
> +      lang_hooks.lto.end_section = lhd_end_section;
> +      if (flag_ltrans)
> +       error ("-flinker-output=rel and -fltrans are mutually exclusive");
> +      break;
> +
> +    case LTO_LINKER_OUTPUT_NOLTOREL: /* .o: incremental link producing asm  */
> +      flag_whole_program = 0;
> +      flag_incremental_link = 1;
> +      break;
> +
> +    case LTO_LINKER_OUTPUT_DYN: /* .so: PIC library */
> +      /* On some targets, like i386 it makes sense to build PIC library without
> +        -fpic for performance reasons.  So no need to adjust flags.  */
> +      break;
> +
> +    case LTO_LINKER_OUTPUT_PIE: /* PIE binary */
> +      /* If -fPIC or -fPIE was used at compile time, be sure that
> +         flag_pie is 2.  */
> +      if (!flag_pie && flag_pic)
> +       flag_pie = flag_pic;
> +      flag_pic = 0;
          ^^^^^^^^^^^^^^^^ This is wrong since PIE implies PIC.

This caused:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70258

> +      break;
> +
>
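
The effect of the quoted hunk can be seen in a small standalone sketch.  The
struct pic_flags type and both function names are hypothetical stand-ins for
the flag_pic and flag_pie globals, and the second function is only one
possible way to keep PIC enabled under PIE, not necessarily the fix that
went in for PR 70258.  It assumes the usual encoding where 1 means
-fpic/-fpie and 2 means -fPIC/-fPIE.

```c
#include <assert.h>

/* Hypothetical stand-in for the two global flags touched by the hunk.  */
struct pic_flags
{
  int pic;	/* 0 = off, 1 = -fpic, 2 = -fPIC (assumed encoding).  */
  int pie;	/* 0 = off, 1 = -fpie, 2 = -fPIE (assumed encoding).  */
};

/* The logic as posted: promote pic to pie, then clear pic.  */
static struct pic_flags
adjust_for_pie_as_posted (struct pic_flags f)
{
  if (!f.pie && f.pic)
    f.pie = f.pic;
  f.pic = 0;
  return f;
}

/* Illustrative alternative: take the stronger of the two levels for
   pie and keep pic equal to it, so that PIE still implies PIC.  */
static struct pic_flags
adjust_for_pie_keeping_pic (struct pic_flags f)
{
  assert (f.pic >= 0 && f.pie >= 0);
  if (f.pic > f.pie)
    f.pie = f.pic;
  f.pic = f.pie;
  return f;
}
```

For objects compiled with -fPIC (pic = 2, pie = 0), the posted logic yields
pie = 2 with pic = 0, whereas the alternative yields pie = 2 with pic = 2.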



Thread overview: 23+ messages
2015-11-25  9:04 [RFC] Getting LTO incremental linking work Jan Hubicka
2015-11-25 11:19 ` Richard Biener
2015-11-25 15:45   ` H.J. Lu
2015-11-25 19:21     ` Jan Hubicka
2015-11-25 23:09       ` Jan Hubicka
2015-11-25 23:56         ` Jan Hubicka
2015-11-28 10:35         ` Tom de Vries
2015-11-28 12:03           ` Tom de Vries
2015-11-28 16:05             ` Ilya Verbin
2015-11-28 17:41               ` Tom de Vries
2015-11-29 21:15                 ` Jan Hubicka
2015-11-25 18:54   ` Jan Hubicka
2015-11-26 10:15     ` Richard Biener
2015-11-26 20:30       ` Jan Hubicka
2015-11-25 23:59   ` Andi Kleen
2015-11-26  0:24 ` Andi Kleen
2015-11-26  0:54   ` Jan Hubicka
2015-11-26  1:55     ` Andi Kleen
2015-11-26  2:02       ` Jan Hubicka
2015-11-26  2:12         ` Andi Kleen
2015-11-26  6:33           ` Jan Hubicka
2015-11-26 10:33     ` Richard Biener
2016-03-16 17:33 ` H.J. Lu
