[whopr] Design/implementation alternatives for the driver and WPA

public inbox for gcc@gcc.gnu.org

* [whopr] Design/implementation alternatives for the driver and WPA
@ 2008-06-03 16:46 Diego Novillo
From: Diego Novillo @ 2008-06-03 16:46 UTC (permalink / raw)
  To: gcc
  Cc: Kenneth Zadeck, Jan Hubicka, Rafael Espindola, Ollie Wild, Robert Hundt

We've started working on the driver and WPA components for whopr.
These are some of our initial thoughts and implementation strategy.  I
have linked these to the WHOPR page as well.  I'm hoping we can
discuss these at the Summit BoF, so I'm posting them now to start the
discussion.

Robert, Ollie, Rafael, I hope I haven't mangled the originals too
much.  Feel free to edit the wiki pages to fix anything I missed.  I
am pasting a text version to this message to simplify replies.  The
originals are at:

Driver: http://gcc.gnu.org/wiki/whopr/driver
WPA: http://gcc.gnu.org/wiki/whopr/wpa

Thanks.  Diego.

====================================================================

= WPA Implementation =
This document outlines two approaches for implementing WPA and
discusses their pros and cons.  For a full description of WPA, see the
WHOPR design document.

== Cherry-Picking ==
Under this proposal, the WPA phase leaves its input files unmodified.
Its output is one optimization plan per input file.  LTRANS reads each
plan and its associated object file.  Then, following the plan's
instructions, it cherry-picks specific inlinable functions from other
object files.  This approach is roughly equivalent to the 1-to-1
mapping approach described in the WHOPR design document.

=== Implementation Plan: ===
 1. Disable deserialization of function bodies during WPA.
 1. Disable non-IPA_PASS optimizations during WPA.
 1. Add serialization/deserialization of inlining decisions.
 1. Modify LTRANS to cherry-pick function bodies from non-primary
files.  Until we are able to disentangle type/object dependencies,
this will likely require reading in all DECL's from those files.  Flag
non-primary functions and DECL's to prevent duplicate assembly output.
 1. Add LTRANS driver (so a single gcc invocation runs WPA followed by LTRANS).

=== Pros: ===
 1. No direct-to-ELF serialization!  That's one less feature to implement.
 1. No need to index/repackage DECL's.  We just load everything from
the cherry-picked files.
 1. Probably easier to implement than the repackaging scheme.

=== Cons: ===
 1. We'll probably need to implement repackaging later.  Several
parallel build tools, like distcc, are stateless on the remote side
and don't have access to locally-mounted network filesystems.  The
cherry-picking approach will require transmission of multiple object
files per LTRANS process invocation.  For example, if a.o uses inlined
functions from b.o, c.o, and d.o, all four files must be transmitted
to re-compile a.o.
 1. If we pursue repackaging later, LTRANS cherry-picking is throw-away code.

== Repackaging ==
Under this proposal, WPA repackages its input files.  Each output file
consists of the contents of a primary input file plus additional
DECL's and functions required for inlining.  ELF data is output
directly so that functions don't need to be deserialized.  LTRANS
reads each output file without reference to other files.  Initially,
only inlining will be supported.  Because inlining decisions can also
be made at the LTRANS phase, IPA serialization may be deferred to
phase 2.  This is roughly equivalent to the
many-to-1/many-to-many/1-to-many mappings approach described in the
WHOPR design document.

=== Implementation Plan: ===
 1. Disable deserialization of function bodies during WPA.
 1. Disable non-IPA_PASS optimizations during WPA.
 1. Add support for outputting ELF directly.
 1. Add support for identifying and serializing subsets of DECL's
based on the collection of functions being output.  This probably
means adding a DECL index to each serialized function body.
 1. Add LTRANS driver (so a single gcc invocation runs WPA followed by LTRANS).
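As a rough illustration of plan item 4, here is a minimal C sketch. It assumes that DECLs can be identified by integer ids, which the proposal does not specify. For the functions going into one output file, it collects the duplicate-free set of DECLs they use (the subset to serialize). It then builds each function's DECL index, which maps every reference to a slot in that file-local table. `struct fn_refs`, `collect_decl_subset` and `build_decl_index` are hypothetical names.

```c
#include <stdlib.h>

/* Hypothetical model: a function body refers to DECLs by global id. */
struct fn_refs {
  const unsigned *decls;  /* global DECL ids referenced by the body */
  size_t ndecls;
};

static int cmp_uint(const void *a, const void *b)
{
  unsigned x = *(const unsigned *) a, y = *(const unsigned *) b;
  return (x > y) - (x < y);
}

/* Store in OUT the sorted, duplicate-free union of the DECL ids used by
   the NFNS functions going into one output file.  CAP must be at least
   the total number of references.  Returns the subset size, or
   (size_t) -1 if CAP is too small. */
size_t collect_decl_subset(const struct fn_refs *fns, size_t nfns,
                           unsigned *out, size_t cap)
{
  size_t n = 0;
  for (size_t i = 0; i < nfns; i++)
    for (size_t j = 0; j < fns[i].ndecls; j++)
      {
        if (n == cap)
          return (size_t) -1;
        out[n++] = fns[i].decls[j];
      }
  qsort(out, n, sizeof *out, cmp_uint);
  size_t m = 0;
  for (size_t i = 0; i < n; i++)
    if (m == 0 || out[m - 1] != out[i])
      out[m++] = out[i];
  return m;
}

/* Per-function DECL index: for each reference of FN, its slot in the
   file-local table SUBSET.  Every id is present by construction. */
void build_decl_index(const struct fn_refs *fn, const unsigned *subset,
                      size_t nsubset, unsigned *local_idx)
{
  for (size_t j = 0; j < fn->ndecls; j++)
    {
      const unsigned *hit = bsearch(&fn->decls[j], subset, nsubset,
                                    sizeof *subset, cmp_uint);
      local_idx[j] = (unsigned) (hit - subset);
    }
}
```

Because the per-function index points into the file-local table rather than at global ids, LTRANS could read each repackaged file without consulting any other file.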

=== Pros: ===
 1. Closer to the approach we'll probably use in production.  It will
integrate more easily into parallel build tools while limiting excess
network transmission.
 1. Initially, we don't need to implement IPA serialization.
Repackaging implicitly allows LTRANS to make inlining decisions
that would not otherwise be available.

=== Cons: ===
 1. Requires implementing direct-to-ELF serialization.
 1. Requires (at least partial) re-serialization of DECL's and
per-function DECL indexes.
 1. Probably harder to implement than cherry-picking.

====================================================================

= WHOPR Driver Design =

This document proposes a driver design for WHOPR based on
the linker.  Although this document focuses on gold, a similar
approach can also be implemented in GNU ld.

== Design Philosophy ==
 * The implementation provides complete transparency. Developers
should be able to take advantage of LTO without having to modify
existing build systems and/or Makefiles; all that's needed is to add
an LTO option (-flto).

 * Transparency is achieved through tight integration with the linker.
 Ideally, the linker communicates with LTO via a shared library
(plugin), eliminating any dependencies between the source bases of
linker and LTO, but other callback methods are also possible.

 * For scalability, we expect that after IPA multiple backend
invocations may/will follow. The system should be flexible enough to
accommodate existing parallel build infrastructures.

 * Debuggability - debugging IPA and post-IPA problems can be
complicated. The design offers ways to simplify this.

=== Why in the Linker? ===
As of this writing, the pre-ld driver collect2 performs the LTO file
identification. However, this is sub-optimal. The benefits of driving
LTO from the linker are:

 * The linker performs full symbol resolution. Therefore, it will only
bring in objects that are necessary.  This can greatly reduce build
and library extraction times.

 * Several build systems use ld -r to build components and/or shared libraries.

 * The linker properly handles archives.

 * The linker knows which functions and globals are externally
referenced. [[http://llvm.org/docs/LinkTimeOptimization.html|LLVM's
IPA]] page provides an extended example of why the integration in the
linker is necessary to perform precise dead function elimination. The
same chain of arguments holds for globals. LTO needs to know about
externally referenced symbols.

 * Less work - currently, collect2 needs to fork/exec 'nm' on every
input file to determine whether it contains IR, which is not optimal.
In the new scheme, the linker will search for a particular section
(note: for ELF files, the linker traverses the section table in all
cases to find the symbol table).

== Process Structure ==
The WHOPR design document outlines three drivers, LGEN (front-end
driver), WPA (actual IPA), and LTRANS (backend / code generation).
This section describes how they call each other.

=== Front End: LGEN ===
LGEN is the independent FE driver, which produces files containing IR
and which can be invoked via any parallel build infrastructure.
Generation of IR is controlled by option {{{-flto}}}.

'''TODO''': Right now, LGEN puts a specifically named symbol in the
file to mark it as containing IR. This will change and a specifically
named section will be added instead.

=== Link: Collect2, gold, plugin ===
The link is either started with the gcc/g++ drivers (which call
collect2, which calls ld), or by calling ld (gold) directly. In the
gcc/g++ drivers ''and'' in collect2, files are still treated as
regular ELF files, so nothing needs to be changed there. This approach changes
the currently implemented strategy on the LTO branch. collect2
fork/exec's the linker.

The linker, upon start, examines a configuration file at a known
location relative to its own location. If this file exists, it
extracts the location of linker plugins (shared libraries) and loads
those.  A fixed set of function interfaces needs to be implemented in
the plugin; these functions are described below. One of many possible
plugins is a plugin that controls LTO.

Another way to locate a plugin would be via command-line.  This would
make it easier for two different compilers (and therefore two
different plugins) to use the same linker.

The linker performs regular symbol resolution. For each object file it
touches, it calls a specific function in the plugin (int
ldplugin_claim_file(const char *fname, size_t offset)). This
function returns 1 if it intends to claim a file (e.g. it contains
IR), and 0 if it doesn't.   The offset is used in the case of an
archive file. This way the plugin doesn't need to understand archives.
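A minimal sketch of the plugin side of this call follows. It assumes 64-bit ELF input with the host's byte order, and a hypothetical marker-section prefix: the LGEN TODO above only says that "a specifically named section" will be added, so `.gnu.lto_` here is a placeholder. The `offset` argument lets the plugin read an archive member in place without parsing the archive format. `elf_has_section_prefix` is a hypothetical helper.

```c
#define _XOPEN_SOURCE 700
#include <elf.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical marker prefix; the proposal does not name the section. */
#define IR_SECTION_PREFIX ".gnu.lto_"

/* Read exactly LEN bytes from FD at file position POS. */
static int read_at(int fd, void *buf, size_t len, off_t pos)
{
  return pread(fd, buf, len, pos) == (ssize_t) len ? 0 : -1;
}

/* Scan the section headers of the ELF64 object starting at OFFSET.
   Returns 1 if some section name starts with PREFIX, else 0.
   Extended section numbering is not handled in this sketch. */
static int scan_sections(int fd, size_t offset, const char *prefix)
{
  Elf64_Ehdr eh;
  if (read_at(fd, &eh, sizeof eh, (off_t) offset) != 0
      || memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0
      || eh.e_ident[EI_CLASS] != ELFCLASS64
      || eh.e_shentsize != sizeof (Elf64_Shdr)
      || eh.e_shnum == 0 || eh.e_shstrndx >= eh.e_shnum)
    return 0;

  int found = 0;
  Elf64_Shdr *sh = calloc(eh.e_shnum, sizeof *sh);
  char *names = NULL;
  if (sh != NULL
      && read_at(fd, sh, eh.e_shnum * sizeof *sh,
                 (off_t) (offset + eh.e_shoff)) == 0)
    {
      /* Section names live in the section-header string table. */
      const Elf64_Shdr *strtab = &sh[eh.e_shstrndx];
      names = malloc(strtab->sh_size + 1);
      if (names != NULL
          && read_at(fd, names, strtab->sh_size,
                     (off_t) (offset + strtab->sh_offset)) == 0)
        {
          names[strtab->sh_size] = '\0';
          size_t plen = strlen(prefix);
          for (size_t i = 0; i < eh.e_shnum && !found; i++)
            found = sh[i].sh_name < strtab->sh_size
                    && strncmp(names + sh[i].sh_name, prefix, plen) == 0;
        }
    }
  free(names);
  free(sh);
  return found;
}

int elf_has_section_prefix(const char *fname, size_t offset,
                           const char *prefix)
{
  int fd = open(fname, O_RDONLY);
  if (fd < 0)
    return 0;
  int found = scan_sections(fd, offset, prefix);
  close(fd);
  return found;
}

/* Plugin entry point with the signature proposed above. */
int ldplugin_claim_file(const char *fname, size_t offset)
{
  return elf_has_section_prefix(fname, offset, IR_SECTION_PREFIX);
}
```

Unreadable or non-ELF inputs simply report "not claimed", so the linker falls back to treating them as regular files.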

The linker marks each claimed file in its internal data structures and
continues with regular symbol resolution, until all references have
been resolved.

The linker also creates a list of all externally referenced symbols
and passes these to the plugin via the function
ldplugin_add_external_symbol(const char *mangled_name).
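On the plugin side, the names received this way mainly need to support membership queries during WPA. For example, an IR-only function that nothing outside the IR references could be localized or removed. Here is a minimal sketch. The storage layout and `plugin_is_externally_referenced` are assumptions, not part of the proposal.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical plugin-side storage for names received through
   ldplugin_add_external_symbol().  A sorted array keeps the sketch
   short; a hash table would also work. */
static const char **ext_syms;
static size_t ext_count, ext_cap;
static int ext_sorted = 1;

static int cmp_name(const void *a, const void *b)
{
  return strcmp(*(const char *const *) a, *(const char *const *) b);
}

void ldplugin_add_external_symbol(const char *mangled_name)
{
  if (ext_count == ext_cap)
    {
      size_t cap = ext_cap ? 2 * ext_cap : 16;
      const char **grown = realloc(ext_syms, cap * sizeof *grown);
      if (grown == NULL)
        abort();
      ext_syms = grown;
      ext_cap = cap;
    }
  /* Copy the name: the linker's string may not outlive this call. */
  size_t len = strlen(mangled_name) + 1;
  char *copy = malloc(len);
  if (copy == NULL)
    abort();
  memcpy(copy, mangled_name, len);
  ext_syms[ext_count++] = copy;
  ext_sorted = 0;
}

/* Returns 1 if the linker reported NAME as externally referenced.  An
   IR-only function for which this returns 0, and which is unreachable
   from any function for which it returns 1, is a candidate for
   localization or removal. */
int plugin_is_externally_referenced(const char *name)
{
  if (ext_count == 0)
    return 0;
  if (!ext_sorted)
    {
      qsort(ext_syms, ext_count, sizeof *ext_syms, cmp_name);
      ext_sorted = 1;
    }
  return bsearch(&name, ext_syms, ext_count, sizeof *ext_syms,
                 cmp_name) != NULL;
}
```

Because the key is a bare mangled name, this sketch cannot tell a weak definition from a strong one with the same name. That is exactly the ambiguity raised in the TODO that follows.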

'''TODO''': Would it be better to pass an abstract object to
ldplugin_add_external_symbol? What should we pass to it if there are
two symbols in IL files with the same name?  One strong and one weak
for example.

At this point, the linker calls the main entry point of the plugin
(ldplugin_main(int argc, char *argv[])), passing its own arguments.
It's the plugin's responsibility to extract its related {{{-Wx,...}}}
values.

'''TODO''': Linker needs to understand these options. There will be a
single option 'letter' for all plugins, so plugins should be made
resilient against options they don't understand.
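The following minimal sketch shows how ldplugin_main might pick out its own options while tolerating ones it does not recognize. The `-Wx,` prefix mirrors the placeholder above. The recognized options (`-O<n>`, `keep-temps`) and `plugin_parse_args` are hypothetical.

```c
#include <string.h>

/* Hypothetical placeholder for the single plugin option "letter". */
#define PLUGIN_OPT_PREFIX "-Wx,"

struct plugin_opts {
  int keep_temps;     /* hypothetical: keep intermediate files (DEBUG) */
  char opt_level[8];  /* hypothetical: e.g. "-O3", forwarded to WPA */
  int unknown;        /* items seen but not understood */
};

/* Handle one comma-separated item of length LEN.  Unknown items are
   counted and skipped rather than treated as errors. */
static void handle_item(struct plugin_opts *o, const char *item, size_t len)
{
  if (len == strlen("keep-temps") && strncmp(item, "keep-temps", len) == 0)
    o->keep_temps = 1;
  else if (len >= 2 && len < sizeof o->opt_level
           && strncmp(item, "-O", 2) == 0)
    {
      memcpy(o->opt_level, item, len);
      o->opt_level[len] = '\0';
    }
  else
    o->unknown++;
}

/* Scan the linker's own argv (as passed to ldplugin_main) for plugin
   options; arguments without the prefix belong to the linker. */
void plugin_parse_args(int argc, char *argv[], struct plugin_opts *o)
{
  const size_t plen = strlen(PLUGIN_OPT_PREFIX);
  memset(o, 0, sizeof *o);
  for (int i = 1; i < argc; i++)
    {
      if (strncmp(argv[i], PLUGIN_OPT_PREFIX, plen) != 0)
        continue;
      const char *p = argv[i] + plen;
      for (;;)
        {
          const char *comma = strchr(p, ',');
          size_t len = comma ? (size_t) (comma - p) : strlen(p);
          if (len > 0)
            handle_item(o, p, len);
          if (comma == NULL)
            break;
          p = comma + 1;
        }
    }
}
```

If several plugins share one option letter, each one would see the others' items. Counting and skipping them, instead of failing, is what keeps them from interfering with each other.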

'''TODO''': How do we handle symbols defined in more than one file?
Should ldplugin_add_external_symbol take an abstract pointer/index into
the linker symbol table?

'''TODO''': What is passed to ldplugin_claim_file if the file is in a
.a file? '''TODO''': Are we assuming that the files with IL contain a
normal symbol table? Should we make it possible for the plugin to call
back into the linker to add symbols? This should make it possible to
support a "full custom" file format for the IL files.

=== Plugin ===
The plugin parses the options passed to it. It already has a list of
all input object files containing IR, as well as a list of the
external references. Note, we could also pass in the list of all other
regular object files to it. Some of these files might be located in an
archive.

The plugin performs these actions:

 * It creates and manages a temporary directory for all intermediate files.

 * It manages the DEBUG facility. For example, to debug post-WPA
 problems, one needs the various outputs of WPA. In other words,
 intermediate files need to be kept. DEBUG should allow naming
 temporary directories, and control other DEBUG related behavior (e.g.
 dumping options).

 * It extracts IR object files from archives and places them in the
 tmp directory. This may be done by fork/exec'ing 'ar x ...', or
 directly calling linker helper functions. To avoid name collision,
 every generated and/or copied object file gets a running serial
 number. This way, when two files or archives from different
 directories participate in a link, no further name collision will
 occur.

 * The plugin creates a REDO script which contains the exact command
 lines for the original link and WPA, as well as the environment as it
 was during the original build. The WPA command line contains all the
 options and the extracted IR files. REDO will also build an ld
 command line where archives are replaced with the extracted object
 files. This REDO script allows restarting WPA, and restarting the
 final link (with some magic). The redo script is essential for
 automatic triaging.

 * If automatic triaging is used to identify performance regressions,
 a subtle corner case may arise related to code layout. This will be
 addressed later.

 * The plugin constructs the command line for WPA (options + IR files)
and fork/exec's it.

 * The plugin "collects" resulting real object files and feeds them
back to the linker.
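The temporary-directory and serial-number bullets above can be sketched as follows, assuming POSIX `mkdtemp`. The `<serial>_<basename>` naming scheme and the helper names are hypothetical. A single running counter keeps two `foo.o` files from different directories or archives apart.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Running serial number shared by all extracted or generated files. */
static unsigned next_serial;

/* Create the plugin's private temporary directory.  TMPL must end in
   "XXXXXX"; it is rewritten in place with the actual name. */
char *plugin_make_tmpdir(char *tmpl)
{
  return mkdtemp(tmpl);
}

/* Write "<tmpdir>/<serial>_<basename of PATH>" into BUF.  Returns 0 on
   success, -1 if BUF is too small.  The serial number is consumed
   either way, so a name is never handed out twice. */
int plugin_tmp_name(char *buf, size_t size, const char *tmpdir,
                    const char *path)
{
  const char *slash = strrchr(path, '/');
  const char *base = slash ? slash + 1 : path;
  int n = snprintf(buf, size, "%s/%u_%s", tmpdir, next_serial++, base);
  return n >= 0 && (size_t) n < size ? 0 : -1;
}
```

Keeping the original basename after the serial number would also make the REDO script and DEBUG output easier to read.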

=== Inter-Procedural Optimization: WPA ===

WPA parses its command line and performs the interprocedural analysis.
It will generate 1..N post-IPA IR files for LTRANS. Depending on the
model, the post-IPA IR files don't need to have a symbol table. Single
post-IPA files or groups of such files will be passed to LTRANS
invocations. These invocations are independent and can be parallelized.
WPA will create a list containing these file groups. For each group, a
list of specific command-line options to LTRANS can be specified, as
well as its designated output file name, e.g.:

0.o base.a.threads.o  -O3
1.o base.a.walltime.o inline-candidate-1.o inline-candidate-2.o
2.o myapp.o 2.o
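The format of this control file is not specified further. This sketch assumes whitespace-separated fields, with the output name first, LTRANS options marked by a leading `-`, and everything else treated as an input file. Under those assumptions, a minimal C parser might look like this. `struct ltrans_job` and `parse_job_line` are hypothetical.

```c
#include <string.h>

#define MAX_ITEMS 32
#define SEPARATORS " \t\r\n"

/* One LTRANS job from the WPA control file: an output object, its
   post-IPA input files, and per-job LTRANS options. */
struct ltrans_job {
  char *output;
  char *inputs[MAX_ITEMS];
  int ninputs;
  char *options[MAX_ITEMS];
  int noptions;
};

/* Parse one line in place (LINE is modified; JOB points into it).
   Returns 0 on success, -1 if the line has no output and inputs or
   has too many items. */
int parse_job_line(char *line, struct ltrans_job *job)
{
  memset(job, 0, sizeof *job);
  char *p = line;
  for (;;)
    {
      p += strspn(p, SEPARATORS);
      if (*p == '\0')
        break;
      char *tok = p;
      p += strcspn(p, SEPARATORS);
      if (*p != '\0')
        *p++ = '\0';
      if (job->output == NULL)
        job->output = tok;
      else if (tok[0] == '-')
        {
          if (job->noptions == MAX_ITEMS)
            return -1;
          job->options[job->noptions++] = tok;
        }
      else
        {
          if (job->ninputs == MAX_ITEMS)
            return -1;
          job->inputs[job->ninputs++] = tok;
        }
    }
  return job->output != NULL && job->ninputs > 0 ? 0 : -1;
}
```

A real format would also need some escaping for file names that start with `-` or contain spaces.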

WPA calls the parallel "LTRANS magic", which, by default, is a script
in a default location; let's call it ltrans_ctrl. Command-line options
should allow specifying alternative scripts. The location of the tmp
directory, the name of the control file, and all original
command-line options to WPA are passed to ltrans_ctrl. It is
ltrans_ctrl's role to support various existing build systems:

'''local build - parallel make'''

For local builds on multi-core machines, parallel make can be used
efficiently, as it already does process management. For this scenario,
  ltrans_ctrl may call a script ltrans_parallel_make, which

 * identifies the current platform (uname -a), finds and identifies 'make'

 * generates a Makefile

 * invokes make -s -j ''x'' -f Makefile

To customize LTO for a specific installation, ltrans_parallel_make can
be customized using the output from getconf _NPROCESSORS_ONLN to
specify parallelism ''x'' as a default, and to use an environment
variable to allow overriding. The generated Makefile might look like
this:

goal: 0.o 1.o 2.o

0.o: base.a.threads.o
	ltrans -O3 -o 0.o base.a.threads.o

1.o: base.a.walltime.o inline-candidate-1.o inline-candidate-2.o
	ltrans -o 1.o base.a.walltime.o inline-candidate-1.o inline-candidate-2.o

...

This mechanism works for regular make and gmake, for which only the
parameters need to change. One issue is that all generated files
must be visible on all build machines for the dependency mechanism to
work. This can usually be achieved by making sure the build happens on
NFS, or by introducing pseudo targets and remote copy operations in
the Makefile.
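Here is a minimal sketch of the two pieces of ltrans_parallel_make described above: choosing the default `-j` value and writing the Makefile. `LTRANS_JOBS` is a hypothetical name for the override variable. `_SC_NPROCESSORS_ONLN` is a widespread `sysconf` extension (glibc, the BSDs), not strict POSIX. The rules spell out file lists instead of using `$^`, because POSIX make does not define `$^`.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical name for the override variable. */
#define JOBS_ENV "LTRANS_JOBS"

/* Default -j value: a positive integer from JOBS_ENV if set, else the
   number of online processors (what getconf _NPROCESSORS_ONLN
   reports), else 1. */
long ltrans_default_jobs(void)
{
  const char *env = getenv(JOBS_ENV);
  if (env != NULL)
    {
      char *end;
      long n = strtol(env, &end, 10);
      if (end != env && *end == '\0' && n > 0)
        return n;
    }
  long n = sysconf(_SC_NPROCESSORS_ONLN);
  return n > 0 ? n : 1;
}

struct mk_job {
  const char *output;         /* e.g. "0.o" */
  const char *const *inputs;  /* NULL-terminated post-IPA files */
  const char *options;        /* per-job LTRANS options, may be "" */
};

static void put_inputs(FILE *out, const char *const *in)
{
  for (; *in != NULL; in++)
    fprintf(out, " %s", *in);
}

/* Emit a "goal" target over all outputs plus one rule per job.  Recipe
   lines start with a tab, as make requires. */
void ltrans_write_makefile(FILE *out, const struct mk_job *jobs,
                           size_t njobs)
{
  fputs("goal:", out);
  for (size_t i = 0; i < njobs; i++)
    fprintf(out, " %s", jobs[i].output);
  fputc('\n', out);
  for (size_t i = 0; i < njobs; i++)
    {
      fprintf(out, "\n%s:", jobs[i].output);
      put_inputs(out, jobs[i].inputs);
      fprintf(out, "\n\tltrans%s%s -o %s",
              strlen(jobs[i].options) > 0 ? " " : "", jobs[i].options,
              jobs[i].output);
      put_inputs(out, jobs[i].inputs);
      fputc('\n', out);
    }
}
```

The generated file would then be run as `make -s -j N -f Makefile`, with N taken from `ltrans_default_jobs()`.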

'''distributed build - distcc'''
TBD - but should be similar. The related files will be copied to the build
server, ltrans will be invoked there, and the resulting object file will
be copied back. As a matter of fact, if there was an LTRANS wrapper
script for that, the Makefile infrastructure could be reused. The
wrapper script would have to:

 * for a given target, select a build server.

 * generate unique temporary name and directory on build server

 * copy involved files to this location

 * invoke ltrans via ssh with the proper parameters

 * scp back the resulting real .o

 * srm tmp directory.

== Final Link - ld ==
After all real object files have been generated, these files, along
with the rest of the originally passed real object files, need to be
passed to the linker. There are a few ways to do this:

 * Call a plugin / linker interface which allows explicitly adding
 files to the linker's internal data structures. '''TODO''': Unclear
 about the consequences for linker file/code generation.

 * Restart the linker with a new command line, where all original real
 objects and the newly generated objects are passed in. There are subtle
 problems possible in terms of symbol resolution. Well - these
 problems are always there, unless a 1x1 mapping from pre- to post-IPA
 object files exists.

 * WPA could call the linker, since it has all the proper command-line
 options. The plugin could also do it, but only with difficulty, as WPA
 decides on the actual number and names of the final real .o files. The
 plugin could just pick up any object files it finds in the tmp
 directories, but this may introduce problems when something goes wrong
 or when debugging.

 * What about adding individual symbols via an API call? The linker
 will still be running during WPA. The plugin can collect the symbols
 and pass them back to the linker. With this it shouldn't be necessary
 to restart the linker. Final strategy to be determined.

== Cleanup ==
The plugin cleans up all temporary directories, unless directed not to.
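A minimal sketch of this step using POSIX `nftw` follows. It assumes the plugin remembers the directory it created and has a keep flag that the DEBUG facility can set. `plugin_cleanup_tmpdir` is a hypothetical name.

```c
#define _XOPEN_SOURCE 700
#include <ftw.h>
#include <stdio.h>
#include <sys/stat.h>

/* nftw callback: delete one entry.  remove() handles both regular
   files and (already emptied) directories. */
static int remove_entry(const char *path, const struct stat *sb,
                        int typeflag, struct FTW *ftwbuf)
{
  (void) sb;
  (void) typeflag;
  (void) ftwbuf;
  return remove(path);
}

/* Delete the temporary tree DIR unless KEEP is nonzero.  FTW_DEPTH
   visits a directory's contents before the directory itself; FTW_PHYS
   avoids following symlinks out of the tree.  Returns 0 on success,
   -1 on failure. */
int plugin_cleanup_tmpdir(const char *dir, int keep)
{
  if (keep)
    return 0;
  return nftw(dir, remove_entry, 16, FTW_DEPTH | FTW_PHYS) == 0 ? 0 : -1;
}
```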

== Plugin Interfaces ==
The plugin function entry points have C linkage. From linker to plugin:

#include <stddef.h>  // size_t

// pass an object file name to the plugin; offset locates the object
// when it is a member of an archive file.
// return 1: plugin can make use of file
// return 0: no use for plugin
int ldplugin_claim_file(const char *fname, size_t offset);

// archive variant (parameters presumably: archive path, member name)
int ldplugin_claim_archive_file(const char *archive, const char *fname);

// pass external reference to plugin
void ldplugin_add_external_symbol(const char *mangled_name);

// call plugin's main entry point
// return 0 on success or an error code
int ldplugin_main(int argc, char *argv[]);

// finalize plugin's job, clean up
void ldplugin_cleanup(void);

Linker-provided interfaces (from plugin to linker):

// query symbol attribute (pre-emptive, size, etc)
// enum ld_query is not defined yet (see Issues below)
int ld_query_symbol_attribute(const char *symbol_name, enum ld_query query);

// after WPA, pass a real object file back to linker
void ld_add_object_file(const char *orig_object_fname,
                        const char *post_wpa_fname);

=== Issues ===
 * Question: How do things work in the linker if 10 files are on the
 original link line, but only 3 files come back? Can the linker be
 made to ignore the other files?

 * Question: The symbol attribute query needs to be refined - what are
 we going to query exactly?

 * Command line options: A FE file might be compiled with a special
 option, such as optimization level. Question: How is this information
 stored in the object file? How is the scenario handled where 2 IPA
 files are compiled at different optimization levels? What does
 WPA do? There are many ways to do things - we need to decide on one.


* Re: [whopr] Design/implementation alternatives for the driver and WPA
@ 2008-06-04  2:27 ` Chris Lattner
From: Chris Lattner @ 2008-06-04  2:27 UTC (permalink / raw)
  To: Diego Novillo
  Cc: gcc, Kenneth Zadeck, Jan Hubicka, Rafael Espindola, Ollie Wild,
	Robert Hundt

On Jun 3, 2008, at 9:45 AM, Diego Novillo wrote:
> We've started working on the driver and WPA components for whopr.
> These are some of our initial thoughts and implementation strategy.  I
> have linked these to the WHOPR page as well.  I'm hoping we can
> discuss these at the Summit BoF, so I'm posting them now to start the
> discussion.

This is a very interesting design, and a very nice evolution from the  
previous proposal.  I'm not completely clear on the difference between  
LTO and whopr here.  Is LTO the mode "normal people" will use, and  
whopr is the mode where "people with huge clusters" will use?  Will  
LTO/whopr support useful optimization on common multicore machines?

Some thoughts:

> == Repackaging ==
> Under this proposal, WPA repackages its input files.  Each output file
> consists of the contents of a primary input file plus additional
> DECL's and functions required for inlining.  ELF data is output
> directly so that functions don't need to be deserialized.  LTRANS
> reads each output file without reference to other files.  Initially,
> only inlining will be supported.  Because inlining decisions can also
> be made at the LTRANS phase, IPA serialization may be deferred to
> phase 2.  This is roughly equivalent to the
> many-to-1/many-to-many/1-to-many mappings approach described in the
> WHOPR design document.

Are you focusing on inlining here as a specific example, or is this  
the only planned IPA optimization that can use summaries?  It seems  
unfortunate to design a system where inlining is the only real IPO  
transformation you can do.  Does adding new interprocedural  
optimizations require adding whole new phases?

> = WHOPR Driver Design =
>
> This document proposes a driver design for WHOPR based on
> the linker.  Although this document focuses on gold, but a similar
> approach can also be implemented in GNU ld.

I'm glad you guys finally came around to this design, it is far more  
sane.

> == Design Philosophy ==
> * The implementation provides complete transparency. Developers
> should be able to take advantage of LTO without having to modify
> existing build systems and/or Makefiles, all that's needed is to add
> an LTO option (-flto).

Ok.  How do you handle merging of optimization info?   If I build  
one .o file with -Os and one with -O3 who wins or what does this  
mean?  If I build one with -ffast-math and one without, does the right  
thing happen?

Also, where does debug info (i.e. DWARF for -g) get stored?  I'm not  
talking about people debugging the compiler, I'm talking about people  
who want to build an executable with debug info.

> * Transparency is achieved through tight integration with the linker.
> Ideally, the linker communicates with LTO via a shared library
> (plugin), eliminating any dependencies between the source bases of
> linker and LTO, but other callback methods are also possible.

Excellent.  Is the FSF/RMS ok with the linker having plugins?  It  
seems strange (but great!) to allow plugins for the linker but not the  
compiler.

> === Why in the Linker? ===
> As of this writing, the pre-ld driver collect2 performs the LTO file
> identification. However, this is sub-optimal. The benefits of driving
> LTO from the linker are:
>
> * The linker performs full symbol resolution. Therefore, it will only
> bring in objects that are necessary.  This can greatly reduce build
> and library extraction times.

Yes, as we discussed before, this is tantamount to re-implementing  
system linkers and all their associated craziness.  This is a much  
better approach.

> The linker performs regular symbol resolution. For each object file it
> touches, it calls a specific function in the plugin (int
> ldplugin_claim_file(const char *fname, size_t offset)). This
> function returns 1 if it intends to claim a file (e.g. it contains
> IR), and 0 if it doesn't.   The offset is used in the case of an
> archive file. This way the plugin doesn't need to understand archives.

Is there a specific reason you don't use the LLVM LTO interface?  It  
seems to be roughly the same as your proposed interface:

a) it has a simple C interface like your proposed one
b) it is already implemented in one system linker (Apple's), so GCC  
would just provide its own linker plugin and it would work on apple  
platforms
c) it is richer than your interface
d) it is battle tested, and exists today
e) it is completely independent of llvm (by design)
f) it is fully documented: http://llvm.org/docs/LinkTimeOptimization.html

Is there something specific you don't like about the LLVM interface?

-Chris


* Re: [whopr] Design/implementation alternatives for the driver and WPA
@ 2008-06-04  7:28   ` Rafael Espindola
From: Rafael Espindola @ 2008-06-04  7:28 UTC (permalink / raw)
  To: Chris Lattner
  Cc: Diego Novillo, gcc, Kenneth Zadeck, Jan Hubicka, Ollie Wild,
	Robert Hundt

> Is there a specific reason you don't use the LLVM LTO interface?  It seems
> to be roughly the same as your proposed interface:
>
> a) it has a simple C interface like your proposed one
> b) it is already implemented in one system linker (Apple's), so GCC would
> just provide its own linker plugin and it would work on apple platforms
> c) it is richer than your interface
> d) it is battle tested, and exists today
> e) it is completely independent of llvm (by design)
> f) it is fully documented: http://llvm.org/docs/LinkTimeOptimization.html
>
> Is there something specific you don't like about the LLVM interface?

We are still discussing how we are going to implement it, so the API is
still not final. Some things that have been pointed out:

*) Plugins could have other uses and the naming used on the LLVM LTO
interface is LTO specific.
*) We have a normal symbol table on the .o files. It is not clear if
we should assume that it will always be the case. If so, we don't need
the API part that handles that.
*) How do you handle the case of multiple symbols with the same name
(say a weak and a strong one)? lto_codegen_add_must_preserve_symbol
has a char* argument. How does it know which symbol we are talking
about?
*) To save memory, one option is to have the plugin exec WPA and WPA
exec the linker again with the new objects. In this case the api
should be a bit different.

> -Chris
>

Cheers,
-- 
Rafael Avila de Espindola


* Re: [whopr] Design/implementation alternatives for the driver and WPA
@ 2008-06-04 13:00   ` Diego Novillo
From: Diego Novillo @ 2008-06-04 13:00 UTC (permalink / raw)
  To: Chris Lattner
  Cc: gcc, Kenneth Zadeck, Jan Hubicka, Rafael Espindola, Ollie Wild,
	Robert Hundt

On Tue, Jun 3, 2008 at 22:26, Chris Lattner <clattner@apple.com> wrote:

> and whopr here.  Is LTO the mode "normal people" will use, and whopr is the
> mode where "people with huge clusters" will use?  Will LTO/whopr support
> useful optimization on common multicore machines?

As Ollie said, WHOPR is just an extension of the LTO framework to
cater for scalability when building large applications.  As such, when
building large applications we expect not to be able to apply IPA
passes that rely on having the whole program callgraph and bodies
loaded in memory.

However, WHOPR does not limit IPA passes to summary-only.  That's why
you see the distinction between IPA_PASS and SIMPLE_IPA_PASS in the
pass manager.

> Are you focusing on inlining here as a specific example, or is this the only
> planned IPA optimization that can use summaries?  It seems unfortunate to

No.  Just the first pass that we are going to concentrate on for the
initial implementation.

>> == Design Philosophy ==
>> * The implementation provides complete transparency. Developers
>> should be able to take advantage of LTO without having to modify
>> existing build systems and/or Makefiles, all that's needed is to add
>> an LTO option (-flto).
>
> Ok.  How do you handle merging of optimization info?   If I build one .o
> file with -Os and one with -O3 who wins or what does this mean?  If I build
> one with -ffast-math and one without, does the right thing happen?

Right now, mixed optimization flags will likely cause trouble.  We
have not really talked about this issue in detail.  I expect many/most
of these issues will be orthogonal to the driver, though.  We've
talked a bit about different ways of encoding the options into the IR,
but there is nothing concrete yet.  It's in my list of things to
discuss at the next BoF.

> Also, where does debug info (i.e. DWARF for -g) get stored?  I'm not talking
> about people debugging the compiler, I'm talking about people who want to
> build an executable with debug info.

In the .o file.  We are generating regular .o files (for now).

> Is there a specific reason you don't use the LLVM LTO interface?  It seems
> to be roughly the same as your proposed interface:

Not really.  This is mostly the first iteration.  Rafael and Robert
will be able to tell you much more about this.  I'm not directly
working on this aspect.

Diego.


* Re: [whopr] Design/implementation alternatives for the driver and WPA
@ 2008-06-04 14:28   ` Ian Lance Taylor
From: Ian Lance Taylor @ 2008-06-04 14:28 UTC (permalink / raw)
  To: Chris Lattner
  Cc: Diego Novillo, gcc, Kenneth Zadeck, Jan Hubicka,
	Rafael Espindola, Ollie Wild, Robert Hundt

Chris Lattner <clattner@apple.com> writes:

> Is there a specific reason you don't use the LLVM LTO interface?  It
> seems to be roughly the same as your proposed interface:
>
> a) it has a simple C interface like your proposed one
> b) it is already implemented in one system linker (Apple's), so GCC
> would just provide its own linker plugin and it would work on apple
> platforms
> c) it is richer than your interface
> d) it is battle tested, and exists today
> e) it is completely independent of llvm (by design)
> f) it is fully documented: http://llvm.org/docs/LinkTimeOptimization.html
>
> Is there something specific you don't like about the LLVM interface?

(I didn't design the proposed linker interface, and I'm not sure my
earlier comments were included in the proposal sent to the list.  I'm
going to reply to that next.)

When I look at the LLVM interface as described on that web page, I see
these issues, all fixable:

* No support for symbol versioning.
* The return value of lto_module_get_symbol_attributes is not
  defined.
* lto_codegen_set_debug_model and lto_codegen_set_pic_model appear to
  be underspecified--don't they need an additional parameter?
* Interfaces like lto_module_get_symbol_name and
  lto_codegen_add_must_preserve_symbol are inefficient when dealing
  with large symbol tables.

A more general problem is that the main reason I see to use a linker
plugin is to let the linker handle symbol resolution.  The LLVM
interface does not do that.  Suppose the linker is invoked on a
sequence of object files, some with LTO information, some
without, all interspersed.  Suppose some symbols are defined in
multiple .o files, through the use of common symbols, weak symbols,
and/or section groups.  The LLVM interface simply passes each object
file to the plugin.  The result is that the plugin is required to do
symbol resolution itself.  This 1) loses one of the benefits of having
the linker around; 2) will yield incorrect results when some non-LTO
object is linked in between LTO objects but redefines some earlier
weak symbol.

Also, returning a single object file restricts the possibilities.  The
design of WHOPR, as I understand it, permits creating several
different object files in parallel based on a fast analysis of which
code should be compiled together.  When the linker supports concurrent
linking, it will be desirable to be able to provide it with each
object file as it is completed.

Ian


* Re: [whopr] Design/implementation alternatives for the driver and WPA
@ 2008-06-04 14:45 ` Ian Lance Taylor
From: Ian Lance Taylor @ 2008-06-04 14:45 UTC (permalink / raw)
  To: Diego Novillo
  Cc: gcc, Kenneth Zadeck, Jan Hubicka, Rafael Espindola, Ollie Wild,
	Robert Hundt

Diego Novillo <dnovillo@google.com> writes:

I have a feeling that the comments I wrote within Google about the
linker interface were lost.  I am going to try to recreate them here.

> The linker, upon start, examines a configuration file at a known
> location relative to its own location. If this file exists, it
> extracts the location of linker plugins (shared libraries) and loads
> those.  A fixed set of function interfaces needs to be implemented in
> the plugin, these functions are described below. One of many possible
> plugins is a plugin that controls LTO.
>
> Another way to locate a plugin would be via command-line.  This would
> make it easier for two different compilers (and therefore two
> different plugins) to use the same linker.

I think the plugin should always be specified on the command line, and
the linker should never search for it.  The plugin is inherently a
property of the compiler, not the linker.  We already expect that the
linker will always be invoked via the gcc driver program.  It is
trivial for the driver program to pass an option specifying the plugin
or the plugin directory.

> The linker performs regular symbol resolution. For each object file it
> touches, it calls a specific function in the plugin (int
> ldplugin_claim_file(const char *fname, size_t offset)). This
> function returns 1 if it intends to claim a file (e.g. it contains
> IR), and 0 if it doesn't.   The offset is used in the case of an
> archive file. This way the plugin doesn't need to understand archives.

There should be an interface to pass a pointer to the contents of the
file rather than the filename.  Otherwise each file has to be opened
twice, which is pointless.
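Below is a rough C sketch of the buffer-based alternative. It assumes a hypothetical hook name (ldplugin_claim_buffer) and a hypothetical section-name marker (.gnu.lto_). The linker passes bytes it already holds in memory, including just the member's bytes for an archive member, so the plugin never reopens the file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical marker identifying an object that carries IR; the real
   section naming was still undecided at this point. */
static const char lto_marker[] = ".gnu.lto_";

/* Hypothetical claim hook that receives the file contents the linker has
   already read or mapped.  Returns 1 to claim the file, 0 otherwise.  The
   raw byte scan is a crude stand-in for walking the ELF section headers. */
static int ldplugin_claim_buffer(const unsigned char *data, size_t size)
{
    static const unsigned char elf_magic[4] = { 0x7f, 'E', 'L', 'F' };
    size_t mlen = sizeof lto_marker - 1;

    assert(data != NULL || size == 0);
    if (size < sizeof elf_magic
        || memcmp(data, elf_magic, sizeof elf_magic) != 0)
        return 0;                       /* not an ELF object at all */
    for (size_t i = 0; i + mlen <= size; i++)
        if (memcmp(data + i, lto_marker, mlen) == 0)
            return 1;                   /* looks like it carries IR */
    return 0;
}
```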

> The linker also creates a list of all externally referenced symbols
> and passes these to the plugin via the function
> ldplugin_add_external_symbol(const char *mangled_name).
>
> '''TODO''': Would it be better to pass an abstract object to
> ldplugin_add_external_symbol? What should we pass to it if there are
> two symbols in IL files with the same name?  One strong and one weak
> for example.

"Externally referenced" is a bad term.  I think what is meant here is
"referenced by some part of the program which the plugin did not
claim".

There needs to be a way to specify the symbol version.

The interface should not require a separate function call for each
symbol.  This is inefficient.  Some executables have hundreds of
thousands of symbols.  There should be a way to pass a list of
symbols.
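Here is a sketch of a batched, version-aware call. It assumes a hypothetical struct ldplugin_symbol and a hypothetical entry point ldplugin_add_symbols(). The point is that each call carries an array of symbols, rather than requiring one call per symbol.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-symbol record.  VERSION is NULL for an unversioned
   symbol; otherwise it names the version node (e.g. "VERS_1.0"). */
struct ldplugin_symbol {
    const char *name;
    const char *version;
    int is_weak;
};

/* Toy bookkeeping so the effect of the call is observable. */
static size_t calls_made, symbols_seen, versioned_seen;

/* Hypothetical batched entry point: the linker passes the whole array in
   one call instead of invoking the plugin once per symbol. */
static void ldplugin_add_symbols(const struct ldplugin_symbol *syms,
                                 size_t count)
{
    assert(syms != NULL || count == 0);
    calls_made++;
    for (size_t i = 0; i < count; i++) {
        symbols_seen++;
        if (syms[i].version != NULL)
            versioned_seen++;
    }
}
```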

More seriously, this interface is much too simple.  In the general
case, for each input file, we need to specify the exact disposition of
each symbol.  If we don't provide a way for the linker to communicate
that to the plugin, then the plugin is forced to do symbol resolution
itself.  That is what we want to get away from.

My assumption is that the symbol table in an LTO object is fully
correct: it correctly reports weak symbols, section groups, etc.
Given that, the linker should be determining the symbol resolution.
For each defined symbol in the symbol table, the linker should say
whether that symbol should be included in the link.  For each
undefined symbol, the linker should say where the definition of that
symbol may be found--it could be in an LTO file or a non-LTO file.
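One way to express that per-symbol disposition is sketched below in C, using hypothetical names (enum ld_disposition, struct ld_resolution). The linker fills in one entry per symbol of each claimed file. The plugin then emits code only for the definitions the linker told it to keep.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical dispositions the linker could report back to the plugin
   for each symbol in a claimed file, after doing resolution itself. */
enum ld_disposition {
    LD_DEF_KEEP,        /* defined here and this definition prevails */
    LD_DEF_DISCARD,     /* defined here but another definition prevails */
    LD_UNDEF_IN_LTO,    /* undefined here; resolved to an LTO file */
    LD_UNDEF_IN_NATIVE  /* undefined here; resolved to a non-LTO object */
};

struct ld_resolution {
    const char *name;
    enum ld_disposition disp;
};

/* Count the definitions the plugin must actually generate code for:
   only those the linker told it to keep. */
static size_t definitions_to_emit(const struct ld_resolution *res, size_t n)
{
    size_t count = 0;
    assert(res != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (res[i].disp == LD_DEF_KEEP)
            count++;
    return count;
}
```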

> At this point, the linker calls the main entry point to the plugin
> (ldplugin_main(int argc, char *argv[])), passing its own arguments.
> It's the plugin's responsibility to extract its related {{{-Wx,...}}}
> values.

This does not make sense.  The linker options are complex and varied.
We do not want to require the plugin to understand how to parse them.
We need to define a different approach for sending options to the
plugin.
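Here is one possible approach, sketched under the assumption of a hypothetical -plugin-opt= prefix that the driver adds. The linker forwards only the values of those options, so the plugin never has to parse the linker's own command line.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical prefix the driver would use to mark options meant for the
   plugin; the linker strips it and forwards only the value. */
#define PLUGIN_OPT_PREFIX "-plugin-opt="

/* Store pointers to the values of all plugin options in ARGV into OUT,
   which must have room for ARGC entries.  Returns how many were found. */
static size_t collect_plugin_opts(int argc, const char *const *argv,
                                  const char **out)
{
    size_t plen = strlen(PLUGIN_OPT_PREFIX);
    size_t n = 0;
    assert(argc >= 0);
    for (int i = 0; i < argc; i++)
        if (strncmp(argv[i], PLUGIN_OPT_PREFIX, plen) == 0)
            out[n++] = argv[i] + plen;
    return n;
}
```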

> '''TODO''': How do we handle symbols defined in more than one file?
> Should ldplugin_add_external_symbol take an abstract pointer/index into
> the linker symbol table?

Yes, this is required.
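The small C example below shows why a handle is needed. It assumes the handle is simply an index into the linker's own table; struct ld_sym and both lookup functions are hypothetical. A lookup by name alone can never reach the second "foo".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical linker symbol table entry. */
struct ld_sym {
    const char *name;
    const char *file;
    int is_weak;
};

/* The handle given to the plugin is just the entry's index, so two
   entries with the same name stay distinguishable. */
typedef size_t ld_symbol_handle;

static const struct ld_sym *ld_lookup_handle(const struct ld_sym *table,
                                            size_t n, ld_symbol_handle h)
{
    assert(h < n);
    return &table[h];
}

/* Name-only lookup, for contrast: it can only ever return the first
   entry with that name (or N if there is none). */
static size_t ld_lookup_name(const struct ld_sym *table, size_t n,
                             const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(table[i].name, name) == 0)
            return i;
    return n;
}
```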

> '''TODO''': What is passed to ldplugin_claim_file if the file is in a
> .a file?

We should pass a buffer, not a file name.

> '''TODO''': Are we assuming that the files with IL contain a
> normal symbol table? Should we make it possible for the plugin to call
> back into the linker to add symbols? This should make it possible to
> support a "full custom" file format for the IL files.

If the LTO files do not contain a normal symbol table, then the plugin
will have to provide one for the linker.  The symbol table provided by
the plugin will have to include symbol names and versions, weak
vs. strong, defined vs. common vs. undefined, symbol visibility,
symbol type, section group information.

> == Final Link - ld ==
> After all real object files have been generated, these files, along
> with the rest of the originally passed real object files, need to be
> passed to the linker. There are a few ways to do this:
>
>  * Call a plugin / linker interface which allows the plugin to
>  explicitly add files to the linker's internal data structures.
>  '''TODO''': Unclear about the consequences for linker file/code
>  generation.
>
>  * Restart the linker with a new command line, where all the original
>  real objects and the newly generated objects are passed in. This can
>  cause subtle problems in symbol resolution. These problems are always
>  there unless a 1x1 mapping from pre- to post-IPA object files exists.
>
>  * WPA could call the linker itself, since it has all the proper
>  command line options. The plugin could also do it, but only with
>  difficulty, as WPA decides on the actual number and names of the
>  final real .o files. The plugin could just pick up any object files
>  it finds in the tmp directories, but this may cause problems, for
>  example when something goes wrong or when debugging.
>
>  * What about adding individual symbols via an API call? The linker
>  will still be running during WPA. The plugin can collect the symbols
>  and pass them back to the linker. With this it shouldn't be necessary
>  to restart the linker. Final strategy to be determined.

Of these options, my preferences would be the first one or the last
one.  The linker must already do all the symbol resolution before
invoking the plugin.  We shouldn't go through that again.  Instead,
the plugin should pass the resulting object files back to the linker,
in one form or another.  The linker should not need to do any further
symbol resolution at that point.  The linker should have told the
plugin precisely which symbols it needs to define, and the plugin
should define precisely those symbols.
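Here is a sketch of the callback approach. It assumes a hypothetical ld_add_input_file_fn callback supplied by the linker. The plugin hands back each generated object as soon as it is finished, which also fits the concurrent-linking case raised earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback the linker gives the plugin.  Because the linker
   has already done symbol resolution, it only needs to lay out these
   files, not resolve symbols again. */
typedef int (*ld_add_input_file_fn)(const char *path);

#define MAX_INPUTS 16
static const char *added_inputs[MAX_INPUTS];
static size_t n_added;

/* Toy linker-side implementation: remember the path; 0 means success. */
static int linker_add_input_file(const char *path)
{
    if (n_added == MAX_INPUTS)
        return -1;
    added_inputs[n_added++] = path;
    return 0;
}

/* Toy plugin-side loop: hand back every generated object one at a time
   instead of merging them into a single output file. */
static int plugin_return_objects(ld_add_input_file_fn add,
                                 const char *const *objs, size_t n)
{
    assert(add != NULL);
    for (size_t i = 0; i < n; i++)
        if (add(objs[i]) != 0)
            return -1;
    return 0;
}
```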

Ian

* Re: [whopr] Design/implementation alternatives for the driver and WPA
  2008-06-04 14:45 ` Ian Lance Taylor
@ 2008-06-04 14:48   ` Diego Novillo
  2008-06-04 15:28   ` Rafael Espindola
  1 sibling, 0 replies; 61+ messages in thread
From: Diego Novillo @ 2008-06-04 14:48 UTC (permalink / raw)
  To: Ian Lance Taylor
  Cc: gcc, Kenneth Zadeck, Jan Hubicka, Rafael Espindola, Ollie Wild,
	Robert Hundt

On Wed, Jun 4, 2008 at 10:44, Ian Lance Taylor <iant@google.com> wrote:

> I have a feeling that the comments I wrote within Google about the
> linker interface were lost.  I am going to try to recreate them here.

Sorry.  I should've been more careful when I transcribed it over.


Diego.

* Re: [whopr] Design/implementation alternatives for the driver and WPA
  2008-06-04 14:45 ` Ian Lance Taylor
  2008-06-04 14:48   ` Diego Novillo
@ 2008-06-04 15:28   ` Rafael Espindola
  1 sibling, 0 replies; 61+ messages in thread
From: Rafael Espindola @ 2008-06-04 15:28 UTC (permalink / raw)
  To: Ian Lance Taylor
  Cc: Diego Novillo, gcc, Kenneth Zadeck, Jan Hubicka, Ollie Wild,
	Robert Hundt

2008/6/4 Ian Lance Taylor <iant@google.com>:
> Diego Novillo <dnovillo@google.com> writes:
>
> I have a feeling that the comments I wrote within Google about the
> linker interface were lost.  I am going to try to recreate them here.

I have added them to the gcc wiki.

I have also removed some of the TODOs that are now obsolete (passing
all of the linker options to the plugin, passing only the symbol name).

I created an abstract type ldplugin_symbol_t. We need to define some
inline functions that the plugin can use to extract data from it.

Thanks a lot,
-- 
Rafael Avila de Espindola

Google Ireland Ltd.
Gordon House
Barrow Street
Dublin 4
Ireland

Registered in Dublin, Ireland
Registration Number: 368047

* Re: [whopr] Design/implementation alternatives for the driver and  WPA
  2008-06-04 13:00   ` Diego Novillo
@ 2008-06-04 15:28     ` Kenneth Zadeck
  2008-06-04 15:54       ` Ian Lance Taylor
  2008-06-04 16:15       ` Chris Lattner
  0 siblings, 2 replies; 61+ messages in thread
From: Kenneth Zadeck @ 2008-06-04 15:28 UTC (permalink / raw)
  To: Diego Novillo
  Cc: Chris Lattner, gcc, Jan Hubicka, Rafael Espindola, Ollie Wild,
	Robert Hundt

Diego Novillo wrote:
> On Tue, Jun 3, 2008 at 22:26, Chris Lattner <clattner@apple.com> wrote:
>
>   
>> and whopr here.  Is LTO the mode "normal people" will use, and whopr is the
>> mode where "people with huge clusters" will use?  Will LTO/whopr support
>> useful optimization on common multicore machines?
>>     
>
> As Ollie said, WHOPR is just an extension on the LTO framework to
> cater for scalability when building large applications.  As such, when
> building large applications we expect not to be able to apply IPA
> passes that rely on having the whole program callgraph and bodies
> loaded in memory.
>
> However, WHOPR does not limit IPA passes to summary-only.  That's why
> you see the distinction between IPA_PASS and SIMPLE_IPA_PASS in the
> pass manager.
>
>   
>> Are you focusing on inlining here as a specific example, or is this the only
>> planned IPA optimization that can use summaries?  It seems unfortunate to
>>     
>
> No.  Just the first pass that we are going to concentrate for the
> initial implementation.
>
>   

I think that one thing that the gcc community should understand is that 
to a great extent whopr is a google thing.   All of the documents are 
drafted by google people, in meetings that are only open to google 
people and it is only after these documents have been drafted do the 
people who are outside of google who are working on lto, like Honza and 
myself, see the documents and get to comment.  The gcc community never 
sees the constraints, deadlines, needs, or benchmarks that are 
motivating the decisions that are made in the whopr documents.

Honza and I plan, and are implementing, a system where most, but
probably not all, of the ipa passes will be able to work in an environment
where the entire call graph and all of the decls and types are 
available.  I.e. only the function bodies are missing.    In this 
environment, we plan to do all of the interprocedural analysis and 
generate work orders that will be applied to each function.  

In a distributed environment, these "work orders" can then be streamed 
out to the machines that are actually going to read the function bodies 
and compile them. 
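The C sketch below shows what such a streamed "work order" might contain. It assumes that inlining decisions are the only payload and uses a made-up one-line text format ("fn: callee1 callee2"). The names struct work_order and write_work_order() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical work order for one function: which callees to inline.
   The function body itself stays on disk until the worker reads it. */
struct work_order {
    const char *function;
    const char *const *inline_callees;
    size_t n_callees;
};

/* Serialize one order as "fn: callee1 callee2\n" into BUF.  Returns the
   number of characters written, or -1 if BUF is too small. */
static int write_work_order(const struct work_order *wo, char *buf,
                            size_t size)
{
    size_t used;
    int n;

    assert(wo != NULL && buf != NULL);
    n = snprintf(buf, size, "%s:", wo->function);
    if (n < 0 || (size_t)n >= size)
        return -1;
    used = (size_t)n;
    for (size_t i = 0; i < wo->n_callees; i++) {
        n = snprintf(buf + used, size - used, " %s", wo->inline_callees[i]);
        if (n < 0 || (size_t)n >= size - used)
            return -1;
        used += (size_t)n;
    }
    n = snprintf(buf + used, size - used, "\n");
    if (n < 0 || (size_t)n >= size - used)
        return -1;
    return (int)(used + (size_t)n);
}
```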

It is certainly not going to be possible to do this for all ipa passes, 
in particular any pass that requires the function body to be reanalyzed 
as part of the analysis pass will not be done, or will be degraded so 
that it does not use this mechanism.  But for a large number of passes 
this will work.

How this scales to google sized applications will have to be seen.  The 
point is that there is a rich space with a complex set of tradeoffs to be
explored with lto.   The decision to farm off the function bodies to 
other processors because we "cannot" have all of the function bodies in 
memory will have a dramatic effect on what gcc/lto/whopr compilation 
will be able to achieve.  We did not make this decision just because gcc 
is fat; we made it because we wanted to be able to compile larger
programs than could fit into memory even if we did go on a real diet.

However, in other lto systems like IBM's and (I believe) LLVM's, where the
link time compilation is done with everything in memory, you can do a 
lot more transformation because you can iterate and propagate 
information discovered from improvements in one function to another.  
IBM seems to sell 64-processor machines with up to 28 TB of memory.  I
do not know whether they can compile all of db2 at one time on this box;
the last time I talked to them, a year ago, they could not (or at least
did not) compile all of db2 at one time.  But they are able to do 
several rounds that consist of global analysis and local 
analysis/transformation.   This is certainly the way to squeeze out 
everything that static compilation has to offer.  However it is unlikely 
that many in the gcc community are going to have this kind of horsepower 
available (balrog is a toy compared to one of these monsters).

The bet (guess) that we are making in gcc is that doing weaker analysis 
over a larger context is going to win.  In the initial whopr 
proposal/implementation, this is taken to the extreme, to say that 
inlining is the only ipa transformation, but it is going to be applied to
the entire code base of some monster app.  The rest of the gcc community
may not see the need to go here, and in fact I would guess (an
uninformed guess from an outsider) that even google will not need this 
for all of their apps either.  In particular, as consumer machines get 
larger memories and more processors, the assumption that we cannot see 
all of the function bodies gets more questionable, especially for
modest sized apps that are the staple of the gcc community.

In particular, Google may be willing to compile the "entire" app, even 
sucking in the code from shared libraries if it provides any benefit.   
Users in the gcc community will most likely rarely go there, since it 
makes the process of doing updates almost impossible.  

There is also a rich set of choices that need to be made to support 
distributed compilation.  I think that it is dangerous to depend so 
heavily on the experience of distcc.  (On the other hand, I realize that
NFS is a real problem.)  For one, the size of the merged type system
and global decls is going to be quite large.  Cutting this down to 
produce a custom compilable file for each of the processors is not going 
to be without cost either.  I fear that for very large problems, the 
cost of doing this and the cherry picking is going to severely limit the 
scalability of lto/whopr.

Furthermore, as nodes move to having more cores, the cherry picking 
model begins to look bad because each of the machines could have copies 
of all of the hot inlinable functions in their file cache.  The bottom 
line here is that there is not a best solution here because the ground 
is shifting. On the other hand, NFS is still bad and is likely to remain
that way.

kenny

* Re: [whopr] Design/implementation alternatives for the driver and  WPA
  2008-06-04 15:28     ` Kenneth Zadeck
@ 2008-06-04 15:54       ` Ian Lance Taylor
  2008-06-04 16:50         ` Kenneth Zadeck
  2008-06-04 16:15       ` Chris Lattner
  1 sibling, 1 reply; 61+ messages in thread
From: Ian Lance Taylor @ 2008-06-04 15:54 UTC (permalink / raw)
  To: Kenneth Zadeck
  Cc: Diego Novillo, Chris Lattner, gcc, Jan Hubicka, Rafael Espindola,
	Ollie Wild, Robert Hundt

Kenneth Zadeck <zadeck@naturalbridge.com> writes:

> I think that one thing that the gcc community should understand is
> that to a great extent whopr is a google thing.   All of the documents
> are drafted by google people, in meetings that are only open to google
> people and it is only after these documents have been drafted do the
> people who are outside of google who are working on lto, like Honza
> and myself, see the documents and get to comment.  The gcc community
> never sees the constraints, deadlines, needs, or benchmarks that are
> motivating the decisions that are made in the whopr documents.

Every new gcc development starts that way.  Somebody has to put
together the initial proposal.  How many people were invited to work
on the initial LTO proposal before it was sent out?  Did anybody
outside of Red Hat see the tree-ssa proposal before it was sent out?

The WHOPR document has been out there for some time, and it was sent
out before any implementation work started.  There is no Google cabal
pushing it.  There is no secret information behind it, no constraints
or deadlines or benchmarks.  We did have the advantage of talking to
Google employees about their experience with LTO-style work done at
Intel and HP and Transmeta.  Some of the people we talked to have no
plans or interest in working on gcc, and it would not be fair to rope
them into the conversation further.  Google's needs are clear: we have
large programs.

Let's deal with these issues on the technical merits, not on
organizational issues.  If Google were dumping code on gcc, you would
have a legitimate complaint.  Here Google is proposing plans before
any work is started.  You seem to be complaining that the community
should have seen the plans at an earlier stage.  That makes no sense.
They are still just plans; they were based on all of two days of
meetings and discussions, and they are still completely open to
discussion and change.

> Honza and I plan, and are implementing, a system where most, but
> probably not all, of the ipa passes will be able to work in an environment
> where the entire call graph and all of the decls and types are
> available.  I.e. only the function bodies are missing.    In this
> environment, we plan to do all of the interprocedural analysis and
> generate work orders that will be applied to each function.  

I don't see that as being opposed to the WHOPR ideas.  It's not like
WHOPR will prohibit that approach.  It's a limiting case.

> In particular, as consumer
> machines get larger memories and more processors, the assumption that
> we cannot see all of the function bodies gets more questionable,
> especially for modest sized apps that are the staple of the gcc
> community.

I question that assumption, and I especially question any assumption
that gcc should only work for modest sized apps.

Ian

* Re: [whopr] Design/implementation alternatives for the driver and  WPA
  2008-06-04 15:28     ` Kenneth Zadeck
  2008-06-04 15:54       ` Ian Lance Taylor
@ 2008-06-04 16:15       ` Chris Lattner
       [not found]         ` <65dd6fd50806041223l1871ecfbh384aa175c3da0645@mail.gmail.com>
  1 sibling, 1 reply; 61+ messages in thread
From: Chris Lattner @ 2008-06-04 16:15 UTC (permalink / raw)
  To: Kenneth Zadeck
  Cc: Diego Novillo, gcc, Jan Hubicka, Rafael Espindola, Ollie Wild,
	Robert Hundt

On Jun 4, 2008, at 8:27 AM, Kenneth Zadeck wrote:

> It is certainly not going to be possible to do this for all ipa  
> passes, in particular any pass that requires the function body to be  
> reanalyzed as part of the analysis pass will not be done, or will be  
> degraded so that it does not use this mechanism.  But for a large  
> number of passes this will work.
>
> How this scales to google sized applications will have to be seen.   
> The point is that there is a rich space with a complex set of tradeoffs
> to be explored with lto.   The decision to farm off the function  
> bodies to other processors because we "cannot" have all of the  
> function bodies in memory will have a dramatic effect on what gcc/ 
> lto/whopr compilation will be able to achieve.

I agree with a lot of the sentiment that you express here Kenny.  In  
LLVM, we've intentionally taken a very incremental approach:

1) start with all code in memory and see how far you can get.  It  
seems that on reasonable developer machines (e.g. 2GB memory) that we  
can handle C programs on the order of a million lines of code, or C++  
code on the order of 400K lines of code without a problem with LLVM.

2) start leaving function bodies on disk, use lazy access, and a
cache manager to keep things in memory when needed.  I think this will
then let us scale to code bases of tens or hundreds of millions of lines.  I
see no reason to take a whopr approach just to be able to handle large  
programs.
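Here is a minimal sketch of the lazy-loading idea in step 2. It assumes that function bodies are identified by integer ids and uses a plain least-recently-used eviction policy. struct body_cache and cache_get() are hypothetical and are not LLVM's actual implementation. Disk reads are only counted, not performed.

```c
#include <assert.h>
#include <stddef.h>

/* At most CACHE_SLOTS bodies are resident at once. */
#define CACHE_SLOTS 2

struct body_cache {
    int ids[CACHE_SLOTS];                /* resident body ids, -1 if empty */
    unsigned long last_use[CACHE_SLOTS]; /* 0 means never used */
    unsigned long clock;
    unsigned long loads;                 /* simulated trips to disk */
};

static void cache_init(struct body_cache *c)
{
    for (int i = 0; i < CACHE_SLOTS; i++) {
        c->ids[i] = -1;
        c->last_use[i] = 0;
    }
    c->clock = 0;
    c->loads = 0;
}

/* Make body ID resident and return its slot. */
static int cache_get(struct body_cache *c, int id)
{
    int victim = 0;
    assert(id >= 0);
    c->clock++;
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (c->ids[i] == id) {           /* hit: just refresh its age */
            c->last_use[i] = c->clock;
            return i;
        }
        if (c->last_use[i] < c->last_use[victim])
            victim = i;
    }
    /* Miss: empty slots have last_use 0, so they are filled first;
       otherwise the least recently used body is evicted. */
    c->ids[victim] = id;
    c->last_use[victim] = c->clock;
    c->loads++;
    return victim;
}
```

With two slots, the access sequence 1, 2, 1, 3, 1, 2 goes to disk four times. Body 3 evicts body 2 (the least recently used), and the final access to body 2 evicts body 3.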

Independent of program size is the efficiency of LTO.  To me, allowing  
lto to scale and work well on 2 to 16 way shared memory machine is the  
first interesting order of business, just because that is what many  
developers have on their desks.  Once that issue is nailed, going
across a cluster is an interesting next step.

In the world I deal with, most code is built out of a large number of  
moderate sized libraries/plugins, not as a gigantic monolithic a.out  
file.  I admit that this shifts the emphasis we've been placing onto
making integration transparent, supporting LTO across code
bases with pieces missing, etc., and away from supporting ridiculously
huge code bases.

I guess one difference between the LLVM and GCC approaches stems from  
the "constant factor" order of magnitude of efficiency difference  
between llvm and gcc.  If you can't reasonably hold a few hundred
thousand lines of code in memory then you need more advanced  
techniques in order to be generally usable for moderate-sized code  
bases.

-Chris

* Re: [whopr] Design/implementation alternatives for the driver and WPA
  2008-06-04 14:28   ` Ian Lance Taylor
@ 2008-06-04 16:29     ` Chris Lattner
  2008-06-04 16:41       ` Chris Lattner
                         ` (2 more replies)
  0 siblings, 3 replies; 61+ messages in thread
From: Chris Lattner @ 2008-06-04 16:29 UTC (permalink / raw)
  To: Ian Lance Taylor
  Cc: Diego Novillo, GCC Mailing List, Kenneth Zadeck, Jan Hubicka,
	Rafael Espindola, Ollie Wild, Robert Hundt, Nick Kledzik,
	Devang Patel

On Jun 4, 2008, at 7:22 AM, Ian Lance Taylor wrote:
> Chris Lattner <clattner@apple.com> writes:
>> Is there a specific reason you don't use the LLVM LTO interface?  It
>> seems to be roughly the same as your proposed interface:
>>
>> a) it has a simple C interface like your proposed one
>> b) it is already implemented in one system linker (Apple's), so GCC
>> would just provide its own linker plugin and it would work on apple
>> platforms
>> c) it is richer than your interface
>> d) it is battle tested, and exists today
>> e) it is completely independent of llvm (by design)
>> f) it is fully documented: http://llvm.org/docs/LinkTimeOptimization.html
>>
>> Is there something specific you don't like about the LLVM interface?
>
> (I didn't design the proposed linker interface, and I'm not sure my
> earlier comments were included in the proposal sent to the list.  I'm
> going to reply to that next.)
>
> When I look at the LLVM interface as described on that web page, I see
> these issues, all fixable:
> * No support for symbol versioning.

Very true.  I think it would be great to work from a common model that
can be extended to support both compilers.  Having a unified interface  
would be very useful, and we are happy to evolve the interface to suit  
more general needs.

> * The return value of lto_module_get_symbol_attributes is not
>  defined.

Ah, sorry about that.  Most of the details are actually in the public  
header.  The result of this function is a 'lto_symbol_attributes'  
bitmask.  This should be more useful and revealing:
http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm-c/lto.h?revision=HEAD&view=markup

> * lto_codegen_set_debug_model and lto_codegen_set_pic_model appear to
>  be underspecified--don't they need an additional parameter?

These are actually likely to change.  We are currently working on  
extending the model to better handle the case when translation units
are compiled with different flags.  I expect this to subsume the debug  
and pic handling, which are pretty ad-hoc right now.  There should be  
a proposal going out to llvmdev in the next few days on this.

> * Interfaces like lto_module_get_symbol_name and
>  lto_codegen_add_must_preserve_symbol are inefficient when dealing
>  with large symbol tables.

The intended model is for the linker to query the LTO plugin for its  
symbol list and build up its own linker-specific hash table.  This way  
you don't need to force the linker to use the plugin's data structure  
or the plugin to use the linker data structure.  We converged on this  
approach after trying it the other way.
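Here is a C sketch of that query-and-copy model. Hypothetical accessor names (plugin_num_symbols, plugin_symbol_name) stand in for the real lto_module_* calls, and the linker-owned hash table is a toy. A real linker would record every same-named definition and resolve among them, whereas this table keeps one entry per name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical plugin-side module: the plugin keeps symbols however it
   likes and only exposes a count and an indexed accessor. */
struct plugin_module {
    const char *const *names;
    size_t count;
};

static size_t plugin_num_symbols(const struct plugin_module *m)
{
    return m->count;
}

static const char *plugin_symbol_name(const struct plugin_module *m, size_t i)
{
    assert(i < m->count);
    return m->names[i];
}

/* Linker-owned table: open addressing with linear probing. */
#define TABLE_SIZE 64
static const char *linker_table[TABLE_SIZE];

static size_t hash_name(const char *s)
{
    size_t h = 5381;                    /* djb2 string hash */
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h % TABLE_SIZE;
}

/* Insert NAME unless already present; returns 1 if it was inserted. */
static int table_insert(const char *name)
{
    for (size_t i = hash_name(name), probes = 0; probes < TABLE_SIZE;
         i = (i + 1) % TABLE_SIZE, probes++) {
        if (linker_table[i] == NULL) {
            linker_table[i] = name;
            return 1;
        }
        if (strcmp(linker_table[i], name) == 0)
            return 0;
    }
    return 0;                           /* table full */
}

/* Query the plugin once per module and copy its symbols into the table. */
static size_t import_module(const struct plugin_module *m)
{
    size_t added = 0;
    for (size_t i = 0; i < plugin_num_symbols(m); i++)
        added += (size_t)table_insert(plugin_symbol_name(m, i));
    return added;
}
```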

Does this make sense, do you have a better idea?

> A more general problem is that the main reason I see to use a linker
> plugin is to let the linker handle symbol resolution.

There is that, but also it lets the linker handle things like export  
maps, visibility, strange platform-specific options, etc.  As you  
know, linkers are very complex :)

> The LLVM
> interface does not do that.

Yes it does, the linker fully handles symbol resolution in our model.

> Suppose the linker is invoked on a
> sequence of object files, some with LTO information, some
> without, all interspersed.  Suppose some symbols are defined in
> multiple .o files, through the use of common symbols, weak symbols,
> and/or section groups.  The LLVM interface simply passes each object
> file to the plugin.

No, the native linker handles all the native .o files.

>  The result is that the plugin is required to do
> symbol resolution itself.  This 1) loses one of the benefits of having
> the linker around; 2) will yield incorrect results when some non-LTO
> object is linked in between LTO objects but redefines some earlier
> weak symbol.

In the LLVM LTO model, the plugin only needs to know about its .o  
files, and the linker uses this information to reason about symbol  
merging etc.  The Mac OS X linker can even do dead code stripping  
across Mach-O .o files and LLVM .bc files.

Further other pieces of the toolchain (nm, ar, etc) also use the same  
interface so that they can return useful information about LLVM LTO  
files.

> Also, returning a single object file restricts the possibilities.  The
> design of WHOPR, as I understand it, permits creating several
> different object files in parallel based on a fast analysis of which
> code should be compiled together.  When the linker supports concurrent
> linking, it will be desirable to be able to provide it with each
> object file as it is completed.

This sounds like a natural and easy extension once whopr gets farther  
along.

This is our second major revision of the LTO interfaces, and the  
interface continues to slowly evolve.  I think it would be great to  
work with you guys to extend the design to support GCC's needs.

-Chris

* Re: [whopr] Design/implementation alternatives for the driver and  WPA
  2008-06-03 16:46 [whopr] Design/implementation alternatives for the driver and WPA Diego Novillo
  2008-06-04  2:27 ` Chris Lattner
  2008-06-04 14:45 ` Ian Lance Taylor
@ 2008-06-04 16:31 ` Mark Mitchell
  2008-07-04  3:31 ` Cary Coutant
  3 siblings, 0 replies; 61+ messages in thread
From: Mark Mitchell @ 2008-06-04 16:31 UTC (permalink / raw)
  To: Diego Novillo
  Cc: gcc, Kenneth Zadeck, Jan Hubicka, Rafael Espindola, Ollie Wild,
	Robert Hundt

Diego Novillo wrote:
> We've started working on the driver and WPA components for whopr.
> These are some of our initial thoughts and implementation strategy.  I
> have linked these to the WHOPR page as well.  I'm hoping we can
> discuss these at the Summit BoF, so I'm posting them now to start the
> discussion.

> == Repackaging ==
> Under this proposal, WPA repackages its input files. 

FWIW, I'd suggest going this way.  I agree that this is probably the way 
to go in the long term, and avoiding the throw-away stage seems beneficial.

-- 
Mark Mitchell
CodeSourcery
mark@codesourcery.com
(650) 331-3385 x713

* Re: [whopr] Design/implementation alternatives for the driver and WPA
  2008-06-04  7:28   ` Rafael Espindola
@ 2008-06-04 16:34     ` Chris Lattner
  2008-06-04 16:48       ` Rafael Espindola
  0 siblings, 1 reply; 61+ messages in thread
From: Chris Lattner @ 2008-06-04 16:34 UTC (permalink / raw)
  To: Rafael Espindola
  Cc: Diego Novillo, gcc, Kenneth Zadeck, Jan Hubicka, Ollie Wild,
	Robert Hundt

On Jun 4, 2008, at 12:27 AM, Rafael Espindola wrote:

>> Is there a specific reason you don't use the LLVM LTO interface?   
>> It seems
>> to be roughly the same as your proposed interface:
>>
>> a) it has a simple C interface like your proposed one
>> b) it is already implemented in one system linker (Apple's), so GCC  
>> would
>> just provide its own linker plugin and it would work on apple  
>> platforms
>> c) it is richer than your interface
>> d) it is battle tested, and exists today
>> e) it is completely independent of llvm (by design)
>> f) it is fully documented: http://llvm.org/docs/LinkTimeOptimization.html
>>
>> Is there something specific you don't like about the LLVM interface?
>
>
> We are still discussing how we are going to implement, so the API is
> still not final. Some things that have been pointed out:

Hey Rafael!

> *) Plugins could have other uses and the naming used on the LLVM LTO
> interface is LTO specific.

The LLVM interface uses an lto_ prefix.  This interface is used by
nm/ar/etc. as well as the linker.  Is there something specific about
lto_ that is bad?
http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm-c/lto.h?revision=HEAD&view=markup

> *) We have a normal symbol table on the .o files. It is not clear if
> we should assume that it will always be the case. If so, we don't need
> the API part that handles that.

This seems like a pretty minor point, but it would be easy to either:

1) make this an optional interface
2) make the plugin implement the symtab interfaces, but query the ELF  
symbol table instead of the LTO symbol table if possible.

> *) How do you handle the case of multiple symbols with the same name
> (say a weak and a strong one)? lto_codegen_add_must_preserve_symbol
> has a char* argument. How does it know which symbol we are talking
> about?

The lto_symbol_attributes enum specifies linkage.
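The sketch below shows how a linkage field in such an attributes bitmask can tell same-named symbols apart. The bit values are made up for illustration and are not the ones in llvm-c/lto.h.

```c
#include <assert.h>

/* Hypothetical attribute bitmask: the low bits hold the definition kind,
   higher bits hold other properties such as visibility. */
enum sym_attr {
    SYM_DEF_MASK      = 0x7,
    SYM_DEF_REGULAR   = 0x1,
    SYM_DEF_WEAK      = 0x2,
    SYM_DEF_TENTATIVE = 0x3,   /* common symbol */
    SYM_DEF_UNDEFINED = 0x4,
    SYM_SCOPE_HIDDEN  = 0x8
};

/* Returns 1 if a symbol with attributes A clearly prevails over a
   same-named symbol with attributes B, judging only by definition kind.
   Returns 0 otherwise, including when link order or other rules would
   have to decide. */
static int prevails_over(unsigned a, unsigned b)
{
    unsigned da = a & SYM_DEF_MASK, db = b & SYM_DEF_MASK;
    assert(da != 0 && db != 0);
    if (da == SYM_DEF_UNDEFINED)
        return 0;
    if (db == SYM_DEF_UNDEFINED)
        return 1;
    return da == SYM_DEF_REGULAR && db != SYM_DEF_REGULAR;
}
```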

> *) To save memory, one option is to have the plugin exec WPA and WPA
> exec the linker again with the new objects. In this case the api
> should be a bit different.

That's an interesting idea, but it is very unclear to me whether it  
would save a significant amount of memory.  Operating system VM  
systems are pretty good at paging out data that isn't used (e.g.  
the .o files the linker loaded into memory that exist when WPA is  
going on).

-Chris

* Re: [whopr] Design/implementation alternatives for the driver and WPA
  2008-06-04 16:29     ` Chris Lattner
@ 2008-06-04 16:41       ` Chris Lattner
  2008-06-04 18:48       ` Devang Patel
  2008-06-04 19:45       ` Ian Lance Taylor
  2 siblings, 0 replies; 61+ messages in thread
From: Chris Lattner @ 2008-06-04 16:41 UTC (permalink / raw)
  To: Chris Lattner
  Cc: Ian Lance Taylor, Diego Novillo, GCC Mailing List,
	Kenneth Zadeck, Jan Hubicka, Rafael Espindola, Ollie Wild,
	Robert Hundt, Nick Kledzik, Devang Patel


On Jun 4, 2008, at 9:29 AM, Chris Lattner wrote:

>> Suppose the linker is invoked on a
>> sequence of object files, some with LTO information, some
>> without, all interspersed.  Suppose some symbols are defined in
>> multiple .o files, through the use of common symbols, weak symbols,
>> and/or section groups.  The LLVM interface simply passes each object
>> file to the plugin.
>
> No, the native linker handles all the native .o files.

Incidentally, this is very easy to verify, as you can download this  
today and try it out.  LTO works fine in the Xcode 3.1 beta, which is  
available off developer.apple.com, including when you mix and match  
LLVM-compiled LTO .o files with GCC compiled ones.

For example, this works fine and does LTO across a.c/b.cpp/c.m:

llvm-gcc a.c   -O4 -o a.o -c
llvm-g++ b.cpp -O4 -o b.o -c
llvm-gcc c.m   -O4 -o c.o -c
gcc d.m        -O3 -o d.o -c
g++ a.o b.o c.o d.o -o a.out

-Chris

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [whopr] Design/implementation alternatives for the driver and WPA
  2008-06-04 16:34     ` Chris Lattner
@ 2008-06-04 16:48       ` Rafael Espindola
  0 siblings, 0 replies; 61+ messages in thread
From: Rafael Espindola @ 2008-06-04 16:48 UTC (permalink / raw)
  To: Chris Lattner
  Cc: Diego Novillo, gcc, Kenneth Zadeck, Jan Hubicka, Ollie Wild,
	Robert Hundt

> Hey Rafael!
Hello!

>> *) Plugins could have other uses and the naming used on the LLVM LTO
>> interface is LTO specific.
>
> The LLVM interface uses an lto_ prefix.  This interface is used by nm/ar/etc
> as well as the linker.  Is there something specific about lto_ that is bad?
> http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm-c/lto.h?revision=HEAD&view=markup

This is a minor comment. All the design is done based on what LTO
needs, but since it could be possible to use plugins for other things
the proposed API uses a generic prefix.

>> *) We have a normal symbol table on the .o files. It is not clear if
>> we should assume that it will always be the case. If so, we don't need
>> the API part that handles that.
>
> This seems like a pretty minor point, but it would be easy to either:
>
> 1) make this an optional interface
> 2) make the plugin implement the symtab interfaces, but query the ELF symbol
> table instead of the LTO symbol table if possible.

Sure. There is just the issue of the many function calls. Not sure how
expensive this is. Maybe have the plugin construct a symbol table with
everything that is on the file?

>> *) How do you handle the case of multiple symbols with the same name
>> (say a weak and a strong one)? lto_codegen_add_must_preserve_symbol
>> has a char* argument. How does it know which symbol we are talking
>> about?
>
> The lto_symbol_attributes enum specifies linkage.

That allows the plugin to pass information to the linker. But if there
are two symbols named "foo", how can the linker instruct the plugin to
generate code for only one? The function that LLVM uses is

lto_codegen_add_must_preserve_symbol(lto_code_gen_t, const char*)

right? Maybe adding an lto_symbol_attributes argument would be enough,
but having an abstract object is probably better.
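A minimal sketch of the difference, assuming a hypothetical plugin-side table (none of these names are part of LLVM's lto.h): a name-only request cannot tell two definitions of "foo" apart, while an opaque handle (here just an index into the plugin's table) marks exactly one of them.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical linkage kinds (not the lto.h enum). */
typedef enum { SYM_STRONG, SYM_WEAK } sym_linkage;

/* One entry per definition, as the plugin sees it. */
typedef struct {
    const char *name;
    const char *file;      /* input file providing this definition */
    sym_linkage linkage;
    int preserve;          /* set when the linker asks to keep it */
} sym_entry;

/* Hypothetical opaque handle: an index into the plugin's table, so two
   entries named "foo" stay distinct even though their names match. */
typedef size_t sym_handle;

/* Name-based view: returns how many entries share NAME; anything
   above 1 means a name-only request is ambiguous. */
size_t count_by_name(const sym_entry *tab, size_t n, const char *name) {
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        if (strcmp(tab[i].name, name) == 0)
            hits++;
    return hits;
}

/* Handle-based request: marks exactly one definition. */
void add_must_preserve(sym_entry *tab, sym_handle h) {
    tab[h].preserve = 1;
}
```

With a weak "foo" from x.o at index 0 and a strong "foo" from y.o at index 1, count_by_name reports 2, and add_must_preserve(tab, 1) marks only the y.o definition.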

>> *) To save memory, one option is to have the plugin exec WPA and WPA
>> exec the linker again with the new objects. In this case the api
>> should be a bit different.
>
> That's an interesting idea, but it is very unclear to me whether it would
> save a significant amount of memory.  Operating system VM systems are pretty
> good at paging out data that isn't used (e.g. the .o files the linker loaded
> into memory that exist when WPA is going on).

Sure. Again, the document is at an early stage, and we listed most of
the options. Restarting the linker is not a very popular option, but
might be worth trying.

> -Chris
>


Cheers,
-- 
Rafael Avila de Espindola

Google Ireland Ltd.
Gordon House
Barrow Street
Dublin 4
Ireland

Registered in Dublin, Ireland
Registration Number: 368047

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [whopr] Design/implementation alternatives for the driver and   WPA
  2008-06-04 15:54       ` Ian Lance Taylor
@ 2008-06-04 16:50         ` Kenneth Zadeck
  2008-06-04 17:05           ` Diego Novillo
  2008-06-04 17:37           ` Ian Lance Taylor
  0 siblings, 2 replies; 61+ messages in thread
From: Kenneth Zadeck @ 2008-06-04 16:50 UTC (permalink / raw)
  To: Ian Lance Taylor
  Cc: Diego Novillo, Chris Lattner, gcc, Jan Hubicka, Rafael Espindola,
	Ollie Wild, Robert Hundt

Ian Lance Taylor wrote:
> Kenneth Zadeck <zadeck@naturalbridge.com> writes:
>
>   
>> I think that one thing that the gcc community should understand is
>> that to a great extent whopr is a google thing.   All of the documents
>> are drafted by google people, in meetings that are only open to google
>> people and it is only after these documents have been drafted do the
>> people who are outside of google who are working on lto, like Honza
>> and myself, see the documents and get to comment.  The gcc community
>> never sees the constraints, deadlines, needs, or benchmarks that are
>> motivating the decisions that are made in the whopr documents.
>>     
>
> Every new gcc development starts that way.  Somebody has to put
> together the initial proposal.  How many people were invited to work
> on the initial LTO proposal before it was sent out?  Did anybody
> outside of Red Hat see the tree-ssa proposal before it was sent out?
>
> The WHOPR document has been out there for some time, and it was sent
> out before any implementation work started.  There is no Google cabal
> pushing it.  There is no secret information behind it, no constraints
> or deadlines or benchmarks.  We did have the advantage of talking to
> Google employees about their experience with LTO-style work done at
> Intel and HP and Transmeta.  Some of the people we talked to have no
> plans or interest in working on gcc, and it would not be fair to rope
> them into the conversation further.  Google's needs are clear: we have
> large programs.
>
> Let's deal with these issues on the technical merits, not on
> organizational issues.  If Google were dumping code on gcc, you would
> have a legitimate complaint.  Here Google is proposing plans before
> any work is started.  You seem to be complaining that the community
> should have seen the plans at an earlier stage.  That makes no sense.
> They are still just plans, they were based on all of two days of
> meetings and discussions, and they are still completely open to
> discussion and change.
>
>
>   
Ian, I am not dumping on Google.  But there is a particular perspective
that you have which is driven by your legitimate need to handle very
large applications.  This perspective may not be shared by the rest of
the gcc community.  I was really only pointing that out.  In
particular, there are a lot of decisions that are being made in whopr to
support very large applications that come at the expense of
compiling modest and even large applications.  I do not necessarily
disagree with these decisions, but I think that it is very important to
get that out in front of everyone and let the community come to an
informed consensus.

>> Honza and I plan, and are implementing, a system where most, but
>> probably all of the ipa passes, will be able to work in an environment
>> where the entire call graph and all of the decls and types are
>> available.  I.e. only the function bodies are missing.    In this
>> environment, we plan to do all of the interprocedural analysis and
>> generate work orders that will be applied to each function.  
>>     
>
> I don't see that as being opposed to the WHOPR ideas.  It's not like
> WHOPR will prohibit that approach.  It's a limiting case.
>
>   

>> In particular, as consumer
>> machines get larger memories and more processors, the assumption that
>> we cannot see all of the functions bodies gets more questionable,
>> especially for modest sized apps that are the staple of the gcc
>> community.
>>     
>
> I question that assumption, and I especially question any assumption
> that gcc should only work for modest sized apps.
>
>   
Ian, there are tradeoffs here.  My point is that there are a lot of
things that can be done with modest-sized apps or libraries that cannot
be done on Google-sized applications.  Remember that the majority of the
world outside of Google neither has Google-sized applications nor could
compile them if it did.

While I agree that some form of lto needs to support monster apps, that
should not inhibit us from supporting transformations or models of
compilation that are only practical with 100K-line programs.


> Ian
>   

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [whopr] Design/implementation alternatives for the driver and WPA
  2008-06-04 16:50         ` Kenneth Zadeck
@ 2008-06-04 17:05           ` Diego Novillo
  2008-06-04 17:37           ` Ian Lance Taylor
  1 sibling, 0 replies; 61+ messages in thread
From: Diego Novillo @ 2008-06-04 17:05 UTC (permalink / raw)
  To: Kenneth Zadeck
  Cc: Ian Lance Taylor, Chris Lattner, gcc, Jan Hubicka,
	Rafael Espindola, Ollie Wild, Robert Hundt

On Wed, Jun 4, 2008 at 12:50, Kenneth Zadeck <zadeck@naturalbridge.com> wrote:

> While I agree that some form of lto needs to support monster apps, that
> should not inhibit us from supporting transformations or models of
> compilation that are only practical with 100k line programs.

Of course not.  That was never the intent.  Supporting small/medium
sized applications is inherent in the WHOPR design.  If we can't
handle that efficiently, then we have a design/implementation bug.

While we (Google) are mostly interested in summary-based IPA for very
large applications, we do not want the design to negate other uses of
LTO.  WHOPR is designed to support the whole spectrum, from
small/medium sized applications where the whole program fits in
memory, to extremely large applications where memory/computing
requirements are prohibitive for a single machine.

In practice, the full distributed model that WHOPR offers will not
need to be triggered for small applications, only very large ones.
Ideally, we should be able to hide all that behind 'gcc -flto' and let
the compiler decide how to operate.

The natural restriction is that passes of type SMALL_IPA will not be
able to run when the full distributed version is being used.  Again,
this is something I expect the compiler to be able to figure out for
itself.
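One way the compiler might make that decision itself, sketched with hypothetical names: only SMALL_IPA comes from the message above; the other pass kinds, the two modes, and the size-budget heuristic are assumptions for illustration.

```c
#include <stdbool.h>

/* SMALL_IPA is the pass type named in the thread; the rest of these
   names are hypothetical. */
typedef enum { PASS_LOCAL, PASS_SUMMARY_IPA, PASS_SMALL_IPA } pass_kind;
typedef enum { MODE_WHOLE_PROGRAM, MODE_DISTRIBUTED } lto_mode;

/* A SMALL_IPA pass needs all function bodies in memory at once, so it
   is skipped when the program is partitioned across machines. */
bool pass_can_run(pass_kind kind, lto_mode mode) {
    if (kind == PASS_SMALL_IPA)
        return mode == MODE_WHOLE_PROGRAM;
    return true;
}

/* Illustrative heuristic only: go distributed when the estimated
   in-memory size of the whole program exceeds a memory budget. */
lto_mode choose_mode(unsigned long est_bytes, unsigned long budget_bytes) {
    return est_bytes > budget_bytes ? MODE_DISTRIBUTED : MODE_WHOLE_PROGRAM;
}
```

How the size estimate and budget would actually be obtained is left open; the point is only that the gating can be mechanical once the mode is known.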

Diego.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [whopr] Design/implementation alternatives for the driver and WPA
  2008-06-04 16:50         ` Kenneth Zadeck
  2008-06-04 17:05           ` Diego Novillo
@ 2008-06-04 17:37           ` Ian Lance Taylor
  1 sibling, 0 replies; 61+ messages in thread
From: Ian Lance Taylor @ 2008-06-04 17:37 UTC (permalink / raw)
  To: Kenneth Zadeck
  Cc: Diego Novillo, Chris Lattner, gcc, Jan Hubicka, Rafael Espindola,
	Ollie Wild, Robert Hundt

Kenneth Zadeck <zadeck@naturalbridge.com> writes:

> In particular, there are a lot of decisions that are being made in
> whopr to support very large applications that come at the
> expense of compiling modest and even large applications.  I do not
> necessarily disagree with these decisions, but I think that it is very
> important to get that out in front of everyone and let the community
> come to an informed consensus.

If WHOPR does not work efficiently on small programs, then that is
clearly a problem.  But I don't see that in the design.

Ian

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [whopr] Design/implementation alternatives for the driver and WPA
  2008-06-04 16:29     ` Chris Lattner
  2008-06-04 16:41       ` Chris Lattner
@ 2008-06-04 18:48       ` Devang Patel
  2008-06-04 19:45       ` Ian Lance Taylor
  2 siblings, 0 replies; 61+ messages in thread
From: Devang Patel @ 2008-06-04 18:48 UTC (permalink / raw)
  To: Ian Lance Taylor
  Cc: Diego Novillo, GCC Mailing List, Kenneth Zadeck, Jan Hubicka,
	Rafael Espindola, Ollie Wild, Robert Hundt, Nick Kledzik,
	Chris Lattner

>> Also, returning a single object file restricts the possibilities.
>> The design of WHOPR, as I understand it, permits creating several
>> different object files in parallel based on a fast analysis of which
>> code should be compiled together.  When the linker supports
>> concurrent linking, it will be desirable to be able to provide it
>> with each object file as it is completed.

By definition, an intermodular optimizer (aka LTO) blurs object file
boundaries.  Typically, it will construct and walk a combined call graph
instead of dividing work based on input files.  There is little value
in preserving a direct one-to-one relationship between optimizer input
files and output files.  I agree, it makes sense to have an additional
interface to incrementally feed the linker optimized chunks of code to
take advantage of concurrent linking supported by the linker.
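A sketch of such an incremental interface, assuming a hypothetical callback that the linker registers with the optimizer (no such entry point exists in lto.h as described in this thread). Partitions are processed sequentially here for simplicity; a real implementation would hand objects over as parallel jobs finish.

```c
#include <stddef.h>

/* Hypothetical callback registered by the linker; the optimizer calls
   it once per finished output object so linking can start early. */
typedef void (*object_ready_fn)(void *linker_ctx, const char *path);

/* Hypothetical driver loop: optimize each partition (stubbed out here)
   and hand each resulting object to the linker as soon as it exists,
   instead of returning a single object at the end.  Returns the number
   of objects delivered. */
size_t optimize_partitions(const char *const *outputs, size_t n,
                           object_ready_fn ready, void *linker_ctx) {
    size_t delivered = 0;
    for (size_t i = 0; i < n; i++) {
        /* ... code generation for partition i would happen here ... */
        ready(linker_ctx, outputs[i]);
        delivered++;
    }
    return delivered;
}
```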

-
Devang

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Fwd: [whopr] Design/implementation alternatives for the driver and WPA
       [not found]   ` <65dd6fd50806032310u2bda0953qb911e3ccfe3f305e@mail.gmail.com>
@ 2008-06-04 19:29     ` Ollie Wild
  0 siblings, 0 replies; 61+ messages in thread
From: Ollie Wild @ 2008-06-04 19:29 UTC (permalink / raw)
  To: GCC Development

Reposting to the gcc list since my first email got bounced.

Ollie

On Tue, Jun 3, 2008 at 7:26 PM, Chris Lattner <clattner@apple.com> wrote:

> This is a very interesting design, and a very nice evolution from the previous proposal.  I'm not completely clear on the difference between LTO and whopr here.  Is LTO the mode "normal people" will use, and whopr is the mode where "people with huge clusters" will use?  Will LTO/whopr support useful optimization on common multicore machines?

WHOPR is just an extension of the original LTO proposal.  It seeks to
augment the LTO design by providing a mechanism for parallelizing the
final (link-time) optimization phase.  The design has been based on a
distcc-like distributed compilation model, so it should be beneficial
even to those with small to moderate sized clusters.  This doesn't
preclude parallelization on multi-core machines (and that has been
discussed to some degree), but I at least have treated that as a
secondary consideration.  A good example of this is in the WPA
discussion below.  On a multicore machine, repackaging doesn't make a
lot of sense because the compiler can efficiently cherry-pick function
bodies from different files.  However, in a distcc compiler farm, the
entirety of a file must be transferred, so this would result in a lot
of excess network overhead.

> Are you focusing on inlining here as a specific example, or is this the only planned IPA optimization that can use summaries?  It seems unfortunate to design a system where inlining is the only real IPO transformation you can do.  Does adding new interprocedural optimizations require adding whole new phases?

The WPA document is a cleaned up transcription of an internal document
I wrote.  During the transcription, some context got lost.  It's not
meant to be a description of a final implementation but rather a
pro/con comparison between two possible draft implementations.  The
goal is to get some basic infrastructure in place so that we can start
experimenting with it and better parallelize additional work.
Inlining is chosen as an initial feature because it's relatively easy
to implement and can be (coarsely) handled without support for
serializing IPA summary information.  Other IPA passes (e.g.
inter-procedural constant propagation) require additional
serialization capabilities (which Kenneth Zadeck is working on now).

Ollie

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Fwd: [whopr] Design/implementation alternatives for the driver and WPA
       [not found]         ` <65dd6fd50806041223l1871ecfbh384aa175c3da0645@mail.gmail.com>
@ 2008-06-04 19:30           ` Ollie Wild
       [not found]           ` <89069638-6D2B-4AE6-ACB3-99A2B09091BA@apple.com>
  2008-06-04 20:03           ` Kenneth Zadeck
  2 siblings, 0 replies; 61+ messages in thread
From: Ollie Wild @ 2008-06-04 19:30 UTC (permalink / raw)
  To: GCC Development

Reposting this as well.

Ollie

On Wed, Jun 4, 2008 at 9:14 AM, Chris Lattner <clattner@apple.com> wrote:
>
> 1) start with all code in memory and see how far you can get.  It seems that on reasonable developer machines (e.g. 2GB memory) that we can handle C programs on the order of a million lines of code, or C++ code on the order of 400K lines of code without a problem with LLVM.

This is essentially what the lto branch does today, and I don't see
any reason to disable this feature.  In the language of the WHOPR
design, the lto branch supports LGEN + LTRANS, with WPA bypassed
completely.  For implementing WPA, my intention is to add a new flag
(-fpartition or whatever else people think is suitable) to instruct
the lto1 front end to perform partitioning (aka repackaging) of .o
files, execute summary IPA analyses, and kick off a separate LTRANS
phase.

This gives us two modes of operation: one in which all object files
are loaded into memory and optimized using the full array of passes
available to GCC; and one which does some high-level analysis on the
whole program, partitions the program into smaller pieces, and does
more detailed analysis + grunt work on the smaller pieces.
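A rough sketch of how the lto1 driver could pick between the two modes. The flag name -fpartition is only the tentative spelling mentioned above, and the driver_plan type and partition count are hypothetical.

```c
#include <string.h>

/* Which link-time phases a hypothetical lto1 driver runs.  LGEN has
   already produced the .o files by the time lto1 starts. */
typedef struct {
    int run_wpa;       /* partition .o files and run summary IPA */
    int ltrans_jobs;   /* number of separate LTRANS invocations */
} driver_plan;

/* Default: load everything and optimize it in one LTRANS step (WPA
   bypassed).  With "-fpartition" (tentative name), run WPA and fan
   out NPARTS LTRANS jobs. */
driver_plan plan_for(int argc, const char **argv, int nparts) {
    driver_plan p = {0, 1};
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-fpartition") == 0) {
            p.run_wpa = 1;
            p.ltrans_jobs = nparts;
        }
    }
    return p;
}
```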

>
> 2) start leaving function bodies on disk, use lazy access, and a cache manager to keep things in memory when needed.  I think this will let us scale to code bases of tens or hundreds of millions of lines then.  I see no reason to take a whopr approach just to be able to handle large programs.

In addition to memory consumption, there is also the question of time
consumption.  Alternative LTO implementations by HP, Intel, and others
follow this model and spend multiple hours optimizing even moderately
large programs.

Ollie

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [whopr] Design/implementation alternatives for the driver and WPA
  2008-06-04 16:29     ` Chris Lattner
  2008-06-04 16:41       ` Chris Lattner
  2008-06-04 18:48       ` Devang Patel
@ 2008-06-04 19:45       ` Ian Lance Taylor
  2008-06-04 20:38         ` Nick Kledzik
  2 siblings, 1 reply; 61+ messages in thread
From: Ian Lance Taylor @ 2008-06-04 19:45 UTC (permalink / raw)
  To: Chris Lattner
  Cc: Diego Novillo, GCC Mailing List, Kenneth Zadeck, Jan Hubicka,
	Rafael Espindola, Ollie Wild, Robert Hundt, Nick Kledzik,
	Devang Patel

Chris Lattner <clattner@apple.com> writes:

>> * The return value of lto_module_get_symbol_attributes is not
>>  defined.
>
> Ah, sorry about that.  Most of the details are actually in the public
> header.  The result of this function is a 'lto_symbol_attributes'
> bitmask.  This should be more useful and revealing:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm-c/lto.h?revision=HEAD&view=markup

From an ELF perspective, this doesn't seem to have a way to indicate a
common symbol, and it doesn't provide the symbol's type.  It also
doesn't have a way to indicate section groups.

(How do section groups work in Mach-O?  Example is a C++ template
function with a static constant array which winds up in the .rodata
section.  Section groups permit discarding the array when we discard
the function code.)

>> * Interfaces like lto_module_get_symbol_name and
>>  lto_codegen_add_must_preserve_symbol are inefficient when dealing
>>  with large symbol tables.
>
> The intended model is for the linker to query the LTO plugin for its
> symbol list and build up its own linker-specific hash table.  This way
> you don't need to force the linker to use the plugin's data structure
> or the plugin to use the linker data structure.  We converged on this
> approach after trying it the other way.
>
> Does this make sense, do you have a better idea?

In gcc's LTO approach, I think the linker will already have access to
the symbol table anyhow.  But my actual point here is that requiring a
function call for every symbol is inefficient.  These functions should
take an array and a count.  There can be hundreds of thousands of
entries in a symbol table, and the interface should scale accordingly.
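To illustrate the point about call counts, here is a sketch contrasting a per-symbol accessor with a bulk one that fills a caller-provided array. The types and function names are hypothetical, not the lto.h API; with a chunk size of cap, the linker needs about N/cap calls instead of N.

```c
#include <stddef.h>

/* Hypothetical symbol record for a bulk interface. */
typedef struct {
    const char *name;
    unsigned attrs;
} plugin_symbol;

/* Hypothetical plugin-side symbol table for one module. */
typedef struct {
    const plugin_symbol *syms;
    size_t count;
} plugin_module;

/* Per-symbol style: the linker makes one call per symbol. */
const char *get_symbol_name(const plugin_module *m, size_t i) {
    return i < m->count ? m->syms[i].name : NULL;
}

/* Bulk style: one call copies up to CAP symbols, starting at START,
   into OUT and returns how many it wrote (0 when exhausted). */
size_t get_symbols(const plugin_module *m, size_t start,
                   plugin_symbol *out, size_t cap) {
    size_t n = 0;
    while (start + n < m->count && n < cap) {
        out[n] = m->syms[start + n];
        n++;
    }
    return n;
}
```

The start/cap pair lets the linker stream a very large table in fixed-size chunks rather than requiring one allocation for the whole thing.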

>> The LLVM
>> interface does not do that.
>
> Yes it does, the linker fully handles symbol resolution in our model.
>
>> Suppose the linker is invoked on a
>> sequence of object files, some with with LTO information, some
>> without, all interspersed.  Suppose some symbols are defined in
>> multiple .o files, through the use of common symbols, weak symbols,
>> and/or section groups.  The LLVM interface simply passes each object
>> file to the plugin.
>
> No, the native linker handles all the native .o files.
>
>>  The result is that the plugin is required to do
>> symbol resolution itself.  This 1) loses one of the benefits of having
>> the linker around; 2) will yield incorrect results when some non-LTO
>> object is linked in between LTO objects but redefines some earlier
>> weak symbol.
>
> In the LLVM LTO model, the plugin only needs to know about its .o
> files, and the linker uses this information to reason about symbol
> merging etc.  The Mac OS X linker can even do dead code stripping
> across Mach-O .o files and LLVM .bc files.

To be clear, when I said object file here, I meant any input file.
You may have understood that.

In ELF you have to think about symbol overriding.  Let's say you link
a.o b.o c.o.  a.o has a reference to symbol S.  b.o has a strong
definition.  c.o has a weak definition.  a.o and c.o have LTO
information, b.o does not.  ELF requires that a.o call the symbol from
b.o, not the symbol from c.o.  I don't see how to make that work with
the LLVM interface.

This is not a particularly likely example, of course.  People rely on
this sort of symbol overriding quite a bit, but it's unlikely that a.o
and c.o would have LTO information while b.o would not.  However,
given that we are designing an interface, I think we should design it
so that correctness is possible.
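A minimal sketch of the ELF rule in Ian's example, assuming a simplified static link of .o files only (no archives, common symbols, or duplicate-strong errors). It shows why the plugin cannot resolve S on its own: over all inputs the strong definition in b.o wins regardless of order, but over the LTO inputs alone (a.o and c.o) the weak definition in c.o would be chosen.

```c
#include <stddef.h>

/* How one input defines the symbol S (DEF_NONE: only references it). */
typedef enum { DEF_NONE, DEF_WEAK, DEF_STRONG } def_kind;

typedef struct {
    const char *file;
    def_kind kind;
    int has_lto;      /* whether the input carries LTO information */
} input_def;

/* Simplified ELF rule for one symbol: a strong definition beats any
   weak one regardless of link order; otherwise the first weak one is
   used.  Returns the winning input's index, or -1 if S is undefined. */
int resolve(const input_def *in, size_t n) {
    int weak = -1;
    for (size_t i = 0; i < n; i++) {
        if (in[i].kind == DEF_STRONG)
            return (int)i;
        if (in[i].kind == DEF_WEAK && weak < 0)
            weak = (int)i;
    }
    return weak;
}
```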

> Further other pieces of the toolchain (nm, ar, etc) also use the same
> interface so that they can return useful information about LLVM LTO
> files.

Useful, but as I understand it gcc's LTO files will have that
information anyhow.

> This is our second major revision of the LTO interfaces, and the
> interface continues to slowly evolve.  I think it would be great to
> work with you guys to extend the design to support GCC's needs.

Agreed.

Ian

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [whopr] Design/implementation alternatives for the driver and WPA
       [not found]           ` <89069638-6D2B-4AE6-ACB3-99A2B09091BA@apple.com>
@ 2008-06-04 20:02             ` Ollie Wild
  2008-06-04 23:59             ` Diego Novillo
  1 sibling, 0 replies; 61+ messages in thread
From: Ollie Wild @ 2008-06-04 20:02 UTC (permalink / raw)
  To: Chris Lattner
  Cc: Kenneth Zadeck, Diego Novillo, gcc, Jan Hubicka,
	Rafael Espindola, Robert Hundt

On Wed, Jun 4, 2008 at 12:33 PM, Chris Lattner <clattner@apple.com> wrote:
>
> Right, I understand that you design "stacks" on LTO.  It just seems strange
> to work on the advanced stuff before the basic GCC LTO stuff is close to
> being useful.

To some degree, we're scratching our own itch here.  Basic LTO doesn't
help us much.  Obviously, though, we want to implement this in a way
which is generally useful to the external community.  The scalability
techniques required to work with distcc are different from the
techniques which are useful on a single machine.

> I don't know anything about the implementation of the HP or Intel LTO
> implementation, but it sounds like there is much room for improvement there.
> With LLVM LTO, we see a compile-time slowdown on the order of 30-50% when
> switching from O3 to O4, not an order of magnitude.  There is also still much room for
> improvement in the LLVM implementation of course.

I think we're working from different baselines.  We use distributed
techniques for compiling individual .o files.  With a tool like
distcc, you can get something on the order of 20x speedup.  Linking
becomes 20% or more of total execution time.  LTO *is* an order of
magnitude increase compared to a basic link operation.
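Ollie's numbers can be turned into a rough worked example. Let T_c be the serial compile time and T_l the plain link time, and assume exactly a 20x distcc speedup, a link share of exactly 20% after distribution, and a 10x link slowdown from LTO standing in for "an order of magnitude" (all three are simplifying assumptions).

```latex
\begin{aligned}
\frac{T_l}{T_c/20 + T_l} = 0.2 &\;\Rightarrow\; \frac{T_c}{20} = 4\,T_l \\
\text{build time without LTO} &= \frac{T_c}{20} + T_l = 5\,T_l \\
\text{build time with LTO} &= \frac{T_c}{20} + 10\,T_l = 14\,T_l \\
\text{slowdown} &= \frac{14\,T_l}{5\,T_l} = 2.8
\end{aligned}
```

Under these assumptions the whole build becomes roughly 2.8x slower even though only the link step changed, which is why a slowdown measured relative to compilation and one measured relative to a distributed build look so different.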

Ollie

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [whopr] Design/implementation alternatives for the driver and  WPA
       [not found]         ` <65dd6fd50806041223l1871ecfbh384aa175c3da0645@mail.gmail.com>
  2008-06-04 19:30           ` Fwd: " Ollie Wild
       [not found]           ` <89069638-6D2B-4AE6-ACB3-99A2B09091BA@apple.com>
@ 2008-06-04 20:03           ` Kenneth Zadeck
  2008-06-04 20:30             ` Ian Lance Taylor
  2008-06-04 20:56             ` Diego Novillo
  2 siblings, 2 replies; 61+ messages in thread
From: Kenneth Zadeck @ 2008-06-04 20:03 UTC (permalink / raw)
  To: Ollie Wild
  Cc: Chris Lattner, Diego Novillo, gcc, Jan Hubicka, Rafael Espindola,
	Robert Hundt

Ollie Wild wrote:
> On Wed, Jun 4, 2008 at 9:14 AM, Chris Lattner <clattner@apple.com 
> <mailto:clattner@apple.com>> wrote:
>
>
>     1) start with all code in memory and see how far you can get.  It
>     seems that on reasonable developer machines (e.g. 2GB memory) that
>     we can handle C programs on the order of a million lines of code,
>     or C++ code on the order of 400K lines of code without a problem
>     with LLVM.
>
>
> This is essentially what the lto branch does today, and I don't see 
> any reason to disable this feature.  In the language of the WHOPR 
> design, the lto branch supports LGEN + LTRANS, with WPA bypassed 
> completely.  For implementing WPA, my intention is to add a new flag 
> (-fpartition or whatever else people think is suitable) to instruct 
> the lto1 front end to perform partitioning (aka repackaging) of .o 
> files, execute summary IPA analyses, and kick off a separate LTRANS 
> phase.
This is what lto does today because it was the easiest way to keep
developing and testing the other parts of the system.  It is stupidly
implemented: it required only five lines of code (two of them being
curly braces according to the gcc coding standards), so it allowed us
to work on other things.

However, this was not the point of my mail.  The point of my mail was
that whopr's design seems to sacrifice almost all interprocedural
analysis and transformation except for inlining, in order to scale to
programs of a size that most of the gcc community (including the
distros) will never see.  I realize there is hand-waving that this or
that could be implemented by someone else for programs of modest
scale, but that is not what whopr is all about.

I do not want to imply that Google's needs are not real or that Google
should not use gcc to fulfill them.  I only want to raise the point
that whopr is at one end of a spectrum in which REAL tradeoffs are being
made in the quality of compilation vs size of program handled, and
there is a real possibility that being able to handle an entire program
with these tradeoffs is going to yield the fastest program or a
reasonable compilation time.

kenny

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [whopr] Design/implementation alternatives for the driver and  WPA
  2008-06-04 20:03           ` Kenneth Zadeck
@ 2008-06-04 20:30             ` Ian Lance Taylor
  2008-06-04 20:56             ` Diego Novillo
  1 sibling, 0 replies; 61+ messages in thread
From: Ian Lance Taylor @ 2008-06-04 20:30 UTC (permalink / raw)
  To: Kenneth Zadeck
  Cc: Ollie Wild, Chris Lattner, Diego Novillo, gcc, Jan Hubicka,
	Rafael Espindola, Robert Hundt

Kenneth Zadeck <zadeck@naturalbridge.com> writes:

> I do not want to imply that google's needs are not real and that they
> should not use gcc to fulfill them.   I only want to raise the point
> that whopr is at one end of a spectrum in which REAL tradeoffs are
> being made in the quality of compilation vs size of program handled
> and there there is a real possibility that being able to handle an
> entire program with these tradeoffs is going to yield the fastest
> program or a reasonable compilation time.

What you need to ask is whether WHOPR is going to slow down or prevent
handling an entire program.  If so, then why, and how can we avoid
that?

(Your answer should not be something along the lines of "people will
be working on WHOPR rather than something else."  People will work on
what they find to be important.)

Ian

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [whopr] Design/implementation alternatives for the driver and WPA
  2008-06-04 19:45       ` Ian Lance Taylor
@ 2008-06-04 20:38         ` Nick Kledzik
  2008-06-04 20:46           ` Ian Lance Taylor
  2008-06-05  8:41           ` Rafael Espindola
  0 siblings, 2 replies; 61+ messages in thread
From: Nick Kledzik @ 2008-06-04 20:38 UTC (permalink / raw)
  To: Ian Lance Taylor
  Cc: Chris Lattner, Diego Novillo, GCC Mailing List, Kenneth Zadeck,
	Jan Hubicka, Rafael Espindola, Ollie Wild, Robert Hundt,
	Devang Patel


On Jun 4, 2008, at 12:44 PM, Ian Lance Taylor wrote:
> Chris Lattner <clattner@apple.com> writes:
>
>>> * The return value of lto_module_get_symbol_attributes is not
>>> defined.
>>
>> Ah, sorry about that.  Most of the details are actually in the public
>> header.  The result of this function is a 'lto_symbol_attributes'
>> bitmask.  This should be more useful and revealing:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm-c/lto.h?revision=HEAD&view=markup
>
> From an ELF perspective, this doesn't seem to have a way to indicate a
> common symbol, and it doesn't provide the symbol's type.
The current lto interface does return whether a symbol is
REGULAR, TENTATIVE, WEAK_DEF, or UNDEFINED.  There is also
CODE vs DATA, which could be used to indicate STT_FUNC vs STT_OBJECT.
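A sketch of how those four categories plus CODE/DATA might map onto ELF symbol properties, assuming TENTATIVE corresponds to an ELF common symbol. The enum values are hypothetical and do not reproduce the real lto.h constants; ELF names are spelled as strings to avoid depending on <elf.h>. Section groups have no counterpart here, which is the part of Ian's question this mapping does not answer.

```c
/* Hypothetical stand-ins for the categories named above. */
typedef enum { DEF_REGULAR, DEF_TENTATIVE, DEF_WEAK_DEF, DEF_UNDEFINED } sym_def_kind;
typedef enum { PERM_CODE, PERM_DATA } sym_perm;

/* ELF-side view of one symbol. */
typedef struct {
    const char *binding;   /* "STB_GLOBAL" or "STB_WEAK" */
    const char *type;      /* "STT_FUNC" or "STT_OBJECT" */
    const char *section;   /* "defined", "SHN_COMMON", or "SHN_UNDEF" */
} elf_view;

elf_view to_elf(sym_def_kind def, sym_perm perm) {
    elf_view v;
    v.binding = (def == DEF_WEAK_DEF) ? "STB_WEAK" : "STB_GLOBAL";
    v.type = (perm == PERM_CODE) ? "STT_FUNC" : "STT_OBJECT";
    switch (def) {
    case DEF_TENTATIVE: v.section = "SHN_COMMON"; break;  /* common symbol */
    case DEF_UNDEFINED: v.section = "SHN_UNDEF";  break;
    default:            v.section = "defined";    break;
    }
    return v;
}
```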


>  It also
> doesn't have a way to indicate section groups.
>
> (How do section groups work in Mach-O?  Example is a C++ template
> function with a static constant array which winds up in the .rodata
> section.  Section groups permit discarding the array when we discard
> the function code.)
Neither Mach-O nor LLVM has COMDAT section groups.  We just rely on
dead code stripping.  If the template function was coalesced away,
there would no longer be a reference to that static const array, so
it would get dead stripped.

Dead stripping is an important pass in LTO.
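Dead stripping of the kind described here is essentially a reachability pass. The sketch below works on a hypothetical reference graph of "atoms" (functions and data objects): it marks everything reachable from the roots, so a template copy that was coalesced away, and the array only that copy referenced, stay unmarked and can be dropped without any section group.

```c
#include <stddef.h>

#define MAX_ATOMS 8

/* refs[i][j] != 0 means atom i references atom j. */
typedef struct {
    size_t n;
    int refs[MAX_ATOMS][MAX_ATOMS];
} ref_graph;

/* Mark every atom reachable from ROOTS (e.g. the entry point and
   exported symbols) with live[i] = 1; unmarked atoms can be stripped.
   Each atom is pushed at most once, so the stack never overflows. */
void mark_live(const ref_graph *g, const int *roots, size_t nroots,
               int *live) {
    size_t stack[MAX_ATOMS], top = 0;
    for (size_t i = 0; i < g->n; i++)
        live[i] = 0;
    for (size_t r = 0; r < nroots; r++) {
        if (!live[roots[r]]) {
            live[roots[r]] = 1;
            stack[top++] = (size_t)roots[r];
        }
    }
    while (top > 0) {
        size_t a = stack[--top];
        for (size_t b = 0; b < g->n; b++) {
            if (g->refs[a][b] && !live[b]) {
                live[b] = 1;
                stack[top++] = b;
            }
        }
    }
}
```

For example, with atoms 0 = main, 1/2 = the kept template copy and its array, and 3/4 = the coalesced-away copy and its array, only atoms 0 through 2 are marked live.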


>>> * Interfaces like lto_module_get_symbol_name and
>>> lto_codegen_add_must_preserve_symbol are inefficient when dealing
>>> with large symbol tables.
>>
>> The intended model is for the linker to query the LTO plugin for its
>> symbol list and build up its own linker-specific hash table.  This way
>> you don't need to force the linker to use the plugin's data structure
>> or the plugin to use the linker data structure.  We converged on this
>> approach after trying it the other way.
>>
>> Does this make sense, do you have a better idea?
>
> In gcc's LTO approach, I think the linker will already have access to
> the symbol table anyhow.  But my actual point here is that requiring a
> function call for every symbol is inefficient.  These functions should
> take an array and a count.  There can be hundreds of thousands of
> entries in a symbol table, and the interface should scale accordingly.
I see you have your gold hat on here!  The current interface is
simple and clean.  If it does turn out that repeated calls to  
lto_module_get_symbol*
are really a bottleneck, we could add a "bulk" function.


>>> The LLVM
>>> interface does not do that.
>>
>> Yes it does, the linker fully handles symbol resolution in our model.
>>
>>> Suppose the linker is invoked on a
>>> sequence of object files, some with LTO information, some
>>> without, all interspersed.  Suppose some symbols are defined in
>>> multiple .o files, through the use of common symbols, weak symbols,
>>> and/or section groups.  The LLVM interface simply passes each object
>>> file to the plugin.
>>
>> No, the native linker handles all the native .o files.
>>
>>> The result is that the plugin is required to do
>>> symbol resolution itself.  This 1) loses one of the benefits of  
>>> having
>>> the linker around; 2) will yield incorrect results when some non-LTO
>>> object is linked in between LTO objects but redefines some earlier
>>> weak symbol.
>>
>> In the LLVM LTO model, the plugin only needs to know about its .o
>> files, and the linker uses this information to reason about symbol
>> merging etc.  The Mac OS X linker can even do dead code stripping
>> across Macho .o files and LLVM .bc files.
>
> To be clear, when I said object file here, I meant any input file.
> You may have understood that.
>
> In ELF you have to think about symbol overriding.  Let's say you link
> a.o b.o c.o.  a.o has a reference to symbol S.  b.o has a strong
> definition.  c.o has a weak definition.  a.o and c.o have LTO
> information, b.o does not.  ELF requires that a.o call the symbol from
> b.o, not the symbol from c.o.  I don't see how to make that work with
> the LLVM interface.
This does work.  There are two parts to it.  First the linker's master
symbol table sees the strong definition of S in b.o and the weak in c.o
and decides to use the strong one from b.o.  Second (because of that)
the linker calls lto_codegen_add_must_preserve_symbol("S").  The LTO
engine then sees it has a weak global function S and it cannot inline
those.  Put together, the LTO engine does generate a copy of S, but the
linker throws it away and uses the one from b.o.
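The resolution step described here can be modeled in a few lines of C.  This is a hypothetical sketch of ELF-style strong/weak precedence, not the Apple linker's code.  A strong definition prevails over any weak one regardless of link order.  If the prevailing definition is native while an LTO module also defines the symbol, the LTO-generated copy is the one that ends up being discarded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One definition of a single symbol S, in link-command-line order. */
struct def {
    const char *file;   /* input file, e.g. "b.o" */
    bool weak;          /* weak vs. strong definition */
    bool lto;           /* true if the file carries LTO IR */
};

/* Return the index of the prevailing definition: the first strong one,
   otherwise the first weak one, or -1 if there is none.  Multiple
   strong definitions (a duplicate-symbol error) are not modeled. */
static int prevailing_def(const struct def *defs, size_t n)
{
    assert(defs != NULL || n == 0);
    int first_weak = -1;
    for (size_t i = 0; i < n; i++) {
        if (!defs[i].weak)
            return (int)i;
        if (first_weak < 0)
            first_weak = (int)i;
    }
    return first_weak;
}

/* True when a native definition prevails but some LTO module also
   defines S: the linker asks LTO to preserve S, then throws the
   LTO-generated copy away after code generation. */
static bool lto_copy_is_discarded(const struct def *defs, size_t n)
{
    int p = prevailing_def(defs, n);
    if (p < 0 || defs[p].lto)
        return false;
    for (size_t i = 0; i < n; i++)
        if (defs[i].lto)
            return true;
    return false;
}
```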

-Nick


* Re: [whopr] Design/implementation alternatives for the driver and WPA
  2008-06-04 20:38         ` Nick Kledzik
@ 2008-06-04 20:46           ` Ian Lance Taylor
  2008-06-04 21:43             ` Nick Kledzik
  2008-06-05  8:41           ` Rafael Espindola
  1 sibling, 1 reply; 61+ messages in thread
From: Ian Lance Taylor @ 2008-06-04 20:46 UTC (permalink / raw)
  To: Nick Kledzik
  Cc: Chris Lattner, Diego Novillo, GCC Mailing List, Kenneth Zadeck,
	Jan Hubicka, Rafael Espindola, Ollie Wild, Robert Hundt,
	Devang Patel

Nick Kledzik <kledzik@apple.com> writes:

> On Jun 4, 2008, at 12:44 PM, Ian Lance Taylor wrote:
>> Chris Lattner <clattner@apple.com> writes:
>>
>>>> * The return value of lto_module_get_symbol_attributes is not
>>>> defined.
>>>
>>> Ah, sorry about that.  Most of the details are actually in the public
>>> header.  The result of this function is a 'lto_symbol_attributes'
>>> bitmask.  This should be more useful and revealing:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm-c/lto.h?revision=HEAD&view=markup
>>
>> From an ELF perspective, this doesn't seem to have a way to indicate a
>> common symbol, and it doesn't provide the symbol's type.
> The current lto interface does return whether a  symbol is
> REGULAR, TENTATIVE, WEAK_DEF, or UNDEFINED.  There is also
> CODE vs DATA which could be used to indicate STT_FUNC vs STT_OBJECT.

By "type" I mean STT_FUNC or STT_OBJECT.  I took CODE vs. DATA to
refer to the section in which the symbol is defined (SHF_EXECINSTR
vs. SHF_WRITE).  But, you're right, with appropriate squinting CODE
vs. DATA is probably adequate.
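As a rough illustration of that "squinting", the hypothetical mapping below translates per-symbol definition kinds and CODE/DATA attributes into an ELF-style symbol type and binding.  The attribute enums are invented for this sketch and do not use the real lto_symbol_attributes bit values.  The ELF constants are the gABI values for STT_NOTYPE, STT_OBJECT, STT_FUNC, STB_GLOBAL and STB_WEAK, defined locally so the sketch does not depend on <elf.h>.  Treating CODE as STT_FUNC is exactly the approximation discussed above, since CODE really describes the section.

```c
#include <assert.h>
#include <stdbool.h>

/* ELF gABI values, defined locally instead of including <elf.h>. */
enum { ELF_STT_NOTYPE = 0, ELF_STT_OBJECT = 1, ELF_STT_FUNC = 2 };
enum { ELF_STB_GLOBAL = 1, ELF_STB_WEAK = 2 };

/* Hypothetical attribute encoding for this sketch only. */
enum sym_def  { DEF_REGULAR, DEF_TENTATIVE, DEF_WEAK, DEF_UNDEFINED };
enum sym_perm { PERM_CODE, PERM_DATA };

struct elf_view {
    int type;        /* ELF_STT_* */
    int bind;        /* ELF_STB_* */
    bool common;     /* would be emitted in SHN_COMMON */
    bool undefined;  /* would be emitted in SHN_UNDEF */
};

static struct elf_view to_elf(enum sym_def def, enum sym_perm perm)
{
    assert(perm == PERM_CODE || perm == PERM_DATA);
    struct elf_view v = {
        .type = perm == PERM_CODE ? ELF_STT_FUNC : ELF_STT_OBJECT,
        .bind = ELF_STB_GLOBAL,
        .common = false,
        .undefined = false,
    };
    switch (def) {
    case DEF_REGULAR:
        break;
    case DEF_TENTATIVE:          /* C tentative definition -> common */
        v.type = ELF_STT_OBJECT;
        v.common = true;
        break;
    case DEF_WEAK:
        v.bind = ELF_STB_WEAK;
        break;
    case DEF_UNDEFINED:          /* references often carry no type */
        v.type = ELF_STT_NOTYPE;
        v.undefined = true;
        break;
    }
    return v;
}
```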


> I see you have your gold hat on here!  The current interface is
> simple and clean.  If it does turn out that repeated calls to
> lto_module_get_symbol* are really a bottleneck, we could add a "bulk"
> function.

I would like to add the bulk function now, because I know that we will
want it.


>>>> The LLVM
>>>> interface does not do that.
>>>
>>> Yes it does, the linker fully handles symbol resolution in our model.
>>>
>>>> Suppose the linker is invoked on a
>>>> sequence of object files, some with LTO information, some
>>>> without, all interspersed.  Suppose some symbols are defined in
>>>> multiple .o files, through the use of common symbols, weak symbols,
>>>> and/or section groups.  The LLVM interface simply passes each object
>>>> file to the plugin.
>>>
>>> No, the native linker handles all the native .o files.
>>>
>>>> The result is that the plugin is required to do
>>>> symbol resolution itself.  This 1) loses one of the benefits of having
>>>> the linker around; 2) will yield incorrect results when some non-LTO
>>>> object is linked in between LTO objects but redefines some earlier
>>>> weak symbol.
>>>
>>> In the LLVM LTO model, the plugin only needs to know about its .o
>>> files, and the linker uses this information to reason about symbol
>>> merging etc.  The Mac OS X linker can even do dead code stripping
>>> across Mach-O .o files and LLVM .bc files.
>>
>> To be clear, when I said object file here, I meant any input file.
>> You may have understood that.
>>
>> In ELF you have to think about symbol overriding.  Let's say you link
>> a.o b.o c.o.  a.o has a reference to symbol S.  b.o has a strong
>> definition.  c.o has a weak definition.  a.o and c.o have LTO
>> information, b.o does not.  ELF requires that a.o call the symbol from
>> b.o, not the symbol from c.o.  I don't see how to make that work with
>> the LLVM interface.
> This does work.  There are two parts to it.  First the linker's master
> symbol table sees the strong definition of S in b.o and the weak in c.o
> and decides to use the strong one from b.o.  Second (because of that)
> the linker calls lto_codegen_add_must_preserve_symbol("S").  The LTO
> engine then sees it has a weak global function S and it cannot inline
> those.  Put together, the LTO engine does generate a copy of S, but the
> linker throws it away and uses the one from b.o.

OK, for that case.  But are you asserting that this works in all
cases?  Should I come up with other examples of mixing LTO objects
with non-LTO objects using different types of symbols?

Ian


* Re: [whopr] Design/implementation alternatives for the driver and WPA
  2008-06-04 20:03           ` Kenneth Zadeck
  2008-06-04 20:30             ` Ian Lance Taylor
@ 2008-06-04 20:56             ` Diego Novillo
  2008-06-05 15:10               ` Jan Hubicka
  1 sibling, 1 reply; 61+ messages in thread
From: Diego Novillo @ 2008-06-04 20:56 UTC (permalink / raw)
  To: Kenneth Zadeck
  Cc: Ollie Wild, Chris Lattner, gcc, Jan Hubicka, Rafael Espindola,
	Robert Hundt

On Wed, Jun 4, 2008 at 16:03, Kenneth Zadeck <zadeck@naturalbridge.com> wrote:

> However this was not the point of my mail. The point of my mail was whopr's
> design that seems to basically sacrifice almost all interprocedural analysis
> and transformation except for inlining in order to scale so as to be able to
> compile programs of such size that most of the gcc community (including the
> distros) will never see.

There is absolutely nothing in WHOPR's design that sacrifices IPA
transformations.  I have tried to explain this several times, but I
seem to have failed.

I will try one more time.

Suppose that you have a program with a callgraph in the million node
range and no way to hold it in memory.  With the current design, you
either can't run IPA because of memory/computing limitations or you
can start loading and unloading function bodies, types, symbols
on-demand as you go in and out of each node in the callgraph.

WHOPR simply adds another alternative: if you are willing to only run
summary-based transformations, we can split the analysis and
transformation phases in two so that you can parallelize the work
over a cluster or a large SMP.  That's it.  Nothing more.

All the other transformations may still be executed; nothing in the
design prohibits this.  If the program is small enough to fit on one
machine, WHOPR simply runs the way LTO operates today.  The only case
where that can't happen is when you have committed to spreading this
out over a cluster.

> I only want to raise the point that whopr is
> at one end of a spectrum in which REAL tradeoffs are being made in the
> quality of compilation vs size of program handled and there is a real
> possibility that being able to handle an entire program with these tradeoffs
> is going to yield the fastest program or a reasonable compilation time.

How is this detrimental to the rest of LTO?  Your point seems moot.
We are simply adding a new feature on top of the basic LTO machinery
that we are also helping to build.  I still don't see what you find so
objectionable about this.

Diego.


* Re: [whopr] Design/implementation alternatives for the driver and WPA
  2008-06-04 20:46           ` Ian Lance Taylor
@ 2008-06-04 21:43             ` Nick Kledzik
  2008-06-05  0:01               ` Ian Lance Taylor
  0 siblings, 1 reply; 61+ messages in thread
From: Nick Kledzik @ 2008-06-04 21:43 UTC (permalink / raw)
  To: Ian Lance Taylor
  Cc: Chris Lattner, Diego Novillo, GCC Mailing List, Kenneth Zadeck,
	Jan Hubicka, Rafael Espindola, Ollie Wild, Robert Hundt,
	Devang Patel


On Jun 4, 2008, at 1:45 PM, Ian Lance Taylor wrote:
>>>>>
>>>>> The result is that the plugin is required to do
>>>>> symbol resolution itself.  This 1) loses one of the benefits of having
>>>>> the linker around; 2) will yield incorrect results when some non-LTO
>>>>> object is linked in between LTO objects but redefines some earlier
>>>>> weak symbol.
>>>>
>>>> In the LLVM LTO model, the plugin only needs to know about its .o
>>>> files, and the linker uses this information to reason about symbol
>>>> merging etc.  The Mac OS X linker can even do dead code stripping
>>>> across Mach-O .o files and LLVM .bc files.
>>>
>>> To be clear, when I said object file here, I meant any input file.
>>> You may have understood that.
>>>
>>> In ELF you have to think about symbol overriding.  Let's say you link
>>> a.o b.o c.o.  a.o has a reference to symbol S.  b.o has a strong
>>> definition.  c.o has a weak definition.  a.o and c.o have LTO
>>> information, b.o does not.  ELF requires that a.o call the symbol from
>>> b.o, not the symbol from c.o.  I don't see how to make that work with
>>> the LLVM interface.
>> This does work.  There are two parts to it.  First the linker's master
>> symbol table sees the strong definition of S in b.o and the weak in c.o
>> and decides to use the strong one from b.o.  Second (because of that)
>> the linker calls lto_codegen_add_must_preserve_symbol("S").  The LTO
>> engine then sees it has a weak global function S and it cannot inline
>> those.  Put together, the LTO engine does generate a copy of S, but the
>> linker throws it away and uses the one from b.o.
>
> OK, for that case.  But are you asserting that this works in all
> cases?  Should I come up with other examples of mixing LTO objects
> with non-LTO objects using different types of symbols?
I don't claim our current implementation is bug free, but the lto
interface matches the Apple linker internal model, so we don't expect
and have not encountered any problems mixing mach-o and llvm bitcode
files.

-Nick


* Re: [whopr] Design/implementation alternatives for the driver and WPA
       [not found]           ` <89069638-6D2B-4AE6-ACB3-99A2B09091BA@apple.com>
  2008-06-04 20:02             ` Ollie Wild
@ 2008-06-04 23:59             ` Diego Novillo
  1 sibling, 0 replies; 61+ messages in thread
From: Diego Novillo @ 2008-06-04 23:59 UTC (permalink / raw)
  To: Chris Lattner
  Cc: Ollie Wild, Kenneth Zadeck, gcc, Jan Hubicka, Rafael Espindola,
	Robert Hundt

On Wed, Jun 4, 2008 at 15:33, Chris Lattner <clattner@apple.com> wrote:

> Right, I understand that your design "stacks" on LTO.  It just seems strange
> to work on the advanced stuff before the basic GCC LTO stuff is close to
> being useful.

Not at all.  We are working on both fronts.


Diego.


* Re: [whopr] Design/implementation alternatives for the driver and WPA
  2008-06-04 21:43             ` Nick Kledzik
@ 2008-06-05  0:01               ` Ian Lance Taylor
  2008-06-05  0:20                 ` Nick Kledzik
  0 siblings, 1 reply; 61+ messages in thread
From: Ian Lance Taylor @ 2008-06-05  0:01 UTC (permalink / raw)
  To: Nick Kledzik
  Cc: Chris Lattner, Diego Novillo, GCC Mailing List, Kenneth Zadeck,
	Jan Hubicka, Rafael Espindola, Ollie Wild, Robert Hundt,
	Devang Patel

Nick Kledzik <kledzik@apple.com> writes:

> I don't claim our current implementation is bug free, but the lto
> interface matches the Apple linker internal model, so we don't expect
> and have not encountered any problems mixing mach-o and llvm bitcode
> files.

Hmmm, OK, how about this example:

a.o: contains LTO information, refers to S
b.o: no LTO information, defines S
c.o: contains LTO information, defines S at version V, S/V is not hidden

In the absence of b.o, the reference to S in a.o will be resolved
against the definition of S in c.o.  In the presence of b.o, the
reference to S in a.o will be resolved against the definition of S in
b.o.

I suppose we could refuse to inline versioned symbols, but that
doesn't seem desirable since it is normally fine.

Ian


* Re: [whopr] Design/implementation alternatives for the driver and WPA
  2008-06-05  0:01               ` Ian Lance Taylor
@ 2008-06-05  0:20                 ` Nick Kledzik
  2008-06-05  0:43                   ` Ian Lance Taylor
  0 siblings, 1 reply; 61+ messages in thread
From: Nick Kledzik @ 2008-06-05  0:20 UTC (permalink / raw)
  To: Ian Lance Taylor
  Cc: Chris Lattner, Diego Novillo, GCC Mailing List, Kenneth Zadeck,
	Jan Hubicka, Rafael Espindola, Ollie Wild, Robert Hundt,
	Devang Patel


On Jun 4, 2008, at 5:00 PM, Ian Lance Taylor wrote:
> Nick Kledzik <kledzik@apple.com> writes:
>
>> I don't claim our current implementation is bug free, but the lto
>> interface matches the Apple linker internal model, so we don't expect
>> and have not encountered any problems mixing mach-o and llvm bitcode
>> files.
>
> Hmmm, OK, how about this example:
>
> a.o: contains LTO information, refers to S
> b.o: no LTO information, defines S
> c.o: contains LTO information, defines S at version V, S/V is not hidden
>
> In the absence of b.o, the reference to S in a.o will be resolved
> against the definition of S in c.o.  In the presence of b.o, the
> reference to S in a.o will be resolved against the definition of S in
> b.o.
>
> I suppose we could refuse to inline versioned symbols, but that
> doesn't seem desirable since it is normally fine.


As Chris mentioned earlier today, the Apple tool chain does not
support versioned symbols.  But if versioned symbols are a naming
convention (that is, everything is encoded in the symbol name), then
this would work the same as your previous example.  Namely, the linker
would coalesce away S in c.o, which in turn tells the LTO engine that
it can't inline/optimize away c.o's S, and after LTO is done, the
linker throws away the LTO-generated S and uses b.o's S instead.

-Nick


On Jun 4, 2008, at 9:29 AM, Chris Lattner wrote:
>> When I look at the LLVM interface as described on that web page, I see
>> these issues, all fixable:
>> * No support for symbol versioning.
>
> Very true.  I think it would be great to work from a common model that
> can be extended to support both compilers.  Having a unified
> interface would be very useful, and we are happy to evolve the
> interface to suit more general needs.


* Re: [whopr] Design/implementation alternatives for the driver and WPA
  2008-06-05  0:20                 ` Nick Kledzik
@ 2008-06-05  0:43                   ` Ian Lance Taylor
  2008-06-05  1:09                     ` Nick Kledzik
  0 siblings, 1 reply; 61+ messages in thread
From: Ian Lance Taylor @ 2008-06-05  0:43 UTC (permalink / raw)
  To: Nick Kledzik
  Cc: Chris Lattner, Diego Novillo, GCC Mailing List, Kenneth Zadeck,
	Jan Hubicka, Rafael Espindola, Ollie Wild, Robert Hundt,
	Devang Patel

Nick Kledzik <kledzik@apple.com> writes:

> On Jun 4, 2008, at 5:00 PM, Ian Lance Taylor wrote:
>> Nick Kledzik <kledzik@apple.com> writes:
>>
>>> I don't claim our current implementation is bug free, but the lto
>>> interface matches the Apple linker internal model, so we don't expect
>>> and have not encountered any problems mixing mach-o and llvm bitcode
>>> files.
>>
>> Hmmm, OK, how about this example:
>>
>> a.o: contains LTO information, refers to S
>> b.o: no LTO information, defines S
>> c.o: contains LTO information, defines S at version V, S/V is not hidden
>>
>> In the absence of b.o, the reference to S in a.o will be resolved
>> against the definition of S in c.o.  In the presence of b.o, the
>> reference to S in a.o will be resolved against the definition of S in
>> b.o.
>>
>> I suppose we could refuse to inline versioned symbols, but that
>> doesn't seem desirable since it is normally fine.
>
>
> As Chris mentioned earlier today, the Apple tool chain does not
> support versioned symbols.  But if versioned symbols are a naming
> convention (that is, everything is encoded in the symbol name), then
> this would work the same as your previous example.  Namely, the linker
> would coalesce away S in c.o, which in turn tells the LTO engine that
> it can't inline/optimize away c.o's S, and after LTO is done, the
> linker throws away the LTO-generated S and uses b.o's S instead.

Versioned symbols are not a naming convention, but they aren't all
that different from one.  Basically every symbol may have an optional
version, and when a symbol has a version the version may be hidden or
not.  A symbol definition with a hidden version may only be matched by
a symbol reference with that exact version.  A symbol definition with
a non-hidden version definition may be matched by a symbol reference
with the same name without a version.  This is most interesting in the
dynamic linker, of course.
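The matching rules above can be written down as a small predicate.  This is a simplified, hypothetical sketch.  It covers only the cases described here: a hidden version needs an exact match, and a non-hidden version also satisfies an unversioned reference.  As a stated simplification, it treats a versioned reference to an unversioned definition as a non-match.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical representation of a versioned definition. */
struct vdef {
    const char *name;
    const char *version;   /* NULL if unversioned */
    bool hidden;           /* hidden (non-default) version */
};

/* Hypothetical representation of a symbol reference. */
struct vref {
    const char *name;
    const char *version;   /* NULL if the reference names no version */
};

static bool ref_matches_def(const struct vref *r, const struct vdef *d)
{
    assert(r != NULL && d != NULL);
    if (strcmp(r->name, d->name) != 0)
        return false;
    if (d->version == NULL)                  /* simplification, see above */
        return r->version == NULL;
    if (r->version != NULL)                  /* versioned ref: exact match */
        return strcmp(r->version, d->version) == 0;
    return !d->hidden;                       /* unversioned ref: non-hidden only */
}
```

With c.o's non-hidden S at version V, the unversioned reference in a.o matches.  That makes c.o's S a candidate, and only the linker's knowledge of b.o decides which definition is actually used.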

How does the linker inform the plugin that the plugin is not permitted
to use c.o's S?

Ian


* Re: [whopr] Design/implementation alternatives for the driver and WPA
  2008-06-05  0:43                   ` Ian Lance Taylor
@ 2008-06-05  1:09                     ` Nick Kledzik
  2008-06-05  5:07                       ` Devang Patel
  2008-06-05  5:44                       ` [whopr] Design/implementation alternatives for the driver and WPA Ian Lance Taylor
  0 siblings, 2 replies; 61+ messages in thread
From: Nick Kledzik @ 2008-06-05  1:09 UTC (permalink / raw)
  To: Ian Lance Taylor
  Cc: Chris Lattner, Diego Novillo, GCC Mailing List, Kenneth Zadeck,
	Jan Hubicka, Rafael Espindola, Ollie Wild, Robert Hundt,
	Devang Patel


On Jun 4, 2008, at 5:39 PM, Ian Lance Taylor wrote:
> Nick Kledzik <kledzik@apple.com> writes:
>
>> On Jun 4, 2008, at 5:00 PM, Ian Lance Taylor wrote:
>>> Nick Kledzik <kledzik@apple.com> writes:
>>>
>>>> I don't claim our current implementation is bug free, but the lto
>>>> interface matches the Apple linker internal model, so we don't expect
>>>> and have not encountered any problems mixing mach-o and llvm bitcode
>>>> files.
>>>
>>> Hmmm, OK, how about this example:
>>>
>>> a.o: contains LTO information, refers to S
>>> b.o: no LTO information, defines S
>>> c.o: contains LTO information, defines S at version V, S/V is not hidden
>>>
>>> In the absence of b.o, the reference to S in a.o will be resolved
>>> against the definition of S in c.o.  In the presence of b.o, the
>>> reference to S in a.o will be resolved against the definition of S in
>>> b.o.
>>>
>>> I suppose we could refuse to inline versioned symbols, but that
>>> doesn't seem desirable since it is normally fine.
>>
>>
>> As Chris mentioned earlier today, the Apple tool chain does not
>> support versioned symbols.  But if versioned symbols are a naming
>> convention (that is, everything is encoded in the symbol name), then
>> this would work the same as your previous example.  Namely, the linker
>> would coalesce away S in c.o, which in turn tells the LTO engine that
>> it can't inline/optimize away c.o's S, and after LTO is done, the
>> linker throws away the LTO-generated S and uses b.o's S instead.
>
> Versioned symbols are not a naming convention, but they aren't all
> that different from one.  Basically every symbol may have an optional
> version, and when a symbol has a version the version may be hidden or
> not.  A symbol definition with a hidden version may only be matched by
> a symbol reference with that exact version.  A symbol definition with
> a non-hidden version definition may be matched by a symbol reference
> with the same name without a version.  This is most interesting in the
> dynamic linker, of course.
>
> How does the linker inform the plugin that the plugin is not permitted
> to use c.o's S?
In the previous case where S was weak, the call to
lto_codegen_add_must_preserve_symbol("S") caused the LTO engine to know
it could not inline S (because it was a weak definition and used
outside the LTO usage sphere).  And then after LTO was done, the linker
threw away the LTO-produced S and used the one from b.o instead.

In this case S is a regular symbol, so the previous trick won't work.
Probably the best solution would be to add a new lto_ API to tell the
LTO engine to ignore a definition it already has.  It would make more
sense to use this new API in the weak case too.

-Nick


* Re: [whopr] Design/implementation alternatives for the driver and WPA
  2008-06-05  1:09                     ` Nick Kledzik
@ 2008-06-05  5:07                       ` Devang Patel
  2008-06-05  5:43                         ` Ian Lance Taylor
  2008-06-05  5:44                       ` [whopr] Design/implementation alternatives for the driver and WPA Ian Lance Taylor
  1 sibling, 1 reply; 61+ messages in thread
From: Devang Patel @ 2008-06-05  5:07 UTC (permalink / raw)
  To: Nick Kledzik, Ian Lance Taylor
  Cc: Chris Lattner, Diego Novillo, GCC Mailing List, Kenneth Zadeck,
	Jan Hubicka, Rafael Espindola, Ollie Wild, Robert Hundt


On Jun 4, 2008, at 6:09 PM, Nick Kledzik wrote:

>
> On Jun 4, 2008, at 5:39 PM, Ian Lance Taylor wrote:
>> Nick Kledzik <kledzik@apple.com> writes:
>>
>>> On Jun 4, 2008, at 5:00 PM, Ian Lance Taylor wrote:
>>>> Nick Kledzik <kledzik@apple.com> writes:
>>>>
>>>>> I don't claim our current implementation is bug free, but the lto
>>>>> interface matches the Apple linker internal model, so we don't expect
>>>>> and have not encountered any problems mixing mach-o and llvm bitcode
>>>>> files.
>>>>
>>>> Hmmm, OK, how about this example:
>>>>
>>>> a.o: contains LTO information, refers to S
>>>> b.o: no LTO information, defines S
>>>> c.o: contains LTO information, defines S at version V, S/V is not hidden
>>>>
>>>> In the absence of b.o, the reference to S in a.o will be resolved
>>>> against the definition of S in c.o.  In the presence of b.o, the
>>>> reference to S in a.o will be resolved against the definition of S in
>>>> b.o.
>>>>
>>>> I suppose we could refuse to inline versioned symbols, but that
>>>> doesn't seem desirable since it is normally fine.
>>>
>>>
>>> As Chris mentioned earlier today, the Apple tool chain does not
>>> support versioned symbols.  But if versioned symbols are a naming
>>> convention (that is, everything is encoded in the symbol name), then
>>> this would work the same as your previous example.  Namely, the linker
>>> would coalesce away S in c.o, which in turn tells the LTO engine that
>>> it can't inline/optimize away c.o's S, and after LTO is done, the
>>> linker throws away the LTO-generated S and uses b.o's S instead.
>>
>> Versioned symbols are not a naming convention, but they aren't all
>> that different from one.  Basically every symbol may have an optional
>> version, and when a symbol has a version the version may be hidden or
>> not.  A symbol definition with a hidden version may only be matched by
>> a symbol reference with that exact version.  A symbol definition with
>> a non-hidden version definition may be matched by a symbol reference
>> with the same name without a version.  This is most interesting in the
>> dynamic linker, of course.
>>
>> How does the linker inform the plugin that the plugin is not permitted
>> to use c.o's S?
> In the previous case where S was weak, the call to
> lto_codegen_add_must_preserve_symbol("S") caused the LTO engine to know
> it could not inline S (because it was a weak definition and used
> outside the LTO usage sphere).

The weak definition is the deciding factor.  The optimizer can inline a
function at a call site irrespective of whether it is used outside the
LTO usage sphere.  Use outside the LTO sphere only determines whether
to preserve the function body when the function has been inlined
everywhere inside the LTO usage sphere.
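This separates two decisions that are easy to conflate.  A minimal sketch, with hypothetical field names: whether a call may be inlined depends only on whether the definition is weak (and so may be replaced at link time).  Whether an out-of-line body must be kept depends on outside uses or on call sites that were not inlined.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical facts an LTO engine might know about one function. */
struct fn_facts {
    bool weak_def;           /* definition may be replaced at link time */
    bool used_outside_lto;   /* referenced from native code (must-preserve) */
    unsigned calls_in_lto;   /* call sites inside the LTO sphere */
    unsigned calls_inlined;  /* how many of those were inlined */
};

/* Inlining is decided by the kind of definition, not by outside uses. */
static bool may_inline(const struct fn_facts *f)
{
    return !f->weak_def;
}

/* The out-of-line body is needed if something outside the LTO sphere
   uses it, or if any call site inside the sphere was not inlined. */
static bool must_keep_body(const struct fn_facts *f)
{
    assert(f->calls_inlined <= f->calls_in_lto);
    assert(may_inline(f) || f->calls_inlined == 0);
    return f->used_outside_lto || f->calls_inlined < f->calls_in_lto;
}
```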

> And then after LTO was done, the linker threw away the LTO-produced S
> and used the one from b.o instead.
>
> In this case S is a regular symbol, so the previous trick won't work.
> Probably the best solution would be to add a new lto_ API to tell the
> LTO engine to ignore a definition it already has.  It would make more
> sense to use this new API in the weak case too.

If the optimizer can handle the symbol versioning case when one
definition with a version is defined in the same source file as the
reference, then you don't need a new API.

For example,

a.o : refers to S and defines S at version V.
b.o : defines S.

Is the inliner, at compile time, allowed to inline uses of S in a.o
using the definition it has?

-
Devang


* Re: [whopr] Design/implementation alternatives for the driver and WPA
  2008-06-05  5:07                       ` Devang Patel
@ 2008-06-05  5:43                         ` Ian Lance Taylor
  2008-06-05  6:09                           ` [whopr] plugin interface design Chris Lattner
  0 siblings, 1 reply; 61+ messages in thread
From: Ian Lance Taylor @ 2008-06-05  5:43 UTC (permalink / raw)
  To: Devang Patel; +Cc: Nick Kledzik, GCC Mailing List

[ trimming the CC list ]

Devang Patel <dpatel@apple.com> writes:

> If the optimizer can handle the symbol versioning case when one
> definition with version is defined in the same source file as the
> reference then you don't need new API.
>
> For example,
>
> a.o : refers to S and defines S at version V.
> b.o : defines S.
>
> Is inliner, at compile time allowed to inline uses of S in a.o using
> the definition it has ?

The compiler doesn't know about symbol versions.  The way they work is
that you give the symbol a name like S_V, and then use an assembly
level .symver directive to say that S_V is really S at version V.  So
false inlining doesn't really arise in a single source file, unless
you do something rather odd.

Ian


* Re: [whopr] Design/implementation alternatives for the driver and WPA
  2008-06-05  1:09                     ` Nick Kledzik
  2008-06-05  5:07                       ` Devang Patel
@ 2008-06-05  5:44                       ` Ian Lance Taylor
  1 sibling, 0 replies; 61+ messages in thread
From: Ian Lance Taylor @ 2008-06-05  5:44 UTC (permalink / raw)
  To: Nick Kledzik; +Cc: GCC Mailing List, Devang Patel

Nick Kledzik <kledzik@apple.com> writes:

> In this case S is a regular symbol, so the previous trick won't work.
> Probably the best solution would be to add a new lto_ API to tell the
> LTO engine to ignore a definition it already has.  It would make more
> sense to use this new API in the weak case too.

I would like to propose a farther-reaching change: for each undefined
symbol reference, tell LTO the location of the symbol definition which
should be used.  The linker has to develop this information anyhow
during the course of the link.
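One way to picture this proposal is a resolution table.  The linker hands the plugin, for each symbol, the input file that supplies the definition which will actually be used, and the plugin consults that table before relying on its own copy.  The record layout and function below are hypothetical and only sketch the idea.  With the versioned example above (a.o = 0, b.o = 1, c.o = 2), the plugin may use c.o's S only when b.o is absent from the link.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record the linker could pass to the plugin. */
struct resolution {
    const char *symbol;
    int def_file;    /* index of the input file whose definition is used,
                        or -1 if the symbol stays undefined */
};

/* May the plugin rely on (e.g. inline) its own definition of `sym`
   from input file `my_file`?  Only if the linker says that definition
   prevails; symbols missing from the table are treated conservatively. */
static bool plugin_may_use_own_def(const struct resolution *res, size_t n,
                                   const char *sym, int my_file)
{
    assert(res != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (strcmp(res[i].symbol, sym) == 0)
            return res[i].def_file == my_file;
    return false;
}
```

The same check covers the weak case without relying on the weak flag, because the answer comes from the linker's resolution rather than from the symbol's binding.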

Ian


* Re: [whopr] plugin interface design
  2008-06-05  5:43                         ` Ian Lance Taylor
@ 2008-06-05  6:09                           ` Chris Lattner
  2008-06-05 13:53                             ` Ian Lance Taylor
  0 siblings, 1 reply; 61+ messages in thread
From: Chris Lattner @ 2008-06-05  6:09 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Devang Patel, Nick Kledzik, GCC Mailing List

On Jun 4, 2008, at 10:39 PM, Ian Lance Taylor wrote:
> Devang Patel <dpatel@apple.com> writes:
>
>> If the optimizer can handle the symbol versioning case when one
>> definition with version is defined in the same source file as the
>> reference then you don't need new API.
>>
>> For example,
>>
>> a.o : refers to S and defines S at version V.
>> b.o : defines S.
>>
>> Is inliner, at compile time allowed to inline uses of S in a.o using
>> the definition it has ?
>
> The compiler doesn't know about symbol versions.  The way they work is
> that you give the symbol a name like S_V, and then use an assembly
> level .symver directive to say that S_V is really S at version V.  So
> false inlining doesn't really arise in a single source file, unless
> you do something rather odd.

If you plan to do link-time optimization, you need to be able to  
capture all "assembler-level" features in your IR, somehow.  Magic  
that gets splatted out by the assembly printer will ideally be changed  
to update the IR in some form.

LLVM LTO does exactly this.  The front-end produces LLVM IR and does  
no .s file printing at all.  This IR goes through optimizations and at  
-O3 and lower is then run through the code generator which produces  
a .s file.

At -O4, the difference is that the code generator is not run, so LLVM  
IR is written to disk.  When LTO is run, we then load the LLVM IR for  
all the LTO'able files and then run an LLVM Linker across them.  This  
does an LLVM IR level link step, which is aware of the semantics of  
weak symbols, and many other details that come up when linking two  
files together (however, it has no idea where those two files came  
from, no idea about archive resolution, etc).

I don't know if LLVM properly supports symbol versions on ELF systems;
I would guess not yet.  Since symbol versions affect linking, the LLVM
linker would have to have enough information to "do the right thing".

Once a fully linked LLVM IR file is produced, the total result is sent  
through LLVM optimization passes, which then do interprocedural and  
intraprocedural optimizations to continue improving the code.  After  
that, the normal LLVM code generator is run to produce a native form  
of the LTO'd module and the system linker uses that to continue linking.

I don't know how closely your plans follow this model.  If you think  
this approach is reasonable, you really do need to reflect things like  
symbol versions in your IR somehow.  This compiler must know about  
versions, and when it does, it is easy to avoid optimizations that are  
invalid for them.

-Chris


* Re: [whopr] Design/implementation alternatives for the driver and WPA
  2008-06-04 20:38         ` Nick Kledzik
  2008-06-04 20:46           ` Ian Lance Taylor
@ 2008-06-05  8:41           ` Rafael Espindola
  2008-06-05 14:00             ` Ian Lance Taylor
  1 sibling, 1 reply; 61+ messages in thread
From: Rafael Espindola @ 2008-06-05  8:41 UTC (permalink / raw)
  To: Nick Kledzik
  Cc: Ian Lance Taylor, Chris Lattner, Diego Novillo, GCC Mailing List,
	Kenneth Zadeck, Jan Hubicka, Ollie Wild, Robert Hundt,
	Devang Patel

>> In ELF you have to think about symbol overriding.  Let's say you link
>> a.o b.o c.o.  a.o has a reference to symbol S.  b.o has a strong
>> definition.  c.o has a weak definition.  a.o and c.o have LTO
>> information, b.o does not.  ELF requires that a.o call the symbol from
>> b.o, not the symbol from c.o.  I don't see how to make that work with
>> the LLVM interface.
>
> This does work.  There are two parts to it.  First the linker's master
> symbol table sees the strong definition of S in b.o and the weak in c.o
> and decides to use the strong one from b.o.  Second (because of that)
> the linker calls lto_codegen_add_must_preserve_symbol("S").  The LTO
> engine then sees it has a weak global function S and it cannot inline
> those.  Put together, the LTO engine does generate a copy of S, but the
> linker throws it away and uses the one from b.o.
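The resolution step described above (a strong definition beats a weak one, whichever file it comes from) can be sketched in C as below. This is a minimal illustration, not code from any linker: struct sym_def, resolve_symbol and the RESOLVE_* codes are hypothetical names, and real linkers also deal with common symbols, shared libraries and archive extraction, which this sketch ignores.

```c
#include <stddef.h>

/* Hypothetical description of one definition of a symbol as the linker
   sees it; these names are illustrative, not part of any real API. */
enum def_strength { DEF_WEAK, DEF_STRONG };

struct sym_def {
    const char *file;            /* input file providing the definition */
    enum def_strength strength;
    int has_lto_ir;              /* nonzero if the file carries LTO IR */
};

#define RESOLVE_NONE     (-1)    /* no definition: symbol stays undefined */
#define RESOLVE_MULTIPLE (-2)    /* two strong definitions: link error */

/* Pick the prevailing definition, scanning in command-line order.
   A strong definition beats any weak one; if only weak definitions
   exist, the first one seen wins. Returns an index into DEFS. */
int resolve_symbol(const struct sym_def *defs, size_t n)
{
    int strong = RESOLVE_NONE, weak = RESOLVE_NONE;
    for (size_t i = 0; i < n; i++) {
        if (defs[i].strength == DEF_STRONG) {
            if (strong != RESOLVE_NONE)
                return RESOLVE_MULTIPLE;
            strong = (int)i;
        } else if (weak == RESOLVE_NONE) {
            weak = (int)i;
        }
    }
    return strong != RESOLVE_NONE ? strong : weak;
}
```

For the a.o/b.o/c.o example, passing b.o (strong, no IR) and c.o (weak, with IR) selects b.o, so the copy of S that the LTO engine generates from c.o is the one the linker ends up discarding.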

Interesting. The use of lto_codegen_add_must_preserve_symbol is kind
of the opposite of what I had understood. What do you do in this case:

a.o: IL file that contains a reference to "f"
b.o: IL file that has a weak def of "f"

There is no strong definition. Can you inline f into the use in a.o?

> -Nick
>
>

Cheers,
-- 
Rafael Avila de Espindola

Google Ireland Ltd.
Gordon House
Barrow Street
Dublin 4
Ireland

Registered in Dublin, Ireland
Registration Number: 368047

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [whopr] plugin interface design
  2008-06-05  6:09                           ` [whopr] plugin interface design Chris Lattner
@ 2008-06-05 13:53                             ` Ian Lance Taylor
  2008-06-05 16:37                               ` Chris Lattner
  0 siblings, 1 reply; 61+ messages in thread
From: Ian Lance Taylor @ 2008-06-05 13:53 UTC (permalink / raw)
  To: Chris Lattner; +Cc: Devang Patel, Nick Kledzik, GCC Mailing List

Chris Lattner <clattner@apple.com> writes:

> I don't know how closely your plans follow this model.  If you think
> this approach is reasonable, you really do need to reflect things like
> symbol versions in your IR somehow.  This compiler must know about
> versions, and when it does, it is easy to avoid optimizations that are
> invalid for them.

Sure.  But here's the thing: the gcc LTO approach involves having a
regular object with a regular symbol table, and the IR is embedded in
the object.  In other words, we do know the symbol version
information: it's in the symbol table of the object.  And so what I'm
discussing is a way for the linker to communicate the relevant part of
that information to the compiler plugin.  The relevant part is: "this
undefined symbol reference in a.o is bound to this symbol definition
in b.o."  There is nothing else that the compiler needs to know.
(Actually, when we move on to applying LTO across shared library
boundaries we may also want to say something about the strength of the
binding.)

I appreciate the cleanliness and simplicity of your description.  I'm
trying to fill in an ugly edge.  The reality is that symbol versions
are expressed via assembly language pseudo-ops, both in C/C++ files
and in assembly code, and also via version scripts passed to the
linker.  To the limited extent that the compiler needs to be aware of
them, the linker needs to convey that information.  If we decree that
the information must be expressed directly in the compiler IR, then I
think we're looking at a considerably larger degree of ugliness.

Ian

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [whopr] Design/implementation alternatives for the driver and WPA
  2008-06-05  8:41           ` Rafael Espindola
@ 2008-06-05 14:00             ` Ian Lance Taylor
  2008-06-05 16:44               ` Chris Lattner
  0 siblings, 1 reply; 61+ messages in thread
From: Ian Lance Taylor @ 2008-06-05 14:00 UTC (permalink / raw)
  To: Rafael Espindola
  Cc: Nick Kledzik, Chris Lattner, Diego Novillo, GCC Mailing List,
	Kenneth Zadeck, Jan Hubicka, Ollie Wild, Robert Hundt,
	Devang Patel

"Rafael Espindola" <espindola@google.com> writes:

> Interesting. The use of lto_codegen_add_must_preserve_symbol is kind
> of the opposite of what I had understood. What do you do in this case:
>
> a.o: IL file that contains a reference to "f"
> b.o: IL file that has a weak def of "f"
>
> There is no strong definition. Can you inline f into the use in a.o?

I don't know what LLVM does, but in principle, in ELF, you can do this
inlining when linking an executable, but not when linking a shared
library.  Actually, when linking a shared library, what matters is not
whether the definition of "f" is weak or not, but what the visibility
of "f" is (default, hidden, protected, or internal).  And, of course,
the visibility of "f" can be set by link-time options (e.g.,
-Bsymbolic).
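The rule above can be written down as a small predicate: inlining is safe when the prevailing definition is known to bind locally. The sketch below is a hypothetical simplification (the enum names and binds_locally are not from GCC or gold) that models only the three inputs mentioned here: the output kind, the symbol's visibility, and -Bsymbolic.

```c
/* Hypothetical inputs; names are illustrative only. */
enum output_kind { OUT_EXECUTABLE, OUT_SHARED_LIBRARY };
enum sym_visibility { VIS_DEFAULT, VIS_PROTECTED, VIS_HIDDEN, VIS_INTERNAL };

/* Return nonzero if the prevailing definition binds locally, i.e. no
   other module can preempt it at run time, so calls to it may be
   inlined. Weakness is deliberately not a parameter: by the time this
   is asked, the linker has already chosen the prevailing definition. */
int binds_locally(enum output_kind out, enum sym_visibility vis, int bsymbolic)
{
    if (out == OUT_EXECUTABLE)
        return 1;       /* definitions in the executable are not preempted */
    if (vis != VIS_DEFAULT)
        return 1;       /* protected/hidden/internal bind within the library */
    return bsymbolic;   /* default visibility binds locally only with -Bsymbolic */
}
```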

Ian

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [whopr] Design/implementation alternatives for the driver and WPA
  2008-06-04 20:56             ` Diego Novillo
@ 2008-06-05 15:10               ` Jan Hubicka
  2008-06-05 15:23                 ` Diego Novillo
  0 siblings, 1 reply; 61+ messages in thread
From: Jan Hubicka @ 2008-06-05 15:10 UTC (permalink / raw)
  To: Diego Novillo
  Cc: Kenneth Zadeck, Ollie Wild, Chris Lattner, gcc, Jan Hubicka,
	Rafael Espindola, Robert Hundt

Hi,
I am jumping in somewhat late, as yesterday I was in meetings without
internet access (and I will probably be offline again tomorrow).

I think that in basic terms we all mostly agree (we want to implement an
optimization scheme that does not need everything in memory, and we want
to parallelize the post-IPA compilation).  The linker interface seems
fine too.
> 
> WHOPR simply adds another alternative, if you are willing to only run
> summary-based transformations, we can split the analysis and
> transformation phases in two such that you can parallelize the work
> over a cluster or a large SMP.  That's it.  Nothing more.

I think one problem is that both repackaging and cherry picking as
described are very much centered on inlining.  It is probably quite
clear now that the list of optimizations we want to perform at LTO
scale is going to grow beyond the basic inlining + aliasing combo quite
soon, especially now that data structure changes are starting to kick in.
We would also need to sanely support partial offlining, cloning, etc.

This IMO should be considered somehow.  It is quite possible to
implement all this based on summaries, but we need to think about the
flexibility of the whole scheme and not overly limit it, at least in the
current stages of implementation.  If, for example, we ended up with
difficulties doing a struct-reorg style transformation that moves fields
within a structure, we would run into problems very soon.

I personally have always leaned toward a kind of repackaging scheme.  I've
hoped that with a sanely designed LTO dumping scheme, this will be
relatively straightforward to implement: you simply re-use the same
serialized functions as they are in the original .o files and replace
function summaries by transformation summaries, so we might pretty much
re-use the same infrastructure.  With a sane caching mechanism for keeping
unmodified function bodies in memory, in cooperation with GGC, the
repackaging stage should be possible to implement as a simple pass
through the callgraph writing the selected functions to the output file.
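The repackaging pass described above is, at its core, one walk over the callgraph per output file. A minimal sketch, assuming WPA has already assigned each function to a partition; struct cg_node and select_partition are hypothetical names, and a real implementation would copy each selected, already serialized body from its original .o file rather than collect names.

```c
#include <stddef.h>

/* Hypothetical callgraph node after WPA: each function body has been
   assigned to one output partition (one LTRANS unit). */
struct cg_node {
    const char *name;
    int partition;
};

/* One pass over the callgraph: collect, in callgraph order, the names
   of the functions assigned to PARTITION. OUT must have room for N
   entries. Returns the number of functions selected. */
size_t select_partition(const struct cg_node *nodes, size_t n,
                        int partition, const char **out)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (nodes[i].partition == partition)
            out[count++] = nodes[i].name;
    return count;
}
```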

Another advantage is that local but non-trivial changes to the program
can be done at LTO decision time, which would simplify the
inter-IPA-pass interaction that seems to be the scariest issue here.

Honza

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [whopr] Design/implementation alternatives for the driver and WPA
  2008-06-05 15:10               ` Jan Hubicka
@ 2008-06-05 15:23                 ` Diego Novillo
  0 siblings, 0 replies; 61+ messages in thread
From: Diego Novillo @ 2008-06-05 15:23 UTC (permalink / raw)
  To: Jan Hubicka
  Cc: Kenneth Zadeck, Ollie Wild, Chris Lattner, gcc, Jan Hubicka,
	Rafael Espindola, Robert Hundt

On Thu, Jun 5, 2008 at 11:09, Jan Hubicka <hubicka@ucw.cz> wrote:

> I think one problem is that both repackaging and cherry picking as
> described are very much centered on inlining.

No, that's simply the main application for the initial implementation.
Any other summary-based transformation can be supported the same way.
Optimizations that are not summary-based can be done the way they're
done today.  All that happens is that they won't be able to take
advantage of the partitioning and distribution, since WPA and LTRANS
will be executed together.

And of course, even summary-based transformations can be done the same
way they are done today.  The scaling aspects of WHOPR should only
kick in via a special option, or even via heuristics.

> I personally have always leaned toward a kind of repackaging scheme.  I've
> hoped that with a sanely designed LTO dumping scheme, this will be
> relatively straightforward to implement: you simply re-use the same
> serialized functions as they are in the original .o files and replace
> function summaries by transformation summaries, so we might pretty much
> re-use the same infrastructure.  With a sane caching mechanism for keeping
> unmodified function bodies in memory, in cooperation with GGC, the
> repackaging stage should be possible to implement as a simple pass
> through the callgraph writing the selected functions to the output file.

Sure.  All this is possible and we shouldn't break it.

Diego.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [whopr] plugin interface design
  2008-06-05 13:53                             ` Ian Lance Taylor
@ 2008-06-05 16:37                               ` Chris Lattner
  2008-06-05 17:39                                 ` Ian Lance Taylor
  0 siblings, 1 reply; 61+ messages in thread
From: Chris Lattner @ 2008-06-05 16:37 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Devang Patel, Nick Kledzik, GCC Mailing List


On Jun 5, 2008, at 6:51 AM, Ian Lance Taylor wrote:

> Chris Lattner <clattner@apple.com> writes:
>
>> I don't know how closely your plans follow this model.  If you think
>> this approach is reasonable, you really do need to reflect things like
>> symbol versions in your IR somehow.  This compiler must know about
>> versions, and when it does, it is easy to avoid optimizations that are
>> invalid for them.
>
> Sure.  But here's the thing: the gcc LTO approach involves having a
> regular object with a regular symbol table, and the IR is embedded in
> the object.  In other words, we do know the symbol version
> information: it's in the symbol table of the object.

Wow, that seems incredibly limiting.  This means that your LTO either  
has to:

1) treat the object header as part of the IR, or
2) avoid making any changes that would affect exported symbols

Is that right?  Why doesn't the "LTO reader" just read the symbol info  
from the ELF header and reflect it into the trees somehow?

-Chris

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [whopr] Design/implementation alternatives for the driver and WPA
  2008-06-05 14:00             ` Ian Lance Taylor
@ 2008-06-05 16:44               ` Chris Lattner
  2008-06-05 17:44                 ` Ian Lance Taylor
  0 siblings, 1 reply; 61+ messages in thread
From: Chris Lattner @ 2008-06-05 16:44 UTC (permalink / raw)
  To: Ian Lance Taylor
  Cc: Rafael Espindola, Nick Kledzik, Diego Novillo, GCC Mailing List,
	Kenneth Zadeck, Jan Hubicka, Ollie Wild, Robert Hundt,
	Devang Patel

On Jun 5, 2008, at 6:59 AM, Ian Lance Taylor wrote:

> "Rafael Espindola" <espindola@google.com> writes:
>
>> Interesting. The use of lto_codegen_add_must_preserve_symbol is kind
>> of the opposite of what I had understood. What do you do in this case:
>>
>> a.o: IL file that contains a reference to "f"
>> b.o: IL file that has a weak def of "f"
>>
>> There is no strong definition. Can you inline f into the use in a.o?
>
> I don't know what LLVM does, but in principle, in ELF, you can do this
> inlining when linking an executable, but not when linking a shared
> library.  Actually, when linking a shared library, what matters is not
> whether the definition of "f" is weak or not, but what the visibility
> of "f" is (default, hidden, protected, or internal).  And, of course,
> the visibility of "f" can be set by link-time options (e.g.,
> -Bsymbolic).

In LLVM LTO, the model is that the linker is the one that knows about
visibility.  The problem is that 'hidden' is not sufficient to capture
visibility info when mixing LTO modules with native ones.  If you have
a.c, b.c and c.c, and compile a.c and b.c with LTO but c.c without, any
hidden symbols should still be visible outside the LTO region formed by
a.o and b.o.

LLVM LTO handles this by marking symbols "internal" (aka static, aka  
not TREE_PUBLIC, whatever) when the symbol is not visible outside the  
LTO scope.  This allows the optimizers to go crazy and hack away at  
the symbols, but only when safe.

'Weakness' only matters when a symbol is exported from the LTO scope,  
so 'weak' and 'visibility' are orthogonal.
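The internalization step described above can be sketched as follows, assuming the linker has already reported the list of symbols that must be preserved (when building a shared library, that is every externally visible symbol). struct lto_symbol and internalize are hypothetical names; every symbol not on the list becomes internal, which is what lets the optimizers delete, clone or inline it freely.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical symbol record inside the LTO scope. */
struct lto_symbol {
    const char *name;
    int is_internal;   /* set when nothing outside the LTO scope can see it */
};

/* Return nonzero if NAME appears in LIST. */
static int in_list(const char *name, const char *const *list, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(name, list[i]) == 0)
            return 1;
    return 0;
}

/* Mark as internal every symbol the linker did not ask to preserve.
   Returns the number of symbols internalized. */
size_t internalize(struct lto_symbol *syms, size_t nsyms,
                   const char *const *preserve, size_t npreserve)
{
    size_t count = 0;
    for (size_t i = 0; i < nsyms; i++) {
        if (!in_list(syms[i].name, preserve, npreserve)) {
            syms[i].is_internal = 1;
            count++;
        }
    }
    return count;
}
```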

-Chris

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [whopr] plugin interface design
  2008-06-05 16:37                               ` Chris Lattner
@ 2008-06-05 17:39                                 ` Ian Lance Taylor
  2008-06-07 18:31                                   ` Chris Lattner
  0 siblings, 1 reply; 61+ messages in thread
From: Ian Lance Taylor @ 2008-06-05 17:39 UTC (permalink / raw)
  To: Chris Lattner; +Cc: Devang Patel, Nick Kledzik, GCC Mailing List

Chris Lattner <clattner@apple.com> writes:

> On Jun 5, 2008, at 6:51 AM, Ian Lance Taylor wrote:
>
>> Chris Lattner <clattner@apple.com> writes:
>>
>>> I don't know how closely your plans follow this model.  If you think
>>> this approach is reasonable, you really do need to reflect things like
>>> symbol versions in your IR somehow.  This compiler must know about
>>> versions, and when it does, it is easy to avoid optimizations that are
>>> invalid for them.
>>
>> Sure.  But here's the thing: the gcc LTO approach involves having a
>> regular object with a regular symbol table, and the IR is embedded in
>> the object.  In other words, we do know the symbol version
>> information: it's in the symbol table of the object.
>
> Wow, that seems incredibly limiting.  This means that your LTO either
> has to:
>
> 1) treat the object header as part of the IR, or
> 2) avoid making any changes that would affect exported symbols
>
> Is that right?  Why doesn't the "LTO reader" just read the symbol info  
> from the ELF header and reflect it into the trees somehow?

That would be fine.  It would require teaching the compiler about
symbol versioning and resolution rules which the linker already knows.
I sort of think that is unnecessary.  But I'm not opposed to it.

Of course there is the issue that some of this information also comes
from linker command line options.  That also has to be fed into the IR.

For example, earlier Nick suggested that LLVM will not inline a weak
symbol.  With ELF it is actually OK to inline a weak symbol when
generating an executable.  It is not OK when generating a shared
library, unless -Bsymbolic was used on the linker command line.  We
could represent these sorts of details directly in the compiler IR.
But I don't see a big advantage to doing so.

I'm proposing, instead, that the linker inform the compiler plugin
about this information based on link-time information.  That is a way
of representing it in the IR, of course.  But it seems to me to be
somewhat more pragmatic.

Incidentally, your choice 2 above doesn't follow.  The LTO compiler is
going to pass a new object file(s) back to the linker.  It doesn't
have to have the same set of exported symbols, except in cases where
the linker has directed that some symbol must be available.

Ian

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [whopr] Design/implementation alternatives for the driver and WPA
  2008-06-05 16:44               ` Chris Lattner
@ 2008-06-05 17:44                 ` Ian Lance Taylor
  2008-06-05 18:50                   ` Nick Kledzik
  0 siblings, 1 reply; 61+ messages in thread
From: Ian Lance Taylor @ 2008-06-05 17:44 UTC (permalink / raw)
  To: Chris Lattner
  Cc: Rafael Espindola, Nick Kledzik, Diego Novillo, GCC Mailing List,
	Kenneth Zadeck, Jan Hubicka, Ollie Wild, Robert Hundt,
	Devang Patel

Chris Lattner <clattner@apple.com> writes:

> LLVM LTO handles this by marking symbols "internal" (aka static, aka
> not TREE_PUBLIC, whatever) when the symbol is not visible outside the
> LTO scope.  This allows the optimizers to go crazy and hack away at
> the symbols, but only when safe.

How does the linker do this?  Are you saying that when generating a
shared library, the linker calls lto_codegen_add_must_preserve_symbol
for every externally visible symbol?

How does the linker tell LTO that a symbol may be inlined, but must
also be externally visible?

Ian

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [whopr] Design/implementation alternatives for the driver and WPA
  2008-06-05 17:44                 ` Ian Lance Taylor
@ 2008-06-05 18:50                   ` Nick Kledzik
  2008-06-05 21:03                     ` Ian Lance Taylor
  0 siblings, 1 reply; 61+ messages in thread
From: Nick Kledzik @ 2008-06-05 18:50 UTC (permalink / raw)
  To: Ian Lance Taylor
  Cc: Chris Lattner, Rafael Espindola, Diego Novillo, GCC Mailing List,
	Kenneth Zadeck, Jan Hubicka, Ollie Wild, Robert Hundt,
	Devang Patel


On Jun 5, 2008, at 10:43 AM, Ian Lance Taylor wrote:
> Chris Lattner <clattner@apple.com> writes:
>
>> LLVM LTO handles this by marking symbols "internal" (aka static, aka
>> not TREE_PUBLIC, whatever) when the symbol is not visible outside the
>> LTO scope.  This allows the optimizers to go crazy and hack away at
>> the symbols, but only when safe.
>
> How does the linker do this?  Are you saying that when generating a
> shared library, the linker calls lto_codegen_add_must_preserve_symbol
> for every externally visible symbol?
Yes.

> How does the linker tell LTO that a symbol may be inlined, but must
> also be externally visible?
The linker just tells LTO which symbols must remain.  The LTO engine
is free to inline anything that would improve codegen, with the exception
that any weak definition that must remain (preserved) cannot be inlined.

-Nick

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [whopr] Design/implementation alternatives for the driver and WPA
  2008-06-05 18:50                   ` Nick Kledzik
@ 2008-06-05 21:03                     ` Ian Lance Taylor
  2008-06-05 21:47                       ` Chris Lattner
  0 siblings, 1 reply; 61+ messages in thread
From: Ian Lance Taylor @ 2008-06-05 21:03 UTC (permalink / raw)
  To: Nick Kledzik
  Cc: Chris Lattner, Rafael Espindola, Diego Novillo, GCC Mailing List,
	Kenneth Zadeck, Jan Hubicka, Ollie Wild, Robert Hundt,
	Devang Patel

Nick Kledzik <kledzik@apple.com> writes:

>> How does the linker tell LTO that a symbol may be inlined, but must
>> also be externally visible?
> The linker just tells LTO which symbols must remain.  The LTO engine
> is free to inline anything that would improve codegen, with the exception
> that any weak definition that must remain (preserved) cannot be inlined.

I'll just note that that isn't optimal for ELF when producing an
executable.

Ian

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [whopr] Design/implementation alternatives for the driver and WPA
  2008-06-05 21:03                     ` Ian Lance Taylor
@ 2008-06-05 21:47                       ` Chris Lattner
  2008-06-06  1:22                         ` Ian Lance Taylor
  0 siblings, 1 reply; 61+ messages in thread
From: Chris Lattner @ 2008-06-05 21:47 UTC (permalink / raw)
  To: Ian Lance Taylor
  Cc: Nick Kledzik, Rafael Espindola, Diego Novillo, GCC Mailing List,
	Kenneth Zadeck, Jan Hubicka, Ollie Wild, Robert Hundt,
	Devang Patel


On Jun 5, 2008, at 2:03 PM, Ian Lance Taylor wrote:

> Nick Kledzik <kledzik@apple.com> writes:
>
>>> How does the linker tell LTO that a symbol may be inlined, but must
>>> also be externally visible?
>> The linker just tells LTO which symbols must remain.  The LTO engine
>> is free to inline anything that would improve codegen, with the exception
>> that any weak definition that must remain (preserved) cannot be inlined.
>
> I'll just note that that isn't optimal for ELF when producing an
> executable.

Why? Because you have to touch (worst case) every symbol?  The cost of  
doing LTO *dramatically* dwarfs the cost of touching symbols  
once.  :)  You're right this could be improved, and we're actively  
working on it... but it seems like a strange thing to worry about vs  
correctness in all cases.

-Chris

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [whopr] Design/implementation alternatives for the driver and WPA
  2008-06-05 21:47                       ` Chris Lattner
@ 2008-06-06  1:22                         ` Ian Lance Taylor
  2008-06-07 18:34                           ` Chris Lattner
  0 siblings, 1 reply; 61+ messages in thread
From: Ian Lance Taylor @ 2008-06-06  1:22 UTC (permalink / raw)
  To: Chris Lattner
  Cc: Nick Kledzik, Rafael Espindola, Diego Novillo, GCC Mailing List,
	Kenneth Zadeck, Jan Hubicka, Ollie Wild, Robert Hundt,
	Devang Patel

Chris Lattner <clattner@apple.com> writes:

> On Jun 5, 2008, at 2:03 PM, Ian Lance Taylor wrote:
>
>> Nick Kledzik <kledzik@apple.com> writes:
>>
>>>> How does the linker tell LTO that a symbol may be inlined, but must
>>>> also be externally visible?
>>> The linker just tells LTO which symbols must remain.  The LTO engine
>>> is free to inline anything that would improve codegen, with the
>>> exception
>>> that any weak definition that must remain (preserved) cannot be
>>> inlined.
>>
>> I'll just note that that isn't optimal for ELF when producing an
>> executable.
>
> Why? Because you have to touch (worst case) every symbol?  The cost of
> doing LTO *dramatically* dwarfs the cost of touching symbols  once.
> :)  You're right this could be improved, and we're actively  working
> on it... but it seems like a strange thing to worry about vs
> correctness in all cases.

Whoops, sorry, I meant the other thing.  Not inlining any weak
definition that must remain is not optimal.  When linking an
executable, it is perfectly OK to inline a weak function, even if the
weak symbol is required to remain in the final output file.  In
general if the symbol is known to be bound locally, then it is OK to
inline it.  This is separate from the question of whether the symbol
is visible externally.

Ian

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [whopr] plugin interface design
  2008-06-05 17:39                                 ` Ian Lance Taylor
@ 2008-06-07 18:31                                   ` Chris Lattner
  0 siblings, 0 replies; 61+ messages in thread
From: Chris Lattner @ 2008-06-07 18:31 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Devang Patel, Nick Kledzik, GCC Mailing List

On Jun 5, 2008, at 10:38 AM, Ian Lance Taylor wrote:
>>> Sure.  But here's the thing: the gcc LTO approach involves having a
>>> regular object with a regular symbol table, and the IR is embedded in
>>> the object.  In other words, we do know the symbol version
>>> information: it's in the symbol table of the object.
>>
>> Wow, that seems incredibly limiting.  This means that your LTO either
>> has to:
>>
>> 1) treat the object header as part of the IR, or
>> 2) avoid making any changes that would affect exported symbols
>>
>> Is that right?  Why doesn't the "LTO reader" just read the symbol info
>> from the ELF header and reflect it into the trees somehow?
>
> For example, earlier Nick suggested that LLVM will not inline a weak
> symbol.  With ELF it is actually OK to inline a weak symbol when
> generating an executable.  It is not OK when generating a shared
> library, unless -Bsymbolic was used on the linker command line.  We
> could represent these sorts of details directly in the compiler IR.
> But I don't see a big advantage to doing so.

Sure, IMO that information should be reflected in the IR.  As you  
know, I have a strong bias towards making the IR fully self contained  
and self describing.  You're right that this may not be absolutely  
required though.

> Incidentally, your choice 2 above doesn't follow.  The LTO compiler is
> going to pass a new object file(s) back to the linker.  It doesn't
> have to have the same set of exported symbols, except in cases where
> the linker has directed that some symbol must be available.

If you only have one IPO pass, it doesn't matter.  If you have two or  
more passes then the first pass needs some way to communicate the  
changes it made to the second and later passes.  Presumably you don't  
want to be hacking on the input object files, and the output object  
file doesn't exist yet.

-Chris

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [whopr] Design/implementation alternatives for the driver and WPA
  2008-06-06  1:22                         ` Ian Lance Taylor
@ 2008-06-07 18:34                           ` Chris Lattner
  0 siblings, 0 replies; 61+ messages in thread
From: Chris Lattner @ 2008-06-07 18:34 UTC (permalink / raw)
  To: Ian Lance Taylor
  Cc: Nick Kledzik, Rafael Espindola, Diego Novillo, GCC Mailing List,
	Kenneth Zadeck, Jan Hubicka, Ollie Wild, Robert Hundt,
	Devang Patel

On Jun 5, 2008, at 6:18 PM, Ian Lance Taylor wrote:
>>>>> How does the linker tell LTO that a symbol may be inlined, but must
>>>>> also be externally visible?
>>>> The linker just tells LTO which symbols must remain.  The LTO engine
>>>> is free to inline anything that would improve codegen, with the
>>>> exception that any weak definition that must remain (preserved)
>>>> cannot be inlined.
>>>
>>> I'll just note that that isn't optimal for ELF when producing an
>>> executable.
>>
>> Why? Because you have to touch (worst case) every symbol?  The cost of
>> doing LTO *dramatically* dwarfs the cost of touching symbols once.
>> :)  You're right this could be improved, and we're actively  working
>> on it... but it seems like a strange thing to worry about vs
>> correctness in all cases.
>
> Whoops, sorry, I meant the other thing.  Not inlining any weak
> definition that must remain is not optimal.  When linking an
> executable, it is perfectly OK to inline a weak function, even if the
> weak symbol is required to remain in the final output file.  In
> general if the symbol is known to be bound locally, then it is OK to
> inline it.  This is separate from the question of whether the symbol
> is visible externally.

This isn't an issue in general.  Just add a new linkage type that is  
"weak but allows inlining" or whatever semantics you want.  There is  
no great need for your compiler IR to match the linkage model.  LLVM  
IR currently encompasses the linkage types used in MachO as well as  
most in ELF, but it also has types that are neither.  The "linkonce"  
linkage type in LLVM is most similar to "non-external weak" linkage  
in ELF combined with some random flags that hang out in GCC's cgraph  
nodes.  If a symbol with that linkage ends up in a .o file, ELF sees  
it as 'non-external weak' linkage.
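To illustrate the point that IR linkage kinds need not match the object-file model one-to-one, here is a hypothetical mapping in which several IR kinds, including the suggested "weak but allows inlining" kind, collapse onto the same ELF binding once a symbol reaches the output .o file. The enum names and elf_binding_for are invented for this sketch; only the numeric binding values follow the ELF gABI (STB_LOCAL = 0, STB_GLOBAL = 1, STB_WEAK = 2).

```c
/* Hypothetical IR linkage kinds; IR_WEAK_INLINABLE stands for the
   "weak but allows inlining" kind suggested above. */
enum ir_linkage {
    IR_EXTERNAL,
    IR_WEAK,
    IR_WEAK_INLINABLE,
    IR_LINKONCE,
    IR_INTERNAL
};

/* ELF symbol binding values, same numbers as STB_* in the gABI. */
enum { ELF_STB_LOCAL = 0, ELF_STB_GLOBAL = 1, ELF_STB_WEAK = 2 };

/* Binding a symbol gets if it survives into the output object file.
   Several IR kinds share one ELF binding: the extra distinctions only
   steer the optimizers and are not visible to the system linker. */
int elf_binding_for(enum ir_linkage l)
{
    switch (l) {
    case IR_EXTERNAL:
        return ELF_STB_GLOBAL;
    case IR_WEAK:
    case IR_WEAK_INLINABLE:
    case IR_LINKONCE:
        return ELF_STB_WEAK;
    case IR_INTERNAL:
        return ELF_STB_LOCAL;
    }
    return -1;  /* not reached for valid enum values */
}
```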

-Chris

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [whopr] Design/implementation alternatives for the driver and WPA
  2008-06-03 16:46 [whopr] Design/implementation alternatives for the driver and WPA Diego Novillo
                   ` (2 preceding siblings ...)
  2008-06-04 16:31 ` Mark Mitchell
@ 2008-07-04  3:31 ` Cary Coutant
  2008-07-04  6:28   ` Ian Lance Taylor
  2008-07-04 13:43   ` Rafael Espindola
  3 siblings, 2 replies; 61+ messages in thread
From: Cary Coutant @ 2008-07-04  3:31 UTC (permalink / raw)
  To: Diego Novillo
  Cc: gcc, Kenneth Zadeck, Jan Hubicka, Rafael Espindola, Ollie Wild,
	Robert Hundt

> We've started working on the driver and WPA components for whopr.
> These are some of our initial thoughts and implementation strategy.  I
> have linked these to the WHOPR page as well.  I'm hoping we can
> discuss these at the Summit BoF, so I'm posting them now to start the
> discussion.

I've updated the WHOPR Driver wiki page with our latest thoughts on
the plug-in interface:

 http://gcc.gnu.org/wiki/whopr/driver

-cary

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [whopr] Design/implementation alternatives for the driver and WPA
  2008-07-04  3:31 ` Cary Coutant
@ 2008-07-04  6:28   ` Ian Lance Taylor
  2008-07-04 22:58     ` Daniel Jacobowitz
  2008-07-06  7:30     ` Cary Coutant
  2008-07-04 13:43   ` Rafael Espindola
  1 sibling, 2 replies; 61+ messages in thread
From: Ian Lance Taylor @ 2008-07-04  6:28 UTC (permalink / raw)
  To: Cary Coutant
  Cc: Diego Novillo, gcc, Kenneth Zadeck, Jan Hubicka,
	Rafael Espindola, Ollie Wild, Robert Hundt

"Cary Coutant" <ccoutant@google.com> writes:

>> We've started working on the driver and WPA components for whopr.
>> These are some of our initial thoughts and implementation strategy.  I
>> have linked these to the WHOPR page as well.  I'm hoping we can
>> discuss these at the Summit BoF, so I'm posting them now to start the
>> discussion.
>
> I've updated the WHOPR Driver wiki page with our latest thoughts on
> the plug-in interface:
>
>  http://gcc.gnu.org/wiki/whopr/driver

A few comments.

* "End of first pass" may be a little gold specific.  Perhaps it
  should be called something like "after all input files have been
  seen."

* The linker does normally copy unrecognized sections with the
  SHF_ALLOC bit clear to the output file.  It doesn't allocate address
  space for them, but it does copy them.  I think this follows the ELF
  ABI.  I don't know of any generic way to direct the linker to not
  copy sections to the output file.

* Do we need to worry about the type of the symbol in the "add
  symbols" interface?  For example, what about a TLS symbol?  Also,
  when the GNU linker sees a common symbol in a regular object and a
  symbol with the same name in a shared library, the action depends on
  the type of the symbol in the shared library.  For STT_OBJECT, the
  common symbol becomes an undefined reference to the shared library.
  For STT_FUNC, it does not.  Gold does not currently behave this
  way--the common symbol always overrides.  But in any case, there is
  some precedent for worrying about symbol type.

* The command line arguments should explicitly be placed in the
  transfer vector in the order in which they appear on the command
  line.

* Type names ending in "_t" are reserved by POSIX.  We shouldn't use
  them (I'm looking at ld_plugin_status_t).

* GOLD_VERSION should perhaps say something about the format of the
  string.

* I guess that having the message hook take a va_list is most
  flexible, but it is inconvenient for typical uses.  Taking a
  variable number of arguments would be more convenient.  Or it might
  be reasonable to just take a string, and push formatting to the
  plugin.
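Two of the interface points above, avoiding "_t" type names and the shape of the message hook, can be sketched together. Everything here is hypothetical (the enumerator names, ld_message_v and ld_message are not taken from the wiki page): a tagged enum replaces ld_plugin_status_t, and a variadic wrapper sits on top of a va_list core, so callers get the convenient form while the flexible one remains available.

```c
#include <stdarg.h>
#include <stdio.h>

/* A tagged enum instead of a typedef ending in "_t", which POSIX
   reserves. Enumerator names are hypothetical. */
enum ld_plugin_status {
    LD_PLUGIN_OK = 0,
    LD_PLUGIN_ERROR
};

/* Last formatted message, kept so callers can inspect it. */
static char last_message[256];

/* Core hook taking a va_list: the flexible form that wrappers build on. */
enum ld_plugin_status ld_message_v(const char *format, va_list args)
{
    if (vsnprintf(last_message, sizeof last_message, format, args) < 0)
        return LD_PLUGIN_ERROR;
    fputs(last_message, stderr);
    fputc('\n', stderr);
    return LD_PLUGIN_OK;
}

/* Variadic convenience wrapper for the typical use. */
enum ld_plugin_status ld_message(const char *format, ...)
{
    va_list args;
    va_start(args, format);
    enum ld_plugin_status status = ld_message_v(format, args);
    va_end(args);
    return status;
}
```

The third option mentioned above, taking a single preformatted string, would let the linker drop the va_list entirely and leave formatting to the plugin.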

Ian

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [whopr] Design/implementation alternatives for the driver and WPA
  2008-07-04  3:31 ` Cary Coutant
  2008-07-04  6:28   ` Ian Lance Taylor
@ 2008-07-04 13:43   ` Rafael Espindola
  2008-07-06 14:22     ` Cary Coutant
  1 sibling, 1 reply; 61+ messages in thread
From: Rafael Espindola @ 2008-07-04 13:43 UTC (permalink / raw)
  To: Cary Coutant
  Cc: Diego Novillo, gcc, Kenneth Zadeck, Jan Hubicka, Ollie Wild,
	Robert Hundt

> I've updated the WHOPR Driver wiki page with our latest thoughts on
> the plug-in interface:
>
>  http://gcc.gnu.org/wiki/whopr/driver

Very nice! Just one comment:

For the "claim file" hook, can you also pass the file size when the
file is a member of an archive?
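Why the size matters: for an archive member, a plugin given only a descriptor and a starting offset has no way to tell where the member ends. A minimal sketch, assuming the archive contents are already in memory (member_is_elf is a hypothetical helper, not part of the proposed interface), that bounds-checks the member and tests for the ELF magic bytes before claiming it.

```c
#include <stddef.h>
#include <string.h>

/* An archive member occupies [offset, offset + filesize) inside the
   archive contents. Return nonzero if that range lies within the
   archive and starts with the ELF magic "\177ELF". */
int member_is_elf(const unsigned char *archive, size_t archive_len,
                  size_t offset, size_t filesize)
{
    /* Reject members that do not fit inside the archive
       (written this way to avoid overflow in offset + filesize). */
    if (offset > archive_len || filesize > archive_len - offset)
        return 0;
    /* Too small to hold even the 4-byte ELF magic. */
    if (filesize < 4)
        return 0;
    return memcmp(archive + offset, "\177ELF", 4) == 0;
}
```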


> -cary
>


Cheers,
-- 
Rafael Avila de Espindola


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [whopr] Design/implementation alternatives for the driver and  WPA
  2008-07-04  6:28   ` Ian Lance Taylor
@ 2008-07-04 22:58     ` Daniel Jacobowitz
  2008-07-06  7:30     ` Cary Coutant
  1 sibling, 0 replies; 61+ messages in thread
From: Daniel Jacobowitz @ 2008-07-04 22:58 UTC
  To: Ian Lance Taylor
  Cc: Cary Coutant, Diego Novillo, gcc, Kenneth Zadeck, Jan Hubicka,
	Rafael Espindola, Ollie Wild, Robert Hundt

On Thu, Jul 03, 2008 at 09:48:11PM -0700, Ian Lance Taylor wrote:
> * The linker does normally copy unrecognized sections with the
>   SHF_ALLOC bit clear to the output file.  It doesn't allocate address
>   space for them, but it does copy them.  I think this follows the ELF
>   ABI.  I don't know of any generic way to direct the linker to not
>   copy sections to the output file.

Didn't someone say on the gABI list recently that they had a bit for
this?  Ah, H. J. proposed it and Rod Evans from Sun said they already
had an SHF_EXCLUDE:

http://groups.google.com/group/generic-abi/browse_thread/thread/5cf669951cb2eef1

-- 
Daniel Jacobowitz
CodeSourcery

* Re: [whopr] Design/implementation alternatives for the driver and WPA
  2008-07-04  6:28   ` Ian Lance Taylor
  2008-07-04 22:58     ` Daniel Jacobowitz
@ 2008-07-06  7:30     ` Cary Coutant
  2008-07-07  6:13       ` Ian Lance Taylor
  1 sibling, 1 reply; 61+ messages in thread
From: Cary Coutant @ 2008-07-06  7:30 UTC
  To: Ian Lance Taylor
  Cc: Diego Novillo, gcc, Kenneth Zadeck, Jan Hubicka,
	Rafael Espindola, Ollie Wild, Robert Hundt

> * "End of first pass" may be a little gold specific.  Perhaps it
>  should be called something like "after all input files have been
>  seen."

Sure. It seems to me that the pass 1, middle, pass 2 breakdown is
pretty common for linkers, though perhaps not universal. I'll find a
better name (I wasn't really happy with this name anyway).

> * The linker does normally copy unrecognized sections with the
>  SHF_ALLOC bit clear to the output file.  It doesn't allocate address
>  space for them, but it does copy them.  I think this follows the ELF
>  ABI.  I don't know of any generic way to direct the linker to not
>  copy sections to the output file.

OK, as Daniel suggested, we could have the compiler set the
SHF_EXCLUDE bit as well for those sections, and add support for that
in gold (if it's not already there).
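A minimal sketch of the copy rule being proposed, assuming the linker only has to classify an input section it does not otherwise recognize. SHF_ALLOC (0x2) is the gABI value and SHF_EXCLUDE (0x80000000) is the Solaris value that GNU tools also use; they are defined locally so the sketch does not depend on a particular <elf.h>. The enum and function names are hypothetical and not taken from gold.

```c
#include <stdint.h>

/* ELF section flag values, defined locally for the sketch. */
#define SKETCH_SHF_ALLOC   0x2u
#define SKETCH_SHF_EXCLUDE 0x80000000u

/* Hypothetical outcomes for an unrecognized input section. */
enum section_action {
    SECTION_DROP,      /* not copied to the output file */
    SECTION_COPY,      /* copied, but no address space allocated */
    SECTION_ALLOCATE   /* copied and given address space */
};

/* Hypothetical helper.  Only non-allocated sections are considered for
   SHF_EXCLUDE here, which is the case under discussion (compiler IR
   sections); how a linker should treat SHF_ALLOC|SHF_EXCLUDE is outside
   the scope of this sketch. */
static enum section_action classify_input_section(uint64_t sh_flags)
{
    if (sh_flags & SKETCH_SHF_ALLOC)
        return SECTION_ALLOCATE;
    if (sh_flags & SKETCH_SHF_EXCLUDE)
        return SECTION_DROP;       /* e.g. IR meant only for the plugin */
    return SECTION_COPY;           /* current default behavior */
}
```

With the compiler setting SHF_EXCLUDE on its IR sections, a linker that honors the bit drops them, while one that ignores it falls back to the existing "copy non-allocated sections" behavior.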

> * Do we need to worry about the type of the symbol in the "add
>  symbols" interface?  For example, what about a TLS symbol?  Also,
>  when the GNU linker sees a common symbol in a regular object and a
>  symbol with the same name in a shared library, the action depends on
>  the type of the symbol in the shared library.  For STT_OBJECT, the
>  common symbol becomes an undefined reference to the shared library.
>  For STT_FUNC, it does not.  Gold does not currently behave this
>  way--the common symbol always overrides.  But in any case, there is
>  some precedent for worrying about symbol type.

I don't think so, but I'll take a closer look. I think we don't really
need to worry about the type of the symbol until we get the real .o
file.
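To make the GNU ld behavior Ian describes concrete, here is a minimal sketch of the decision, assuming only the two symbol types he mentions matter. The STT_* values are the ELF gABI ones (STT_OBJECT = 1, STT_FUNC = 2); the enum, the function, and the flag selecting GNU ld versus gold behavior are hypothetical and not taken from either linker's source.

```c
#include <stdbool.h>

#define SKETCH_STT_OBJECT 1
#define SKETCH_STT_FUNC   2

/* Hypothetical result of resolving a common symbol from a regular
   object against a same-named symbol in a shared library. */
enum common_resolution {
    COMMON_DEFINES_SYMBOL,  /* the common symbol wins */
    REFERENCE_SHARED_DEF    /* common becomes an undefined reference */
};

static enum common_resolution
resolve_common_vs_shared(int shlib_type, bool gnu_ld_rules)
{
    /* Gold, as described in the thread: the common symbol always
       overrides. */
    if (!gnu_ld_rules)
        return COMMON_DEFINES_SYMBOL;
    /* GNU ld: a data object in the shared library takes over the common
       symbol; a function does not.  Other types (e.g. STT_TLS) are not
       covered by the description; treating them like STT_FUNC here is
       an assumption. */
    if (shlib_type == SKETCH_STT_OBJECT)
        return REFERENCE_SHARED_DEF;
    return COMMON_DEFINES_SYMBOL;
}
```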

> * The command line arguments should explicitly be placed in the
>  transfer vector in the order in which they appear on the command
>  line.

OK.
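A minimal sketch of what "in the order in which they appear" means for the transfer vector. The entry layout (a tag plus a string payload), the tag names, and the helper are assumptions made for illustration; they are not the definitions on the wiki page.

```c
#include <stddef.h>

/* Hypothetical tags for transfer-vector entries. */
enum tv_tag { TV_NULL = 0, TV_OPTION = 1 };

/* Hypothetical transfer-vector entry. */
struct tv_entry {
    enum tv_tag tag;
    const char *string;
};

/* Append one TV_OPTION entry per plugin option, walking the options
   front to back so their relative order matches the command line, then
   terminate the vector with a TV_NULL entry.  Returns the number of
   entries written (including the terminator), or 0 if cap is too
   small. */
static size_t build_option_tv(struct tv_entry *tv, size_t cap,
                              const char *const *opts, size_t nopts)
{
    if (cap < nopts + 1)
        return 0;
    for (size_t i = 0; i < nopts; i++) {
        tv[i].tag = TV_OPTION;
        tv[i].string = opts[i];
    }
    tv[nopts].tag = TV_NULL;
    tv[nopts].string = NULL;
    return nopts + 1;
}
```

Preserving command-line order lets a plugin apply the usual "last option wins" convention when the same option is given more than once.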

> * Type names ending in "_t" are reserved by POSIX.  We shouldn't use
>  them (I'm looking at ld_plugin_status_t).

Oops, forgot about that.
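One way to avoid the reserved "_t" suffix is to declare a tagged enum with no typedef and spell out the enum keyword at each use. A minimal sketch; the enumerators, the hook typedef, and the example function are hypothetical, not the names on the wiki page.

```c
/* Instead of a typedef named ld_plugin_status_t (the "_t" suffix is
   reserved by POSIX), use a tagged enum. */
enum ld_plugin_status {
    LDPS_OK = 0,   /* success */
    LDPS_ERR       /* generic failure */
};

/* Hypothetical hook type that returns the tagged enum. */
typedef enum ld_plugin_status (*ld_plugin_cleanup_fn)(void);

/* Hypothetical plugin callback matching that type. */
static enum ld_plugin_status sketch_cleanup(void)
{
    return LDPS_OK;
}
```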

> * GOLD_VERSION should perhaps say something about the format of the
>  string.

OK. What would be reasonable to say here? Just a string of the form
"n.m"? Is it reasonable to require that later versions are lexically
greater than earlier versions (e.g., can't have "1.9" then "1.10"), or
is it OK to require parsing the string to do comparisons?

> * I guess that having the message hook take a va_list is most
>  flexible, but it is inconvenient for typical uses.  Taking a
>  variable number of arguments would be more convenient.  Or it might
>  be reasonable to just take a string, and push formatting to the
>  plugin.

Yeah, I almost put "..." there instead. Probably better than va_list.
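A sketch of the variadic form, assuming the linker implements the hook on top of vfprintf. The severity levels, the hook typedef, and both functions are hypothetical. Internally the linker can still route formatting through a va_list-taking routine; only the interface the plugin sees changes.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical severity levels. */
enum msg_level { MSG_INFO, MSG_WARNING, MSG_ERROR };

/* Hypothetical hook type handed to the plugin: printf-style, with a
   variable number of arguments instead of a va_list. */
typedef int (*message_hook)(enum msg_level level, const char *format, ...);

static const char *const level_names[] = { "info", "warning", "error" };

/* Internal linker routine: prefix, formatted text, newline.
   Returns the number of characters written, or a negative value. */
static int vreport(FILE *out, enum msg_level level,
                   const char *format, va_list ap)
{
    int n = fprintf(out, "plugin %s: ", level_names[level]);
    if (n < 0)
        return n;
    int m = vfprintf(out, format, ap);
    if (m < 0)
        return m;
    if (fputc('\n', out) == EOF)
        return -1;
    return n + m + 1;
}

/* Linker-side implementation of the hook: gather the variable
   arguments and forward them to the va_list routine. */
static int linker_message(enum msg_level level, const char *format, ...)
{
    va_list ap;
    va_start(ap, format);
    int ret = vreport(stderr, level, format, ap);
    va_end(ap);
    return ret;
}
```

A typical plugin call is then a single line such as `message(MSG_WARNING, "%s: no symbols", name);`, with no va_list setup on the plugin side.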

-cary

* Re: [whopr] Design/implementation alternatives for the driver and WPA
  2008-07-04 13:43   ` Rafael Espindola
@ 2008-07-06 14:22     ` Cary Coutant
  0 siblings, 0 replies; 61+ messages in thread
From: Cary Coutant @ 2008-07-06 14:22 UTC
  To: Rafael Espindola
  Cc: Diego Novillo, gcc, Kenneth Zadeck, Jan Hubicka, Ollie Wild,
	Robert Hundt

> On the "claim file", can you also pass the "file" size in the case it
> is inside an archive?

Good idea. Will do.
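With the member size available, the plugin can treat an archive member as a byte range within an already-open file. A minimal sketch, assuming the linker passes a descriptor, the member's offset, and its size; the struct, the claim function, and the 4-byte magic it checks for are all hypothetical. It reads with POSIX pread, and the size check keeps it from reading past a short member into whatever follows it in the archive.

```c
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical description of an input file passed to "claim file". */
struct input_file {
    const char *name;   /* archive or object file name */
    int fd;             /* open descriptor for the whole file */
    off_t offset;       /* start of this member (0 for a plain .o) */
    off_t filesize;     /* size of this member, not of the archive */
};

/* Hypothetical magic marking a file the plugin wants to claim. */
static const char ir_magic[4] = { 'I', 'R', '0', '1' };

/* Claim the member if it starts with ir_magic.  A member shorter than
   the magic is rejected without reading. */
static bool claim_file(const struct input_file *f)
{
    char buf[sizeof ir_magic];
    if (f->filesize < (off_t) sizeof buf)
        return false;
    ssize_t n = pread(f->fd, buf, sizeof buf, f->offset);
    return n == (ssize_t) sizeof buf
           && memcmp(buf, ir_magic, sizeof buf) == 0;
}
```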

-cary

* Re: [whopr] Design/implementation alternatives for the driver and WPA
  2008-07-06  7:30     ` Cary Coutant
@ 2008-07-07  6:13       ` Ian Lance Taylor
  0 siblings, 0 replies; 61+ messages in thread
From: Ian Lance Taylor @ 2008-07-07  6:13 UTC
  To: Cary Coutant; +Cc: gcc

"Cary Coutant" <ccoutant@google.com> writes:

>> * GOLD_VERSION should perhaps say something about the format of the
>>  string.
>
> OK. What would be reasonable to say here? Just a string of the form
> "n.m"? Is it reasonable to require that later versions are lexically
> greater than earlier versions (e.g., can't have "1.9" then "1.10"), or
> is it OK to require parsing the string to do comparisons?

I think we should either use an n.m[.o] string as you suggest and say
nothing about lexical ordering (that is, require parsing the string to
compare versions), or we should say it's a number, e.g., n * 100 + m.
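The two options can be compared in a short sketch, assuming the n.m[.o] format with a missing third component treated as 0. Plain string comparison orders "1.10" before "1.9", which is why the string must either be parsed or replaced by a single number. All function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Naive approach: a later version is a lexically greater string.
   This breaks once a component reaches two digits. */
static int lexically_newer(const char *a, const char *b)
{
    return strcmp(a, b) > 0;
}

/* Parse "n.m" or "n.m.o"; a missing third component counts as 0.
   Returns 0 on success, -1 if there is no leading "n.m". */
static int parse_version(const char *s, int v[3])
{
    v[2] = 0;
    return sscanf(s, "%d.%d.%d", &v[0], &v[1], &v[2]) >= 2 ? 0 : -1;
}

/* Option 1: parse both strings and compare component by component.
   Stores <0, 0 or >0 in *result; returns -1 on a malformed string. */
static int compare_versions(const char *a, const char *b, int *result)
{
    int va[3], vb[3];
    if (parse_version(a, va) != 0 || parse_version(b, vb) != 0)
        return -1;
    *result = 0;
    for (int i = 0; i < 3 && *result == 0; i++)
        *result = (va[i] > vb[i]) - (va[i] < vb[i]);
    return 0;
}

/* Option 2: publish the version as one integer, n * 100 + m.
   Integer comparison then works as long as m stays below 100. */
static int version_number(int n, int m)
{
    return n * 100 + m;
}
```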

Ian

Thread overview: 61+ messages
2008-06-03 16:46 [whopr] Design/implementation alternatives for the driver and WPA Diego Novillo
2008-06-04  2:27 ` Chris Lattner
2008-06-04  7:28   ` Rafael Espindola
2008-06-04 16:34     ` Chris Lattner
2008-06-04 16:48       ` Rafael Espindola
2008-06-04 13:00   ` Diego Novillo
2008-06-04 15:28     ` Kenneth Zadeck
2008-06-04 15:54       ` Ian Lance Taylor
2008-06-04 16:50         ` Kenneth Zadeck
2008-06-04 17:05           ` Diego Novillo
2008-06-04 17:37           ` Ian Lance Taylor
2008-06-04 16:15       ` Chris Lattner
     [not found]         ` <65dd6fd50806041223l1871ecfbh384aa175c3da0645@mail.gmail.com>
2008-06-04 19:30           ` Fwd: " Ollie Wild
     [not found]           ` <89069638-6D2B-4AE6-ACB3-99A2B09091BA@apple.com>
2008-06-04 20:02             ` Ollie Wild
2008-06-04 23:59             ` Diego Novillo
2008-06-04 20:03           ` Kenneth Zadeck
2008-06-04 20:30             ` Ian Lance Taylor
2008-06-04 20:56             ` Diego Novillo
2008-06-05 15:10               ` Jan Hubicka
2008-06-05 15:23                 ` Diego Novillo
2008-06-04 14:28   ` Ian Lance Taylor
2008-06-04 16:29     ` Chris Lattner
2008-06-04 16:41       ` Chris Lattner
2008-06-04 18:48       ` Devang Patel
2008-06-04 19:45       ` Ian Lance Taylor
2008-06-04 20:38         ` Nick Kledzik
2008-06-04 20:46           ` Ian Lance Taylor
2008-06-04 21:43             ` Nick Kledzik
2008-06-05  0:01               ` Ian Lance Taylor
2008-06-05  0:20                 ` Nick Kledzik
2008-06-05  0:43                   ` Ian Lance Taylor
2008-06-05  1:09                     ` Nick Kledzik
2008-06-05  5:07                       ` Devang Patel
2008-06-05  5:43                         ` Ian Lance Taylor
2008-06-05  6:09                           ` [whopr] plugin interface design Chris Lattner
2008-06-05 13:53                             ` Ian Lance Taylor
2008-06-05 16:37                               ` Chris Lattner
2008-06-05 17:39                                 ` Ian Lance Taylor
2008-06-07 18:31                                   ` Chris Lattner
2008-06-05  5:44                       ` [whopr] Design/implementation alternatives for the driver and WPA Ian Lance Taylor
2008-06-05  8:41           ` Rafael Espindola
2008-06-05 14:00             ` Ian Lance Taylor
2008-06-05 16:44               ` Chris Lattner
2008-06-05 17:44                 ` Ian Lance Taylor
2008-06-05 18:50                   ` Nick Kledzik
2008-06-05 21:03                     ` Ian Lance Taylor
2008-06-05 21:47                       ` Chris Lattner
2008-06-06  1:22                         ` Ian Lance Taylor
2008-06-07 18:34                           ` Chris Lattner
     [not found]   ` <65dd6fd50806032310u2bda0953qb911e3ccfe3f305e@mail.gmail.com>
2008-06-04 19:29     ` Fwd: " Ollie Wild
2008-06-04 14:45 ` Ian Lance Taylor
2008-06-04 14:48   ` Diego Novillo
2008-06-04 15:28   ` Rafael Espindola
2008-06-04 16:31 ` Mark Mitchell
2008-07-04  3:31 ` Cary Coutant
2008-07-04  6:28   ` Ian Lance Taylor
2008-07-04 22:58     ` Daniel Jacobowitz
2008-07-06  7:30     ` Cary Coutant
2008-07-07  6:13       ` Ian Lance Taylor
2008-07-04 13:43   ` Rafael Espindola
2008-07-06 14:22     ` Cary Coutant
