Questions about LTO infrastructure and pragma omp target

public inbox for gcc@gcc.gnu.org

* Questions about LTO infrastructure and pragma omp target
@ 2013-08-15 13:44 Ilya Verbin
  2013-08-15 15:00 ` Jakub Jelinek
  0 siblings, 1 reply; 25+ messages in thread
From: Ilya Verbin @ 2013-08-15 13:44 UTC (permalink / raw)
  To: jakub, hubicka, rth, kirill.yukhin; +Cc: gcc

Hi All,

I'm trying to figure out how the LTO infrastructure works at a high level.
I want to make sure that I understand this correctly.  Could you please
help me with that?

1.  Execution flow.  As far as I understand, there are two modes of
operation: with and without the LTO plugin.  Below are the execution
flows for each mode.

Without LTO plugin:

gcc -flto      # Call GCC driver
 |_ cc1        # Compile first source file into asm + intermediate language
 |_ as         # Assemble these asm + IL into temporary object file
 |_ ...        # Compile and assemble all remaining source files
 |_ collect2   # Call linker driver
     |_ lto-wrapper    # Call lto-wrapper directly from collect2
     |   |_ gcc        # Driver
     |   |   |_ lto1   # Perform WPA and split into partitions
     |   |_ gcc        # Driver
     |   |   |_ lto1   # Perform LTRANS for the first partition
     |   |   |_ as     # Assemble this partition into final object file
     |   |_ ...        # Perform LTRANS for each partition
     |_ collect-ld     # Simple wrapper over ld
         |_ ld         # Perform linking

Using LTO plugin:

gcc -flto      # Call GCC driver
 |_ cc1        # Compile first source file into asm + intermediate language
 |_ as         # Assemble these asm + IL into temporary object file
 |_ ...        # Compile and assemble all remaining source files
 |_ collect2   # Call linker driver
     |_ collect-ld   # Simple wrapper over ld
         |_ ld with liblto_plugin.so   # Perform LTO and linking
             |_ lto-wrapper    # Is called from liblto_plugin.so
                 |_ gcc        # Driver
                 |   |_ lto1   # Perform WPA and split into partitions
                 |_ gcc        # Driver
                 |   |_ lto1   # Perform LTRANS for the first partition
                 |   |_ as     # Assemble this partition into final object file
                 |_ ...        # Perform LTRANS for each partition

Are they correct?

2.  The second question is about the #pragma omp target implementation.
I'm going to reuse the LTO approach in a prototype that will produce two
binaries, one for the host and one for the target architecture.  The
target binary will contain functions outlined from omp target regions
and some infrastructure to run them.
To produce two binaries we need to run gcc and ld twice.  On the first
run, gcc will generate an object file that contains optimized code for
the host and GIMPLE for the target.  On the second run, gcc will read
the GIMPLE and generate optimized code for the target.

So, the question is: what is the right place for the second run of gcc
and ld?  Should I insert them into liblto_plugin.so?  Or should I create
an entirely new plugin that will only call gcc and ld for the target,
without performing any LTO optimizations for the host?
Suggestions?

----
Thanks,
Ilya Verbin,
Software Engineer
Intel Corporation

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Questions about LTO infrastructure and pragma omp target
  2013-08-15 13:44 Questions about LTO infrastructure and pragma omp target Ilya Verbin
@ 2013-08-15 15:00 ` Jakub Jelinek
  2013-08-15 19:19   ` Richard Biener
  0 siblings, 1 reply; 25+ messages in thread
From: Jakub Jelinek @ 2013-08-15 15:00 UTC (permalink / raw)
  To: Ilya Verbin; +Cc: hubicka, rth, kirill.yukhin, gcc

On Thu, Aug 15, 2013 at 05:36:39PM +0400, Ilya Verbin wrote:
> 2.  The second question, regarding #pragma omp target implementation.
> I'm going to reuse LTO approach in a prototype, that will produce 2
> binaries - for host and target architectures.  Target binary will contain
> functions outlined from omp target region and some infrastructure to run
> them.
> To produce 2 binaries we need to run gcc and ld twice.  At the first run
> gcc will generate object file, that contains optimized code for host and
> GIMPLE for target.  At the second run gcc will read the GIMPLE and
> generate optimized code for target.
> 
> So, the question is - what is the right place for the second run of gcc
> and ld?  Should I insert them into liblto_plugin.so?  Or should I create
> entirely new plugin, that will only call gcc and ld for target, without
> performing any LTO optimizations for host?
> Suggestions?

The rough plan (partly discussed at the accelerator BoF) was that we would
stream LTO bytecode into a special section somewhere during the ompexp pass
or so (note: right now LTO streaming streams everything in a TU; we'd want
to stream only the routines with the "omp declare target" attribute, the
outlined #pragma omp target regions, and the vars, types, etc. referenced
from them), then have some other linker plugin (-fopenmp/-fopenacc) that
would recognize these special sections and run lto1 on them (if not -flto,
in some mode that would just compile each TU separately or something).
Then we need to link it together and let the linker put it into some
section of the host binary or shared library.
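The subset described above (routines with "omp declare target", outlined
target regions, and whatever they reference) amounts to a transitive
closure over the reference graph.  A minimal sketch, assuming a toy graph
keyed by symbol name; the symbol names and the function are illustrative
only and are not GCC's symtab API:

```python
from collections import deque


def target_subset(refs, roots):
    """Return every symbol reachable from `roots` through `refs`.

    refs:  dict mapping a symbol to the symbols it references
           (callees, variables, types); a toy stand-in for the symtab
    roots: "omp declare target" symbols plus outlined target bodies
    """
    seen = set()
    work = deque(roots)
    while work:
        sym = work.popleft()
        if sym in seen:
            continue
        seen.add(sym)
        # Whatever this symbol references must be streamed too.
        work.extend(refs.get(sym, ()))
    return seen


# Hypothetical TU: main() contains one target region, outlined as
# main._omp_fn.0, plus some host-only code.
refs = {
    "main": ["main._omp_fn.0", "host_only"],
    "main._omp_fn.0": ["saxpy", "a"],
    "saxpy": ["struct vec"],
    "host_only": ["printf"],
}
print(sorted(target_subset(refs, ["main._omp_fn.0"])))
# -> ['a', 'main._omp_fn.0', 'saxpy', 'struct vec']
```

Host-only code (main itself, host_only, printf) never enters the set,
which is the point of streaming only the target subset.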

	Jakub


* Re: Questions about LTO infrastructure and pragma omp target
  2013-08-15 15:00 ` Jakub Jelinek
@ 2013-08-15 19:19   ` Richard Biener
  2013-08-23 13:15     ` Ilya Verbin
  0 siblings, 1 reply; 25+ messages in thread
From: Richard Biener @ 2013-08-15 19:19 UTC (permalink / raw)
  To: Jakub Jelinek, Jakub Jelinek, Ilya Verbin
  Cc: hubicka, rth, kirill.yukhin, gcc

Jakub Jelinek <jakub@redhat.com> wrote:
>On Thu, Aug 15, 2013 at 05:36:39PM +0400, Ilya Verbin wrote:
>> 2.  The second question, regarding #pragma omp target implementation.
>> I'm going to reuse LTO approach in a prototype, that will produce 2
>> binaries - for host and target architectures.  Target binary will contain
>> functions outlined from omp target region and some infrastructure to run
>> them.
>> To produce 2 binaries we need to run gcc and ld twice.  At the first run
>> gcc will generate object file, that contains optimized code for host and
>> GIMPLE for target.  At the second run gcc will read the GIMPLE and
>> generate optimized code for target.
>> 
>> So, the question is - what is the right place for the second run of gcc
>> and ld?  Should I insert them into liblto_plugin.so?  Or should I create
>> entirely new plugin, that will only call gcc and ld for target, without
>> performing any LTO optimizations for host?
>> Suggestions?
>
>The rough plan (partly discussed at the accelerator BoF) was that we would
>stream LTO bytecode into special section somewhere during ompexp pass or so
>(note, right now LTO streaming streams everything in a TU, we'd want to
>stream only the routines with "omp declare target" attribute, and outlined
>#pragma omp target regions, and vars referenced from it and types etc.),
>then have some other linker plugin (-fopenmp/-fopenacc) that would recognize
>these special sections and run lto1 on those (if not -flto in some mode that
>would just compile each TU separately or something), then we need to link
>it together and let the linker put it into some section of the host binary
>or shared library.

Alternatively, you could make lto-wrapper aware of this, which means that the WPA stage would emit extra partitions that it marks for lto-wrapper.

That sounds better than another plugin to me.  Of course, WPA time might be too limiting.  OTOH, the idea of multiple WPA stages, aka iterating LTO, could be picked up to have a late WPA stage.
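A minimal sketch of what "partitions that it marks for lto-wrapper" could
mean on the lto-wrapper side: each partition carries a flag, and the
wrapper picks a compiler per partition.  The `Partition` type, the
`for_target` flag and the "target-gcc" driver name are all hypothetical
placeholders, not existing GCC interfaces:

```python
from dataclasses import dataclass


@dataclass
class Partition:
    name: str
    for_target: bool  # hypothetical mark that WPA would set


def ltrans_jobs(partitions, host_cc="gcc", target_cc="target-gcc"):
    """Pair each WPA partition with the compiler that should run LTRANS.

    `target_cc` is a placeholder for an accelerator compiler driver.
    """
    jobs = []
    for part in partitions:
        cc = target_cc if part.for_target else host_cc
        jobs.append((cc, part.name))
    return jobs


parts = [Partition("ltrans0", False), Partition("ltrans1", False),
         Partition("target0", True)]
print(ltrans_jobs(parts))
# -> [('gcc', 'ltrans0'), ('gcc', 'ltrans1'), ('target-gcc', 'target0')]
```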

Richard.

>	Jakub



* Re: Questions about LTO infrastructure and pragma omp target
  2013-08-15 19:19   ` Richard Biener
@ 2013-08-23 13:15     ` Ilya Verbin
  2013-08-23 14:38       ` Jakub Jelinek
  2013-08-23 15:05       ` Richard Biener
  0 siblings, 2 replies; 25+ messages in thread
From: Ilya Verbin @ 2013-08-23 13:15 UTC (permalink / raw)
  To: Richard Biener, Jakub Jelinek, Uday Khedker
  Cc: hubicka, rth, kirill.yukhin, Michael V. Zolotukhin, gcc

Jakub, Richard, Uday,
Thanks for your answers.

On 15 Aug 20:59, Richard Biener wrote:
> Alternatively you make lto-wrapper aware of this which means that WPA stage would emit extra partitions that it marks for lto-wrapper.
> 
> That sounds better than another plugin to me.  Of course WPA time might be too limiting. Otoh the idea of multiple WPA stages, aka iterating lto could be picked up to have a late WPA stage.
> 
> Richard.

I'm trying to implement the approach with a modified lto-wrapper.
Suppose we have the bytecode of routine foo, streamed during the ompexp pass into some section, say .gnu.omptarget_foo.
In lto.c:do_whole_program_analysis(), an extra partition should be created that will contain the bytecode from .gnu.omptarget_foo, right?
As far as I understand, in addition to the bytecode of foo, we should also stream extra symtab_nodes and read them somewhere in lto-cgraph.c:input_symtab().
Does this mean we should maintain two symtabs inside the WPA stage: the original one for the host and a new one for the target?

Richard, also, what do you mean by "WPA time might be too limiting"?

Thanks,
    -- Ilya


* Re: Questions about LTO infrastructure and pragma omp target
  2013-08-23 13:15     ` Ilya Verbin
@ 2013-08-23 14:38       ` Jakub Jelinek
  2013-08-28  9:59         ` Basile Starynkevitch
  2013-08-23 15:05       ` Richard Biener
  1 sibling, 1 reply; 25+ messages in thread
From: Jakub Jelinek @ 2013-08-23 14:38 UTC (permalink / raw)
  To: Ilya Verbin
  Cc: Richard Biener, Uday Khedker, hubicka, rth, kirill.yukhin,
	Michael V. Zolotukhin, gcc

On Fri, Aug 23, 2013 at 02:55:27PM +0400, Ilya Verbin wrote:
> I'm trying to implement the approach with modified lto-wrapper.
> Suppose we have a bytecode of the routine foo, streamed during ompexp pass into some section, say .gnu.omptarget_foo.
> In function lto.c:do_whole_program_analysis() an extra partition should be created, that will contain bytecode from .gnu.omptarget_foo, right?
> As far as I understood, in addition to the bytecode of foo, we should also stream extra symtab_nodes, and read them somewhere in lto-cgraph.c:input_symtab().
> This means we should maintain 2 symtabs inside WPA stage - original for host and new for target?

I don't think we should stream into more than one target section.
There should be just one .gnu.target_lto section (or whatever other
suitable name), and we should stream into it:
1) all functions and variables with the "omp declare target" attribute
2) the outlined bodies of #pragma omp target turned into *.ompfn functions
3) all the types, symtab etc. needed for that

If compiling with -flto, you'll also get everything from the CU streamed
into the normal LTO section; otherwise you'll get assembly for the host
variables/functions/etc.

Then the question is what the plugin should do with these sections:
whether it will compile each input .gnu.target_lto section hunk separately
(as in non-LTO mode), or, with -flto, also LTO them together.
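The two options can be sketched as a grouping decision over the inputs
that carry a .gnu.target_lto section.  A minimal sketch; the function
name and the job representation (a list of inputs per target compiler
invocation) are assumptions for illustration:

```python
def target_compile_jobs(objects, lto):
    """Group objects carrying a .gnu.target_lto section into compile jobs.

    objects: input object files that contain a target LTO hunk
    lto:     True when the link uses -flto
    Returns a list of jobs; each job is the list of inputs for one
    target compiler invocation.
    """
    if not objects:
        return []
    if lto:
        # -flto: LTO all target hunks together in a single job.
        return [list(objects)]
    # Non-LTO mode: compile each hunk on its own, like separate TUs.
    return [[obj] for obj in objects]


print(target_compile_jobs(["a.o", "b.o"], lto=False))  # [['a.o'], ['b.o']]
print(target_compile_jobs(["a.o", "b.o"], lto=True))   # [['a.o', 'b.o']]
```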

	Jakub


* Re: Questions about LTO infrastructure and pragma omp target
  2013-08-23 13:15     ` Ilya Verbin
  2013-08-23 14:38       ` Jakub Jelinek
@ 2013-08-23 15:05       ` Richard Biener
  2013-08-23 15:06         ` Jakub Jelinek
  2013-09-16 17:14         ` Ilya Verbin
  1 sibling, 2 replies; 25+ messages in thread
From: Richard Biener @ 2013-08-23 15:05 UTC (permalink / raw)
  To: Ilya Verbin, Jakub Jelinek, Uday Khedker
  Cc: hubicka, rth, kirill.yukhin, Michael V. Zolotukhin, gcc

Ilya Verbin <iverbin@gmail.com> wrote:
>Jakub, Richard, Uday,
>Thanks for your answers.
>
>On 15 Aug 20:59, Richard Biener wrote:
>> Alternatively you make lto-wrapper aware of this which means that WPA
>> stage would emit extra partitions that it marks for lto-wrapper.
>> 
>> That sounds better than another plugin to me.  Of course WPA time
>> might be too limiting. Otoh the idea of multiple WPA stages, aka
>> iterating lto could be picked up to have a late WPA stage.
>> 
>> Richard.
>
>I'm trying to implement the approach with modified lto-wrapper.
>Suppose we have a bytecode of the routine foo, streamed during ompexp
>pass into some section, say .gnu.omptarget_foo.
>In function lto.c:do_whole_program_analysis() an extra partition should
>be created, that will contain bytecode from .gnu.omptarget_foo, right?

Right.

>As far as I understood, in addition to the bytecode of foo, we should
>also stream extra symtab_nodes, and read them somewhere in
>lto-cgraph.c:input_symtab().
>This means we should maintain 2 symtabs inside WPA stage - original for
>host and new for target?

No.  Since you will refer to the symbol with the target code from the host code, you need a single unified symtab.

>Richard, also what do you mean by "WPA time might be too limiting"?

At WPA time you do not have function bodies available, so you need to identify candidates at compile time.  Also, no sophisticated optimization has happened at this point in the compilation.

Richard.

>Thanks,
>    -- Ilya



* Re: Questions about LTO infrastructure and pragma omp target
  2013-08-23 15:05       ` Richard Biener
@ 2013-08-23 15:06         ` Jakub Jelinek
  2013-08-25 22:36           ` Ilya Verbin
  2013-09-16 17:14         ` Ilya Verbin
  1 sibling, 1 reply; 25+ messages in thread
From: Jakub Jelinek @ 2013-08-23 15:06 UTC (permalink / raw)
  To: Richard Biener
  Cc: Ilya Verbin, Uday Khedker, hubicka, rth, kirill.yukhin,
	Michael V. Zolotukhin, gcc

On Fri, Aug 23, 2013 at 02:24:42PM +0200, Richard Biener wrote:
> >As far as I understood, in addition to the bytecode of foo, we should
> >also stream extra symtab_nodes, and read them somewhere in
> >lto-cgraph.c:input_symtab().
> >This means we should maintain 2 symtabs inside WPA stage - original for
> >host and new for target?
> 
> No, as you will refer to the symbol with the target code from the host
> code you need a single unified symtab.

I really think you want two symtabs rather than a unified symtab,
or just stream a subset of the host symtab into the .gnu.target_lto
section.  The thing is, the target code (functions, vars, outlined bodies)
is a strict subset of the host code (because as a fallback, everything
needs to be able to run on the host), but when not compiling originally with
-flto, we IMHO should stream just the target subset, not everything
(and for -flto stream both the target subset into one section and everything
(host code) as we do right now, either with fat or slim lto objects).
I think we shouldn't require host code to be compiled with -flto in order
to use -fopenmp and acceleration.
The target code generally can't call into the host code, and the other
direction, at least for Intel MIC, is done by symbol lookup (you tell the
COI library to upload a binary or shared library from file or memory to
the target device, and then you tell it to invoke some named function).
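Because the target code is a strict subset of the host code, the host can
always fall back to its own copy, and the device side is reached by name.
A minimal sketch of that dispatch; `OffloadImage` and its name-to-function
table are hypothetical stand-ins for whatever a device library such as
COI actually provides, not its API:

```python
from typing import Callable, Dict, Optional


class OffloadImage:
    """Hypothetical handle to a target binary already loaded on a device."""

    def __init__(self, symbols: Dict[str, Callable]):
        self.symbols = symbols

    def lookup(self, name: str) -> Optional[Callable]:
        return self.symbols.get(name)


def run_region(image, name, host_fallback, *args):
    """Run `name` on the device if the image exports it, else on the host.

    A host fallback always exists because the target code is a subset of
    the host code.
    """
    fn = image.lookup(name) if image is not None else None
    if fn is None:
        return "host", host_fallback(*args)
    return "device", fn(*args)


img = OffloadImage({"main._omp_fn.0": lambda x: x * 2})
print(run_region(img, "main._omp_fn.0", lambda x: x * 2, 21))   # ('device', 42)
print(run_region(None, "main._omp_fn.0", lambda x: x * 2, 21))  # ('host', 42)
```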

	Jakub


* Re: Questions about LTO infrastructure and pragma omp target
  2013-08-23 15:06         ` Jakub Jelinek
@ 2013-08-25 22:36           ` Ilya Verbin
  2013-08-26  7:32             ` Jakub Jelinek
  0 siblings, 1 reply; 25+ messages in thread
From: Ilya Verbin @ 2013-08-25 22:36 UTC (permalink / raw)
  To: Richard Biener, Jakub Jelinek
  Cc: Uday Khedker, hubicka, rth, kirill.yukhin, Michael V. Zolotukhin, gcc

On 23 Aug 13:17, Jakub Jelinek wrote:
> I don't think we should stream into more than one target section.
> There should be just .gnu.target_lto section (or whatever other suitable
> name) and should stream into it:
> 1) all functions and variables with "omp declare target" attribute
> 2) the outlined bodies of #pragma omp target turned into *.ompfn functions
> 3) all the types, symtab etc. needed for that

Why is having one target section preferable to having a separate section
for each function body?

> Then the question is what the plugin should perform with these sections,
> whether it will compile each input .gnu.target_lto section hunk separately
> (as in non-LTO mode), or with -flto also LTO them together.

Yes, it is an important question...  To get started, it is easier to
implement a "non-target-lto" mode; however, the approach should be general
enough to be extended to a "target-lto" mode.  Does anyone need that?


On 23 Aug 14:36, Jakub Jelinek wrote:
> On Fri, Aug 23, 2013 at 02:24:42PM +0200, Richard Biener wrote:
> > No, as you will refer to the symbol with the target code from the host
> > code you need a single unified symtab.
> 
> I really think you want two symtabs rather than a unified symtab,
> or just stream a subset of the host symtab into the .gnu.target_lto
> section.  The thing is, the target code (functions, vars, outlined bodies)
> is a strict subset of the host code (because as a fallback, everything
> needs to be able to run on the host), but when not compiling originally with
> -flto, we IMHO should stream just the target subset, not everything
> (and for -flto stream both the target subset into one section and everything
> (host code) as we do right now, either with fat or slim lto objects).

I also think that having two symtabs looks better.  There are no direct
refs to the target symbols from the host code.  And (as far as I can see)
a unified symtab will lead to a mess in places where host and target
symbols should be handled differently.

Thanks,
    -- Ilya


* Re: Questions about LTO infrastructure and pragma omp target
  2013-08-25 22:36           ` Ilya Verbin
@ 2013-08-26  7:32             ` Jakub Jelinek
  2013-09-03 14:00               ` Michael V. Zolotukhin
  0 siblings, 1 reply; 25+ messages in thread
From: Jakub Jelinek @ 2013-08-26  7:32 UTC (permalink / raw)
  To: Ilya Verbin
  Cc: Richard Biener, Uday Khedker, hubicka, rth, kirill.yukhin,
	Michael V. Zolotukhin, gcc

On Fri, Aug 23, 2013 at 09:15:14PM +0400, Ilya Verbin wrote:
> On 23 Aug 13:17, Jakub Jelinek wrote:
> > I don't think we should stream into more than one target section.
> > There should be just .gnu.target_lto section (or whatever other suitable
> > name) and should stream into it:
> > 1) all functions and variables with "omp declare target" attribute
> > 2) the outlined bodies of #pragma omp target turned into *.ompfn functions
> > 3) all the types, symtab etc. needed for that
> 
> Why having one target section is preferable than multiple sections for each
> function body?

Because together with each function you also need to stream everything the
function uses (all referenced vars, types, the symtab, etc.).  Plus, do you
really want to compile each function and each variable in a separate gcc
process?  That said, looking at what -flto does (I thought it emits one
section), it apparently emits plenty of sections, so probably just do what
normal LTO streaming does, except stream into .gnu.target_lto* instead
of .gnu.lto* sections.

	Jakub


* Re: Questions about LTO infrastructure and pragma omp target
  2013-08-23 14:38       ` Jakub Jelinek
@ 2013-08-28  9:59         ` Basile Starynkevitch
  0 siblings, 0 replies; 25+ messages in thread
From: Basile Starynkevitch @ 2013-08-28  9:59 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Ilya Verbin, Richard Biener, Uday Khedker, hubicka, rth,
	kirill.yukhin, Michael V. Zolotukhin, gcc

On Fri, 2013-08-23 at 13:17 +0200, Jakub Jelinek wrote:
[...]
> Then the question is what the plugin should perform with these sections,
> whether it will compile each input .gnu.target_lto section hunk separately
> (as in non-LTO mode), or with -flto also LTO them together.
[...]

Since plugins are mentioned (and I guess Jakub was talking about linker
plugins), I would add that GCC plugins might want to add (and later use)
their own arbitrary LTO data in these sections.

Cheers

-- 
Basile STARYNKEVITCH         http://starynkevitch.net/Basile/
email: basile<at>starynkevitch<dot>net mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mine, sont seulement les miennes} ***



* Re: Questions about LTO infrastructure and pragma omp target
  2013-08-26  7:32             ` Jakub Jelinek
@ 2013-09-03 14:00               ` Michael V. Zolotukhin
  2013-09-03 14:19                 ` Jakub Jelinek
  0 siblings, 1 reply; 25+ messages in thread
From: Michael V. Zolotukhin @ 2013-09-03 14:00 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Ilya Verbin, Richard Biener, Uday Khedker, hubicka, rth,
	kirill.yukhin, gcc

Hi guys,
Let's continue this discussion.

Summing up what was said above, I think we need the following changes in
the LTO infrastructure to enable offloading:
  * [in lto_plugin] claim files with .openmp (or whatever name) sections
    along with files containing .lto sections, as we do now
  * [in lto_plugin] check whether the resolution file is generated
    correctly, and probably fix it a bit so that it handles LTO and
    OpenMP separately from each other
  * [in WPA] save .openmp sections to a new partition (in the future, into
    several new partitions)
  * [in WPA] save a copy of the symtab (or its subset) to this
    openmp-partition
  * [in lto_wrapper] compile the openmp-partition with gcc-target and
    then link it with ld-target

Is this a correct overview of the needed changes?
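The first bullet boils down to a section-name check.  A simplified
sketch, assuming the plugin can see each input's section names and using
the .gnu.target_lto prefix proposed earlier in the thread (the real
plugin's claim logic is not reproduced here):

```python
HOST_PREFIX = ".gnu.lto_"
TARGET_PREFIX = ".gnu.target_lto_"  # proposed name, not final


def lto_payloads(section_names):
    """Return which LTO payloads ("host", "target") an object carries."""
    kinds = set()
    for name in section_names:
        if name.startswith(TARGET_PREFIX):
            kinds.add("target")
        elif name.startswith(HOST_PREFIX):
            kinds.add("host")
    return kinds


def should_claim(section_names):
    """Claim the file if it has host LTO IR, target IR, or both."""
    return bool(lto_payloads(section_names))


print(sorted(lto_payloads([".text", ".gnu.lto_.decls", ".gnu.target_lto_.decls"])))
# -> ['host', 'target']
```

Keeping the two payloads distinct is what lets the plugin route host IR
to the normal WPA path and target IR to the accelerator compiler.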

If I got it right, then there is a follow-up question: how would we
support several different targets?  We need to be able to invoke
different versions of gcc-target from lto_wrapper; do we want to have
plugins there as well (yeah, plugins for a plugin :) )?

Thanks, Michael  


* Re: Questions about LTO infrastructure and pragma omp target
  2013-09-03 14:00               ` Michael V. Zolotukhin
@ 2013-09-03 14:19                 ` Jakub Jelinek
  2013-09-03 15:18                   ` Michael V. Zolotukhin
  0 siblings, 1 reply; 25+ messages in thread
From: Jakub Jelinek @ 2013-09-03 14:19 UTC (permalink / raw)
  To: Michael V. Zolotukhin
  Cc: Ilya Verbin, Richard Biener, Uday Khedker, hubicka, rth,
	kirill.yukhin, gcc

On Tue, Sep 03, 2013 at 05:59:35PM +0400, Michael V. Zolotukhin wrote:
> Let's continue this discussion.
> 
> Summing up what was said above, I think we need following changes in
> LTO-infrastructure to enable offloading:
>   * [in lto_plugin] claim files with .openmp (or whatever
> name) sections along with files containing .lto sections, as we do now

I'd go with .gnu.target_lto* names (i.e. s/.gnu.lto/.gnu.target_lto/
on the existing LTO section names if they are for the accelerator rather
than host).
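The renaming rule above is a plain prefix substitution.  A minimal
sketch; the example section names follow the existing .gnu.lto_ scheme,
and the function itself is illustrative:

```python
HOST_LTO = ".gnu.lto"
TARGET_LTO = ".gnu.target_lto"


def to_target_section(name):
    """Apply s/.gnu.lto/.gnu.target_lto/ to an LTO section name."""
    if not name.startswith(HOST_LTO):
        raise ValueError(f"not an LTO section: {name!r}")
    return TARGET_LTO + name[len(HOST_LTO):]


print(to_target_section(".gnu.lto_.decls"))  # .gnu.target_lto_.decls
```

Because .gnu.target_lto does not start with .gnu.lto, the two families
of sections can never be confused by a prefix check.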

>   * [in lto_plugin] check if resolution file is generated
> correctly and probably fix it a bit to properly handle LTO and OpenMP
> handling (separately from each other)
>   * [in WPA] save .openmp sections to a new partition (in future - into
> several new partitions).
>   * [in WPA] save a copy of symtab (or its subset) to this
> openmp-partition.
>   * [in lto_wrapper] compile the openmp-partition with gcc-target and
> then link it with ld-target

I really have almost zero experience with LTO, but I don't see how you could
use any resolution for those sections.  The resolution handling in the
linker will be for the host link, you want something along the lines of:
- if you find .gnu.target_lto* sections, feed those (one CU at a time) to
  the target compiler driver with some magic option so that it uses the lto1
  backend and reads from the .gnu.target_lto* sections instead of
  .gnu.lto*; and, at least when not -flto, you want it to just generate
  assembly for the target, and let the target driver also invoke the assembler
- collect all those target object files from the link, link them together
  using the target compiler driver, and feed the resulting binary
  or shared library back into the host link (some magic section in there)
But the target support has to work even without -flto, and for
debuggability etc. reasons I wouldn't force compiling all the target code
together unless required by the target.
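The two steps can be sketched as the list of target-side commands they
would produce.  A minimal sketch: "accel-gcc" stands for the target
compiler driver, and "<read-target-lto>" is a placeholder for the "magic
option"; neither is an existing program or flag:

```python
# Placeholder for the "magic option"; no such real flag is implied.
READ_TARGET_LTO = "<read-target-lto>"


def offload_link_plan(objects, driver="accel-gcc", image="accel.img",
                      shared=False):
    """Return the target-side commands for the two steps above."""
    cmds, target_objs = [], []
    for obj in objects:
        out = obj + ".accel.o"
        # Step 1: one CU at a time; lto1 reads .gnu.target_lto* and the
        # target driver also runs the assembler.
        cmds.append([driver, READ_TARGET_LTO, "-c", obj, "-o", out])
        target_objs.append(out)
    # Step 2: link all target objects into one binary or shared library,
    # which then gets embedded into the host link.
    link = [driver] + (["-shared"] if shared else []) + target_objs
    cmds.append(link + ["-o", image])
    return cmds


for cmd in offload_link_plan(["a.o", "b.o"]):
    print(" ".join(cmd))
```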

	Jakub


* Re: Questions about LTO infrastructure and pragma omp target
  2013-09-03 14:19                 ` Jakub Jelinek
@ 2013-09-03 15:18                   ` Michael V. Zolotukhin
  2013-09-03 17:39                     ` Thomas Schwinge
  0 siblings, 1 reply; 25+ messages in thread
From: Michael V. Zolotukhin @ 2013-09-03 15:18 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Ilya Verbin, Richard Biener, Uday Khedker, hubicka, rth,
	kirill.yukhin, gcc

> I'd go with .gnu.target_lto* names (i.e. s/.gnu.lto/.gnu.target_lto/
> on the existing LTO section names if they are for the accelerator rather
> than host).
I guess we could go with any naming for now, as it's far from being
finalized.

> I really have almost zero experience with LTO, but I don't see how you could
> use any resolution for those sections.  The resolution handling in the
> linker will be for the host link, you want something along the lines of:
> - if you find .gnu.target_lto* sections, feed those (one CU at a time) to
>   the target compiler driver with some magic option that it will use lto1
>   backend and will read from the .gnu.target_lto* sections instead of
>   .gnu.lto*; and, at least when not -flto, you want it to just generate
>   assembly for the target, and let the target driver also invoke assembler
I'm also not an expert in LTO; I dived into it just a couple of days
ago.  However, the general scheme of how LTO works, as I see it, is the
following (as we have it now, without any offloading support):
  * collect2 calls ld with the LTO plugin (liblto_plugin)
  * liblto_plugin checks every linker input file, and if it contains
    .lto sections, the plugin claims this file for further processing
  * when all input files are loaded (and thus checked), the plugin creates
    a resolution file and calls lto_wrapper, passing it all claimed files
  * lto_wrapper calls 'gcc -xlto -fwpa' (WPA phase)
  * WPA divides everything into several partitions, based on the
    callgraph it builds
  * lto_wrapper calls 'gcc -xlto -fltrans' on the created partitions;
    this is where the actual compilation happens
  * the resulting object files are fed back to the linker, and it
    produces the final executable

From my POV, we could easily reuse this infrastructure with almost no
changes, because all we want is to call some external programs (target
compiler, target linker), which don't affect host object files and
binaries at all.  Everything they produce will be in separate files, and
the host infrastructure will never know about them.
Also, nothing prevents us from doing link-time optimizations (in the
future) on the target code: we could run 'gcc_target -flto' from
lto_wrapper, only adjusting it to use the LTO front end instead of the
C/C++/other language front end.

> - collect all those target object files from the link, link them together
>   using target compiler driver, and feed back the resulting binary
>   or shared library into the host linking (some magic section in there)
Why do we need to feed the target binary back to the host linking?  The
host program cannot directly call any routine from the target binary, so
IMHO there is no point in linking them together, they are just separate
executables.

> But, the target support has to work even without -flto, and for
> debuggability etc. reasons I wouldn't force compiling all the target code
> together unless required by the target.
Well, we could use 1-to-1 partitioning (meaning that the routines from
each input CU are placed in their own partition).

And the question about multi-target support here still remains open.

Thanks, Michael
> 
> 	Jakub


* Re: Questions about LTO infrastructure and pragma omp target
  2013-09-03 15:18                   ` Michael V. Zolotukhin
@ 2013-09-03 17:39                     ` Thomas Schwinge
  2013-09-03 18:30                       ` Michael V. Zolotukhin
  0 siblings, 1 reply; 25+ messages in thread
From: Thomas Schwinge @ 2013-09-03 17:39 UTC (permalink / raw)
  To: Michael V. Zolotukhin
  Cc: Ilya Verbin, Richard Biener, Uday Khedker, hubicka, rth,
	kirill.yukhin, gcc, Jakub Jelinek


Hi!

On Tue, 3 Sep 2013 19:18:01 +0400, "Michael V. Zolotukhin" <michael.v.zolotukhin@gmail.com> wrote:
> > - collect all those target object files from the link, link them together
> >   using target compiler driver, and feed back the resulting binary
> >   or shared library into the host linking (some magic section in there)
> Why do we need to feed the target binary back to the host linking?  The
> host program cannot directly call any routine from the target binary, so
> IMHO there is no point in linking them together, they are just separate
> executables.

The idea, as we discussed it at the GNU Tools Cauldron's Acceleration
BoF, is that the host program (for at least some acceleration devices)
will be responsible for loading the acceleration device's code to the
device, using some support library that is specific to each acceleration
device, and for that it is useful to have the code readily accessible
in the host program, and thus link it in as "data".
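One common way to link a binary in "as data" is to turn it into a C byte
array, roughly what `xxd -i` produces (GNU as's `.incbin` directive is
another route).  A minimal sketch; the symbol name is hypothetical and
would be whatever the device support library expects to read:

```python
def embed_as_c_array(name, data, per_line=12):
    """Emit C source holding `data` as a read-only byte array."""
    if not data:
        # Empty initializer lists are not valid C before C23.
        raise ValueError("empty image")
    lines = [f"const unsigned char {name}[] = {{"]
    for i in range(0, len(data), per_line):
        chunk = data[i:i + per_line]
        lines.append("  " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    lines.append("};")
    lines.append(f"const unsigned long {name}_size = {len(data)};")
    return "\n".join(lines)


print(embed_as_c_array("accel_image", b"\x7fELF"))
```

Compiling the generated file with the host compiler and adding it to the
host link places the target image in the host's read-only data.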

> And the question about multi-target support here still remains open.

Many questions are still open -- but I'm glad there is activity on this
topic, and I'm sure we'll be able to converge with the designs we have or
are currently developing.

Grüße,
 Thomas



* Re: Questions about LTO infrastructure and pragma omp target
  2013-09-03 17:39                     ` Thomas Schwinge
@ 2013-09-03 18:30                       ` Michael V. Zolotukhin
  2013-09-03 18:54                         ` Jakub Jelinek
  0 siblings, 1 reply; 25+ messages in thread
From: Michael V. Zolotukhin @ 2013-09-03 18:30 UTC (permalink / raw)
  To: Thomas Schwinge
  Cc: Ilya Verbin, Richard Biener, Uday Khedker, hubicka, rth,
	kirill.yukhin, gcc, Jakub Jelinek

Hi Thomas,

> The idea, as we discussed it at the GNU Tools Cauldron's Acceleration
> BoF, is that the host program (for at least some acceleration devices)
> will be responsible for loading the acceleration device's code to the
> device, using some support library that is specific to each acceleration
> device
Unfortunately, I missed the Cauldron, though I'm familiar with the
general idea and now I'm trying to clarify details.

> and for that it is useful to have the code readily accessible
> in the host program, and thus link it in as "data".
Oh, if we just link the target binary as a data section into the host
binary, then I see no problem with that; it seems entirely feasible
with the existing infrastructure.  I just thought (apparently
incorrectly) that we were talking about linking the target code with the
host code.

> > And the question about multi-target support here still remains open.
> 
> Many questions are still open -- but I'm glad there is activity on this
> topic, and I'm sure we'll be able to converge with the designs we have or
> are currently developing.
Thanks, I'm sure of that too :)

Best regards, Michael
>  Thomas


* Re: Questions about LTO infrastructure and pragma omp target
  2013-09-03 18:30                       ` Michael V. Zolotukhin
@ 2013-09-03 18:54                         ` Jakub Jelinek
  2013-09-03 19:09                           ` Michael V. Zolotukhin
  0 siblings, 1 reply; 25+ messages in thread
From: Jakub Jelinek @ 2013-09-03 18:54 UTC
  To: Michael V. Zolotukhin
  Cc: Thomas Schwinge, Ilya Verbin, Richard Biener, Uday Khedker,
	hubicka, rth, kirill.yukhin, gcc

On Tue, Sep 03, 2013 at 10:29:56PM +0400, Michael V. Zolotukhin wrote:
> > The idea, as we discussed it at the GNU Tools Cauldron's Acceleration
> > BoF, is that the host program (for at least some acceleration devices)
> > will be responsible for loading the acceleration device's code to the
> > device, using some support library that is specific to each acceleration
> > device
> Unfortunately, I missed the Cauldron, though I'm familiar with the
> general idea and now I'm trying to clarify details.
> 
> > and for that it is useful to have the code readily accessible
> > in the host program, and thus link it in as "data".
> Oh, if we just link the target binary as a data section into the host
> binary, then I see no problem with that; it seems entirely feasible
> with the existing infrastructure.  I had thought (apparently
> incorrectly) that we were talking about linking the target code together
> with the host code.

No.  The rough idea is that you emit the accelerator-related subset of CUs
into specially named LTO sections, and when linking a binary you collect
all those sections from all the input object files, compile them (without
-flto, ideally separately), link them together and finally embed the result
into a data section of the executable.  Similarly, when linking a shared
library, you build a target shared library and embed it in a data section
of the host shared library.  It is a kind of fat binaries/shared libraries.
Each of the accelerators would use a different data section name, so they
could coexist.
For the MIC, you'd then use COI to create the binary or shared libraries
from the (ro)data section memory image.  For other accelerators, whatever
they support.
Perhaps for MIC it should be a shared library even in the binary case, with
some binary in a data section of the libgomp plugin, because it really
should also work if the host binary doesn't have any #pragma omp target
in it at all, but shared libraries do.
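A minimal C sketch of the "fat binary" layout described above, assuming a GCC or Clang toolchain targeting ELF: each accelerator's image is placed in its own read-only data section, and the host looks up the image for a given accelerator before handing it to a device-specific loader (for MIC, COI). The section names, the accelerator name "other", the image bytes, `struct offload_image` and `find_offload_image` are all hypothetical placeholders, not GCC's actual scheme.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-accelerator section names; only applied on ELF.  */
#if defined(__ELF__)
#define OFFLOAD_SECTION(name) __attribute__ ((section (name), used))
#else
#define OFFLOAD_SECTION(name)
#endif

/* Placeholder bytes standing in for target images built by each
   accelerator's toolchain and embedded into the host executable.  */
static const unsigned char mic_image[]
  OFFLOAD_SECTION (".hypothetical_offload.mic") = { 0x7f, 'E', 'L', 'F' };
static const unsigned char other_image[]
  OFFLOAD_SECTION (".hypothetical_offload.other")
  = { 0xde, 0xad, 0xbe, 0xef, 0x00 };

struct offload_image
{
  const char *accel;           /* accelerator name, e.g. "mic" */
  const unsigned char *data;   /* start of the embedded image */
  size_t size;                 /* image size in bytes */
};

static const struct offload_image offload_images[] = {
  { "mic", mic_image, sizeof mic_image },
  { "other", other_image, sizeof other_image },
};

/* Return the embedded image for ACCEL, or NULL if the host binary
   carries no image for that accelerator.  */
const struct offload_image *
find_offload_image (const char *accel)
{
  for (size_t i = 0; i < sizeof offload_images / sizeof offload_images[0]; i++)
    if (strcmp (offload_images[i].accel, accel) == 0)
      return &offload_images[i];
  return NULL;
}
```

In a real toolchain the linker, not hand-written arrays, would produce these sections; distinct section names are what let images for several accelerators coexist in one host binary.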

	Jakub

* Re: Questions about LTO infrastructure and pragma omp target
  2013-09-03 18:54                         ` Jakub Jelinek
@ 2013-09-03 19:09                           ` Michael V. Zolotukhin
  0 siblings, 0 replies; 25+ messages in thread
From: Michael V. Zolotukhin @ 2013-09-03 19:09 UTC
  To: Jakub Jelinek
  Cc: Thomas Schwinge, Ilya Verbin, Richard Biener, Uday Khedker,
	hubicka, rth, kirill.yukhin, gcc

> > Oh, if we just link the target binary as a data section into the host
> > binary, then I see no problem with that; it seems entirely feasible
> > with the existing infrastructure.  I had thought (apparently
> > incorrectly) that we were talking about linking the target code together
> > with the host code.
> 
> No.  The rough idea is that you emit the accelerator-related subset of CUs
> into specially named LTO sections, and when linking a binary you collect
> all those sections from all the input object files, compile them (without
> -flto, ideally separately), link them together and finally embed the result
> into a data section of the executable.  Similarly, when linking a shared
> library, you build a target shared library and embed it in a data section
> of the host shared library.  It is a kind of fat binaries/shared libraries.
> Each of the accelerators would use a different data section name, so they
> could coexist.
Thanks, that matches my understanding.

> For the MIC, you'd then use COI to create the binary or shared libraries
> from the (ro)data section memory image.  For other accelerators, whatever
> they support.
> Perhaps for MIC it should be a shared library even in the binary case, with
> some binary in a data section of the libgomp plugin, because it really
> should also work if the host binary doesn't have any #pragma omp target
> in it at all, but shared libraries do.
I haven't thought about possible issues with shared libraries using offload yet,
but they surely deserve careful consideration.

Michael
> 	Jakub

* Re: Questions about LTO infrastructure and pragma omp target
  2013-08-23 15:05       ` Richard Biener
  2013-08-23 15:06         ` Jakub Jelinek
@ 2013-09-16 17:14         ` Ilya Verbin
  2013-09-17  8:12           ` Richard Biener
  1 sibling, 1 reply; 25+ messages in thread
From: Ilya Verbin @ 2013-09-16 17:14 UTC
  To: Richard Biener
  Cc: Jakub Jelinek, Uday Khedker, hubicka, rth, kirill.yukhin,
	Michael V. Zolotukhin, gcc

Hi Richard,

On 23 Aug 14:24, Richard Biener wrote:
> Ilya Verbin <iverbin@gmail.com> wrote:
> >I'm trying to implement the approach with a modified lto-wrapper.
> >Suppose we have the bytecode of the routine foo, streamed during the ompexp
> >pass into some section, say .gnu.omptarget_foo.
> >In function lto.c:do_whole_program_analysis() an extra partition should
> >be created that will contain the bytecode from .gnu.omptarget_foo, right?
> 
> Right.
> 
> Richard.

What if we leave the WPA stage unchanged?
Here is a patch that passes "fat" object files (with host-side .gnu.lto_ and
target-side .gnu.target_lto_ sections) directly to the target compiler.
(Currently it works only with -flto enabled.)  The target compiler then reads
the bytecode from .gnu.target_lto_ and produces a target-side object file.
At the moment lto-wrapper uses collect_gcc as the target compiler, and it
doesn't handle the command-line arguments properly yet.
This looks simpler than emitting extra partitions during WPA.  What do you think?


---
 gcc/lto-streamer.c   |  8 ++++++--
 gcc/lto-streamer.h   |  1 +
 gcc/lto-wrapper.c    | 22 +++++++++++++++++++++-
 gcc/lto/lang.opt     |  4 ++++
 gcc/lto/lto-object.c |  5 +++--
 gcc/lto/lto.c        |  5 ++++-
 6 files changed, 39 insertions(+), 6 deletions(-)

diff --git a/gcc/lto-streamer.c b/gcc/lto-streamer.c
index e7b66c1..9e19060 100644
--- a/gcc/lto-streamer.c
+++ b/gcc/lto-streamer.c
@@ -145,6 +145,7 @@ lto_get_section_name (int section_type, const char *name, struct lto_file_decl_d
   const char *add;
   char post[32];
   const char *sep;
+  const char *prefix;
 
   if (section_type == LTO_section_function_body)
     {
@@ -172,8 +173,11 @@ lto_get_section_name (int section_type, const char *name, struct lto_file_decl_d
   else if (f != NULL) 
     sprintf (post, "." HOST_WIDE_INT_PRINT_HEX_PURE, f->id);
   else
-    sprintf (post, "." HOST_WIDE_INT_PRINT_HEX_PURE, get_random_seed (false)); 
-  return concat (LTO_SECTION_NAME_PREFIX, sep, add, post, NULL);
+    sprintf (post, "." HOST_WIDE_INT_PRINT_HEX_PURE, get_random_seed (false));
+
+  prefix = flag_openmp_target ? OMP_SECTION_NAME_PREFIX
+			      : LTO_SECTION_NAME_PREFIX;
+  return concat (prefix, sep, add, post, NULL);
 }
 
 
diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
index e7c89f1..df72e16 100644
--- a/gcc/lto-streamer.h
+++ b/gcc/lto-streamer.h
@@ -141,6 +141,7 @@ along with GCC; see the file COPYING3.  If not see
    name for the functions and static_initializers.  For other types of
    sections a '.' and the section type are appended.  */
 #define LTO_SECTION_NAME_PREFIX         ".gnu.lto_"
+#define OMP_SECTION_NAME_PREFIX         ".gnu.target_lto_"
 
 #define LTO_major_version 2
 #define LTO_minor_version 2
diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
index 15a34dd..f3b44ff 100644
--- a/gcc/lto-wrapper.c
+++ b/gcc/lto-wrapper.c
@@ -442,6 +442,7 @@ run_gcc (unsigned argc, char *argv[])
   unsigned i, j;
   const char **new_argv;
   const char **argv_ptr;
+  const char **target_argv;
   char *list_option_full = NULL;
   const char *linker_output = NULL;
   const char *collect_gcc, *collect_gcc_options;
@@ -452,7 +453,7 @@ run_gcc (unsigned argc, char *argv[])
   unsigned int fdecoded_options_count = 0;
   struct cl_decoded_option *decoded_options;
   unsigned int decoded_options_count;
-  struct obstack argv_obstack;
+  struct obstack argv_obstack, target_argv_obstack;
   int new_head_argc;
 
   /* Get the driver and options.  */
@@ -902,6 +903,25 @@ cont:
       free (input_names);
       free (list_option_full);
       obstack_free (&env_obstack, NULL);
+
+      /* Run gcc for target.  */
+      obstack_init (&target_argv_obstack);
+      obstack_ptr_grow (&target_argv_obstack, collect_gcc);
+      obstack_ptr_grow (&target_argv_obstack, "-xlto");
+      obstack_ptr_grow (&target_argv_obstack, "-fopenmp_target");
+      obstack_ptr_grow (&target_argv_obstack, "-c");
+      obstack_ptr_grow (&target_argv_obstack, "-o");
+      obstack_ptr_grow (&target_argv_obstack, "target.o");
+
+      /* Append the input objects.  */
+      for (i = 1; i < argc; ++i)
+	if (strncmp (argv[i], "-fresolution=", sizeof ("-fresolution=") - 1))
+	  obstack_ptr_grow (&target_argv_obstack, argv[i]);
+      obstack_ptr_grow (&target_argv_obstack, NULL);
+
+      target_argv = XOBFINISH (&target_argv_obstack, const char **);
+      fork_execute (CONST_CAST (char **, target_argv));
+      obstack_free (&target_argv_obstack, NULL);
     }
 
   obstack_free (&argv_obstack, NULL);
diff --git a/gcc/lto/lang.opt b/gcc/lto/lang.opt
index 7a9aede..cd0098c 100644
--- a/gcc/lto/lang.opt
+++ b/gcc/lto/lang.opt
@@ -40,4 +40,8 @@ fresolution=
 LTO Joined
 The resolution file
 
+fopenmp_target
+LTO Var(flag_openmp_target)
+Run LTO infrastructure to read target-side bytecode and to build it.
+
 ; This comment is to ensure we retain the blank line above.
diff --git a/gcc/lto/lto-object.c b/gcc/lto/lto-object.c
index 77be1fb..ccf06d2 100644
--- a/gcc/lto/lto-object.c
+++ b/gcc/lto/lto-object.c
@@ -226,9 +226,10 @@ lto_obj_add_section (void *data, const char *name, off_t offset,
   struct lto_section_slot s_slot;
   void **slot;
   struct lto_section_list *list = loasd->list;
+  const char *prefix = flag_openmp_target ? OMP_SECTION_NAME_PREFIX
+					  : LTO_SECTION_NAME_PREFIX;
 
-  if (strncmp (name, LTO_SECTION_NAME_PREFIX,
-	       strlen (LTO_SECTION_NAME_PREFIX)) != 0)
+  if (strncmp (name, prefix, strlen (prefix)))
     return 1;
 
   new_name = xstrdup (name);
diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
index c854589..d3bac3a 100644
--- a/gcc/lto/lto.c
+++ b/gcc/lto/lto.c
@@ -2677,9 +2677,12 @@ static int
 lto_section_with_id (const char *name, unsigned HOST_WIDE_INT *id)
 {
   const char *s;
+  const char *prefix = flag_openmp_target ? OMP_SECTION_NAME_PREFIX
+					  : LTO_SECTION_NAME_PREFIX;
 
-  if (strncmp (name, LTO_SECTION_NAME_PREFIX, strlen (LTO_SECTION_NAME_PREFIX)))
+  if (strncmp (name, prefix, strlen (prefix)))
     return 0;
+
   s = strrchr (name, '.');
   return s && sscanf (s, "." HOST_WIDE_INT_PRINT_HEX_PURE, id) == 1;
 }
-- 
1.7.11.7


Thanks,
  -- Ilya

* Re: Questions about LTO infrastructure and pragma omp target
  2013-09-16 17:14         ` Ilya Verbin
@ 2013-09-17  8:12           ` Richard Biener
  2013-09-17 11:31             ` Ilya Verbin
  0 siblings, 1 reply; 25+ messages in thread
From: Richard Biener @ 2013-09-17  8:12 UTC
  To: Ilya Verbin
  Cc: Jakub Jelinek, Uday Khedker, Jan Hubicka, Richard Henderson,
	Kirill Yukhin, Michael V. Zolotukhin, GCC Development

On Mon, Sep 16, 2013 at 7:14 PM, Ilya Verbin <iverbin@gmail.com> wrote:
> Hi Richard,
>
> On 23 Aug 14:24, Richard Biener wrote:
>> Ilya Verbin <iverbin@gmail.com> wrote:
>> >I'm trying to implement the approach with a modified lto-wrapper.
>> >Suppose we have the bytecode of the routine foo, streamed during the ompexp
>> >pass into some section, say .gnu.omptarget_foo.
>> >In function lto.c:do_whole_program_analysis() an extra partition should
>> >be created that will contain the bytecode from .gnu.omptarget_foo, right?
>>
>> Right.
>>
>> Richard.
>
> What if we leave the WPA stage unchanged?
> Here is a patch that passes "fat" object files (with host-side .gnu.lto_ and
> target-side .gnu.target_lto_ sections) directly to the target compiler.
> (Currently it works only with -flto enabled.)  The target compiler then reads
> the bytecode from .gnu.target_lto_ and produces a target-side object file.
> At the moment lto-wrapper uses collect_gcc as the target compiler, and it
> doesn't handle the command-line arguments properly yet.
> This looks simpler than emitting extra partitions during WPA.  What do you think?

It looks more like a hack ;)  It certainly doesn't look scalable to multiple
target ISAs.  You also unconditionally invoke the target compiler (well, you
invoke the same compiler ...)

As far as I understand your patch the target IL is already produced by
the compile stage (always? what about possible target IL emit from
-ftree-parallelize-loops?)?

As I understand Jakub, he prefers things to work without -flto as well, so
target IL has to be handled by a different linker plugin and LTO would merely
be required to pass the target IL sections through the LTO pipeline and re-emit
it during LTRANS?

Btw, at this point it's bad that LTO IL sections do not have something like
a section header; encoding all section properties in the section name is
not going to scale.  Any takers for adding a section header to all LTO
sections?  (Add a first flag, "compressed_p", there, so we can finally mix
compressed and uncompressed sections.)
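The section-header idea can be sketched as a small fixed header at the start of every LTO section, so that properties such as compression are read from the data rather than parsed from the section name. This is a minimal C sketch, assuming an 8-byte little-endian header made of a magic number and a flags word; only the `compressed_p` flag comes from the message, while the magic value, the layout and all `sketch_*` names are hypothetical.

```c
#include <stdint.h>

/* Arbitrary made-up magic ("LTOH" in ASCII); hypothetical.  */
#define SKETCH_SECTION_MAGIC 0x4c544f48u

enum sketch_section_flags
{
  SKETCH_COMPRESSED_P = 1u << 0   /* section payload is compressed */
};

struct sketch_section_header
{
  uint32_t magic;
  uint32_t flags;
};

/* Serialize HDR into the first 8 bytes of BUF, little-endian.  */
void
sketch_write_section_header (const struct sketch_section_header *hdr,
                             unsigned char buf[8])
{
  for (int i = 0; i < 4; i++)
    {
      buf[i] = (unsigned char) (hdr->magic >> (8 * i));
      buf[4 + i] = (unsigned char) (hdr->flags >> (8 * i));
    }
}

/* Parse a header from BUF into HDR.  Return 0 on success, or -1 if the
   magic does not match (e.g. a section written without a header).  */
int
sketch_read_section_header (const unsigned char buf[8],
                            struct sketch_section_header *hdr)
{
  uint32_t magic = 0, flags = 0;
  for (int i = 0; i < 4; i++)
    {
      magic |= (uint32_t) buf[i] << (8 * i);
      flags |= (uint32_t) buf[4 + i] << (8 * i);
    }
  if (magic != SKETCH_SECTION_MAGIC)
    return -1;
  hdr->magic = magic;
  hdr->flags = flags;
  return 0;
}
```

A reader would check the header first and then decide per section whether to decompress, which is what would allow compressed and uncompressed sections to be mixed.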

Richard.

>
> ---
>  gcc/lto-streamer.c   |  8 ++++++--
>  gcc/lto-streamer.h   |  1 +
>  gcc/lto-wrapper.c    | 22 +++++++++++++++++++++-
>  gcc/lto/lang.opt     |  4 ++++
>  gcc/lto/lto-object.c |  5 +++--
>  gcc/lto/lto.c        |  5 ++++-
>  6 files changed, 39 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/lto-streamer.c b/gcc/lto-streamer.c
> index e7b66c1..9e19060 100644
> --- a/gcc/lto-streamer.c
> +++ b/gcc/lto-streamer.c
> @@ -145,6 +145,7 @@ lto_get_section_name (int section_type, const char *name, struct lto_file_decl_d
>    const char *add;
>    char post[32];
>    const char *sep;
> +  const char *prefix;
>
>    if (section_type == LTO_section_function_body)
>      {
> @@ -172,8 +173,11 @@ lto_get_section_name (int section_type, const char *name, struct lto_file_decl_d
>    else if (f != NULL)
>      sprintf (post, "." HOST_WIDE_INT_PRINT_HEX_PURE, f->id);
>    else
> -    sprintf (post, "." HOST_WIDE_INT_PRINT_HEX_PURE, get_random_seed (false));
> -  return concat (LTO_SECTION_NAME_PREFIX, sep, add, post, NULL);
> +    sprintf (post, "." HOST_WIDE_INT_PRINT_HEX_PURE, get_random_seed (false));
> +
> +  prefix = flag_openmp_target ? OMP_SECTION_NAME_PREFIX
> +                             : LTO_SECTION_NAME_PREFIX;
> +  return concat (prefix, sep, add, post, NULL);
>  }
>
>
> diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
> index e7c89f1..df72e16 100644
> --- a/gcc/lto-streamer.h
> +++ b/gcc/lto-streamer.h
> @@ -141,6 +141,7 @@ along with GCC; see the file COPYING3.  If not see
>     name for the functions and static_initializers.  For other types of
>     sections a '.' and the section type are appended.  */
>  #define LTO_SECTION_NAME_PREFIX         ".gnu.lto_"
> +#define OMP_SECTION_NAME_PREFIX         ".gnu.target_lto_"
>
>  #define LTO_major_version 2
>  #define LTO_minor_version 2
> diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
> index 15a34dd..f3b44ff 100644
> --- a/gcc/lto-wrapper.c
> +++ b/gcc/lto-wrapper.c
> @@ -442,6 +442,7 @@ run_gcc (unsigned argc, char *argv[])
>    unsigned i, j;
>    const char **new_argv;
>    const char **argv_ptr;
> +  const char **target_argv;
>    char *list_option_full = NULL;
>    const char *linker_output = NULL;
>    const char *collect_gcc, *collect_gcc_options;
> @@ -452,7 +453,7 @@ run_gcc (unsigned argc, char *argv[])
>    unsigned int fdecoded_options_count = 0;
>    struct cl_decoded_option *decoded_options;
>    unsigned int decoded_options_count;
> -  struct obstack argv_obstack;
> +  struct obstack argv_obstack, target_argv_obstack;
>    int new_head_argc;
>
>    /* Get the driver and options.  */
> @@ -902,6 +903,25 @@ cont:
>        free (input_names);
>        free (list_option_full);
>        obstack_free (&env_obstack, NULL);
> +
> +      /* Run gcc for target.  */
> +      obstack_init (&target_argv_obstack);
> +      obstack_ptr_grow (&target_argv_obstack, collect_gcc);
> +      obstack_ptr_grow (&target_argv_obstack, "-xlto");
> +      obstack_ptr_grow (&target_argv_obstack, "-fopenmp_target");
> +      obstack_ptr_grow (&target_argv_obstack, "-c");
> +      obstack_ptr_grow (&target_argv_obstack, "-o");
> +      obstack_ptr_grow (&target_argv_obstack, "target.o");
> +
> +      /* Append the input objects.  */
> +      for (i = 1; i < argc; ++i)
> +       if (strncmp (argv[i], "-fresolution=", sizeof ("-fresolution=") - 1))
> +         obstack_ptr_grow (&target_argv_obstack, argv[i]);
> +      obstack_ptr_grow (&target_argv_obstack, NULL);
> +
> +      target_argv = XOBFINISH (&target_argv_obstack, const char **);
> +      fork_execute (CONST_CAST (char **, target_argv));
> +      obstack_free (&target_argv_obstack, NULL);
>      }
>
>    obstack_free (&argv_obstack, NULL);
> diff --git a/gcc/lto/lang.opt b/gcc/lto/lang.opt
> index 7a9aede..cd0098c 100644
> --- a/gcc/lto/lang.opt
> +++ b/gcc/lto/lang.opt
> @@ -40,4 +40,8 @@ fresolution=
>  LTO Joined
>  The resolution file
>
> +fopenmp_target
> +LTO Var(flag_openmp_target)
> +Run LTO infrastructure to read target-side bytecode and to build it.
> +
>  ; This comment is to ensure we retain the blank line above.
> diff --git a/gcc/lto/lto-object.c b/gcc/lto/lto-object.c
> index 77be1fb..ccf06d2 100644
> --- a/gcc/lto/lto-object.c
> +++ b/gcc/lto/lto-object.c
> @@ -226,9 +226,10 @@ lto_obj_add_section (void *data, const char *name, off_t offset,
>    struct lto_section_slot s_slot;
>    void **slot;
>    struct lto_section_list *list = loasd->list;
> +  const char *prefix = flag_openmp_target ? OMP_SECTION_NAME_PREFIX
> +                                         : LTO_SECTION_NAME_PREFIX;
>
> -  if (strncmp (name, LTO_SECTION_NAME_PREFIX,
> -              strlen (LTO_SECTION_NAME_PREFIX)) != 0)
> +  if (strncmp (name, prefix, strlen (prefix)))
>      return 1;
>
>    new_name = xstrdup (name);
> diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
> index c854589..d3bac3a 100644
> --- a/gcc/lto/lto.c
> +++ b/gcc/lto/lto.c
> @@ -2677,9 +2677,12 @@ static int
>  lto_section_with_id (const char *name, unsigned HOST_WIDE_INT *id)
>  {
>    const char *s;
> +  const char *prefix = flag_openmp_target ? OMP_SECTION_NAME_PREFIX
> +                                         : LTO_SECTION_NAME_PREFIX;
>
> -  if (strncmp (name, LTO_SECTION_NAME_PREFIX, strlen (LTO_SECTION_NAME_PREFIX)))
> +  if (strncmp (name, prefix, strlen (prefix)))
>      return 0;
> +
>    s = strrchr (name, '.');
>    return s && sscanf (s, "." HOST_WIDE_INT_PRINT_HEX_PURE, id) == 1;
>  }
> --
> 1.7.11.7
>
>
> Thanks,
>   -- Ilya

* Re: Questions about LTO infrastructure and pragma omp target
  2013-09-17  8:12           ` Richard Biener
@ 2013-09-17 11:31             ` Ilya Verbin
  2013-09-17 11:54               ` Jakub Jelinek
  2013-09-17 11:56               ` Richard Biener
  0 siblings, 2 replies; 25+ messages in thread
From: Ilya Verbin @ 2013-09-17 11:31 UTC
  To: Richard Biener
  Cc: Jakub Jelinek, Uday Khedker, Jan Hubicka, Richard Henderson,
	Kirill Yukhin, Michael V. Zolotukhin, GCC Development

On 17 Sep 10:12, Richard Biener wrote:
> It looks more like a hack ;)  It certainly doesn't look scalable to multiple
> target ISAs.  You also unconditionally invoke the target compiler (well, you
> invoke the same compiler ...)

Yes, I currently call the "target" compiler unconditionally, but it can be
placed under a flag/env var/etc.  When we have multiple target ISAs, multiple
target compilers will be invoked.  Each of them will read the same IL from
.gnu.target_lto_ and produce an executable for its target.
Why is this approach not scalable?

> As far as I understand your patch the target IL is already produced by
> the compile stage (always? what about possible target IL emit from
> -ftree-parallelize-loops?)?

Yes, I assume that the IL is already produced, roughly like this:
http://gcc.gnu.org/ml/gcc/2013-09/msg00123.html
Probably the compile stage should somehow inform lto-wrapper that target
compilers should be called.

> As I understand Jakub, he prefers things to work without -flto as well, so
> target IL has to be handled by a different linker plugin and LTO would merely
> be required to pass the target IL sections through the LTO pipeline and re-emit
> it during LTRANS?

If we want to reuse the "LTO pipeline", the LTO infrastructure has to be
turned on (i.e. lto-wrapper has to be called).
With -flto, lto-wrapper will perform all the usual steps (WPA+LTRANS) and
also invoke all necessary target compilers.
Without -flto it will merely invoke the target compilers.
I do not see how a different linker plugin would help.  Wouldn't it run
lto-wrapper the same way as the current plugin?
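The per-target invocation described here can be sketched in plain C. Like the lto-wrapper patch earlier in this thread, it passes `-xlto -fopenmp_target -c -o OUTPUT` followed by every input except `-fresolution=...`; unlike the patch, it takes the compiler name as a parameter so it can be called once per target ISA. `build_target_argv` is a hypothetical helper; the real lto-wrapper builds its argument vectors with obstacks and runs them with `fork_execute`.

```c
#include <stddef.h>
#include <string.h>

/* Build the command line for one target compiler.  ARGV[1..ARGC-1] are the
   arguments lto-wrapper received; -fresolution=... is dropped because only
   the host link needs it.  OUT must have room for ARGC + 6 entries and is
   NULL-terminated.  Returns the number of arguments stored (without NULL).  */
size_t
build_target_argv (const char *compiler, const char *output,
                   int argc, char *argv[], const char **out)
{
  static const char resolution[] = "-fresolution=";
  size_t n = 0;

  out[n++] = compiler;
  out[n++] = "-xlto";
  out[n++] = "-fopenmp_target";   /* option name taken from the patch */
  out[n++] = "-c";
  out[n++] = "-o";
  out[n++] = output;
  for (int i = 1; i < argc; i++)
    if (strncmp (argv[i], resolution, sizeof resolution - 1) != 0)
      out[n++] = argv[i];
  out[n] = NULL;
  return n;
}
```

With several target ISAs, lto-wrapper would call this once per configured target compiler (for example a hypothetical `mic-gcc` driver plus one per additional accelerator), each reading the same `.gnu.target_lto_` IL and writing its own output file.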

Thanks,
  -- Ilya

* Re: Questions about LTO infrastructure and pragma omp target
  2013-09-17 11:31             ` Ilya Verbin
@ 2013-09-17 11:54               ` Jakub Jelinek
  2013-09-17 11:56               ` Richard Biener
  1 sibling, 0 replies; 25+ messages in thread
From: Jakub Jelinek @ 2013-09-17 11:54 UTC
  To: Ilya Verbin
  Cc: Richard Biener, Uday Khedker, Jan Hubicka, Richard Henderson,
	Kirill Yukhin, Michael V. Zolotukhin, GCC Development

On Tue, Sep 17, 2013 at 03:30:10PM +0400, Ilya Verbin wrote:
> On 17 Sep 10:12, Richard Biener wrote:
> > It looks more like a hack ;)  It certainly doesn't look scalable to multiple
> > target ISAs.  You also unconditionally invoke the target compiler (well, you
> > invoke the same compiler ...)
> 
> Yes, I currently call the "target" compiler unconditionally, but it can be
> placed under a flag/env var/etc.  When we have multiple target ISAs, multiple
> target compilers will be invoked.  Each of them will read the same IL from
> .gnu.target_lto_ and produce an executable for its target.
> Why is this approach not scalable?
> 
> > As far as I understand your patch the target IL is already produced by
> > the compile stage (always? what about possible target IL emit from
> > -ftree-parallelize-loops?)?
> 
> Yes, I assume that the IL is already produced, roughly like this:
> http://gcc.gnu.org/ml/gcc/2013-09/msg00123.html
> Probably the compile stage should somehow inform lto-wrapper that target
> compilers should be called.

-ftree-parallelize-loops right now only parallelizes loops; it doesn't in any
way attempt to offload them, and I don't plan to work on anything like that
in the near future (it would probably need a different option and lots of
work).  What is being offloaded right now is only code explicitly marked by
users.  For -flto -ftree-offload-loops or whatever the option would be,
you'd need to do the LTO stuff in the plugin first and pick the
.gnu.target_lto_* sections from whatever came out of LTO.

> > As I understand Jakub, he prefers things to work without -flto as well, so
> > target IL has to be handled by a different linker plugin and LTO would merely
> > be required to pass the target IL sections through the LTO pipeline and re-emit
> > it during LTRANS?

I don't see why you'd need a different plugin; the section names are
unambiguous, so all you need to do is handle .gnu.lto_* and
.gnu.target_lto_* sections differently in the plugin.  The former are
handled without changes from the current behavior; the latter are compiled
by the target compiler, linked with the target linker (together with target
shared libraries temporarily extracted from special sections of the host
shared libraries), and then embedded into a data section of the host binary
or shared library, respectively.
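Because the prefixes are unambiguous, the dispatch described here reduces to a prefix test on each section name. A minimal C sketch, using the two prefixes defined in Ilya's patch; `enum section_kind` and `classify_section` are hypothetical names, not part of the existing linker plugin API.

```c
#include <string.h>

/* Prefixes as defined in lto-streamer.h by the patch in this thread.  */
#define LTO_SECTION_NAME_PREFIX ".gnu.lto_"
#define OMP_SECTION_NAME_PREFIX ".gnu.target_lto_"

enum section_kind { SECTION_OTHER, SECTION_HOST_LTO, SECTION_TARGET_LTO };

static int
has_prefix (const char *name, const char *prefix)
{
  return strncmp (name, prefix, strlen (prefix)) == 0;
}

/* Host IL goes through the usual WPA/LTRANS path; target IL is handed
   to the target compiler; anything else is left alone.  */
enum section_kind
classify_section (const char *name)
{
  if (has_prefix (name, LTO_SECTION_NAME_PREFIX))
    return SECTION_HOST_LTO;
  if (has_prefix (name, OMP_SECTION_NAME_PREFIX))
    return SECTION_TARGET_LTO;
  return SECTION_OTHER;
}
```

Neither prefix is a prefix of the other, so the order of the two tests does not matter.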

	Jakub

* Re: Questions about LTO infrastructure and pragma omp target
  2013-09-17 11:31             ` Ilya Verbin
  2013-09-17 11:54               ` Jakub Jelinek
@ 2013-09-17 11:56               ` Richard Biener
  2013-09-17 12:15                 ` Jakub Jelinek
  1 sibling, 1 reply; 25+ messages in thread
From: Richard Biener @ 2013-09-17 11:56 UTC
  To: Ilya Verbin
  Cc: Jakub Jelinek, Uday Khedker, Jan Hubicka, Richard Henderson,
	Kirill Yukhin, Michael V. Zolotukhin, GCC Development

On Tue, Sep 17, 2013 at 1:30 PM, Ilya Verbin <iverbin@gmail.com> wrote:
> On 17 Sep 10:12, Richard Biener wrote:
>> It looks more like a hack ;)  It certainly doesn't look scalable to multiple
>> target ISAs.  You also unconditionally invoke the target compiler (well, you
>> invoke the same compiler ...)
>
> Yes, I currently call the "target" compiler unconditionally, but it can be
> placed under a flag/env var/etc.  When we have multiple target ISAs, multiple
> target compilers will be invoked.  Each of them will read the same IL from
> .gnu.target_lto_ and produce an executable for its target.
> Why is this approach not scalable?

Are you sure we have the same IL for all targets and the same targets
for all functions?  That would certainly simplify things, but you still need
a way to tell the target compiler which symbol to emit the function on
as the compile-stage will already necessarily refer to all target
variant symbols.

>> As far as I understand your patch the target IL is already produced by
>> the compile stage (always? what about possible target IL emit from
>> -ftree-parallelize-loops?)?
>
> Yes, I assume that the IL is already produced, roughly like this:
> http://gcc.gnu.org/ml/gcc/2013-09/msg00123.html
> Probably the compile stage should somehow inform lto-wrapper that target
> compilers should be called.

I mean that (speaking in LTO terms) you can end up emitting "target IL" from
the LTRANS stage.  So you'd re-run the LTO linker plugin and lto-wrapper
at the final link stage if you compile with -flto(?)

>> As I understand Jakub, he prefers things to work without -flto as well, so
>> target IL has to be handled by a different linker plugin and LTO would merely
>> be required to pass the target IL sections through the LTO pipeline and re-emit
>> it during LTRANS?
>
> If we want to reuse the "LTO pipeline", the LTO infrastructure has to be
> turned on (i.e. lto-wrapper has to be called).
> With -flto, lto-wrapper will perform all the usual steps (WPA+LTRANS) and
> also invoke all necessary target compilers.
> Without -flto it will merely invoke the target compilers.
> I do not see how a different linker plugin would help.  Wouldn't it run
> lto-wrapper the same way as the current plugin?

Well, it certainly would be less LTO specific.

Richard.

> Thanks,
>   -- Ilya

* Re: Questions about LTO infrastructure and pragma omp target
  2013-09-17 11:56               ` Richard Biener
@ 2013-09-17 12:15                 ` Jakub Jelinek
  2013-09-19 10:45                   ` Ilya Verbin
  0 siblings, 1 reply; 25+ messages in thread
From: Jakub Jelinek @ 2013-09-17 12:15 UTC
  To: Richard Biener
  Cc: Ilya Verbin, Uday Khedker, Jan Hubicka, Richard Henderson,
	Kirill Yukhin, Michael V. Zolotukhin, GCC Development

On Tue, Sep 17, 2013 at 01:56:39PM +0200, Richard Biener wrote:
> On Tue, Sep 17, 2013 at 1:30 PM, Ilya Verbin <iverbin@gmail.com> wrote:
> > On 17 Sep 10:12, Richard Biener wrote:
> >> It looks more like a hack ;)  It certainly doesn't look scalable to multiple
> >> target ISAs.  You also unconditionally invoke the target compiler (well, you
> >> invoke the same compiler ...)
> >
> > Yes, I currently call the "target" compiler unconditionally, but it can be
> > placed under a flag/env var/etc.  When we have multiple target ISAs, multiple
> > target compilers will be invoked.  Each of them will read the same IL from
> > .gnu.target_lto_ and produce an executable for its target.
> > Why is this approach not scalable?
> 
> Are you sure we have the same IL for all targets and the same targets
> for all functions?  That would certainly simplify things, but you still need
> a way to tell the target compiler which symbol to emit the function on
> as the compile-stage will already necessarily refer to all target
> variant symbols.

This was discussed to some extent during the Cauldron.
Yes, there are various target dependencies in the GIMPLE IL, many of them
introduced very early.
Some of the dependencies are already there during preprocessing; there is
nothing to be done about those.
For some things we will just rely on the host and target having the same
properties, such as BITS_PER_UNIT, type layout/alignment and endianness.
The OpenMP (and I believe OpenACC too) model effectively requires that:
while you don't need a shared address space between host and target
(but can have one), the mapping/unmapping assumes that you can simply take
portions of host memory and copy them over to the target device or back
as a sequence of bytes.  There is no form of RPC or similar that would
adjust endianness, differently sized types, padding, etc.
While you can have, say, a 64-bit host and a 32-bit target or vice versa,
the target IL will simply contain the precision info, alignment, structure
layout etc., and the target compiler will just have to generate the right
code for that (something that is a native long on the host can be a native
long long on the target or vice versa, etc.).
Then there are dependencies we'd ideally get rid of, at least pre-IPA,
such as BRANCH_COST, but generally that is just an optimization issue and
thus not a big deal.
A bigger issue is target-specific builtins.  I guess we'll either have to
just emit a sorry () diagnostic for those, or have some helper targhook that
translates a subset of md builtins from selected hosts to selected targets.
Preferably, before IPA we'd introduce as few target dependencies into the
IL as possible, and gradually, towards RTL, we can add more dependencies
(e.g. the vectorizer adds so many target dependencies that at that point
trying to use the IL for a different target is practically impossible).
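The byte-copy mapping model can be illustrated with a short C sketch: the host object is copied to and from a buffer standing in for device memory with plain `memcpy`, with no translation of endianness, type sizes or padding, so target code must agree with the host layout exactly. `struct particle`, `map_to_device` and `map_from_device` are hypothetical; actual mapping is performed by the OpenMP runtime (libgomp), and the C11 `_Static_assert` checks merely document the host layout that the target compiler would have to reproduce.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical shared structure.  Fixed-width types are used only so the
   layout checks below are predictable in this sketch.  */
struct particle
{
  int64_t id;
  double x, y;
};

/* The host layout that target code must match byte for byte.  */
_Static_assert (sizeof (struct particle) == 24, "unexpected size");
_Static_assert (offsetof (struct particle, x) == 8, "unexpected layout");

/* Stand-in for mapping to the device: a raw byte copy, nothing more.  */
void
map_to_device (unsigned char *device_mem, const struct particle *p)
{
  memcpy (device_mem, p, sizeof *p);
}

/* Stand-in for copying the data back from the device.  */
void
map_from_device (struct particle *p, const unsigned char *device_mem)
{
  memcpy (p, device_mem, sizeof *p);
}
```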

	Jakub

* Re: Questions about LTO infrastructure and pragma omp target
  2013-09-17 12:15                 ` Jakub Jelinek
@ 2013-09-19 10:45                   ` Ilya Verbin
  2013-09-19 10:50                     ` Jakub Jelinek
  0 siblings, 1 reply; 25+ messages in thread
From: Ilya Verbin @ 2013-09-19 10:45 UTC
  To: Jakub Jelinek
  Cc: Richard Biener, Uday Khedker, Jan Hubicka, Richard Henderson,
	Kirill Yukhin, Michael V. Zolotukhin, GCC Development

On 17 Sep 14:12, Jakub Jelinek wrote:
> On Tue, Sep 17, 2013 at 01:56:39PM +0200, Richard Biener wrote:
> > 
> > Are you sure we have the same IL for all targets and the same targets
> > for all functions?  That would certainly simplify things, but you still need
> > a way to tell the target compiler which symbol to emit the function on
> > as the compile-stage will already necessarily refer to all target
> > variant symbols.
> 
> This has been discussed to some extent during Cauldron.
> Yes, there are various target dependencies in the GIMPLE IL, many of them
> very early.
> Some of the dependencies are there already during preprocessing, there is
> nothing to do about those.
> For some things we will just rely on the host and target having the same
> properties, stuff like BITS_PER_UNIT, type layout/alignment, endianity,
> the OpenMP (and I believe OpenACC too) model effectively requires that,
> while you don't need to have shared address space between host and target
> (but can have that), for the mapping/unmapping it is assumed that you can
> simply take host portions of memory and copy them over to the target device
> or back, as sequence of bytes, there is no form of RPC or similar that would
> tweak endianity, differently sized types, padding, etc.
> While you can say have 64-bit host and 32-bit target or vice versa, the
> target IL will simply contain precision info, alignment, structure layout
> etc. and just will have to generate right code for that (something that is
> native long on the host can be native long long on the target or vice versa
> etc.).
> Then there are dependencies we'd ideally get rid of, at least pre-IPA,
> stuff like BRANCH_COST, but generally that is just an optimization issue and
> thus not that big deal.
> Bigger issue are target specific builtins, I guess we'll either have to just
> sorry on those, or have some helper targhook that will translate a subset of
> md builtins from selected hosts to selected targets.
> Preferably, before IPA we'd introduce as few target dependencies into the
> IL as possible, and gradually towards RTL can add more dependencies (e.g.
> the vectorizer adds so many target dependencies that at that point trying to
> use the IL for a different target is practically impossible).
> 
> 	Jakub

Do I understand correctly that GIMPLE IL is target dependent, but we will emit
the same IL for all targets?

  -- Ilya
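
Jakub's point that mapping and unmapping is a plain byte copy can be
illustrated with a small C sketch.  It assumes GCC with OpenMP 4.0
`target` support; `struct particle` and `scale_particles` are hypothetical
names made up for this illustration, not anything proposed in the thread.
Built without -fopenmp the pragma is simply ignored and the loop runs on
the host.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record type used only for illustration.  A map clause
   transfers it as raw bytes, so its size, padding, field offsets and
   byte order have to match on the host and on the offload target. */
struct particle {
    char tag;   /* 1 byte, usually followed by padding */
    double x;   /* the alignment of double decides that padding */
    long id;    /* target code must keep the host's width for this field */
};

/* Scales every x inside a target region.  Without -fopenmp the pragma is
   ignored; without an offload device the region typically falls back to
   running on the host. */
void scale_particles(struct particle *p, int n, double factor)
{
    assert(n >= 0 && (p != NULL || n == 0));

    /* Copies n * sizeof(struct particle) bytes to the device and back,
       with no conversion of endianness, padding or field widths. */
    #pragma omp target map(tofrom: p[0:n])
    for (int i = 0; i < n; i++)
        p[i].x *= factor;
}
```

Because the struct crosses the host/device boundary as bytes, a target
compiler that laid it out differently (other padding rules, a narrower
`long`, the opposite byte order) would read garbage.  That is why such
properties are inherited from the host and carried in the IL, rather than
chosen by the target.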


* Re: Questions about LTO infrastructure and pragma omp target
  2013-09-19 10:45                   ` Ilya Verbin
@ 2013-09-19 10:50                     ` Jakub Jelinek
  0 siblings, 0 replies; 25+ messages in thread
From: Jakub Jelinek @ 2013-09-19 10:50 UTC
  To: Ilya Verbin
  Cc: Richard Biener, Uday Khedker, Jan Hubicka, Richard Henderson,
	Kirill Yukhin, Michael V. Zolotukhin, GCC Development

On Thu, Sep 19, 2013 at 02:44:30PM +0400, Ilya Verbin wrote:
> Do I understand correctly that GIMPLE IL is target dependent, but we will emit
> the same IL for all targets?

Yes.  Some of the target dependencies are required to be inherited from the
host, some can be tolerated (optimization decisions), others can be errored
out (md builtins).

	Jakub


Thread overview: 25+ messages
2013-08-15 13:44 Questions about LTO infrastructure and pragma omp target Ilya Verbin
2013-08-15 15:00 ` Jakub Jelinek
2013-08-15 19:19   ` Richard Biener
2013-08-23 13:15     ` Ilya Verbin
2013-08-23 14:38       ` Jakub Jelinek
2013-08-28  9:59         ` Basile Starynkevitch
2013-08-23 15:05       ` Richard Biener
2013-08-23 15:06         ` Jakub Jelinek
2013-08-25 22:36           ` Ilya Verbin
2013-08-26  7:32             ` Jakub Jelinek
2013-09-03 14:00               ` Michael V. Zolotukhin
2013-09-03 14:19                 ` Jakub Jelinek
2013-09-03 15:18                   ` Michael V. Zolotukhin
2013-09-03 17:39                     ` Thomas Schwinge
2013-09-03 18:30                       ` Michael V. Zolotukhin
2013-09-03 18:54                         ` Jakub Jelinek
2013-09-03 19:09                           ` Michael V. Zolotukhin
2013-09-16 17:14         ` Ilya Verbin
2013-09-17  8:12           ` Richard Biener
2013-09-17 11:31             ` Ilya Verbin
2013-09-17 11:54               ` Jakub Jelinek
2013-09-17 11:56               ` Richard Biener
2013-09-17 12:15                 ` Jakub Jelinek
2013-09-19 10:45                   ` Ilya Verbin
2013-09-19 10:50                     ` Jakub Jelinek