public inbox for gcc@gcc.gnu.org
* [gomp4] Building binaries for offload.
@ 2013-10-15 10:04 Kirill Yukhin
  2013-10-18  7:24 ` Jakub Jelinek
  0 siblings, 1 reply; 2+ messages in thread
From: Kirill Yukhin @ 2013-10-15 10:04 UTC
  To: Jakub Jelinek, Richard Biener, hubicka; +Cc: gcc

Hello,
Let me summarize the current understanding of
host binary linking as well as target binary building/linking.

We put code which is supposed to be offloaded into dedicated sections,
with names starting with gnu.target_lto_

At link time (I mean, link time of host app):
  1. Generate dedicated data section in each binary (executable or DSO),
     which'll be a placeholder for offloading stuff.

  2. Generate __OPENMP_TARGET__ (weak, hidden) symbol,
     which'll point to start of the section mentioned in previous item.

This section should contain at least:
  1. Number of targets
  2. Size of offl. symbols table

  [ Repeat `number of targets' times ]
  3. Name of target
  4. Offset to beginning of image to offload to that target
  5. Size of image

  6. Offl. symbols table

Offloading symbols table will contain information about addresses
of offloadable symbols in order to create mapping of host<->target
addresses at runtime.
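
For illustration only, the section layout listed above could be pictured as
the C structures below. This is a sketch under assumptions: the type names,
field names, field widths and the fixed-size name buffer are all hypothetical
and nothing here is an existing GCC or libgomp definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: none of these type or field names exist in GCC.  */
struct offload_target_desc
{
  char name[16];          /* name of target, NUL-terminated */
  uint64_t image_offset;  /* offset from section start to the image */
  uint64_t image_size;    /* size of the image in bytes */
};

struct offload_header
{
  uint32_t num_targets;   /* number of targets */
  uint32_t symtab_size;   /* size of the offload symbols table in bytes */
  struct offload_target_desc targets[];  /* repeated num_targets times */
  /* The offload symbols table would follow the last descriptor.  */
};

/* Return the descriptor whose name equals NAME, or NULL if there is none.  */
const struct offload_target_desc *
find_target (const struct offload_header *hdr, const char *name)
{
  assert (hdr != NULL && name != NULL);
  for (uint32_t i = 0; i < hdr->num_targets; i++)
    if (strcmp (hdr->targets[i].name, name) == 0)
      return &hdr->targets[i];
  return NULL;
}
```

A runtime holding the address of __OPENMP_TARGET__ could walk the descriptors
this way to pick the image for a given device type.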

To get list of target addresses we need to have dedicated interface call
to libgomp plugin, something like getTargetAddresses () which will
query target for the list of addresses (accompanied with symbol names).
To provide this information, the target DSO should contain a similar table
mapping symbols to addresses.

The application is going to have a single instance of libgomp, which
in turn means that we'll have a single splay tree holding information
about the mapping (host -> target) for all DSOs and the executable.

When GOMP_target* is called, a pointer to the table of the current execution
module is passed to libgomp along with a pointer to the routine (or global).
libgomp in turn:
  1. Checks in the splay tree whether the address of the given pointer (to the
     table) exists. If not, the given table is not yet initialized;
     libgomp initializes it (see below) and inserts the address of the table
     into the splay tree.
  2. Performs a lookup of the (host) address in the table provided
     and extracts the target address.
  3. Once the target address is found, performs an API call (passing that
     address) to the given device.
We have at least 2 approaches to solving the host->target mapping.

I. Preserve order of symbols appearance.
   Table row: [ address, size ]
   For routines, size to be 1

   In order to initialize the table we need to get two arrays:
   of host and target addresses. The order of appearance of objects in
   these arrays must be the same. Having this makes mapping easy.
   We just need to find the index of the given address in the array of host
   addrs and then dereference the array of target addresses with the index found.

   The problem is that it is unlikely to work when LTO of the host is ON.
   I am also not sure that the order of handling objects on the target is the
   same as on the host.
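
   A minimal sketch of approach I, assuming both arrays really are in the same
   order. The struct and function names are hypothetical, and a linear search
   is used only to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host table row: [ address, size ].  */
struct host_row { uintptr_t address; size_t size; };

/* Find HOST_ADDR in HOST and store the entry with the same index from TARGET.
   Both arrays have COUNT entries and must list objects in the same order.
   Returns 0 on success, -1 if HOST_ADDR is not present.  */
int
map_by_order (const struct host_row *host, const uintptr_t *target,
              size_t count, uintptr_t host_addr, uintptr_t *target_addr)
{
  assert (target_addr != NULL);
  for (size_t i = 0; i < count; i++)
    if (host[i].address == host_addr)
      {
        *target_addr = target[i];
        return 0;
      }
  return -1;
}
```

   The mapping is only as good as the ordering guarantee: if either side
   reorders its objects, the same index silently yields the wrong address.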

II. Store symbol identifier along with address.
  Table row: [ symbol_name, address, size]
  For routines, size to be 1

  To construct the table of host addresses, at link
  time we put the addresses of all symbols (marked at compile time with a
  dedicated attribute) into the table, accompanied by symbol names (they'll
  serve as keys).

  During initialization of the table we create host->target address mapping
  using symbol names as keys.

The last thing I wanted to summarize: compiling target code.

We have 2 approaches here:

   1. Perform WPA and extract sections, marked as target, into separate object
      file. Then call target compiler on that object file to produce the binary.

      As mentioned by Jakub, this approach will complicate debugging.

   2. Pass fat object files directly to the target compiler (one CU at a time).
      So, for every object file we are going to call GCC twice:
          - Host GCC, which will compile all host code for every CU
          - Target GCC, which will compile all target code for every CU

I vote for option #2, since the WPA-based approach complicates debugging.
What do you guys think?

--
Thanks, K


* Re: [gomp4] Building binaries for offload.
  2013-10-15 10:04 [gomp4] Building binaries for offload Kirill Yukhin
@ 2013-10-18  7:24 ` Jakub Jelinek
  0 siblings, 0 replies; 2+ messages in thread
From: Jakub Jelinek @ 2013-10-18  7:24 UTC
  To: Kirill Yukhin; +Cc: Richard Biener, hubicka, gcc

On Tue, Oct 15, 2013 at 02:03:48PM +0400, Kirill Yukhin wrote:
> Let me summarize the current understanding of
> host binary linking as well as target binary building/linking.
> 
> We put code which is supposed to be offloaded into dedicated sections,
> with names starting with gnu.target_lto_
> 
> At link time (I mean, link time of host app):
>   1. Generate dedicated data section in each binary (executable or DSO),
>      which'll be a placeholder for offloading stuff.
> 
>   2. Generate __OPENMP_TARGET__ (weak, hidden) symbol,
>      which'll point to start of the section mentioned in previous item.
> 
> This section should contain at least:
>   1. Number of targets
>   2. Size of offl. symbols table
> 
>   [ Repeat `number of targets' times ]
>   3. Name of target
>   4. Offset to beginning of image to offload to that target
>   5. Size of image
> 
>   6. Offl. symbols table
> 
> Offloading symbols table will contain information about addresses
> of offloadable symbols in order to create mapping of host<->target
> addresses at runtime.
> 
> To get list of target addresses we need to have dedicated interface call
> to libgomp plugin, something like getTargetAddresses () which will
> query target for the list of addresses (accompanied with symbol names).
> To get this information target DSO should contain similar table of
> mapping symbols to address.

No, IMHO it is enough if the linker plugin finds the array of the target
addresses in the shared library it is going to embed (e.g. using some magic
symbol lookup, or named section) and just put a pointer to that place in the
payload into the __OPENMP_TARGET__ header structure, or whatever other way
will be best to provide that info to libgomp.
Say, if the pairs host_address, size are put into .gnu.target_addr section
in the host code and we arrange for the address to be put into vars in
.gnu.target_addr section in the .gnu.target_lto* IL for target, in the end
there will be a table of the target addresses in .gnu.target_addr section
in the target shared library.  So, either the __OPENMP_TARGET__ header
entry for the corresponding target (MIC in your case) would contain both the
host .gnu.target_addr table and a pointer to the .gnu.target_addr in the
payload, or the plugin could copy it over and create a table with
{ host_addr, size, target_addr_nonrelocated } and libgomp would just add a
load bias of the target shared library to the target address.
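
A sketch of that combined table, under stated assumptions: the struct and
function names are hypothetical, and matching any address inside
[host_addr, host_addr + size) rather than only the start address is an
addition made here for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical row: { host_addr, size, target_addr_nonrelocated }.  */
struct offload_map_row
{
  uintptr_t host_addr;
  size_t size;
  uintptr_t target_addr_nonrelocated;
};

/* Translate HOST_ADDR to a target address by adding LOAD_BIAS (the load
   address of the target shared library) to the non-relocated entry.
   Returns 0 on success, -1 if no row covers HOST_ADDR.  */
int
translate_address (const struct offload_map_row *rows, size_t count,
                   uintptr_t load_bias, uintptr_t host_addr,
                   uintptr_t *target_addr)
{
  assert (target_addr != NULL);
  for (size_t i = 0; i < count; i++)
    if (host_addr >= rows[i].host_addr
        && host_addr - rows[i].host_addr < rows[i].size)
      {
        /* Keep the offset into the object when mapping to the target.  */
        *target_addr = rows[i].target_addr_nonrelocated + load_bias
                       + (host_addr - rows[i].host_addr);
        return 0;
      }
  return -1;
}
```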

> Application is going to have single instance of libgomp, which
> in turn means that we'll have single splay tree holding information
> about mapping  (host -> target) for all DSO and executable.

One splay tree per device without shared address space in particular.

> We have at least 2 approaches to solving the host->target mapping.
> 
> I. Preserve order of symbols appearance.
>    Table row: [ address, size ]
>    For routines, size to be 1
> 
>    In order to initialize the table we need to get two arrays:
>    of host and target addresses. The order of appearance of objects in
>    these arrays must be the same. Having this makes mapping easy.
>    We just need to find the index of the given address in the array of host
>    addrs and then dereference the array of target addresses with the index found.
> 
>    The problem is that it is unlikely to work when LTO of the host is ON.
>    I am also not sure that the order of handling objects on the target is the
>    same as on the host.

I don't see why it wouldn't work; it will be the duty of the linker plugin
not to reorder the objects.

> II. Store symbol identifier along with address.
>   Table row: [ symbol_name, address, size]
>   For routines, size to be 1
> 
>   To construct the table of host addresses, at link
>   time we put all symbol (marked at compile time with dedicated
>   attribute) addresses to the table, accompanied with symbol names (they'll
>   serve as keys)
> 
>   During initialization of the table we create host->target address mapping
>   using symbol names as keys.

No, this is not going to work; as I said earlier, names aren't necessarily
unique for static functions.
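
To make the problem concrete, here is a small sketch with a hypothetical
name-keyed table in which two CUs each contribute a static function called
"helper". The row layout follows [ symbol_name, address, size ]; the type and
function names are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical row: [ symbol_name, address, size ].  */
struct named_row
{
  const char *symbol_name;
  uintptr_t address;
  size_t size;
};

/* Count how many rows carry NAME.  A result above 1 means the name cannot
   serve as a unique key for the host->target mapping.  */
size_t
count_by_name (const struct named_row *rows, size_t count, const char *name)
{
  size_t matches = 0;
  assert (name != NULL);
  for (size_t i = 0; i < count; i++)
    if (strcmp (rows[i].symbol_name, name) == 0)
      matches++;
  return matches;
}
```

With two rows named "helper" at different addresses, a lookup by name has no
way to tell which host address should pair with which target address.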
> 
> The last thing I wanted to summarize: compiling target code.
> 
> We have 2 approaches here:
> 
>    1. Perform WPA and extract sections, marked as target, into separate object
>       file. Then call target compiler on that object file to produce the binary.
> 
>       As mentioned by Jakub, this approach will complicate debugging.
> 
>    2. Pass fat object files directly to the target compiler (one CU at a time).
>       So, for every object file we are going to call GCC twice:
>           - Host GCC, which will compile all host code for every CU
>           - Target GCC, which will compile all target code for every CU
> 
> I vote for option #2, since the WPA-based approach complicates debugging.
> What do you guys think?

One needs to think about ld -r, the linker plugin might actually see
multiple CUs in one object file, so perhaps the target compiler will need to
be run on the same *.o file several times, with different offsets or
whatever other way to identify the CU in the sections (if .gnu.target_lto*
has section headers, it will be easier).

	Jakub
