public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
From: "Michael V. Zolotukhin" <michael.v.zolotukhin@gmail.com>
To: Jakub Jelinek <jakub@redhat.com>
Cc: Kirill Yukhin <kirill.yukhin@gmail.com>,
	Richard Henderson <rth@redhat.com>,
	gcc@gcc.gnu.org,	triegel@redhat.com
Subject: Re: [RFC] Offloading Support in libgomp
Date: Sat, 24 Aug 2013 06:17:00 -0000	[thread overview]
Message-ID: <20130823153052.GA2974@msticlxl57.ims.intel.com> (raw)
In-Reply-To: <20130823095250.GJ1814@tucnak.redhat.com>

> The single call approach would just be passed array of control structures
> that would describe each of the MAP clauses, and you'd simply loop over
> them; the standard requires that if some object is already mapped into the
> device, then nothing is performed (well, target update is an exception).
> So you'd have some data structure that maps [address,address+length)
> intervals for each device into target addresses and start with doing a
> lookup on that (note, if [addr,addr+6) -> devaddr is already in, then addr+4
> should be just mapped to devaddr+4).  If not found, you'd allocate the
> memory on the device, add into the mapping data structure and record in a
> vector for the target/target data in question, so that on exit from that it
> would be copied back if requested, and deallocated.  I don't see how single
> call vs. multiple calls would change anything on that.  Plus a single call
> allows the device id to be looked up just once and treat all the mappings
> as a batch (GOMP_target_data would probably need corresponding
> GOMP_target_data_end call, perhaps just with a device_id and would pop from
> the stack of pending target data regions for that device.
That makes sense.  We could maintain a vector of descriptors for each
encountered MAP clause and push to and pop from it when needed (when
e.g. new mapping is encountered inside 'pragma omp target data').  The
desciptor should contain address in the host memory, size of the mapped
block, type of mapping, related device, and handler, which would be
returned for this mapping by runtime.  Having vector of such
descriptors, we could pass it as an argument for outlined functions - in
them we need to extract needed addresses from the vector before
executing the body.  Did I get it right?

Also, a bit unclear point here is how should we generate these
extractions in target-version of the outlined function - seemingly we
won't pass this entire vector to it, so it's unclear out of what should
we extract the data.  What do you think on this?

> I meant just a single plugin that would handle all of them, or as richi
> said, perhaps teach LTO plugin to do that.
> For options, my vision was something like:
> -ftarget=mic -ftarget=hsail='-mfoobaz=4 -mbazbaz'
> which would mean:
> 1) compile LTO IL from the accelerator section for mic with
>    the originally recorded gcc command line options with the Target options
>    removed and no extra options added
> 2) compile LTO IL also for hsail target, with originally recorded gcc
>    command line options but Target options and -mfoobaz=4 -mbazbaz
>    options added
> 3) don't compile for ptx
> The thing is if you originally compile with
> -O3 -ftree-vectorize -march=corei7-avx -minline-all-stringops
> the -m* options might not apply to the target compiler at all.
> So you'd construct the command line from the original command line sans
> CL_TARGET options, append to that the link time override for the
> accelerator.  Then another thing is how to find out the corresponding
> compiler (including its driver) for the target from the plugin.
Could we set some correspondance between '-ftarget' option value and
corresponding compiler?  E.g. for '-ftarget=xyz' we would look for
xyz-cc1.  I haven't looked in details at how the compiler plugins work,
so maybe I said something unfeasable:)

> libgomp would start by trying to dlopen all available plugins,
> and for each of them call some routine in them that would query the hw
> for available devices, then libgomp would assign device ids to them (0 and
> up) and then for target specific parts just dispatch again to the plugin
> corresponding to the chosen device.
In libgomp we have a similar problem - there we would need to find out
plugins names from somewhere.  The difference is that libgomp would
always iterate through all plugins independently on compiler options,
but even with this I currently have no idea of how to populate list of
plugins names (I suppose, this should be done somewhere at
configure/make step of libgomp building process?).


On 23 Aug 11:52, Jakub Jelinek wrote:
> On Fri, Aug 23, 2013 at 01:28:10PM +0400, Michael V. Zolotukhin wrote:
> > Sure, I used '#pragma omp target' just for a simple example.  The
> > question about '#pragma omp target data' is still open.  As far as I
> > understand, all three of the pragmas could require data marshalling (but
> > not necessary - 'omp target', if it's located inside 'omp target data'
> > which specifies all needed for 'omp dtarget' variables, won't need any
> > data marshalling - right?).  This data movement could be done, as you
> > noted by a single call or a set of calls (one for each clause) - and
> > while single call seems appealing, it could be better to use separate
> > calls in case, when e.g. we have a 'target update' for only a subset of
> > all described in 'target data' variables.  One-call approach has
> > difficulties with specifying, which subset of the data we want to
> > update.
> 
> The single call approach would just be passed array of control structures
> that would describe each of the MAP clauses, and you'd simply loop over
> them; the standard requires that if some object is already mapped into the
> device, then nothing is performed (well, target update is an exception).
> So you'd have some data structure that maps [address,address+length)
> intervals for each device into target addresses and start with doing a
> lookup on that (note, if [addr,addr+6) -> devaddr is already in, then addr+4
> should be just mapped to devaddr+4).  If not found, you'd allocate the
> memory on the device, add into the mapping data structure and record in a
> vector for the target/target data in question, so that on exit from that it
> would be copied back if requested, and deallocated.  I don't see how single
> call vs. multiple calls would change anything on that.  Plus a single call
> allows the device id to be looked up just once and treat all the mappings
> as a batch (GOMP_target_data would probably need corresponding
> GOMP_target_data_end call, perhaps just with a device_id and would pop from
> the stack of pending target data regions for that device.
> [B
> > > I'd prefer GOMP_target instead of GOMP_offload for the function name, to
> > > make it clearly related to the corresponding directive.
> > That makes sense - I just used first name that came to my mind here.
> > 
> > > > GOMP_offload is a call to libgomp, which will be implemented somehow like this:
> > > >   void GOMP_offload (void (*fn)(void*), void *data, const char *fname)
> > > >   {
> > > >     if (gomp_offload_available ())
> > > 
> > > This really isn't just check whether accelerator is available, we need to
> > > query all accelerators in the system (and cache that somehow in the
> > > library), assign device numbers to individual devices (say, you could have
> > > two Intel MIC cards, one AMD HSAIL capable GPGPU and 4 Nvidia PTX capable
> > > GPGPUs or similar), ensure that already assigned device numbers aren't
> > > reused when discovering new ones and then just check what device user
> > > requested (if not available, fall back to host), next check see if we
> > > have corresponding code for that accelerator (again, fallback to host
> > > otherwise), optionally compile the code if not compiled yet (HSAIL/PTX code
> > > only) then finally do the name lookup and spawn it.
> > Multi-target option arises another bunch of questions:)  Could you
> > please check if my vision of how GCC would handle multiple offload
> > targets? Here it is:
> > We have GCC with a set of plugins for compiling code for each available
> > offloading target.  These plugins work similarly to lto-plugin, i.e.
> > they consume gimple as the input, but produce a code for the specific
> > target.  Libgomp also has similar set of plugins for HW specific
> > implementation of functions for remote running code, data transferring,
> > getting device status etc.
> > For example, for Intel MIC, AMD HSAIL and Nvidia PTX we'll have host-GCC
> > with three plugins and host-libgomp, also with three plugins.
> > Invoking GCC with options like '-mmic', '-mhsail', '-mptx' triggers
> > usage of a corresponding plugins in GCC driver.  In result, after the
> > compilation we'd have four binaries: one for host and three for possible
> > targets.
> 
> I meant just a single plugin that would handle all of them, or as richi
> said, perhaps teach LTO plugin to do that.
> For options, my vision was something like:
> -ftarget=mic -ftarget=hsail='-mfoobaz=4 -mbazbaz'
> which would mean:
> 1) compile LTO IL from the accelerator section for mic with
>    the originally recorded gcc command line options with the Target options
>    removed and no extra options added
> 2) compile LTO IL also for hsail target, with originally recorded gcc
>    command line options but Target options and -mfoobaz=4 -mbazbaz
>    options added
> 3) don't compile for ptx
> The thing is if you originally compile with
> -O3 -ftree-vectorize -march=corei7-avx -minline-all-stringops
> the -m* options might not apply to the target compiler at all.
> So you'd construct the command line from the original command line sans
> CL_TARGET options, append to that the link time override for the
> accelerator.  Then another thing is how to find out the corresponding
> compiler (including its driver) for the target from the plugin.
> 
> > Now, libgomp part.  The host binary consists calls to libgomp.so, which
> > is target-independent (i.e. it's host-specific).  It should be able to
> > call plugins for all three targets, so in functions like
> > gomp_offload_available it probably would iterate through all available
> > plugins, asking for device status and the code availability.  This
> > iterating would include dlopen of the corresponding plugin, calling a
> > function from it and moving to the next plugin.
> > Is this correct?
> 
> libgomp would start by trying to dlopen all available plugins,
> and for each of them call some routine in them that would query the hw
> for available devices, then libgomp would assign device ids to them (0 and
> up) and then for target specific parts just dispatch again to the plugin
> corresponding to the chosen device.
> 
> 	Jakub

  reply	other threads:[~2013-08-23 15:31 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-22 22:37 Michael V. Zolotukhin
2013-08-23  0:22 ` Jakub Jelinek
2013-08-23 12:16   ` Michael V. Zolotukhin
2013-08-23 12:37     ` Jakub Jelinek
2013-08-24  6:17       ` Michael V. Zolotukhin [this message]
2013-08-25 16:24         ` Jakub Jelinek
2013-08-27  0:36           ` Michael V. Zolotukhin
2013-08-27  0:38             ` Jakub Jelinek
2013-08-27  6:16               ` Michael V. Zolotukhin
2013-08-27  8:06                 ` Jakub Jelinek
2013-08-27 15:47                   ` Michael V. Zolotukhin
2013-08-27 16:22                     ` Jakub Jelinek
2013-08-27 19:54                       ` Michael V. Zolotukhin
2013-08-28 11:21                         ` Jakub Jelinek
2013-08-29 10:44                           ` Michael V. Zolotukhin
2013-09-10 15:02                           ` Michael V. Zolotukhin
2013-09-10 15:15                             ` Jakub Jelinek
2013-09-10 15:31                               ` Michael V. Zolotukhin
2013-09-10 15:36                                 ` Jakub Jelinek
2013-09-10 15:38                                   ` Michael V. Zolotukhin
2013-09-13 11:30                                     ` Michael V. Zolotukhin
2013-09-13 12:36                                       ` Jakub Jelinek
2013-09-13 13:11                                         ` Michael V. Zolotukhin
2013-09-13 13:16                                           ` Jakub Jelinek
2013-09-13 15:09                                             ` Ilya Tocar
2013-09-13 15:34                                         ` Jakub Jelinek
2014-07-17  7:52                                       ` Thomas Schwinge
2014-07-17 12:30                                         ` Ilya Verbin
2014-07-17 12:37                                           ` Jakub Jelinek
2014-07-17 12:58                                             ` Thomas Schwinge
2014-07-17 13:09                                               ` Thomas Schwinge
2014-07-17 13:35                                                 ` Jakub Jelinek
2014-07-17 14:37                                                   ` Thomas Schwinge
2013-09-13  9:35                         ` Michael Zolotukhin
2013-09-13 10:52                           ` Kirill Yukhin
2013-09-13 11:04                           ` Nathan Sidwell
2013-09-13 11:21                             ` Michael V. Zolotukhin
2013-09-16  9:35                           ` Jakub Jelinek
2013-09-17 12:05                             ` Michael V. Zolotukhin
2013-09-17 12:30                               ` Jakub Jelinek
2013-10-28 10:43                                 ` Ilya Verbin
2013-10-29  8:04                                   ` Jakub Jelinek
2014-01-31 18:03                                     ` Ilya Verbin
2014-01-31 19:43                                       ` Jakub Jelinek
2014-02-14 15:24                                         ` Ilya Verbin
2014-02-14 15:43                                           ` Jakub Jelinek
2014-02-14 18:54                                             ` Richard Henderson
2014-02-17 15:59                                             ` Ilya Verbin
2014-02-17 16:03                                               ` Jakub Jelinek
2013-08-28 12:56             ` Richard Biener
2013-08-28 15:26               ` Jakub Jelinek
2013-08-28 17:03                 ` Richard Biener
2013-08-28 17:15                   ` Jakub Jelinek
2013-08-29 21:09                     ` Richard Biener
2013-08-28 18:54                   ` Torvald Riegel
2013-08-28 18:43                 ` Torvald Riegel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130823153052.GA2974@msticlxl57.ims.intel.com \
    --to=michael.v.zolotukhin@gmail.com \
    --cc=gcc@gcc.gnu.org \
    --cc=jakub@redhat.com \
    --cc=kirill.yukhin@gmail.com \
    --cc=rth@redhat.com \
    --cc=triegel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).