public inbox for gcc@gcc.gnu.org
From: "Michael V. Zolotukhin" <michael.v.zolotukhin@gmail.com>
To: Jakub Jelinek <jakub@redhat.com>
Cc: Kirill Yukhin <kirill.yukhin@gmail.com>,
	Richard Henderson <rth@redhat.com>,
	gcc@gcc.gnu.org,	triegel@redhat.com
Subject: Re: [RFC] Offloading Support in libgomp
Date: Fri, 23 Aug 2013 12:16:00 -0000	[thread overview]
Message-ID: <20130823092810.GA36483@msticlxl57.ims.intel.com> (raw)
In-Reply-To: <20130822142814.GB1814@tucnak.redhat.com>

> Roughly.  We have 3 directives here,
> #pragma omp target
> #pragma omp target data
> #pragma omp target update
> and all of them have various clauses, some that are allowed at most once
> (e.g. the device clause, if clause) and others that can be used many times
> (the data movement clauses).
> The question is whether we want to emit multiple calls for a single directive,
> say one for each data movement clause (where for each one we need the address,
> length, direction and some way to propagate the transformed address
> to the accelerator code), or whether we build an array of the data movement
> structures and just pass that down to a single routine.  Because of the
> device clause, which should probably be passed just as an integer with -1
> meaning the default, perhaps a single routine might be better.
Sure, I used '#pragma omp target' just as a simple example.  The
question about '#pragma omp target data' is still open.  As far as I
understand, all three of the pragmas could require data marshalling (but
not necessarily - an 'omp target' located inside an 'omp target data'
that already specifies all the variables the 'omp target' needs won't
require any data marshalling - right?).  This data movement could be
done, as you noted, by a single call or by a set of calls (one for each
clause) - and while a single call seems appealing, separate calls could
be better when, e.g., we have a 'target update' for only a subset of the
variables described in 'target data'.  The one-call approach has
difficulties with specifying which subset of the data we want to
update.

> I'd prefer GOMP_target instead of GOMP_offload for the function name, to
> make it clearly related to the corresponding directive.
That makes sense - I just used the first name that came to mind here.

> > GOMP_offload is a call to libgomp, which will be implemented something like this:
> >   void GOMP_offload (void (*fn)(void*), void *data, const char *fname)
> >   {
> >     if (gomp_offload_available ())
> 
> This really isn't just a check whether an accelerator is available; we need to
> query all accelerators in the system (and cache that somehow in the
> library), assign device numbers to individual devices (say, you could have
> two Intel MIC cards, one AMD HSAIL capable GPGPU and 4 Nvidia PTX capable
> GPGPUs or similar), ensure that already assigned device numbers aren't
> reused when discovering new ones, and then just check what device the user
> requested (if not available, fall back to host), next check whether we
> have corresponding code for that accelerator (again, fall back to host
> otherwise), optionally compile the code if not compiled yet (HSAIL/PTX code
> only), then finally do the name lookup and spawn it.
The multi-target option raises another bunch of questions :)  Could you
please check whether my vision of how GCC would handle multiple offload
targets is correct?  Here it is:
We have GCC with a set of plugins for compiling code for each available
offload target.  These plugins work similarly to lto-plugin, i.e.
they consume GIMPLE as input, but produce code for a specific
target.  Libgomp also has a similar set of plugins for HW-specific
implementations of functions for running code remotely, transferring
data, querying device status, etc.
For example, for Intel MIC, AMD HSAIL and Nvidia PTX we'd have a host
GCC with three plugins and a host libgomp, also with three plugins.
Invoking GCC with options like '-mmic', '-mhsail', '-mptx' triggers
use of the corresponding plugins in the GCC driver.  As a result, after
compilation we'd have four binaries: one for the host and three for the
possible targets.
Now, the libgomp part.  The host binary contains calls to libgomp.so,
which is target-independent (i.e. it's host-specific).  It should be
able to call plugins for all three targets, so in functions like
gomp_offload_available it would probably iterate through all available
plugins, asking each for device status and code availability.  This
iteration would include dlopen-ing the corresponding plugin, calling a
function from it and moving on to the next plugin.
Is this correct?

---
Thanks, Michael

On 22 Aug 16:28, Jakub Jelinek wrote:
> On Thu, Aug 22, 2013 at 06:08:10PM +0400, Michael V. Zolotukhin wrote:
> > We're working on the design for offloading support in GCC (part of OpenMP 4),
> > and I have a question regarding the libgomp part.
> > 
> > Suppose we expand '#pragma omp target' like we expand '#pragma omp parallel',
> > i.e. the compiler expands the following code:
> >   #pragma omp target
> >   {
> >     body;
> >   }
> > to this:
> >   void subfunction (void *data)
> >   {
> >     use data;
> >     body;
> >   }
> > 
> >   setup data;
> >   function_name = "subfunction";
> >   GOMP_offload (subfunction, &data, function_name);
> 
> Roughly.  We have 3 directives here,
> #pragma omp target
> #pragma omp target data
> #pragma omp target update
> and all of them have various clauses, some that are allowed at most once
> (e.g. the device clause, if clause) and others that can be used many times
> (the data movement clauses).
> I'd prefer GOMP_target instead of GOMP_offload for the function name, to
> make it clearly related to the corresponding directive.
> The question is whether we want to emit multiple calls for a single directive,
> say one for each data movement clause (where for each one we need the address,
> length, direction and some way to propagate the transformed address
> to the accelerator code), or whether we build an array of the data movement
> structures and just pass that down to a single routine.  Because of the
> device clause, which should probably be passed just as an integer with -1
> meaning the default, perhaps a single routine might be better.
> 
> > GOMP_offload is a call to libgomp, which will be implemented something like this:
> >   void GOMP_offload (void (*fn)(void*), void *data, const char *fname)
> >   {
> >     if (gomp_offload_available ())
> 
> This really isn't just a check whether an accelerator is available; we need to
> query all accelerators in the system (and cache that somehow in the
> library), assign device numbers to individual devices (say, you could have
> two Intel MIC cards, one AMD HSAIL capable GPGPU and 4 Nvidia PTX capable
> GPGPUs or similar), ensure that already assigned device numbers aren't
> reused when discovering new ones, and then just check what device the user
> requested (if not available, fall back to host), next check whether we
> have corresponding code for that accelerator (again, fall back to host
> otherwise), optionally compile the code if not compiled yet (HSAIL/PTX code
> only), then finally do the name lookup and spawn it.
> Stuff specific to the HW should be in libgomp plugins IMHO, so we have one
> dlopenable module for each of the 3 variants, where one fn in the plugin
> would be about checking what HW is available, one about trying to run the
> code etc.
> 
> 	Jakub

