public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
From: "Michael V. Zolotukhin" <michael.v.zolotukhin@gmail.com>
To: Jakub Jelinek <jakub@redhat.com>
Cc: Kirill Yukhin <kirill.yukhin@gmail.com>,
	Richard Henderson <rth@redhat.com>,
	gcc@gcc.gnu.org,	triegel@redhat.com
Subject: Re: [RFC] Offloading Support in libgomp
Date: Tue, 27 Aug 2013 15:47:00 -0000	[thread overview]
Message-ID: <20130827112609.GA4093@msticlxl57.ims.intel.com> (raw)
In-Reply-To: <20130826141117.GF21876@tucnak.zalov.cz>

Hi Jakub,

> Anyway, the GOMP_target_data implementation and part of GOMP_target would
> be something along the lines of following pseudocode:
> 
> device_data = lookup_device_id (device_id);
> ...
Thanks, I've seen that similarly.  But the problem with passing
arguments to the target is still open.  I'll try to explain, what is the
problem.

Remember what we did for 'pragma parallel':
  struct .omp_data_s.0 .omp_data_o.2;
  .omp_data_o.2.s = 0.0;
  .omp_data_o.2.b = &b;
  .omp_data_o.2.c = &c;
  .omp_data_o.2.y = y_7(D);
  .omp_data_o.2.j = j_9(D);
  __builtin_GOMP_parallel (bar._omp_fn.0, &.omp_data_o.2, 0, 0);
  s_12 = .omp_data_o.2.s;
  y_13 = .omp_data_o.2.y;
  j_14 = .omp_data_o.2.j;

I.e. compiler prepares a structure with all arguments and pass it to the
runtime.  Runtime passes this structure as-is to callee (i.e. to
bar._omp_fn.0).

In bar._omp_fn.0 the compiler just emits code that extracts
corresponding fields from the given struct and thus initialize all
needed local vars:
  bar._omp_fn.0 (struct .omp_data_s.0 * .omp_data_i)
  {
    int _12;
    int _13;
    ...
    _12 = .omp_data_i_11(D)->y;
    _13 = .omp_data_i_11(D)->j;
    ...
  }

That scheme would work perfectly for implementing host fallback, but as
I see it, can't be applied as is for target offloading.  The reason is
the following:
*) Compiler doesn't know runtime info, i.e. it doesn't know target
addresses so it can't fill the structure for passing to target version
of the routine.
*) Runtime doesn't know the structure layout - runtime should firstly
translate addresses and only then pass it to the callee, but it don't
know which addresses to translate, because it doesn't know which
variables are used by the callee.

Currently, I see two possible solutions for this:
1) add to the structure with arguments fields, describing size of each
field.  Then GOMP_target parses this struct and replace every found
address with the corresponding target address, and only then call
target_call.
2) Lift mapping/allocation stuff from runtime to compile time, i.e.
allow the compiler to generate calls like this:
  .omp_data_o.2.s = 0.0;
  .omp_data_o.2.b = &b;
  .omp_data_o.2.c = &c;
  .omp_data_o.2.y = y_7(D);
  .omp_data_o.2.j = j_9(D);
  .omp_data_o.target.2.s = GOMP_translate_target_address (0.0);
  .omp_data_o.target.2.b = GOMP_translate_target_address (&b);
  .omp_data_o.target.2.c = GOMP_translate_target_address (&c);
  .omp_data_o.target.2.y = GOMP_translate_target_address (y_7(D));
  .omp_data_o.target.2.j = GOMP_translate_target_address (j_9(D));
  GOMP_target (bar._omp_fn.0, &.omp_data_o.2, &.omp_data_o.target.2, 0, 0, );
Thus runtime would have two versions of structure with arguments and
will be able to pass it as-is to target callee.  But probably we'll need
a version of that struct for each target and that would look very ugly.

What do you think on that?  Maybe I'm missing or overcomplicating
something, but for now I can't get how all this stuff could work
together without answers to these questions.

Referencing to your code, the question could be rephrased as following:
Having thist string
>       target_call (fn_name, target_addrs);
how would FN_NAME, called on the target, figure out where to find its
arguments (as TARGET_ADDRS contains all mapped at runtime addresses)?

---
Thanks, Michael

On 26 Aug 16:11, Jakub Jelinek wrote:
> On Mon, Aug 26, 2013 at 05:29:36PM +0400, Michael V. Zolotukhin wrote:
> > > Nope, there is only one target data pragma, so you would use here just:
> > > 
> > >   struct data_descriptor data_desc2[2] = { ... };
> > >   GOMP_target (-1, bar.omp_fn.1, "bar.omp_fn.1", data_desc2, 2);
> > This 'pragma target' is placed inside a 'pragma target data' - so all
> > variables for 'pragma target data' should be available for the 'pragma
> > target'.  So we need to pass to GOMP_target an array, that contains
> > united set of mapped variables from both pragmas - in our example these
> > would be variables B, C, and S.  So as I see it, we need to use the same
> > array of descriptors both in outer 'pragma target data' and in inner
> > 'pragma target'.  Is it correct?  If data_desc2 contains descriptors of
> > only C and S, how B would be passed to bar.omp_fn.1?
> 
> Actually no, that should be the responsibility of the runtime library.
> Note, the #pragma omp target data directive doesn't have to be in the same
> function as #pragma omp target.  And, I'm sorry for having to confuse this
> by hacking in a premature optimization in the gimplifier.  It is true
> that if the target data is around target directive that the target will
> always be able to only look stuff up, will not need to allocate it,
> but 1) the gimplifier doesn't verify it is the same device between those two
> 2) and as discussed earlier we need it also for the mapping in GOMP_target
> So, the way it should actually work IMHO is that both GOMP_target_data
> is passed a descriptor for b, and also GOMP_target.
> In a more complicated testcase where you have a pointer based array section:
> void
> foo (int *p)
> {
>   #pragma omp target data map (tofrom: p[:1024])
>   {
>     #pragma omp target
>     for (int i = 0; i < 1024; i++)
>       p[i] += 2;
>   }
> }
> GOMP_target_data does two mappings - one where it maps
> (char *) p+0 ... (char *) p+1024*sizeof(int)-1
> region (tofrom) and one where it maps
> &p ... (char *)(&p+1)+1
> region with pointerassign type (i.e. that it is initialized to
> the address of the target pointer section).
> And then GOMP_target during gimplification determines that p
> is used, but not explicitly mapped, so it is added automatically
> to #pragma omp target as implicit map(tofrom:p).  That doesn't
> do anything with the corresponding array section, and while it is tofrom,
> it will actually be always ignored, since the region is already mapped.
> Perhaps as optimization the compiler could hint the runtime library that
> it can just look it up and doesn't need to allocate/copy anything.
> 
> Anyway, the GOMP_target_data implementation and part of GOMP_target would
> be something along the lines of following pseudocode:
> 
> device_data = lookup_device_id (device_id);
> if (device_data == NULL)
>   do host fallback;
> else
>   {
>     size_t i, length_sum = 0;
>     target_data_env = create_target_data_env ();
>     void *target_addrs[num_device_descs]; // VLA or alloca etc.
>     char *target_addr = NULL;
>     memset (target_addrs, 0, sizeof (target_addrs));
>     vec_safe_push (device_data->target_data_envs, target_data_env);
>     for (i = 0; i < num_device_descs; i++)
>       {
> 	target_addrs[i] = lookup_in_target_address_tree (device_data->addr_tree, device_desc[i].host_addr, device_desc[i].length);
>         if (target_addrs[i] == NULL)
> 	  length_sum += device_desc[i].length;
>       }
>     if (length_sum)
>       {
>         target_addr = target_malloc (device_data, length_sum);
> 	length_sum = 0;
> 	for (i = 0; i < num_device_descs; i++)
> 	  if (target_addrs[i] == NULL)
> 	    {
> 	      target_addrs[i] = target_addr + length_sum;
> 	      length_sum += device_desc[i].length;
> 	      switch (device_desc[i].kind)
> 		{
> 		case ALLOC: case FROM: /* nothing */ break;
> 		case TO:
> 		case TOFROM: target_copy_todevice (device_data, device_desc[i].host_addr, target_addrs[i], device_desc[i].length); break;
> 		case POINTER: lookup + copy to; break;
> 		}
> 	      ptr = add_to_target_address_tree (device_data->addr_tree, device_desc[i].host_addr, device_desc[i].length, target_addrs[i], device_desc[i].kind);
> 	      vec_safe_push (target_data_env->vec, ptr);
> 	    }
>       }
>     if (GOMP_target call)
>       {
>       target_call (fn_name, target_addrs);
>       FOR_EACH_SAFE_VEC (target_data_env->vec, ptr)
> 	{
> 	  switch (ptr->kind)
> 	    {
> 	    case FROM: case TOFROM: target_copy_fromdevice (device_data, ...); break;
> 	    }
> 	  remove_from_target_address_tree (device_data->addr_tree, ptr);
> 	}
>       vec_pop_and_free(device_data->target_data_envs);
>       }
>   }
> 
> and for GOMP_target_data_end it would do pretty much the stuff in between
> FOR_EACH_SAFE_VEC and vec_pop_and_free.
> All names above subject to change for something better, I just wanted to
> make the picture clear.  There needs to be some address -> target address
> data structure (addr_tree), probably some tree (AVL, whatever), in any case
> a lookup doesn't need to be exact, you can e.g. look up part of an existing
> mapping.  Trying to map something that overlaps an existing mapping, but is
> larger than that, is a user bug.
> 
> 	Jakub

  reply	other threads:[~2013-08-27 11:26 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-22 22:37 Michael V. Zolotukhin
2013-08-23  0:22 ` Jakub Jelinek
2013-08-23 12:16   ` Michael V. Zolotukhin
2013-08-23 12:37     ` Jakub Jelinek
2013-08-24  6:17       ` Michael V. Zolotukhin
2013-08-25 16:24         ` Jakub Jelinek
2013-08-27  0:36           ` Michael V. Zolotukhin
2013-08-27  0:38             ` Jakub Jelinek
2013-08-27  6:16               ` Michael V. Zolotukhin
2013-08-27  8:06                 ` Jakub Jelinek
2013-08-27 15:47                   ` Michael V. Zolotukhin [this message]
2013-08-27 16:22                     ` Jakub Jelinek
2013-08-27 19:54                       ` Michael V. Zolotukhin
2013-08-28 11:21                         ` Jakub Jelinek
2013-08-29 10:44                           ` Michael V. Zolotukhin
2013-09-10 15:02                           ` Michael V. Zolotukhin
2013-09-10 15:15                             ` Jakub Jelinek
2013-09-10 15:31                               ` Michael V. Zolotukhin
2013-09-10 15:36                                 ` Jakub Jelinek
2013-09-10 15:38                                   ` Michael V. Zolotukhin
2013-09-13 11:30                                     ` Michael V. Zolotukhin
2013-09-13 12:36                                       ` Jakub Jelinek
2013-09-13 13:11                                         ` Michael V. Zolotukhin
2013-09-13 13:16                                           ` Jakub Jelinek
2013-09-13 15:09                                             ` Ilya Tocar
2013-09-13 15:34                                         ` Jakub Jelinek
2014-07-17  7:52                                       ` Thomas Schwinge
2014-07-17 12:30                                         ` Ilya Verbin
2014-07-17 12:37                                           ` Jakub Jelinek
2014-07-17 12:58                                             ` Thomas Schwinge
2014-07-17 13:09                                               ` Thomas Schwinge
2014-07-17 13:35                                                 ` Jakub Jelinek
2014-07-17 14:37                                                   ` Thomas Schwinge
2013-09-13  9:35                         ` Michael Zolotukhin
2013-09-13 10:52                           ` Kirill Yukhin
2013-09-13 11:04                           ` Nathan Sidwell
2013-09-13 11:21                             ` Michael V. Zolotukhin
2013-09-16  9:35                           ` Jakub Jelinek
2013-09-17 12:05                             ` Michael V. Zolotukhin
2013-09-17 12:30                               ` Jakub Jelinek
2013-10-28 10:43                                 ` Ilya Verbin
2013-10-29  8:04                                   ` Jakub Jelinek
2014-01-31 18:03                                     ` Ilya Verbin
2014-01-31 19:43                                       ` Jakub Jelinek
2014-02-14 15:24                                         ` Ilya Verbin
2014-02-14 15:43                                           ` Jakub Jelinek
2014-02-14 18:54                                             ` Richard Henderson
2014-02-17 15:59                                             ` Ilya Verbin
2014-02-17 16:03                                               ` Jakub Jelinek
2013-08-28 12:56             ` Richard Biener
2013-08-28 15:26               ` Jakub Jelinek
2013-08-28 17:03                 ` Richard Biener
2013-08-28 17:15                   ` Jakub Jelinek
2013-08-29 21:09                     ` Richard Biener
2013-08-28 18:54                   ` Torvald Riegel
2013-08-28 18:43                 ` Torvald Riegel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130827112609.GA4093@msticlxl57.ims.intel.com \
    --to=michael.v.zolotukhin@gmail.com \
    --cc=gcc@gcc.gnu.org \
    --cc=jakub@redhat.com \
    --cc=kirill.yukhin@gmail.com \
    --cc=rth@redhat.com \
    --cc=triegel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).