public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Jakub Jelinek <jakub@redhat.com>
To: GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [hsa 2/10] Modifications to libgomp proper
Date: Wed, 09 Dec 2015 11:55:00 -0000	[thread overview]
Message-ID: <20151209113536.GP5675@tucnak.redhat.com> (raw)
In-Reply-To: <20151207111957.GC24234@virgil.suse.cz>

On Mon, Dec 07, 2015 at 12:19:57PM +0100, Martin Jambor wrote:
> +/* Flag set when the subsequent element in the device-specific argument
> +   values.  */
> +#define GOMP_TARGET_ARG_SUBSEQUENT_PARAM	(1 << 7)
> +
> +/* Bitmask to apply to a target argument to find out the value identifier.  */
> +#define GOMP_TARGET_ARG_ID_MASK			(((1 << 8) - 1) << 8)
> +/* Target argument index of NUM_TEAMS.  */
> +#define GOMP_TARGET_ARG_NUM_TEAMS		(1 << 8)
> +/* Target argument index of THREAD_LIMIT.  */
> +#define GOMP_TARGET_ARG_THREAD_LIMIT		(2 << 8)

I meant that these two would be just special, passed as the first two
pointers in the array, without the markup.  Because, otherwise you either
need to use GOMP_TARGET_ARG_SUBSEQUENT_PARAM for these always, or for 32-bit
arches and for 64-bit ones shift often at runtime.  Having the markup even
for them is perhaps cleaner, but less efficient, so if you really want to go
that way, please make sure you handle it properly for 32-bit pointers
architectures though.  num_teams or thread_limit could be > 32767 or >
65535.

> -static void
> -gomp_target_fallback_firstprivate (void (*fn) (void *), size_t mapnum,
> -				   void **hostaddrs, size_t *sizes,
> -				   unsigned short *kinds)
> +static void *
> +gomp_target_unshare_firstprivate (size_t mapnum, void **hostaddrs,
> +				  size_t *sizes, unsigned short *kinds)
>  {
>    size_t i, tgt_align = 0, tgt_size = 0;
>    char *tgt = NULL;
> @@ -1281,7 +1282,7 @@ gomp_target_fallback_firstprivate (void (*fn) (void *), size_t mapnum,
>        }
>    if (tgt_align)
>      {
> -      tgt = gomp_alloca (tgt_size + tgt_align - 1);
> +      tgt = gomp_malloc (tgt_size + tgt_align - 1);

I don't like using gomp_malloc here, either copy/paste the function, or
create separate inline functions for the two loops, one for the first loop
which returns you tgt_align and tgt_size, and another for the stuff after
the allocation.  Then you can use those two inline functions to implement
both gomp_target_fallback_firstprivate which will use alloca, and
gomp_target_unshare_firstprivate which will use gomp_malloc instead.

> @@ -1356,6 +1377,11 @@ GOMP_target (int device, void (*fn) (void *), const void *unused,
>     and several arguments have been added:
>     FLAGS is a bitmask, see GOMP_TARGET_FLAG_* in gomp-constants.h.
>     DEPEND is array of dependencies, see GOMP_task for details.
> +   ARGS is a pointer to an array consisting of NUM_TEAMS, THREAD_LIMIT and a
> +   variable number of device-specific arguments, which always take two elements
> +   where the first specifies the type and the second the actual value.  The
> +   last element of the array is a single NULL.

Note, here you document NUM_TEAMS and THREAD_LIMIT as special values, not
encoded.

> @@ -1473,6 +1508,7 @@ GOMP_target_data (int device, const void *unused, size_t mapnum,
>    struct gomp_device_descr *devicep = resolve_device (device);
>  
>    if (devicep == NULL
> +      || (devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
>        || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))

Would be nice to have some consistency in the order of capabilities checks.
Usually you check SHARED_MEM after OPENMP_400, so perhaps do it this way
here too.

> @@ -1741,23 +1784,38 @@ gomp_target_task_fn (void *data)
>  
>        if (ttask->state == GOMP_TARGET_TASK_FINISHED)
>  	{
> -	  gomp_unmap_vars (ttask->tgt, true);
> +	  if (ttask->tgt)
> +	    gomp_unmap_vars (ttask->tgt, true);
>  	  return false;
>  	}

This doesn't make sense.  For the GOMP_OFFLOAD_CAP_SHARED_MEM case, unless
you want to run the free (ttask->firstprivate_copies); as a task, you
shouldn't be queing the target task again for further execution, instead
it should just be handled like a finished task at that point.  The reason
why for XeonPhi or PTX gomp_target_task_fn is run with
GOMP_TARGET_TASK_FINISHED state is that gomp_unmap_vars can perform IO
actions and doing it with the tasking lock held is highly undesirable.
>  
> -      void *fn_addr = gomp_get_target_fn_addr (devicep, ttask->fn);
> -      ttask->tgt
> -	= gomp_map_vars (devicep, ttask->mapnum, ttask->hostaddrs, NULL,
> -			 ttask->sizes, ttask->kinds, true,
> -			 GOMP_MAP_VARS_TARGET);
> +      bool shared_mem;
> +      if (devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
> +	{
> +	  shared_mem = true;
> +	  ttask->tgt = NULL;
> +	  ttask->firstprivate_copies
> +	    = gomp_target_unshare_firstprivate (ttask->mapnum, ttask->hostaddrs,
> +						ttask->sizes, ttask->kinds);
> +	}
> +      else
> +	{
> +	  shared_mem = false;
> +	  ttask->tgt = gomp_map_vars (devicep, ttask->mapnum, ttask->hostaddrs,
> +				      NULL, ttask->sizes, ttask->kinds, true,
> +				      GOMP_MAP_VARS_TARGET);
> +	}
>        ttask->state = GOMP_TARGET_TASK_READY_TO_RUN;
>  
>        devicep->async_run_func (devicep->target_id, fn_addr,
> -			       (void *) ttask->tgt->tgt_start, (void *) ttask);
> +			       shared_mem ? ttask->hostaddrs
> +			       : (void *) ttask->tgt->tgt_start,
> +			       ttask->args, (void *) ttask);

Replace the shared_mem bool variable with a void *arg or some other name and just
set it to ttask->hostaddrs in the then case and ttask->tgt->tgt_start
otherwise?

> --- a/libgomp/task.c
> +++ b/libgomp/task.c
> @@ -581,6 +581,7 @@ GOMP_PLUGIN_target_task_completion (void *data)
>        gomp_mutex_unlock (&team->task_lock);
>      }
>    ttask->state = GOMP_TARGET_TASK_FINISHED;
> +  free (ttask->firstprivate_copies);
>    gomp_target_task_completion (team, task);
>    gomp_mutex_unlock (&team->task_lock);
>  }

So, this function should have a special case for the SHARED_MEM case, handle
it closely to say how GOMP_taskgroup_end handles the finish_cancelled:
case.  Just note that the target task is missing from certain queues at that
point.

	Jakub

  reply	other threads:[~2015-12-09 11:55 UTC|newest]

Thread overview: 71+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-12-07 11:18 [hsa 0/10] Merge of HSA branch Martin Jambor
2015-12-07 11:19 ` [hsa 1/10] Configury changes and new options Martin Jambor
2015-12-08 22:43   ` Richard Sandiford
2015-12-10 17:52     ` Martin Jambor
2015-12-11 17:42       ` Jakub Jelinek
2015-12-10 17:52   ` Martin Jambor
2015-12-07 11:20 ` [hsa 3/10] HSA libgomp plugin Martin Jambor
2015-12-09 12:16   ` Jakub Jelinek
2015-12-10 12:06     ` [hsa 3/10] HSA libgomp plugin [part 1/2] Martin Liška
2015-12-10 12:06     ` [hsa 3/10] HSA libgomp plugin [part 2/2] Martin Liška
2015-12-07 11:20 ` [hsa 2/10] Modifications to libgomp proper Martin Jambor
2015-12-09 11:55   ` Jakub Jelinek [this message]
2015-12-10 17:52     ` Martin Jambor
2015-12-11 18:05       ` Jakub Jelinek
2016-01-12 13:46         ` Martin Jambor
2016-01-12 14:23           ` Jakub Jelinek
2016-01-15 20:00             ` Jakub Jelinek
2016-01-12 13:00   ` Alexander Monakov
2016-01-12 13:10     ` Jakub Jelinek
     [not found]       ` <20160112132905.GM3060@virgil.suse.cz>
2016-01-12 13:38         ` Jakub Jelinek
2016-01-12 18:51           ` Martin Jambor
2015-12-07 11:21 ` [hsa 4/10] Merge of HSA branch Martin Jambor
2015-12-09 12:17   ` Jakub Jelinek
2015-12-07 11:23 ` [hsa 5/10] OpenMP lowering/expansion changes (gridification) Martin Jambor
2015-12-09 13:19   ` Jakub Jelinek
2015-12-09 16:25     ` Splitting up gcc/omp-low.c? (was: [hsa 5/10] OpenMP lowering/expansion changes (gridification)) Thomas Schwinge
2015-12-09 17:23       ` Splitting up gcc/omp-low.c? Bernd Schmidt
2015-12-10  8:08         ` Jakub Jelinek
2015-12-10 11:26           ` Bernd Schmidt
2015-12-10 11:34             ` Jakub Jelinek
2015-12-15 18:28               ` Nathan Sidwell
2016-04-08  9:36           ` Thomas Schwinge
2016-04-08 10:46             ` Martin Jambor
2016-04-08 11:08             ` Jakub Jelinek
2016-04-13 16:01             ` Thomas Schwinge
2016-04-13 17:38               ` Bernd Schmidt
2016-04-13 17:56                 ` Thomas Schwinge
2016-04-13 18:20                   ` Bernd Schmidt
2016-04-14 13:36                     ` Thomas Schwinge
2016-04-14 16:01                       ` Bernd Schmidt
2016-04-14 16:01               ` Thomas Schwinge
2016-04-14 20:28                 ` Split out OMP constructs' SIMD clone supporting code (was: Splitting up gcc/omp-low.c?) Thomas Schwinge
2016-04-15  6:25                   ` Split out OMP constructs' SIMD clone supporting code Thomas Schwinge
2016-04-15 11:15                   ` Split out OMP constructs' SIMD clone supporting code (was: Splitting up gcc/omp-low.c?) Jakub Jelinek
2016-04-15 11:53                     ` Thomas Schwinge
2016-04-15 11:57                       ` Jakub Jelinek
2016-04-15 12:11                         ` Thomas Schwinge
2016-04-15 12:15                           ` Jakub Jelinek
2016-04-15 14:33                             ` Thomas Schwinge
2016-05-03  9:34               ` Splitting up gcc/omp-low.c? Thomas Schwinge
2016-05-11 13:44                 ` Thomas Schwinge
2016-05-18 11:42                   ` Thomas Schwinge
2016-05-25  9:17                     ` Thomas Schwinge
2016-05-25 12:54                       ` Martin Jambor
2015-12-18 14:29     ` [hsa 5/10] OpenMP lowering/expansion changes (gridification) Martin Jambor
2015-12-07 11:24 ` [hsa 6/10] Pass manager changes Martin Jambor
2015-12-09 13:20   ` Jakub Jelinek
2015-12-09 13:47     ` Richard Biener
2015-12-07 11:25 ` [hsa 7/10] IPA-HSA pass Martin Jambor
2015-12-07 11:27 ` [hsa 8/10] HSAIL BRIG description header file (and a steering committee request) Martin Jambor
2015-12-07 11:29 ` [hsa 9/10] Majority of the HSA back-end Martin Jambor
2015-12-07 11:30 ` [hsa 10/10] HSA register allocator Martin Jambor
2015-12-07 11:46 ` [hsa 0/10] Merge of HSA branch Jakub Jelinek
2015-12-10 17:51   ` Martin Jambor
2016-01-26 10:46     ` (Non-)offloading diagnostics (was: [hsa 0/10] Merge of HSA branch) Thomas Schwinge
2016-01-26 11:18       ` Alexander Monakov
2016-01-26 11:38         ` (Non-)offloading diagnostics Thomas Schwinge
2016-02-26 16:46       ` Thomas Schwinge
2016-02-26 17:18         ` Martin Jambor
2016-02-26 17:51           ` Jakub Jelinek
2016-02-26 18:48             ` Martin Jambor

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20151209113536.GP5675@tucnak.redhat.com \
    --to=jakub@redhat.com \
    --cc=gcc-patches@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).