public inbox for fortran@gcc.gnu.org
 help / color / mirror / Atom feed
From: Jakub Jelinek <jakub@redhat.com>
To: Marcel Vollweiler <marcel@codesourcery.com>
Cc: gcc-patches@gcc.gnu.org, fortran@gcc.gnu.org
Subject: Re: [PATCH] OpenMP, libgomp: Add new runtime routines omp_target_memcpy_async and omp_target_memcpy_rect_async
Date: Thu, 5 May 2022 10:30:21 +0200	[thread overview]
Message-ID: <YnOLHQyU2E9k4Rna@tucnak> (raw)
In-Reply-To: <fcffc754-e289-9725-c386-1fc6b60667c6@codesourcery.com>

On Mon, Feb 21, 2022 at 12:19:20PM +0100, Marcel Vollweiler wrote:
> gcc/ChangeLog:
> 
> 	* omp-low.cc (omp_runtime_api_call): Added target_memcpy_async and
> 	target_memcpy_rect_async to omp_runtime_apis array.
> 
> libgomp/ChangeLog:
> 
> 	* libgomp.map: Added omp_target_memcpy_async and
> 	omp_target_memcpy_rect_async.
> 	* libgomp.texi: Both functions are now supported.
> 	* omp.h.in: Added omp_target_memcpy_async and
> 	omp_target_memcpy_rect_async.
> 	* omp_lib.f90.in: Added interfaces for both new functions.
> 	* omp_lib.h.in: Likewise.
> 	* target.c (omp_target_memcpy): Restructured into check and copy part.
> 	(omp_target_memcpy_check): New helper function for omp_target_memcpy and
> 	omp_target_memcpy_async that checks requirements.
> 	(omp_target_memcpy_copy): New helper function for omp_target_memcpy and
> 	omp_target_memcpy_async that performs the memcpy.
> 	(omp_target_memcpy_async_helper): New helper function that is used in
> 	omp_target_memcpy_async for the asynchronous task.
> 	(omp_target_memcpy_async): Added.
> 	(omp_target_memcpy_rect): Restructured into check and copy part.
> 	(omp_target_memcpy_rect_check): New helper function for
> 	omp_target_memcpy_rect and omp_target_memcpy_rect_async that checks
> 	requirements.
> 	(omp_target_memcpy_rect_copy): New helper function for
> 	omp_target_memcpy_rect and omp_target_memcpy_rect_async that performs
> 	the memcpy.
> 	(omp_target_memcpy_rect_async_helper): New helper function that is used
> 	in omp_target_memcpy_rect_async for the asynchronous task.
> 	(omp_target_memcpy_rect_async): Added.
> 	* testsuite/libgomp.c-c++-common/target-memcpy-async-1.c: New test.
> 	* testsuite/libgomp.c-c++-common/target-memcpy-async-2.c: New test.
> 	* testsuite/libgomp.c-c++-common/target-memcpy-rect-async-1.c: New test.
> 	* testsuite/libgomp.c-c++-common/target-memcpy-rect-async-2.c: New test.
> 	* testsuite/libgomp.fortran/target-memcpy-async-1.f90: New test.
> 	* testsuite/libgomp.fortran/target-memcpy-async-2.f90: New test.
> 	* testsuite/libgomp.fortran/target-memcpy-rect-async-1.f90: New test.
> 	* testsuite/libgomp.fortran/target-memcpy-rect-async-2.f90: New test.
> 
> --- a/libgomp/libgomp.map
> +++ b/libgomp/libgomp.map
> @@ -224,6 +224,8 @@ OMP_5.1 {
>  	omp_set_teams_thread_limit_8_;
>  	omp_get_teams_thread_limit;
>  	omp_get_teams_thread_limit_;
> +	omp_target_memcpy_async;
> +	omp_target_memcpy_rect_async;
>  } OMP_5.0.2;

These should be added to OMP_5.1.1, not here.

> --- a/libgomp/omp.h.in
> +++ b/libgomp/omp.h.in
> @@ -272,6 +272,10 @@ extern int omp_target_is_present (const void *, int) __GOMP_NOTHROW;
>  extern int omp_target_memcpy (void *, const void *, __SIZE_TYPE__,
>  			      __SIZE_TYPE__, __SIZE_TYPE__, int, int)
>    __GOMP_NOTHROW;
> +extern int omp_target_memcpy_async (void *, const void *, __SIZE_TYPE__,
> +				    __SIZE_TYPE__, __SIZE_TYPE__, int, int,
> +				    int, omp_depend_t*)

Formatting, space before *.

> +  __GOMP_NOTHROW;
>  extern int omp_target_memcpy_rect (void *, const void *, __SIZE_TYPE__, int,
>  				   const __SIZE_TYPE__ *,
>  				   const __SIZE_TYPE__ *,
> @@ -279,6 +283,14 @@ extern int omp_target_memcpy_rect (void *, const void *, __SIZE_TYPE__, int,
>  				   const __SIZE_TYPE__ *,
>  				   const __SIZE_TYPE__ *, int, int)
>    __GOMP_NOTHROW;
> +extern int omp_target_memcpy_rect_async (void *, const void *, __SIZE_TYPE__,
> +					 int, const __SIZE_TYPE__ *,
> +					 const __SIZE_TYPE__ *,
> +					 const __SIZE_TYPE__ *,
> +					 const __SIZE_TYPE__ *,
> +					 const __SIZE_TYPE__ *, int, int, int,
> +					 omp_depend_t*)

Likewise.

> -int
> -omp_target_memcpy (void *dst, const void *src, size_t length,
> -		   size_t dst_offset, size_t src_offset, int dst_device_num,
> -		   int src_device_num)
> +static int
> +omp_target_memcpy_check (void *dst, const void *src, int dst_device_num,
> +			 int src_device_num,
> +			 struct gomp_device_descr **dst_devicep,
> +			 struct gomp_device_descr **src_devicep)
>  {

Why does omp_target_memcpy_check need the dst and src arguments?  From what
I can see, they aren't used by it.

> +typedef struct
> +{
> +  void *dst;
> +  const void *src;
> +  size_t length;
> +  size_t dst_offset;
> +  size_t src_offset;
> +  struct gomp_device_descr *dst_devicep;
> +  struct gomp_device_descr *src_devicep;
> +} memcpy_t;

Please come up with some less generic name, struct omp_target_memcpy_data
or something similar.  Even the *_t suffix is problematic, as *_t is
reserved for the implementation.

> +
> +void
> +omp_target_memcpy_async_helper (void *args)

This should be static.

> +{
> +  memcpy_t *a = args;
> +  int ret = omp_target_memcpy_copy (a->dst, a->src, a->length, a->dst_offset,
> +				    a->src_offset, a->dst_devicep,
> +				    a->src_devicep);
> +  if (ret)
> +    gomp_fatal ("asynchronous memcpy failed");

I'm not really sure killing the whole program if the copying failed is the
best action.  Has it been discussed on omp-lang?  Perhaps the APIs should
have a way how to propagate the result to the caller when it completes
somehow?
Even if we do that, the ret variable seems to be superfluos, just do
  if (omp_target_memcpy_copy (...))
    gomp_fatal (...);

> +{
> +  struct gomp_device_descr *dst_devicep = NULL, *src_devicep = NULL;
> +
> +  int check = omp_target_memcpy_check (dst, src, dst_device_num, src_device_num,
> +				       &dst_devicep, &src_devicep);
> +  if (check)
> +    return check;
> +
> +  void (*fn) (void *) = &omp_target_memcpy_async_helper;
> +  void *data = NULL;
> +  void (*cpyfn) (void *, void *) = NULL;
> +  long arg_size = 0;
> +  long arg_align = 0;
> +  bool if_clause = false;
> +  unsigned flags = 0;
> +  int priority_arg = 0;
> +  void *detach = NULL;
> +
> +  memcpy_t s = {
> +    .dst = dst,
> +    .src = src,
> +    .length = length,
> +    .dst_offset = dst_offset,
> +    .src_offset = src_offset,
> +    .dst_devicep = dst_devicep,
> +    .src_devicep = src_devicep
> +  };

I think we in libgomp try to use C89 and so declare vars first before other
statements.

> +  data = &s;
> +
> +  void *depend[depobj_count+5];

Spaces around + , i.e. depobj_count + 5

> +  depend[0] = 0;
> +  depend[1] = (void*) ((uintptr_t) depobj_count);

Space before *.  The ()s around (uintptr_t) depobj_count
are superfluous.

> +  depend[2] = depend[3] = depend[4] = 0;
> +  for (int i = 0; i < depobj_count; ++i)
> +    depend[i+5] = &depobj_list[i];

i + 5

> +
> +  if (depobj_count > 0 && depobj_list != NULL)
> +    flags |= GOMP_TASK_FLAG_DEPEND;
> +
> +  GOMP_task (fn, data, cpyfn, arg_size, arg_align, if_clause, flags, depend,
> +	     priority_arg, detach);

We need to make sure that GOMP_task doesn't go through PLT.
So, I think this needs to be ialias_call and task.c needs to add ialias for
GOMP_task.
Also, I must say I don't like very much using variables that you initialize
to constants and just pass to the call, either pass the constants directly
to the call, or use /*priority_arg=*/0, /*detach=*/NULL style.

More importantly, I have no idea how this can work when you pass arg_size 0
and arg_align 0.  The s variable is in the current function frame, with
arg_size 0 nothing is really copied to the generated task.
arg_size should be sizeof (memcpy_t) and arg_align __alignof__ (memcpy_t)
(well, struct omp_target_memcpy_data).

Also, it would be nice to avoid GOMP_task for the depobj_count == 0 case
at least sometimes (but perhaps that can be done incrementally) and instead
use some CUDA etc. asynchronous copy APIs.  We don't really need to wait
for anything in that case, and from OpenMP POV all we need to make sure is
that barrier/taskwait/taskgroup end will know about these "tasks" and
wait for them.  So, it can be implemented more like #pragma omp target nowait
instead of #pragma omp task that calls the synchronous omp_target_memcpy.
Though, maybe that is how it should be implemented always, something like
gomp_create_target_task and its caller.  We already use that single routine
for multiple purposes (target nowait as well as target enter/exit data
nowait), so just telling it somehow that it shouldn't do mapping/unmapping
and perhaps target execution and instead copying would be nice.

E.g. if one uses omp_target_memcpy_async outside of any explicit parallel
or host teams, would be nice if it still was asynchronous and not
synchronous.  But even in explicit parallel, would be nice if we didn't
waste one of the threads waiting for it when it can do useful work on the
host.  It is true that for target nowait we have one unshackeled thread
usually that polls the device.  Though that is mainly because we need to
do some unmapping at the end of target nowait, including taking the lock
etc.  For the async copying maybe we don't need to take any lock and could
just arrange for check if already completed or sleep until completed if
possible, at least in the future.  For now at least handling it like target
nowait would be an improvement.

> +} memcpy_rect_t;

Again, please use better type name.

> +
> +void

And this should be static.

> +omp_target_memcpy_rect_async_helper (void *args)
> +{
> +  memcpy_rect_t *a = args;
> +  int ret = omp_target_memcpy_rect_copy (a->dst, a->src, a->element_size,
> +					 a->num_dims, a->volume, a->dst_offsets,
> +					 a->src_offsets, a->dst_dimensions,
> +					 a->src_dimensions, a->dst_devicep,
> +					 a->src_devicep);
> +  if (ret)
> +    gomp_fatal ("asynchronous memcpy failed");

See above.

	Jakub


  reply	other threads:[~2022-05-05  8:30 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-21 11:19 Marcel Vollweiler
2022-05-05  8:30 ` Jakub Jelinek [this message]
2022-05-05 10:25   ` Tobias Burnus
     [not found]   ` <d549a138-c8f2-098b-39b1-c742cef5c534@codesourcery.com>
     [not found]     ` <YoPks36yV4Fbpb1m@tucnak>
2022-05-19  8:39       ` Marcel Vollweiler
2022-05-19  8:47         ` Jakub Jelinek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YnOLHQyU2E9k4Rna@tucnak \
    --to=jakub@redhat.com \
    --cc=fortran@gcc.gnu.org \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=marcel@codesourcery.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).