* [PATCH, OpenACC 2.5, libgomp] Add *_async versions of runtime library API functions
@ 2018-09-10 15:04 Chung-Lin Tang
2018-09-10 17:22 ` Cesar Philippidis
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Chung-Lin Tang @ 2018-09-10 15:04 UTC (permalink / raw)
To: gcc-patches, Jakub Jelinek, Cesar Philippidis
[-- Attachment #1: Type: text/plain, Size: 4811 bytes --]
This patch adds *_async versions of several OpenACC runtime library API functions,
which is to allow execution of a function asynchronously on particular async
stream, an addition to the standard since 2.5. Specifically, these functions:
acc_copyin_async
acc_copyout_async
acc_copyout_finalize_async
acc_create_async
acc_delete_async
acc_delete_finalize_async
acc_memcpy_from_device_async
acc_memcpy_to_device_async
acc_update_device_async
acc_update_self_async
which have an additional 'int async' argument in additional from the non-async version.
libgomp tested with offloading with no regressions, is this okay for trunk?
Thanks,
Chung-Lin
2018-09-10 Chung-Lin Tang <cltang@codesourcery.com>
libgomp/
* oacc-mem.c (memcpy_tofrom_device): New function, combined from
acc_memcpy_to/from_device functions, now with async parameter.
(acc_memcpy_to_device): Modify to use memcpy_tofrom_device.
(acc_memcpy_from_device): Likewise.
(acc_memcpy_to_device_async): New API function.
(acc_memcpy_from_device_async): Likewise.
(present_create_copy): Add async parameter and async setting/unsetting.
(acc_create): Adjust present_create_copy call.
(acc_copyin): Likewise.
(acc_present_or_create): Likewise.
(acc_present_or_copyin): Likewise.
(acc_create_async): New API function.
(acc_copyin_async): New API function.
(delete_copyout): Add async parameter and async setting/unsetting.
(acc_delete): Adjust delete_copyout call.
(acc_copyout): Likewise.
(acc_delete_async): New API function.
(acc_copyout_async): Likewise.
(update_dev_host): Add async parameter and async setting/unsetting.
(acc_update_device): Adjust update_dev_host call.
(acc_update_self): Likewise.
(acc_update_device_async): New API function.
(acc_update_self_async): Likewise.
* openacc.h (acc_copyin_async): Declare new API function.
(acc_create_async): Likewise.
(acc_copyout_async): Likewise.
(acc_delete_async): Likewise.
(acc_update_device_async): Likewise.
(acc_update_self_async): Likewise.
(acc_memcpy_to_device_async): Likewise.
(acc_memcpy_from_device_async): Likewise.
* openacc_lib.h (acc_copyin_async_32_h): New subroutine.
(acc_copyin_async_64_h): New subroutine.
(acc_copyin_async_array_h): New subroutine.
(acc_create_async_32_h): New subroutine.
(acc_create_async_64_h): New subroutine.
(acc_create_async_array_h): New subroutine.
(acc_copyout_async_32_h): New subroutine.
(acc_copyout_async_64_h): New subroutine.
(acc_copyout_async_array_h): New subroutine.
(acc_delete_async_32_h): New subroutine.
(acc_delete_async_64_h): New subroutine.
(acc_delete_async_array_h): New subroutine.
(acc_update_device_async_32_h): New subroutine.
(acc_update_device_async_64_h): New subroutine.
(acc_update_device_async_array_h): New subroutine.
(acc_update_self_async_32_h): New subroutine.
(acc_update_self_async_64_h): New subroutine.
(acc_update_self_async_array_h): New subroutine.
* openacc.f90 (acc_copyin_async_32_h): New subroutine.
(acc_copyin_async_64_h): New subroutine.
(acc_copyin_async_array_h): New subroutine.
(acc_create_async_32_h): New subroutine.
(acc_create_async_64_h): New subroutine.
(acc_create_async_array_h): New subroutine.
(acc_copyout_async_32_h): New subroutine.
(acc_copyout_async_64_h): New subroutine.
(acc_copyout_async_array_h): New subroutine.
(acc_delete_async_32_h): New subroutine.
(acc_delete_async_64_h): New subroutine.
(acc_delete_async_array_h): New subroutine.
(acc_update_device_async_32_h): New subroutine.
(acc_update_device_async_64_h): New subroutine.
(acc_update_device_async_array_h): New subroutine.
(acc_update_self_async_32_h): New subroutine.
(acc_update_self_async_64_h): New subroutine.
(acc_update_self_async_array_h): New subroutine.
* libgomp.map (OACC_2.5): Add acc_copyin_async*, acc_copyout_async*,
acc_copyout_finalize_async*, acc_create_async*, acc_delete_async*,
acc_delete_finalize_async*, acc_memcpy_from_device_async*,
acc_memcpy_to_device_async*, acc_update_device_async*, and
acc_update_self_async* entries.
* testsuite/libgomp.oacc-c-c++-common/lib-94.c: New test.
* testsuite/libgomp.oacc-c-c++-common/lib-95.c: New test.
* testsuite/libgomp.oacc-fortran/lib-16.f90: New test.
[-- Attachment #2: async_api.patch --]
[-- Type: text/plain, Size: 36621 bytes --]
Index: libgomp/libgomp.map
===================================================================
--- libgomp/libgomp.map (revision 264192)
+++ libgomp/libgomp.map (working copy)
@@ -388,14 +388,48 @@ OACC_2.0.1 {
OACC_2.5 {
global:
+ acc_copyin_async;
+ acc_copyin_async_32_h_;
+ acc_copyin_async_64_h_;
+ acc_copyin_async_array_h_;
+ acc_copyout_async;
+ acc_copyout_async_32_h_;
+ acc_copyout_async_64_h_;
+ acc_copyout_async_array_h_;
acc_copyout_finalize;
acc_copyout_finalize_32_h_;
acc_copyout_finalize_64_h_;
acc_copyout_finalize_array_h_;
+ acc_copyout_finalize_async;
+ acc_copyout_finalize_async_32_h_;
+ acc_copyout_finalize_async_64_h_;
+ acc_copyout_finalize_async_array_h_;
+ acc_create_async;
+ acc_create_async_32_h_;
+ acc_create_async_64_h_;
+ acc_create_async_array_h_;
+ acc_delete_async;
+ acc_delete_async_32_h_;
+ acc_delete_async_64_h_;
+ acc_delete_async_array_h_;
acc_delete_finalize;
acc_delete_finalize_32_h_;
acc_delete_finalize_64_h_;
acc_delete_finalize_array_h_;
+ acc_delete_finalize_async;
+ acc_delete_finalize_async_32_h_;
+ acc_delete_finalize_async_64_h_;
+ acc_delete_finalize_async_array_h_;
+ acc_memcpy_from_device_async;
+ acc_memcpy_to_device_async;
+ acc_update_device_async;
+ acc_update_device_async_32_h_;
+ acc_update_device_async_64_h_;
+ acc_update_device_async_array_h_;
+ acc_update_self_async;
+ acc_update_self_async_32_h_;
+ acc_update_self_async_64_h_;
+ acc_update_self_async_array_h_;
} OACC_2.0.1;
GOACC_2.0 {
Index: libgomp/oacc-mem.c
===================================================================
--- libgomp/oacc-mem.c (revision 264192)
+++ libgomp/oacc-mem.c (working copy)
@@ -153,8 +153,9 @@ acc_free (void *d)
gomp_fatal ("error in freeing device memory in %s", __FUNCTION__);
}
-void
-acc_memcpy_to_device (void *d, void *h, size_t s)
+static void
+memcpy_tofrom_device (bool from, void *d, void *h, size_t s, int async,
+ const char *libfnname)
{
/* No need to call lazy open here, as the device pointer must have
been obtained from a routine that did that. */
@@ -164,31 +165,49 @@ acc_free (void *d)
if (thr->dev->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
{
- memmove (d, h, s);
+ if (from)
+ memmove (h, d, s);
+ else
+ memmove (d, h, s);
return;
}
- if (!thr->dev->host2dev_func (thr->dev->target_id, d, h, s))
- gomp_fatal ("error in %s", __FUNCTION__);
+ if (async > acc_async_sync)
+ thr->dev->openacc.async_set_async_func (async);
+
+ bool ret = (from
+ ? thr->dev->dev2host_func (thr->dev->target_id, h, d, s)
+ : thr->dev->host2dev_func (thr->dev->target_id, d, h, s));
+
+ if (async > acc_async_sync)
+ thr->dev->openacc.async_set_async_func (acc_async_sync);
+
+ if (!ret)
+ gomp_fatal ("error in %s", libfnname);
}
void
-acc_memcpy_from_device (void *h, void *d, size_t s)
+acc_memcpy_to_device (void *d, void *h, size_t s)
{
- /* No need to call lazy open here, as the device pointer must have
- been obtained from a routine that did that. */
- struct goacc_thread *thr = goacc_thread ();
+ memcpy_tofrom_device (false, d, h, s, acc_async_sync, __FUNCTION__);
+}
- assert (thr && thr->dev);
+void
+acc_memcpy_to_device_async (void *d, void *h, size_t s, int async)
+{
+ memcpy_tofrom_device (false, d, h, s, async, __FUNCTION__);
+}
- if (thr->dev->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
- {
- memmove (h, d, s);
- return;
- }
+void
+acc_memcpy_from_device (void *h, void *d, size_t s)
+{
+ memcpy_tofrom_device (true, d, h, s, acc_async_sync, __FUNCTION__);
+}
- if (!thr->dev->dev2host_func (thr->dev->target_id, h, d, s))
- gomp_fatal ("error in %s", __FUNCTION__);
+void
+acc_memcpy_from_device_async (void *h, void *d, size_t s, int async)
+{
+ memcpy_tofrom_device (true, d, h, s, async, __FUNCTION__);
}
/* Return the device pointer that corresponds to host data H. Or NULL
@@ -428,7 +447,7 @@ acc_unmap_data (void *h)
#define FLAG_COPY (1 << 2)
static void *
-present_create_copy (unsigned f, void *h, size_t s)
+present_create_copy (unsigned f, void *h, size_t s, int async)
{
void *d;
splay_tree_key n;
@@ -490,11 +509,17 @@ static void *
gomp_mutex_unlock (&acc_dev->lock);
+ if (async > acc_async_sync)
+ acc_dev->openacc.async_set_async_func (async);
+
tgt = gomp_map_vars (acc_dev, mapnum, &hostaddrs, NULL, &s, &kinds, true,
GOMP_MAP_VARS_OPENACC);
/* Initialize dynamic refcount. */
tgt->list[0].key->dynamic_refcount = 1;
+ if (async > acc_async_sync)
+ acc_dev->openacc.async_set_async_func (acc_async_sync);
+
gomp_mutex_lock (&acc_dev->lock);
d = tgt->to_free;
@@ -510,19 +535,32 @@ static void *
void *
acc_create (void *h, size_t s)
{
- return present_create_copy (FLAG_PRESENT | FLAG_CREATE, h, s);
+ return present_create_copy (FLAG_PRESENT | FLAG_CREATE, h, s, acc_async_sync);
}
+void
+acc_create_async (void *h, size_t s, int async)
+{
+ present_create_copy (FLAG_PRESENT | FLAG_CREATE, h, s, async);
+}
+
void *
acc_copyin (void *h, size_t s)
{
- return present_create_copy (FLAG_PRESENT | FLAG_CREATE | FLAG_COPY, h, s);
+ return present_create_copy (FLAG_PRESENT | FLAG_CREATE | FLAG_COPY, h, s,
+ acc_async_sync);
}
+void
+acc_copyin_async (void *h, size_t s, int async)
+{
+ present_create_copy (FLAG_PRESENT | FLAG_CREATE | FLAG_COPY, h, s, async);
+}
+
void *
acc_present_or_create (void *h, size_t s)
{
- return present_create_copy (FLAG_PRESENT | FLAG_CREATE, h, s);
+ return present_create_copy (FLAG_PRESENT | FLAG_CREATE, h, s, acc_async_sync);
}
/* acc_pcreate is acc_present_or_create by a different name. */
@@ -539,7 +577,8 @@ acc_pcreate (void *h, size_t s)
void *
acc_present_or_copyin (void *h, size_t s)
{
- return present_create_copy (FLAG_PRESENT | FLAG_CREATE | FLAG_COPY, h, s);
+ return present_create_copy (FLAG_PRESENT | FLAG_CREATE | FLAG_COPY, h, s,
+ acc_async_sync);
}
/* acc_pcopyin is acc_present_or_copyin by a different name. */
@@ -557,7 +596,7 @@ acc_pcopyin (void *h, size_t s)
#define FLAG_FINALIZE (1 << 1)
static void
-delete_copyout (unsigned f, void *h, size_t s, const char *libfnname)
+delete_copyout (unsigned f, void *h, size_t s, int async, const char *libfnname)
{
size_t host_size;
splay_tree_key n;
@@ -633,7 +672,13 @@ static void
}
if (f & FLAG_COPYOUT)
- acc_dev->dev2host_func (acc_dev->target_id, h, d, s);
+ {
+ if (async > acc_async_sync)
+ acc_dev->openacc.async_set_async_func (async);
+ acc_dev->dev2host_func (acc_dev->target_id, h, d, s);
+ if (async > acc_async_sync)
+ acc_dev->openacc.async_set_async_func (acc_async_sync);
+ }
gomp_remove_var (acc_dev, n);
}
@@ -644,41 +689,54 @@ static void
void
acc_delete (void *h , size_t s)
{
- delete_copyout (0, h, s, __FUNCTION__);
+ delete_copyout (0, h, s, acc_async_sync, __FUNCTION__);
}
void
+acc_delete_async (void *h , size_t s, int async)
+{
+ delete_copyout (0, h, s, async, __FUNCTION__);
+}
+
+void
acc_delete_finalize (void *h , size_t s)
{
- delete_copyout (FLAG_FINALIZE, h, s, __FUNCTION__);
+ delete_copyout (FLAG_FINALIZE, h, s, acc_async_sync, __FUNCTION__);
}
void
acc_delete_finalize_async (void *h , size_t s, int async)
{
- delete_copyout (FLAG_FINALIZE, h, s, __FUNCTION__);
+ delete_copyout (FLAG_FINALIZE, h, s, async, __FUNCTION__);
}
void
acc_copyout (void *h, size_t s)
{
- delete_copyout (FLAG_COPYOUT, h, s, __FUNCTION__);
+ delete_copyout (FLAG_COPYOUT, h, s, acc_async_sync, __FUNCTION__);
}
void
+acc_copyout_async (void *h, size_t s, int async)
+{
+ delete_copyout (FLAG_COPYOUT, h, s, async, __FUNCTION__);
+}
+
+void
acc_copyout_finalize (void *h, size_t s)
{
- delete_copyout (FLAG_COPYOUT | FLAG_FINALIZE, h, s, __FUNCTION__);
+ delete_copyout (FLAG_COPYOUT | FLAG_FINALIZE, h, s, acc_async_sync,
+ __FUNCTION__);
}
void
acc_copyout_finalize_async (void *h, size_t s, int async)
{
- delete_copyout (FLAG_COPYOUT | FLAG_FINALIZE, h, s, __FUNCTION__);
+ delete_copyout (FLAG_COPYOUT | FLAG_FINALIZE, h, s, async, __FUNCTION__);
}
static void
-update_dev_host (int is_dev, void *h, size_t s)
+update_dev_host (int is_dev, void *h, size_t s, int async)
{
splay_tree_key n;
void *d;
@@ -704,11 +762,17 @@ static void
d = (void *) (n->tgt->tgt_start + n->tgt_offset
+ (uintptr_t) h - n->host_start);
+ if (async > acc_async_sync)
+ acc_dev->openacc.async_set_async_func (async);
+
if (is_dev)
acc_dev->host2dev_func (acc_dev->target_id, d, h, s);
else
acc_dev->dev2host_func (acc_dev->target_id, h, d, s);
+ if (async > acc_async_sync)
+ acc_dev->openacc.async_set_async_func (acc_async_sync);
+
gomp_mutex_unlock (&acc_dev->lock);
}
@@ -715,16 +779,28 @@ static void
void
acc_update_device (void *h, size_t s)
{
- update_dev_host (1, h, s);
+ update_dev_host (1, h, s, acc_async_sync);
}
void
+acc_update_device_async (void *h, size_t s, int async)
+{
+ update_dev_host (1, h, s, async);
+}
+
+void
acc_update_self (void *h, size_t s)
{
- update_dev_host (0, h, s);
+ update_dev_host (0, h, s, acc_async_sync);
}
void
+acc_update_self_async (void *h, size_t s, int async)
+{
+ update_dev_host (0, h, s, async);
+}
+
+void
gomp_acc_insert_pointer (size_t mapnum, void **hostaddrs, size_t *sizes,
void *kinds)
{
Index: libgomp/openacc.f90
===================================================================
--- libgomp/openacc.f90 (revision 264192)
+++ libgomp/openacc.f90 (working copy)
@@ -332,6 +332,150 @@ module openacc_internal
logical acc_is_present_array_h
type (*), dimension (..), contiguous :: a
end function
+
+ subroutine acc_copyin_async_32_h (a, len, async)
+ use iso_c_binding, only: c_int32_t
+ use openacc_kinds, only: acc_handle_kind
+ !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+ type (*), dimension (*) :: a
+ integer (c_int32_t) len
+ integer (acc_handle_kind) async
+ end subroutine
+
+ subroutine acc_copyin_async_64_h (a, len, async)
+ use iso_c_binding, only: c_int64_t
+ use openacc_kinds, only: acc_handle_kind
+ !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+ type (*), dimension (*) :: a
+ integer (c_int64_t) len
+ integer (acc_handle_kind) async
+ end subroutine
+
+ subroutine acc_copyin_async_array_h (a, async)
+ use openacc_kinds, only: acc_handle_kind
+ type (*), dimension (..), contiguous :: a
+ integer (acc_handle_kind) async
+ end subroutine
+
+ subroutine acc_create_async_32_h (a, len, async)
+ use iso_c_binding, only: c_int32_t
+ use openacc_kinds, only: acc_handle_kind
+ !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+ type (*), dimension (*) :: a
+ integer (c_int32_t) len
+ integer (acc_handle_kind) async
+ end subroutine
+
+ subroutine acc_create_async_64_h (a, len, async)
+ use iso_c_binding, only: c_int64_t
+ use openacc_kinds, only: acc_handle_kind
+ !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+ type (*), dimension (*) :: a
+ integer (c_int64_t) len
+ integer (acc_handle_kind) async
+ end subroutine
+
+ subroutine acc_create_async_array_h (a, async)
+ use openacc_kinds, only: acc_handle_kind
+ type (*), dimension (..), contiguous :: a
+ integer (acc_handle_kind) async
+ end subroutine
+
+ subroutine acc_copyout_async_32_h (a, len, async)
+ use iso_c_binding, only: c_int32_t
+ use openacc_kinds, only: acc_handle_kind
+ !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+ type (*), dimension (*) :: a
+ integer (c_int32_t) len
+ integer (acc_handle_kind) async
+ end subroutine
+
+ subroutine acc_copyout_async_64_h (a, len, async)
+ use iso_c_binding, only: c_int64_t
+ use openacc_kinds, only: acc_handle_kind
+ !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+ type (*), dimension (*) :: a
+ integer (c_int64_t) len
+ integer (acc_handle_kind) async
+ end subroutine
+
+ subroutine acc_copyout_async_array_h (a, async)
+ use openacc_kinds, only: acc_handle_kind
+ type (*), dimension (..), contiguous :: a
+ integer (acc_handle_kind) async
+ end subroutine
+
+ subroutine acc_delete_async_32_h (a, len, async)
+ use iso_c_binding, only: c_int32_t
+ use openacc_kinds, only: acc_handle_kind
+ !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+ type (*), dimension (*) :: a
+ integer (c_int32_t) len
+ integer (acc_handle_kind) async
+ end subroutine
+
+ subroutine acc_delete_async_64_h (a, len, async)
+ use iso_c_binding, only: c_int64_t
+ use openacc_kinds, only: acc_handle_kind
+ !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+ type (*), dimension (*) :: a
+ integer (c_int64_t) len
+ integer (acc_handle_kind) async
+ end subroutine
+
+ subroutine acc_delete_async_array_h (a, async)
+ use openacc_kinds, only: acc_handle_kind
+ type (*), dimension (..), contiguous :: a
+ integer (acc_handle_kind) async
+ end subroutine
+
+ subroutine acc_update_device_async_32_h (a, len, async)
+ use iso_c_binding, only: c_int32_t
+ use openacc_kinds, only: acc_handle_kind
+ !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+ type (*), dimension (*) :: a
+ integer (c_int32_t) len
+ integer (acc_handle_kind) async
+ end subroutine
+
+ subroutine acc_update_device_async_64_h (a, len, async)
+ use iso_c_binding, only: c_int64_t
+ use openacc_kinds, only: acc_handle_kind
+ !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+ type (*), dimension (*) :: a
+ integer (c_int64_t) len
+ integer (acc_handle_kind) async
+ end subroutine
+
+ subroutine acc_update_device_async_array_h (a, async)
+ use openacc_kinds, only: acc_handle_kind
+ type (*), dimension (..), contiguous :: a
+ integer (acc_handle_kind) async
+ end subroutine
+
+ subroutine acc_update_self_async_32_h (a, len, async)
+ use iso_c_binding, only: c_int32_t
+ use openacc_kinds, only: acc_handle_kind
+ !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+ type (*), dimension (*) :: a
+ integer (c_int32_t) len
+ integer (acc_handle_kind) async
+ end subroutine
+
+ subroutine acc_update_self_async_64_h (a, len, async)
+ use iso_c_binding, only: c_int64_t
+ use openacc_kinds, only: acc_handle_kind
+ !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+ type (*), dimension (*) :: a
+ integer (c_int64_t) len
+ integer (acc_handle_kind) async
+ end subroutine
+
+ subroutine acc_update_self_async_array_h (a, async)
+ use openacc_kinds, only: acc_handle_kind
+ type (*), dimension (..), contiguous :: a
+ integer (acc_handle_kind) async
+ end subroutine
end interface
interface
@@ -510,6 +654,60 @@ module openacc_internal
type (*), dimension (*) :: a
integer (c_size_t), value :: len
end function
+
+ subroutine acc_copyin_async_l (a, len, async) &
+ bind (C, name = "acc_copyin_async")
+ use iso_c_binding, only: c_size_t, c_int
+ !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+ type (*), dimension (*) :: a
+ integer (c_size_t), value :: len
+ integer (c_int), value :: async
+ end subroutine
+
+ subroutine acc_create_async_l (a, len, async) &
+ bind (C, name = "acc_create_async")
+ use iso_c_binding, only: c_size_t, c_int
+ !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+ type (*), dimension (*) :: a
+ integer (c_size_t), value :: len
+ integer (c_int), value :: async
+ end subroutine
+
+ subroutine acc_copyout_async_l (a, len, async) &
+ bind (C, name = "acc_copyout_async")
+ use iso_c_binding, only: c_size_t, c_int
+ !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+ type (*), dimension (*) :: a
+ integer (c_size_t), value :: len
+ integer (c_int), value :: async
+ end subroutine
+
+ subroutine acc_delete_async_l (a, len, async) &
+ bind (C, name = "acc_delete_async")
+ use iso_c_binding, only: c_size_t, c_int
+ !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+ type (*), dimension (*) :: a
+ integer (c_size_t), value :: len
+ integer (c_int), value :: async
+ end subroutine
+
+ subroutine acc_update_device_async_l (a, len, async) &
+ bind (C, name = "acc_update_device_async")
+ use iso_c_binding, only: c_size_t, c_int
+ !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+ type (*), dimension (*) :: a
+ integer (c_size_t), value :: len
+ integer (c_int), value :: async
+ end subroutine
+
+ subroutine acc_update_self_async_l (a, len, async) &
+ bind (C, name = "acc_update_self_async")
+ use iso_c_binding, only: c_size_t, c_int
+ !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+ type (*), dimension (*) :: a
+ integer (c_size_t), value :: len
+ integer (c_int), value :: async
+ end subroutine
end interface
end module
@@ -529,6 +727,8 @@ module openacc
public :: acc_copyin, acc_present_or_copyin, acc_pcopyin, acc_create
public :: acc_present_or_create, acc_pcreate, acc_copyout, acc_delete
public :: acc_update_device, acc_update_self, acc_is_present
+ public :: acc_copyin_async, acc_create_async, acc_copyout_async
+ public :: acc_delete_async, acc_update_device_async, acc_update_self_async
integer, parameter :: openacc_version = 201306
@@ -694,6 +894,42 @@ module openacc
! acc_memcpy_to_device: Only available in C/C++
! acc_memcpy_from_device: Only available in C/C++
+ interface acc_copyin_async
+ procedure :: acc_copyin_async_32_h
+ procedure :: acc_copyin_async_64_h
+ procedure :: acc_copyin_async_array_h
+ end interface
+
+ interface acc_create_async
+ procedure :: acc_create_async_32_h
+ procedure :: acc_create_async_64_h
+ procedure :: acc_create_async_array_h
+ end interface
+
+ interface acc_copyout_async
+ procedure :: acc_copyout_async_32_h
+ procedure :: acc_copyout_async_64_h
+ procedure :: acc_copyout_async_array_h
+ end interface
+
+ interface acc_delete_async
+ procedure :: acc_delete_async_32_h
+ procedure :: acc_delete_async_64_h
+ procedure :: acc_delete_async_array_h
+ end interface
+
+ interface acc_update_device_async
+ procedure :: acc_update_device_async_32_h
+ procedure :: acc_update_device_async_64_h
+ procedure :: acc_update_device_async_array_h
+ end interface
+
+ interface acc_update_self_async
+ procedure :: acc_update_self_async_32_h
+ procedure :: acc_update_self_async_64_h
+ procedure :: acc_update_self_async_array_h
+ end interface
+
end module
function acc_get_num_devices_h (d)
@@ -1078,3 +1314,189 @@ function acc_is_present_array_h (a)
type (*), dimension (..), contiguous :: a
acc_is_present_array_h = acc_is_present_l (a, sizeof (a)) == 1
end function
+
+subroutine acc_copyin_async_32_h (a, len, async)
+ use iso_c_binding, only: c_int32_t, c_size_t, c_int
+ use openacc_internal, only: acc_copyin_async_l
+ use openacc_kinds, only: acc_handle_kind
+ !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+ type (*), dimension (*) :: a
+ integer (c_int32_t) len
+ integer (acc_handle_kind) async
+ call acc_copyin_async_l (a, int (len, kind = c_size_t), int (async, kind = c_int))
+end subroutine
+
+subroutine acc_copyin_async_64_h (a, len, async)
+ use iso_c_binding, only: c_int64_t, c_size_t, c_int
+ use openacc_internal, only: acc_copyin_async_l
+ use openacc_kinds, only: acc_handle_kind
+ !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+ type (*), dimension (*) :: a
+ integer (c_int64_t) len
+ integer (acc_handle_kind) async
+ call acc_copyin_async_l (a, int (len, kind = c_size_t), int (async, kind = c_int))
+end subroutine
+
+subroutine acc_copyin_async_array_h (a, async)
+ use iso_c_binding, only: c_int
+ use openacc_internal, only: acc_copyin_async_l
+ use openacc_kinds, only: acc_handle_kind
+ type (*), dimension (..), contiguous :: a
+ integer (acc_handle_kind) async
+ call acc_copyin_async_l (a, sizeof (a), int (async, kind = c_int))
+end subroutine
+
+subroutine acc_create_async_32_h (a, len, async)
+ use iso_c_binding, only: c_int32_t, c_size_t, c_int
+ use openacc_internal, only: acc_create_async_l
+ use openacc_kinds, only: acc_handle_kind
+ !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+ type (*), dimension (*) :: a
+ integer (c_int32_t) len
+ integer (acc_handle_kind) async
+ call acc_create_async_l (a, int (len, kind = c_size_t), int (async, kind = c_int))
+end subroutine
+
+subroutine acc_create_async_64_h (a, len, async)
+ use iso_c_binding, only: c_int64_t, c_size_t, c_int
+ use openacc_internal, only: acc_create_async_l
+ use openacc_kinds, only: acc_handle_kind
+ !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+ type (*), dimension (*) :: a
+ integer (c_int64_t) len
+ integer (acc_handle_kind) async
+ call acc_create_async_l (a, int (len, kind = c_size_t), int (async, kind = c_int))
+end subroutine
+
+subroutine acc_create_async_array_h (a, async)
+ use iso_c_binding, only: c_int
+ use openacc_internal, only: acc_create_async_l
+ use openacc_kinds, only: acc_handle_kind
+ type (*), dimension (..), contiguous :: a
+ integer (acc_handle_kind) async
+ call acc_create_async_l (a, sizeof (a), int (async, kind = c_int))
+end subroutine
+
+subroutine acc_copyout_async_32_h (a, len, async)
+ use iso_c_binding, only: c_int32_t, c_size_t, c_int
+ use openacc_internal, only: acc_copyout_async_l
+ use openacc_kinds, only: acc_handle_kind
+ !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+ type (*), dimension (*) :: a
+ integer (c_int32_t) len
+ integer (acc_handle_kind) async
+ call acc_copyout_async_l (a, int (len, kind = c_size_t), int (async, kind = c_int))
+end subroutine
+
+subroutine acc_copyout_async_64_h (a, len, async)
+ use iso_c_binding, only: c_int64_t, c_size_t, c_int
+ use openacc_internal, only: acc_copyout_async_l
+ use openacc_kinds, only: acc_handle_kind
+ !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+ type (*), dimension (*) :: a
+ integer (c_int64_t) len
+ integer (acc_handle_kind) async
+ call acc_copyout_async_l (a, int (len, kind = c_size_t), int (async, kind = c_int))
+end subroutine
+
+subroutine acc_copyout_async_array_h (a, async)
+ use iso_c_binding, only: c_int
+ use openacc_internal, only: acc_copyout_async_l
+ use openacc_kinds, only: acc_handle_kind
+ type (*), dimension (..), contiguous :: a
+ integer (acc_handle_kind) async
+ call acc_copyout_async_l (a, sizeof (a), int (async, kind = c_int))
+end subroutine
+
+subroutine acc_delete_async_32_h (a, len, async)
+ use iso_c_binding, only: c_int32_t, c_size_t, c_int
+ use openacc_internal, only: acc_delete_async_l
+ use openacc_kinds, only: acc_handle_kind
+ !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+ type (*), dimension (*) :: a
+ integer (c_int32_t) len
+ integer (acc_handle_kind) async
+ call acc_delete_async_l (a, int (len, kind = c_size_t), int (async, kind = c_int))
+end subroutine
+
+subroutine acc_delete_async_64_h (a, len, async)
+ use iso_c_binding, only: c_int64_t, c_size_t, c_int
+ use openacc_internal, only: acc_delete_async_l
+ use openacc_kinds, only: acc_handle_kind
+ !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+ type (*), dimension (*) :: a
+ integer (c_int64_t) len
+ integer (acc_handle_kind) async
+ call acc_delete_async_l (a, int (len, kind = c_size_t), int (async, kind = c_int))
+end subroutine
+
+subroutine acc_delete_async_array_h (a, async)
+ use iso_c_binding, only: c_int
+ use openacc_internal, only: acc_delete_async_l
+ use openacc_kinds, only: acc_handle_kind
+ type (*), dimension (..), contiguous :: a
+ integer (acc_handle_kind) async
+ call acc_delete_async_l (a, sizeof (a), int (async, kind = c_int))
+end subroutine
+
+subroutine acc_update_device_async_32_h (a, len, async)
+ use iso_c_binding, only: c_int32_t, c_size_t, c_int
+ use openacc_internal, only: acc_update_device_async_l
+ use openacc_kinds, only: acc_handle_kind
+ !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+ type (*), dimension (*) :: a
+ integer (c_int32_t) len
+ integer (acc_handle_kind) async
+ call acc_update_device_async_l (a, int (len, kind = c_size_t), int (async, kind = c_int))
+end subroutine
+
+subroutine acc_update_device_async_64_h (a, len, async)
+ use iso_c_binding, only: c_int64_t, c_size_t, c_int
+ use openacc_internal, only: acc_update_device_async_l
+ use openacc_kinds, only: acc_handle_kind
+ !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+ type (*), dimension (*) :: a
+ integer (c_int64_t) len
+ integer (acc_handle_kind) async
+ call acc_update_device_async_l (a, int (len, kind = c_size_t), int (async, kind = c_int))
+end subroutine
+
+subroutine acc_update_device_async_array_h (a, async)
+ use iso_c_binding, only: c_int
+ use openacc_internal, only: acc_update_device_async_l
+ use openacc_kinds, only: acc_handle_kind
+ type (*), dimension (..), contiguous :: a
+ integer (acc_handle_kind) async
+ call acc_update_device_async_l (a, sizeof (a), int (async, kind = c_int))
+end subroutine
+
+subroutine acc_update_self_async_32_h (a, len, async)
+ use iso_c_binding, only: c_int32_t, c_size_t, c_int
+ use openacc_internal, only: acc_update_self_async_l
+ use openacc_kinds, only: acc_handle_kind
+ !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+ type (*), dimension (*) :: a
+ integer (c_int32_t) len
+ integer (acc_handle_kind) async
+ call acc_update_self_async_l (a, int (len, kind = c_size_t), int (async, kind = c_int))
+end subroutine
+
+subroutine acc_update_self_async_64_h (a, len, async)
+ use iso_c_binding, only: c_int64_t, c_size_t, c_int
+ use openacc_internal, only: acc_update_self_async_l
+ use openacc_kinds, only: acc_handle_kind
+ !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+ type (*), dimension (*) :: a
+ integer (c_int64_t) len
+ integer (acc_handle_kind) async
+ call acc_update_self_async_l (a, int (len, kind = c_size_t), int (async, kind = c_int))
+end subroutine
+
+subroutine acc_update_self_async_array_h (a, async)
+ use iso_c_binding, only: c_int
+ use openacc_internal, only: acc_update_self_async_l
+ use openacc_kinds, only: acc_handle_kind
+ type (*), dimension (..), contiguous :: a
+ integer (acc_handle_kind) async
+ call acc_update_self_async_l (a, sizeof (a), int (async, kind = c_int))
+end subroutine
Index: libgomp/openacc.h
===================================================================
--- libgomp/openacc.h (revision 264192)
+++ libgomp/openacc.h (working copy)
@@ -115,6 +115,16 @@ void acc_copyout_finalize_async (void *, size_t, i
void acc_delete_finalize (void *, size_t) __GOACC_NOTHROW;
void acc_delete_finalize_async (void *, size_t, int) __GOACC_NOTHROW;
+/* Async functions, specified in OpenACC 2.5. */
+void acc_copyin_async (void *, size_t, int) __GOACC_NOTHROW;
+void acc_create_async (void *, size_t, int) __GOACC_NOTHROW;
+void acc_copyout_async (void *, size_t, int) __GOACC_NOTHROW;
+void acc_delete_async (void *, size_t, int) __GOACC_NOTHROW;
+void acc_update_device_async (void *, size_t, int) __GOACC_NOTHROW;
+void acc_update_self_async (void *, size_t, int) __GOACC_NOTHROW;
+void acc_memcpy_to_device_async (void *, void *, size_t, int) __GOACC_NOTHROW;
+void acc_memcpy_from_device_async (void *, void *, size_t, int) __GOACC_NOTHROW;
+
/* CUDA-specific routines. */
void *acc_get_current_cuda_device (void) __GOACC_NOTHROW;
void *acc_get_current_cuda_context (void) __GOACC_NOTHROW;
Index: libgomp/openacc_lib.h
===================================================================
--- libgomp/openacc_lib.h (revision 264192)
+++ libgomp/openacc_lib.h (working copy)
@@ -403,3 +403,159 @@
! acc_memcpy_to_device: Only available in C/C++
! acc_memcpy_from_device: Only available in C/C++
+
+ interface acc_copyin_async
+ subroutine acc_copyin_async_32_h (a, len, async)
+ use iso_c_binding, only: c_int32_t
+ import acc_handle_kind
+ !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+ type (*), dimension (*) :: a
+ integer (c_int32_t) len
+ integer (acc_handle_kind) async
+ end subroutine
+
+ subroutine acc_copyin_async_64_h (a, len, async)
+ use iso_c_binding, only: c_int64_t
+ import acc_handle_kind
+ !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+ type (*), dimension (*) :: a
+ integer (c_int64_t) len
+ integer (acc_handle_kind) async
+ end subroutine
+
+ subroutine acc_copyin_async_array_h (a, async_)
+ import acc_handle_kind
+ type (*), dimension (..), contiguous :: a
+ integer (acc_handle_kind) async_
+ end subroutine
+ end interface
+
+ interface acc_create_async
+ subroutine acc_create_async_32_h (a, len, async)
+ use iso_c_binding, only: c_int32_t
+ import acc_handle_kind
+ !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+ type (*), dimension (*) :: a
+ integer (c_int32_t) len
+ integer (acc_handle_kind) async
+ end subroutine
+
+ subroutine acc_create_async_64_h (a, len, async)
+ use iso_c_binding, only: c_int64_t
+ import acc_handle_kind
+ !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+ type (*), dimension (*) :: a
+ integer (c_int64_t) len
+ integer (acc_handle_kind) async
+ end subroutine
+
+ subroutine acc_create_async_array_h (a, async_)
+ import acc_handle_kind
+ type (*), dimension (..), contiguous :: a
+ integer (acc_handle_kind) async_
+ end subroutine
+ end interface
+
+ interface acc_copyout_async
+ subroutine acc_copyout_async_32_h (a, len, async)
+ use iso_c_binding, only: c_int32_t
+ import acc_handle_kind
+ !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+ type (*), dimension (*) :: a
+ integer (c_int32_t) len
+ integer (acc_handle_kind) async
+ end subroutine
+
+ subroutine acc_copyout_async_64_h (a, len, async)
+ use iso_c_binding, only: c_int64_t
+ import acc_handle_kind
+ !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+ type (*), dimension (*) :: a
+ integer (c_int64_t) len
+ integer (acc_handle_kind) async
+ end subroutine
+
+ subroutine acc_copyout_async_array_h (a, async_)
+ import acc_handle_kind
+ type (*), dimension (..), contiguous :: a
+ integer (acc_handle_kind) async_
+ end subroutine
+ end interface
+
+ interface acc_delete_async
+ subroutine acc_delete_async_32_h (a, len, async)
+ use iso_c_binding, only: c_int32_t
+ import acc_handle_kind
+ !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+ type (*), dimension (*) :: a
+ integer (c_int32_t) len
+ integer (acc_handle_kind) async
+ end subroutine
+
+ subroutine acc_delete_async_64_h (a, len, async)
+ use iso_c_binding, only: c_int64_t
+ import acc_handle_kind
+ !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+ type (*), dimension (*) :: a
+ integer (c_int64_t) len
+ integer (acc_handle_kind) async
+ end subroutine
+
+ subroutine acc_delete_async_array_h (a, async_)
+ import acc_handle_kind
+ type (*), dimension (..), contiguous :: a
+ integer (acc_handle_kind) async_
+ end subroutine
+ end interface
+
+ interface acc_update_device_async
+ subroutine acc_update_device_async_32_h (a, len, async)
+ use iso_c_binding, only: c_int32_t
+ import acc_handle_kind
+ !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+ type (*), dimension (*) :: a
+ integer (c_int32_t) len
+ integer (acc_handle_kind) async
+ end subroutine
+
+ subroutine acc_update_device_async_64_h (a, len, async)
+ use iso_c_binding, only: c_int64_t
+ import acc_handle_kind
+ !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+ type (*), dimension (*) :: a
+ integer (c_int64_t) len
+ integer (acc_handle_kind) async
+ end subroutine
+
+ subroutine acc_update_device_async_array_h (a, async_)
+ import acc_handle_kind
+ type (*), dimension (..), contiguous :: a
+ integer (acc_handle_kind) async_
+ end subroutine
+ end interface
+
+ interface acc_update_self_async
+ subroutine acc_update_self_async_32_h (a, len, async)
+ use iso_c_binding, only: c_int32_t
+ import acc_handle_kind
+ !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+ type (*), dimension (*) :: a
+ integer (c_int32_t) len
+ integer (acc_handle_kind) async
+ end subroutine
+
+ subroutine acc_update_self_async_64_h (a, len, async)
+ use iso_c_binding, only: c_int64_t
+ import acc_handle_kind
+ !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
+ type (*), dimension (*) :: a
+ integer (c_int64_t) len
+ integer (acc_handle_kind) async
+ end subroutine
+
+ subroutine acc_update_self_async_array_h (a, async_)
+ import acc_handle_kind
+ type (*), dimension (..), contiguous :: a
+ integer (acc_handle_kind) async_
+ end subroutine
+ end interface
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/lib-94.c
===================================================================
--- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-94.c (nonexistent)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/lib-94.c (working copy)
@@ -0,0 +1,42 @@
+/* { dg-do run } */
+/* { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } } */
+
+#include <string.h>
+#include <stdlib.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+ const int N = 256;
+ int i;
+ int async = 8;
+ unsigned char *h;
+
+ h = (unsigned char *) malloc (N);
+
+ for (i = 0; i < N; i++)
+ {
+ h[i] = i;
+ }
+
+ acc_copyin_async (h, N, async);
+
+ memset (h, 0, N);
+
+ acc_wait (async);
+
+ acc_copyout_async (h, N, async + 1);
+
+ acc_wait (async + 1);
+
+ for (i = 0; i < N; i++)
+ {
+ if (h[i] != i)
+ abort ();
+ }
+
+ free (h);
+
+ return 0;
+}
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/lib-95.c
===================================================================
--- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-95.c (nonexistent)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/lib-95.c (working copy)
@@ -0,0 +1,45 @@
+/* { dg-do run } */
+/* { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } } */
+
+#include <string.h>
+#include <stdlib.h>
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+ const int N = 256;
+ int i, q = 5;
+ unsigned char *h, *g;
+ void *d;
+
+ h = (unsigned char *) malloc (N);
+ g = (unsigned char *) malloc (N);
+ for (i = 0; i < N; i++)
+ {
+ g[i] = i;
+ }
+
+ acc_create_async (h, N, q);
+
+ acc_memcpy_to_device_async (acc_deviceptr (h), g, N, q);
+ memset (&h[0], 0, N);
+
+ acc_wait (q);
+
+ acc_update_self_async (h, N, q + 1);
+ acc_delete_async (h, N, q + 1);
+
+ acc_wait (q + 1);
+
+ for (i = 0; i < N; i++)
+ {
+ if (h[i] != i)
+ abort ();
+ }
+
+ free (h);
+ free (g);
+
+ return 0;
+}
Index: libgomp/testsuite/libgomp.oacc-fortran/lib-16.f90
===================================================================
--- libgomp/testsuite/libgomp.oacc-fortran/lib-16.f90 (nonexistent)
+++ libgomp/testsuite/libgomp.oacc-fortran/lib-16.f90 (working copy)
@@ -0,0 +1,57 @@
+! { dg-do run }
+! { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } }
+
+program main
+ use openacc
+ implicit none
+
+ integer, parameter :: N = 256
+ integer, allocatable :: h(:)
+ integer :: i
+ integer :: async = 5
+
+ allocate (h(N))
+
+ do i = 1, N
+ h(i) = i
+ end do
+
+ call acc_copyin (h)
+
+ do i = 1, N
+ h(i) = i + i
+ end do
+
+ call acc_update_device_async (h, sizeof (h), async)
+
+ if (acc_is_present (h) .neqv. .TRUE.) call abort
+
+ h(:) = 0
+
+ call acc_copyout_async (h, sizeof (h), async)
+
+ call acc_wait (async)
+
+ do i = 1, N
+ if (h(i) /= i + i) call abort
+ end do
+
+ call acc_copyin (h, sizeof (h))
+
+ h(:) = 0
+
+ call acc_update_self_async (h, sizeof (h), async)
+
+ if (acc_is_present (h) .neqv. .TRUE.) call abort
+
+ do i = 1, N
+ if (h(i) /= i + i) call abort
+ end do
+
+ call acc_delete_async (h, async)
+
+ call acc_wait (async)
+
+ if (acc_is_present (h) .neqv. .FALSE.) call abort
+
+end program
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH, OpenACC 2.5, libgomp] Add *_async versions of runtime library API functions
2018-09-10 15:04 [PATCH, OpenACC 2.5, libgomp] Add *_async versions of runtime library API functions Chung-Lin Tang
@ 2018-09-10 17:22 ` Cesar Philippidis
2018-09-11 10:35 ` Chung-Lin Tang
2018-11-30 20:42 ` Thomas Schwinge
2018-11-01 22:01 ` Thomas Schwinge
2021-06-08 17:32 ` Thomas Schwinge
2 siblings, 2 replies; 7+ messages in thread
From: Cesar Philippidis @ 2018-09-10 17:22 UTC (permalink / raw)
To: cltang, gcc-patches, Jakub Jelinek
On 09/10/2018 08:04 AM, Chung-Lin Tang wrote:
> GOACC_2.0 {
> Index: libgomp/oacc-mem.c
> ===================================================================
> --- libgomp/oacc-mem.c (revision 264192)
> +++ libgomp/oacc-mem.c (working copy)
> @@ -153,8 +153,9 @@ acc_free (void *d)
> gomp_fatal ("error in freeing device memory in %s", __FUNCTION__);
> }
>
> -void
> -acc_memcpy_to_device (void *d, void *h, size_t s)
> +static void
> +memcpy_tofrom_device (bool from, void *d, void *h, size_t s, int async,
> + const char *libfnname)
This showed up oddly in the diff, but memcpy_tofrom_device is a new
internal function that's not part of the public API. It's nice that you
were able to merge the to/from functions together. I think this is safe
in terms of backwards compatibility.
> {
> /* No need to call lazy open here, as the device pointer must have
> been obtained from a routine that did that. */
> @@ -164,31 +165,49 @@ acc_free (void *d)
>
> if (thr->dev->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
> {
> - memmove (d, h, s);
> + if (from)
> + memmove (h, d, s);
> + else
> + memmove (d, h, s);
> return;
> }
>
> - if (!thr->dev->host2dev_func (thr->dev->target_id, d, h, s))
> - gomp_fatal ("error in %s", __FUNCTION__);
> + if (async > acc_async_sync)
> + thr->dev->openacc.async_set_async_func (async);
> +
> + bool ret = (from
> + ? thr->dev->dev2host_func (thr->dev->target_id, h, d, s)
> + : thr->dev->host2dev_func (thr->dev->target_id, d, h, s));
> +
> + if (async > acc_async_sync)
> + thr->dev->openacc.async_set_async_func (acc_async_sync);
> +
> + if (!ret)
> + gomp_fatal ("error in %s", libfnname);
> }
>
> void
> -acc_memcpy_from_device (void *h, void *d, size_t s)
> +acc_memcpy_to_device (void *d, void *h, size_t s)
> {
> - /* No need to call lazy open here, as the device pointer must have
> - been obtained from a routine that did that. */
> - struct goacc_thread *thr = goacc_thread ();
> + memcpy_tofrom_device (false, d, h, s, acc_async_sync, __FUNCTION__);
> +}
>
> - assert (thr && thr->dev);
> +void
> +acc_memcpy_to_device_async (void *d, void *h, size_t s, int async)
> +{
> + memcpy_tofrom_device (false, d, h, s, async, __FUNCTION__);
> +}
>
> - if (thr->dev->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
> - {
> - memmove (h, d, s);
> - return;
> - }
> +void
> +acc_memcpy_from_device (void *h, void *d, size_t s)
> +{
> + memcpy_tofrom_device (true, d, h, s, acc_async_sync, __FUNCTION__);
> +}
>
> - if (!thr->dev->dev2host_func (thr->dev->target_id, h, d, s))
> - gomp_fatal ("error in %s", __FUNCTION__);
> +void
> +acc_memcpy_from_device_async (void *h, void *d, size_t s, int async)
> +{
> + memcpy_tofrom_device (true, d, h, s, async, __FUNCTION__);
> }
>
> /* Return the device pointer that corresponds to host data H. Or NULL
> @@ -428,7 +447,7 @@ acc_unmap_data (void *h)
> #define FLAG_COPY (1 << 2)
>
> static void *
> -present_create_copy (unsigned f, void *h, size_t s)
> +present_create_copy (unsigned f, void *h, size_t s, int async)
Likewise, this is another internal function, so it shouldn't break anything.
> {
> void *d;
> splay_tree_key n;
> @@ -490,11 +509,17 @@ static void *
>
> gomp_mutex_unlock (&acc_dev->lock);
>
> + if (async > acc_async_sync)
> + acc_dev->openacc.async_set_async_func (async);
> +
> tgt = gomp_map_vars (acc_dev, mapnum, &hostaddrs, NULL, &s, &kinds, true,
> GOMP_MAP_VARS_OPENACC);
> /* Initialize dynamic refcount. */
> tgt->list[0].key->dynamic_refcount = 1;
>
> + if (async > acc_async_sync)
> + acc_dev->openacc.async_set_async_func (acc_async_sync);
> +
> gomp_mutex_lock (&acc_dev->lock);
>
> d = tgt->to_free;
> @@ -510,19 +535,32 @@ static void *
> void *
> acc_create (void *h, size_t s)
> {
> - return present_create_copy (FLAG_PRESENT | FLAG_CREATE, h, s);
> + return present_create_copy (FLAG_PRESENT | FLAG_CREATE, h, s, acc_async_sync);
> }
>
> +void
> +acc_create_async (void *h, size_t s, int async)
> +{
> + present_create_copy (FLAG_PRESENT | FLAG_CREATE, h, s, async);
> +}
> +
> void *
> acc_copyin (void *h, size_t s)
> {
> - return present_create_copy (FLAG_PRESENT | FLAG_CREATE | FLAG_COPY, h, s);
> + return present_create_copy (FLAG_PRESENT | FLAG_CREATE | FLAG_COPY, h, s,
> + acc_async_sync);
> }
>
> +void
> +acc_copyin_async (void *h, size_t s, int async)
> +{
> + present_create_copy (FLAG_PRESENT | FLAG_CREATE | FLAG_COPY, h, s, async);
> +}
> +
> void *
> acc_present_or_create (void *h, size_t s)
> {
> - return present_create_copy (FLAG_PRESENT | FLAG_CREATE, h, s);
> + return present_create_copy (FLAG_PRESENT | FLAG_CREATE, h, s, acc_async_sync);
> }
>
> /* acc_pcreate is acc_present_or_create by a different name. */
> @@ -539,7 +577,8 @@ acc_pcreate (void *h, size_t s)
> void *
> acc_present_or_copyin (void *h, size_t s)
> {
> - return present_create_copy (FLAG_PRESENT | FLAG_CREATE | FLAG_COPY, h, s);
> + return present_create_copy (FLAG_PRESENT | FLAG_CREATE | FLAG_COPY, h, s,
> + acc_async_sync);
> }
>
> /* acc_pcopyin is acc_present_or_copyin by a different name. */
> @@ -557,7 +596,7 @@ acc_pcopyin (void *h, size_t s)
> #define FLAG_FINALIZE (1 << 1)
>
> static void
> -delete_copyout (unsigned f, void *h, size_t s, const char *libfnname)
> +delete_copyout (unsigned f, void *h, size_t s, int async, const char *libfnname)
Ditto.
> {
> size_t host_size;
> splay_tree_key n;
> @@ -633,7 +672,13 @@ static void
> }
>
> if (f & FLAG_COPYOUT)
> - acc_dev->dev2host_func (acc_dev->target_id, h, d, s);
> + {
> + if (async > acc_async_sync)
> + acc_dev->openacc.async_set_async_func (async);
> + acc_dev->dev2host_func (acc_dev->target_id, h, d, s);
> + if (async > acc_async_sync)
> + acc_dev->openacc.async_set_async_func (acc_async_sync);
> + }
>
> gomp_remove_var (acc_dev, n);
> }
> @@ -644,41 +689,54 @@ static void
> void
> acc_delete (void *h , size_t s)
> {
> - delete_copyout (0, h, s, __FUNCTION__);
> + delete_copyout (0, h, s, acc_async_sync, __FUNCTION__);
> }
>
> void
> +acc_delete_async (void *h , size_t s, int async)
> +{
> + delete_copyout (0, h, s, async, __FUNCTION__);
> +}
> +
> +void
> acc_delete_finalize (void *h , size_t s)
> {
> - delete_copyout (FLAG_FINALIZE, h, s, __FUNCTION__);
> + delete_copyout (FLAG_FINALIZE, h, s, acc_async_sync, __FUNCTION__);
> }
>
> void
> acc_delete_finalize_async (void *h , size_t s, int async)
> {
> - delete_copyout (FLAG_FINALIZE, h, s, __FUNCTION__);
> + delete_copyout (FLAG_FINALIZE, h, s, async, __FUNCTION__);
> }
>
> void
> acc_copyout (void *h, size_t s)
> {
> - delete_copyout (FLAG_COPYOUT, h, s, __FUNCTION__);
> + delete_copyout (FLAG_COPYOUT, h, s, acc_async_sync, __FUNCTION__);
> }
>
> void
> +acc_copyout_async (void *h, size_t s, int async)
> +{
> + delete_copyout (FLAG_COPYOUT, h, s, async, __FUNCTION__);
> +}
> +
> +void
> acc_copyout_finalize (void *h, size_t s)
> {
> - delete_copyout (FLAG_COPYOUT | FLAG_FINALIZE, h, s, __FUNCTION__);
> + delete_copyout (FLAG_COPYOUT | FLAG_FINALIZE, h, s, acc_async_sync,
> + __FUNCTION__);
> }
>
> void
> acc_copyout_finalize_async (void *h, size_t s, int async)
> {
> - delete_copyout (FLAG_COPYOUT | FLAG_FINALIZE, h, s, __FUNCTION__);
> + delete_copyout (FLAG_COPYOUT | FLAG_FINALIZE, h, s, async, __FUNCTION__);
> }
>
> static void
> -update_dev_host (int is_dev, void *h, size_t s)
> +update_dev_host (int is_dev, void *h, size_t s, int async)
> {
> splay_tree_key n;
> void *d;
> @@ -704,11 +762,17 @@ static void
> d = (void *) (n->tgt->tgt_start + n->tgt_offset
> + (uintptr_t) h - n->host_start);
>
> + if (async > acc_async_sync)
> + acc_dev->openacc.async_set_async_func (async);
> +
> if (is_dev)
> acc_dev->host2dev_func (acc_dev->target_id, d, h, s);
> else
> acc_dev->dev2host_func (acc_dev->target_id, h, d, s);
>
> + if (async > acc_async_sync)
> + acc_dev->openacc.async_set_async_func (acc_async_sync);
> +
> gomp_mutex_unlock (&acc_dev->lock);
> }
>
> @@ -715,16 +779,28 @@ static void
> void
> acc_update_device (void *h, size_t s)
> {
> - update_dev_host (1, h, s);
> + update_dev_host (1, h, s, acc_async_sync);
> }
>
> void
> +acc_update_device_async (void *h, size_t s, int async)
> +{
> + update_dev_host (1, h, s, async);
> +}
> +
> +void
> acc_update_self (void *h, size_t s)
> {
> - update_dev_host (0, h, s);
> + update_dev_host (0, h, s, acc_async_sync);
> }
>
> void
> +acc_update_self_async (void *h, size_t s, int async)
> +{
> + update_dev_host (0, h, s, async);
> +}
> +
> +void
> gomp_acc_insert_pointer (size_t mapnum, void **hostaddrs, size_t *sizes,
> void *kinds)
> {
> Index: libgomp/openacc.f90
Given that this includes Fortran changes, you should have copied the
fortran mailing list. Those changes look fairly mechanical and
reasonable though.
> ===================================================================
> --- libgomp/openacc.f90 (revision 264192)
> +++ libgomp/openacc.f90 (working copy)
> @@ -332,6 +332,150 @@ module openacc_internal
> logical acc_is_present_array_h
> type (*), dimension (..), contiguous :: a
> end function
> +
> + subroutine acc_copyin_async_32_h (a, len, async)
> + use iso_c_binding, only: c_int32_t
> + use openacc_kinds, only: acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int32_t) len
> + integer (acc_handle_kind) async
> + end subroutine
> +
> + subroutine acc_copyin_async_64_h (a, len, async)
> + use iso_c_binding, only: c_int64_t
> + use openacc_kinds, only: acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int64_t) len
> + integer (acc_handle_kind) async
> + end subroutine
> +
> + subroutine acc_copyin_async_array_h (a, async)
> + use openacc_kinds, only: acc_handle_kind
> + type (*), dimension (..), contiguous :: a
> + integer (acc_handle_kind) async
> + end subroutine
> +
> + subroutine acc_create_async_32_h (a, len, async)
> + use iso_c_binding, only: c_int32_t
> + use openacc_kinds, only: acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int32_t) len
> + integer (acc_handle_kind) async
> + end subroutine
> +
> + subroutine acc_create_async_64_h (a, len, async)
> + use iso_c_binding, only: c_int64_t
> + use openacc_kinds, only: acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int64_t) len
> + integer (acc_handle_kind) async
> + end subroutine
> +
> + subroutine acc_create_async_array_h (a, async)
> + use openacc_kinds, only: acc_handle_kind
> + type (*), dimension (..), contiguous :: a
> + integer (acc_handle_kind) async
> + end subroutine
> +
> + subroutine acc_copyout_async_32_h (a, len, async)
> + use iso_c_binding, only: c_int32_t
> + use openacc_kinds, only: acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int32_t) len
> + integer (acc_handle_kind) async
> + end subroutine
> +
> + subroutine acc_copyout_async_64_h (a, len, async)
> + use iso_c_binding, only: c_int64_t
> + use openacc_kinds, only: acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int64_t) len
> + integer (acc_handle_kind) async
> + end subroutine
> +
> + subroutine acc_copyout_async_array_h (a, async)
> + use openacc_kinds, only: acc_handle_kind
> + type (*), dimension (..), contiguous :: a
> + integer (acc_handle_kind) async
> + end subroutine
> +
> + subroutine acc_delete_async_32_h (a, len, async)
> + use iso_c_binding, only: c_int32_t
> + use openacc_kinds, only: acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int32_t) len
> + integer (acc_handle_kind) async
> + end subroutine
> +
> + subroutine acc_delete_async_64_h (a, len, async)
> + use iso_c_binding, only: c_int64_t
> + use openacc_kinds, only: acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int64_t) len
> + integer (acc_handle_kind) async
> + end subroutine
> +
> + subroutine acc_delete_async_array_h (a, async)
> + use openacc_kinds, only: acc_handle_kind
> + type (*), dimension (..), contiguous :: a
> + integer (acc_handle_kind) async
> + end subroutine
> +
> + subroutine acc_update_device_async_32_h (a, len, async)
> + use iso_c_binding, only: c_int32_t
> + use openacc_kinds, only: acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int32_t) len
> + integer (acc_handle_kind) async
> + end subroutine
> +
> + subroutine acc_update_device_async_64_h (a, len, async)
> + use iso_c_binding, only: c_int64_t
> + use openacc_kinds, only: acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int64_t) len
> + integer (acc_handle_kind) async
> + end subroutine
> +
> + subroutine acc_update_device_async_array_h (a, async)
> + use openacc_kinds, only: acc_handle_kind
> + type (*), dimension (..), contiguous :: a
> + integer (acc_handle_kind) async
> + end subroutine
> +
> + subroutine acc_update_self_async_32_h (a, len, async)
> + use iso_c_binding, only: c_int32_t
> + use openacc_kinds, only: acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int32_t) len
> + integer (acc_handle_kind) async
> + end subroutine
> +
> + subroutine acc_update_self_async_64_h (a, len, async)
> + use iso_c_binding, only: c_int64_t
> + use openacc_kinds, only: acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int64_t) len
> + integer (acc_handle_kind) async
> + end subroutine
> +
> + subroutine acc_update_self_async_array_h (a, async)
> + use openacc_kinds, only: acc_handle_kind
> + type (*), dimension (..), contiguous :: a
> + integer (acc_handle_kind) async
> + end subroutine
> end interface
>
> interface
> @@ -510,6 +654,60 @@ module openacc_internal
> type (*), dimension (*) :: a
> integer (c_size_t), value :: len
> end function
> +
> + subroutine acc_copyin_async_l (a, len, async) &
> + bind (C, name = "acc_copyin_async")
> + use iso_c_binding, only: c_size_t, c_int
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_size_t), value :: len
> + integer (c_int), value :: async
> + end subroutine
> +
> + subroutine acc_create_async_l (a, len, async) &
> + bind (C, name = "acc_create_async")
> + use iso_c_binding, only: c_size_t, c_int
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_size_t), value :: len
> + integer (c_int), value :: async
> + end subroutine
> +
> + subroutine acc_copyout_async_l (a, len, async) &
> + bind (C, name = "acc_copyout_async")
> + use iso_c_binding, only: c_size_t, c_int
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_size_t), value :: len
> + integer (c_int), value :: async
> + end subroutine
> +
> + subroutine acc_delete_async_l (a, len, async) &
> + bind (C, name = "acc_delete_async")
> + use iso_c_binding, only: c_size_t, c_int
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_size_t), value :: len
> + integer (c_int), value :: async
> + end subroutine
> +
> + subroutine acc_update_device_async_l (a, len, async) &
> + bind (C, name = "acc_update_device_async")
> + use iso_c_binding, only: c_size_t, c_int
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_size_t), value :: len
> + integer (c_int), value :: async
> + end subroutine
> +
> + subroutine acc_update_self_async_l (a, len, async) &
> + bind (C, name = "acc_update_self_async")
> + use iso_c_binding, only: c_size_t, c_int
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_size_t), value :: len
> + integer (c_int), value :: async
> + end subroutine
> end interface
> end module
>
> @@ -529,6 +727,8 @@ module openacc
> public :: acc_copyin, acc_present_or_copyin, acc_pcopyin, acc_create
> public :: acc_present_or_create, acc_pcreate, acc_copyout, acc_delete
> public :: acc_update_device, acc_update_self, acc_is_present
> + public :: acc_copyin_async, acc_create_async, acc_copyout_async
> + public :: acc_delete_async, acc_update_device_async, acc_update_self_async
>
> integer, parameter :: openacc_version = 201306
>
> @@ -694,6 +894,42 @@ module openacc
> ! acc_memcpy_to_device: Only available in C/C++
> ! acc_memcpy_from_device: Only available in C/C++
>
> + interface acc_copyin_async
> + procedure :: acc_copyin_async_32_h
> + procedure :: acc_copyin_async_64_h
> + procedure :: acc_copyin_async_array_h
> + end interface
> +
> + interface acc_create_async
> + procedure :: acc_create_async_32_h
> + procedure :: acc_create_async_64_h
> + procedure :: acc_create_async_array_h
> + end interface
> +
> + interface acc_copyout_async
> + procedure :: acc_copyout_async_32_h
> + procedure :: acc_copyout_async_64_h
> + procedure :: acc_copyout_async_array_h
> + end interface
> +
> + interface acc_delete_async
> + procedure :: acc_delete_async_32_h
> + procedure :: acc_delete_async_64_h
> + procedure :: acc_delete_async_array_h
> + end interface
> +
> + interface acc_update_device_async
> + procedure :: acc_update_device_async_32_h
> + procedure :: acc_update_device_async_64_h
> + procedure :: acc_update_device_async_array_h
> + end interface
> +
> + interface acc_update_self_async
> + procedure :: acc_update_self_async_32_h
> + procedure :: acc_update_self_async_64_h
> + procedure :: acc_update_self_async_array_h
> + end interface
> +
> end module
>
> function acc_get_num_devices_h (d)
> @@ -1078,3 +1314,189 @@ function acc_is_present_array_h (a)
> type (*), dimension (..), contiguous :: a
> acc_is_present_array_h = acc_is_present_l (a, sizeof (a)) == 1
> end function
> +
> +subroutine acc_copyin_async_32_h (a, len, async)
> + use iso_c_binding, only: c_int32_t, c_size_t, c_int
> + use openacc_internal, only: acc_copyin_async_l
> + use openacc_kinds, only: acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int32_t) len
> + integer (acc_handle_kind) async
> + call acc_copyin_async_l (a, int (len, kind = c_size_t), int (async, kind = c_int))
> +end subroutine
> +
> +subroutine acc_copyin_async_64_h (a, len, async)
> + use iso_c_binding, only: c_int64_t, c_size_t, c_int
> + use openacc_internal, only: acc_copyin_async_l
> + use openacc_kinds, only: acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int64_t) len
> + integer (acc_handle_kind) async
> + call acc_copyin_async_l (a, int (len, kind = c_size_t), int (async, kind = c_int))
> +end subroutine
> +
> +subroutine acc_copyin_async_array_h (a, async)
> + use iso_c_binding, only: c_int
> + use openacc_internal, only: acc_copyin_async_l
> + use openacc_kinds, only: acc_handle_kind
> + type (*), dimension (..), contiguous :: a
> + integer (acc_handle_kind) async
> + call acc_copyin_async_l (a, sizeof (a), int (async, kind = c_int))
> +end subroutine
> +
> +subroutine acc_create_async_32_h (a, len, async)
> + use iso_c_binding, only: c_int32_t, c_size_t, c_int
> + use openacc_internal, only: acc_create_async_l
> + use openacc_kinds, only: acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int32_t) len
> + integer (acc_handle_kind) async
> + call acc_create_async_l (a, int (len, kind = c_size_t), int (async, kind = c_int))
> +end subroutine
> +
> +subroutine acc_create_async_64_h (a, len, async)
> + use iso_c_binding, only: c_int64_t, c_size_t, c_int
> + use openacc_internal, only: acc_create_async_l
> + use openacc_kinds, only: acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int64_t) len
> + integer (acc_handle_kind) async
> + call acc_create_async_l (a, int (len, kind = c_size_t), int (async, kind = c_int))
> +end subroutine
> +
> +subroutine acc_create_async_array_h (a, async)
> + use iso_c_binding, only: c_int
> + use openacc_internal, only: acc_create_async_l
> + use openacc_kinds, only: acc_handle_kind
> + type (*), dimension (..), contiguous :: a
> + integer (acc_handle_kind) async
> + call acc_create_async_l (a, sizeof (a), int (async, kind = c_int))
> +end subroutine
> +
> +subroutine acc_copyout_async_32_h (a, len, async)
> + use iso_c_binding, only: c_int32_t, c_size_t, c_int
> + use openacc_internal, only: acc_copyout_async_l
> + use openacc_kinds, only: acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int32_t) len
> + integer (acc_handle_kind) async
> + call acc_copyout_async_l (a, int (len, kind = c_size_t), int (async, kind = c_int))
> +end subroutine
> +
> +subroutine acc_copyout_async_64_h (a, len, async)
> + use iso_c_binding, only: c_int64_t, c_size_t, c_int
> + use openacc_internal, only: acc_copyout_async_l
> + use openacc_kinds, only: acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int64_t) len
> + integer (acc_handle_kind) async
> + call acc_copyout_async_l (a, int (len, kind = c_size_t), int (async, kind = c_int))
> +end subroutine
> +
> +subroutine acc_copyout_async_array_h (a, async)
> + use iso_c_binding, only: c_int
> + use openacc_internal, only: acc_copyout_async_l
> + use openacc_kinds, only: acc_handle_kind
> + type (*), dimension (..), contiguous :: a
> + integer (acc_handle_kind) async
> + call acc_copyout_async_l (a, sizeof (a), int (async, kind = c_int))
> +end subroutine
> +
> +subroutine acc_delete_async_32_h (a, len, async)
> + use iso_c_binding, only: c_int32_t, c_size_t, c_int
> + use openacc_internal, only: acc_delete_async_l
> + use openacc_kinds, only: acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int32_t) len
> + integer (acc_handle_kind) async
> + call acc_delete_async_l (a, int (len, kind = c_size_t), int (async, kind = c_int))
> +end subroutine
> +
> +subroutine acc_delete_async_64_h (a, len, async)
> + use iso_c_binding, only: c_int64_t, c_size_t, c_int
> + use openacc_internal, only: acc_delete_async_l
> + use openacc_kinds, only: acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int64_t) len
> + integer (acc_handle_kind) async
> + call acc_delete_async_l (a, int (len, kind = c_size_t), int (async, kind = c_int))
> +end subroutine
> +
> +subroutine acc_delete_async_array_h (a, async)
> + use iso_c_binding, only: c_int
> + use openacc_internal, only: acc_delete_async_l
> + use openacc_kinds, only: acc_handle_kind
> + type (*), dimension (..), contiguous :: a
> + integer (acc_handle_kind) async
> + call acc_delete_async_l (a, sizeof (a), int (async, kind = c_int))
> +end subroutine
> +
> +subroutine acc_update_device_async_32_h (a, len, async)
> + use iso_c_binding, only: c_int32_t, c_size_t, c_int
> + use openacc_internal, only: acc_update_device_async_l
> + use openacc_kinds, only: acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int32_t) len
> + integer (acc_handle_kind) async
> + call acc_update_device_async_l (a, int (len, kind = c_size_t), int (async, kind = c_int))
> +end subroutine
> +
> +subroutine acc_update_device_async_64_h (a, len, async)
> + use iso_c_binding, only: c_int64_t, c_size_t, c_int
> + use openacc_internal, only: acc_update_device_async_l
> + use openacc_kinds, only: acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int64_t) len
> + integer (acc_handle_kind) async
> + call acc_update_device_async_l (a, int (len, kind = c_size_t), int (async, kind = c_int))
> +end subroutine
> +
> +subroutine acc_update_device_async_array_h (a, async)
> + use iso_c_binding, only: c_int
> + use openacc_internal, only: acc_update_device_async_l
> + use openacc_kinds, only: acc_handle_kind
> + type (*), dimension (..), contiguous :: a
> + integer (acc_handle_kind) async
> + call acc_update_device_async_l (a, sizeof (a), int (async, kind = c_int))
> +end subroutine
> +
> +subroutine acc_update_self_async_32_h (a, len, async)
> + use iso_c_binding, only: c_int32_t, c_size_t, c_int
> + use openacc_internal, only: acc_update_self_async_l
> + use openacc_kinds, only: acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int32_t) len
> + integer (acc_handle_kind) async
> + call acc_update_self_async_l (a, int (len, kind = c_size_t), int (async, kind = c_int))
> +end subroutine
> +
> +subroutine acc_update_self_async_64_h (a, len, async)
> + use iso_c_binding, only: c_int64_t, c_size_t, c_int
> + use openacc_internal, only: acc_update_self_async_l
> + use openacc_kinds, only: acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int64_t) len
> + integer (acc_handle_kind) async
> + call acc_update_self_async_l (a, int (len, kind = c_size_t), int (async, kind = c_int))
> +end subroutine
> +
> +subroutine acc_update_self_async_array_h (a, async)
> + use iso_c_binding, only: c_int
> + use openacc_internal, only: acc_update_self_async_l
> + use openacc_kinds, only: acc_handle_kind
> + type (*), dimension (..), contiguous :: a
> + integer (acc_handle_kind) async
> + call acc_update_self_async_l (a, sizeof (a), int (async, kind = c_int))
> +end subroutine
> Index: libgomp/openacc.h
> ===================================================================
> --- libgomp/openacc.h (revision 264192)
> +++ libgomp/openacc.h (working copy)
> @@ -115,6 +115,16 @@ void acc_copyout_finalize_async (void *, size_t, i
> void acc_delete_finalize (void *, size_t) __GOACC_NOTHROW;
> void acc_delete_finalize_async (void *, size_t, int) __GOACC_NOTHROW;
>
> +/* Async functions, specified in OpenACC 2.5. */
> +void acc_copyin_async (void *, size_t, int) __GOACC_NOTHROW;
> +void acc_create_async (void *, size_t, int) __GOACC_NOTHROW;
> +void acc_copyout_async (void *, size_t, int) __GOACC_NOTHROW;
> +void acc_delete_async (void *, size_t, int) __GOACC_NOTHROW;
> +void acc_update_device_async (void *, size_t, int) __GOACC_NOTHROW;
> +void acc_update_self_async (void *, size_t, int) __GOACC_NOTHROW;
> +void acc_memcpy_to_device_async (void *, void *, size_t, int) __GOACC_NOTHROW;
> +void acc_memcpy_from_device_async (void *, void *, size_t, int) __GOACC_NOTHROW;
> +
> /* CUDA-specific routines. */
> void *acc_get_current_cuda_device (void) __GOACC_NOTHROW;
> void *acc_get_current_cuda_context (void) __GOACC_NOTHROW;
> Index: libgomp/openacc_lib.h
I don't see a test case for this. I believe that openacc_lib.h is used
by fixed-mode Fortran programs (those that end in a .f). Can you add a
fixed-mode version of lib-16.f90?
> ===================================================================
> --- libgomp/openacc_lib.h (revision 264192)
> +++ libgomp/openacc_lib.h (working copy)
> @@ -403,3 +403,159 @@
>
> ! acc_memcpy_to_device: Only available in C/C++
> ! acc_memcpy_from_device: Only available in C/C++
> +
> + interface acc_copyin_async
> + subroutine acc_copyin_async_32_h (a, len, async)
> + use iso_c_binding, only: c_int32_t
> + import acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int32_t) len
> + integer (acc_handle_kind) async
> + end subroutine
> +
> + subroutine acc_copyin_async_64_h (a, len, async)
> + use iso_c_binding, only: c_int64_t
> + import acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int64_t) len
> + integer (acc_handle_kind) async
> + end subroutine
> +
> + subroutine acc_copyin_async_array_h (a, async_)
> + import acc_handle_kind
> + type (*), dimension (..), contiguous :: a
> + integer (acc_handle_kind) async_
> + end subroutine
> + end interface
> +
> + interface acc_create_async
> + subroutine acc_create_async_32_h (a, len, async)
> + use iso_c_binding, only: c_int32_t
> + import acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int32_t) len
> + integer (acc_handle_kind) async
> + end subroutine
> +
> + subroutine acc_create_async_64_h (a, len, async)
> + use iso_c_binding, only: c_int64_t
> + import acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int64_t) len
> + integer (acc_handle_kind) async
> + end subroutine
> +
> + subroutine acc_create_async_array_h (a, async_)
> + import acc_handle_kind
> + type (*), dimension (..), contiguous :: a
> + integer (acc_handle_kind) async_
> + end subroutine
> + end interface
> +
> + interface acc_copyout_async
> + subroutine acc_copyout_async_32_h (a, len, async)
> + use iso_c_binding, only: c_int32_t
> + import acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int32_t) len
> + integer (acc_handle_kind) async
> + end subroutine
> +
> + subroutine acc_copyout_async_64_h (a, len, async)
> + use iso_c_binding, only: c_int64_t
> + import acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int64_t) len
> + integer (acc_handle_kind) async
> + end subroutine
> +
> + subroutine acc_copyout_async_array_h (a, async_)
> + import acc_handle_kind
> + type (*), dimension (..), contiguous :: a
> + integer (acc_handle_kind) async_
> + end subroutine
> + end interface
> +
> + interface acc_delete_async
> + subroutine acc_delete_async_32_h (a, len, async)
> + use iso_c_binding, only: c_int32_t
> + import acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int32_t) len
> + integer (acc_handle_kind) async
> + end subroutine
> +
> + subroutine acc_delete_async_64_h (a, len, async)
> + use iso_c_binding, only: c_int64_t
> + import acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int64_t) len
> + integer (acc_handle_kind) async
> + end subroutine
> +
> + subroutine acc_delete_async_array_h (a, async_)
> + import acc_handle_kind
> + type (*), dimension (..), contiguous :: a
> + integer (acc_handle_kind) async_
> + end subroutine
> + end interface
> +
> + interface acc_update_device_async
> + subroutine acc_update_device_async_32_h (a, len, async)
> + use iso_c_binding, only: c_int32_t
> + import acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int32_t) len
> + integer (acc_handle_kind) async
> + end subroutine
> +
> + subroutine acc_update_device_async_64_h (a, len, async)
> + use iso_c_binding, only: c_int64_t
> + import acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int64_t) len
> + integer (acc_handle_kind) async
> + end subroutine
> +
> + subroutine acc_update_device_async_array_h (a, async_)
> + import acc_handle_kind
> + type (*), dimension (..), contiguous :: a
> + integer (acc_handle_kind) async_
> + end subroutine
> + end interface
> +
> + interface acc_update_self_async
> + subroutine acc_update_self_async_32_h (a, len, async)
> + use iso_c_binding, only: c_int32_t
> + import acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int32_t) len
> + integer (acc_handle_kind) async
> + end subroutine
> +
> + subroutine acc_update_self_async_64_h (a, len, async)
> + use iso_c_binding, only: c_int64_t
> + import acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int64_t) len
> + integer (acc_handle_kind) async
> + end subroutine
> +
> + subroutine acc_update_self_async_array_h (a, async_)
> + import acc_handle_kind
> + type (*), dimension (..), contiguous :: a
> + integer (acc_handle_kind) async_
> + end subroutine
> + end interface
> Index: libgomp/testsuite/libgomp.oacc-c-c++-common/lib-94.c
> ===================================================================
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-94.c (nonexistent)
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/lib-94.c (working copy)
> @@ -0,0 +1,42 @@
> +/* { dg-do run } */
> +/* { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } } */
> +
> +#include <string.h>
> +#include <stdlib.h>
> +#include <openacc.h>
> +
> +int
> +main (int argc, char **argv)
> +{
> + const int N = 256;
> + int i;
> + int async = 8;
> + unsigned char *h;
> +
> + h = (unsigned char *) malloc (N);
> +
> + for (i = 0; i < N; i++)
> + {
> + h[i] = i;
> + }
> +
> + acc_copyin_async (h, N, async);
> +
> + memset (h, 0, N);
> +
> + acc_wait (async);
> +
> + acc_copyout_async (h, N, async + 1);
> +
> + acc_wait (async + 1);
> +
> + for (i = 0; i < N; i++)
> + {
> + if (h[i] != i)
> + abort ();
> + }
> +
> + free (h);
> +
> + return 0;
> +}
> Index: libgomp/testsuite/libgomp.oacc-c-c++-common/lib-95.c
> ===================================================================
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-95.c (nonexistent)
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/lib-95.c (working copy)
> @@ -0,0 +1,45 @@
> +/* { dg-do run } */
> +/* { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } } */
> +
> +#include <string.h>
> +#include <stdlib.h>
> +#include <openacc.h>
> +
> +int
> +main (int argc, char **argv)
> +{
> + const int N = 256;
> + int i, q = 5;
> + unsigned char *h, *g;
> + void *d;
> +
> + h = (unsigned char *) malloc (N);
> + g = (unsigned char *) malloc (N);
> + for (i = 0; i < N; i++)
> + {
> + g[i] = i;
> + }
> +
> + acc_create_async (h, N, q);
> +
> + acc_memcpy_to_device_async (acc_deviceptr (h), g, N, q);
> + memset (&h[0], 0, N);
> +
> + acc_wait (q);
> +
> + acc_update_self_async (h, N, q + 1);
> + acc_delete_async (h, N, q + 1);
> +
> + acc_wait (q + 1);
> +
> + for (i = 0; i < N; i++)
> + {
> + if (h[i] != i)
> + abort ();
> + }
> +
> + free (h);
> + free (g);
> +
> + return 0;
> +}
> Index: libgomp/testsuite/libgomp.oacc-fortran/lib-16.f90
> ===================================================================
> --- libgomp/testsuite/libgomp.oacc-fortran/lib-16.f90 (nonexistent)
> +++ libgomp/testsuite/libgomp.oacc-fortran/lib-16.f90 (working copy)
> @@ -0,0 +1,57 @@
> +! { dg-do run }
> +! { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } }
> +
> +program main
> + use openacc
> + implicit none
> +
> + integer, parameter :: N = 256
> + integer, allocatable :: h(:)
> + integer :: i
> + integer :: async = 5
> +
> + allocate (h(N))
> +
> + do i = 1, N
> + h(i) = i
> + end do
> +
> + call acc_copyin (h)
> +
> + do i = 1, N
> + h(i) = i + i
> + end do
> +
> + call acc_update_device_async (h, sizeof (h), async)
> +
> + if (acc_is_present (h) .neqv. .TRUE.) call abort
> +
> + h(:) = 0
> +
> + call acc_copyout_async (h, sizeof (h), async)
> +
> + call acc_wait (async)
> +
> + do i = 1, N
> + if (h(i) /= i + i) call abort
> + end do
> +
> + call acc_copyin (h, sizeof (h))
> +
> + h(:) = 0
> +
> + call acc_update_self_async (h, sizeof (h), async)
> +
> + if (acc_is_present (h) .neqv. .TRUE.) call abort
> +
> + do i = 1, N
> + if (h(i) /= i + i) call abort
> + end do
> +
> + call acc_delete_async (h, async)
> +
> + call acc_wait (async)
> +
> + if (acc_is_present (h) .neqv. .FALSE.) call abort
> +
> +end program
>
While I can't approve this patch, it seems reasonable to me. I like how
you cleaned up things from OG8 (e.g., replacing return (n ? 1 : 0) with
return n != NULL'). Are there any other OG8 async patches in your queue?
Thanks,
Cesar
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH, OpenACC 2.5, libgomp] Add *_async versions of runtime library API functions
2018-09-10 17:22 ` Cesar Philippidis
@ 2018-09-11 10:35 ` Chung-Lin Tang
2018-11-30 20:42 ` Thomas Schwinge
1 sibling, 0 replies; 7+ messages in thread
From: Chung-Lin Tang @ 2018-09-11 10:35 UTC (permalink / raw)
To: Cesar Philippidis, cltang, gcc-patches, Jakub Jelinek
On 2018/9/11 1:22 AM, Cesar Philippidis wrote:
> On 09/10/2018 08:04 AM, Chung-Lin Tang wrote:
>
>> GOACC_2.0 {
>> Index: libgomp/oacc-mem.c
>> ===================================================================
>> --- libgomp/oacc-mem.c (revision 264192)
>> +++ libgomp/oacc-mem.c (working copy)
>> @@ -153,8 +153,9 @@ acc_free (void *d)
>> gomp_fatal ("error in freeing device memory in %s", __FUNCTION__);
>> }
>>
>> -void
>> -acc_memcpy_to_device (void *d, void *h, size_t s)
>> +static void
>> +memcpy_tofrom_device (bool from, void *d, void *h, size_t s, int async,
>> + const char *libfnname)
>
> This showed up oddly in the diff, but memcpy_tofrom_device is a new
> internal function that's not part of the public API. It's nice that you
> were able to merge the to/from functions together. I think this is safe
> in terms of backwards compatibility.
Yes, this patch only adds new API functions, though some internal refactoring was done.
No compatibility was compromised, if that wasn't apparent.
> While I can't approve this patch, it seems reasonable to me. I like how
> you cleaned up things from OG8 (e.g., replacing return (n ? 1 : 0) with
> return n != NULL'). Are there any other OG8 async patches in your queue?
Yeah, there are more, this is just a smaller, independent part of the whole set of changes,
as you also probably know.
Thanks,
Chung-Lin
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH, OpenACC 2.5, libgomp] Add *_async versions of runtime library API functions
2018-09-10 15:04 [PATCH, OpenACC 2.5, libgomp] Add *_async versions of runtime library API functions Chung-Lin Tang
2018-09-10 17:22 ` Cesar Philippidis
@ 2018-11-01 22:01 ` Thomas Schwinge
2021-06-08 17:32 ` Thomas Schwinge
2 siblings, 0 replies; 7+ messages in thread
From: Thomas Schwinge @ 2018-11-01 22:01 UTC (permalink / raw)
To: Chung-Lin Tang; +Cc: gcc-patches, Jakub Jelinek, Cesar Philippidis
Hi Chung-Lin!
On Mon, 10 Sep 2018 23:04:18 +0800, Chung-Lin Tang <chunglin_tang@mentor.com> wrote:
> This patch adds *_async versions of several OpenACC runtime library API functions,
> which is to allow execution of a function asynchronously on particular async
> stream, an addition to the standard since 2.5. Specifically, these functions:
>
> acc_copyin_async
> acc_copyout_async
> acc_copyout_finalize_async
> acc_create_async
> acc_delete_async
> acc_delete_finalize_async
> acc_memcpy_from_device_async
> acc_memcpy_to_device_async
> acc_update_device_async
> acc_update_self_async
>
> which have an additional 'int async' argument in additional from the non-async version.
>
> libgomp tested with offloading with no regressions, is this okay for trunk?
Thanks, approved. To record the review effort, please include
"Reviewed-by: Thomas Schwinge <thomas@codesourcery.com>" in the commit
log, see <https://gcc.gnu.org/wiki/Reviewed-by>.
Grüße
Thomas
> 2018-09-10 Chung-Lin Tang <cltang@codesourcery.com>
>
> libgomp/
> * oacc-mem.c (memcpy_tofrom_device): New function, combined from
> acc_memcpy_to/from_device functions, now with async parameter.
> (acc_memcpy_to_device): Modify to use memcpy_tofrom_device.
> (acc_memcpy_from_device): Likewise.
> (acc_memcpy_to_device_async): New API function.
> (acc_memcpy_from_device_async): Likewise.
> (present_create_copy): Add async parameter and async setting/unsetting.
> (acc_create): Adjust present_create_copy call.
> (acc_copyin): Likewise.
> (acc_present_or_create): Likewise.
> (acc_present_or_copyin): Likewise.
> (acc_create_async): New API function.
> (acc_copyin_async): New API function.
> (delete_copyout): Add async parameter and async setting/unsetting.
> (acc_delete): Adjust delete_copyout call.
> (acc_copyout): Likewise.
> (acc_delete_async): New API function.
> (acc_copyout_async): Likewise.
> (update_dev_host): Add async parameter and async setting/unsetting.
> (acc_update_device): Adjust update_dev_host call.
> (acc_update_self): Likewise.
> (acc_update_device_async): New API function.
> (acc_update_self_async): Likewise.
> * openacc.h (acc_copyin_async): Declare new API function.
> (acc_create_async): Likewise.
> (acc_copyout_async): Likewise.
> (acc_delete_async): Likewise.
> (acc_update_device_async): Likewise.
> (acc_update_self_async): Likewise.
> (acc_memcpy_to_device_async): Likewise.
> (acc_memcpy_from_device_async): Likewise.
> * openacc_lib.h (acc_copyin_async_32_h): New subroutine.
> (acc_copyin_async_64_h): New subroutine.
> (acc_copyin_async_array_h): New subroutine.
> (acc_create_async_32_h): New subroutine.
> (acc_create_async_64_h): New subroutine.
> (acc_create_async_array_h): New subroutine.
> (acc_copyout_async_32_h): New subroutine.
> (acc_copyout_async_64_h): New subroutine.
> (acc_copyout_async_array_h): New subroutine.
> (acc_delete_async_32_h): New subroutine.
> (acc_delete_async_64_h): New subroutine.
> (acc_delete_async_array_h): New subroutine.
> (acc_update_device_async_32_h): New subroutine.
> (acc_update_device_async_64_h): New subroutine.
> (acc_update_device_async_array_h): New subroutine.
> (acc_update_self_async_32_h): New subroutine.
> (acc_update_self_async_64_h): New subroutine.
> (acc_update_self_async_array_h): New subroutine.
> * openacc.f90 (acc_copyin_async_32_h): New subroutine.
> (acc_copyin_async_64_h): New subroutine.
> (acc_copyin_async_array_h): New subroutine.
> (acc_create_async_32_h): New subroutine.
> (acc_create_async_64_h): New subroutine.
> (acc_create_async_array_h): New subroutine.
> (acc_copyout_async_32_h): New subroutine.
> (acc_copyout_async_64_h): New subroutine.
> (acc_copyout_async_array_h): New subroutine.
> (acc_delete_async_32_h): New subroutine.
> (acc_delete_async_64_h): New subroutine.
> (acc_delete_async_array_h): New subroutine.
> (acc_update_device_async_32_h): New subroutine.
> (acc_update_device_async_64_h): New subroutine.
> (acc_update_device_async_array_h): New subroutine.
> (acc_update_self_async_32_h): New subroutine.
> (acc_update_self_async_64_h): New subroutine.
> (acc_update_self_async_array_h): New subroutine.
> * libgomp.map (OACC_2.5): Add acc_copyin_async*, acc_copyout_async*,
> acc_copyout_finalize_async*, acc_create_async*, acc_delete_async*,
> acc_delete_finalize_async*, acc_memcpy_from_device_async*,
> acc_memcpy_to_device_async*, acc_update_device_async*, and
> acc_update_self_async* entries.
> * testsuite/libgomp.oacc-c-c++-common/lib-94.c: New test.
> * testsuite/libgomp.oacc-c-c++-common/lib-95.c: New test.
> * testsuite/libgomp.oacc-fortran/lib-16.f90: New test.
> Index: libgomp/libgomp.map
> ===================================================================
> --- libgomp/libgomp.map (revision 264192)
> +++ libgomp/libgomp.map (working copy)
> @@ -388,14 +388,48 @@ OACC_2.0.1 {
>
> OACC_2.5 {
> global:
> + acc_copyin_async;
> + acc_copyin_async_32_h_;
> + acc_copyin_async_64_h_;
> + acc_copyin_async_array_h_;
> + acc_copyout_async;
> + acc_copyout_async_32_h_;
> + acc_copyout_async_64_h_;
> + acc_copyout_async_array_h_;
> acc_copyout_finalize;
> acc_copyout_finalize_32_h_;
> acc_copyout_finalize_64_h_;
> acc_copyout_finalize_array_h_;
> + acc_copyout_finalize_async;
> + acc_copyout_finalize_async_32_h_;
> + acc_copyout_finalize_async_64_h_;
> + acc_copyout_finalize_async_array_h_;
> + acc_create_async;
> + acc_create_async_32_h_;
> + acc_create_async_64_h_;
> + acc_create_async_array_h_;
> + acc_delete_async;
> + acc_delete_async_32_h_;
> + acc_delete_async_64_h_;
> + acc_delete_async_array_h_;
> acc_delete_finalize;
> acc_delete_finalize_32_h_;
> acc_delete_finalize_64_h_;
> acc_delete_finalize_array_h_;
> + acc_delete_finalize_async;
> + acc_delete_finalize_async_32_h_;
> + acc_delete_finalize_async_64_h_;
> + acc_delete_finalize_async_array_h_;
> + acc_memcpy_from_device_async;
> + acc_memcpy_to_device_async;
> + acc_update_device_async;
> + acc_update_device_async_32_h_;
> + acc_update_device_async_64_h_;
> + acc_update_device_async_array_h_;
> + acc_update_self_async;
> + acc_update_self_async_32_h_;
> + acc_update_self_async_64_h_;
> + acc_update_self_async_array_h_;
> } OACC_2.0.1;
>
> GOACC_2.0 {
> Index: libgomp/oacc-mem.c
> ===================================================================
> --- libgomp/oacc-mem.c (revision 264192)
> +++ libgomp/oacc-mem.c (working copy)
> @@ -153,8 +153,9 @@ acc_free (void *d)
> gomp_fatal ("error in freeing device memory in %s", __FUNCTION__);
> }
>
> -void
> -acc_memcpy_to_device (void *d, void *h, size_t s)
> +static void
> +memcpy_tofrom_device (bool from, void *d, void *h, size_t s, int async,
> + const char *libfnname)
> {
> /* No need to call lazy open here, as the device pointer must have
> been obtained from a routine that did that. */
> @@ -164,31 +165,49 @@ acc_free (void *d)
>
> if (thr->dev->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
> {
> - memmove (d, h, s);
> + if (from)
> + memmove (h, d, s);
> + else
> + memmove (d, h, s);
> return;
> }
>
> - if (!thr->dev->host2dev_func (thr->dev->target_id, d, h, s))
> - gomp_fatal ("error in %s", __FUNCTION__);
> + if (async > acc_async_sync)
> + thr->dev->openacc.async_set_async_func (async);
> +
> + bool ret = (from
> + ? thr->dev->dev2host_func (thr->dev->target_id, h, d, s)
> + : thr->dev->host2dev_func (thr->dev->target_id, d, h, s));
> +
> + if (async > acc_async_sync)
> + thr->dev->openacc.async_set_async_func (acc_async_sync);
> +
> + if (!ret)
> + gomp_fatal ("error in %s", libfnname);
> }
>
> void
> -acc_memcpy_from_device (void *h, void *d, size_t s)
> +acc_memcpy_to_device (void *d, void *h, size_t s)
> {
> - /* No need to call lazy open here, as the device pointer must have
> - been obtained from a routine that did that. */
> - struct goacc_thread *thr = goacc_thread ();
> + memcpy_tofrom_device (false, d, h, s, acc_async_sync, __FUNCTION__);
> +}
>
> - assert (thr && thr->dev);
> +void
> +acc_memcpy_to_device_async (void *d, void *h, size_t s, int async)
> +{
> + memcpy_tofrom_device (false, d, h, s, async, __FUNCTION__);
> +}
>
> - if (thr->dev->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
> - {
> - memmove (h, d, s);
> - return;
> - }
> +void
> +acc_memcpy_from_device (void *h, void *d, size_t s)
> +{
> + memcpy_tofrom_device (true, d, h, s, acc_async_sync, __FUNCTION__);
> +}
>
> - if (!thr->dev->dev2host_func (thr->dev->target_id, h, d, s))
> - gomp_fatal ("error in %s", __FUNCTION__);
> +void
> +acc_memcpy_from_device_async (void *h, void *d, size_t s, int async)
> +{
> + memcpy_tofrom_device (true, d, h, s, async, __FUNCTION__);
> }
>
> /* Return the device pointer that corresponds to host data H. Or NULL
> @@ -428,7 +447,7 @@ acc_unmap_data (void *h)
> #define FLAG_COPY (1 << 2)
>
> static void *
> -present_create_copy (unsigned f, void *h, size_t s)
> +present_create_copy (unsigned f, void *h, size_t s, int async)
> {
> void *d;
> splay_tree_key n;
> @@ -490,11 +509,17 @@ static void *
>
> gomp_mutex_unlock (&acc_dev->lock);
>
> + if (async > acc_async_sync)
> + acc_dev->openacc.async_set_async_func (async);
> +
> tgt = gomp_map_vars (acc_dev, mapnum, &hostaddrs, NULL, &s, &kinds, true,
> GOMP_MAP_VARS_OPENACC);
> /* Initialize dynamic refcount. */
> tgt->list[0].key->dynamic_refcount = 1;
>
> + if (async > acc_async_sync)
> + acc_dev->openacc.async_set_async_func (acc_async_sync);
> +
> gomp_mutex_lock (&acc_dev->lock);
>
> d = tgt->to_free;
> @@ -510,19 +535,32 @@ static void *
> void *
> acc_create (void *h, size_t s)
> {
> - return present_create_copy (FLAG_PRESENT | FLAG_CREATE, h, s);
> + return present_create_copy (FLAG_PRESENT | FLAG_CREATE, h, s, acc_async_sync);
> }
>
> +void
> +acc_create_async (void *h, size_t s, int async)
> +{
> + present_create_copy (FLAG_PRESENT | FLAG_CREATE, h, s, async);
> +}
> +
> void *
> acc_copyin (void *h, size_t s)
> {
> - return present_create_copy (FLAG_PRESENT | FLAG_CREATE | FLAG_COPY, h, s);
> + return present_create_copy (FLAG_PRESENT | FLAG_CREATE | FLAG_COPY, h, s,
> + acc_async_sync);
> }
>
> +void
> +acc_copyin_async (void *h, size_t s, int async)
> +{
> + present_create_copy (FLAG_PRESENT | FLAG_CREATE | FLAG_COPY, h, s, async);
> +}
> +
> void *
> acc_present_or_create (void *h, size_t s)
> {
> - return present_create_copy (FLAG_PRESENT | FLAG_CREATE, h, s);
> + return present_create_copy (FLAG_PRESENT | FLAG_CREATE, h, s, acc_async_sync);
> }
>
> /* acc_pcreate is acc_present_or_create by a different name. */
> @@ -539,7 +577,8 @@ acc_pcreate (void *h, size_t s)
> void *
> acc_present_or_copyin (void *h, size_t s)
> {
> - return present_create_copy (FLAG_PRESENT | FLAG_CREATE | FLAG_COPY, h, s);
> + return present_create_copy (FLAG_PRESENT | FLAG_CREATE | FLAG_COPY, h, s,
> + acc_async_sync);
> }
>
> /* acc_pcopyin is acc_present_or_copyin by a different name. */
> @@ -557,7 +596,7 @@ acc_pcopyin (void *h, size_t s)
> #define FLAG_FINALIZE (1 << 1)
>
> static void
> -delete_copyout (unsigned f, void *h, size_t s, const char *libfnname)
> +delete_copyout (unsigned f, void *h, size_t s, int async, const char *libfnname)
> {
> size_t host_size;
> splay_tree_key n;
> @@ -633,7 +672,13 @@ static void
> }
>
> if (f & FLAG_COPYOUT)
> - acc_dev->dev2host_func (acc_dev->target_id, h, d, s);
> + {
> + if (async > acc_async_sync)
> + acc_dev->openacc.async_set_async_func (async);
> + acc_dev->dev2host_func (acc_dev->target_id, h, d, s);
> + if (async > acc_async_sync)
> + acc_dev->openacc.async_set_async_func (acc_async_sync);
> + }
>
> gomp_remove_var (acc_dev, n);
> }
> @@ -644,41 +689,54 @@ static void
> void
> acc_delete (void *h , size_t s)
> {
> - delete_copyout (0, h, s, __FUNCTION__);
> + delete_copyout (0, h, s, acc_async_sync, __FUNCTION__);
> }
>
> void
> +acc_delete_async (void *h , size_t s, int async)
> +{
> + delete_copyout (0, h, s, async, __FUNCTION__);
> +}
> +
> +void
> acc_delete_finalize (void *h , size_t s)
> {
> - delete_copyout (FLAG_FINALIZE, h, s, __FUNCTION__);
> + delete_copyout (FLAG_FINALIZE, h, s, acc_async_sync, __FUNCTION__);
> }
>
> void
> acc_delete_finalize_async (void *h , size_t s, int async)
> {
> - delete_copyout (FLAG_FINALIZE, h, s, __FUNCTION__);
> + delete_copyout (FLAG_FINALIZE, h, s, async, __FUNCTION__);
> }
>
> void
> acc_copyout (void *h, size_t s)
> {
> - delete_copyout (FLAG_COPYOUT, h, s, __FUNCTION__);
> + delete_copyout (FLAG_COPYOUT, h, s, acc_async_sync, __FUNCTION__);
> }
>
> void
> +acc_copyout_async (void *h, size_t s, int async)
> +{
> + delete_copyout (FLAG_COPYOUT, h, s, async, __FUNCTION__);
> +}
> +
> +void
> acc_copyout_finalize (void *h, size_t s)
> {
> - delete_copyout (FLAG_COPYOUT | FLAG_FINALIZE, h, s, __FUNCTION__);
> + delete_copyout (FLAG_COPYOUT | FLAG_FINALIZE, h, s, acc_async_sync,
> + __FUNCTION__);
> }
>
> void
> acc_copyout_finalize_async (void *h, size_t s, int async)
> {
> - delete_copyout (FLAG_COPYOUT | FLAG_FINALIZE, h, s, __FUNCTION__);
> + delete_copyout (FLAG_COPYOUT | FLAG_FINALIZE, h, s, async, __FUNCTION__);
> }
>
> static void
> -update_dev_host (int is_dev, void *h, size_t s)
> +update_dev_host (int is_dev, void *h, size_t s, int async)
> {
> splay_tree_key n;
> void *d;
> @@ -704,11 +762,17 @@ static void
> d = (void *) (n->tgt->tgt_start + n->tgt_offset
> + (uintptr_t) h - n->host_start);
>
> + if (async > acc_async_sync)
> + acc_dev->openacc.async_set_async_func (async);
> +
> if (is_dev)
> acc_dev->host2dev_func (acc_dev->target_id, d, h, s);
> else
> acc_dev->dev2host_func (acc_dev->target_id, h, d, s);
>
> + if (async > acc_async_sync)
> + acc_dev->openacc.async_set_async_func (acc_async_sync);
> +
> gomp_mutex_unlock (&acc_dev->lock);
> }
>
> @@ -715,16 +779,28 @@ static void
> void
> acc_update_device (void *h, size_t s)
> {
> - update_dev_host (1, h, s);
> + update_dev_host (1, h, s, acc_async_sync);
> }
>
> void
> +acc_update_device_async (void *h, size_t s, int async)
> +{
> + update_dev_host (1, h, s, async);
> +}
> +
> +void
> acc_update_self (void *h, size_t s)
> {
> - update_dev_host (0, h, s);
> + update_dev_host (0, h, s, acc_async_sync);
> }
>
> void
> +acc_update_self_async (void *h, size_t s, int async)
> +{
> + update_dev_host (0, h, s, async);
> +}
> +
> +void
> gomp_acc_insert_pointer (size_t mapnum, void **hostaddrs, size_t *sizes,
> void *kinds)
> {
> Index: libgomp/openacc.f90
> ===================================================================
> --- libgomp/openacc.f90 (revision 264192)
> +++ libgomp/openacc.f90 (working copy)
> @@ -332,6 +332,150 @@ module openacc_internal
> logical acc_is_present_array_h
> type (*), dimension (..), contiguous :: a
> end function
> +
> + subroutine acc_copyin_async_32_h (a, len, async)
> + use iso_c_binding, only: c_int32_t
> + use openacc_kinds, only: acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int32_t) len
> + integer (acc_handle_kind) async
> + end subroutine
> +
> + subroutine acc_copyin_async_64_h (a, len, async)
> + use iso_c_binding, only: c_int64_t
> + use openacc_kinds, only: acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int64_t) len
> + integer (acc_handle_kind) async
> + end subroutine
> +
> + subroutine acc_copyin_async_array_h (a, async)
> + use openacc_kinds, only: acc_handle_kind
> + type (*), dimension (..), contiguous :: a
> + integer (acc_handle_kind) async
> + end subroutine
> +
> + subroutine acc_create_async_32_h (a, len, async)
> + use iso_c_binding, only: c_int32_t
> + use openacc_kinds, only: acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int32_t) len
> + integer (acc_handle_kind) async
> + end subroutine
> +
> + subroutine acc_create_async_64_h (a, len, async)
> + use iso_c_binding, only: c_int64_t
> + use openacc_kinds, only: acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int64_t) len
> + integer (acc_handle_kind) async
> + end subroutine
> +
> + subroutine acc_create_async_array_h (a, async)
> + use openacc_kinds, only: acc_handle_kind
> + type (*), dimension (..), contiguous :: a
> + integer (acc_handle_kind) async
> + end subroutine
> +
> + subroutine acc_copyout_async_32_h (a, len, async)
> + use iso_c_binding, only: c_int32_t
> + use openacc_kinds, only: acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int32_t) len
> + integer (acc_handle_kind) async
> + end subroutine
> +
> + subroutine acc_copyout_async_64_h (a, len, async)
> + use iso_c_binding, only: c_int64_t
> + use openacc_kinds, only: acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int64_t) len
> + integer (acc_handle_kind) async
> + end subroutine
> +
> + subroutine acc_copyout_async_array_h (a, async)
> + use openacc_kinds, only: acc_handle_kind
> + type (*), dimension (..), contiguous :: a
> + integer (acc_handle_kind) async
> + end subroutine
> +
> + subroutine acc_delete_async_32_h (a, len, async)
> + use iso_c_binding, only: c_int32_t
> + use openacc_kinds, only: acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int32_t) len
> + integer (acc_handle_kind) async
> + end subroutine
> +
> + subroutine acc_delete_async_64_h (a, len, async)
> + use iso_c_binding, only: c_int64_t
> + use openacc_kinds, only: acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int64_t) len
> + integer (acc_handle_kind) async
> + end subroutine
> +
> + subroutine acc_delete_async_array_h (a, async)
> + use openacc_kinds, only: acc_handle_kind
> + type (*), dimension (..), contiguous :: a
> + integer (acc_handle_kind) async
> + end subroutine
> +
> + subroutine acc_update_device_async_32_h (a, len, async)
> + use iso_c_binding, only: c_int32_t
> + use openacc_kinds, only: acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int32_t) len
> + integer (acc_handle_kind) async
> + end subroutine
> +
> + subroutine acc_update_device_async_64_h (a, len, async)
> + use iso_c_binding, only: c_int64_t
> + use openacc_kinds, only: acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int64_t) len
> + integer (acc_handle_kind) async
> + end subroutine
> +
> + subroutine acc_update_device_async_array_h (a, async)
> + use openacc_kinds, only: acc_handle_kind
> + type (*), dimension (..), contiguous :: a
> + integer (acc_handle_kind) async
> + end subroutine
> +
> + subroutine acc_update_self_async_32_h (a, len, async)
> + use iso_c_binding, only: c_int32_t
> + use openacc_kinds, only: acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int32_t) len
> + integer (acc_handle_kind) async
> + end subroutine
> +
> + subroutine acc_update_self_async_64_h (a, len, async)
> + use iso_c_binding, only: c_int64_t
> + use openacc_kinds, only: acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int64_t) len
> + integer (acc_handle_kind) async
> + end subroutine
> +
> + subroutine acc_update_self_async_array_h (a, async)
> + use openacc_kinds, only: acc_handle_kind
> + type (*), dimension (..), contiguous :: a
> + integer (acc_handle_kind) async
> + end subroutine
> end interface
>
> interface
> @@ -510,6 +654,60 @@ module openacc_internal
> type (*), dimension (*) :: a
> integer (c_size_t), value :: len
> end function
> +
> + subroutine acc_copyin_async_l (a, len, async) &
> + bind (C, name = "acc_copyin_async")
> + use iso_c_binding, only: c_size_t, c_int
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_size_t), value :: len
> + integer (c_int), value :: async
> + end subroutine
> +
> + subroutine acc_create_async_l (a, len, async) &
> + bind (C, name = "acc_create_async")
> + use iso_c_binding, only: c_size_t, c_int
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_size_t), value :: len
> + integer (c_int), value :: async
> + end subroutine
> +
> + subroutine acc_copyout_async_l (a, len, async) &
> + bind (C, name = "acc_copyout_async")
> + use iso_c_binding, only: c_size_t, c_int
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_size_t), value :: len
> + integer (c_int), value :: async
> + end subroutine
> +
> + subroutine acc_delete_async_l (a, len, async) &
> + bind (C, name = "acc_delete_async")
> + use iso_c_binding, only: c_size_t, c_int
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_size_t), value :: len
> + integer (c_int), value :: async
> + end subroutine
> +
> + subroutine acc_update_device_async_l (a, len, async) &
> + bind (C, name = "acc_update_device_async")
> + use iso_c_binding, only: c_size_t, c_int
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_size_t), value :: len
> + integer (c_int), value :: async
> + end subroutine
> +
> + subroutine acc_update_self_async_l (a, len, async) &
> + bind (C, name = "acc_update_self_async")
> + use iso_c_binding, only: c_size_t, c_int
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_size_t), value :: len
> + integer (c_int), value :: async
> + end subroutine
> end interface
> end module
>
> @@ -529,6 +727,8 @@ module openacc
> public :: acc_copyin, acc_present_or_copyin, acc_pcopyin, acc_create
> public :: acc_present_or_create, acc_pcreate, acc_copyout, acc_delete
> public :: acc_update_device, acc_update_self, acc_is_present
> + public :: acc_copyin_async, acc_create_async, acc_copyout_async
> + public :: acc_delete_async, acc_update_device_async, acc_update_self_async
>
> integer, parameter :: openacc_version = 201306
>
> @@ -694,6 +894,42 @@ module openacc
> ! acc_memcpy_to_device: Only available in C/C++
> ! acc_memcpy_from_device: Only available in C/C++
>
> + interface acc_copyin_async
> + procedure :: acc_copyin_async_32_h
> + procedure :: acc_copyin_async_64_h
> + procedure :: acc_copyin_async_array_h
> + end interface
> +
> + interface acc_create_async
> + procedure :: acc_create_async_32_h
> + procedure :: acc_create_async_64_h
> + procedure :: acc_create_async_array_h
> + end interface
> +
> + interface acc_copyout_async
> + procedure :: acc_copyout_async_32_h
> + procedure :: acc_copyout_async_64_h
> + procedure :: acc_copyout_async_array_h
> + end interface
> +
> + interface acc_delete_async
> + procedure :: acc_delete_async_32_h
> + procedure :: acc_delete_async_64_h
> + procedure :: acc_delete_async_array_h
> + end interface
> +
> + interface acc_update_device_async
> + procedure :: acc_update_device_async_32_h
> + procedure :: acc_update_device_async_64_h
> + procedure :: acc_update_device_async_array_h
> + end interface
> +
> + interface acc_update_self_async
> + procedure :: acc_update_self_async_32_h
> + procedure :: acc_update_self_async_64_h
> + procedure :: acc_update_self_async_array_h
> + end interface
> +
> end module
>
> function acc_get_num_devices_h (d)
> @@ -1078,3 +1314,189 @@ function acc_is_present_array_h (a)
> type (*), dimension (..), contiguous :: a
> acc_is_present_array_h = acc_is_present_l (a, sizeof (a)) == 1
> end function
> +
> +subroutine acc_copyin_async_32_h (a, len, async)
> + use iso_c_binding, only: c_int32_t, c_size_t, c_int
> + use openacc_internal, only: acc_copyin_async_l
> + use openacc_kinds, only: acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int32_t) len
> + integer (acc_handle_kind) async
> + call acc_copyin_async_l (a, int (len, kind = c_size_t), int (async, kind = c_int))
> +end subroutine
> +
> +subroutine acc_copyin_async_64_h (a, len, async)
> + use iso_c_binding, only: c_int64_t, c_size_t, c_int
> + use openacc_internal, only: acc_copyin_async_l
> + use openacc_kinds, only: acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int64_t) len
> + integer (acc_handle_kind) async
> + call acc_copyin_async_l (a, int (len, kind = c_size_t), int (async, kind = c_int))
> +end subroutine
> +
> +subroutine acc_copyin_async_array_h (a, async)
> + use iso_c_binding, only: c_int
> + use openacc_internal, only: acc_copyin_async_l
> + use openacc_kinds, only: acc_handle_kind
> + type (*), dimension (..), contiguous :: a
> + integer (acc_handle_kind) async
> + call acc_copyin_async_l (a, sizeof (a), int (async, kind = c_int))
> +end subroutine
> +
> +subroutine acc_create_async_32_h (a, len, async)
> + use iso_c_binding, only: c_int32_t, c_size_t, c_int
> + use openacc_internal, only: acc_create_async_l
> + use openacc_kinds, only: acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int32_t) len
> + integer (acc_handle_kind) async
> + call acc_create_async_l (a, int (len, kind = c_size_t), int (async, kind = c_int))
> +end subroutine
> +
> +subroutine acc_create_async_64_h (a, len, async)
> + use iso_c_binding, only: c_int64_t, c_size_t, c_int
> + use openacc_internal, only: acc_create_async_l
> + use openacc_kinds, only: acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int64_t) len
> + integer (acc_handle_kind) async
> + call acc_create_async_l (a, int (len, kind = c_size_t), int (async, kind = c_int))
> +end subroutine
> +
> +subroutine acc_create_async_array_h (a, async)
> + use iso_c_binding, only: c_int
> + use openacc_internal, only: acc_create_async_l
> + use openacc_kinds, only: acc_handle_kind
> + type (*), dimension (..), contiguous :: a
> + integer (acc_handle_kind) async
> + call acc_create_async_l (a, sizeof (a), int (async, kind = c_int))
> +end subroutine
> +
> +subroutine acc_copyout_async_32_h (a, len, async)
> + use iso_c_binding, only: c_int32_t, c_size_t, c_int
> + use openacc_internal, only: acc_copyout_async_l
> + use openacc_kinds, only: acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int32_t) len
> + integer (acc_handle_kind) async
> + call acc_copyout_async_l (a, int (len, kind = c_size_t), int (async, kind = c_int))
> +end subroutine
> +
> +subroutine acc_copyout_async_64_h (a, len, async)
> + use iso_c_binding, only: c_int64_t, c_size_t, c_int
> + use openacc_internal, only: acc_copyout_async_l
> + use openacc_kinds, only: acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int64_t) len
> + integer (acc_handle_kind) async
> + call acc_copyout_async_l (a, int (len, kind = c_size_t), int (async, kind = c_int))
> +end subroutine
> +
> +subroutine acc_copyout_async_array_h (a, async)
> + use iso_c_binding, only: c_int
> + use openacc_internal, only: acc_copyout_async_l
> + use openacc_kinds, only: acc_handle_kind
> + type (*), dimension (..), contiguous :: a
> + integer (acc_handle_kind) async
> + call acc_copyout_async_l (a, sizeof (a), int (async, kind = c_int))
> +end subroutine
> +
> +subroutine acc_delete_async_32_h (a, len, async)
> + use iso_c_binding, only: c_int32_t, c_size_t, c_int
> + use openacc_internal, only: acc_delete_async_l
> + use openacc_kinds, only: acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int32_t) len
> + integer (acc_handle_kind) async
> + call acc_delete_async_l (a, int (len, kind = c_size_t), int (async, kind = c_int))
> +end subroutine
> +
> +subroutine acc_delete_async_64_h (a, len, async)
> + use iso_c_binding, only: c_int64_t, c_size_t, c_int
> + use openacc_internal, only: acc_delete_async_l
> + use openacc_kinds, only: acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int64_t) len
> + integer (acc_handle_kind) async
> + call acc_delete_async_l (a, int (len, kind = c_size_t), int (async, kind = c_int))
> +end subroutine
> +
> +subroutine acc_delete_async_array_h (a, async)
> + use iso_c_binding, only: c_int
> + use openacc_internal, only: acc_delete_async_l
> + use openacc_kinds, only: acc_handle_kind
> + type (*), dimension (..), contiguous :: a
> + integer (acc_handle_kind) async
> + call acc_delete_async_l (a, sizeof (a), int (async, kind = c_int))
> +end subroutine
> +
> +subroutine acc_update_device_async_32_h (a, len, async)
> + use iso_c_binding, only: c_int32_t, c_size_t, c_int
> + use openacc_internal, only: acc_update_device_async_l
> + use openacc_kinds, only: acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int32_t) len
> + integer (acc_handle_kind) async
> + call acc_update_device_async_l (a, int (len, kind = c_size_t), int (async, kind = c_int))
> +end subroutine
> +
> +subroutine acc_update_device_async_64_h (a, len, async)
> + use iso_c_binding, only: c_int64_t, c_size_t, c_int
> + use openacc_internal, only: acc_update_device_async_l
> + use openacc_kinds, only: acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int64_t) len
> + integer (acc_handle_kind) async
> + call acc_update_device_async_l (a, int (len, kind = c_size_t), int (async, kind = c_int))
> +end subroutine
> +
> +subroutine acc_update_device_async_array_h (a, async)
> + use iso_c_binding, only: c_int
> + use openacc_internal, only: acc_update_device_async_l
> + use openacc_kinds, only: acc_handle_kind
> + type (*), dimension (..), contiguous :: a
> + integer (acc_handle_kind) async
> + call acc_update_device_async_l (a, sizeof (a), int (async, kind = c_int))
> +end subroutine
> +
> +subroutine acc_update_self_async_32_h (a, len, async)
> + use iso_c_binding, only: c_int32_t, c_size_t, c_int
> + use openacc_internal, only: acc_update_self_async_l
> + use openacc_kinds, only: acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int32_t) len
> + integer (acc_handle_kind) async
> + call acc_update_self_async_l (a, int (len, kind = c_size_t), int (async, kind = c_int))
> +end subroutine
> +
> +subroutine acc_update_self_async_64_h (a, len, async)
> + use iso_c_binding, only: c_int64_t, c_size_t, c_int
> + use openacc_internal, only: acc_update_self_async_l
> + use openacc_kinds, only: acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int64_t) len
> + integer (acc_handle_kind) async
> + call acc_update_self_async_l (a, int (len, kind = c_size_t), int (async, kind = c_int))
> +end subroutine
> +
> +subroutine acc_update_self_async_array_h (a, async)
> + use iso_c_binding, only: c_int
> + use openacc_internal, only: acc_update_self_async_l
> + use openacc_kinds, only: acc_handle_kind
> + type (*), dimension (..), contiguous :: a
> + integer (acc_handle_kind) async
> + call acc_update_self_async_l (a, sizeof (a), int (async, kind = c_int))
> +end subroutine
> Index: libgomp/openacc.h
> ===================================================================
> --- libgomp/openacc.h (revision 264192)
> +++ libgomp/openacc.h (working copy)
> @@ -115,6 +115,16 @@ void acc_copyout_finalize_async (void *, size_t, i
> void acc_delete_finalize (void *, size_t) __GOACC_NOTHROW;
> void acc_delete_finalize_async (void *, size_t, int) __GOACC_NOTHROW;
>
> +/* Async functions, specified in OpenACC 2.5. */
> +void acc_copyin_async (void *, size_t, int) __GOACC_NOTHROW;
> +void acc_create_async (void *, size_t, int) __GOACC_NOTHROW;
> +void acc_copyout_async (void *, size_t, int) __GOACC_NOTHROW;
> +void acc_delete_async (void *, size_t, int) __GOACC_NOTHROW;
> +void acc_update_device_async (void *, size_t, int) __GOACC_NOTHROW;
> +void acc_update_self_async (void *, size_t, int) __GOACC_NOTHROW;
> +void acc_memcpy_to_device_async (void *, void *, size_t, int) __GOACC_NOTHROW;
> +void acc_memcpy_from_device_async (void *, void *, size_t, int) __GOACC_NOTHROW;
> +
> /* CUDA-specific routines. */
> void *acc_get_current_cuda_device (void) __GOACC_NOTHROW;
> void *acc_get_current_cuda_context (void) __GOACC_NOTHROW;
> Index: libgomp/openacc_lib.h
> ===================================================================
> --- libgomp/openacc_lib.h (revision 264192)
> +++ libgomp/openacc_lib.h (working copy)
> @@ -403,3 +403,159 @@
>
> ! acc_memcpy_to_device: Only available in C/C++
> ! acc_memcpy_from_device: Only available in C/C++
> +
> + interface acc_copyin_async
> + subroutine acc_copyin_async_32_h (a, len, async)
> + use iso_c_binding, only: c_int32_t
> + import acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int32_t) len
> + integer (acc_handle_kind) async
> + end subroutine
> +
> + subroutine acc_copyin_async_64_h (a, len, async)
> + use iso_c_binding, only: c_int64_t
> + import acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int64_t) len
> + integer (acc_handle_kind) async
> + end subroutine
> +
> + subroutine acc_copyin_async_array_h (a, async_)
> + import acc_handle_kind
> + type (*), dimension (..), contiguous :: a
> + integer (acc_handle_kind) async_
> + end subroutine
> + end interface
> +
> + interface acc_create_async
> + subroutine acc_create_async_32_h (a, len, async)
> + use iso_c_binding, only: c_int32_t
> + import acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int32_t) len
> + integer (acc_handle_kind) async
> + end subroutine
> +
> + subroutine acc_create_async_64_h (a, len, async)
> + use iso_c_binding, only: c_int64_t
> + import acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int64_t) len
> + integer (acc_handle_kind) async
> + end subroutine
> +
> + subroutine acc_create_async_array_h (a, async_)
> + import acc_handle_kind
> + type (*), dimension (..), contiguous :: a
> + integer (acc_handle_kind) async_
> + end subroutine
> + end interface
> +
> + interface acc_copyout_async
> + subroutine acc_copyout_async_32_h (a, len, async)
> + use iso_c_binding, only: c_int32_t
> + import acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int32_t) len
> + integer (acc_handle_kind) async
> + end subroutine
> +
> + subroutine acc_copyout_async_64_h (a, len, async)
> + use iso_c_binding, only: c_int64_t
> + import acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int64_t) len
> + integer (acc_handle_kind) async
> + end subroutine
> +
> + subroutine acc_copyout_async_array_h (a, async_)
> + import acc_handle_kind
> + type (*), dimension (..), contiguous :: a
> + integer (acc_handle_kind) async_
> + end subroutine
> + end interface
> +
> + interface acc_delete_async
> + subroutine acc_delete_async_32_h (a, len, async)
> + use iso_c_binding, only: c_int32_t
> + import acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int32_t) len
> + integer (acc_handle_kind) async
> + end subroutine
> +
> + subroutine acc_delete_async_64_h (a, len, async)
> + use iso_c_binding, only: c_int64_t
> + import acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int64_t) len
> + integer (acc_handle_kind) async
> + end subroutine
> +
> + subroutine acc_delete_async_array_h (a, async_)
> + import acc_handle_kind
> + type (*), dimension (..), contiguous :: a
> + integer (acc_handle_kind) async_
> + end subroutine
> + end interface
> +
> + interface acc_update_device_async
> + subroutine acc_update_device_async_32_h (a, len, async)
> + use iso_c_binding, only: c_int32_t
> + import acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int32_t) len
> + integer (acc_handle_kind) async
> + end subroutine
> +
> + subroutine acc_update_device_async_64_h (a, len, async)
> + use iso_c_binding, only: c_int64_t
> + import acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int64_t) len
> + integer (acc_handle_kind) async
> + end subroutine
> +
> + subroutine acc_update_device_async_array_h (a, async_)
> + import acc_handle_kind
> + type (*), dimension (..), contiguous :: a
> + integer (acc_handle_kind) async_
> + end subroutine
> + end interface
> +
> + interface acc_update_self_async
> + subroutine acc_update_self_async_32_h (a, len, async)
> + use iso_c_binding, only: c_int32_t
> + import acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int32_t) len
> + integer (acc_handle_kind) async
> + end subroutine
> +
> + subroutine acc_update_self_async_64_h (a, len, async)
> + use iso_c_binding, only: c_int64_t
> + import acc_handle_kind
> + !GCC$ ATTRIBUTES NO_ARG_CHECK :: a
> + type (*), dimension (*) :: a
> + integer (c_int64_t) len
> + integer (acc_handle_kind) async
> + end subroutine
> +
> + subroutine acc_update_self_async_array_h (a, async_)
> + import acc_handle_kind
> + type (*), dimension (..), contiguous :: a
> + integer (acc_handle_kind) async_
> + end subroutine
> + end interface
> Index: libgomp/testsuite/libgomp.oacc-c-c++-common/lib-94.c
> ===================================================================
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-94.c (nonexistent)
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/lib-94.c (working copy)
> @@ -0,0 +1,42 @@
> +/* { dg-do run } */
> +/* { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } } */
> +
> +#include <string.h>
> +#include <stdlib.h>
> +#include <openacc.h>
> +
> +int
> +main (int argc, char **argv)
> +{
> + const int N = 256;
> + int i;
> + int async = 8;
> + unsigned char *h;
> +
> + h = (unsigned char *) malloc (N);
> +
> + for (i = 0; i < N; i++)
> + {
> + h[i] = i;
> + }
> +
> + acc_copyin_async (h, N, async);
> +
> + memset (h, 0, N);
> +
> + acc_wait (async);
> +
> + acc_copyout_async (h, N, async + 1);
> +
> + acc_wait (async + 1);
> +
> + for (i = 0; i < N; i++)
> + {
> + if (h[i] != i)
> + abort ();
> + }
> +
> + free (h);
> +
> + return 0;
> +}
> Index: libgomp/testsuite/libgomp.oacc-c-c++-common/lib-95.c
> ===================================================================
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-95.c (nonexistent)
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/lib-95.c (working copy)
> @@ -0,0 +1,45 @@
> +/* { dg-do run } */
> +/* { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } } */
> +
> +#include <string.h>
> +#include <stdlib.h>
> +#include <openacc.h>
> +
> +int
> +main (int argc, char **argv)
> +{
> + const int N = 256;
> + int i, q = 5;
> + unsigned char *h, *g;
> + void *d;
> +
> + h = (unsigned char *) malloc (N);
> + g = (unsigned char *) malloc (N);
> + for (i = 0; i < N; i++)
> + {
> + g[i] = i;
> + }
> +
> + acc_create_async (h, N, q);
> +
> + acc_memcpy_to_device_async (acc_deviceptr (h), g, N, q);
> + memset (&h[0], 0, N);
> +
> + acc_wait (q);
> +
> + acc_update_self_async (h, N, q + 1);
> + acc_delete_async (h, N, q + 1);
> +
> + acc_wait (q + 1);
> +
> + for (i = 0; i < N; i++)
> + {
> + if (h[i] != i)
> + abort ();
> + }
> +
> + free (h);
> + free (g);
> +
> + return 0;
> +}
> Index: libgomp/testsuite/libgomp.oacc-fortran/lib-16.f90
> ===================================================================
> --- libgomp/testsuite/libgomp.oacc-fortran/lib-16.f90 (nonexistent)
> +++ libgomp/testsuite/libgomp.oacc-fortran/lib-16.f90 (working copy)
> @@ -0,0 +1,57 @@
> +! { dg-do run }
> +! { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } }
> +
> +program main
> + use openacc
> + implicit none
> +
> + integer, parameter :: N = 256
> + integer, allocatable :: h(:)
> + integer :: i
> + integer :: async = 5
> +
> + allocate (h(N))
> +
> + do i = 1, N
> + h(i) = i
> + end do
> +
> + call acc_copyin (h)
> +
> + do i = 1, N
> + h(i) = i + i
> + end do
> +
> + call acc_update_device_async (h, sizeof (h), async)
> +
> + if (acc_is_present (h) .neqv. .TRUE.) call abort
> +
> + h(:) = 0
> +
> + call acc_copyout_async (h, sizeof (h), async)
> +
> + call acc_wait (async)
> +
> + do i = 1, N
> + if (h(i) /= i + i) call abort
> + end do
> +
> + call acc_copyin (h, sizeof (h))
> +
> + h(:) = 0
> +
> + call acc_update_self_async (h, sizeof (h), async)
> +
> + if (acc_is_present (h) .neqv. .TRUE.) call abort
> +
> + do i = 1, N
> + if (h(i) /= i + i) call abort
> + end do
> +
> + call acc_delete_async (h, async)
> +
> + call acc_wait (async)
> +
> + if (acc_is_present (h) .neqv. .FALSE.) call abort
> +
> +end program
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH, OpenACC 2.5, libgomp] Add *_async versions of runtime library API functions
2018-09-10 17:22 ` Cesar Philippidis
2018-09-11 10:35 ` Chung-Lin Tang
@ 2018-11-30 20:42 ` Thomas Schwinge
1 sibling, 0 replies; 7+ messages in thread
From: Thomas Schwinge @ 2018-11-30 20:42 UTC (permalink / raw)
To: cltang, gcc-patches; +Cc: Jakub Jelinek
Hi!
On Mon, 10 Sep 2018 10:22:17 -0700, Cesar Philippidis <cesar@codesourcery.com> wrote:
> On 09/10/2018 08:04 AM, Chung-Lin Tang wrote:
> > Index: libgomp/openacc_lib.h
>
> I don't see a test case for this.
Right. (Better test coverage would be desirable generally.)
> I believe that openacc_lib.h is used
> by fixed-mode Fortran programs (those that end in a .f). Can you add a
> fixed-mode version of lib-16.f90?
Not only with fixed-form, but also with free-form Fortran programs.
Committed to trunk in r266683:
commit d084eb0a61d209ee0d852089ea76a672f519883b
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date: Fri Nov 30 20:38:57 2018 +0000
Add libgomp.oacc-fortran/lib-16-2.f90
This is a copy of libgomp.oacc-fortran/lib-16.f90, but does 'include
"openacc_lib.h"' instead of 'use openacc'.
libgomp/
* testsuite/libgomp.oacc-fortran/lib-16-2.f90: New file.
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@266683 138bc75d-0d04-0410-961f-82ee72b054a4
---
libgomp/ChangeLog | 4 ++++
libgomp/testsuite/libgomp.oacc-fortran/{lib-16.f90 => lib-16-2.f90} | 3 ++-
libgomp/testsuite/libgomp.oacc-fortran/lib-16.f90 | 1 +
3 files changed, 7 insertions(+), 1 deletion(-)
diff --git libgomp/ChangeLog libgomp/ChangeLog
index d3c1bc36c145..a9dcbd808200 100644
--- libgomp/ChangeLog
+++ libgomp/ChangeLog
@@ -1,3 +1,7 @@
+2018-11-30 Thomas Schwinge <thomas@codesourcery.com>
+
+ * testsuite/libgomp.oacc-fortran/lib-16-2.f90: New file.
+
2018-10-19 Richard Biener <rguenther@suse.de>
PR tree-optimization/88182
diff --git libgomp/testsuite/libgomp.oacc-fortran/lib-16.f90 libgomp/testsuite/libgomp.oacc-fortran/lib-16-2.f90
similarity index 94%
copy from libgomp/testsuite/libgomp.oacc-fortran/lib-16.f90
copy to libgomp/testsuite/libgomp.oacc-fortran/lib-16-2.f90
index 9701b52dd257..fa76f65912fb 100644
--- libgomp/testsuite/libgomp.oacc-fortran/lib-16.f90
+++ libgomp/testsuite/libgomp.oacc-fortran/lib-16-2.f90
@@ -1,9 +1,10 @@
+! See also "lib-16.f90".
! { dg-do run }
! { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } }
program main
- use openacc
implicit none
+ include "openacc_lib.h"
integer, parameter :: N = 256
integer, allocatable :: h(:)
diff --git libgomp/testsuite/libgomp.oacc-fortran/lib-16.f90 libgomp/testsuite/libgomp.oacc-fortran/lib-16.f90
index 9701b52dd257..011f9cf31db4 100644
--- libgomp/testsuite/libgomp.oacc-fortran/lib-16.f90
+++ libgomp/testsuite/libgomp.oacc-fortran/lib-16.f90
@@ -1,3 +1,4 @@
+! See also "lib-16-2.f90".
! { dg-do run }
! { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } }
Grüße
Thomas
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH, OpenACC 2.5, libgomp] Add *_async versions of runtime library API functions
2018-09-10 15:04 [PATCH, OpenACC 2.5, libgomp] Add *_async versions of runtime library API functions Chung-Lin Tang
2018-09-10 17:22 ` Cesar Philippidis
2018-11-01 22:01 ` Thomas Schwinge
@ 2021-06-08 17:32 ` Thomas Schwinge
2021-07-27 9:22 ` Thomas Schwinge
2 siblings, 1 reply; 7+ messages in thread
From: Thomas Schwinge @ 2021-06-08 17:32 UTC (permalink / raw)
To: Chung-Lin Tang; +Cc: gcc-patches, Julian Brown
Hi Chung-Lin!
;-) It's been a while:
On 2018-09-10T23:04:18+0800, Chung-Lin Tang <chunglin_tang@mentor.com> wrote:
> * testsuite/libgomp.oacc-c-c++-common/lib-94.c: New test.
> * testsuite/libgomp.oacc-c-c++-common/lib-95.c: New test.
> * testsuite/libgomp.oacc-fortran/lib-16.f90: New test.
Do you happen to remember why in these testcases you're using the
following pattern:
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-94.c (nonexistent)
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/lib-94.c (working copy)
> @@ -0,0 +1,42 @@
> +/* { dg-do run } */
> +/* { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } } */
> +
> +#include <string.h>
> +#include <stdlib.h>
> +#include <openacc.h>
> +
> +int
> +main (int argc, char **argv)
> +{
> + const int N = 256;
> + int i;
> + int async = 8;
> + unsigned char *h;
> +
> + h = (unsigned char *) malloc (N);
> +
> + for (i = 0; i < N; i++)
> + {
> + h[i] = i;
> + }
> +
> + acc_copyin_async (h, N, async);
> +
> + memset (h, 0, N);
> +
> + acc_wait (async);
You first issue 'acc_copyin_async', then (while potentially that's still
accessing 'h') already 'memset' 'h' (potentially overwriting data that
'acc_copyin_async' is still working on), and only then 'acc_wait'?
My understanding of OpenACC would swap 'memset' and 'acc_wait', but maybe
you have a specific reason to do it in this way?
In particular, the GCC nvptx offloading implementation "doesn't seem to
care" (as discussed elsewhere; 'OpenACC "ephemeral" asynchronous
host-to-device copies', etc.) -- but I suppose if you meant to test such
implementation traits here, you'd have commented that?
> +
> + acc_copyout_async (h, N, async + 1);
> +
> + acc_wait (async + 1);
> +
> + for (i = 0; i < N; i++)
> + {
> + if (h[i] != i)
> + abort ();
> + }
> +
> + free (h);
> +
> + return 0;
> +}
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-95.c (nonexistent)
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/lib-95.c (working copy)
> @@ -0,0 +1,45 @@
> +/* { dg-do run } */
> +/* { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } } */
> +
> +#include <string.h>
> +#include <stdlib.h>
> +#include <openacc.h>
> +
> +int
> +main (int argc, char **argv)
> +{
> + const int N = 256;
> + int i, q = 5;
> + unsigned char *h, *g;
> + void *d;
> +
> + h = (unsigned char *) malloc (N);
> + g = (unsigned char *) malloc (N);
> + for (i = 0; i < N; i++)
> + {
> + g[i] = i;
> + }
> +
> + acc_create_async (h, N, q);
> +
> + acc_memcpy_to_device_async (acc_deviceptr (h), g, N, q);
> + memset (&h[0], 0, N);
> +
> + acc_wait (q);
Similar here.
> + acc_update_self_async (h, N, q + 1);
> + acc_delete_async (h, N, q + 1);
> +
> + acc_wait (q + 1);
> +
> + for (i = 0; i < N; i++)
> + {
> + if (h[i] != i)
> + abort ();
> + }
> +
> + free (h);
> + free (g);
> +
> + return 0;
> +}
> --- libgomp/testsuite/libgomp.oacc-fortran/lib-16.f90 (nonexistent)
> +++ libgomp/testsuite/libgomp.oacc-fortran/lib-16.f90 (working copy)
(Later also similarly copied into 'libgomp.oacc-fortran/lib-16-2.f90'.)
Similar:
> @@ -0,0 +1,57 @@
> +! { dg-do run }
> +! { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } }
> +
> +program main
> + use openacc
> + implicit none
> +
> + integer, parameter :: N = 256
> + integer, allocatable :: h(:)
> + integer :: i
> + integer :: async = 5
> +
> + allocate (h(N))
> +
> + do i = 1, N
> + h(i) = i
> + end do
> +
> + call acc_copyin (h)
> +
> + do i = 1, N
> + h(i) = i + i
> + end do
> +
> + call acc_update_device_async (h, sizeof (h), async)
> +
> + if (acc_is_present (h) .neqv. .TRUE.) call abort
Don't we need 'acc_wait' here (while 'acc_update_device_async' may still
be reading from 'h'), before overwriting 'h' here:
> +
> + h(:) = 0
> +
> + call acc_copyout_async (h, sizeof (h), async)
> +
> + call acc_wait (async)
> +
> + do i = 1, N
> + if (h(i) /= i + i) call abort
> + end do
> +
> + call acc_copyin (h, sizeof (h))
> +
> + h(:) = 0
> +
> + call acc_update_self_async (h, sizeof (h), async)
> +
> + if (acc_is_present (h) .neqv. .TRUE.) call abort
Don't we need 'acc_wait' here (to make sure we finish device to host copy
of 'h'), before evaluating 'h' here:
> +
> + do i = 1, N
> + if (h(i) /= i + i) call abort
> + end do
> +
> + call acc_delete_async (h, async)
> +
> + call acc_wait (async)
> +
> + if (acc_is_present (h) .neqv. .FALSE.) call abort
> +
> +end program
Julian has patches for most of these (as part of other commits).
Grüße
Thomas
-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank Thürauf
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH, OpenACC 2.5, libgomp] Add *_async versions of runtime library API functions
2021-06-08 17:32 ` Thomas Schwinge
@ 2021-07-27 9:22 ` Thomas Schwinge
0 siblings, 0 replies; 7+ messages in thread
From: Thomas Schwinge @ 2021-07-27 9:22 UTC (permalink / raw)
To: gcc-patches; +Cc: Chung-Lin Tang, Julian Brown
[-- Attachment #1: Type: text/plain, Size: 5648 bytes --]
Hi!
On 2021-06-08T19:32:22+0200, I wrote:
> Hi Chung-Lin!
>
> ;-) It's been a while:
>
> On 2018-09-10T23:04:18+0800, Chung-Lin Tang <chunglin_tang@mentor.com> wrote:
>> * testsuite/libgomp.oacc-c-c++-common/lib-94.c: New test.
>> * testsuite/libgomp.oacc-c-c++-common/lib-95.c: New test.
>> * testsuite/libgomp.oacc-fortran/lib-16.f90: New test.
>
> Do you happen to remember why in these testcases you're using the
> following pattern:
Apparently not ;-) -- no answer/objection, I've thus now pushed
"Fix OpenACC 'async'/'wait' issues in
'libgomp.oacc-c-c++-common/lib-{94,95}.c',
'libgomp.oacc-fortran/lib-16{,-2}.f90'" to master branch in
commit 599e275d7e0b3fb79ff704d4cb2d8fdb0231116e, see attached.
Grüße
Thomas
>> --- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-94.c (nonexistent)
>> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/lib-94.c (working copy)
>> @@ -0,0 +1,42 @@
>> +/* { dg-do run } */
>> +/* { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } } */
>> +
>> +#include <string.h>
>> +#include <stdlib.h>
>> +#include <openacc.h>
>> +
>> +int
>> +main (int argc, char **argv)
>> +{
>> + const int N = 256;
>> + int i;
>> + int async = 8;
>> + unsigned char *h;
>> +
>> + h = (unsigned char *) malloc (N);
>> +
>> + for (i = 0; i < N; i++)
>> + {
>> + h[i] = i;
>> + }
>> +
>> + acc_copyin_async (h, N, async);
>> +
>> + memset (h, 0, N);
>> +
>> + acc_wait (async);
>
> You first issue 'acc_copyin_async', then (while potentially that's still
> accessing 'h') already 'memset' 'h' (potentially overwriting data that
> 'acc_copyin_async' is still working on), and only then 'acc_wait'?
>
> My understanding of OpenACC would swap 'memset' and 'acc_wait', but maybe
> you have a specific reason to do it in this way?
>
> In particular, the GCC nvptx offloading implementation "doesn't seem to
> care" (as discussed elsewhere; 'OpenACC "ephemeral" asynchronous
> host-to-device copies', etc.) -- but I suppose if you meant to test such
> implementation traits here, you'd have commented that?
>
>> +
>> + acc_copyout_async (h, N, async + 1);
>> +
>> + acc_wait (async + 1);
>> +
>> + for (i = 0; i < N; i++)
>> + {
>> + if (h[i] != i)
>> + abort ();
>> + }
>> +
>> + free (h);
>> +
>> + return 0;
>> +}
>
>> --- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-95.c (nonexistent)
>> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/lib-95.c (working copy)
>> @@ -0,0 +1,45 @@
>> +/* { dg-do run } */
>> +/* { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } } */
>> +
>> +#include <string.h>
>> +#include <stdlib.h>
>> +#include <openacc.h>
>> +
>> +int
>> +main (int argc, char **argv)
>> +{
>> + const int N = 256;
>> + int i, q = 5;
>> + unsigned char *h, *g;
>> + void *d;
>> +
>> + h = (unsigned char *) malloc (N);
>> + g = (unsigned char *) malloc (N);
>> + for (i = 0; i < N; i++)
>> + {
>> + g[i] = i;
>> + }
>> +
>> + acc_create_async (h, N, q);
>> +
>> + acc_memcpy_to_device_async (acc_deviceptr (h), g, N, q);
>> + memset (&h[0], 0, N);
>> +
>> + acc_wait (q);
>
> Similar here.
>
>> + acc_update_self_async (h, N, q + 1);
>> + acc_delete_async (h, N, q + 1);
>> +
>> + acc_wait (q + 1);
>> +
>> + for (i = 0; i < N; i++)
>> + {
>> + if (h[i] != i)
>> + abort ();
>> + }
>> +
>> + free (h);
>> + free (g);
>> +
>> + return 0;
>> +}
>
>> --- libgomp/testsuite/libgomp.oacc-fortran/lib-16.f90 (nonexistent)
>> +++ libgomp/testsuite/libgomp.oacc-fortran/lib-16.f90 (working copy)
>
> (Later also similarly copied into 'libgomp.oacc-fortran/lib-16-2.f90'.)
>
> Similar:
>
>> @@ -0,0 +1,57 @@
>> +! { dg-do run }
>> +! { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } }
>> +
>> +program main
>> + use openacc
>> + implicit none
>> +
>> + integer, parameter :: N = 256
>> + integer, allocatable :: h(:)
>> + integer :: i
>> + integer :: async = 5
>> +
>> + allocate (h(N))
>> +
>> + do i = 1, N
>> + h(i) = i
>> + end do
>> +
>> + call acc_copyin (h)
>> +
>> + do i = 1, N
>> + h(i) = i + i
>> + end do
>> +
>> + call acc_update_device_async (h, sizeof (h), async)
>> +
>> + if (acc_is_present (h) .neqv. .TRUE.) call abort
>
> Don't we need 'acc_wait' here (while 'acc_update_device_async' may still
> be reading from 'h'), before overwriting 'h' here:
>
>> +
>> + h(:) = 0
>> +
>> + call acc_copyout_async (h, sizeof (h), async)
>> +
>> + call acc_wait (async)
>> +
>> + do i = 1, N
>> + if (h(i) /= i + i) call abort
>> + end do
>> +
>> + call acc_copyin (h, sizeof (h))
>> +
>> + h(:) = 0
>> +
>> + call acc_update_self_async (h, sizeof (h), async)
>> +
>> + if (acc_is_present (h) .neqv. .TRUE.) call abort
>
> Don't we need 'acc_wait' here (to make sure we finish device to host copy
> of 'h'), before evaluating 'h' here:
>
>> +
>> + do i = 1, N
>> + if (h(i) /= i + i) call abort
>> + end do
>> +
>> + call acc_delete_async (h, async)
>> +
>> + call acc_wait (async)
>> +
>> + if (acc_is_present (h) .neqv. .FALSE.) call abort
>> +
>> +end program
>
> Julian has patches for most of these (as part of other commits).
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Fix-OpenACC-async-wait-issues-in-libgomp.oacc-c-c-co.patch --]
[-- Type: text/x-diff, Size: 3375 bytes --]
From 599e275d7e0b3fb79ff704d4cb2d8fdb0231116e Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Tue, 8 Jun 2021 19:32:22 +0200
Subject: [PATCH] Fix OpenACC 'async'/'wait' issues in
'libgomp.oacc-c-c++-common/lib-{94,95}.c',
'libgomp.oacc-fortran/lib-16{,-2}.f90'
Fix-up for r265842 (commit 58168bbf6f8fb456280cca13343a498ad94878c7)
"[OpenACC 2.5, libgomp] Add *_async versions of runtime library API functions".
libgomp/
* testsuite/libgomp.oacc-c-c++-common/lib-94.c: Fix OpenACC
'async'/'wait' issue.
* testsuite/libgomp.oacc-c-c++-common/lib-95.c: Likewise.
* testsuite/libgomp.oacc-fortran/lib-16-2.f90: Likewise.
* testsuite/libgomp.oacc-fortran/lib-16.f90: Likewise.
Co-Authored-By: Julian Brown <julian@codesourcery.com>
---
libgomp/testsuite/libgomp.oacc-c-c++-common/lib-94.c | 4 ++--
libgomp/testsuite/libgomp.oacc-c-c++-common/lib-95.c | 3 ++-
libgomp/testsuite/libgomp.oacc-fortran/lib-16-2.f90 | 4 ++++
libgomp/testsuite/libgomp.oacc-fortran/lib-16.f90 | 4 ++++
4 files changed, 12 insertions(+), 3 deletions(-)
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-94.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-94.c
index 54497237b0c..baa3ac83f04 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-94.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-94.c
@@ -22,10 +22,10 @@ main (int argc, char **argv)
acc_copyin_async (h, N, async);
- memset (h, 0, N);
-
acc_wait (async);
+ memset (h, 0, N);
+
acc_copyout_async (h, N, async + 1);
acc_wait (async + 1);
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-95.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-95.c
index 85b238d78c8..842fb849e79 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-95.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-95.c
@@ -23,10 +23,11 @@ main (int argc, char **argv)
acc_create_async (h, N, q);
acc_memcpy_to_device_async (acc_deviceptr (h), g, N, q);
- memset (&h[0], 0, N);
acc_wait (q);
+ memset (h, 0, N);
+
acc_update_self_async (h, N, q + 1);
acc_delete_async (h, N, q + 1);
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/lib-16-2.f90 b/libgomp/testsuite/libgomp.oacc-fortran/lib-16-2.f90
index ddd557d3be0..2be75dca98c 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/lib-16-2.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/lib-16-2.f90
@@ -27,6 +27,8 @@ program main
if (acc_is_present (h) .neqv. .TRUE.) stop 1
+ call acc_wait (async)
+
h(:) = 0
call acc_copyout_async (h, sizeof (h), async)
@@ -45,6 +47,8 @@ program main
if (acc_is_present (h) .neqv. .TRUE.) stop 3
+ call acc_wait (async)
+
do i = 1, N
if (h(i) /= i + i) stop 4
end do
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/lib-16.f90 b/libgomp/testsuite/libgomp.oacc-fortran/lib-16.f90
index ccd1ce6ee18..fae0d1031ed 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/lib-16.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/lib-16.f90
@@ -27,6 +27,8 @@ program main
if (acc_is_present (h) .neqv. .TRUE.) stop 1
+ call acc_wait (async)
+
h(:) = 0
call acc_copyout_async (h, sizeof (h), async)
@@ -45,6 +47,8 @@ program main
if (acc_is_present (h) .neqv. .TRUE.) stop 3
+ call acc_wait (async)
+
do i = 1, N
if (h(i) /= i + i) stop 4
end do
--
2.30.2
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2021-07-27 9:22 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-10 15:04 [PATCH, OpenACC 2.5, libgomp] Add *_async versions of runtime library API functions Chung-Lin Tang
2018-09-10 17:22 ` Cesar Philippidis
2018-09-11 10:35 ` Chung-Lin Tang
2018-11-30 20:42 ` Thomas Schwinge
2018-11-01 22:01 ` Thomas Schwinge
2021-06-08 17:32 ` Thomas Schwinge
2021-07-27 9:22 ` Thomas Schwinge
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).