From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from esa2.mentor.iphmx.com (esa2.mentor.iphmx.com [68.232.141.98]) by sourceware.org (Postfix) with ESMTPS id 08E8A3858C78 for ; Fri, 24 Mar 2023 15:43:15 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 08E8A3858C78 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com X-IronPort-AV: E=Sophos;i="5.98,288,1673942400"; d="scan'208,223";a="316526" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa2.mentor.iphmx.com with ESMTP; 24 Mar 2023 07:43:14 -0800 IronPort-SDR: 2LRleVDPiQS9mD8OIS27yH1rtNFPb2bdiHzc9W1if+WWtFVJp9pRR2cmJ5PnVoUUxBGepY2uuz KAuy0huJP5NNMJdeKMWGi7VAdEXKj/qdv0sg0bnaRqQ5mBQsjGYmRMoAJY8O5sc3/NvAD0TZOH Q8NWqRIVaJ7yMQaqczv8kgRBWV2j+NdT6Ts4+NLmDLv4OKzWE2EzbV4HcljoxOK2FH6bnKmkk9 gu5Y6rNeAmw/Y55QRIqlDTuzp/iOjMW+DSseZAEwfbZ95K9DEfx/TD6JBFSbsXs3M2RDSxFa+c uGE= From: Thomas Schwinge To: Tobias Burnus , CC: Jakub Jelinek , Tom de Vries , Alexander Monakov Subject: [og12] libgomp: Simplify OpenMP reverse offload host <-> device memory copy implementation (was: [Patch] libgomp/nvptx: Prepare for reverse-offload callback handling) In-Reply-To: <87r0ti9k3o.fsf@euler.schwinge.homeip.net> References: <57b3ae5e-8f15-8bea-fa09-39bccbaa2414@codesourcery.com> <87r0ti9k3o.fsf@euler.schwinge.homeip.net> User-Agent: Notmuch/0.29.3+94~g74c3f1b (https://notmuchmail.org) Emacs/28.2 (x86_64-pc-linux-gnu) Date: Fri, 24 Mar 2023 16:43:00 +0100 Message-ID: <87edpe9muz.fsf@euler.schwinge.homeip.net> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-13.mgc.mentorg.com (139.181.222.13) To svr-ies-mbx-10.mgc.mentorg.com (139.181.222.10) X-Spam-Status: No, score=-11.9 required=5.0 tests=BAYES_00,GIT_PATCH_0,HEADER_FROM_DIFFERENT_DOMAINS,KAM_DMARC_STATUS,RCVD_IN_MSPIKE_H2,SPF_HELO_PASS,SPF_PASS,TXREP 
autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id:

--=-=-=
Content-Type: text/plain; charset="utf-8"

Hi!

On 2023-03-21T16:53:31+0100, I wrote:
> On 2022-08-26T11:07:28+0200, Tobias Burnus wrote:
>> This patch adds initial [OpenMP reverse offload] support for nvptx.
>
>> CUDA does lockup when trying to copy data from the currently running
>> stream; hence, a new stream is generated to do the memory copying.
>
> As part of other work, where I had to touch those special code paths, I
> found that we may reduce complexity a little bit "by using the existing
> 'goacc_asyncqueue' instead of re-coding parts of it".  OK to push
> "libgomp: Simplify OpenMP reverse offload host <-> device memory copy implementation"
> (still testing), see attached?

My other work now actually does depend on this; I've pushed to
devel/omp/gcc-12 branch commit c276fa0616eb79ddc4d0245e775a841e84cbb7dd
"libgomp: Simplify OpenMP reverse offload host <-> device memory copy implementation",
see attached.

May this also go into master branch still at this time, or "next year"?


Regards
 Thomas
-----------------
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634 München; limited liability company (Gesellschaft mit beschränkter Haftung); Managing Directors: Thomas Heurung, Frank Thürauf; Registered office: München; Commercial register: Registergericht München, HRB 106955

--=-=-=
Content-Type: text/x-diff
Content-Disposition: inline; filename="0001-libgomp-Simplify-OpenMP-reverse-offload-host-device-.patch"

>From c276fa0616eb79ddc4d0245e775a841e84cbb7dd Mon Sep 17 00:00:00 2001
From: Thomas Schwinge
Date: Tue, 21 Mar 2023 16:14:16 +0100
Subject: [PATCH] libgomp: Simplify OpenMP reverse offload host <-> device memory copy implementation

... by using the existing 'goacc_asyncqueue' instead of re-coding parts of
it.

Follow-up to commit 131d18e928a3ea1ab2d3bf61aa92d68a8a254609
"libgomp/nvptx: Prepare for reverse-offload callback handling", and
commit ea4b23d9c82d9be3b982c3519fe5e8e9d833a6a8
"libgomp: Handle OpenMP's reverse offloads".

        libgomp/
        * target.c (gomp_target_rev): Instead of 'dev_to_host_cpy',
        'host_to_dev_cpy', 'token', take a single 'goacc_asyncqueue'.
        * libgomp.h (gomp_target_rev): Adjust.
        * libgomp-plugin.c (GOMP_PLUGIN_target_rev): Adjust.
        * libgomp-plugin.h (GOMP_PLUGIN_target_rev): Adjust.
        * plugin/plugin-gcn.c (process_reverse_offload): Adjust.
        * plugin/plugin-nvptx.c (rev_off_dev_to_host_cpy)
        (rev_off_host_to_dev_cpy): Remove.
        (GOMP_OFFLOAD_run): Adjust.
---
 libgomp/ChangeLog.omp         |  10 ++++
 libgomp/libgomp-plugin.c      |   7 +--
 libgomp/libgomp-plugin.h      |   6 +-
 libgomp/libgomp.h             |   5 +-
 libgomp/plugin/plugin-gcn.c   |   2 +-
 libgomp/plugin/plugin-nvptx.c |  77 ++++++++++++++-----------
 libgomp/target.c              | 102 +++++++++++++++-------------------
 7 files changed, 106 insertions(+), 103 deletions(-)

diff --git a/libgomp/ChangeLog.omp b/libgomp/ChangeLog.omp
index 9360db66b03..fb352b39a6d 100644
--- a/libgomp/ChangeLog.omp
+++ b/libgomp/ChangeLog.omp
@@ -1,5 +1,15 @@
 2023-03-24  Thomas Schwinge
 
+        * target.c (gomp_target_rev): Instead of 'dev_to_host_cpy',
+        'host_to_dev_cpy', 'token', take a single 'goacc_asyncqueue'.
+        * libgomp.h (gomp_target_rev): Adjust.
+        * libgomp-plugin.c (GOMP_PLUGIN_target_rev): Adjust.
+        * libgomp-plugin.h (GOMP_PLUGIN_target_rev): Adjust.
+        * plugin/plugin-gcn.c (process_reverse_offload): Adjust.
+        * plugin/plugin-nvptx.c (rev_off_dev_to_host_cpy)
+        (rev_off_host_to_dev_cpy): Remove.
+        (GOMP_OFFLOAD_run): Adjust.
+
         * target.c (gomp_unmap_vars_internal): Queue splay-tree keys for
         removal after main loop.
 
diff --git a/libgomp/libgomp-plugin.c b/libgomp/libgomp-plugin.c
index 316de749f69..c76fa63da83 100644
--- a/libgomp/libgomp-plugin.c
+++ b/libgomp/libgomp-plugin.c
@@ -82,11 +82,8 @@ GOMP_PLUGIN_fatal (const char *msg, ...)
 void
 GOMP_PLUGIN_target_rev (uint64_t fn_ptr, uint64_t mapnum, uint64_t devaddrs_ptr,
                         uint64_t sizes_ptr, uint64_t kinds_ptr, int dev_num,
-                        void (*dev_to_host_cpy) (void *, const void *, size_t,
-                                                 void *),
-                        void (*host_to_dev_cpy) (void *, const void *, size_t,
-                                                 void *), void *token)
+                        struct goacc_asyncqueue *aq)
 {
   gomp_target_rev (fn_ptr, mapnum, devaddrs_ptr, sizes_ptr, kinds_ptr, dev_num,
-                   dev_to_host_cpy, host_to_dev_cpy, token);
+                   aq);
 }
diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h
index 66d995f33e8..ca557a79380 100644
--- a/libgomp/libgomp-plugin.h
+++ b/libgomp/libgomp-plugin.h
@@ -122,11 +122,7 @@ extern void GOMP_PLUGIN_fatal (const char *, ...)
         __attribute__ ((noreturn, format (printf, 1, 2)));
 
 extern void GOMP_PLUGIN_target_rev (uint64_t, uint64_t, uint64_t, uint64_t,
-                                    uint64_t, int,
-                                    void (*) (void *, const void *, size_t,
-                                              void *),
-                                    void (*) (void *, const void *, size_t,
-                                              void *), void *);
+                                    uint64_t, int, struct goacc_asyncqueue *);
 
 /* Prototypes for functions implemented by libgomp plugins.  */
 extern const char *GOMP_OFFLOAD_get_name (void);
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 92f6f14960f..3b2b4aa9534 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -1127,10 +1127,7 @@ extern void gomp_init_targets_once (void);
 extern int gomp_get_num_devices (void);
 extern bool gomp_target_task_fn (void *);
 extern void gomp_target_rev (uint64_t, uint64_t, uint64_t, uint64_t, uint64_t,
-                             int,
-                             void (*) (void *, const void *, size_t, void *),
-                             void (*) (void *, const void *, size_t, void *),
-                             void *);
+                             int, struct goacc_asyncqueue *);
 extern void * gomp_usm_alloc (size_t size, int device_num);
 extern void gomp_usm_free (void *device_ptr, int device_num);
 extern bool gomp_page_locked_host_alloc (void **, size_t);
diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 64694cdc118..82f5940f970 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -2008,7 +2008,7 @@ process_reverse_offload (uint64_t fn, uint64_t mapnum, uint64_t hostaddrs,
 {
   int dev_num = dev_num64;
   GOMP_PLUGIN_target_rev (fn, mapnum, hostaddrs, sizes, kinds, dev_num,
-                          NULL, NULL, NULL);
+                          NULL);
 }
 
 /* Output any data written to console output from the kernel.  It is expected
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 6ade34beb67..23f89b6fb34 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -56,6 +56,7 @@
 #include
 #include
 #include
+#include
 
 /* An arbitrary fixed limit (128MB) for the size of the OpenMP soft stacks
    block to cache between kernel invocations.  For soft-stacks blocks bigger
@@ -1837,11 +1838,11 @@ GOMP_OFFLOAD_openacc_cuda_set_stream (struct goacc_asyncqueue *aq, void *stream)
   return 1;
 }
 
-struct goacc_asyncqueue *
-GOMP_OFFLOAD_openacc_async_construct (int device __attribute__((unused)))
+static struct goacc_asyncqueue *
+nvptx_goacc_asyncqueue_construct (unsigned int flags)
 {
   CUstream stream = NULL;
-  CUDA_CALL_ERET (NULL, cuStreamCreate, &stream, CU_STREAM_DEFAULT);
+  CUDA_CALL_ERET (NULL, cuStreamCreate, &stream, flags);
 
   struct goacc_asyncqueue *aq
     = GOMP_PLUGIN_malloc (sizeof (struct goacc_asyncqueue));
@@ -1849,14 +1850,26 @@ GOMP_OFFLOAD_openacc_async_construct (int device __attribute__((unused)))
   return aq;
 }
 
-bool
-GOMP_OFFLOAD_openacc_async_destruct (struct goacc_asyncqueue *aq)
+struct goacc_asyncqueue *
+GOMP_OFFLOAD_openacc_async_construct (int device __attribute__((unused)))
+{
+  return nvptx_goacc_asyncqueue_construct (CU_STREAM_DEFAULT);
+}
+
+static bool
+nvptx_goacc_asyncqueue_destruct (struct goacc_asyncqueue *aq)
 {
   CUDA_CALL_ERET (false, cuStreamDestroy, aq->cuda_stream);
   free (aq);
   return true;
 }
 
+bool
+GOMP_OFFLOAD_openacc_async_destruct (struct goacc_asyncqueue *aq)
+{
+  return nvptx_goacc_asyncqueue_destruct (aq);
+}
+
 int
 GOMP_OFFLOAD_openacc_async_test (struct goacc_asyncqueue *aq)
 {
@@ -1870,13 +1883,19 @@ GOMP_OFFLOAD_openacc_async_test (struct goacc_asyncqueue *aq)
   return -1;
 }
 
-bool
-GOMP_OFFLOAD_openacc_async_synchronize (struct goacc_asyncqueue *aq)
+static bool
+nvptx_goacc_asyncqueue_synchronize (struct goacc_asyncqueue *aq)
 {
   CUDA_CALL_ERET (false, cuStreamSynchronize, aq->cuda_stream);
   return true;
 }
 
+bool
+GOMP_OFFLOAD_openacc_async_synchronize (struct goacc_asyncqueue *aq)
+{
+  return nvptx_goacc_asyncqueue_synchronize (aq);
+}
+
 bool
 GOMP_OFFLOAD_openacc_async_serialize (struct goacc_asyncqueue *aq1,
                                       struct goacc_asyncqueue *aq2)
@@ -2136,22 +2155,6 @@ nvptx_stacks_acquire (struct ptx_device *ptx_dev, size_t size, int num)
 }
 
 
-void
-rev_off_dev_to_host_cpy (void *dest, const void *src, size_t size,
-                         CUstream stream)
-{
-  CUDA_CALL_ASSERT (cuMemcpyDtoHAsync, dest, (CUdeviceptr) src, size, stream);
-  CUDA_CALL_ASSERT (cuStreamSynchronize, stream);
-}
-
-void
-rev_off_host_to_dev_cpy (void *dest, const void *src, size_t size,
-                         CUstream stream)
-{
-  CUDA_CALL_ASSERT (cuMemcpyHtoDAsync, (CUdeviceptr) dest, src, size, stream);
-  CUDA_CALL_ASSERT (cuStreamSynchronize, stream);
-}
-
 void
 GOMP_OFFLOAD_run (int ord, void *tgt_fn, void *tgt_vars, void **args)
 {
@@ -2185,9 +2188,17 @@ GOMP_OFFLOAD_run (int ord, void *tgt_fn, void *tgt_vars, void **args)
     }
   nvptx_adjust_launch_bounds (tgt_fn, ptx_dev, &teams, &threads);
 
-  size_t stack_size = nvptx_stacks_size ();
   bool reverse_offload = ptx_dev->rev_data != NULL;
-  CUstream copy_stream = NULL;
+  struct goacc_asyncqueue *reverse_offload_aq = NULL;
+  if (reverse_offload)
+    {
+      reverse_offload_aq
+        = nvptx_goacc_asyncqueue_construct (CU_STREAM_NON_BLOCKING);
+      if (!reverse_offload_aq)
+        exit (EXIT_FAILURE);
+    }
+
+  size_t stack_size = nvptx_stacks_size ();
 
   pthread_mutex_lock (&ptx_dev->omp_stacks.lock);
   void *stacks = nvptx_stacks_acquire (ptx_dev, stack_size, teams * threads);
@@ -2201,8 +2212,6 @@ GOMP_OFFLOAD_run (int ord, void *tgt_fn, void *tgt_vars, void **args)
   GOMP_PLUGIN_debug (0, " %s: kernel %s: launch"
                      " [(teams: %u), 1, 1] [(lanes: 32), (threads: %u), 1]\n",
                      __FUNCTION__, fn_name, teams, threads);
-  if (reverse_offload)
-    CUDA_CALL_ASSERT (cuStreamCreate, &copy_stream, CU_STREAM_NON_BLOCKING);
   r = CUDA_CALL_NOCHECK (cuLaunchKernel, function, teams, 1, 1,
                          32, threads, 1, lowlat_pool_size, NULL, NULL, config);
   if (r != CUDA_SUCCESS)
@@ -2225,17 +2234,15 @@ GOMP_OFFLOAD_run (int ord, void *tgt_fn, void *tgt_vars, void **args)
             GOMP_PLUGIN_target_rev (rev_data->fn, rev_data->mapnum,
                                     rev_data->addrs, rev_data->sizes,
                                     rev_data->kinds, rev_data->dev_num,
-                                    rev_off_dev_to_host_cpy,
-                                    rev_off_host_to_dev_cpy, copy_stream);
-            CUDA_CALL_ASSERT (cuStreamSynchronize, copy_stream);
+                                    reverse_offload_aq);
+            if (!nvptx_goacc_asyncqueue_synchronize (reverse_offload_aq))
+              exit (EXIT_FAILURE);
             __atomic_store_n (&rev_data->fn, 0, __ATOMIC_RELEASE);
           }
         usleep (1);
       }
   else
     r = CUDA_CALL_NOCHECK (cuCtxSynchronize, );
-  if (reverse_offload)
-    CUDA_CALL_ASSERT (cuStreamDestroy, copy_stream);
   if (r == CUDA_ERROR_LAUNCH_FAILED)
     GOMP_PLUGIN_fatal ("cuCtxSynchronize error: %s %s\n", cuda_error (r),
                        maybe_abort_msg);
@@ -2243,6 +2250,12 @@ GOMP_OFFLOAD_run (int ord, void *tgt_fn, void *tgt_vars, void **args)
     GOMP_PLUGIN_fatal ("cuCtxSynchronize error: %s", cuda_error (r));
 
   pthread_mutex_unlock (&ptx_dev->omp_stacks.lock);
+
+  if (reverse_offload)
+    {
+      if (!nvptx_goacc_asyncqueue_destruct (reverse_offload_aq))
+        exit (EXIT_FAILURE);
+    }
 }
 
 /* TODO: Implement GOMP_OFFLOAD_async_run. */
diff --git a/libgomp/target.c b/libgomp/target.c
index 107c3567a30..2f53f056e53 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -3527,9 +3527,7 @@ gomp_map_cdata_lookup (struct cpy_data *d, uint64_t *devaddrs,
 void
 gomp_target_rev (uint64_t fn_ptr, uint64_t mapnum, uint64_t devaddrs_ptr,
                  uint64_t sizes_ptr, uint64_t kinds_ptr, int dev_num,
-                 void (*dev_to_host_cpy) (void *, const void *, size_t, void*),
-                 void (*host_to_dev_cpy) (void *, const void *, size_t, void*),
-                 void *token)
+                 struct goacc_asyncqueue *aq)
 {
   /* Return early if there is no offload code.  */
   if (sizeof (OFFLOAD_PLUGINS) == sizeof (""))
@@ -3571,26 +3569,17 @@ gomp_target_rev (uint64_t fn_ptr, uint64_t mapnum, uint64_t devaddrs_ptr,
       devaddrs = (uint64_t *) gomp_malloc (mapnum * sizeof (uint64_t));
       sizes = (uint64_t *) gomp_malloc (mapnum * sizeof (uint64_t));
       kinds = (unsigned short *) gomp_malloc (mapnum * sizeof (unsigned short));
-      if (dev_to_host_cpy)
-        {
-          dev_to_host_cpy (devaddrs, (const void *) (uintptr_t) devaddrs_ptr,
-                           mapnum * sizeof (uint64_t), token);
-          dev_to_host_cpy (sizes, (const void *) (uintptr_t) sizes_ptr,
-                           mapnum * sizeof (uint64_t), token);
-          dev_to_host_cpy (kinds, (const void *) (uintptr_t) kinds_ptr,
-                           mapnum * sizeof (unsigned short), token);
-        }
-      else
-        {
-          gomp_copy_dev2host (devicep, NULL, devaddrs,
-                              (const void *) (uintptr_t) devaddrs_ptr,
-                              mapnum * sizeof (uint64_t));
-          gomp_copy_dev2host (devicep, NULL, sizes,
-                              (const void *) (uintptr_t) sizes_ptr,
-                              mapnum * sizeof (uint64_t));
-          gomp_copy_dev2host (devicep, NULL, kinds, (const void *) (uintptr_t) kinds_ptr,
-                              mapnum * sizeof (unsigned short));
-        }
+      gomp_copy_dev2host (devicep, aq, devaddrs,
+                          (const void *) (uintptr_t) devaddrs_ptr,
+                          mapnum * sizeof (uint64_t));
+      gomp_copy_dev2host (devicep, aq, sizes,
+                          (const void *) (uintptr_t) sizes_ptr,
+                          mapnum * sizeof (uint64_t));
+      gomp_copy_dev2host (devicep, aq, kinds,
+                          (const void *) (uintptr_t) kinds_ptr,
+                          mapnum * sizeof (unsigned short));
+      if (aq && !devicep->openacc.async.synchronize_func (aq))
+        exit (EXIT_FAILURE);
     }
 
   size_t tgt_align = 0, tgt_size = 0;
@@ -3617,13 +3606,14 @@ gomp_target_rev (uint64_t fn_ptr, uint64_t mapnum, uint64_t devaddrs_ptr,
               if (devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
                 memcpy (tgt + tgt_size, (void *) (uintptr_t) devaddrs[i],
                         (size_t) sizes[i]);
-              else if (dev_to_host_cpy)
-                dev_to_host_cpy (tgt + tgt_size, (void *) (uintptr_t) devaddrs[i],
-                                 (size_t) sizes[i], token);
               else
-                gomp_copy_dev2host (devicep, NULL, tgt + tgt_size,
-                                    (void *) (uintptr_t) devaddrs[i],
-                                    (size_t) sizes[i]);
+                {
+                  gomp_copy_dev2host (devicep, aq, tgt + tgt_size,
+                                      (void *) (uintptr_t) devaddrs[i],
+                                      (size_t) sizes[i]);
+                  if (aq && !devicep->openacc.async.synchronize_func (aq))
+                    exit (EXIT_FAILURE);
+                }
               devaddrs[i] = (uint64_t) (uintptr_t) tgt + tgt_size;
               tgt_size = tgt_size + sizes[i];
               if ((devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
@@ -3735,15 +3725,15 @@ gomp_target_rev (uint64_t fn_ptr, uint64_t mapnum, uint64_t devaddrs_ptr,
                     || kind == GOMP_MAP_FORCE_TOFROM
                     || GOMP_MAP_ALWAYS_TO_P (kind))
                   {
-                    if (dev_to_host_cpy)
-                      dev_to_host_cpy ((void *) (uintptr_t) devaddrs[i],
-                                       (void *) (uintptr_t) cdata[i].devaddr,
-                                       sizes[i], token);
-                    else
-                      gomp_copy_dev2host (devicep, NULL,
-                                          (void *) (uintptr_t) devaddrs[i],
-                                          (void *) (uintptr_t) cdata[i].devaddr,
-                                          sizes[i]);
+                    gomp_copy_dev2host (devicep, aq,
+                                        (void *) (uintptr_t) devaddrs[i],
+                                        (void *) (uintptr_t) cdata[i].devaddr,
+                                        sizes[i]);
+                    if (aq && !devicep->openacc.async.synchronize_func (aq))
+                      {
+                        gomp_mutex_unlock (&devicep->lock);
+                        exit (EXIT_FAILURE);
+                      }
                   }
                 if (struct_cpy)
                   struct_cpy--;
@@ -3810,15 +3800,15 @@ gomp_target_rev (uint64_t fn_ptr, uint64_t mapnum, uint64_t devaddrs_ptr,
                     devaddrs[i]
                       = (uint64_t) (uintptr_t) gomp_aligned_alloc (align,
                                                                    sizes[i]);
-                    if (dev_to_host_cpy)
-                      dev_to_host_cpy ((void *) (uintptr_t) devaddrs[i],
-                                       (void *) (uintptr_t) cdata[i].devaddr,
-                                       sizes[i], token);
-                    else
-                      gomp_copy_dev2host (devicep, NULL,
-                                          (void *) (uintptr_t) devaddrs[i],
-                                          (void *) (uintptr_t) cdata[i].devaddr,
-                                          sizes[i]);
+                    gomp_copy_dev2host (devicep, aq,
+                                        (void *) (uintptr_t) devaddrs[i],
+                                        (void *) (uintptr_t) cdata[i].devaddr,
+                                        sizes[i]);
+                    if (aq && !devicep->openacc.async.synchronize_func (aq))
+                      {
+                        gomp_mutex_unlock (&devicep->lock);
+                        exit (EXIT_FAILURE);
+                      }
                   }
                 for (j = i + 1; j < mapnum; j++)
                   {
@@ -3926,15 +3916,15 @@ gomp_target_rev (uint64_t fn_ptr, uint64_t mapnum, uint64_t devaddrs_ptr,
             /* FALLTHRU */
           case GOMP_MAP_FROM:
           case GOMP_MAP_TOFROM:
-            if (copy && host_to_dev_cpy)
-              host_to_dev_cpy ((void *) (uintptr_t) cdata[i].devaddr,
-                               (void *) (uintptr_t) devaddrs[i],
-                               sizes[i], token);
-            else if (copy)
-              gomp_copy_host2dev (devicep, NULL,
-                                  (void *) (uintptr_t) cdata[i].devaddr,
-                                  (void *) (uintptr_t) devaddrs[i],
-                                  sizes[i], false, NULL);
+            if (copy)
+              {
+                gomp_copy_host2dev (devicep, aq,
+                                    (void *) (uintptr_t) cdata[i].devaddr,
+                                    (void *) (uintptr_t) devaddrs[i],
+                                    sizes[i], false, NULL);
+                if (aq && !devicep->openacc.async.synchronize_func (aq))
+                  exit (EXIT_FAILURE);
+              }
           default:
             break;
           }
-- 
2.25.1

--=-=-=--
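
For readers who want the shape of the change without the libgomp context, here is a minimal standalone sketch of the pattern the patch moves to: one optional queue argument in place of two copy callbacks plus a token, with an explicit synchronize after each enqueued copy.  Everything named 'demo_*' is hypothetical and made up for illustration; 'struct demo_queue' only mimics the role of 'goacc_asyncqueue' with plain 'memcpy' and is not libgomp's or CUDA's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an asynchronous queue: copies are only
   recorded when enqueued and carried out on synchronize.  */
#define DEMO_MAX_PENDING 8

struct demo_copy { void *dst; const void *src; size_t size; };

struct demo_queue
{
  struct demo_copy pending[DEMO_MAX_PENDING];
  size_t n_pending;
};

/* Copy immediately if Q is NULL, otherwise enqueue the copy on Q.
   Returns false if the queue is full.  */
static bool
demo_copy (struct demo_queue *q, void *dst, const void *src, size_t size)
{
  if (q == NULL)
    {
      memcpy (dst, src, size);
      return true;
    }
  if (q->n_pending == DEMO_MAX_PENDING)
    return false;
  q->pending[q->n_pending++] = (struct demo_copy) { dst, src, size };
  return true;
}

/* Carry out all pending copies on Q, in order.  */
static bool
demo_queue_synchronize (struct demo_queue *q)
{
  assert (q != NULL);
  for (size_t i = 0; i < q->n_pending; i++)
    memcpy (q->pending[i].dst, q->pending[i].src, q->pending[i].size);
  q->n_pending = 0;
  return true;
}

/* Caller-side pattern: a single code path for both cases; synchronize
   only if a queue was passed in.  Returns 0 on success, -1 on failure.  */
static int
demo_fetch (struct demo_queue *q, int *host, const int *dev, size_t n)
{
  if (!demo_copy (q, host, dev, n * sizeof (int)))
    return -1;
  if (q && !demo_queue_synchronize (q))
    return -1;
  return 0;
}

/* Exercise both the NULL-queue and the queued path; returns 0 if both
   produce the same host data.  */
int
demo_roundtrip (void)
{
  const int dev[3] = { 1, 2, 3 };
  int host_sync[3] = { 0 }, host_async[3] = { 0 };
  struct demo_queue q = { .n_pending = 0 };

  if (demo_fetch (NULL, host_sync, dev, 3) != 0
      || demo_fetch (&q, host_async, dev, 3) != 0)
    return -1;
  return memcmp (host_sync, host_async, sizeof dev);
}
```

Calling 'demo_roundtrip ()' from a 'main' is enough to try it.  The point of the shape is that callers such as 'demo_fetch' no longer branch on "callback or default copy"; passing NULL for the queue selects the immediate path, much as the GCN plugin passes NULL for 'aq' in the patch.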