From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from esa3.mentor.iphmx.com (esa3.mentor.iphmx.com [68.232.137.180]) by sourceware.org (Postfix) with ESMTPS id 12B343839C52 for ; Tue, 27 Jul 2021 10:01:30 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 12B343839C52 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com IronPort-SDR: qwCK8Ty0MC8cFswZ+rZ2Jix20DIS2Hf+JE+XDlPQU+JEiSK5T0hJwrrcIpSGIvd3TrWjnlo5hq IpQhD2VrNQMcFv8K0w+B37CSsUtyktj1a7TPZ/krm2rJgc4iu+eq9OHUNkgEP4g1l8fvr12+rA y3HTmFB2IsodW4xbh2HkxUNBZxaLoE+WDt7h1CVhHpY6VWRMviXEwjbJ6wGF/dBOHdMS9LHz8P CNzyW7JboIrgW45z43Zkc5OU+25u+CNCUl2to3TIr9AZUuFmHQGTLG+vLCaQpE3u8ayXL/510g 3JnIf++zW0u5wUXOkc3Co0wG X-IronPort-AV: E=Sophos;i="5.84,273,1620720000"; d="scan'208,223";a="63950488" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa3.mentor.iphmx.com with ESMTP; 27 Jul 2021 02:01:29 -0800 IronPort-SDR: hS0rYPG/4Us7hRHwRb6NfR9eWWqMreQNr/qVcNmuDWhqiC+/OD2lEb0OX1D1NGpk3rMMBMX4Jb 4S3nDWrVcqN2/S23wVPHQu6Qjm9Xz621pb4YZl37K3hGB0/rDnUNe+ccTp0FVPt6tDLtOj5Gzh COZ0g3C5Lh7r1ZpHouU9b9V2S1vyxTWFavZeamEzXYwpwnSYVrf9SGet7sxCBUuQ336AHL8MeX So5naf3rL1cnBOAaJwHjO65Xb73OhLTNYe8lh3W5TEu9nWdC6L3cuIls6LWMLlECEg520YtatI wKI= From: Thomas Schwinge To: Julian Brown , CC: Jakub Jelinek , Chung-Lin Tang , Tom de Vries Subject: Re: [PATCH 3/4] openacc: Fix asynchronous host-to-device copies in libgomp runtime In-Reply-To: <560623585b89e577c9e76d8b53939569f49064fa.1624987598.git.julian@codesourcery.com> References: <560623585b89e577c9e76d8b53939569f49064fa.1624987598.git.julian@codesourcery.com> User-Agent: Notmuch/0.29.3+94~g74c3f1b (https://notmuchmail.org) Emacs/27.1 (x86_64-pc-linux-gnu) Date: Tue, 27 Jul 2021 12:01:18 +0200 Message-ID: <87r1fkt6s1.fsf@euler.schwinge.homeip.net> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-02.mgc.mentorg.com (139.181.222.2) To svr-ies-mbx-01.mgc.mentorg.com (139.181.222.1) X-Spam-Status: No, score=-12.2 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, KAM_LOTSOFHASH, KAM_SHORT, RCVD_IN_MSPIKE_H2, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 27 Jul 2021 10:01:33 -0000 --=-=-= Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Hi! On 2021-06-29T16:42:03-0700, Julian Brown wrote: > This patch fixes several places in libgomp/target.c where "ephemeral" dat= a > (on the stack or in temporary heap locations) may be used as the source o= f > an asynchronous host-to-device copy that may not complete before the host > data disappears. Versions of the patch have been posted several times > before, but this one (at Chung-Lin Tang's prior suggesion, IIRC) moves > all logic into target.c rather than pushing it out to each target plugin. Thanks for the re-work! > An existing, but flawed, workaround for this problem in the AMD GCN > libgomp offloading plugin is currently present on mainline, and was > posted for the og9 branch here: > > https://gcc.gnu.org/legacy-ml/gcc-patches/2019-08/msg00901.html > > and previous versions of this patch were posted here (for mainline/og9): > > https://gcc.gnu.org/legacy-ml/gcc-patches/2019-11/msg01482.html > https://gcc.gnu.org/legacy-ml/gcc-patches/2019-09/msg01026.html ... but this version here I like best! ;-) > This patch exposes a problem with OpenACC profiling support that is > fixed by the next patch in the series. The acc_prof-parallel-1.c test > is XFAILed for now. (Ought to XFAIL 'libgomp.oacc-c-c++-common/acc_prof-parallel-1.c' for GCN only. Also, 'libgomp.oacc-c-c++-common/acc_prof-init-1.c' did similarly FAIL for GCN, intermittently.) But, actually no XFAILing is necessary, given my recent commit 29ddaf43f70e19fd1110b539e8b3d0436c757e34 "[OpenACC] Clarify sequencing of 'async' data copying vs. profiling events in 'libgomp.oacc-c-c++-common/acc_prof-{init,parallel}-1.c'". A few more comments: > --- a/libgomp/plugin/plugin-gcn.c > +++ b/libgomp/plugin/plugin-gcn.c > @@ -2934,7 +2931,7 @@ gomp_offload_free (void *ptr) > > static void > queue_push_copy (struct goacc_asyncqueue *aq, void *dst, const void *src= , > - size_t len, bool free_src) > + size_t len) Also have to update function comment. > @@ -3916,15 +3912,7 @@ GOMP_OFFLOAD_openacc_async_host2dev (int device, v= oid *dst, const void *src, > { > struct agent_info *agent =3D get_agent_info (device); > assert (agent =3D=3D aq->agent); > - /* The source data does not necessarily remain live until the deferred > - copy happens. Taking a snapshot of the data here avoids reading > - uninitialised data later, but means that (a) data is copied twice a= nd > - (b) modifications to the copied data between the "spawning" point o= f > - the asynchronous kernel and when it is executed will not be seen. > - But, that is probably correct. */ > - void *src_copy =3D GOMP_PLUGIN_malloc (n); > - memcpy (src_copy, src, n); > - queue_push_copy (aq, dst, src_copy, n, true); > + queue_push_copy (aq, dst, src, n); > return true; > } :-) > --- a/libgomp/target.c > +++ b/libgomp/target.c > -static inline void > -goacc_device_copy_async (struct gomp_device_descr *devicep, > - bool (*copy_func) (int, void *, const void *, size= _t, > - struct goacc_asyncqueue *), > - const char *dst, void *dstaddr, > - const char *src, const void *srcaddr, > - size_t size, struct goacc_asyncqueue *aq) > -{ > - if (!copy_func (devicep->target_id, dstaddr, srcaddr, size, aq)) > - { > - gomp_mutex_unlock (&devicep->lock); > - gomp_fatal ("Copying of %s object [%p..%p) to %s object [%p..%p) f= ailed", > - src, srcaddr, srcaddr + size, dst, dstaddr, dstaddr + siz= e); > - } > -} For symmetry with 'gomp_device_copy', I prefer to keep (and thus restored) 'goacc_device_copy_async', adding a 'srcaddr_orig' parameter for error reporting purposes. Pushed 'Fix OpenACC "ephemeral" asynchronous host-to-device copies' to master branch in commit 9c41f5b9cddd93f1b56eb71bff87b255d37d16f4, see attached. Removes GCN XFAIL 'libgomp.oacc-c-c++-common/async-data-1-1.c'. > +/* Copy host memory to an offload device. In asynchronous mode (if AQ i= s > + non-NULL), when the source data is stack or may otherwise be dealloca= ted > + before the asynchronous copy takes place, EPHEMERAL must be passed as > + TRUE. The CBUF isn't used for non-ephemeral asynchronous copies, bec= ause > + the host data might not be computed yet (by an earlier asynchronous c= ompute > + region). */ > + > [gomp_copy_host2dev] Code changes related to the latter sentence have moved into a separate "Don't use libgomp 'cbuf' buffering with OpenACC 'async'", pushed to master branch in commit d88a6951586c7229b25708f4486eaaf4bf4b5bbe, see attached. Removes GCN XFAIL 'libgomp.oacc-c-c++-common/async-data-1-2.c'. Gr=C3=BC=C3=9Fe Thomas ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstra=C3=9Fe 201= , 80634 M=C3=BCnchen; Gesellschaft mit beschr=C3=A4nkter Haftung; Gesch=C3= =A4ftsf=C3=BChrer: Thomas Heurung, Frank Th=C3=BCrauf; Sitz der Gesellschaf= t: M=C3=BCnchen; Registergericht M=C3=BCnchen, HRB 106955 --=-=-= Content-Type: text/x-diff Content-Disposition: inline; filename="0001-Fix-OpenACC-ephemeral-asynchronous-host-to-device-co.patch" >From 9c41f5b9cddd93f1b56eb71bff87b255d37d16f4 Mon Sep 17 00:00:00 2001 From: Julian Brown Date: Tue, 29 Jun 2021 16:42:03 -0700 Subject: [PATCH 1/2] Fix OpenACC "ephemeral" asynchronous host-to-device copies This patch fixes several places in libgomp/target.c where "ephemeral" data (on the stack or in temporary heap locations) may be used as the source of an asynchronous host-to-device copy that may not complete before the host data disappears. An existing, but flawed, workaround for this problem in the AMD GCN libgomp offloading plugin is currently present on mainline, and was posted for the og9 branch here: https://gcc.gnu.org/legacy-ml/gcc-patches/2019-08/msg00901.html and previous versions of this patch were posted here (for mainline/og9): https://gcc.gnu.org/legacy-ml/gcc-patches/2019-11/msg01482.html https://gcc.gnu.org/legacy-ml/gcc-patches/2019-09/msg01026.html libgomp/ * libgomp.h (gomp_copy_host2dev): Update prototype. * oacc-mem.c (memcpy_tofrom_device, update_dev_host): Add new argument to gomp_copy_host2dev (false). * plugin/plugin-gcn.c (struct copy_data): Remove free_src field. (copy_data): Don't free src. (queue_push_copy): Remove free_src handling. (GOMP_OFFLOAD_dev2dev): Update call to queue_push_copy. (GOMP_OFFLOAD_openacc_async_host2dev): Remove source-data snapshotting. (GOMP_OFFLOAD_openacc_async_dev2host): Update call to queue_push_copy. * target.c (goacc_device_copy_async): Add SRCADDR_ORIG parameter. (gomp_copy_host2dev): Add EPHEMERAL parameter. Snapshot source data when true, and set up deferred freeing of temporary buffer. (gomp_copy_dev2host): Update call to goacc_device_copy_async. (gomp_map_vars_existing, gomp_map_pointer, gomp_attach_pointer) (gomp_detach_pointer, gomp_map_vars_internal, gomp_update): Update calls to gomp_copy_host2dev with appropriate ephemeral argument. * testsuite/libgomp.oacc-c-c++-common/async-data-1-1.c: Remove XFAIL. Co-Authored-By: Thomas Schwinge --- libgomp/libgomp.h | 2 +- libgomp/oacc-mem.c | 4 +- libgomp/plugin/plugin-gcn.c | 23 ++---- libgomp/target.c | 77 ++++++++++++++----- .../async-data-1-1.c | 2 - 5 files changed, 64 insertions(+), 44 deletions(-) diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h index 8d25dc8e2a8..e8901da1069 100644 --- a/libgomp/libgomp.h +++ b/libgomp/libgomp.h @@ -1226,7 +1226,7 @@ extern void gomp_acc_declare_allocate (bool, size_t, void **, size_t *, struct gomp_coalesce_buf; extern void gomp_copy_host2dev (struct gomp_device_descr *, struct goacc_asyncqueue *, void *, const void *, - size_t, struct gomp_coalesce_buf *); + size_t, bool, struct gomp_coalesce_buf *); extern void gomp_copy_dev2host (struct gomp_device_descr *, struct goacc_asyncqueue *, void *, const void *, size_t); diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c index c21508f3739..5988db0b886 100644 --- a/libgomp/oacc-mem.c +++ b/libgomp/oacc-mem.c @@ -202,7 +202,7 @@ memcpy_tofrom_device (bool from, void *d, void *h, size_t s, int async, if (from) gomp_copy_dev2host (thr->dev, aq, h, d, s); else - gomp_copy_host2dev (thr->dev, aq, d, h, s, /* TODO: cbuf? */ NULL); + gomp_copy_host2dev (thr->dev, aq, d, h, s, false, /* TODO: cbuf? */ NULL); if (profiling_p) { @@ -874,7 +874,7 @@ update_dev_host (int is_dev, void *h, size_t s, int async) goacc_aq aq = get_goacc_asyncqueue (async); if (is_dev) - gomp_copy_host2dev (acc_dev, aq, d, h, s, /* TODO: cbuf? */ NULL); + gomp_copy_host2dev (acc_dev, aq, d, h, s, false, /* TODO: cbuf? */ NULL); else gomp_copy_dev2host (acc_dev, aq, h, d, s); diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c index cfed42a2d4d..2548614a2e5 100644 --- a/libgomp/plugin/plugin-gcn.c +++ b/libgomp/plugin/plugin-gcn.c @@ -292,7 +292,6 @@ struct copy_data void *dst; const void *src; size_t len; - bool free_src; struct goacc_asyncqueue *aq; }; @@ -2914,8 +2913,6 @@ copy_data (void *data_) data->aq->agent->device_id, data->aq->id, data->len, data->src, data->dst); hsa_memory_copy_wrapper (data->dst, data->src, data->len); - if (data->free_src) - free ((void *) data->src); free (data); } @@ -2929,12 +2926,11 @@ gomp_offload_free (void *ptr) } /* Request an asynchronous data copy, to or from a device, on a given queue. - The event will be registered as a callback. If FREE_SRC is true - then the source data will be freed following the copy. */ + The event will be registered as a callback. */ static void queue_push_copy (struct goacc_asyncqueue *aq, void *dst, const void *src, - size_t len, bool free_src) + size_t len) { if (DEBUG_QUEUES) GCN_DEBUG ("queue_push_copy %d:%d: %zu bytes from (%p) to (%p)\n", @@ -2944,7 +2940,6 @@ queue_push_copy (struct goacc_asyncqueue *aq, void *dst, const void *src, data->dst = dst; data->src = src; data->len = len; - data->free_src = free_src; data->aq = aq; queue_push_callback (aq, copy_data, data); } @@ -3646,7 +3641,7 @@ GOMP_OFFLOAD_dev2dev (int device, void *dst, const void *src, size_t n) { struct agent_info *agent = get_agent_info (device); maybe_init_omp_async (agent); - queue_push_copy (agent->omp_async_queue, dst, src, n, false); + queue_push_copy (agent->omp_async_queue, dst, src, n); return true; } @@ -3916,15 +3911,7 @@ GOMP_OFFLOAD_openacc_async_host2dev (int device, void *dst, const void *src, { struct agent_info *agent = get_agent_info (device); assert (agent == aq->agent); - /* The source data does not necessarily remain live until the deferred - copy happens. Taking a snapshot of the data here avoids reading - uninitialised data later, but means that (a) data is copied twice and - (b) modifications to the copied data between the "spawning" point of - the asynchronous kernel and when it is executed will not be seen. - But, that is probably correct. */ - void *src_copy = GOMP_PLUGIN_malloc (n); - memcpy (src_copy, src, n); - queue_push_copy (aq, dst, src_copy, n, true); + queue_push_copy (aq, dst, src, n); return true; } @@ -3936,7 +3923,7 @@ GOMP_OFFLOAD_openacc_async_dev2host (int device, void *dst, const void *src, { struct agent_info *agent = get_agent_info (device); assert (agent == aq->agent); - queue_push_copy (aq, dst, src, n, false); + queue_push_copy (aq, dst, src, n); return true; } diff --git a/libgomp/target.c b/libgomp/target.c index bb09d501dd6..5576e57f822 100644 --- a/libgomp/target.c +++ b/libgomp/target.c @@ -214,13 +214,24 @@ goacc_device_copy_async (struct gomp_device_descr *devicep, struct goacc_asyncqueue *), const char *dst, void *dstaddr, const char *src, const void *srcaddr, + const void *srcaddr_orig, size_t size, struct goacc_asyncqueue *aq) { if (!copy_func (devicep->target_id, dstaddr, srcaddr, size, aq)) { gomp_mutex_unlock (&devicep->lock); - gomp_fatal ("Copying of %s object [%p..%p) to %s object [%p..%p) failed", - src, srcaddr, srcaddr + size, dst, dstaddr, dstaddr + size); + if (srcaddr_orig && srcaddr_orig != srcaddr) + gomp_fatal ("Copying of %s object [%p..%p)" + " via buffer %s object [%p..%p)" + " to %s object [%p..%p) failed", + src, srcaddr_orig, srcaddr_orig + size, + src, srcaddr, srcaddr + size, + dst, dstaddr, dstaddr + size); + else + gomp_fatal ("Copying of %s object [%p..%p)" + " to %s object [%p..%p) failed", + src, srcaddr, srcaddr + size, + dst, dstaddr, dstaddr + size); } } @@ -317,11 +328,16 @@ gomp_to_device_kind_p (int kind) } } +/* Copy host memory to an offload device. In asynchronous mode (if AQ is + non-NULL), when the source data is stack or may otherwise be deallocated + before the asynchronous copy takes place, EPHEMERAL must be passed as + TRUE. */ + attribute_hidden void gomp_copy_host2dev (struct gomp_device_descr *devicep, struct goacc_asyncqueue *aq, void *d, const void *h, size_t sz, - struct gomp_coalesce_buf *cbuf) + bool ephemeral, struct gomp_coalesce_buf *cbuf) { if (cbuf) { @@ -349,8 +365,23 @@ gomp_copy_host2dev (struct gomp_device_descr *devicep, } } if (__builtin_expect (aq != NULL, 0)) - goacc_device_copy_async (devicep, devicep->openacc.async.host2dev_func, - "dev", d, "host", h, sz, aq); + { + void *h_buf = (void *) h; + if (ephemeral) + { + /* We're queueing up an asynchronous copy from data that may + disappear before the transfer takes place (i.e. because it is a + stack local in a function that is no longer executing). Make a + copy of the data into a temporary buffer in those cases. */ + h_buf = gomp_malloc (sz); + memcpy (h_buf, h, sz); + } + goacc_device_copy_async (devicep, devicep->openacc.async.host2dev_func, + "dev", d, "host", h_buf, h, sz, aq); + if (ephemeral) + /* Free temporary buffer once the transfer has completed. */ + devicep->openacc.async.queue_callback_func (aq, free, h_buf); + } else gomp_device_copy (devicep, devicep->host2dev_func, "dev", d, "host", h, sz); } @@ -362,7 +393,7 @@ gomp_copy_dev2host (struct gomp_device_descr *devicep, { if (__builtin_expect (aq != NULL, 0)) goacc_device_copy_async (devicep, devicep->openacc.async.dev2host_func, - "host", h, "dev", d, sz, aq); + "host", h, "dev", d, NULL, sz, aq); else gomp_device_copy (devicep, devicep->dev2host_func, "host", h, "dev", d, sz); } @@ -521,7 +552,7 @@ gomp_map_vars_existing (struct gomp_device_descr *devicep, (void *) (oldn->tgt->tgt_start + oldn->tgt_offset + newn->host_start - oldn->host_start), (void *) newn->host_start, - newn->host_end - newn->host_start, cbuf); + newn->host_end - newn->host_start, false, cbuf); gomp_increment_refcount (oldn, refcount_set); } @@ -548,8 +579,8 @@ gomp_map_pointer (struct target_mem_desc *tgt, struct goacc_asyncqueue *aq, cur_node.tgt_offset = (uintptr_t) NULL; gomp_copy_host2dev (devicep, aq, (void *) (tgt->tgt_start + target_offset), - (void *) &cur_node.tgt_offset, - sizeof (void *), cbuf); + (void *) &cur_node.tgt_offset, sizeof (void *), + true, cbuf); return; } /* Add bias to the pointer value. */ @@ -569,7 +600,8 @@ gomp_map_pointer (struct target_mem_desc *tgt, struct goacc_asyncqueue *aq, to initialize the pointer with. */ cur_node.tgt_offset -= bias; gomp_copy_host2dev (devicep, aq, (void *) (tgt->tgt_start + target_offset), - (void *) &cur_node.tgt_offset, sizeof (void *), cbuf); + (void *) &cur_node.tgt_offset, sizeof (void *), + true, cbuf); } static void @@ -702,7 +734,7 @@ gomp_attach_pointer (struct gomp_device_descr *devicep, (void *) (n->tgt->tgt_start + n->tgt_offset), (void *) data); gomp_copy_host2dev (devicep, aq, (void *) devptr, (void *) &data, - sizeof (void *), cbufp); + sizeof (void *), true, cbufp); } else gomp_debug (1, "%s: attach count for %p -> %u\n", __FUNCTION__, @@ -755,7 +787,7 @@ gomp_detach_pointer (struct gomp_device_descr *devicep, (void *) target); gomp_copy_host2dev (devicep, aq, (void *) devptr, (void *) &target, - sizeof (void *), cbufp); + sizeof (void *), true, cbufp); } else gomp_debug (1, "%s: attach count for %p -> %u\n", __FUNCTION__, @@ -1218,7 +1250,7 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep, len = sizes[i]; gomp_copy_host2dev (devicep, aq, (void *) (tgt->tgt_start + tgt_size), - (void *) hostaddrs[i], len, cbufp); + (void *) hostaddrs[i], len, false, cbufp); tgt_size += len; continue; case GOMP_MAP_FIRSTPRIVATE_INT: @@ -1312,7 +1344,7 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep, + cur_node.host_start - n->host_start), (void *) &cur_node.tgt_offset, - sizeof (void *), cbufp); + sizeof (void *), true, cbufp); cur_node.tgt_offset = n->tgt->tgt_start + n->tgt_offset + cur_node.host_start - n->host_start; continue; @@ -1450,7 +1482,8 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep, (void *) (tgt->tgt_start + k->tgt_offset), (void *) k->host_start, - k->host_end - k->host_start, cbufp); + k->host_end - k->host_start, + false, cbufp); break; case GOMP_MAP_POINTER: gomp_map_pointer (tgt, aq, @@ -1462,7 +1495,8 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep, (void *) (tgt->tgt_start + k->tgt_offset), (void *) k->host_start, - k->host_end - k->host_start, cbufp); + k->host_end - k->host_start, + false, cbufp); tgt->list[i].has_null_ptr_assoc = false; for (j = i + 1; j < mapnum; j++) @@ -1525,7 +1559,7 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep, (void *) (tgt->tgt_start + k->tgt_offset), (void *) k->host_start, - sizeof (void *), cbufp); + sizeof (void *), false, cbufp); break; default: gomp_mutex_unlock (&devicep->lock); @@ -1541,7 +1575,7 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep, /* We intentionally do not use coalescing here, as it's not data allocated by the current call to this function. */ gomp_copy_host2dev (devicep, aq, (void *) n->tgt_offset, - &tgt_addr, sizeof (void *), NULL); + &tgt_addr, sizeof (void *), true, NULL); } array++; } @@ -1556,7 +1590,7 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep, gomp_copy_host2dev (devicep, aq, (void *) (tgt->tgt_start + i * sizeof (void *)), (void *) &cur_node.tgt_offset, sizeof (void *), - cbufp); + true, cbufp); } } @@ -1568,7 +1602,8 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep, (void *) (tgt->tgt_start + cbuf.chunks[c].start), (char *) cbuf.buf + (cbuf.chunks[c].start - cbuf.chunks[0].start), - cbuf.chunks[c].end - cbuf.chunks[c].start, NULL); + cbuf.chunks[c].end - cbuf.chunks[c].start, + true, NULL); free (cbuf.buf); cbuf.buf = NULL; cbufp = NULL; @@ -1892,7 +1927,7 @@ gomp_update (struct gomp_device_descr *devicep, size_t mapnum, void **hostaddrs, if (GOMP_MAP_COPY_TO_P (kind & typemask)) gomp_copy_host2dev (devicep, NULL, devaddr, hostaddr, size, - NULL); + false, NULL); if (GOMP_MAP_COPY_FROM_P (kind & typemask)) gomp_copy_dev2host (devicep, NULL, hostaddr, devaddr, size); } diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/async-data-1-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/async-data-1-1.c index cd87aec56ff..9f2bed8aca8 100644 --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/async-data-1-1.c +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/async-data-1-1.c @@ -3,8 +3,6 @@ Due to one data mapping, this isn't using the libgomp 'cbuf' buffering. */ -/* { dg-xfail-run-if "TODO" { openacc_radeon_accel_selected } } */ - #include -- 2.30.2 --=-=-= Content-Type: text/x-diff Content-Disposition: inline; filename="0002-Don-t-use-libgomp-cbuf-buffering-with-OpenACC-async.patch" >From d88a6951586c7229b25708f4486eaaf4bf4b5bbe Mon Sep 17 00:00:00 2001 From: Thomas Schwinge Date: Fri, 23 Jul 2021 22:01:32 +0200 Subject: [PATCH 2/2] Don't use libgomp 'cbuf' buffering with OpenACC 'async' The host data might not be computed yet (by an earlier asynchronous compute region, for example. libgomp/ * target.c (gomp_coalesce_buf_add): Update comment. (gomp_copy_host2dev, gomp_map_vars_internal): Don't expect to see 'aq && cbuf'. (gomp_map_vars_internal): Only 'if (!aq)', do 'gomp_coalesce_buf_add'. * testsuite/libgomp.oacc-c-c++-common/async-data-1-2.c: Remove XFAIL. Co-Authored-By: Julian Brown --- libgomp/target.c | 71 ++++++++++++------- .../async-data-1-2.c | 5 +- 2 files changed, 47 insertions(+), 29 deletions(-) diff --git a/libgomp/target.c b/libgomp/target.c index 5576e57f822..453b3210e40 100644 --- a/libgomp/target.c +++ b/libgomp/target.c @@ -275,7 +275,14 @@ struct gomp_coalesce_buf host to device (e.g. map(alloc:), map(from:) etc.). */ #define MAX_COALESCE_BUF_GAP (4 * 1024) -/* Add region with device tgt_start relative offset and length to CBUF. */ +/* Add region with device tgt_start relative offset and length to CBUF. + + This must not be used for asynchronous copies, because the host data might + not be computed yet (by an earlier asynchronous compute region, for + example). + TODO ... but we could allow CBUF usage for EPHEMERAL data? (Open question: + is it more performant to use libgomp CBUF buffering or individual device + asyncronous copying?) */ static inline void gomp_coalesce_buf_add (struct gomp_coalesce_buf *cbuf, size_t start, size_t len) @@ -339,6 +346,30 @@ gomp_copy_host2dev (struct gomp_device_descr *devicep, void *d, const void *h, size_t sz, bool ephemeral, struct gomp_coalesce_buf *cbuf) { + if (__builtin_expect (aq != NULL, 0)) + { + /* See 'gomp_coalesce_buf_add'. */ + assert (!cbuf); + + void *h_buf = (void *) h; + if (ephemeral) + { + /* We're queueing up an asynchronous copy from data that may + disappear before the transfer takes place (i.e. because it is a + stack local in a function that is no longer executing). Make a + copy of the data into a temporary buffer in those cases. */ + h_buf = gomp_malloc (sz); + memcpy (h_buf, h, sz); + } + goacc_device_copy_async (devicep, devicep->openacc.async.host2dev_func, + "dev", d, "host", h_buf, h, sz, aq); + if (ephemeral) + /* Free temporary buffer once the transfer has completed. */ + devicep->openacc.async.queue_callback_func (aq, free, h_buf); + + return; + } + if (cbuf) { uintptr_t doff = (uintptr_t) d - cbuf->tgt->tgt_start; @@ -364,26 +395,8 @@ gomp_copy_host2dev (struct gomp_device_descr *devicep, } } } - if (__builtin_expect (aq != NULL, 0)) - { - void *h_buf = (void *) h; - if (ephemeral) - { - /* We're queueing up an asynchronous copy from data that may - disappear before the transfer takes place (i.e. because it is a - stack local in a function that is no longer executing). Make a - copy of the data into a temporary buffer in those cases. */ - h_buf = gomp_malloc (sz); - memcpy (h_buf, h, sz); - } - goacc_device_copy_async (devicep, devicep->openacc.async.host2dev_func, - "dev", d, "host", h_buf, h, sz, aq); - if (ephemeral) - /* Free temporary buffer once the transfer has completed. */ - devicep->openacc.async.queue_callback_func (aq, free, h_buf); - } - else - gomp_device_copy (devicep, devicep->host2dev_func, "dev", d, "host", h, sz); + + gomp_device_copy (devicep, devicep->host2dev_func, "dev", d, "host", h, sz); } attribute_hidden void @@ -959,8 +972,9 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep, for (i = first; i <= last; i++) { tgt->list[i].key = NULL; - if (gomp_to_device_kind_p (get_kind (short_mapkind, kinds, i) - & typemask)) + if (!aq + && gomp_to_device_kind_p (get_kind (short_mapkind, kinds, i) + & typemask)) gomp_coalesce_buf_add (&cbuf, tgt_size - cur_node.host_end + (uintptr_t) hostaddrs[i], @@ -1001,8 +1015,9 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep, if (tgt_align < align) tgt_align = align; tgt_size = (tgt_size + align - 1) & ~(align - 1); - gomp_coalesce_buf_add (&cbuf, tgt_size, - cur_node.host_end - cur_node.host_start); + if (!aq) + gomp_coalesce_buf_add (&cbuf, tgt_size, + cur_node.host_end - cur_node.host_start); tgt_size += cur_node.host_end - cur_node.host_start; has_firstprivate = true; continue; @@ -1095,7 +1110,8 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep, if (tgt_align < align) tgt_align = align; tgt_size = (tgt_size + align - 1) & ~(align - 1); - if (gomp_to_device_kind_p (kind & typemask)) + if (!aq + && gomp_to_device_kind_p (kind & typemask)) gomp_coalesce_buf_add (&cbuf, tgt_size, cur_node.host_end - cur_node.host_start); tgt_size += cur_node.host_end - cur_node.host_start; @@ -1596,6 +1612,9 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep, if (cbufp) { + /* See 'gomp_coalesce_buf_add'. */ + assert (!aq); + long c = 0; for (c = 0; c < cbuf.chunk_cnt; ++c) gomp_copy_host2dev (devicep, aq, diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/async-data-1-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/async-data-1-2.c index 385237698e2..3299499312f 100644 --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/async-data-1-2.c +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/async-data-1-2.c @@ -1,10 +1,9 @@ /* Verify back to back 'async' operations, two data mappings. - Due to two data mappings, this is using the libgomp 'cbuf' buffering. + Make sure that despite two data mappings, this isn't using the libgomp + 'cbuf' buffering. */ -/* { dg-xfail-run-if "TODO" { openacc_radeon_accel_selected } } */ - #include -- 2.30.2 --=-=-=--