Hi! On 2021-06-29T16:42:03-0700, Julian Brown wrote: > This patch fixes several places in libgomp/target.c where "ephemeral" data > (on the stack or in temporary heap locations) may be used as the source of > an asynchronous host-to-device copy that may not complete before the host > data disappears. Versions of the patch have been posted several times > before, but this one (at Chung-Lin Tang's prior suggesion, IIRC) moves > all logic into target.c rather than pushing it out to each target plugin. Thanks for the re-work! > An existing, but flawed, workaround for this problem in the AMD GCN > libgomp offloading plugin is currently present on mainline, and was > posted for the og9 branch here: > > https://gcc.gnu.org/legacy-ml/gcc-patches/2019-08/msg00901.html > > and previous versions of this patch were posted here (for mainline/og9): > > https://gcc.gnu.org/legacy-ml/gcc-patches/2019-11/msg01482.html > https://gcc.gnu.org/legacy-ml/gcc-patches/2019-09/msg01026.html ... but this version here I like best! ;-) > This patch exposes a problem with OpenACC profiling support that is > fixed by the next patch in the series. The acc_prof-parallel-1.c test > is XFAILed for now. (Ought to XFAIL 'libgomp.oacc-c-c++-common/acc_prof-parallel-1.c' for GCN only. Also, 'libgomp.oacc-c-c++-common/acc_prof-init-1.c' did similarly FAIL for GCN, intermittently.) But, actually no XFAILing is necessary, given my recent commit 29ddaf43f70e19fd1110b539e8b3d0436c757e34 "[OpenACC] Clarify sequencing of 'async' data copying vs. profiling events in 'libgomp.oacc-c-c++-common/acc_prof-{init,parallel}-1.c'". A few more comments: > --- a/libgomp/plugin/plugin-gcn.c > +++ b/libgomp/plugin/plugin-gcn.c > @@ -2934,7 +2931,7 @@ gomp_offload_free (void *ptr) > > static void > queue_push_copy (struct goacc_asyncqueue *aq, void *dst, const void *src, > - size_t len, bool free_src) > + size_t len) Also have to update function comment. > @@ -3916,15 +3912,7 @@ GOMP_OFFLOAD_openacc_async_host2dev (int device, void *dst, const void *src, > { > struct agent_info *agent = get_agent_info (device); > assert (agent == aq->agent); > - /* The source data does not necessarily remain live until the deferred > - copy happens. Taking a snapshot of the data here avoids reading > - uninitialised data later, but means that (a) data is copied twice and > - (b) modifications to the copied data between the "spawning" point of > - the asynchronous kernel and when it is executed will not be seen. > - But, that is probably correct. */ > - void *src_copy = GOMP_PLUGIN_malloc (n); > - memcpy (src_copy, src, n); > - queue_push_copy (aq, dst, src_copy, n, true); > + queue_push_copy (aq, dst, src, n); > return true; > } :-) > --- a/libgomp/target.c > +++ b/libgomp/target.c > -static inline void > -goacc_device_copy_async (struct gomp_device_descr *devicep, > - bool (*copy_func) (int, void *, const void *, size_t, > - struct goacc_asyncqueue *), > - const char *dst, void *dstaddr, > - const char *src, const void *srcaddr, > - size_t size, struct goacc_asyncqueue *aq) > -{ > - if (!copy_func (devicep->target_id, dstaddr, srcaddr, size, aq)) > - { > - gomp_mutex_unlock (&devicep->lock); > - gomp_fatal ("Copying of %s object [%p..%p) to %s object [%p..%p) failed", > - src, srcaddr, srcaddr + size, dst, dstaddr, dstaddr + size); > - } > -} For symmetry with 'gomp_device_copy', I prefer to keep (and thus restored) 'goacc_device_copy_async', adding a 'srcaddr_orig' parameter for error reporting purposes. Pushed 'Fix OpenACC "ephemeral" asynchronous host-to-device copies' to master branch in commit 9c41f5b9cddd93f1b56eb71bff87b255d37d16f4, see attached. Removes GCN XFAIL 'libgomp.oacc-c-c++-common/async-data-1-1.c'. > +/* Copy host memory to an offload device. In asynchronous mode (if AQ is > + non-NULL), when the source data is stack or may otherwise be deallocated > + before the asynchronous copy takes place, EPHEMERAL must be passed as > + TRUE. The CBUF isn't used for non-ephemeral asynchronous copies, because > + the host data might not be computed yet (by an earlier asynchronous compute > + region). */ > + > [gomp_copy_host2dev] Code changes related to the latter sentence have moved into a separate "Don't use libgomp 'cbuf' buffering with OpenACC 'async'", pushed to master branch in commit d88a6951586c7229b25708f4486eaaf4bf4b5bbe, see attached. Removes GCN XFAIL 'libgomp.oacc-c-c++-common/async-data-1-2.c'. Grüße Thomas ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955