From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 48677 invoked by alias); 13 Aug 2019 21:37:41 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 48615 invoked by uid 89); 13 Aug 2019 21:37:40 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-17.1 required=5.0 tests=AWL,BAYES_00,GIT_PATCH_0,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,SPF_PASS autolearn=ham version=3.3.1 spammy=transfers X-HELO: esa2.mentor.iphmx.com Received: from esa2.mentor.iphmx.com (HELO esa2.mentor.iphmx.com) (68.232.141.98) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 13 Aug 2019 21:37:38 +0000 IronPort-SDR: TXJurYaz4w/sVS5UHkn4QyuD8ilpuTyVox7TGoqveCkEoBUs8W6+oziH7ki9cc1Kc5hwwj4jKo PIRcJE7AFTOUgIMn6kmdlpIl5EkD6iMf3vacYuqQEKuV6Ddn6hTLhPQCKvEgfqMDjpc/J60YYJ hYRzrSbAhFKMoejmWFseomrinhhKYhGY6mA+X6jPJpE1XDD1yEGBWty2+xxw4AnWhFw6w9LLPu vQgPx4xvwT+lv+PeLdPL6ceoTuVD6az+R/c5i00MgUgMiYcYlNTdTpsn/6SZoO7BfR7AkveZtY x/M= Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa2.mentor.iphmx.com with ESMTP; 13 Aug 2019 13:37:36 -0800 IronPort-SDR: pNc/Wpvlazx+etYed4DYV98Kd5jXvUjl5PBnVWQrSILH3vaBMmWJi4DI8TV560hIgnlUbb8tBP kNp+9DaRVD/EuuQhgBVc9E3/D0xQ3oTq0ikc8ki3l7bh5D3w60Mlwp/mdzJP1L34poyMjv70/e cH3q7oGt2k+Rqv/X1ui6Zqnce8e9XVSrbJvGR+ILvpZaD2QvLe55yHcbPF+0kUiePPjfSnF5ti MFMo7XZUAEPtapGXd/SMO5xPlwuE6Vikx7af3eH/YDc+CC7TJzFdxLJA+E4vYvqLKJm6w+v40+ AhU= From: Julian Brown To: CC: Andrew Stubbs , Jakub Jelinek Subject: [PATCH 2/3] [og9] Use temporary buffers for async host2dev copies Date: Tue, 13 Aug 2019 21:37:00 -0000 Message-ID: <6723cd26bad519660b91d8eb371d6c9d57876e72.1565729221.git.julian@codesourcery.com> In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Type: text/plain Return-Path: julian@codesourcery.com X-IsSubscribed: yes X-SW-Source: 2019-08/txt/msg00903.txt.bz2 In libgomp, host-to-device transfers are instigated in several places where the source data is either on the stack, or in an unstable heap location (i.e. which is immediately freed after performing the host-to-device transfer). When the transfer is asynchronous, this means that taking the address of source data and attempting the copy from that at some later point is extremely likely to fail. A previous fix for this problem (from our internal branch, and included with the AMD GCN offloading patches) attempted to separate transfers from the stack (performing them immediately) from transfers from the heap (which can safely be done some time later). Unfortunately that doesn't work well with more recent changes to libgomp and the GCN plugin. So instead, this patch copies the source data for asynchronous host-to-device copies immediately to a temporary buffer, then the transfer to the device can safely take place asynchronously some time later. Julian ChangeLog libgomp/ * plugin/plugin-gcn.c (struct copy_data): Add using_src_copy field. (copy_data): Free temporary buffer if using. (queue_push_copy): Add using_src_copy parameter. (GOMP_OFFLOAD_dev2dev, GOMP_OFFLOAD_async_dev2host): Update calls to queue_push_copy. (GOMP_OFFLOAD_async_host2dev): Likewise. Allocate temporary buffer and copy source data to it immediately. * target.c (gomp_copy_host2dev): Update function comment. (copy_host2dev_immediate): Remove. (gomp_map_pointer, gomp_map_vars_internal): Replace calls to copy_host2dev_immediate with calls to gomp_copy_host2dev. --- libgomp/ChangeLog.openacc | 14 ++++++++++ libgomp/plugin/plugin-gcn.c | 20 ++++++++++--- libgomp/target.c | 56 +++++++++++++++---------------------- 3 files changed, 52 insertions(+), 38 deletions(-) diff --git a/libgomp/ChangeLog.openacc b/libgomp/ChangeLog.openacc index 2279545f361..2a9a7f18ca2 100644 --- a/libgomp/ChangeLog.openacc +++ b/libgomp/ChangeLog.openacc @@ -1,3 +1,17 @@ +2019-08-13 Julian Brown + + * plugin/plugin-gcn.c (struct copy_data): Add using_src_copy field. + (copy_data): Free temporary buffer if using. + (queue_push_copy): Add using_src_copy parameter. + (GOMP_OFFLOAD_dev2dev, GOMP_OFFLOAD_async_dev2host): Update calls to + queue_push_copy. + (GOMP_OFFLOAD_async_host2dev): Likewise. Allocate temporary buffer and + copy source data to it immediately. + * target.c (gomp_copy_host2dev): Update function comment. + (copy_host2dev_immediate): Remove. + (gomp_map_pointer, gomp_map_vars_internal): Replace calls to + copy_host2dev_immediate with calls to gomp_copy_host2dev. + 2019-08-08 Julian Brown * plugin/plugin-gcn.c (gcn_exec): Use 1 for the default number of diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c index a41568b3306..65690e643ed 100644 --- a/libgomp/plugin/plugin-gcn.c +++ b/libgomp/plugin/plugin-gcn.c @@ -3063,6 +3063,7 @@ struct copy_data const void *src; size_t len; bool use_hsa_memory_copy; + bool using_src_copy; struct goacc_asyncqueue *aq; }; @@ -3077,12 +3078,14 @@ copy_data (void *data_) hsa_fns.hsa_memory_copy_fn (data->dst, data->src, data->len); else memcpy (data->dst, data->src, data->len); + if (data->using_src_copy) + free ((void *) data->src); free (data); } static void queue_push_copy (struct goacc_asyncqueue *aq, void *dst, const void *src, - size_t len, bool use_hsa_memory_copy) + size_t len, bool use_hsa_memory_copy, bool using_src_copy) { if (DEBUG_QUEUES) HSA_DEBUG ("queue_push_copy %d:%d: %zu bytes from (%p) to (%p)\n", @@ -3093,6 +3096,7 @@ queue_push_copy (struct goacc_asyncqueue *aq, void *dst, const void *src, data->src = src; data->len = len; data->use_hsa_memory_copy = use_hsa_memory_copy; + data->using_src_copy = using_src_copy; data->aq = aq; queue_push_callback (aq, copy_data, data); } @@ -3137,7 +3141,7 @@ GOMP_OFFLOAD_dev2dev (int device, void *dst, const void *src, size_t n) { struct agent_info *agent = get_agent_info (device); maybe_init_omp_async (agent); - queue_push_copy (agent->omp_async_queue, dst, src, n, false); + queue_push_copy (agent->omp_async_queue, dst, src, n, false, false); return true; } @@ -3469,7 +3473,15 @@ GOMP_OFFLOAD_openacc_async_host2dev (int device, void *dst, const void *src, { struct agent_info *agent = get_agent_info (device); assert (agent == aq->agent); - queue_push_copy (aq, dst, src, n, image_address_p (agent, dst)); + /* The source data does not necessarily remain live until the deferred + copy happens. Taking a snapshot of the data here avoids reading + uninitialised data later, but means that (a) data is copied twice and + (b) modifications to the copied data between the "spawning" point of + the asynchronous kernel and when it is executed will not be seen. + But, that is probably correct. */ + void *src_copy = GOMP_PLUGIN_malloc (n); + memcpy (src_copy, src, n); + queue_push_copy (aq, dst, src_copy, n, image_address_p (agent, dst), true); return true; } @@ -3479,7 +3491,7 @@ GOMP_OFFLOAD_openacc_async_dev2host (int device, void *dst, const void *src, { struct agent_info *agent = get_agent_info (device); assert (agent == aq->agent); - queue_push_copy (aq, dst, src, n, image_address_p (agent, src)); + queue_push_copy (aq, dst, src, n, image_address_p (agent, src), false); return true; } diff --git a/libgomp/target.c b/libgomp/target.c index 4645894f869..5f7f946e2ba 100644 --- a/libgomp/target.c +++ b/libgomp/target.c @@ -303,10 +303,9 @@ gomp_to_device_kind_p (int kind) } /* Copy host memory to an offload device. In asynchronous mode (if AQ is - non-NULL), this is only safe when the source memory is a global or heap - location (otherwise a copy may take place from a dangling pointer to an - expired stack frame). Use copy_host2dev_immediate for copies from stack - locations. */ + non-NULL), H may point to a stack location. It is up to the underlying + plugin to ensure that this data is read immediately, rather than at some + later point when the stack frame will likely have been destroyed. */ attribute_hidden void gomp_copy_host2dev (struct gomp_device_descr *devicep, @@ -346,17 +345,6 @@ gomp_copy_host2dev (struct gomp_device_descr *devicep, gomp_device_copy (devicep, devicep->host2dev_func, "dev", d, "host", h, sz); } -/* Use this variant for host-to-device copies from stack locations that may not - be live at the time an asynchronous copy operation takes place. */ - -static void -copy_host2dev_immediate (struct gomp_device_descr *devicep, void *d, - const void *h, size_t sz, - struct gomp_coalesce_buf *cbuf) -{ - gomp_copy_host2dev (devicep, NULL, d, h, sz, cbuf); -} - attribute_hidden void gomp_copy_dev2host (struct gomp_device_descr *devicep, struct goacc_asyncqueue *aq, @@ -617,10 +605,10 @@ gomp_map_pointer (struct target_mem_desc *tgt, struct goacc_asyncqueue *aq, if (cur_node.host_start == (uintptr_t) NULL) { cur_node.tgt_offset = (uintptr_t) NULL; - copy_host2dev_immediate (devicep, - (void *) (tgt->tgt_start + target_offset), - (void *) &cur_node.tgt_offset, - sizeof (void *), cbuf); + gomp_copy_host2dev (devicep, aq, + (void *) (tgt->tgt_start + target_offset), + (void *) &cur_node.tgt_offset, sizeof (void *), + cbuf); return; } /* Add bias to the pointer value. */ @@ -639,9 +627,8 @@ gomp_map_pointer (struct target_mem_desc *tgt, struct goacc_asyncqueue *aq, array section. Now subtract bias to get what we want to initialize the pointer with. */ cur_node.tgt_offset -= bias; - copy_host2dev_immediate (devicep, (void *) (tgt->tgt_start + target_offset), - (void *) &cur_node.tgt_offset, sizeof (void *), - cbuf); + gomp_copy_host2dev (devicep, aq, (void *) (tgt->tgt_start + target_offset), + (void *) &cur_node.tgt_offset, sizeof (void *), cbuf); } static void @@ -1460,13 +1447,13 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep, cur_node.tgt_offset = gomp_map_val (tgt, hostaddrs, i - 1); if (cur_node.tgt_offset) cur_node.tgt_offset -= sizes[i]; - copy_host2dev_immediate (devicep, - (void *) (n->tgt->tgt_start - + n->tgt_offset - + cur_node.host_start - - n->host_start), - (void *) &cur_node.tgt_offset, - sizeof (void *), cbufp); + gomp_copy_host2dev (devicep, aq, + (void *) (n->tgt->tgt_start + + n->tgt_offset + + cur_node.host_start + - n->host_start), + (void *) &cur_node.tgt_offset, + sizeof (void *), cbufp); cur_node.tgt_offset = n->tgt->tgt_start + n->tgt_offset + cur_node.host_start - n->host_start; continue; @@ -1705,8 +1692,8 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep, void *tgt_addr = (void *) (tgt->tgt_start + k->tgt_offset); /* We intentionally do not use coalescing here, as it's not data allocated by the current call to this function. */ - copy_host2dev_immediate (devicep, (void *) n->tgt_offset, - &tgt_addr, sizeof (void *), NULL); + gomp_copy_host2dev (devicep, aq, (void *) n->tgt_offset, + &tgt_addr, sizeof (void *), NULL); } array++; } @@ -1828,9 +1815,10 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep, for (i = 0; i < mapnum; i++) { cur_node.tgt_offset = gomp_map_val (tgt, hostaddrs, i); - copy_host2dev_immediate (devicep, - (void *) (tgt->tgt_start + i * sizeof (void *)), - (void *) &cur_node.tgt_offset, sizeof (void *), cbufp); + gomp_copy_host2dev (devicep, aq, + (void *) (tgt->tgt_start + i * sizeof (void *)), + (void *) &cur_node.tgt_offset, sizeof (void *), + cbufp); } } -- 2.22.0