public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Thomas Schwinge <thomas@codesourcery.com>
To: Julian Brown <julian@codesourcery.com>, <gcc-patches@gcc.gnu.org>
Cc: Jakub Jelinek <jakub@redhat.com>,
	Tobias Burnus <tobias@codesourcery.com>
Subject: Allow libgomp 'cbuf' buffering with OpenACC 'async' for 'ephemeral' data (was: [PATCH 3/4] openacc: Fix asynchronous host-to-device copies in libgomp runtime)
Date: Fri, 10 Mar 2023 16:22:53 +0100	[thread overview]
Message-ID: <87356cbpgy.fsf@euler.schwinge.homeip.net> (raw)
In-Reply-To: <87r1fkt6s1.fsf@euler.schwinge.homeip.net>

[-- Attachment #1: Type: text/plain, Size: 1987 bytes --]

Hi!

On 2021-07-27T12:01:18+0200, I wrote:
> On 2021-06-29T16:42:03-0700, Julian Brown <julian@codesourcery.com> wrote:
>> This patch fixes several places in libgomp/target.c where "ephemeral" data
>> (on the stack or in temporary heap locations) may be used as the source of
>> an asynchronous host-to-device copy that may not complete before the host
>> data disappears.  Versions of the patch have been posted several times
>> before, but this one (at Chung-Lin Tang's prior suggesion, IIRC) moves
>> all logic into target.c rather than pushing it out to each target plugin.
>
> Thanks for the re-work!

>> +/* Copy host memory to an offload device.  In asynchronous mode (if AQ is
>> +   non-NULL), when the source data is stack or may otherwise be deallocated
>> +   before the asynchronous copy takes place, EPHEMERAL must be passed as
>> +   TRUE.  The CBUF isn't used for non-ephemeral asynchronous copies, because
>> +   the host data might not be computed yet (by an earlier asynchronous compute
>> +   region).  */
>> +
>>  [gomp_copy_host2dev]
>
> Code changes related to the latter sentence have moved into a separate
> "Don't use libgomp 'cbuf' buffering with OpenACC 'async'", pushed to
> master branch in commit d88a6951586c7229b25708f4486eaaf4bf4b5bbe, [...]

Re this TODO comment:

> +   TODO ... but we could allow CBUF usage for EPHEMERAL data?  (Open question:
> +   is it more performant to use libgomp CBUF buffering or individual device
> +   asyncronous copying?)  */

Pushed to master branch commit 2b2340e236c0bba8aaca358ea25a5accd8249fbd
"Allow libgomp 'cbuf' buffering with OpenACC 'async' for 'ephemeral' data",
see attached.


Grüße
 Thomas


-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Allow-libgomp-cbuf-buffering-with-OpenACC-async-for-.patch --]
[-- Type: text/x-diff, Size: 5525 bytes --]

From 2b2340e236c0bba8aaca358ea25a5accd8249fbd Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Mon, 27 Feb 2023 16:41:17 +0100
Subject: [PATCH] Allow libgomp 'cbuf' buffering with OpenACC 'async' for
 'ephemeral' data

This does *allow*, but under no circumstances is this currently going to be
used: all potentially applicable data is non-'ephemeral', and thus not
considered for 'gomp_coalesce_buf_add' for OpenACC 'async'.  (But a use will
emerge later.)

Follow-up to commit r12-2530-gd88a6951586c7229b25708f4486eaaf4bf4b5bbe
"Don't use libgomp 'cbuf' buffering with OpenACC 'async'", addressing this
TODO comment:

    TODO ... but we could allow CBUF usage for EPHEMERAL data?  (Open question:
    is it more performant to use libgomp CBUF buffering or individual device
    asyncronous copying?)

Ephemeral data is small, and therefore individual device asyncronous copying
does seem dubious -- in particular given that for all those, we'd individually
have to allocate and queue for deallocation a temporary buffer to capture the
ephemeral data.  Instead, just let the 'cbuf' *be* the temporary buffer.

	libgomp/
	* target.c (gomp_copy_host2dev, gomp_map_vars_internal): Allow
	libgomp 'cbuf' buffering with OpenACC 'async' for 'ephemeral'
	data.
---
 libgomp/target.c | 70 +++++++++++++++++++++++++-----------------------
 1 file changed, 36 insertions(+), 34 deletions(-)

diff --git a/libgomp/target.c b/libgomp/target.c
index 0344f68a936..074caa6a4dc 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -310,10 +310,8 @@ struct gomp_coalesce_buf
 
    This must not be used for asynchronous copies, because the host data might
    not be computed yet (by an earlier asynchronous compute region, for
-   example).
-   TODO ... but we could allow CBUF usage for EPHEMERAL data?  (Open question:
-   is it more performant to use libgomp CBUF buffering or individual device
-   asyncronous copying?)  */
+   example).  The exception is for EPHEMERAL data, that we know is available
+   already "by construction".  */
 
 static inline void
 gomp_coalesce_buf_add (struct gomp_coalesce_buf *cbuf, size_t start, size_t len)
@@ -377,30 +375,6 @@ gomp_copy_host2dev (struct gomp_device_descr *devicep,
 		    void *d, const void *h, size_t sz,
 		    bool ephemeral, struct gomp_coalesce_buf *cbuf)
 {
-  if (__builtin_expect (aq != NULL, 0))
-    {
-      /* See 'gomp_coalesce_buf_add'.  */
-      assert (!cbuf);
-
-      void *h_buf = (void *) h;
-      if (ephemeral)
-	{
-	  /* We're queueing up an asynchronous copy from data that may
-	     disappear before the transfer takes place (i.e. because it is a
-	     stack local in a function that is no longer executing).  Make a
-	     copy of the data into a temporary buffer in those cases.  */
-	  h_buf = gomp_malloc (sz);
-	  memcpy (h_buf, h, sz);
-	}
-      goacc_device_copy_async (devicep, devicep->openacc.async.host2dev_func,
-			       "dev", d, "host", h_buf, h, sz, aq);
-      if (ephemeral)
-	/* Free temporary buffer once the transfer has completed.  */
-	devicep->openacc.async.queue_callback_func (aq, free, h_buf);
-
-      return;
-    }
-
   if (cbuf)
     {
       uintptr_t doff = (uintptr_t) d - cbuf->tgt->tgt_start;
@@ -420,6 +394,12 @@ gomp_copy_host2dev (struct gomp_device_descr *devicep,
 		      gomp_mutex_unlock (&devicep->lock);
 		      gomp_fatal ("internal libgomp cbuf error");
 		    }
+
+		  /* In an asynchronous context, verify that CBUF isn't used
+		     with non-EPHEMERAL data; see 'gomp_coalesce_buf_add'.  */
+		  if (__builtin_expect (aq != NULL, 0))
+		    assert (ephemeral);
+
 		  memcpy ((char *) cbuf->buf + (doff - cbuf->chunks[0].start),
 			  h, sz);
 		  return;
@@ -430,7 +410,28 @@ gomp_copy_host2dev (struct gomp_device_descr *devicep,
 	}
     }
 
-  gomp_device_copy (devicep, devicep->host2dev_func, "dev", d, "host", h, sz);
+  if (__builtin_expect (aq != NULL, 0))
+    {
+      void *h_buf = (void *) h;
+      if (ephemeral)
+	{
+	  /* We're queueing up an asynchronous copy from data that may
+	     disappear before the transfer takes place (i.e. because it is a
+	     stack local in a function that is no longer executing).  As we've
+	     not been able to use CBUF, make a copy of the data into a
+	     temporary buffer.  */
+	  h_buf = gomp_malloc (sz);
+	  memcpy (h_buf, h, sz);
+	}
+      goacc_device_copy_async (devicep, devicep->openacc.async.host2dev_func,
+			       "dev", d, "host", h_buf, h, sz, aq);
+      if (ephemeral)
+	/* Free once the transfer has completed.  */
+	devicep->openacc.async.queue_callback_func (aq, free, h_buf);
+    }
+  else
+    gomp_device_copy (devicep, devicep->host2dev_func,
+		      "dev", d, "host", h, sz);
 }
 
 attribute_hidden void
@@ -1751,9 +1752,6 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep,
 
   if (cbufp)
     {
-      /* See 'gomp_coalesce_buf_add'.  */
-      assert (!aq);
-
       long c = 0;
       for (c = 0; c < cbuf.chunk_cnt; ++c)
 	gomp_copy_host2dev (devicep, aq,
@@ -1761,8 +1759,12 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep,
 			    (char *) cbuf.buf + (cbuf.chunks[c].start
 						 - cbuf.chunks[0].start),
 			    cbuf.chunks[c].end - cbuf.chunks[c].start,
-			    true, NULL);
-      free (cbuf.buf);
+			    false, NULL);
+      if (aq)
+	/* Free once the transfer has completed.  */
+	devicep->openacc.async.queue_callback_func (aq, free, cbuf.buf);
+      else
+	free (cbuf.buf);
       cbuf.buf = NULL;
       cbufp = NULL;
     }
-- 
2.25.1


  reply	other threads:[~2023-03-10 15:23 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-29 23:42 [PATCH 0/4] openacc: Async fixes Julian Brown
2021-06-29 23:42 ` [PATCH 1/4] openacc: Async fix for lib-94 testcase Julian Brown
2021-06-29 23:42 ` [PATCH 2/4] openacc: Fix async bugs in several OpenACC test cases Julian Brown
2021-06-29 23:52   ` Julian Brown
2021-06-29 23:42 ` [PATCH 3/4] openacc: Fix asynchronous host-to-device copies in libgomp runtime Julian Brown
2021-07-27 10:01   ` Thomas Schwinge
2023-03-10 15:22     ` Thomas Schwinge [this message]
2021-06-29 23:42 ` [PATCH 4/4] openacc: Profiling-interface fixes for asynchronous operations Julian Brown
2021-06-30  8:28 ` [PATCH 0/4] openacc: Async fixes Thomas Schwinge
2021-06-30 10:40   ` Julian Brown
2021-07-02 13:51     ` Julian Brown
2023-03-10 11:38   ` Thomas Schwinge

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87356cbpgy.fsf@euler.schwinge.homeip.net \
    --to=thomas@codesourcery.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jakub@redhat.com \
    --cc=julian@codesourcery.com \
    --cc=tobias@codesourcery.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).