From: Thomas Schwinge <thomas@codesourcery.com>
To: <gcc-patches@gcc.gnu.org>
Cc: Chung-Lin Tang <cltang@codesourcery.com>
Subject: Re: [PR88370] acc_get_cuda_stream/acc_set_cuda_stream: acc_async_sync, acc_async_noval
Date: Fri, 14 Dec 2018 21:07:00 -0000 [thread overview]
Message-ID: <87va3vrf6r.fsf@euler.schwinge.homeip.net> (raw)
In-Reply-To: <yxfpd0qg9i2v.fsf@hertz.schwinge.homeip.net>
Hi!
On Wed, 5 Dec 2018 15:14:16 +0100, I wrote:
> On Mon, 19 Nov 2018 16:33:30 +0900, Chung-Lin Tang <chunglin_tang@mentor.com> wrote:
> > On 2018/11/18 10:36 AM, Thomas Schwinge wrote:
> > > Generally, I envision test cases running a few "acc_get_cuda_stream"
> > > calls with relevant argument values, to see whether the expected
> > > queues/streames are being used. (Similar for other offload targets.)
> > >
> > > But I suppose we might again need to get clarified whether
> > > "acc_get_cuda_stream(acc_async_sync)",
> > > "acc_get_cuda_stream(acc_async_noval)", or
> > > "acc_get_cuda_stream(acc_async_default)" are actually valid calls (given
> > > that these argument values are not valid "async value"s), and these would
> > > then return the respective CUDA stream handles, different from the one
> > > returned for "acc_get_cuda_stream(0)" etc.
> > >
> > > That said, we can certainly implement it that way, because that's not
> > > against the specification.
> >
> > I think the likely clarification we'll ever get on this is that it's
> > implementation defined :P
>
> Well, actually, I've been able to convince myself ;-) to a reading of the
> specification so that this is supported, and filed
> <https://gcc.gnu.org/PR88370>.
>
> Does the following look alright to you?
>
> Do you agree that 'Refusing request to set CUDA stream associated with
> "acc_async_sync"' should just be an informational debug message, instead
> of a hard error? (This restriction might disappear in the future.) (Oh,
> and other negative values will still be diagnosed as errors by
> "select_stream_for_async".)
Not having heard anything against this, and as a prerequisite for other
changes, I have now committed the following in r267147:
commit 815940afeefeeafa49ad3a5d81ef2d273ddeb3d7
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date: Fri Dec 14 20:42:08 2018 +0000
[PR88370] acc_get_cuda_stream/acc_set_cuda_stream: acc_async_sync, acc_async_noval
Per my reading of the OpenACC specification (and as supported by secondary
documentation, such as code examples, or presentations), it's valid to call
"acc_get_cuda_stream"/"acc_set_cuda_stream" also with "acc_async_sync",
"acc_async_noval" arguments, not just with the nonnegative values as currently
implemented.
libgomp/
PR libgomp/88370
* libgomp.texi (acc_get_current_cuda_context, acc_get_cuda_stream)
(acc_set_cuda_stream): Clarify.
* oacc-cuda.c (acc_get_cuda_stream, acc_set_cuda_stream): Use
"async_valid_p".
* plugin/plugin-nvptx.c (nvptx_set_cuda_stream): Refuse "async ==
acc_async_sync".
* testsuite/libgomp.oacc-c-c++-common/acc_set_cuda_stream-1.c: New file.
* testsuite/libgomp.oacc-c-c++-common/async_queue-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-84.c: Update.
* testsuite/libgomp.oacc-c-c++-common/lib-85.c: Likewise.
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@267147 138bc75d-0d04-0410-961f-82ee72b054a4
---
libgomp/ChangeLog | 14 ++++
libgomp/libgomp.texi | 17 ++--
libgomp/oacc-cuda.c | 4 +-
libgomp/plugin/plugin-nvptx.c | 10 ++-
.../acc_set_cuda_stream-1.c | 42 ++++++++++
.../libgomp.oacc-c-c++-common/async_queue-1.c | 97 ++++++++++++++++++++++
.../testsuite/libgomp.oacc-c-c++-common/lib-84.c | 31 +++++--
.../testsuite/libgomp.oacc-c-c++-common/lib-85.c | 27 +++++-
8 files changed, 222 insertions(+), 20 deletions(-)
diff --git libgomp/ChangeLog libgomp/ChangeLog
index 4c66021c367d..b4ab6b690553 100644
--- libgomp/ChangeLog
+++ libgomp/ChangeLog
@@ -1,3 +1,17 @@
+2018-12-14 Thomas Schwinge <thomas@codesourcery.com>
+
+ PR libgomp/88370
+ * libgomp.texi (acc_get_current_cuda_context, acc_get_cuda_stream)
+ (acc_set_cuda_stream): Clarify.
+ * oacc-cuda.c (acc_get_cuda_stream, acc_set_cuda_stream): Use
+ "async_valid_p".
+ * plugin/plugin-nvptx.c (nvptx_set_cuda_stream): Refuse "async ==
+ acc_async_sync".
+ * testsuite/libgomp.oacc-c-c++-common/acc_set_cuda_stream-1.c: New file.
+ * testsuite/libgomp.oacc-c-c++-common/async_queue-1.c: Likewise.
+ * testsuite/libgomp.oacc-c-c++-common/lib-84.c: Update.
+ * testsuite/libgomp.oacc-c-c++-common/lib-85.c: Likewise.
+
2018-12-14 Tom de Vries <tdevries@suse.de>
* testsuite/libgomp.c-c++-common/function-not-offloaded-aux.c: New test.
diff --git libgomp/libgomp.texi libgomp/libgomp.texi
index 3fa8eb8165e5..e6c20525bc0c 100644
--- libgomp/libgomp.texi
+++ libgomp/libgomp.texi
@@ -2768,7 +2768,7 @@ as used by the CUDA Runtime or Driver API's.
@item @emph{C/C++}:
@multitable @columnfractions .20 .80
-@item @emph{Prototype}: @tab @code{acc_get_current_cuda_context(void);}
+@item @emph{Prototype}: @tab @code{void *acc_get_current_cuda_context(void);}
@end multitable
@item @emph{Reference}:
@@ -2782,12 +2782,12 @@ A.2.1.2.
@section @code{acc_get_cuda_stream} -- Get CUDA stream handle.
@table @asis
@item @emph{Description}
-This function returns the CUDA stream handle. This handle is the same
-as used by the CUDA Runtime or Driver API's.
+This function returns the CUDA stream handle for the queue @var{async}.
+This handle is the same as used by the CUDA Runtime or Driver API's.
@item @emph{C/C++}:
@multitable @columnfractions .20 .80
-@item @emph{Prototype}: @tab @code{acc_get_cuda_stream(void);}
+@item @emph{Prototype}: @tab @code{void *acc_get_cuda_stream(int async);}
@end multitable
@item @emph{Reference}:
@@ -2802,11 +2802,16 @@ A.2.1.3.
@table @asis
@item @emph{Description}
This function associates the stream handle specified by @var{stream} with
-the asynchronous value specified by @var{async}.
+the queue @var{async}.
+
+This cannot be used to change the stream handle associated with
+@code{acc_async_sync}.
+
+The return value is not specified.
@item @emph{C/C++}:
@multitable @columnfractions .20 .80
-@item @emph{Prototype}: @tab @code{acc_set_cuda_stream(int async void *stream);}
+@item @emph{Prototype}: @tab @code{int acc_set_cuda_stream(int async, void *stream);}
@end multitable
@item @emph{Reference}:
diff --git libgomp/oacc-cuda.c libgomp/oacc-cuda.c
index 20774c1b4876..4ee4c9b08576 100644
--- libgomp/oacc-cuda.c
+++ libgomp/oacc-cuda.c
@@ -58,7 +58,7 @@ acc_get_cuda_stream (int async)
{
struct goacc_thread *thr = goacc_thread ();
- if (!async_valid_stream_id_p (async))
+ if (!async_valid_p (async))
return NULL;
if (thr && thr->dev && thr->dev->openacc.cuda.get_stream_func)
@@ -72,7 +72,7 @@ acc_set_cuda_stream (int async, void *stream)
{
struct goacc_thread *thr;
- if (!async_valid_stream_id_p (async) || stream == NULL)
+ if (!async_valid_p (async) || stream == NULL)
return 0;
goacc_lazy_initialize ();
diff --git libgomp/plugin/plugin-nvptx.c libgomp/plugin/plugin-nvptx.c
index 6492e5ffab77..7d0d38e0c2e1 100644
--- libgomp/plugin/plugin-nvptx.c
+++ libgomp/plugin/plugin-nvptx.c
@@ -1753,8 +1753,14 @@ nvptx_set_cuda_stream (int async, void *stream)
pthread_t self = pthread_self ();
struct nvptx_thread *nvthd = nvptx_thread ();
- if (async < 0)
- GOMP_PLUGIN_fatal ("bad async %d", async);
+ /* Due to the "null_stream" usage for "acc_async_sync", this cannot be used
+ to change the stream handle associated with "acc_async_sync". */
+ if (async == acc_async_sync)
+ {
+ GOMP_PLUGIN_debug (0, "Refusing request to set CUDA stream associated"
+ " with \"acc_async_sync\"\n");
+ return 0;
+ }
pthread_mutex_lock (&nvthd->ptx_dev->stream_lock);
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/acc_set_cuda_stream-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/acc_set_cuda_stream-1.c
new file mode 100644
index 000000000000..93981ff5cb7f
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/acc_set_cuda_stream-1.c
@@ -0,0 +1,42 @@
+/* Verify expected nvptx plugin behavior for "acc_set_cuda_stream" for
+ "acc_async_sync". */
+
+/* { dg-do run { target openacc_nvidia_accel_selected } } */
+/* { dg-set-target-env-var GOMP_DEBUG "1" } */
+
+#undef NDEBUG
+#include <assert.h>
+#include <openacc.h>
+
+int main(void)
+{
+ int async = 42;
+
+ /* Initialize. */
+#pragma acc parallel async(acc_async_sync)
+ ;
+#pragma acc parallel async(async)
+ ;
+#pragma acc wait
+
+ void *cuda_stream_sync = acc_get_cuda_stream (acc_async_sync);
+ assert (cuda_stream_sync == NULL);
+ void *cuda_stream_async = acc_get_cuda_stream (async);
+ assert (cuda_stream_async != NULL);
+ int ret = acc_set_cuda_stream (acc_async_sync, cuda_stream_async);
+ assert (ret == 0);
+ void *cuda_stream_sync_ = acc_get_cuda_stream (acc_async_sync);
+ assert (cuda_stream_sync_ == cuda_stream_sync);
+ void *cuda_stream_async_ = acc_get_cuda_stream (async);
+ assert (cuda_stream_async_ == cuda_stream_async);
+
+#pragma acc parallel async(acc_async_sync)
+ ;
+#pragma acc parallel async(async)
+ ;
+#pragma acc wait
+
+ return 0;
+}
+
+/* { dg-output "Refusing request to set CUDA stream associated with \"acc_async_sync\"" } */
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/async_queue-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/async_queue-1.c
new file mode 100644
index 000000000000..48e1846a36e3
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/async_queue-1.c
@@ -0,0 +1,97 @@
+/* Test mapping of async values to specific underlying queues. */
+
+#undef NDEBUG
+#include <assert.h>
+#include <openacc.h>
+
+/* This is implemented in terms of the "acc_get_cuda_stream" interface. */
+
+struct
+{
+ int async;
+ void *cuda_stream;
+} queues[] = { { acc_async_sync, NULL },
+ { acc_async_noval, NULL },
+ { 0, NULL },
+ { 1, NULL },
+ { 2, NULL },
+ { 36, NULL },
+ { 1982, NULL } };
+const size_t queues_n = sizeof queues / sizeof queues[0];
+
+int main(void)
+{
+ /* Explicitly initialize: it's not clear whether the following OpenACC
+ runtime library calls implicitly initialize;
+ <https://github.com/OpenACC/openacc-spec/issues/102>. */
+ acc_device_t d;
+#if defined ACC_DEVICE_TYPE_nvidia
+ d = acc_device_nvidia;
+#elif defined ACC_DEVICE_TYPE_host
+ d = acc_device_host;
+#else
+# error Not ported to this ACC_DEVICE_TYPE
+#endif
+ acc_init (d);
+
+ for (size_t i = 0; i < queues_n; ++i)
+ {
+ /* Before actually being used, there are all NULL. */
+ queues[i].cuda_stream = acc_get_cuda_stream (queues[i].async);
+ assert (queues[i].cuda_stream == NULL);
+ }
+
+ for (size_t i = 0; i < queues_n; ++i)
+ {
+ /* Use the queue to initialize it. */
+#pragma acc parallel async(queues[i].async)
+ ;
+#pragma acc wait
+
+ /* Verify CUDA stream used. */
+ queues[i].cuda_stream = acc_get_cuda_stream (queues[i].async);
+#if defined ACC_DEVICE_TYPE_nvidia
+ /* "acc_async_sync" maps to the NULL CUDA default stream. */
+ if (queues[i].async == acc_async_sync)
+ assert (queues[i].cuda_stream == NULL);
+ else
+ assert (queues[i].cuda_stream != NULL);
+#elif defined ACC_DEVICE_TYPE_host
+ /* For "acc_device_host" there are no CUDA streams. */
+ assert (queues[i].cuda_stream == NULL);
+#else
+# error Not ported to this ACC_DEVICE_TYPE
+#endif
+ }
+
+ /* Verify same results. */
+ for (size_t i = 0; i < queues_n; ++i)
+ {
+ void *cuda_stream;
+
+ cuda_stream = acc_get_cuda_stream (queues[i].async);
+ assert (cuda_stream == queues[i].cuda_stream);
+
+#pragma acc parallel async(queues[i].async)
+ ;
+#pragma acc wait
+
+ cuda_stream = acc_get_cuda_stream (queues[i].async);
+ assert (cuda_stream == queues[i].cuda_stream);
+ }
+
+ /* Verify individual underlying queues are all different. */
+ for (size_t i = 0; i < queues_n; ++i)
+ {
+ if (queues[i].cuda_stream == NULL)
+ continue;
+ for (size_t j = i + 1; j < queues_n; ++j)
+ {
+ if (queues[j].cuda_stream == NULL)
+ continue;
+ assert (queues[j].cuda_stream != queues[i].cuda_stream);
+ }
+ }
+
+ return 0;
+}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/lib-84.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-84.c
index 786b908f755b..d793c7436300 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-84.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/lib-84.c
@@ -7,6 +7,14 @@
#include <openacc.h>
#include <cuda.h>
+#if !defined __cplusplus
+# undef static_assert
+# define static_assert _Static_assert
+#endif
+
+static_assert (acc_async_sync == -2, "acc_async_sync?");
+static_assert (acc_async_noval == -1, "acc_async_noval?");
+
int
main (int argc, char **argv)
{
@@ -20,9 +28,11 @@ main (int argc, char **argv)
(void) acc_get_device_num (acc_device_nvidia);
- streams = (CUstream *) malloc (N * sizeof (void *));
+ streams = (CUstream *) malloc ((2 + N) * sizeof (void *));
+ streams += 2;
+ /* "streams[i]" is valid for i in [acc_async_sync..N). */
- for (i = 0; i < N; i++)
+ for (i = acc_async_sync; i < N; i++)
{
streams[i] = (CUstream) acc_get_cuda_stream (i);
if (streams[i] != NULL)
@@ -35,11 +45,20 @@ main (int argc, char **argv)
abort ();
}
- if (!acc_set_cuda_stream (i, streams[i]))
- abort ();
+ int ret = acc_set_cuda_stream (i, streams[i]);
+ if (i == acc_async_sync)
+ {
+ if (ret == 1)
+ abort ();
+ }
+ else
+ {
+ if (ret != 1)
+ abort ();
+ }
}
- for (i = 0; i < N; i++)
+ for (i = acc_async_sync; i < N; i++)
{
int j;
int cnt;
@@ -48,7 +67,7 @@ main (int argc, char **argv)
s = streams[i];
- for (j = 0; j < N; j++)
+ for (j = acc_async_sync; j < N; j++)
{
if (s == streams[j])
cnt++;
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/lib-85.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-85.c
index cf925a7b002e..141c83b53dd6 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-85.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/lib-85.c
@@ -7,6 +7,14 @@
#include <stdio.h>
#include <cuda.h>
+#if !defined __cplusplus
+# undef static_assert
+# define static_assert _Static_assert
+#endif
+
+static_assert (acc_async_sync == -2, "acc_async_sync?");
+static_assert (acc_async_noval == -1, "acc_async_noval?");
+
int
main (int argc, char **argv)
{
@@ -20,9 +28,11 @@ main (int argc, char **argv)
(void) acc_get_device_num (acc_device_nvidia);
- streams = (CUstream *) malloc (N * sizeof (void *));
+ streams = (CUstream *) malloc ((2 + N) * sizeof (void *));
+ streams += 2;
+ /* "streams[i]" is valid for i in [acc_async_sync..N). */
- for (i = 0; i < N; i++)
+ for (i = acc_async_sync; i < N; i++)
{
streams[i] = (CUstream) acc_get_cuda_stream (i);
if (streams[i] != NULL)
@@ -35,8 +45,17 @@ main (int argc, char **argv)
abort ();
}
- if (!acc_set_cuda_stream (i, streams[i]))
- abort ();
+ int ret = acc_set_cuda_stream (i, streams[i]);
+ if (i == acc_async_sync)
+ {
+ if (ret == 1)
+ abort ();
+ }
+ else
+ {
+ if (ret != 1)
+ abort ();
+ }
}
s = NULL;
Grüße
Thomas
next prev parent reply other threads:[~2018-12-14 21:07 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-02-13 10:15 [gomp4] Async related additions to OpenACC runtime library Chung-Lin Tang
2017-02-14 11:29 ` Thomas Schwinge
2017-02-14 13:05 ` Chung-Lin Tang
2017-02-15 20:04 ` Thomas Schwinge
2018-11-18 1:37 ` OpenACC ICV acc-default-async-var (was: [gomp4] Async related additions to OpenACC runtime library) Thomas Schwinge
2018-11-19 7:33 ` OpenACC ICV acc-default-async-var Chung-Lin Tang
2018-12-05 14:14 ` [PR88370] acc_get_cuda_stream/acc_set_cuda_stream: acc_async_sync, acc_async_noval (was: OpenACC ICV acc-default-async-var) Thomas Schwinge
2018-12-14 21:07 ` Thomas Schwinge [this message]
2018-12-05 14:25 ` OpenACC ICV acc-default-async-var Thomas Schwinge
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87va3vrf6r.fsf@euler.schwinge.homeip.net \
--to=thomas@codesourcery.com \
--cc=cltang@codesourcery.com \
--cc=gcc-patches@gcc.gnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).