From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 56209 invoked by alias); 14 Dec 2018 21:07:00 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 56193 invoked by uid 89); 14 Dec 2018 21:06:59 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-26.9 required=5.0 tests=BAYES_00,GIT_PATCH_0,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_PASS autolearn=ham version=3.3.2 spammy=Due, 2811, informational, refusing X-HELO: relay1.mentorg.com Received: from relay1.mentorg.com (HELO relay1.mentorg.com) (192.94.38.131) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 14 Dec 2018 21:06:54 +0000 Received: from svr-orw-mbx-03.mgc.mentorg.com ([147.34.90.203]) by relay1.mentorg.com with esmtps (TLSv1.2:ECDHE-RSA-AES256-SHA384:256) id 1gXufd-0005Ki-07 from Thomas_Schwinge@mentor.com for gcc-patches@gcc.gnu.org; Fri, 14 Dec 2018 13:06:53 -0800 Received: from svr-orw-mbx-03.mgc.mentorg.com (147.34.90.203) by svr-orw-mbx-03.mgc.mentorg.com (147.34.90.203) with Microsoft SMTP Server (TLS) id 15.0.1320.4; Fri, 14 Dec 2018 13:06:50 -0800 Received: from tftp-cs (147.34.91.1) by svr-orw-mbx-03.mgc.mentorg.com (147.34.90.203) with Microsoft SMTP Server id 15.0.1320.4 via Frontend Transport; Fri, 14 Dec 2018 13:06:50 -0800 Received: by tftp-cs (Postfix, from userid 49978) id 19BD1C2321; Fri, 14 Dec 2018 13:06:50 -0800 (PST) From: Thomas Schwinge To: CC: Chung-Lin Tang Subject: Re: [PR88370] acc_get_cuda_stream/acc_set_cuda_stream: acc_async_sync, acc_async_noval In-Reply-To: References: <044c4fdb-e659-6029-1da1-4f6bfc05ca9c@mentor.com> <247a55cf-e6f6-ab71-70b5-5eca5fcc2233@mentor.com> User-Agent: Notmuch/0.9-125-g4686d11 (http://notmuchmail.org) Emacs/25.2.2 (x86_64-pc-linux-gnu) Date: Fri, 14 Dec 2018 21:07:00 -0000 Message-ID: <87va3vrf6r.fsf@euler.schwinge.homeip.net> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable X-SW-Source: 2018-12/txt/msg01096.txt.bz2 Hi! On Wed, 5 Dec 2018 15:14:16 +0100, I wrote: > On Mon, 19 Nov 2018 16:33:30 +0900, Chung-Lin Tang wrote: > > On 2018/11/18 10:36 AM, Thomas Schwinge wrote: > > > Generally, I envision test cases running a few "acc_get_cuda_stream" > > > calls with relevant argument values, to see whether the expected > > > queues/streames are being used. (Similar for other offload targets.) > > >=20 > > > But I suppose we might again need to get clarified whether > > > "acc_get_cuda_stream(acc_async_sync)", > > > "acc_get_cuda_stream(acc_async_noval)", or > > > "acc_get_cuda_stream(acc_async_default)" are actually valid calls (gi= ven > > > that these argument values are not valid "async value"s), and these w= ould > > > then return the respective CUDA stream handles, different from the one > > > returned for "acc_get_cuda_stream(0)" etc. > > >=20 > > > That said, we can certainly implement it that way, because that's not > > > against the specification. > >=20 > > I think the likely clarification we'll ever get on this is that it's > > implementation defined :P >=20 > Well, actually, I've been able to convince myself ;-) to a reading of the > specification so that this is supported, and filed > . >=20 > Does the following look alright to you? >=20 > Do you agree that 'Refusing request to set CUDA stream associated with > "acc_async_sync"' should just be an informational debug message, instead > of a hard error? (This restriction might disappear in the future.) (Oh, > and other negative values will still be diagnosed as errors by > "select_stream_for_async".) Not having heard anything against this, and as a prerequisite for other changes, I have now committed the following in r267147: commit 815940afeefeeafa49ad3a5d81ef2d273ddeb3d7 Author: tschwinge Date: Fri Dec 14 20:42:08 2018 +0000 [PR88370] acc_get_cuda_stream/acc_set_cuda_stream: acc_async_sync, acc_= async_noval =20=20=20=20 Per my reading of the OpenACC specification (and as supported by second= ary documentation, such as code examples, or presentations), it's valid to = call "acc_get_cuda_stream"/"acc_set_cuda_stream" also with "acc_async_sync", "acc_async_noval" arguments, not just with the nonnegative values as cu= rrently implemented. =20=20=20=20 libgomp/ PR libgomp/88370 * libgomp.texi (acc_get_current_cuda_context, acc_get_cuda_stre= am) (acc_set_cuda_stream): Clarify. * oacc-cuda.c (acc_get_cuda_stream, acc_set_cuda_stream): Use "async_valid_p". * plugin/plugin-nvptx.c (nvptx_set_cuda_stream): Refuse "async = =3D=3D acc_async_sync". * testsuite/libgomp.oacc-c-c++-common/acc_set_cuda_stream-1.c: = New file. * testsuite/libgomp.oacc-c-c++-common/async_queue-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/lib-84.c: Update. * testsuite/libgomp.oacc-c-c++-common/lib-85.c: Likewise. =20=20=20=20 git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@267147 138bc75d-0d04-04= 10-961f-82ee72b054a4 --- libgomp/ChangeLog | 14 ++++ libgomp/libgomp.texi | 17 ++-- libgomp/oacc-cuda.c | 4 +- libgomp/plugin/plugin-nvptx.c | 10 ++- .../acc_set_cuda_stream-1.c | 42 ++++++++++ .../libgomp.oacc-c-c++-common/async_queue-1.c | 97 ++++++++++++++++++= ++++ .../testsuite/libgomp.oacc-c-c++-common/lib-84.c | 31 +++++-- .../testsuite/libgomp.oacc-c-c++-common/lib-85.c | 27 +++++- 8 files changed, 222 insertions(+), 20 deletions(-) diff --git libgomp/ChangeLog libgomp/ChangeLog index 4c66021c367d..b4ab6b690553 100644 --- libgomp/ChangeLog +++ libgomp/ChangeLog @@ -1,3 +1,17 @@ +2018-12-14 Thomas Schwinge + + PR libgomp/88370 + * libgomp.texi (acc_get_current_cuda_context, acc_get_cuda_stream) + (acc_set_cuda_stream): Clarify. + * oacc-cuda.c (acc_get_cuda_stream, acc_set_cuda_stream): Use + "async_valid_p". + * plugin/plugin-nvptx.c (nvptx_set_cuda_stream): Refuse "async =3D=3D + acc_async_sync". + * testsuite/libgomp.oacc-c-c++-common/acc_set_cuda_stream-1.c: New file. + * testsuite/libgomp.oacc-c-c++-common/async_queue-1.c: Likewise. + * testsuite/libgomp.oacc-c-c++-common/lib-84.c: Update. + * testsuite/libgomp.oacc-c-c++-common/lib-85.c: Likewise. + 2018-12-14 Tom de Vries =20 * testsuite/libgomp.c-c++-common/function-not-offloaded-aux.c: New test. diff --git libgomp/libgomp.texi libgomp/libgomp.texi index 3fa8eb8165e5..e6c20525bc0c 100644 --- libgomp/libgomp.texi +++ libgomp/libgomp.texi @@ -2768,7 +2768,7 @@ as used by the CUDA Runtime or Driver API's. =20 @item @emph{C/C++}: @multitable @columnfractions .20 .80 -@item @emph{Prototype}: @tab @code{acc_get_current_cuda_context(void);} +@item @emph{Prototype}: @tab @code{void *acc_get_current_cuda_context(void= );} @end multitable =20 @item @emph{Reference}: @@ -2782,12 +2782,12 @@ A.2.1.2. @section @code{acc_get_cuda_stream} -- Get CUDA stream handle. @table @asis @item @emph{Description} -This function returns the CUDA stream handle. This handle is the same -as used by the CUDA Runtime or Driver API's. +This function returns the CUDA stream handle for the queue @var{async}. +This handle is the same as used by the CUDA Runtime or Driver API's. =20 @item @emph{C/C++}: @multitable @columnfractions .20 .80 -@item @emph{Prototype}: @tab @code{acc_get_cuda_stream(void);} +@item @emph{Prototype}: @tab @code{void *acc_get_cuda_stream(int async);} @end multitable =20 @item @emph{Reference}: @@ -2802,11 +2802,16 @@ A.2.1.3. @table @asis @item @emph{Description} This function associates the stream handle specified by @var{stream} with -the asynchronous value specified by @var{async}. +the queue @var{async}. + +This cannot be used to change the stream handle associated with +@code{acc_async_sync}. + +The return value is not specified. =20 @item @emph{C/C++}: @multitable @columnfractions .20 .80 -@item @emph{Prototype}: @tab @code{acc_set_cuda_stream(int async void *str= eam);} +@item @emph{Prototype}: @tab @code{int acc_set_cuda_stream(int async, void= *stream);} @end multitable =20 @item @emph{Reference}: diff --git libgomp/oacc-cuda.c libgomp/oacc-cuda.c index 20774c1b4876..4ee4c9b08576 100644 --- libgomp/oacc-cuda.c +++ libgomp/oacc-cuda.c @@ -58,7 +58,7 @@ acc_get_cuda_stream (int async) { struct goacc_thread *thr =3D goacc_thread (); =20 - if (!async_valid_stream_id_p (async)) + if (!async_valid_p (async)) return NULL; =20 if (thr && thr->dev && thr->dev->openacc.cuda.get_stream_func) @@ -72,7 +72,7 @@ acc_set_cuda_stream (int async, void *stream) { struct goacc_thread *thr; =20 - if (!async_valid_stream_id_p (async) || stream =3D=3D NULL) + if (!async_valid_p (async) || stream =3D=3D NULL) return 0; =20 goacc_lazy_initialize (); diff --git libgomp/plugin/plugin-nvptx.c libgomp/plugin/plugin-nvptx.c index 6492e5ffab77..7d0d38e0c2e1 100644 --- libgomp/plugin/plugin-nvptx.c +++ libgomp/plugin/plugin-nvptx.c @@ -1753,8 +1753,14 @@ nvptx_set_cuda_stream (int async, void *stream) pthread_t self =3D pthread_self (); struct nvptx_thread *nvthd =3D nvptx_thread (); =20 - if (async < 0) - GOMP_PLUGIN_fatal ("bad async %d", async); + /* Due to the "null_stream" usage for "acc_async_sync", this cannot be u= sed + to change the stream handle associated with "acc_async_sync". */ + if (async =3D=3D acc_async_sync) + { + GOMP_PLUGIN_debug (0, "Refusing request to set CUDA stream associate= d" + " with \"acc_async_sync\"\n"); + return 0; + } =20 pthread_mutex_lock (&nvthd->ptx_dev->stream_lock); =20 diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/acc_set_cuda_stream-= 1.c libgomp/testsuite/libgomp.oacc-c-c++-common/acc_set_cuda_stream-1.c new file mode 100644 index 000000000000..93981ff5cb7f --- /dev/null +++ libgomp/testsuite/libgomp.oacc-c-c++-common/acc_set_cuda_stream-1.c @@ -0,0 +1,42 @@ +/* Verify expected nvptx plugin behavior for "acc_set_cuda_stream" for + "acc_async_sync". */ + +/* { dg-do run { target openacc_nvidia_accel_selected } } */ +/* { dg-set-target-env-var GOMP_DEBUG "1" } */ + +#undef NDEBUG +#include +#include + +int main(void) +{ + int async =3D 42; + + /* Initialize. */ +#pragma acc parallel async(acc_async_sync) + ; +#pragma acc parallel async(async) + ; +#pragma acc wait + + void *cuda_stream_sync =3D acc_get_cuda_stream (acc_async_sync); + assert (cuda_stream_sync =3D=3D NULL); + void *cuda_stream_async =3D acc_get_cuda_stream (async); + assert (cuda_stream_async !=3D NULL); + int ret =3D acc_set_cuda_stream (acc_async_sync, cuda_stream_async); + assert (ret =3D=3D 0); + void *cuda_stream_sync_ =3D acc_get_cuda_stream (acc_async_sync); + assert (cuda_stream_sync_ =3D=3D cuda_stream_sync); + void *cuda_stream_async_ =3D acc_get_cuda_stream (async); + assert (cuda_stream_async_ =3D=3D cuda_stream_async); + +#pragma acc parallel async(acc_async_sync) + ; +#pragma acc parallel async(async) + ; +#pragma acc wait + + return 0; +} + +/* { dg-output "Refusing request to set CUDA stream associated with \"acc_= async_sync\"" } */ diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/async_queue-1.c libg= omp/testsuite/libgomp.oacc-c-c++-common/async_queue-1.c new file mode 100644 index 000000000000..48e1846a36e3 --- /dev/null +++ libgomp/testsuite/libgomp.oacc-c-c++-common/async_queue-1.c @@ -0,0 +1,97 @@ +/* Test mapping of async values to specific underlying queues. */ + +#undef NDEBUG +#include +#include + +/* This is implemented in terms of the "acc_get_cuda_stream" interface. */ + +struct +{ + int async; + void *cuda_stream; +} queues[] =3D { { acc_async_sync, NULL }, + { acc_async_noval, NULL }, + { 0, NULL }, + { 1, NULL }, + { 2, NULL }, + { 36, NULL }, + { 1982, NULL } }; +const size_t queues_n =3D sizeof queues / sizeof queues[0]; + +int main(void) +{ + /* Explicitly initialize: it's not clear whether the following OpenACC + runtime library calls implicitly initialize; + . */ + acc_device_t d; +#if defined ACC_DEVICE_TYPE_nvidia + d =3D acc_device_nvidia; +#elif defined ACC_DEVICE_TYPE_host + d =3D acc_device_host; +#else +# error Not ported to this ACC_DEVICE_TYPE +#endif + acc_init (d); + + for (size_t i =3D 0; i < queues_n; ++i) + { + /* Before actually being used, there are all NULL. */ + queues[i].cuda_stream =3D acc_get_cuda_stream (queues[i].async); + assert (queues[i].cuda_stream =3D=3D NULL); + } + + for (size_t i =3D 0; i < queues_n; ++i) + { + /* Use the queue to initialize it. */ +#pragma acc parallel async(queues[i].async) + ; +#pragma acc wait + + /* Verify CUDA stream used. */ + queues[i].cuda_stream =3D acc_get_cuda_stream (queues[i].async); +#if defined ACC_DEVICE_TYPE_nvidia + /* "acc_async_sync" maps to the NULL CUDA default stream. */ + if (queues[i].async =3D=3D acc_async_sync) + assert (queues[i].cuda_stream =3D=3D NULL); + else + assert (queues[i].cuda_stream !=3D NULL); +#elif defined ACC_DEVICE_TYPE_host + /* For "acc_device_host" there are no CUDA streams. */ + assert (queues[i].cuda_stream =3D=3D NULL); +#else +# error Not ported to this ACC_DEVICE_TYPE +#endif + } + + /* Verify same results. */ + for (size_t i =3D 0; i < queues_n; ++i) + { + void *cuda_stream; + + cuda_stream =3D acc_get_cuda_stream (queues[i].async); + assert (cuda_stream =3D=3D queues[i].cuda_stream); + +#pragma acc parallel async(queues[i].async) + ; +#pragma acc wait + + cuda_stream =3D acc_get_cuda_stream (queues[i].async); + assert (cuda_stream =3D=3D queues[i].cuda_stream); + } + + /* Verify individual underlying queues are all different. */ + for (size_t i =3D 0; i < queues_n; ++i) + { + if (queues[i].cuda_stream =3D=3D NULL) + continue; + for (size_t j =3D i + 1; j < queues_n; ++j) + { + if (queues[j].cuda_stream =3D=3D NULL) + continue; + assert (queues[j].cuda_stream !=3D queues[i].cuda_stream); + } + } + + return 0; +} diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/lib-84.c libgomp/tes= tsuite/libgomp.oacc-c-c++-common/lib-84.c index 786b908f755b..d793c7436300 100644 --- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-84.c +++ libgomp/testsuite/libgomp.oacc-c-c++-common/lib-84.c @@ -7,6 +7,14 @@ #include #include =20 +#if !defined __cplusplus +# undef static_assert +# define static_assert _Static_assert +#endif + +static_assert (acc_async_sync =3D=3D -2, "acc_async_sync?"); +static_assert (acc_async_noval =3D=3D -1, "acc_async_noval?"); + int main (int argc, char **argv) { @@ -20,9 +28,11 @@ main (int argc, char **argv) =20 (void) acc_get_device_num (acc_device_nvidia); =20 - streams =3D (CUstream *) malloc (N * sizeof (void *)); + streams =3D (CUstream *) malloc ((2 + N) * sizeof (void *)); + streams +=3D 2; + /* "streams[i]" is valid for i in [acc_async_sync..N). */ =20 - for (i =3D 0; i < N; i++) + for (i =3D acc_async_sync; i < N; i++) { streams[i] =3D (CUstream) acc_get_cuda_stream (i); if (streams[i] !=3D NULL) @@ -35,11 +45,20 @@ main (int argc, char **argv) abort (); } =20 - if (!acc_set_cuda_stream (i, streams[i])) - abort (); + int ret =3D acc_set_cuda_stream (i, streams[i]); + if (i =3D=3D acc_async_sync) + { + if (ret =3D=3D 1) + abort (); + } + else + { + if (ret !=3D 1) + abort (); + } } =20 - for (i =3D 0; i < N; i++) + for (i =3D acc_async_sync; i < N; i++) { int j; int cnt; @@ -48,7 +67,7 @@ main (int argc, char **argv) =20 s =3D streams[i]; =20 - for (j =3D 0; j < N; j++) + for (j =3D acc_async_sync; j < N; j++) { if (s =3D=3D streams[j]) cnt++; diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/lib-85.c libgomp/tes= tsuite/libgomp.oacc-c-c++-common/lib-85.c index cf925a7b002e..141c83b53dd6 100644 --- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-85.c +++ libgomp/testsuite/libgomp.oacc-c-c++-common/lib-85.c @@ -7,6 +7,14 @@ #include #include =20 +#if !defined __cplusplus +# undef static_assert +# define static_assert _Static_assert +#endif + +static_assert (acc_async_sync =3D=3D -2, "acc_async_sync?"); +static_assert (acc_async_noval =3D=3D -1, "acc_async_noval?"); + int main (int argc, char **argv) { @@ -20,9 +28,11 @@ main (int argc, char **argv) =20 (void) acc_get_device_num (acc_device_nvidia); =20 - streams =3D (CUstream *) malloc (N * sizeof (void *)); + streams =3D (CUstream *) malloc ((2 + N) * sizeof (void *)); + streams +=3D 2; + /* "streams[i]" is valid for i in [acc_async_sync..N). */ =20 - for (i =3D 0; i < N; i++) + for (i =3D acc_async_sync; i < N; i++) { streams[i] =3D (CUstream) acc_get_cuda_stream (i); if (streams[i] !=3D NULL) @@ -35,8 +45,17 @@ main (int argc, char **argv) abort (); } =20 - if (!acc_set_cuda_stream (i, streams[i])) - abort (); + int ret =3D acc_set_cuda_stream (i, streams[i]); + if (i =3D=3D acc_async_sync) + { + if (ret =3D=3D 1) + abort (); + } + else + { + if (ret !=3D 1) + abort (); + } } =20 s =3D NULL; Gr=C3=BC=C3=9Fe Thomas