From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 871 invoked by alias); 18 Nov 2018 01:37:08 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 846 invoked by uid 89); 18 Nov 2018 01:37:06 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: =?ISO-8859-1?Q?No, score=-11.0 required=5.0 tests=AWL,BAYES_00,GIT_PATCH_2,GIT_PATCH_3,RCVD_IN_DNSWL_NONE,SPF_PASS autolearn=ham version=3.3.2 spammy=gr, gr=c3=bc=c3, Once, thr?= X-HELO: relay1.mentorg.com Received: from relay1.mentorg.com (HELO relay1.mentorg.com) (192.94.38.131) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Sun, 18 Nov 2018 01:37:04 +0000 Received: from nat-ies.mentorg.com ([192.94.31.2] helo=SVR-IES-MBX-03.mgc.mentorg.com) by relay1.mentorg.com with esmtps (TLSv1.2:ECDHE-RSA-AES256-SHA384:256) id 1gOC1F-0005vd-ED from Thomas_Schwinge@mentor.com for gcc-patches@gcc.gnu.org; Sat, 17 Nov 2018 17:37:01 -0800 Received: from hertz.schwinge.homeip.net (137.202.0.90) by SVR-IES-MBX-03.mgc.mentorg.com (139.181.222.3) with Microsoft SMTP Server (TLS) id 15.0.1320.4; Sun, 18 Nov 2018 01:36:56 +0000 From: Thomas Schwinge To: Chung-Lin Tang CC: gcc-patches Subject: OpenACC ICV acc-default-async-var (was: [gomp4] Async related additions to OpenACC runtime library) In-Reply-To: <044c4fdb-e659-6029-1da1-4f6bfc05ca9c@mentor.com> References: <044c4fdb-e659-6029-1da1-4f6bfc05ca9c@mentor.com> User-Agent: Notmuch/0.9-101-g81dad07 (http://notmuchmail.org) Emacs/25.2.2 (x86_64-pc-linux-gnu) Date: Sun, 18 Nov 2018 01:37:00 -0000 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable X-SW-Source: 2018-11/txt/msg01588.txt.bz2 Hi Chung-Lin! On Mon, 13 Feb 2017 18:13:42 +0800, Chung-Lin Tang wrote: > This patch adds: >=20 > // New functions to set/get the current default async queue > void acc_set_default_async (int); > int acc_get_default_async (void); >=20 > and _async versions of a few existing API functions: (Please, separate patches for separate features/changes.) Reviewing the OpenACC ICV acc-default-async-var changes here. > --- include/gomp-constants.h (revision 245382) > +++ include/gomp-constants.h (working copy) > /* Asynchronous behavior. Keep in sync with > libgomp/{openacc.h,openacc.f90,openacc_lib.h}:acc_async_t. */ >=20=20 > +#define GOMP_ASYNC_DEFAULT 0 > #define GOMP_ASYNC_NOVAL -1 > #define GOMP_ASYNC_SYNC -2 This means that "acc_set_default_async(acc_async_default)" will set acc-default-async-var to "0", that is, the same as "acc_set_default_async(0)". It thus follows that "async"/"async(acc_async_noval)" is the same as "async(0)". Is that intentional? It is in line with the OpenACC 2.5 specification: "initial value [...] is implementation defined", but I wonder why map it to "async(0)", and not to its own, unspecified, but separate queue. In the latter case, "acc_async_default" etc. would then map to a negative value to denote this unspecified, but separate queue (and your changes would need to be adapted for that). I have not verified whether we're currently already having (on trunk and/or openacc-gcc-8-branch) the semantics of the queue of "async(acc_async_noval)" mapping to the same queue as "async(0)"? I'm fine to accept your changes as proposed (basically, everthing from your patch posted that has a "default_async" in it), for that's an incremental improvement anyway. But -- unless you tell me I've misunderstood something -- I'll get the issue I raised clarified with the OpenACC technical committee, and we will then later improve this further. No matter what the outcome, the implementation-defined behavior should be documented. (Can do that once we get the intentions clarified.) > --- libgomp/oacc-async.c (revision 245382) > +++ libgomp/oacc-async.c (working copy) > +int > +acc_get_default_async (void) > +{ > + struct goacc_thread *thr =3D goacc_thread (); > + > + if (!thr || !thr->dev) > + gomp_fatal ("no device active"); I suppose that instead, this might also either just "return acc_async_sync", or in fact "goacc_lazy_initialize", and then return the correct value? As far as I remember now, I have an issue open with the OpenACC technical committee to clarify which constructs/API calls are expected to implicitly initialize. I'll fold this question in. So, OK to leave 'gomp_fatal ("no device active")', as that's what all other async routines also seem to be doing at the moment. > + > + return thr->default_async; > +} > + > +void > +acc_set_default_async (int async) > +{ > + if (async < acc_async_sync) > + gomp_fatal ("invalid async argument: %d", async); (This will nowadays use "async_valid_stream_id_p" or some such.) > + > + struct goacc_thread *thr =3D goacc_thread (); > + > + if (!thr || !thr->dev) > + gomp_fatal ("no device active"); As above. > + thr->default_async =3D async; > +} > --- libgomp/oacc-plugin.c (revision 245382) > +++ libgomp/oacc-plugin.c (working copy) > +/* Return the default async number from the TLS data for the current thr= ead. */ > + > +int > +GOMP_PLUGIN_acc_thread_default_async (void) > +{ > + struct goacc_thread *thr =3D goacc_thread (); > + return thr ? thr->default_async : acc_async_default; > +} As I understand, the need for this function will disappear with your later "async re-work" changes, so OK as posted, but I wondered in which cases we would not have a valid "goacc_thread" when coming here? (Might again related to the "goacc_lazy_initialize" issue mentioned above.) > --- libgomp/plugin/plugin-nvptx.c (revision 245382) > +++ libgomp/plugin/plugin-nvptx.c (working copy) > @@ -414,13 +414,10 @@ select_stream_for_async (int async, pthread_t thre > struct ptx_stream *stream =3D NULL; > int orig_async =3D async; >=20=20 > - /* The special value acc_async_noval (-1) maps (for now) to an > - implicitly-created stream, which is then handled the same as any ot= her > - numbered async stream. Other options are available, e.g. using the= null > - stream for anonymous async operations, or choosing an idle stream f= rom an > - active set. But, stick with this for now. */ > - if (async > acc_async_sync) > - async++; Is that actually a separate change from the acc-default-async-var changes? Is this one relevant in the question raised above, whether "async(acc_async_noval)" maps to the same queue as "async(0)"? > + /* The special value acc_async_noval (-1) maps to the thread-specific > + default async stream. */ > + if (async =3D=3D acc_async_noval) > + async =3D GOMP_PLUGIN_acc_thread_default_async (); > --- libgomp/testsuite/libgomp.oacc-c-c++-common/asyncwait-2.c (revision 0) > +++ libgomp/testsuite/libgomp.oacc-c-c++-common/asyncwait-2.c (revision 0) > @@ -0,0 +1,904 @@ > +[...] > + acc_set_default_async (s); > +[...] This is the one single test case using this functionality, but it only verifies "correct results', but doesn't observe the actual queues (for example, CUDA streams) being used. Need test cases for "acc_get_default_async", too, and also Fortran ones. Generally, I envision test cases running a few "acc_get_cuda_stream" calls with relevant argument values, to see whether the expected queues/streames are being used. (Similar for other offload targets.) But I suppose we might again need to get clarified whether "acc_get_cuda_stream(acc_async_sync)", "acc_get_cuda_stream(acc_async_noval)", or "acc_get_cuda_stream(acc_async_default)" are actually valid calls (given that these argument values are not valid "async value"s), and these would then return the respective CUDA stream handles, different from the one returned for "acc_get_cuda_stream(0)" etc. That said, we can certainly implement it that way, because that's not against the specification. (Once available in trunk, we can also construct test cases using the OpenACC Profiling Interface for verifying such internal mapping.) Gr=C3=BC=C3=9Fe Thomas