On 2018/12/19 5:03 AM, Thomas Schwinge wrote: > Hi Chung-Lin! > > On Tue, 18 Dec 2018 23:06:38 +0800, Chung-Lin Tang wrote: >> this part includes some of the lookup_goacc_asyncqueue fixes we talked about. >> I am still thinking about how the queue lock problem should really be solved, so regard >> this patch as just fixing some of the problems. Hi Thomas, This is my solution to the queue lock stuff we talked about. I've only attached a diff to be applied atop of the existing changes, though that may actually be easier to review. Note that this is still in testing, which means it hasn't been tested :P, but I'm posting to see if you have time to give it a look before the holidays. Having the need for a lock on the async queues is quite irritating, especially when the structure needed for managing them is quite simple. Therefore, lets do away the need for locks entirely. This patch makes the asyncqueue part of the device->openacc.async managed by lock-free atomic operations; almost all of the complexity is contained in lookup_goacc_asyncqueue(), so it should be not too complex. A descriptor and the queue array is allocated/exchanged atomically when more storage is required, while in the common case a simple lookup is enough. The fact that we manage asyncqueues by only expanding and never destroying asyncqueues during the device lifetime also simplifies many things. The current implementation may be not that optimized and clunky in some cases, but I think this should be the way to implement what should be a fairly simple asyncqueue array and active list. I'll update more on the status as testing proceeds. (and about the other corners you noticed in the last mail, I'll get to that later...) Thanks, Chung-Lin > Sure, thanks. > > Two comments, though: > >> --- libgomp/oacc-async.c (revision 267226) >> +++ libgomp/oacc-async.c (working copy) > >> +attribute_hidden struct goacc_asyncqueue * >> +lookup_goacc_asyncqueue (struct goacc_thread *thr, bool create, int async) >> +{ >> + /* The special value acc_async_noval (-1) maps to the thread-specific >> + default async stream. */ >> + if (async == acc_async_noval) >> + async = thr->default_async; >> + >> + if (async == acc_async_sync) >> + return NULL; >> + >> + if (async < 0) >> + gomp_fatal ("bad async %d", async); >> + >> + struct gomp_device_descr *dev = thr->dev; >> + >> + gomp_mutex_lock (&dev->openacc.async.lock); >> + >> + if (!create >> + && (async >= dev->openacc.async.nasyncqueue >> + || !dev->openacc.async.asyncqueue[async])) >> + { >> + gomp_mutex_unlock (&dev->openacc.async.lock); >> + return NULL; >> + } >> + >> + if (async >= dev->openacc.async.nasyncqueue) >> + { >> + int diff = async + 1 - dev->openacc.async.nasyncqueue; >> + dev->openacc.async.asyncqueue >> + = gomp_realloc (dev->openacc.async.asyncqueue, >> + sizeof (goacc_aq) * (async + 1)); >> + memset (dev->openacc.async.asyncqueue + dev->openacc.async.nasyncqueue, >> + 0, sizeof (goacc_aq) * diff); >> + dev->openacc.async.nasyncqueue = async + 1; >> + } >> + >> + if (!dev->openacc.async.asyncqueue[async]) >> + { >> + dev->openacc.async.asyncqueue[async] = dev->openacc.async.construct_func (); >> + >> + if (!dev->openacc.async.asyncqueue[async]) >> + { >> + gomp_mutex_unlock (&dev->openacc.async.lock); >> + gomp_fatal ("async %d creation failed", async); >> + } > > That will now always fail for host fallback, where > "host_openacc_async_construct" just always does "return NULL". > > Actually, if the device doesn't support asyncqueues, this whole function > should turn into some kind of no-op, so that we don't again and again try > to create a new one for every call to "lookup_goacc_asyncqueue". > > I'm attaching one possible solution. I think it's fine to assume that > the majority of devices will support asyncqueues, and for those that > don't, this is just a one-time overhead per async-argument. So, no > special handling required in "lookup_goacc_asyncqueue". > >> + /* Link new async queue into active list. */ >> + goacc_aq_list n = gomp_malloc (sizeof (struct goacc_asyncqueue_list)); >> + n->aq = dev->openacc.async.asyncqueue[async]; >> + n->next = dev->openacc.async.active; >> + dev->openacc.async.active = n; >> + } >> + gomp_mutex_unlock (&dev->openacc.async.lock); > > You still need to keep "async" locked during... > >> + return dev->openacc.async.asyncqueue[async]; > > ... this dereference. > >> +} > > > Oh, and: > >> --- libgomp/oacc-plugin.c (revision 267226) >> +++ libgomp/oacc-plugin.c (working copy) >> @@ -31,14 +31,10 @@ >> #include "oacc-int.h" >> >> void >> -GOMP_PLUGIN_async_unmap_vars (void *ptr, int async) >> +GOMP_PLUGIN_async_unmap_vars (void *ptr __attribute__((unused)), >> + int async __attribute__((unused))) >> { >> - struct target_mem_desc *tgt = ptr; >> - struct gomp_device_descr *devicep = tgt->device_descr; >> - >> - devicep->openacc.async_set_async_func (async); >> - gomp_unmap_vars (tgt, true); >> - devicep->openacc.async_set_async_func (acc_async_sync); >> + gomp_fatal ("invalid plugin function"); >> } > > Please add a comment here, something like: "Obsolete entry point, no > longer used." > > > Grüße > Thomas > >