On 2018/12/10 6:02 PM, Chung-Lin Tang wrote:
> On 2018/12/7 04:57 AM, Thomas Schwinge wrote:
>>> --- a/libgomp/plugin/plugin-nvptx.c
>>> +++ b/libgomp/plugin/plugin-nvptx.c
>>
>>> +struct goacc_asyncqueue *
>>> +GOMP_OFFLOAD_openacc_async_construct (void)
>>> +{
>>> +  struct goacc_asyncqueue *aq
>>> +    = GOMP_PLUGIN_malloc (sizeof (struct goacc_asyncqueue));
>>> +  aq->cuda_stream = NULL;
>>> +  CUDA_CALL_ASSERT (cuStreamCreate, &aq->cuda_stream, CU_STREAM_DEFAULT);
>>
>> Curiously (this was the same in the code before): does this have to be
>> "CU_STREAM_DEFAULT" instead of "CU_STREAM_NON_BLOCKING", because we want
>> to block anything from running in parallel with "acc_async_sync" GPU
>> kernels, that use the "NULL" stream?  (Not asking you to change this now,
>> but I wonder if this is overly strict?)
> 
> IIUC, this non-blocking only pertains to the "Legacy Default Stream" in CUDA, which we're pretty much ignoring; we should be using the newer per-thread default stream model. We could review this issue later.
> 
>>> +  if (aq->cuda_stream == NULL)
>>> +    GOMP_PLUGIN_fatal ("CUDA stream create NULL\n");
>>
>> Can this actually happen, given the "CUDA_CALL_ASSERT" usage above?
> 
> Hmm, yeah I think this is superfluous too...
> 
>>> +  CUDA_CALL_ASSERT (cuStreamSynchronize, aq->cuda_stream);
>>
>> Why is the synchronization needed here?
> 
> I don't remember, could likely be something added during debug.
> I'll remove this and test if things are okay.

I have removed the seemingly unneeded lines discussed above and re-tested;
things appear okay. Also, the previously attached version for some reason had
many conflicts with older code; these are all resolved in this v2 of the nvptx part.
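
For illustration only, here is a rough, standalone sketch of the shape the
constructor takes once the NULL check and the synchronize call are dropped.
It is not the literal v2 hunk. The names stub_stream, stub_stream_create,
STUB_CALL_ASSERT and sketch_async_construct are hypothetical stand-ins for the
CUDA driver API, CUDA_CALL_ASSERT and the plugin hook, so that it builds
without cuda.h or libgomp; the final "return aq" is likewise an assumption.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the CUDA driver API (assumption: only here so
   the sketch compiles on its own).  */
typedef struct stub_stream_st { int id; } *stub_stream;
#define STUB_STREAM_DEFAULT 0u

/* Stand-in for cuStreamCreate: returns 0 on success, non-zero on failure.  */
static int
stub_stream_create (stub_stream *s, unsigned int flags)
{
  (void) flags;
  *s = malloc (sizeof **s);
  return *s != NULL ? 0 : 1;
}

/* Stand-in for CUDA_CALL_ASSERT: does not return if the call fails.  */
#define STUB_CALL_ASSERT(fn, ...) \
  do { if (fn (__VA_ARGS__) != 0) abort (); } while (0)

struct goacc_asyncqueue
{
  stub_stream cuda_stream;
};

struct goacc_asyncqueue *
sketch_async_construct (void)
{
  struct goacc_asyncqueue *aq = malloc (sizeof (struct goacc_asyncqueue));
  if (aq == NULL)
    abort ();
  aq->cuda_stream = NULL;
  STUB_CALL_ASSERT (stub_stream_create, &aq->cuda_stream, STUB_STREAM_DEFAULT);
  /* No explicit NULL check and no synchronize here: the asserting wrapper
     has already bailed out if stream creation failed.  */
  return aq;
}
```

The point is only that an asserting call wrapper already covers the failure
path, which is why the extra check looked superfluous.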

Thanks,
Chung-Lin