On 2018/12/11 9:50 PM, Chung-Lin Tang wrote:
> On 2018/12/10 6:02 PM, Chung-Lin Tang wrote:
>> On 2018/12/7 04:57 AM, Thomas Schwinge wrote:
>>>> --- a/libgomp/plugin/plugin-nvptx.c
>>>> +++ b/libgomp/plugin/plugin-nvptx.c
>>>
>>>> +struct goacc_asyncqueue *
>>>> +GOMP_OFFLOAD_openacc_async_construct (void)
>>>> +{
>>>> +  struct goacc_asyncqueue *aq
>>>> +    = GOMP_PLUGIN_malloc (sizeof (struct goacc_asyncqueue));
>>>> +  aq->cuda_stream = NULL;
>>>> +  CUDA_CALL_ASSERT (cuStreamCreate, &aq->cuda_stream, CU_STREAM_DEFAULT);
>>>
>>> Curiously (this was the same in the code before): does this have to be
>>> "CU_STREAM_DEFAULT" instead of "CU_STREAM_NON_BLOCKING", because we want
>>> to block anything from running in parallel with "acc_async_sync" GPU
>>> kernels, that use the "NULL" stream?  (Not asking you to change this now,
>>> but I wonder if this is overly strict?)
>>
>> IIUC, this non-blocking only pertains to the "Legacy Default Stream" in CUDA, which we're pretty much ignoring; we should be using the newer per-thread default stream model. We could review this issue later.
>>
>>>> +  if (aq->cuda_stream == NULL)
>>>> +    GOMP_PLUGIN_fatal ("CUDA stream create NULL\n");
>>>
>>> Can this actually happen, given the "CUDA_CALL_ASSERT" usage above?
>>
>> Hmm, yeah I think this is superfluous too...
>>
>>>> +  CUDA_CALL_ASSERT (cuStreamSynchronize, aq->cuda_stream);
>>>
>>> Why is the synchronization needed here?
>>
>> I don't remember, could likely be something added during debug.
>> I'll remove this and test if things are okay.
> 
> I have removed the above seemingly unneeded lines and re-tested; it appears okay.
> Also, the previously attached version for some reason had many conflicts
> with older code; all are resolved in this v2 nvptx part.

GOMP_OFFLOAD_openacc_async_construct is now updated to return NULL on failure;
there are also some adjustments in oacc-async.c, coming next.
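
For illustration only, the NULL-on-failure convention could look roughly like
the sketch below. This is not the code from the patch: fake_stream_create is a
hypothetical stand-in for cuStreamCreate so that the example builds without
CUDA, and the *_sketch names and the plain malloc calls are assumptions made
for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a CUDA stream handle (not the real CUstream).  */
struct fake_stream { int unused; };
typedef struct fake_stream *fake_stream_t;

/* Set to nonzero to simulate a failing stream creation.  */
static int fake_stream_create_fails;

/* Hypothetical stand-in for cuStreamCreate: returns 0 on success.  */
static int
fake_stream_create (fake_stream_t *stream)
{
  assert (stream != NULL);
  if (fake_stream_create_fails)
    return 1;
  *stream = malloc (sizeof (struct fake_stream));
  return *stream == NULL;
}

/* Simplified queue type, assumed for this sketch only.  */
struct asyncqueue_sketch
{
  fake_stream_t stream;
};

/* Return a new queue, or NULL if the stream could not be created.
   The caller decides how to report or recover from the failure.  */
static struct asyncqueue_sketch *
asyncqueue_construct_sketch (void)
{
  fake_stream_t stream = NULL;
  if (fake_stream_create (&stream) != 0)
    {
      fprintf (stderr, "stream creation failed\n");
      return NULL;
    }

  struct asyncqueue_sketch *aq = malloc (sizeof *aq);
  if (aq == NULL)
    {
      free (stream);
      return NULL;
    }
  aq->stream = stream;
  return aq;
}

/* Release a queue; accepts NULL so callers can clean up unconditionally.  */
static void
asyncqueue_destruct_sketch (struct asyncqueue_sketch *aq)
{
  if (aq == NULL)
    return;
  free (aq->stream);
  free (aq);
}
```

With this shape the caller checks the returned pointer, rather than the
constructor aborting the whole program itself.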

Chung-Lin