From mboxrd@z Thu Jan  1 00:00:00 1970
From: Thomas Schwinge <thomas@codesourcery.com>
To: Chung-Lin Tang <cltang@codesourcery.com>, Tom de Vries <tdevries@suse.de>
CC: <gcc-patches@gcc.gnu.org>
Subject: Re: nvptx: Avoid deadlock in 'cuStreamAddCallback' callback, error
 case (was: [PATCH 6/6, OpenACC, libgomp] Async re-work, nvptx changes)
In-Reply-To: <e4cb68a2-d7f2-a0bd-1133-f4a8d4b62728@siemens.com>
References: <9523b49a-0454-e0a9-826d-5eeec2a8c973@mentor.com>
 <87zgan6eug.fsf@euler.schwinge.homeip.net>
 <e4cb68a2-d7f2-a0bd-1133-f4a8d4b62728@siemens.com>
User-Agent: Notmuch/0.29.3+94~g74c3f1b (https://notmuchmail.org) Emacs/28.2
 (x86_64-pc-linux-gnu)
Date: Fri, 13 Jan 2023 14:59:06 +0100
Message-ID: <87a62mfsd1.fsf@euler.schwinge.homeip.net>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
List-Id: <gcc-patches.gcc.gnu.org>

Hi!

On 2023-01-13T21:17:43+0800, Chung-Lin Tang <chunglin.tang@siemens.com> wrote:
> On 2023/1/12 9:51 PM, Thomas Schwinge wrote:
>> In my case, 'cuda_callback_wrapper' (expectedly) gets invoked with
>> 'res != CUDA_SUCCESS' ("an illegal memory access was encountered").
>> When we invoke 'GOMP_PLUGIN_fatal', this attempts to shut down the device
>> (..., which deadlocks); that's generally problematic: per
>> https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__STREAM.html#group__CUDA__STREAM_1g613d97a277d7640f4cb1c03bd51c2483
>> "'cuStreamAddCallback' [...] Callbacks must not make any CUDA API calls".
>
> I remember running into this myself when first creating this async support
> (IIRC in my case it was cuFree()-ing something) yet you've found another mistake here! :)

;-)
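For illustration, one common way to honor the "no CUDA API calls in callbacks" rule for the cuFree() case is to defer the free: the callback only records the pointer, and a host thread releases it later, at a point where CUDA API calls are allowed.  A minimal sketch in C, assuming C11 atomics; 'defer_free', 'drain_deferred_frees' and 'device_free_stub' are hypothetical names, the stub merely stands in for 'cuMemFree', and this is not libgomp's actual code:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* One deferred device allocation (hypothetical helper type).  */
struct deferred_block
{
  void *ptr;
  struct deferred_block *next;
};

/* Lock-free LIFO of pointers awaiting release.  */
static _Atomic (struct deferred_block *) deferred_head = NULL;

/* Hypothetical stand-in for a real device free such as 'cuMemFree';
   here it only counts calls.  */
static int device_frees;
static void
device_free_stub (void *ptr)
{
  (void) ptr;
  device_frees++;
}

/* Callback-safe: makes no CUDA API call, only pushes onto the list.
   Returns 0 on success, -1 if host memory for the node is unavailable.  */
static int
defer_free (void *ptr)
{
  assert (ptr != NULL);
  struct deferred_block *b = malloc (sizeof *b);
  if (b == NULL)
    return -1;
  b->ptr = ptr;
  b->next = atomic_load (&deferred_head);
  /* On failure, b->next is refreshed with the current head; retry.  */
  while (!atomic_compare_exchange_weak (&deferred_head, &b->next, b))
    ;
  return 0;
}

/* Host-thread side: detach the whole list at once and release each entry,
   at a point where CUDA API calls are permitted.  Returns the count.  */
static int
drain_deferred_frees (void)
{
  struct deferred_block *b = atomic_exchange (&deferred_head, NULL);
  int n = 0;
  while (b != NULL)
    {
      struct deferred_block *next = b->next;
      device_free_stub (b->ptr);
      free (b);
      b = next;
      n++;
    }
  return n;
}
```

A host thread would call 'drain_deferred_frees' at a synchronization point; the lock-free push keeps the callback side free of both CUDA calls and blocking locks.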

>> Given that eventually we must reach a host/device synchronization point
>> (latest when the device is shut down at program termination), and the
>> non-'CUDA_SUCCESS' will be upheld until then, it does seem safe to
>> replace this 'GOMP_PLUGIN_fatal' with 'GOMP_PLUGIN_error' as per the
>> "nvptx: Avoid deadlock in 'cuStreamAddCallback' callback, error case"
>> attached.  OK to push?
>
> I think this patch is fine. Actual approval powers are yours or Tom's :)

ACK.  I'll let it sit for some more time before 'git push'.
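To illustrate the shape of the change, here is a minimal sketch of a 'cuStreamAddCallback'-style wrapper that reports the error and carries on instead of terminating.  The CUDA types and 'report_error_stub' (standing in for 'GOMP_PLUGIN_error') are hypothetical stubs so that it compiles without <cuda.h>; it is not the actual libgomp source.  It assumes, as argued above, that the non-'CUDA_SUCCESS' state persists until the next host/device synchronization point; that the queued host function still runs after an error is likewise an assumption of this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for <cuda.h> types so this compiles on its own.  */
typedef int cu_result_stub;
#define CU_SUCCESS_STUB 0
typedef void *cu_stream_stub;

/* Hypothetical stand-in for 'GOMP_PLUGIN_error': reports the problem but,
   unlike a fatal error, does not terminate and so does not start a device
   shutdown from within the callback.  */
static int errors_reported;
static void
report_error_stub (const char *fn, cu_result_stub res)
{
  fprintf (stderr, "%s error: code %d\n", fn, res);
  errors_reported++;
}

/* Host function queued behind the stream's prior work (hypothetical).  */
struct callback_data
{
  void (*fn) (void *);
  void *arg;
};

/* Example host function for demonstration: counts invocations.  */
static int user_calls;
static void
count_call (void *arg)
{
  (void) arg;
  user_calls++;
}

/* Same parameter shape as a 'cuStreamAddCallback' callback: stream, status,
   user data.  It makes no CUDA API calls.  On error it reports and carries
   on, relying on the error persisting until the next host/device
   synchronization point.  */
static void
callback_wrapper (cu_stream_stub stream, cu_result_stub res, void *ptr)
{
  (void) stream;
  if (res != CU_SUCCESS_STUB)
    report_error_stub (__func__, res);  /* report, do not terminate */
  struct callback_data *cb = ptr;
  assert (cb != NULL);
  cb->fn (cb->arg);
  free (cb);
}
```

The only behavioral difference from a fatal-error variant is that nothing inside the callback tries to tear down the device, which is what deadlocked.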


>> (Might we even skip 'GOMP_PLUGIN_error' here, understanding that the
>> error will be caught and reported at the next host/device synchronization
>> point?  But I've not verified that.)
>
> Actually, the CUDA driver API docs are a bit vague on what exactly this
> CUresult arg to the callback actually means. The 'res != CUDA_SUCCESS' handling
> here was basically just generic handling.

I suppose this really is just for the callback's own use: for example,
to skip certain things in the presence of a pre-existing error?

> I am not really sure what is the
> true right thing to do here (is the error still retained by CUDA after the callback
> completes?)

Indeed the latter is what I do observe:

      GOMP_OFFLOAD_openacc_async_exec: prepare mappings
      nvptx_exec: kernel main$_omp_fn$0: launch gangs=1, workers=1, vectors=32
      nvptx_exec: kernel main$_omp_fn$0: finished

    libgomp: cuMemcpyDtoHAsync_v2 error: an illegal memory access was encountered

    libgomp:
    libgomp: Copying of dev object [0x7f9a45000000..0x7f9a45000028) to host object [0x1d89350..0x1d89378) failed
    cuda_callback_wrapper error: an illegal memory access was encountered

    libgomp: cuStreamDestroy error: an illegal memory access was encountered

    libgomp: cuMemFree_v2 error: an illegal memory access was encountered

    libgomp: device finalization failed

Here, after the 'async' OpenACC 'parallel', a 'copyout' gets enqueued,
thus 'cuMemcpyDtoHAsync_v2', which is where the device-side fault first
gets reported (all as expected).  Then (CUDA-internally multi-threaded,
I suppose, hence the interleaved printing) we print the
'Copying [...] failed' error and also get 'cuda_callback_wrapper' invoked.
This receives the previous 'CUresult', as seen, and the error is then
still visible at device shut-down, as shown by the subsequent reports.
(This makes sense, as the 'CUcontext' does not magically recover.)

Also, per
<https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__STREAM.html#group__CUDA__STREAM_1g613d97a277d7640f4cb1c03bd51c2483>,
"In the event of a device error, all subsequently executed callbacks will
receive an appropriate 'CUresult'".
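The behavior described above (a device fault first reported by the asynchronous copy, then seen by the callback, and still present at stream destruction and memory free) can be modelled as a "sticky" per-context error.  A toy sketch in C; 'toy_context', 'toy_op' and the status values are hypothetical and only mimic the observed log, not the CUDA driver's implementation:

```c
#include <assert.h>
#include <stdio.h>

/* Toy status codes (hypothetical; not CUresult values).  */
enum { TOY_SUCCESS = 0, TOY_ERROR_ILLEGAL_ACCESS = 1 };

/* Toy per-context state: TOY_SUCCESS until the first device fault is
   observed, after which the error "sticks" for the rest of the context's
   lifetime.  */
struct toy_context
{
  int sticky_error;
};

/* Model one driver-level operation.  'observes_fault' marks the operation
   at which an earlier asynchronous device fault first becomes visible
   (in the log above, the 'cuMemcpyDtoHAsync_v2' for the 'copyout').  */
static int
toy_op (struct toy_context *ctx, const char *name, int observes_fault)
{
  assert (ctx != NULL);
  if (observes_fault && ctx->sticky_error == TOY_SUCCESS)
    ctx->sticky_error = TOY_ERROR_ILLEGAL_ACCESS;
  if (ctx->sticky_error != TOY_SUCCESS)
    printf ("%s error: illegal memory access (toy code %d)\n",
            name, ctx->sticky_error);
  return ctx->sticky_error;
}
```

Replaying the log's sequence (launch, copyout, callback, stream destroy, free) through 'toy_op' yields success for the launch and the same error for every later step, which is why reporting (rather than aborting) in the callback loses nothing.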

But again: I'm perfectly fine with the repeated error reporting.


Regards
 Thomas
-----------------
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634 München; limited liability company; Managing Directors: Thomas Heurung, Frank Thürauf; Registered office: München; Court of registration: München, HRB 106955