From: Thomas Schwinge <thomas@codesourcery.com>
To: <gcc-patches@gcc.gnu.org>,
Chung-Lin Tang <cltang@codesourcery.com>,
"Tom de Vries" <tdevries@suse.de>
Subject: nvptx: Avoid deadlock in 'cuStreamAddCallback' callback, error case (was: [PATCH 6/6, OpenACC, libgomp] Async re-work, nvptx changes)
Date: Thu, 12 Jan 2023 14:51:19 +0100
Message-ID: <87zgan6eug.fsf@euler.schwinge.homeip.net>
In-Reply-To: <9523b49a-0454-e0a9-826d-5eeec2a8c973@mentor.com>
[-- Attachment #1: Type: text/plain, Size: 2597 bytes --]
Hi Chung-Lin, Tom!
It's been a while:
On 2018-09-25T21:11:58+0800, Chung-Lin Tang <chunglin_tang@mentor.com> wrote:
> [...] NVPTX/CUDA-specific implementation
> of the new-style goacc_asyncqueues.
In an OpenACC 'async' setting, where the device kernel crashes (as
expected) with "an illegal memory access was encountered", I'm
running into a deadlock here:
> --- a/libgomp/plugin/plugin-nvptx.c
> +++ b/libgomp/plugin/plugin-nvptx.c
> +static void
> +cuda_callback_wrapper (CUstream stream, CUresult res, void *ptr)
> +{
> +  if (res != CUDA_SUCCESS)
> +    GOMP_PLUGIN_fatal ("%s error: %s", __FUNCTION__, cuda_error (res));
> +  struct nvptx_callback *cb = (struct nvptx_callback *) ptr;
> +  cb->fn (cb->ptr);
> +  free (ptr);
> +}
> +
> +void
> +GOMP_OFFLOAD_openacc_async_queue_callback (struct goacc_asyncqueue *aq,
> +                                           void (*callback_fn)(void *),
> +                                           void *userptr)
> +{
> +  struct nvptx_callback *b = GOMP_PLUGIN_malloc (sizeof (*b));
> +  b->fn = callback_fn;
> +  b->ptr = userptr;
> +  b->aq = aq;
> +  CUDA_CALL_ASSERT (cuStreamAddCallback, aq->cuda_stream,
> +                    cuda_callback_wrapper, (void *) b, 0);
> +}
In my case, 'cuda_callback_wrapper' gets invoked (as expected) with
'res != CUDA_SUCCESS' ("an illegal memory access was encountered").
When we then invoke 'GOMP_PLUGIN_fatal', it attempts to shut down the
device, which deadlocks.  Doing that from within a callback is generally
problematic; per
<https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__STREAM.html#group__CUDA__STREAM_1g613d97a277d7640f4cb1c03bd51c2483>:
"'cuStreamAddCallback' [...] Callbacks must not make any CUDA API calls".
Given that we must eventually reach a host/device synchronization point
(at the latest when the device is shut down at program termination), and
that the non-'CUDA_SUCCESS' status will persist until then, it does seem
safe to replace this 'GOMP_PLUGIN_fatal' with 'GOMP_PLUGIN_error', as per
the attached "nvptx: Avoid deadlock in 'cuStreamAddCallback' callback,
error case".  OK to push?
(Might we even skip 'GOMP_PLUGIN_error' here, understanding that the
error will be caught and reported at the next host/device synchronization
point? But I've not verified that.)
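To illustrate the reasoning, here is a minimal host-only sketch of the
"record in the callback, report at the synchronization point" pattern.  It
does not use CUDA or libgomp at all: every 'fake_*' name, the 'FAKE_*'
values, and the queue layout are hypothetical stand-ins made up for this
example, and it is not meant to reflect how the plugin is or should be
implemented.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical stand-ins; none of these names exist in libgomp or CUDA.  */
typedef int fake_result;	/* Plays the role of 'CUresult'.  */
enum { FAKE_SUCCESS = 0, FAKE_ERROR = 1 };

struct fake_queue
{
  atomic_int pending_error;	/* First error seen by a callback, or 0.  */
  int callbacks_run;		/* Number of callbacks that have completed.  */
};

static void
fake_queue_init (struct fake_queue *q)
{
  atomic_init (&q->pending_error, FAKE_SUCCESS);
  q->callbacks_run = 0;
}

/* Stands in for the stream callback: it must not re-enter the driver, so
   on error it only records the status and then carries on.  */
static void
fake_callback_wrapper (struct fake_queue *q, fake_result res)
{
  if (res != FAKE_SUCCESS)
    {
      int expected = FAKE_SUCCESS;
      /* Only the first error is kept; later ones leave it untouched.  */
      atomic_compare_exchange_strong (&q->pending_error, &expected, res);
    }
  q->callbacks_run++;
}

/* Stands in for a host/device synchronization point, where reporting the
   error (and shutting things down) is safe.  */
static fake_result
fake_synchronize (struct fake_queue *q)
{
  fake_result err = atomic_load (&q->pending_error);
  if (err != FAKE_SUCCESS)
    fprintf (stderr, "deferred error %d seen at synchronization\n", err);
  return err;
}

/* Small driver: one failing callback, then a later successful one.  */
static fake_result
fake_demo (void)
{
  struct fake_queue q;
  fake_queue_init (&q);
  fake_callback_wrapper (&q, FAKE_ERROR);
  fake_callback_wrapper (&q, FAKE_SUCCESS);
  assert (q.callbacks_run == 2);
  return fake_synchronize (&q);
}
```

In this sketch the error recorded by the first callback is still visible
when 'fake_synchronize' runs, even though a later callback succeeded; that
is the property the proposed change relies on for the real non-'CUDA_SUCCESS'
status.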
Regards
Thomas
-----------------
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634 München; Limited liability company; Managing Directors: Thomas Heurung, Frank Thürauf; Registered office: München; Commercial register: München, HRB 106955
[-- Attachment #2: 0001-nvptx-Avoid-deadlock-in-cuStreamAddCallback-callback.patch --]
[-- Type: text/x-diff, Size: 1634 bytes --]
From b7ddcc0807967750e3c884326ed4c53c05cde81f Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Thu, 12 Jan 2023 14:39:46 +0100
Subject: [PATCH] nvptx: Avoid deadlock in 'cuStreamAddCallback' callback,
error case
When we invoke 'GOMP_PLUGIN_fatal', it attempts to shut down the device,
which may deadlock.  Doing that from within a callback is generally
problematic; per
<https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__STREAM.html#group__CUDA__STREAM_1g613d97a277d7640f4cb1c03bd51c2483>:
"'cuStreamAddCallback' [...] Callbacks must not make any CUDA API calls".
Given that we must eventually reach a host/device synchronization point
(at the latest when the device is shut down at program termination), and
that the non-'CUDA_SUCCESS' status will persist until then, it does seem
safe to replace this 'GOMP_PLUGIN_fatal' with 'GOMP_PLUGIN_error'.
libgomp/
* plugin/plugin-nvptx.c (cuda_callback_wrapper): Invoke
'GOMP_PLUGIN_error' instead of 'GOMP_PLUGIN_fatal'.
---
libgomp/plugin/plugin-nvptx.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 395639537e83..cdb3d435bdc8 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1927,7 +1927,7 @@ static void
 cuda_callback_wrapper (CUstream stream, CUresult res, void *ptr)
 {
   if (res != CUDA_SUCCESS)
-    GOMP_PLUGIN_fatal ("%s error: %s", __FUNCTION__, cuda_error (res));
+    GOMP_PLUGIN_error ("%s error: %s", __FUNCTION__, cuda_error (res));
   struct nvptx_callback *cb = (struct nvptx_callback *) ptr;
   cb->fn (cb->ptr);
   free (ptr);
--
2.39.0
Thread overview: 9+ messages
2018-09-25 13:13 [PATCH 6/6, OpenACC, libgomp] Async re-work, nvptx changes Chung-Lin Tang
2018-10-05 14:07 ` Tom de Vries
2018-12-06 20:57 ` Thomas Schwinge
2018-12-10 10:02 ` Chung-Lin Tang
2018-12-11 13:50 ` [PATCH 6/6, OpenACC, libgomp] Async re-work, nvptx changes (revised, v2) Chung-Lin Tang
2018-12-18 15:07 ` [PATCH 6/6, OpenACC, libgomp] Async re-work, nvptx changes (revised, v3) Chung-Lin Tang
2023-01-12 13:51 ` Thomas Schwinge [this message]
2023-01-13 13:17 ` nvptx: Avoid deadlock in 'cuStreamAddCallback' callback, error case (was: [PATCH 6/6, OpenACC, libgomp] Async re-work, nvptx changes) Chung-Lin Tang
2023-01-13 13:59 ` Thomas Schwinge