public inbox for gcc-patches@gcc.gnu.org
From: Thomas Schwinge <tschwinge@baylibre.com>
To: Tobias Burnus <tburnus@baylibre.com>
Cc: gcc-patches@gcc.gnu.org, Jakub Jelinek <jakub@redhat.com>,
	Tom de Vries <tdevries@suse.de>
Subject: Re: [v2][patch] plugin/plugin-nvptx.c: Fix fini_device call when already shutdown [PR113513]
Date: Mon, 29 Jan 2024 16:53:30 +0100
Message-ID: <875xzcf85h.fsf@euler.schwinge.ddns.net>
In-Reply-To: <53a3c4e3-452c-4445-8d4a-be66dccc9e45@baylibre.com>

Hi Tobias!

On 2024-01-23T10:55:16+0100, Tobias Burnus <tburnus@baylibre.com> wrote:
> Slightly changed patch:
>
> nvptx_attach_host_thread_to_device now fails again with an error for 
> CUDA_ERROR_DEINITIALIZED, except for GOMP_OFFLOAD_fini_device.
>
> I think it makes more sense that way.

Agreed.

> Tobias Burnus wrote:
>> Testing showed that the libgomp.c/target-52.c failed with:
>>
>> libgomp: cuCtxGetDevice error: unknown cuda error
>>
>> libgomp: device finalization failed
>>
>> This testcase uses OMP_DISPLAY_ENV=true and 
>> OMP_TARGET_OFFLOAD=mandatory, and those env vars matter, i.e. it only 
>> fails if dg-set-target-env-var is honored.
>>
>> If both env vars are set, the device initialization occurs earlier,
>> as OMP_DEFAULT_DEVICE is shown due to the display-env env var, and its
>> value (when target-offload-var is 'mandatory') might be either
>> 'omp_invalid_device' or '0'.
>>
>> It turned out that this had an effect on device finalization, which 
>> caused CUDA to stop earlier than expected. This patch now handles this 
>> case gracefully. For details, see the commit log message in the 
>> attached patch and/or the PR.

> plugin/plugin-nvptx.c: Fix fini_device call when already shutdown [PR113513]
>
> The following issue was found when running libgomp.c/target-52.c with
> nvptx offloading, once the dg-set-target-env-var settings were honored.

Curious, I've never seen this failure mode in my several different
configurations.  :-|

> The issue
> occurred both with -foffload=disable and with offloading configured when
> an Nvidia device is available.
>
> At the end of the program, the offloading parts are shut down via two
> means: the callback registered via 'atexit (gomp_target_fini)' and - via
> code generated in mkoffload - the '__attribute__((destructor)) fini'
> function that calls GOMP_offload_unregister_ver.
>
> In normal processing, first gomp_target_fini is called - which then sets
> GOMP_DEVICE_FINALIZED for the device - and later GOMP_offload_unregister_ver,
> but that's then a no-op because the state is GOMP_DEVICE_FINALIZED.
> If both OMP_DISPLAY_ENV=true and OMP_TARGET_OFFLOAD="mandatory" are set,
> the call to omp_display_env already invokes gomp_init_targets_once, i.e. it
> occurs earlier than usual and is invoked via __attribute__((constructor))
> initialize_env.
>
> For some unknown reasons, while this does not have an effect on the
> order of the called plugin functions for initialization, it changes the
> order of function calls for shutting down. Namely, when the two environment
> variables are set, GOMP_offload_unregister_ver is called now before
> gomp_target_fini.

Re "unknown reasons", isn't that indeed explained by the different
'atexit' function/'__attribute__((destructor))' sequencing, due to
different order of 'atexit'/'__attribute__((constructor))' calls?
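
For instance, the following standalone sketch (nothing libgomp-specific;
all file and function names are made up) mimics that sequencing.  A
shared-library constructor -- standing in for 'initialize_env' calling
'gomp_init_targets_once' -- registers the 'atexit' handler before 'main'
even runs:

    /* lib.c -- stands in for libgomp.
       Build: gcc -shared -fPIC -o libearly.so lib.c  */
    #include <stdio.h>
    #include <stdlib.h>

    static void
    target_fini (void)  /* Stands in for 'gomp_target_fini'.  */
    {
      puts ("target_fini (atexit handler)");
    }

    __attribute__((constructor))
    static void
    early_init (void)
    {
      atexit (target_fini);
    }

    /* main.c -- stands in for the user program; mkoffload's
       '__attribute__((destructor)) fini' lives in the executable.
       Build: gcc -o test main.c -L. -learly -Wl,-rpath,'$ORIGIN'  */
    #include <stdio.h>

    __attribute__((destructor))
    static void
    fini (void)  /* Stands in for mkoffload's 'fini'.  */
    {
      puts ("fini (executable destructor)");
    }

    int
    main (void)
    {
      return 0;
    }

With glibc, I'd expect this to print 'fini (executable destructor)'
before 'target_fini (atexit handler)': the DSO constructor registers its
handler before '__libc_start_main' registers the executable's destructor
handling, and 'exit' unwinds in reverse order of registration.  If
'atexit (target_fini)' were instead called from 'main' (the usual lazy
initialization), it'd be registered last and thus run first.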

I think I agree that, defensively, we should behave correctly in libgomp
finalization, no matter in which order these calls occur.

> And it seems as if CUDA regards a call to cuModuleUnload
> (or unloading the last module?) as an indication that the device context
> should be destroyed - or, at least, calling cuCtxGetDevice afterwards will
> return CUDA_ERROR_DEINITIALIZED.

However, this I don't understand -- but would like to.  Are you saying
that for:

    --- libgomp/plugin/plugin-nvptx.c
    +++ libgomp/plugin/plugin-nvptx.c
    @@ -1556,8 +1556,16 @@ GOMP_OFFLOAD_unload_image (int ord, unsigned version, const void *target_data)
         if (image->target_data == target_data)
           {
     	*prev_p = image->next;
    -	if (CUDA_CALL_NOCHECK (cuModuleUnload, image->module) != CUDA_SUCCESS)
    +	CUresult r;
    +	r = CUDA_CALL_NOCHECK (cuModuleUnload, image->module);
    +	GOMP_PLUGIN_debug (0, "%s: cuModuleUnload: %s\n", __FUNCTION__, cuda_error (r));
    +	if (r != CUDA_SUCCESS)
     	  ret = false;
    +	CUdevice dev_;
    +	r = CUDA_CALL_NOCHECK (cuCtxGetDevice, &dev_);
    +	GOMP_PLUGIN_debug (0, "%s: cuCtxGetDevice: %s\n", __FUNCTION__, cuda_error (r));
    +	GOMP_PLUGIN_debug (0, "%s: dev_=%d, dev->dev=%d\n", __FUNCTION__, dev_, dev->dev);
    +	assert (dev_ == dev->dev);
     	free (image->fns);
     	free (image);
     	break;

..., you're seeing an error for 'libgomp.c/target-52.c' with
'env OMP_TARGET_OFFLOAD=mandatory OMP_DISPLAY_ENV=true'?  I get:

    GOMP_OFFLOAD_unload_image: cuModuleUnload: no error
    GOMP_OFFLOAD_unload_image: cuCtxGetDevice: no error
    GOMP_OFFLOAD_unload_image: dev_=0, dev->dev=0

Or, is something else happening in between the 'cuModuleUnload' and your
reportedly failing 'cuCtxGetDevice'?
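
Here's a standalone check one could run against a given driver stack (my
own sketch, not a libgomp test; whether such a minimal PTX string is
sufficient for 'cuModuleLoadData' is an assumption):

    /* cuctx-check.c -- does 'cuCtxGetDevice' still succeed right after
       'cuModuleUnload'?
       Build: gcc -o cuctx-check cuctx-check.c -lcuda  */
    #include <stdio.h>
    #include <cuda.h>

    static const char *
    err (CUresult r)
    {
      const char *s;
      if (cuGetErrorString (r, &s) != CUDA_SUCCESS || s == NULL)
        s = "unknown cuda error";
      return s;
    }

    int
    main (void)
    {
      CUdevice dev, dev2;
      CUcontext ctx;
      CUmodule mod;
      /* Minimal PTX, no entry points.  */
      static const char ptx[]
        = ".version 6.0\n.target sm_50\n.address_size 64\n";

      printf ("cuInit: %s\n", err (cuInit (0)));
      printf ("cuDeviceGet: %s\n", err (cuDeviceGet (&dev, 0)));
      printf ("cuCtxCreate: %s\n", err (cuCtxCreate (&ctx, 0, dev)));
      printf ("cuModuleLoadData: %s\n", err (cuModuleLoadData (&mod, ptx)));
      printf ("cuModuleUnload: %s\n", err (cuModuleUnload (mod)));
      /* The interesting bit: CUDA_SUCCESS, or CUDA_ERROR_DEINITIALIZED?  */
      printf ("cuCtxGetDevice: %s\n", err (cuCtxGetDevice (&dev2)));
      return 0;
    }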

Re your PR113513 details, I don't see how your failure mode could be
related to (a) the PTX code ('--with-arch=sm_80'), or (b) the GPU hardware
("NVIDIA RTX A1000 6GB") (..., unless the Nvidia Driver is doing "funny"
things, of course...), so could this possibly be due to a recent change
in the CUDA Driver/Nvidia Driver?  You say "CUDA Version: 12.3", but
which Nvidia Driver version?  The latest I've now tested are:

    Driver Version: 525.147.05   CUDA Version: 12.0
    Driver Version: 535.154.05   CUDA Version: 12.2

I'll re-try with a more recent version.

> As the previous code in nvptx_attach_host_thread_to_device wasn't expecting
> that result, it called
>   GOMP_PLUGIN_error ("cuCtxGetDevice error: %s", cuda_error (r));
> causing a fatal error of the program.
>
> This commit now handles CUDA_ERROR_DEINITIALIZED in a special way such
> that GOMP_OFFLOAD_fini_device just works.
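
(For reference, I read the special-casing as something like the
following self-contained mock -- my paraphrase with stubbed types and
names, not the actual patch:)

    /* mock.c -- control flow as I understand it: only the fini_device
       path tolerates an already-torn-down CUDA.
       Build: gcc -o mock mock.c  */
    #include <stdbool.h>
    #include <stdio.h>

    typedef enum { CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED } CUresult;

    /* Stub: pretend the CUDA Driver has already shut itself down.  */
    static CUresult
    cuCtxGetDevice_stub (int *dev)
    {
      (void) dev;
      return CUDA_ERROR_DEINITIALIZED;
    }

    static bool
    attach_host_thread_to_device (int n, bool fail_if_deinitialized)
    {
      int dev;
      CUresult r = cuCtxGetDevice_stub (&dev);
      if (r == CUDA_ERROR_DEINITIALIZED)
        {
          /* Fatal error everywhere except on the fini_device path.  */
          if (fail_if_deinitialized)
            fprintf (stderr, "cuCtxGetDevice error: deinitialized\n");
          return false;
        }
      (void) n;  /* Normal attach logic would follow here.  */
      return true;
    }

    int
    main (void)
    {
      /* GOMP_OFFLOAD_fini_device: silently "nothing left to do".  */
      attach_host_thread_to_device (0, false);
      /* Any other caller: error reported as before.  */
      attach_host_thread_to_device (0, true);
      return 0;
    }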

I'd like to please defer that one until we understand the actual origin
of the misbehavior.


> When reading the code, the following was observed in addition:
> When gomp_fini_device is called, it invokes goacc_fini_asyncqueues
> to ensure that the queue is emptied.  It seems to make sense to do
> likewise for GOMP_offload_unregister_ver, which this commit does in
> addition.

I don't understand (a) why offload image unregistration should trigger
'goacc_fini_asyncqueues', and (b) how that relates to PR113513.


Regards
 Thomas

