From: Thomas Schwinge
To: Tobias Burnus
Cc: gcc-patches@gcc.gnu.org, Jakub Jelinek, Tom de Vries
Subject: Re: [v2][patch] plugin/plugin-nvptx.c: Fix fini_device call when already shutdown [PR113513]
Date: Mon, 29 Jan 2024 16:53:30 +0100
Message-ID: <875xzcf85h.fsf@euler.schwinge.ddns.net>

Hi Tobias!

On 2024-01-23T10:55:16+0100, Tobias Burnus wrote:
> Slightly changed patch:
>
> nvptx_attach_host_thread_to_device now fails again with an error for
> CUDA_ERROR_DEINITIALIZED, except for GOMP_OFFLOAD_fini_device.
>
> I think it makes more sense that way.

Agreed.

> Tobias Burnus wrote:
>> Testing showed that the libgomp.c/target-52.c failed with:
>>
>> libgomp: cuCtxGetDevice error: unknown cuda error
>>
>> libgomp: device finalization failed
>>
>> This testcase uses OMP_DISPLAY_ENV=true and
>> OMP_TARGET_OFFLOAD=mandatory, and those env vars matter, i.e. it only
>> fails if dg-set-target-env-var is honored.
>>
>> If both env vars are set, the device initialization occurs earlier as
>> OMP_DEFAULT_DEVICE is shown due to the display-env env var and its
>> value (when target-offload-var is 'mandatory') might be either
>> 'omp_invalid_device' or '0'.
>>
>> It turned out that this had an effect on device finalization, which
>> caused CUDA to stop earlier than expected. This patch now handles this
>> case gracefully. For details, see the commit log message in the
>> attached patch and/or the PR.

> plugin/plugin-nvptx.c: Fix fini_device call when already shutdown [PR113513]
>
> The following issue was found when running libgomp.c/target-52.c with
> nvptx offloading when the dg-set-target-env-var was honored.

Curious, I've never seen this failure mode in my several different
configurations.  :-|

> The issue
> occurred for both -foffload=disable and with offloading configured when
> an nvidia device is available.
>
> At the end of the program, the offloading parts are shut down via two means:
> The callback registered via 'atexit (gomp_target_fini)' and - via code
> generated in mkoffload, the '__attribute__((destructor)) fini' function
> that calls GOMP_offload_unregister_ver.
>
> In normal processing, first gomp_target_fini is called - which then sets
> GOMP_DEVICE_FINALIZED for the device - and later GOMP_offload_unregister_ver,
> but that's then a no-op because the state is GOMP_DEVICE_FINALIZED.
> If both OMP_DISPLAY_ENV=true and OMP_TARGET_OFFLOAD="mandatory" are set,
> the call omp_display_env already invokes gomp_init_targets_once, i.e. it
> occurs earlier than usual and is invoked via __attribute__((constructor))
> initialize_env.
>
> For some unknown reasons, while this does not have an effect on the
> order of the called plugin functions for initialization, it changes the
> order of function calls for shutting down. Namely, when the two environment
> variables are set, GOMP_offload_unregister_ver is called now before
> gomp_target_fini.
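
The two shutdown orders described in the quoted log message can be made concrete with a minimal C sketch. This is not libgomp code: every 'sketch_*' name is hypothetical, the state enum is only loosely modelled on the GOMP_DEVICE_* states mentioned above, and the "teardown" is a counter rather than a real plugin call. It only illustrates a state check that keeps finalization well-behaved whichever entry point runs first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device states (assumption: loosely modelled on the
   GOMP_DEVICE_* states named in the quoted text).  */
enum sketch_state
{
  SKETCH_UNINITIALIZED,
  SKETCH_INITIALIZED,
  SKETCH_FINALIZED
};

struct sketch_device
{
  enum sketch_state state;
  int loaded_images;   /* images currently loaded on the device */
  int teardown_calls;  /* how often the device teardown really ran */
};

/* Bring the hypothetical device up with IMAGES loaded images.  */
void
sketch_init (struct sketch_device *d, int images)
{
  assert (d != NULL && images >= 0);
  d->state = SKETCH_INITIALIZED;
  d->loaded_images = images;
  d->teardown_calls = 0;
}

/* Stand-in for the 'atexit' path: tear the device down exactly once,
   and do nothing if it is not (or no longer) initialized.  */
void
sketch_target_fini (struct sketch_device *d)
{
  if (d->state != SKETCH_INITIALIZED)
    return;
  d->loaded_images = 0;
  d->teardown_calls++;
  d->state = SKETCH_FINALIZED;
}

/* Stand-in for the destructor path: unload one image, but only while
   the device is still initialized.  Returns true if an image was
   actually unloaded.  */
bool
sketch_unregister_image (struct sketch_device *d)
{
  if (d->state != SKETCH_INITIALIZED || d->loaded_images == 0)
    return false;
  d->loaded_images--;
  return true;
}
```

Calling 'sketch_target_fini' and then 'sketch_unregister_image', or the other way round, leaves the device in the same final state with a single teardown; that is the order-independence asked for here, under the stated assumptions.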

Re "unknown reasons", isn't that indeed explained by the different
'atexit' function/'__attribute__((destructor))' sequencing, due to
different order of 'atexit'/'__attribute__((constructor))' calls?
I think I agree that, defensively, we should behave correctly in libgomp
finalization, no matter in which order these calls occur.

> And it seems as if CUDA regards a call to cuModuleUnload
> (or unloading the last module?) as indication that the device context should
> be destroyed - or, at least, afterwards calling cuCtxGetDevice will return
> CUDA_ERROR_DEINITIALIZED.

However, this I don't understand -- but would like to.  Are you saying
that for:

    --- libgomp/plugin/plugin-nvptx.c
    +++ libgomp/plugin/plugin-nvptx.c
    @@ -1556,8 +1556,16 @@ GOMP_OFFLOAD_unload_image (int ord, unsigned version, const void *target_data)
           if (image->target_data == target_data)
             {
               *prev_p = image->next;
    -          if (CUDA_CALL_NOCHECK (cuModuleUnload, image->module) != CUDA_SUCCESS)
    +          CUresult r;
    +          r = CUDA_CALL_NOCHECK (cuModuleUnload, image->module);
    +          GOMP_PLUGIN_debug (0, "%s: cuModuleUnload: %s\n", __FUNCTION__, cuda_error (r));
    +          if (r != CUDA_SUCCESS)
                 ret = false;
    +          CUdevice dev_;
    +          r = CUDA_CALL_NOCHECK (cuCtxGetDevice, &dev_);
    +          GOMP_PLUGIN_debug (0, "%s: cuCtxGetDevice: %s\n", __FUNCTION__, cuda_error (r));
    +          GOMP_PLUGIN_debug (0, "%s: dev_=%d, dev->dev=%d\n", __FUNCTION__, dev_, dev->dev);
    +          assert (dev_ == dev->dev);
               free (image->fns);
               free (image);
               break;

..., you're seeing an error for 'libgomp.c/target-52.c' with
'env OMP_TARGET_OFFLOAD=mandatory OMP_DISPLAY_ENV=true'?  I get:

    GOMP_OFFLOAD_unload_image: cuModuleUnload: no error
    GOMP_OFFLOAD_unload_image: cuCtxGetDevice: no error
    GOMP_OFFLOAD_unload_image: dev_=0, dev->dev=0

Or, is something else happening in between the 'cuModuleUnload' and your
reportedly failing 'cuCtxGetDevice'?

Re your PR113513 details, I don't see how your failure mode could be
related to (a) the PTX code ('--with-arch=sm_80'), or (b) the GPU hardware
("NVIDIA RTX A1000 6GB") (..., unless the Nvidia Driver is doing "funny"
things, of course...), so could this possibly be due to a recent change
in the CUDA Driver/Nvidia Driver?  You say "CUDA Version: 12.3", but
which Nvidia Driver version?  The latest I've now tested are:

    Driver Version: 525.147.05   CUDA Version: 12.0
    Driver Version: 535.154.05   CUDA Version: 12.2

I'll re-try with a more recent version.

> As the previous code in nvptx_attach_host_thread_to_device wasn't expecting
> that result, it called
> GOMP_PLUGIN_error ("cuCtxGetDevice error: %s", cuda_error (r));
> causing a fatal error of the program.
>
> This commit handles now CUDA_ERROR_DEINITIALIZED in a special way such
> that GOMP_OFFLOAD_fini_device just works.

I'd like to please defer that one until we understand the actual origin
of the misbehavior.

> When reading the code, the following was observed in addition:
> When gomp_fini_device is called, it invokes goacc_fini_asyncqueues
> to ensure that the queue is emptied. It seems to make sense to do
> likewise for GOMP_offload_unregister_ver, which this commit does in
> addition.

I don't understand (a) why offload image unregistration should trigger
'goacc_fini_asyncqueues', and (b) how that relates to PR113513.


Grüße
 Thomas