From: Thomas Schwinge
To: Tobias Burnus
Cc: gcc-patches@gcc.gnu.org, Jakub Jelinek
Subject: Re: nvptx: 'cuDeviceGetCount' failure is fatal
Date: Fri, 08 Mar 2024 16:58:27 +0100
Message-ID: <878r2shg0s.fsf@euler.schwinge.ddns.net>

Hi Tobias!

On 2024-03-07T15:28:21+0100, Tobias Burnus wrote:
> Thomas Schwinge wrote:
>> OK to push the attached "nvptx: 'cuDeviceGetCount' failure is fatal"?
>
> I think the real question is: what does a 'cuDeviceGetCount' failure mean?

Internally to the CUDA stack: the error codes that you've cited below.
Given the state we're in when calling 'cuDeviceGetCount', we only expect
'CUDA_SUCCESS'.  Therefore, in our actual use, anything else indicates a
fatal condition that we don't attempt to recover from, as with most other
device access failures.

> Does it mean a serious error – or could it just be a permissions issue
> such that the user has no device access but otherwise is fine?

As you can see, we've done a 'cuInit' right before, so in case there was
any permission issue (or similar), that's already settled (in whichever
way) by the time we do the 'cuDeviceGetCount'.

> Because if it is, e.g., a permission problem – just returning '0' (no
> devices) would seem to be the proper solution.
>
> But if it is expected to be always something serious, well, then a fatal
> error makes more sense.

ACK; pushed in commit 37078f241a22c45db6380c5e9a79b4d08054bb3d.


Grüße
 Thomas


> The possible return codes are:
>
> CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED,
> CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE
>
> which does not really help.
>
> My impression is that 0 is usually returned if something goes wrong
> (e.g. with permissions), such that an error is a real exception. But all
> three choices seem to make about equal sense: either host fallback
> (with 0 or -1) or a fatal error.
>
> Tobias
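
For readers following the thread, here is a minimal C sketch contrasting the policies discussed: treating any non-'CUDA_SUCCESS' result from 'cuDeviceGetCount' (after a successful 'cuInit') as fatal, which is the choice pushed here, versus returning 0 or -1 so that the caller can fall back to host execution. This is not the libgomp nvptx plugin code. The 'sketch_result' type, the 'stub_*' stand-in for 'cuDeviceGetCount', and the 'device_count_*' helpers are all hypothetical, chosen so the sketch builds without <cuda.h>; real code would use 'CUresult', 'cuInit (0)' and 'cuDeviceGetCount (&n)' from the CUDA driver API.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins so this sketch builds without <cuda.h>.  Real code
   would use 'CUresult' and 'CUDA_SUCCESS' from the CUDA driver API.  */
typedef int sketch_result;
enum { SKETCH_SUCCESS = 0 };

/* Hypothetical test knobs: what the stub returns and how many devices it
   reports on success.  */
static sketch_result stub_result = SKETCH_SUCCESS;
static int stub_devices = 2;

/* Hypothetical stub mimicking 'cuDeviceGetCount': it writes the count only
   on success and otherwise just returns the error code.  */
static sketch_result
stub_device_get_count (int *count)
{
  if (stub_result != SKETCH_SUCCESS)
    return stub_result;
  *count = stub_devices;
  return SKETCH_SUCCESS;
}

/* Policy pushed in the thread: after a successful init, anything other
   than success is a fatal condition that is not recovered from.  */
static int
device_count_fatal (void)
{
  int n = 0;
  sketch_result r = stub_device_get_count (&n);
  if (r != SKETCH_SUCCESS)
    {
      fprintf (stderr, "cuDeviceGetCount error: %d\n", r);
      exit (EXIT_FAILURE);
    }
  assert (n >= 0);
  return n;
}

/* Alternatives discussed: on error, report 'value_on_error' (0 for "no
   devices", or -1) and let the caller fall back to host execution.  */
static int
device_count_fallback (int value_on_error)
{
  int n = 0;
  if (stub_device_get_count (&n) != SKETCH_SUCCESS)
    return value_on_error;
  return n;
}
```

With the stub reporting two devices, both helpers return 2. Once the stub fails, 'device_count_fallback (0)' returns 0 and 'device_count_fallback (-1)' returns -1, while 'device_count_fatal' prints the error code and exits.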