Message-ID: <7d39bff2-71c2-176e-dd22-ec933d766db5@suse.de>
Date: Tue, 4 Apr 2023 11:08:15 +0200
Subject: Re: [Patch] nvptx/mkoffload.cc: Add dummy proc for OpenMP rev-offload table [PR108098]
To: Thomas Schwinge, Tobias Burnus, gcc-patches@gcc.gnu.org, Jakub Jelinek
References: <875ye78uqs.fsf@euler.schwinge.homeip.net> <871ql0xboa.fsf@euler.schwinge.homeip.net>
From: Tom de Vries
In-Reply-To: <871ql0xboa.fsf@euler.schwinge.homeip.net>

On 4/4/23 11:02, Thomas Schwinge wrote:
> Hi!
>
> Are we going to install such a work-around?
>

Hi,

LGTM.

Thanks,
- Tom

>
> Regards
>  Thomas
>
>
> On 2022-12-19T13:04:43+0100, I wrote:
>> Hi!
>>
>> On 2022-12-16T17:19:00+0100, Tobias Burnus wrote:
>>> Seems to be a CUDA JIT issue
>>
>> An Nvidia Driver JIT issue, more precisely.  ;-)
>>
>>> which is fixed by adding a dummy procedure.
>>
>> Gah...  :-|
>>
>>> Lightly tested with 4 systems at hand, where 2 failed before.
>>
>> I'm happy to confirm that indeed this does resolve the issue for all
>> configurations that I reported in
>> "OpenMP/nvptx reverse offload execution test FAILs".
>>
>>
>> As I said on IRC, #gcc, 2022-12-16:
>>
>>> [...] we're unlikely to reverse-engineer the exact version/conditions
>>> where this got fixed, so don't have a useful means for versioning the
>>> workaround.  Fortunately, it doesn't "cost" anything really.  (In
>>> contrast to some other GCC/nvptx back end workarounds, as I
>>> understand.)
>>
>>
>> Regards
>>  Thomas
>>
>>
>>> One had 10.2 and
>>> the other had some ancient CUDA where 'nvidia-smi' did not print a CUDA version
>>> and requires -mptx=3.1.
>>> (I did check that offloading indeed happened and no host fallback was done.)
>>>
>>> OK for mainline?
>>>
>>> Tobias
>>
>>
>>> nvptx/mkoffload.cc: Add dummy proc for OpenMP rev-offload table [PR108098]
>>>
>>> Seemingly, the ptx JIT of CUDA <= 10.2 replaces function pointers in global
>>> variables by NULL if a translation does not contain any executable code. It
>>> works with CUDA 11.1. The code of this commit is about reverse offload;
>>> having NULL values disables the side of reverse offload during image load.
>>>
>>> Solution is the same as found by Thomas for a related issue: Adding a dummy
>>> procedure. Cf. the PR of this issue and Thomas' patch
>>> "nvptx: Support global constructors/destructors via 'collect2'"
>>> https://gcc.gnu.org/pipermail/gcc-patches/2022-December/607749.html
>>>
>>> As that approach also works here:
>>>
>>> Co-authored-by: Thomas Schwinge
>>>
>>> gcc/
>>> 	PR libgomp/108098
>>>
>>> 	* config/nvptx/mkoffload.cc (process): Emit dummy procedure
>>> 	alongside reverse-offload function table to prevent NULL values
>>> 	of the function addresses.
>>>
>>> ---
>>>  gcc/config/nvptx/mkoffload.cc | 14 ++++++++++++++
>>>  1 file changed, 14 insertions(+)
>>>
>>> diff --git a/gcc/config/nvptx/mkoffload.cc b/gcc/config/nvptx/mkoffload.cc
>>> index 5d89ba8..8306aa0 100644
>>> --- a/gcc/config/nvptx/mkoffload.cc
>>> +++ b/gcc/config/nvptx/mkoffload.cc
>>> @@ -357,6 +357,20 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
>>>  	fputc (sm_ver2[i], out);
>>>        fprintf (out, "\"\n\t\".file 1 \\\"\\\"\"\n");
>>>
>>> +      /* WORKAROUND - see PR 108098
>>> +	 It seems as if older CUDA JIT compiler optimizes the function pointers
>>> +	 in offload_func_table to NULL, which can be prevented by adding a
>>> +	 dummy procedure. With CUDA 11.1, it seems to work fine without
>>> +	 workaround while CUDA 10.2 as some ancient version have need the
>>> +	 workaround. Assuming CUDA 11.0 fixes it, emitting it could be
>>> +	 restricted to 'if (sm_ver2[0] < 8 && version2[0] < 7)' as sm_80 and
>>> +	 PTX ISA 7.0 are new in CUDA 11.0; for 11.1 it would be sm_86 and
>>> +	 PTX ISA 7.1.  */
>>> +      fprintf (out, "\n\t\".func __dummy$func ( );\"\n");
>>> +      fprintf (out, "\t\".func __dummy$func ( )\"\n");
>>> +      fprintf (out, "\t\"{\"\n");
>>> +      fprintf (out, "\t\"}\"\n");
>>> +
>>>        size_t fidx = 0;
>>>        for (id = func_ids; id; id = id->next)
>>>  	{
> -----------------
> Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634 München; limited liability company; Managing Directors: Thomas Heurung, Frank Thürauf; Registered office: München; Commercial register: München, HRB 106955
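
For readers following the version discussion in the patch comment, here is a minimal sketch of how the dummy procedure could be emitted only for older targets. It is not what the patch does: the patch emits the workaround unconditionally, and the thread argues that versioning it is not worthwhile. The helper names `needs_dummy_func`, `dummy_func_fragment` and `emit_workaround` are hypothetical, and the threshold rests on the comment's own unverified assumption that CUDA 11.0 (sm_80, PTX ISA 7.0) fixes the JIT issue.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical helper, not part of mkoffload.cc.  Returns true when the
// target could still be JIT-compiled by a pre-11.0 CUDA driver, i.e. when
// neither sm_80+ nor PTX ISA 7.0+ is requested.  ASSUMPTION (taken from
// the patch comment, not verified): CUDA 11.0 fixes the issue.
bool needs_dummy_func (const std::string &sm_ver, const std::string &ptx_ver)
{
  int sm = std::atoi (sm_ver.c_str ());         // "35" -> 35
  int ptx_major = std::atoi (ptx_ver.c_str ()); // "6.3" -> 6, stops at '.'
  return sm < 80 && ptx_major < 7;
}

// The C-source fragment written by the four fprintf calls in the patch:
// PTX text wrapped in string literals, one literal per line.
std::string dummy_func_fragment ()
{
  return "\n\t\".func __dummy$func ( );\"\n"
	 "\t\".func __dummy$func ( )\"\n"
	 "\t\"{\"\n"
	 "\t\"}\"\n";
}

// Hypothetical gated variant of the emission step.
void emit_workaround (FILE *out, const std::string &sm_ver,
		      const std::string &ptx_ver)
{
  assert (out != nullptr);
  if (needs_dummy_func (sm_ver, ptx_ver))
    std::fputs (dummy_func_fragment ().c_str (), out);
}
```

Calling `emit_workaround (stdout, "35", "3.1")` would print the fragment, while `emit_workaround (stdout, "80", "7.0")` would print nothing. Parsing the versions as integers sidesteps the character comparison in the comment's `sm_ver2[0] < 8`, which is presumably shorthand rather than literal code.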