public inbox for gcc-cvs@sourceware.org
* [gcc r14-491] nvptx/mkoffload.cc: Add dummy proc for OpenMP rev-offload table [PR108098]
From: Tobias Burnus @ 2023-05-05  9:28 UTC
  To: gcc-cvs

https://gcc.gnu.org/g:4359724cba31b2645f6106266bef019c3d6ef16a

commit r14-491-g4359724cba31b2645f6106266bef019c3d6ef16a
Author: Tobias Burnus <tobias@codesourcery.com>
Date:   Fri May 5 11:27:32 2023 +0200

    nvptx/mkoffload.cc: Add dummy proc for OpenMP rev-offload table [PR108098]
    
    Seemingly, the PTX JIT of CUDA <= 10.2 replaces function pointers in global
    variables with NULL if a translation unit does not contain any executable
    code. It works with CUDA 11.1.  The code of this commit is about reverse
    offload; having NULL values disables the side of reverse offload during
    image load.
    
    The solution is the same as the one Thomas found for a related issue:
    adding a dummy procedure. Cf. the PR of this issue and Thomas' patch
    "nvptx: Support global constructors/destructors via 'collect2'"
    https://gcc.gnu.org/pipermail/gcc-patches/2022-December/607749.html
    
    As that approach also works here:
    
    Co-authored-by: Thomas Schwinge <thomas@codesourcery.com>
    
    gcc/
            PR libgomp/108098
    
            * config/nvptx/mkoffload.cc (process): Emit dummy procedure
            alongside reverse-offload function table to prevent NULL values
            of the function addresses.

Diff:
---
 gcc/config/nvptx/mkoffload.cc | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/gcc/config/nvptx/mkoffload.cc b/gcc/config/nvptx/mkoffload.cc
index edb03cff1cd..6cdea45cffe 100644
--- a/gcc/config/nvptx/mkoffload.cc
+++ b/gcc/config/nvptx/mkoffload.cc
@@ -357,6 +357,20 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
 	fputc (sm_ver2[i], out);
       fprintf (out, "\"\n\t\".file 1 \\\"<dummy>\\\"\"\n");
 
+      /* WORKAROUND - see PR 108098
+	 It seems as if older CUDA JIT compilers optimize the function pointers
+	 in offload_func_table to NULL, which can be prevented by adding a
+	 dummy procedure. With CUDA 11.1, it seems to work fine without the
+	 workaround, while CUDA 10.2 as well as some ancient version need the
+	 workaround. Assuming CUDA 11.0 fixes it, emitting it could be
+	 restricted to 'if (sm_ver2[0] < 8 && version2[0] < 7)' as sm_80 and
+	 PTX ISA 7.0 are new in CUDA 11.0; for 11.1 it would be sm_86 and
+	 PTX ISA 7.1.  */
+      fprintf (out, "\n\t\".func __dummy$func ( );\"\n");
+      fprintf (out, "\t\".func __dummy$func ( )\"\n");
+      fprintf (out, "\t\"{\"\n");
+      fprintf (out, "\t\"}\"\n");
+
       size_t fidx = 0;
       for (id = func_ids; id; id = id->next)
 	{
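
To make the effect of the four added fprintf calls easier to see, here is a minimal standalone sketch in C. The helper name emit_dummy_func is hypothetical and does not exist in mkoffload.cc; the format strings are copied from the hunk above. It assumes, as the escaped quotes suggest, that the output is a sequence of adjacent C string literals, which a C compiler concatenates into a single string.

```c
#include <stdio.h>

/* Hypothetical helper, not part of mkoffload.cc: writes the same text
   that the patch adds to process ().  Each output line is one quoted
   C string literal; adjacent literals are concatenated by the C
   compiler that later reads the generated file.  */
static void
emit_dummy_func (FILE *out)
{
  /* Forward declaration of the empty PTX function.  */
  fprintf (out, "\n\t\".func __dummy$func ( );\"\n");
  /* Definition with an empty body.  */
  fprintf (out, "\t\".func __dummy$func ( )\"\n");
  fprintf (out, "\t\"{\"\n");
  fprintf (out, "\t\"}\"\n");
}
```

Calling emit_dummy_func (stdout) prints an empty line followed by four tab-indented string literals: the declaration, the definition header, an opening brace and a closing brace.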
