public inbox for gcc-patches@gcc.gnu.org
From: Tobias Burnus <tobias@codesourcery.com>
To: gcc-patches <gcc-patches@gcc.gnu.org>,
	Jakub Jelinek <jakub@redhat.com>, Tom de Vries <tdevries@suse.de>
Subject: Re: [Patch][2/3][v2] nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup
Date: Mon, 29 Aug 2022 20:43:26 +0200
Message-ID: <8301889b-64f9-8c60-15ca-2fa1fc495791@codesourcery.com>
In-Reply-To: <a736b341-fe09-1d7a-eb77-ec3ddc075b38@codesourcery.com>

[-- Attachment #1: Type: text/plain, Size: 1327 bytes --]

Slightly revised version, fixing some issues in mkoffload.cc. Otherwise, the same applies:

On 25.08.22 19:30, Tobias Burnus wrote:
On 25.08.22 16:54, Tobias Burnus wrote:

The attached patch prepares for reverse-offload device->host
function-address lookup by requesting (if needed) the on-device address.
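
As an illustration only (not part of the patch), the host-side lookup
this prepares for could be sketched as below. The function name
rev_lookup_index and the convention that a 0 entry means "not a
reverse-offload function" are assumptions made for this sketch; the
table layout (one uint64_t device address per offload function) follows
the rev_fn_table argument in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: search TABLE, which holds N on-device function
   addresses (0 for entries without a reverse-offload function), for
   DEV_ADDR.  Return the index of the matching entry, or -1 if there
   is none.  The host can then use that index to pick the matching
   host function from its own, identically ordered table.  */
static long
rev_lookup_index (const uint64_t *table, size_t n, uint64_t dev_addr)
{
  assert (table != NULL || n == 0);

  /* A zero address never identifies a reverse-offload function.  */
  if (dev_addr == 0)
    return -1;

  for (size_t i = 0; i < n; i++)
    if (table[i] == dev_addr)
      return (long) i;

  return -1;
}
```

A linear search is used here for brevity; a real implementation could
sort the table or use a hash if the number of functions is large.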


This patch adds the actual implementation for NVPTX.

Initializing a global array with function addresses, as in
array[] = {fn1,fn2};  works with nvptx only since sm_35; hence, if
reverse_offload is required and sm_30 is used, there will be a
compile-time error.
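
A rough sketch of how such a table declaration can be produced is shown
below. It is a simplified stand-in for what mkoffload.cc does, not the
patch code itself: format_func_table and ends_with are hypothetical
helpers, and the function names used in the usage note are made up.
Only the '$nohost' suffix test, the 0 placeholder, and the
'$offload_func_table' declaration are taken from the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return nonzero if STR ends with SUFFIX.  */
static int
ends_with (const char *str, const char *suffix)
{
  size_t ls = strlen (str), lx = strlen (suffix);
  return ls >= lx && strcmp (str + ls - lx, suffix) == 0;
}

/* Hypothetical helper: write the PTX declaration of the function
   table for the N function names in NAMES into BUF (SIZE bytes).
   Functions whose name ends in "$nohost" contribute their address,
   all others a 0 placeholder.  Return 0 on success, -1 if BUF is
   too small.  */
static int
format_func_table (char *buf, size_t size,
		   const char *const *names, size_t n)
{
  assert (buf != NULL && size > 0);

  int r = snprintf (buf, size, ".visible .global .align 8 .u64 "
		    "$offload_func_table[] = {");
  if (r < 0 || (size_t) r >= size)
    return -1;
  size_t pos = (size_t) r;

  for (size_t i = 0; i < n; i++)
    {
      const char *entry = ends_with (names[i], "$nohost") ? names[i] : "0";
      r = snprintf (buf + pos, size - pos, "%s%s", i ? "," : "", entry);
      if (r < 0 || (size_t) r >= size - pos)
	return -1;
      pos += (size_t) r;
    }

  r = snprintf (buf + pos, size - pos, "};");
  if (r < 0 || (size_t) r >= size - pos)
    return -1;
  return 0;
}
```

For the two made-up names "f$0" and "g$1$nohost", the buffer would hold
a declaration whose initializer is {0,g$1$nohost}. It is this kind of
initializer that needs sm_35 or newer.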

To avoid incompatibilities, the generated table is compiled with the same
PTX ISA .version and sm_XX target as the (last) file that contains a
reverse-offload function. While it should not matter, some newer CUDA
versions might not support, e.g., sm_35, or might reject a specific ISA
version; thus, that seemed to be safer.
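
To make the idea concrete, here is a small sketch of reading the
.version and .target values back out of a PTX text so that they can be
reused. ptx_directive_value is a hypothetical helper written for this
illustration; the patch itself records pointers into the input while
copying it rather than scanning it again.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: find the first line of PTX that starts with
   DIRECTIVE (e.g. ".version " or ".target sm_") and copy the rest of
   that line, without the newline, into OUT (SIZE bytes).  Return 0
   on success, -1 if the directive is missing or OUT is too small.  */
static int
ptx_directive_value (const char *ptx, const char *directive,
		     char *out, size_t size)
{
  assert (out != NULL && size > 0);
  size_t dlen = strlen (directive);

  for (const char *line = ptx; line && *line; )
    {
      if (strncmp (line, directive, dlen) == 0)
	{
	  const char *val = line + dlen;
	  size_t len = strcspn (val, "\n");
	  if (len >= size)
	    return -1;
	  memcpy (out, val, len);
	  out[len] = '\0';
	  return 0;
	}
      /* Advance to the start of the next line, if any.  */
      const char *nl = strchr (line, '\n');
      line = nl ? nl + 1 : NULL;
    }
  return -1;
}
```

With the value after ".target sm_" in hand, the sm_30 case can be
rejected with a plain string comparison against "30" before the table
is emitted.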

This is currently effectively a no-op: with the [1/3] patch, NULL is
always passed, and GOMP_OFFLOAD_get_num_devices returns <= 0 as soon as
'omp requires reverse_offload' has been specified.

OK for mainline?


Tobias


[-- Attachment #2: fn-lookup-nvptx-v2.diff --]
[-- Type: text/x-patch, Size: 11610 bytes --]

nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup

Add support to nvptx for reverse lookup of function name to prepare for
'omp target device(ancestor:1)'.

gcc/ChangeLog:

	* config/nvptx/mkoffload.cc (struct id_map): Add 'dim' member.
	(record_id): Store func name without quotes, store dim separately.
	(process): For GOMP_REQUIRES_REVERSE_OFFLOAD, check that -march is
	at least sm_35, create '$offload_func_table' global array and init
	with reverse-offload function addresses.
	* config/nvptx/nvptx.cc (write_fn_proto_1, write_fn_proto): New
	force_public argument to force .visible.
	(nvptx_declare_function_name): For "omp target
	device_ancestor_nohost" attribute, force .visible/TREE_PUBLIC.

libgomp/ChangeLog:

	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Read offload
	function address table '$offload_func_table' if rev_fn_table
	is not NULL.

 gcc/config/nvptx/mkoffload.cc | 119 ++++++++++++++++++++++++++++++++++++++++--
 gcc/config/nvptx/nvptx.cc     |  20 +++++---
 libgomp/plugin/plugin-nvptx.c |  19 +++++++-
 3 files changed, 146 insertions(+), 12 deletions(-)

diff --git a/gcc/config/nvptx/mkoffload.cc b/gcc/config/nvptx/mkoffload.cc
index 3eea0a8f138..834b2059aac 100644
--- a/gcc/config/nvptx/mkoffload.cc
+++ b/gcc/config/nvptx/mkoffload.cc
@@ -47,6 +47,7 @@ struct id_map
 {
   id_map *next;
   char *ptx_name;
+  char *dim;
 };
 
 static id_map *func_ids, **funcs_tail = &func_ids;
@@ -108,8 +109,11 @@ xputenv (const char *string)
 static void
 record_id (const char *p1, id_map ***where)
 {
-  const char *end = strchr (p1, '\n');
-  if (!end)
+  gcc_assert (p1[0] == '"');
+  p1++;
+  const char *end = strchr (p1, '"');
+  const char *end2 = strchr (p1, '\n');
+  if (!end || !end2 || end >= end2)
     fatal_error (input_location, "malformed ptx file");
 
   id_map *v = XNEW (id_map);
@@ -117,6 +121,16 @@ record_id (const char *p1, id_map ***where)
   v->ptx_name = XNEWVEC (char, len + 1);
   memcpy (v->ptx_name, p1, len);
   v->ptx_name[len] = '\0';
+  p1 = end + 1;
+  if (*p1 != '\n')
+    {
+      len = end2 - p1;
+      v->dim = XNEWVEC (char, len + 1);
+      memcpy (v->dim, p1, len);
+      v->dim[len] = '\0';
+    }
+  else
+    v->dim = NULL;
   v->next = NULL;
   id_map **tail = *where;
   *tail = v;
@@ -242,6 +256,10 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
   id_map const *id;
   unsigned obj_count = 0;
   unsigned ix;
+  const char *sm_ver = NULL, *version = NULL;
+  const char *sm_ver2 = NULL, *version2 = NULL;
+  size_t file_cnt = 0;
+  size_t *file_idx = XALLOCAVEC (size_t, len);
 
   fprintf (out, "#include <stdint.h>\n\n");
 
@@ -250,6 +268,8 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
   for (size_t i = 0; i != len;)
     {
       char c;
+      bool output_fn_ptr = false;
+      file_idx[file_cnt++] = i;
 
       fprintf (out, "static const char ptx_code_%u[] =\n\t\"", obj_count++);
       while ((c = input[i++]))
@@ -261,6 +281,16 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
 	    case '\n':
 	      fprintf (out, "\\n\"\n\t\"");
 	      /* Look for mappings on subsequent lines.  */
+	      if (UNLIKELY (startswith (input + i, ".target sm_")))
+		{
+		  sm_ver = input + i + strlen (".target sm_");
+		  continue;
+		}
+	      if (UNLIKELY (startswith (input + i, ".version ")))
+		{
+		  version = input + i + strlen (".version ");
+		  continue;
+		}
 	      while (startswith (input + i, "//:"))
 		{
 		  i += 3;
@@ -268,7 +298,10 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
 		  if (startswith (input + i, "VAR_MAP "))
 		    record_id (input + i + 8, &vars_tail);
 		  else if (startswith (input + i, "FUNC_MAP "))
-		    record_id (input + i + 9, &funcs_tail);
+		    {
+		      output_fn_ptr = true;
+		      record_id (input + i + 9, &funcs_tail);
+		    }
 		  else
 		    abort ();
 		  /* Skip to next line. */
@@ -286,6 +319,81 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
 	  putc (c, out);
 	}
       fprintf (out, "\";\n\n");
+      if (output_fn_ptr
+	  && (omp_requires & GOMP_REQUIRES_REVERSE_OFFLOAD) != 0)
+	{
+	  if (sm_ver && sm_ver[0] == '3' && sm_ver[1] == '0'
+	      && sm_ver[2] == '\n')
+	    fatal_error (input_location,
+			 "%<omp requires reverse_offload%> requires at least "
+			 "%<sm_35%> for %<-misa=%>");
+	  sm_ver2 = sm_ver;
+	  version2 = version;
+	}
+    }
+
+  /* Create function-pointer array, required for reverse
+     offload function-pointer lookup.  */
+
+  if (func_ids && (omp_requires & GOMP_REQUIRES_REVERSE_OFFLOAD) != 0)
+    {
+      const char needle[] = "// BEGIN GLOBAL FUNCTION DECL: ";
+      fprintf (out, "static const char ptx_code_%u[] =\n", obj_count++);
+      fprintf (out, "\t\".version ");
+      for (size_t i = 0; version2[i] != '\0' && version2[i] != '\n'; i++)
+	fputc (version2[i], out);
+      fprintf (out, "\"\n\t\".target sm_");
+      for (size_t i = 0; sm_ver2[i] != '\0' && sm_ver2[i] != '\n'; i++)
+	fputc (sm_ver2[i], out);
+      fprintf (out, "\"\n\t\".file 1 \\\"<dummy>\\\"\"\n");
+
+      size_t fidx = 0;
+      for (id = func_ids; id; id = id->next)
+	{
+	  /* Only 'nohost' functions are needed - use NULL for the rest.
+	     Alternatively, besides searching for 'BEGIN FUNCTION DECL',
+	     checking for '.visible .entry ' + id->ptx_name would be
+	     required.  */
+	  if (!endswith (id->ptx_name, "$nohost"))
+	    continue;
+	  fprintf (out, "\t\".extern ");
+	  const char *p = input + file_idx[fidx];
+	  while (true)
+	    {
+	      p = strstr (p, needle);
+	      if (!p)
+		{
+		  fidx++;
+		  if (fidx >= file_cnt)
+		    break;
+		  p = input + file_idx[fidx];
+		  continue;
+		}
+	      p += strlen (needle);
+	      if (!startswith (p, id->ptx_name))
+		continue;
+	      p += strlen (id->ptx_name);
+	      if (*p != '\n')
+		continue;
+	      p++;
+	      gcc_assert (startswith (p, ".visible "));
+	      p += strlen (".visible ");
+	      for (; *p != '\0' && *p != '\n'; p++)
+		fputc (*p, out);
+	      break;
+	    }
+	  fprintf (out, "\"\n");
+	  if (fidx == file_cnt)
+	    fatal_error (input_location,
+			 "Cannot find function declaration for %qs",
+			 id->ptx_name);
+	}
+      fprintf (out, "\t\".visible .global .align 8 .u64 "
+		    "$offload_func_table[] = {");
+      for (comma = "", id = func_ids; id; comma = ",", id = id->next)
+	fprintf (out, "%s\"\n\t\t\"%s", comma,
+		 endswith (id->ptx_name, "$nohost") ? id->ptx_name : "0");
+      fprintf (out, "};\\n\";\n\n");
     }
 
   /* Dump out array of pointers to ptx object strings.  */
@@ -300,7 +408,7 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
   /* Dump out variable idents.  */
   fprintf (out, "static const char *const var_mappings[] = {");
   for (comma = "", id = var_ids; id; comma = ",", id = id->next)
-    fprintf (out, "%s\n\t%s", comma, id->ptx_name);
+    fprintf (out, "%s\n\t\"%s\"", comma, id->ptx_name);
   fprintf (out, "\n};\n\n");
 
   /* Dump out function idents.  */
@@ -309,7 +417,8 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
 	   "  unsigned short dim[%d];\n"
 	   "} func_mappings[] = {\n", GOMP_DIM_MAX);
   for (comma = "", id = func_ids; id; comma = ",", id = id->next)
-    fprintf (out, "%s\n\t{%s}", comma, id->ptx_name);
+    fprintf (out, "%s\n\t{\"%s\"%s}", comma, id->ptx_name,
+	     id->dim ? id->dim : "");
   fprintf (out, "\n};\n\n");
 
   fprintf (out,
diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index e4297e2d6c3..3293c096822 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -989,15 +989,15 @@ write_var_marker (FILE *file, bool is_defn, bool globalize, const char *name)
 
 static void
 write_fn_proto_1 (std::stringstream &s, bool is_defn,
-		  const char *name, const_tree decl)
+		  const char *name, const_tree decl, bool force_public)
 {
   if (lookup_attribute ("alias", DECL_ATTRIBUTES (decl)) == NULL)
-    write_fn_marker (s, is_defn, TREE_PUBLIC (decl), name);
+    write_fn_marker (s, is_defn, TREE_PUBLIC (decl) || force_public, name);
 
   /* PTX declaration.  */
   if (DECL_EXTERNAL (decl))
     s << ".extern ";
-  else if (TREE_PUBLIC (decl))
+  else if (TREE_PUBLIC (decl) || force_public)
     s << (DECL_WEAK (decl) ? ".weak " : ".visible ");
   s << (write_as_kernel (DECL_ATTRIBUTES (decl)) ? ".entry " : ".func ");
 
@@ -1086,7 +1086,7 @@ write_fn_proto_1 (std::stringstream &s, bool is_defn,
 
 static void
 write_fn_proto (std::stringstream &s, bool is_defn,
-		const char *name, const_tree decl)
+		const char *name, const_tree decl, bool force_public = false)
 {
   const char *replacement = nvptx_name_replacement (name);
   char *replaced_dots = NULL;
@@ -1103,9 +1103,9 @@ write_fn_proto (std::stringstream &s, bool is_defn,
 
   if (is_defn)
     /* Emit a declaration.  The PTX assembler gets upset without it.  */
-    write_fn_proto_1 (s, false, name, decl);
+    write_fn_proto_1 (s, false, name, decl, force_public);
 
-  write_fn_proto_1 (s, is_defn, name, decl);
+  write_fn_proto_1 (s, is_defn, name, decl, force_public);
 
   if (replaced_dots)
     XDELETE (replaced_dots);
@@ -1481,7 +1481,13 @@ nvptx_declare_function_name (FILE *file, const char *name, const_tree decl)
   tree fntype = TREE_TYPE (decl);
   tree result_type = TREE_TYPE (fntype);
   int argno = 0;
+  bool force_public = false;
 
+  /* For reverse-offload 'nohost' functions: In order to be collectable in
+     '$offload_func_table', cf. mkoffload.cc, the function has to be visible. */
+  if (lookup_attribute ("omp target device_ancestor_nohost",
+			DECL_ATTRIBUTES (decl)))
+    force_public = true;
   if (lookup_attribute ("omp target entrypoint", DECL_ATTRIBUTES (decl))
       && !lookup_attribute ("oacc function", DECL_ATTRIBUTES (decl)))
     {
@@ -1493,7 +1499,7 @@ nvptx_declare_function_name (FILE *file, const char *name, const_tree decl)
   /* We construct the initial part of the function into a string
      stream, in order to share the prototype writing code.  */
   std::stringstream s;
-  write_fn_proto (s, true, name, decl);
+  write_fn_proto (s, true, name, decl, force_public);
   s << "{\n";
 
   bool return_in_mem = write_return_type (s, false, result_type);
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index d130665ed19..ac400fc2a1d 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1273,7 +1273,7 @@ nvptx_set_clocktick (CUmodule module, struct ptx_device *dev)
 int
 GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
 			 struct addr_pair **target_table,
-			 uint64_t **rev_fn_table __attribute__((unused)))
+			 uint64_t **rev_fn_table)
 {
   CUmodule module;
   const char *const *var_names;
@@ -1376,6 +1376,23 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
     targ_tbl->start = targ_tbl->end = 0;
   targ_tbl++;
 
+  if (rev_fn_table && fn_entries == 0)
+    *rev_fn_table = NULL;
+  else if (rev_fn_table)
+    {
+      CUdeviceptr var;
+      size_t bytes;
+      r = CUDA_CALL_NOCHECK (cuModuleGetGlobal, &var, &bytes, module,
+			     "$offload_func_table");
+      if (r != CUDA_SUCCESS)
+	GOMP_PLUGIN_fatal ("cuModuleGetGlobal error: %s", cuda_error (r));
+      assert (bytes == sizeof (uint64_t) * fn_entries);
+      *rev_fn_table = GOMP_PLUGIN_malloc (sizeof (uint64_t) * fn_entries);
+      r = CUDA_CALL_NOCHECK (cuMemcpyDtoH, *rev_fn_table, var, bytes);
+      if (r != CUDA_SUCCESS)
+	GOMP_PLUGIN_fatal ("cuMemcpyDtoH error: %s", cuda_error (r));
+    }
+
   nvptx_set_clocktick (module, dev);
 
   return fn_entries + var_entries + other_entries;

Thread overview: 15+ messages
2022-08-25 14:54 [Patch][1/3] libgomp: " Tobias Burnus
2022-08-25 14:54 ` Tobias Burnus
2022-08-25 15:38 ` [Patch][2/3] GCN: libgomp+mkoffload.cc: " Tobias Burnus
2022-08-25 15:38   ` Tobias Burnus
2022-09-09 15:31   ` Jakub Jelinek
2022-08-25 17:30 ` [Patch][2/3] nvptx: " Tobias Burnus
2022-08-25 17:30   ` Tobias Burnus
2022-08-29 18:43   ` Tobias Burnus [this message]
2022-08-29 18:43     ` [Patch][2/3][v2] " Tobias Burnus
2022-09-09 15:36     ` Jakub Jelinek
2022-09-12 12:02       ` [Patch] nvptx/mkoffload.cc: Warn instead of error when reverse offload is not possible (was: Re: [Patch][2/3][v2] nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup) Tobias Burnus
2022-09-12 12:10         ` Jakub Jelinek
2022-10-17 11:59         ` Fix nvptx-specific '-foffload-options' syntax in 'libgomp.c/reverse-offload-sm30.c' (was: [Patch] nvptx/mkoffload.cc: Warn instead of error when reverse offload is not possible) Thomas Schwinge
2022-09-23 15:40     ` [og12] Come up with {,UN}LIKELY macros (was: [Patch][2/3][v2] nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup) Thomas Schwinge
2022-09-09 15:29 ` [Patch][1/3] libgomp: Prepare for reverse offload fn lookup Jakub Jelinek
