public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [Patch][1/3] libgomp: Prepare for reverse offload fn lookup
@ 2022-08-25 14:54 Tobias Burnus
  2022-08-25 14:54 ` Tobias Burnus
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Tobias Burnus @ 2022-08-25 14:54 UTC (permalink / raw)
  To: gcc-patches, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 981 bytes --]

Technically, this patch is stand alone, but conceptually it based on the
submitted but not reviewed patch:
"[Patch] OpenMP: Support reverse offload (middle end part)"
https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598662.html

With that patch, for reverse offloads ('omp target device(ancestor:1)'),
calls like the following are added:
  GOMP_target_ext (-2 /* initial device */, omp_fn.1
where 'omp_fn.1' on nonhost devices a stub function just required for
looking up the host function pointer via the offload_funcs table.

The attached patch prepare for reverse-offload device->host
function-address lookup by requesting (if needed) the on-device address.

OK for mainline?

Tobias


-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

[-- Attachment #2: fn-lookup-gen.diff --]
[-- Type: text/x-patch, Size: 5324 bytes --]

libgomp: Prepare for reverse offload fn lookup

Prepare for reverse-offloading function-pointer lookup by passing
a rev_fn_table argument to GOMP_OFFLOAD_load_image.

The argument will be NULL, unless GOMP_REQUIRES_REVERSE_OFFLOAD is
requested and devices not supported it, are filtered out.
(Up to and including this commit, no non-host device claims such
support and the caller currently always passes NULL.)

libgomp/ChangeLog:

	* libgomp-plugin.h (GOMP_OFFLOAD_load_image): Add
	'uint64_t **rev_fn_table' argument.
	* oacc-host.c (host_load_image): Likewise.
	* plugin/plugin-gcn.c (GOMP_OFFLOAD_load_image): Likewise;
	currently unused.
	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Likewise.
	* target.c (gomp_load_image_to_device): Update call but pass
	NULL for now.

liboffloadmic/ChangeLog:

	* plugin/libgomp-plugin-intelmic.cpp (GOMP_OFFLOAD_load_image):
	Add (unused) uint64_t **rev_fn_table argument.

 libgomp/libgomp-plugin.h                         | 2 +-
 libgomp/oacc-host.c                              | 3 ++-
 libgomp/plugin/plugin-gcn.c                      | 7 +++++--
 libgomp/plugin/plugin-nvptx.c                    | 7 +++++--
 libgomp/target.c                                 | 2 +-
 liboffloadmic/plugin/libgomp-plugin-intelmic.cpp | 3 ++-
 6 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h
index ab3ed638475..57309303e66 100644
--- a/libgomp/libgomp-plugin.h
+++ b/libgomp/libgomp-plugin.h
@@ -130,7 +130,7 @@ extern bool GOMP_OFFLOAD_init_device (int);
 extern bool GOMP_OFFLOAD_fini_device (int);
 extern unsigned GOMP_OFFLOAD_version (void);
 extern int GOMP_OFFLOAD_load_image (int, unsigned, const void *,
-				    struct addr_pair **);
+				    struct addr_pair **, uint64_t **);
 extern bool GOMP_OFFLOAD_unload_image (int, unsigned, const void *);
 extern void *GOMP_OFFLOAD_alloc (int, size_t);
 extern bool GOMP_OFFLOAD_free (int, void *);
diff --git a/libgomp/oacc-host.c b/libgomp/oacc-host.c
index eb11b9cf16a..4e3971ae1a9 100644
--- a/libgomp/oacc-host.c
+++ b/libgomp/oacc-host.c
@@ -81,7 +81,8 @@ static int
 host_load_image (int n __attribute__ ((unused)),
 		 unsigned v __attribute__ ((unused)),
 		 const void *t __attribute__ ((unused)),
-		 struct addr_pair **r __attribute__ ((unused)))
+		 struct addr_pair **r __attribute__ ((unused)),
+		 uint64_t **f __attribute__ ((unused)))
 {
   return 0;
 }
diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index ea327bf2ca0..363e2950649 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -3346,11 +3346,14 @@ GOMP_OFFLOAD_init_device (int n)
 
 /* Load GCN object-code module described by struct gcn_image_desc in
    TARGET_DATA and return references to kernel descriptors in TARGET_TABLE.
-   If there are any constructors then run them.  */
+   If there are any constructors then run them.  If not NULL, REV_FN_TABLE will
+   contain the on-device addresses of the functions for reverse offload.  To be
+   freed by the caller.  */
 
 int
 GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
-			 struct addr_pair **target_table)
+			 struct addr_pair **target_table,
+			 uint64_t **rev_fn_table __attribute__((unused)))
 {
   if (GOMP_VERSION_DEV (version) != GOMP_VERSION_GCN)
     {
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index bc63e274cdf..d130665ed19 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1266,11 +1266,14 @@ nvptx_set_clocktick (CUmodule module, struct ptx_device *dev)
 }
 
 /* Load the (partial) program described by TARGET_DATA to device
-   number ORD.  Allocate and return TARGET_TABLE.  */
+   number ORD.  Allocate and return TARGET_TABLE.  If not NULL, REV_FN_TABLE
+   will contain the on-device addresses of the functions for reverse offload.
+   To be freed by the caller.  */
 
 int
 GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
-			 struct addr_pair **target_table)
+			 struct addr_pair **target_table,
+			 uint64_t **rev_fn_table __attribute__((unused)))
 {
   CUmodule module;
   const char *const *var_names;
diff --git a/libgomp/target.c b/libgomp/target.c
index 135db1d88ab..da907a2fb48 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -2137,7 +2137,7 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
 
   num_target_entries
     = devicep->load_image_func (devicep->target_id, version,
-				target_data, &target_table);
+				target_data, &target_table, NULL);
 
   if (num_target_entries != num_funcs + num_vars
       /* Others (device_num) are included as trailing entries in pair list.  */
diff --git a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
index 33bae0650b4..7be27f0459d 100644
--- a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
+++ b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
@@ -349,7 +349,8 @@ GOMP_OFFLOAD_version (void)
 
 extern "C" int
 GOMP_OFFLOAD_load_image (int device, const unsigned version,
-			 const void *target_image, addr_pair **result)
+			 const void *target_image, addr_pair **result,
+			 uint64_t ** /* rev_fn_table */)
 {
   TRACE ("(device = %d, target_image = %p)", device, target_image);
 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Patch][1/3] libgomp: Prepare for reverse offload fn lookup
  2022-08-25 14:54 [Patch][1/3] libgomp: Prepare for reverse offload fn lookup Tobias Burnus
@ 2022-08-25 14:54 ` Tobias Burnus
  2022-08-25 15:38 ` [Patch][2/3] GCN: libgomp+mkoffload.cc: " Tobias Burnus
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 15+ messages in thread
From: Tobias Burnus @ 2022-08-25 14:54 UTC (permalink / raw)
  To: gcc-patches, Jakub Jelinek


[-- Attachment #1.1: Type: text/plain, Size: 981 bytes --]

Technically, this patch is stand alone, but conceptually it based on the
submitted but not reviewed patch:
"[Patch] OpenMP: Support reverse offload (middle end part)"
https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598662.html

With that patch, for reverse offloads ('omp target device(ancestor:1)'),
calls like the following are added:
  GOMP_target_ext (-2 /* initial device */, omp_fn.1
where 'omp_fn.1' on nonhost devices a stub function just required for
looking up the host function pointer via the offload_funcs table.

The attached patch prepare for reverse-offload device->host
function-address lookup by requesting (if needed) the on-device address.

OK for mainline?

Tobias


-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

[-- Attachment #2: fn-lookup-gen.diff --]
[-- Type: text/x-patch, Size: 5324 bytes --]

libgomp: Prepare for reverse offload fn lookup

Prepare for reverse-offloading function-pointer lookup by passing
a rev_fn_table argument to GOMP_OFFLOAD_load_image.

The argument will be NULL, unless GOMP_REQUIRES_REVERSE_OFFLOAD is
requested and devices not supported it, are filtered out.
(Up to and including this commit, no non-host device claims such
support and the caller currently always passes NULL.)

libgomp/ChangeLog:

	* libgomp-plugin.h (GOMP_OFFLOAD_load_image): Add
	'uint64_t **rev_fn_table' argument.
	* oacc-host.c (host_load_image): Likewise.
	* plugin/plugin-gcn.c (GOMP_OFFLOAD_load_image): Likewise;
	currently unused.
	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Likewise.
	* target.c (gomp_load_image_to_device): Update call but pass
	NULL for now.

liboffloadmic/ChangeLog:

	* plugin/libgomp-plugin-intelmic.cpp (GOMP_OFFLOAD_load_image):
	Add (unused) uint64_t **rev_fn_table argument.

 libgomp/libgomp-plugin.h                         | 2 +-
 libgomp/oacc-host.c                              | 3 ++-
 libgomp/plugin/plugin-gcn.c                      | 7 +++++--
 libgomp/plugin/plugin-nvptx.c                    | 7 +++++--
 libgomp/target.c                                 | 2 +-
 liboffloadmic/plugin/libgomp-plugin-intelmic.cpp | 3 ++-
 6 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h
index ab3ed638475..57309303e66 100644
--- a/libgomp/libgomp-plugin.h
+++ b/libgomp/libgomp-plugin.h
@@ -130,7 +130,7 @@ extern bool GOMP_OFFLOAD_init_device (int);
 extern bool GOMP_OFFLOAD_fini_device (int);
 extern unsigned GOMP_OFFLOAD_version (void);
 extern int GOMP_OFFLOAD_load_image (int, unsigned, const void *,
-				    struct addr_pair **);
+				    struct addr_pair **, uint64_t **);
 extern bool GOMP_OFFLOAD_unload_image (int, unsigned, const void *);
 extern void *GOMP_OFFLOAD_alloc (int, size_t);
 extern bool GOMP_OFFLOAD_free (int, void *);
diff --git a/libgomp/oacc-host.c b/libgomp/oacc-host.c
index eb11b9cf16a..4e3971ae1a9 100644
--- a/libgomp/oacc-host.c
+++ b/libgomp/oacc-host.c
@@ -81,7 +81,8 @@ static int
 host_load_image (int n __attribute__ ((unused)),
 		 unsigned v __attribute__ ((unused)),
 		 const void *t __attribute__ ((unused)),
-		 struct addr_pair **r __attribute__ ((unused)))
+		 struct addr_pair **r __attribute__ ((unused)),
+		 uint64_t **f __attribute__ ((unused)))
 {
   return 0;
 }
diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index ea327bf2ca0..363e2950649 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -3346,11 +3346,14 @@ GOMP_OFFLOAD_init_device (int n)
 
 /* Load GCN object-code module described by struct gcn_image_desc in
    TARGET_DATA and return references to kernel descriptors in TARGET_TABLE.
-   If there are any constructors then run them.  */
+   If there are any constructors then run them.  If not NULL, REV_FN_TABLE will
+   contain the on-device addresses of the functions for reverse offload.  To be
+   freed by the caller.  */
 
 int
 GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
-			 struct addr_pair **target_table)
+			 struct addr_pair **target_table,
+			 uint64_t **rev_fn_table __attribute__((unused)))
 {
   if (GOMP_VERSION_DEV (version) != GOMP_VERSION_GCN)
     {
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index bc63e274cdf..d130665ed19 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1266,11 +1266,14 @@ nvptx_set_clocktick (CUmodule module, struct ptx_device *dev)
 }
 
 /* Load the (partial) program described by TARGET_DATA to device
-   number ORD.  Allocate and return TARGET_TABLE.  */
+   number ORD.  Allocate and return TARGET_TABLE.  If not NULL, REV_FN_TABLE
+   will contain the on-device addresses of the functions for reverse offload.
+   To be freed by the caller.  */
 
 int
 GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
-			 struct addr_pair **target_table)
+			 struct addr_pair **target_table,
+			 uint64_t **rev_fn_table __attribute__((unused)))
 {
   CUmodule module;
   const char *const *var_names;
diff --git a/libgomp/target.c b/libgomp/target.c
index 135db1d88ab..da907a2fb48 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -2137,7 +2137,7 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
 
   num_target_entries
     = devicep->load_image_func (devicep->target_id, version,
-				target_data, &target_table);
+				target_data, &target_table, NULL);
 
   if (num_target_entries != num_funcs + num_vars
       /* Others (device_num) are included as trailing entries in pair list.  */
diff --git a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
index 33bae0650b4..7be27f0459d 100644
--- a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
+++ b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
@@ -349,7 +349,8 @@ GOMP_OFFLOAD_version (void)
 
 extern "C" int
 GOMP_OFFLOAD_load_image (int device, const unsigned version,
-			 const void *target_image, addr_pair **result)
+			 const void *target_image, addr_pair **result,
+			 uint64_t ** /* rev_fn_table */)
 {
   TRACE ("(device = %d, target_image = %p)", device, target_image);
 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Patch][2/3] GCN: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup
  2022-08-25 14:54 [Patch][1/3] libgomp: Prepare for reverse offload fn lookup Tobias Burnus
  2022-08-25 14:54 ` Tobias Burnus
@ 2022-08-25 15:38 ` Tobias Burnus
  2022-08-25 15:38   ` Tobias Burnus
  2022-09-09 15:31   ` Jakub Jelinek
  2022-08-25 17:30 ` [Patch][2/3] nvptx: " Tobias Burnus
  2022-09-09 15:29 ` [Patch][1/3] libgomp: Prepare for reverse offload fn lookup Jakub Jelinek
  3 siblings, 2 replies; 15+ messages in thread
From: Tobias Burnus @ 2022-08-25 15:38 UTC (permalink / raw)
  To: gcc-patches, Jakub Jelinek, Andrew Stubbs

[-- Attachment #1: Type: text/plain, Size: 864 bytes --]

On 25.08.22 16:54, Tobias Burnus wrote:

The attached patch prepare for reverse-offload device->host
function-address lookup by requesting (if needed) the on-device address.


This patch adds the actual implementation for GCN. A variant would be
to only generate .offload_func_table inside mkoffload when
OMP_REQUIRES_REVERSE_OFFLOAD has been requested.

This is currently effectively a no op as with [1/3] patch, always NULL
is passed and as GOMP_OFFLOAD_get_num_devices returns <= 0 as soon as
'omp requires reverse_offload' has been specified.

OK for mainline?

Tobias


-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

[-- Attachment #2: fn-lookup-gcn.diff --]
[-- Type: text/x-patch, Size: 4673 bytes --]

GCN: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup

Add support to GCN for reverse lookup of function name to prepare for
'omp target device(ancestor:1)'.

gcc/ChangeLog:

	* config/gcn/mkoffload.cc (process_asm): Create .offload_func_table,
	similar to pre-existing .offload_var_table.

libgomp/ChangeLog:

	* plugin/plugin-gcn.c (GOMP_OFFLOAD_load_image): Read
	.offload_func_table to populate rev_fn_table when requested.

 gcc/config/gcn/mkoffload.cc | 11 ++++++++++-
 libgomp/plugin/plugin-gcn.c | 26 +++++++++++++++++++++++++-
 2 files changed, 35 insertions(+), 2 deletions(-)
diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc
index 4206448703a..24d327355e3 100644
--- a/gcc/config/gcn/mkoffload.cc
+++ b/gcc/config/gcn/mkoffload.cc
@@ -537,63 +537,72 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
 	case IN_VARS:
 	  {
 	    char *varname;
 	    unsigned varsize;
 	    if (sscanf (buf, " .8byte %ms\n", &varname))
 	      {
 		fputs (buf, out);
 		fgets (buf, sizeof (buf), in);
 		if (!sscanf (buf, " .8byte %u\n", &varsize))
 		  abort ();
 		var_count++;
 	      }
 	    break;
 	  }
 	case IN_FUNCS:
 	  {
 	    char *funcname;
 	    if (sscanf (buf, "\t.8byte\t%ms\n", &funcname))
 	      {
+		fputs (buf, out);
 		obstack_ptr_grow (&fns_os, funcname);
 		fn_count++;
 		continue;
 	      }
 	    break;
 	  }
 	}
 
       char dummy;
       if (sscanf (buf, " .section .gnu.offload_vars%c", &dummy) > 0)
 	{
 	  state = IN_VARS;
 
 	  /* Add a global symbol to allow plugin-gcn.c to locate the table
 	     at runtime.  It can't use the "offload_var_table.N" emitted by
 	     the compiler because a) they're not global, and b) there's one
 	     for each input file combined into the binary.  */
 	  fputs (buf, out);
 	  fputs ("\t.global .offload_var_table\n"
 		 "\t.type .offload_var_table, @object\n"
 		 ".offload_var_table:\n",
 		 out);
 	}
       else if (sscanf (buf, " .section .gnu.offload_funcs%c", &dummy) > 0)
-	state = IN_FUNCS;
+	{
+	  state = IN_FUNCS;
+	  /* Likewise for .gnu.offload_vars; used for reverse offload. */
+	  fputs (buf, out);
+	  fputs ("\t.global .offload_func_table\n"
+		 "\t.type .offload_func_table, @object\n"
+		 ".offload_func_table:\n",
+		 out);
+	}
       else if (sscanf (buf, " .amdgpu_metadata%c", &dummy) > 0)
 	{
 	  state = IN_METADATA;
 	  regcount.kernel_name = NULL;
 	  regcount.sgpr_count = regcount.vgpr_count = -1;
 	}
       else if (sscanf (buf, " .section %c", &dummy) > 0
 	       || sscanf (buf, " .text%c", &dummy) > 0
 	       || sscanf (buf, " .bss%c", &dummy) > 0
 	       || sscanf (buf, " .data%c", &dummy) > 0
 	       || sscanf (buf, " .ident %c", &dummy) > 0)
 	state = IN_CODE;
       else if (sscanf (buf, " .end_amdgpu_metadata%c", &dummy) > 0)
 	{
 	  state = IN_CODE;
 	  gcc_assert (regcount.kernel_name != NULL
 		      && regcount.sgpr_count >= 0
 		      && regcount.vgpr_count >= 0);
 	  obstack_grow (&regcounts_os, &regcount, sizeof (regcount));
diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 363e2950649..00603981866 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -3353,7 +3353,7 @@ GOMP_OFFLOAD_init_device (int n)
 int
 GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
 			 struct addr_pair **target_table,
-			 uint64_t **rev_fn_table __attribute__((unused)))
+			 uint64_t **rev_fn_table)
 {
   if (GOMP_VERSION_DEV (version) != GOMP_VERSION_GCN)
     {
@@ -3520,6 +3520,30 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
   if (module->fini_array_func)
     kernel_count--;
 
+  if (rev_fn_table != NULL && kernel_count == 0)
+    *rev_fn_table = NULL;
+  else if (rev_fn_table != NULL)
+    {
+      hsa_status_t status;
+      hsa_executable_symbol_t var_symbol;
+      status = hsa_fns.hsa_executable_get_symbol_fn (agent->executable, NULL,
+						     ".offload_func_table",
+						     agent->id, 0, &var_symbol);
+      if (status != HSA_STATUS_SUCCESS)
+	hsa_fatal ("Could not find symbol for variable in the code object",
+		   status);
+      uint64_t fn_table_addr;
+      status = hsa_fns.hsa_executable_symbol_get_info_fn
+	(var_symbol, HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_ADDRESS,
+	 &fn_table_addr);
+      if (status != HSA_STATUS_SUCCESS)
+	hsa_fatal ("Could not extract a variable from its symbol", status);
+      *rev_fn_table = GOMP_PLUGIN_malloc (kernel_count * sizeof (uint64_t));
+      GOMP_OFFLOAD_dev2host (agent->device_id, *rev_fn_table,
+			     (void*) fn_table_addr,
+			     kernel_count * sizeof (uint64_t));
+    }
+
   return kernel_count + var_count + other_count;
 }
 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Patch][2/3] GCN: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup
  2022-08-25 15:38 ` [Patch][2/3] GCN: libgomp+mkoffload.cc: " Tobias Burnus
@ 2022-08-25 15:38   ` Tobias Burnus
  2022-09-09 15:31   ` Jakub Jelinek
  1 sibling, 0 replies; 15+ messages in thread
From: Tobias Burnus @ 2022-08-25 15:38 UTC (permalink / raw)
  To: gcc-patches, Jakub Jelinek, Andrew Stubbs


[-- Attachment #1.1: Type: text/plain, Size: 864 bytes --]

On 25.08.22 16:54, Tobias Burnus wrote:

The attached patch prepare for reverse-offload device->host
function-address lookup by requesting (if needed) the on-device address.


This patch adds the actual implementation for GCN. A variant would be
to only generate .offload_func_table inside mkoffload when
OMP_REQUIRES_REVERSE_OFFLOAD has been requested.

This is currently effectively a no op as with [1/3] patch, always NULL
is passed and as GOMP_OFFLOAD_get_num_devices returns <= 0 as soon as
'omp requires reverse_offload' has been specified.

OK for mainline?

Tobias


-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

[-- Attachment #2: fn-lookup-gcn.diff --]
[-- Type: text/x-patch, Size: 4673 bytes --]

GCN: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup

Add support to GCN for reverse lookup of function name to prepare for
'omp target device(ancestor:1)'.

gcc/ChangeLog:

	* config/gcn/mkoffload.cc (process_asm): Create .offload_func_table,
	similar to pre-existing .offload_var_table.

libgomp/ChangeLog:

	* plugin/plugin-gcn.c (GOMP_OFFLOAD_load_image): Read
	.offload_func_table to populate rev_fn_table when requested.

 gcc/config/gcn/mkoffload.cc | 11 ++++++++++-
 libgomp/plugin/plugin-gcn.c | 26 +++++++++++++++++++++++++-
 2 files changed, 35 insertions(+), 2 deletions(-)
diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc
index 4206448703a..24d327355e3 100644
--- a/gcc/config/gcn/mkoffload.cc
+++ b/gcc/config/gcn/mkoffload.cc
@@ -537,63 +537,72 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
 	case IN_VARS:
 	  {
 	    char *varname;
 	    unsigned varsize;
 	    if (sscanf (buf, " .8byte %ms\n", &varname))
 	      {
 		fputs (buf, out);
 		fgets (buf, sizeof (buf), in);
 		if (!sscanf (buf, " .8byte %u\n", &varsize))
 		  abort ();
 		var_count++;
 	      }
 	    break;
 	  }
 	case IN_FUNCS:
 	  {
 	    char *funcname;
 	    if (sscanf (buf, "\t.8byte\t%ms\n", &funcname))
 	      {
+		fputs (buf, out);
 		obstack_ptr_grow (&fns_os, funcname);
 		fn_count++;
 		continue;
 	      }
 	    break;
 	  }
 	}
 
       char dummy;
       if (sscanf (buf, " .section .gnu.offload_vars%c", &dummy) > 0)
 	{
 	  state = IN_VARS;
 
 	  /* Add a global symbol to allow plugin-gcn.c to locate the table
 	     at runtime.  It can't use the "offload_var_table.N" emitted by
 	     the compiler because a) they're not global, and b) there's one
 	     for each input file combined into the binary.  */
 	  fputs (buf, out);
 	  fputs ("\t.global .offload_var_table\n"
 		 "\t.type .offload_var_table, @object\n"
 		 ".offload_var_table:\n",
 		 out);
 	}
       else if (sscanf (buf, " .section .gnu.offload_funcs%c", &dummy) > 0)
-	state = IN_FUNCS;
+	{
+	  state = IN_FUNCS;
+	  /* Likewise for .gnu.offload_vars; used for reverse offload. */
+	  fputs (buf, out);
+	  fputs ("\t.global .offload_func_table\n"
+		 "\t.type .offload_func_table, @object\n"
+		 ".offload_func_table:\n",
+		 out);
+	}
       else if (sscanf (buf, " .amdgpu_metadata%c", &dummy) > 0)
 	{
 	  state = IN_METADATA;
 	  regcount.kernel_name = NULL;
 	  regcount.sgpr_count = regcount.vgpr_count = -1;
 	}
       else if (sscanf (buf, " .section %c", &dummy) > 0
 	       || sscanf (buf, " .text%c", &dummy) > 0
 	       || sscanf (buf, " .bss%c", &dummy) > 0
 	       || sscanf (buf, " .data%c", &dummy) > 0
 	       || sscanf (buf, " .ident %c", &dummy) > 0)
 	state = IN_CODE;
       else if (sscanf (buf, " .end_amdgpu_metadata%c", &dummy) > 0)
 	{
 	  state = IN_CODE;
 	  gcc_assert (regcount.kernel_name != NULL
 		      && regcount.sgpr_count >= 0
 		      && regcount.vgpr_count >= 0);
 	  obstack_grow (&regcounts_os, &regcount, sizeof (regcount));
diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 363e2950649..00603981866 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -3353,7 +3353,7 @@ GOMP_OFFLOAD_init_device (int n)
 int
 GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
 			 struct addr_pair **target_table,
-			 uint64_t **rev_fn_table __attribute__((unused)))
+			 uint64_t **rev_fn_table)
 {
   if (GOMP_VERSION_DEV (version) != GOMP_VERSION_GCN)
     {
@@ -3520,6 +3520,30 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
   if (module->fini_array_func)
     kernel_count--;
 
+  if (rev_fn_table != NULL && kernel_count == 0)
+    *rev_fn_table = NULL;
+  else if (rev_fn_table != NULL)
+    {
+      hsa_status_t status;
+      hsa_executable_symbol_t var_symbol;
+      status = hsa_fns.hsa_executable_get_symbol_fn (agent->executable, NULL,
+						     ".offload_func_table",
+						     agent->id, 0, &var_symbol);
+      if (status != HSA_STATUS_SUCCESS)
+	hsa_fatal ("Could not find symbol for variable in the code object",
+		   status);
+      uint64_t fn_table_addr;
+      status = hsa_fns.hsa_executable_symbol_get_info_fn
+	(var_symbol, HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_ADDRESS,
+	 &fn_table_addr);
+      if (status != HSA_STATUS_SUCCESS)
+	hsa_fatal ("Could not extract a variable from its symbol", status);
+      *rev_fn_table = GOMP_PLUGIN_malloc (kernel_count * sizeof (uint64_t));
+      GOMP_OFFLOAD_dev2host (agent->device_id, *rev_fn_table,
+			     (void*) fn_table_addr,
+			     kernel_count * sizeof (uint64_t));
+    }
+
   return kernel_count + var_count + other_count;
 }
 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Patch][2/3] nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup
  2022-08-25 14:54 [Patch][1/3] libgomp: Prepare for reverse offload fn lookup Tobias Burnus
  2022-08-25 14:54 ` Tobias Burnus
  2022-08-25 15:38 ` [Patch][2/3] GCN: libgomp+mkoffload.cc: " Tobias Burnus
@ 2022-08-25 17:30 ` Tobias Burnus
  2022-08-25 17:30   ` Tobias Burnus
  2022-08-29 18:43   ` [Patch][2/3][v2] " Tobias Burnus
  2022-09-09 15:29 ` [Patch][1/3] libgomp: Prepare for reverse offload fn lookup Jakub Jelinek
  3 siblings, 2 replies; 15+ messages in thread
From: Tobias Burnus @ 2022-08-25 17:30 UTC (permalink / raw)
  To: gcc-patches, Jakub Jelinek, Tom de Vries

[-- Attachment #1: Type: text/plain, Size: 1192 bytes --]

On 25.08.22 16:54, Tobias Burnus wrote:

The attached patch prepare for reverse-offload device->host
function-address lookup by requesting (if needed) the on-device address.


This patch adds the actual implementation for NVPTX.

Having  array[] = {fn1,fn2};  works with nvptx only since sm_35; hence,
if there is a reverse_offload and sm_30 is used, there will be a compile-time
error.

To avoid incompatibilities, I compile with the same PTX ISA .version and
sm_XX version as the (last) file that contains the reverse offload. While
it should not matter, some newer CUDA might not support, e.g., sm_35 or
do not like a specific ISA version - thus, that seemed to be safer.

This is currently effectively a no op as with [1/3] patch, always NULL
is passed and as GOMP_OFFLOAD_get_num_devices returns <= 0 as soon as
'omp requires reverse_offload' has been specified.

OK for mainline?

Tobias


-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

[-- Attachment #2: fn-lookup-nvptx.diff --]
[-- Type: text/x-patch, Size: 11019 bytes --]

nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup

Add support to nvptx for reverse lookup of function name to prepare for
'omp target device(ancestor:1)'.

gcc/ChangeLog:

	* config/nvptx/mkoffload.cc (record_id): Strip quotations from
	function name.
	(process): For GOMP_REQUIRES_REVERSE_OFFLOAD, check that -march is
	at least sm_35, create '$offload_func_table' global array and init
	with reverse-offload function addresses.
	* config/nvptx/nvptx.cc (write_fn_proto_1, write_fn_proto): New
	force_public attribute to force .visible.
	(nvptx_declare_function_name): For "omp target
	device_ancestor_nohost" attribut, force .visible/TREE_PUBLIC.

libgomp/ChangeLog:

	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Read offload
	function address table '$offload_func_table' if rev_fn_table
	is not NULL.

 gcc/config/nvptx/mkoffload.cc | 104 ++++++++++++++++++++++++++++++++++++++++--
 gcc/config/nvptx/nvptx.cc     |  20 +++++---
 libgomp/plugin/plugin-nvptx.c |  19 +++++++-
 3 files changed, 131 insertions(+), 12 deletions(-)

diff --git a/gcc/config/nvptx/mkoffload.cc b/gcc/config/nvptx/mkoffload.cc
index 3eea0a8f138..c496766b1cc 100644
--- a/gcc/config/nvptx/mkoffload.cc
+++ b/gcc/config/nvptx/mkoffload.cc
@@ -108,12 +108,14 @@ xputenv (const char *string)
 static void
 record_id (const char *p1, id_map ***where)
 {
+  gcc_assert (p1[0] == '"');
+  p1++;
   const char *end = strchr (p1, '\n');
   if (!end)
     fatal_error (input_location, "malformed ptx file");
 
   id_map *v = XNEW (id_map);
-  size_t len = end - p1;
+  size_t len = end - p1 - 1 ; /* remove tailing '"'  */
   v->ptx_name = XNEWVEC (char, len + 1);
   memcpy (v->ptx_name, p1, len);
   v->ptx_name[len] = '\0';
@@ -242,6 +244,10 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
   id_map const *id;
   unsigned obj_count = 0;
   unsigned ix;
+  const char *sm_ver = NULL, *version = NULL;
+  const char *sm_ver2 = NULL, *version2 = NULL;
+  size_t file_cnt = 0;
+  size_t *file_idx = XALLOCAVEC (size_t, len);
 
   fprintf (out, "#include <stdint.h>\n\n");
 
@@ -250,6 +256,8 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
   for (size_t i = 0; i != len;)
     {
       char c;
+      bool output_fn_ptr = false;
+      file_idx[file_cnt++] = i;
 
       fprintf (out, "static const char ptx_code_%u[] =\n\t\"", obj_count++);
       while ((c = input[i++]))
@@ -261,6 +269,16 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
 	    case '\n':
 	      fprintf (out, "\\n\"\n\t\"");
 	      /* Look for mappings on subsequent lines.  */
+	      if (UNLIKELY (startswith (input + i, ".target sm_")))
+		{
+		  sm_ver = input + i + strlen (".target sm_");
+		  continue;
+		}
+	      if (UNLIKELY (startswith (input + i, ".version ")))
+		{
+		  version = input + i + strlen (".version ");
+		  continue;
+		}
 	      while (startswith (input + i, "//:"))
 		{
 		  i += 3;
@@ -268,7 +286,10 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
 		  if (startswith (input + i, "VAR_MAP "))
 		    record_id (input + i + 8, &vars_tail);
 		  else if (startswith (input + i, "FUNC_MAP "))
-		    record_id (input + i + 9, &funcs_tail);
+		    {
+		      output_fn_ptr = true;
+		      record_id (input + i + 9, &funcs_tail);
+		    }
 		  else
 		    abort ();
 		  /* Skip to next line. */
@@ -286,6 +307,81 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
 	  putc (c, out);
 	}
       fprintf (out, "\";\n\n");
+      if (output_fn_ptr
+	  && (omp_requires & GOMP_REQUIRES_REVERSE_OFFLOAD) != 0)
+	{
+	  if (sm_ver && sm_ver[0] == '3' && sm_ver[1] == '0'
+	      && sm_ver[2] == '\n')
+	    fatal_error (input_location,
+			 "%<omp requires reverse_offload%> requires at least "
+			 "%<sm_35%> for %<-misa=%>");
+	  sm_ver2 = sm_ver;
+	  version2 = version;
+	}
+    }
+
+  /* Create function-pointer array, required for reverse
+     offload function-pointer lookup.  */
+
+  if (func_ids && (omp_requires & GOMP_REQUIRES_REVERSE_OFFLOAD) != 0)
+    {
+      const char needle[] = "// BEGIN GLOBAL FUNCTION DECL: ";
+      fprintf (out, "static const char ptx_code_%u[] =\n", obj_count++);
+      fprintf (out, "\t\".version ");
+      for (size_t i = 0; version2[i] != '\0' && version2[i] != '\n'; i++)
+	fputc (version2[i], out);
+      fprintf (out, "\"\n\t\".target sm_");
+      for (size_t i = 0; version2[i] != '\0' && sm_ver2[i] != '\n'; i++)
+	fputc (sm_ver2[i], out);
+      fprintf (out, "\"\n\t\".file 1 \\\"<dummy>\\\"\"\n");
+
+      size_t fidx = 0;
+      for (id = func_ids; id; id = id->next)
+	{
+	  /* Only 'nohost' functions are needed - use NULL for the rest.
+	     Alternatively, besides searching for 'BEGIN FUNCTION DECL',
+	     checking for '.visible .entry ' + id->ptx_name would be
+	     required.  */
+	  if (!endswith (id->ptx_name, "$nohost"))
+	    continue;
+	  fprintf (out, "\t\".extern ");
+	  const char *p = input + file_idx[fidx];
+	  while (true)
+	    {
+	      p = strstr (p, needle);
+	      if (!p)
+		{
+		  fidx++;
+		  if (fidx >= file_cnt)
+		    break;
+		  p = input + file_idx[fidx];
+		  continue;
+		}
+	      p += strlen (needle);
+	      if (!startswith (p, id->ptx_name))
+		continue;
+	      p += strlen (id->ptx_name);
+	      if (*p != '\n')
+		continue;
+	      p++;
+	      gcc_assert (startswith (p, ".visible "));
+	      p += strlen (".visible ");
+	      for (; *p != '\0' && *p != '\n'; p++)
+		fputc (*p, out);
+	      break;
+	    }
+	  fprintf (out, "\"\n");
+	  if (fidx == file_cnt)
+	    fatal_error (input_location,
+			 "Cannot find function declaration for %qs",
+			 id->ptx_name);
+	}
+      fprintf (out, "\t\".visible .global .align 8 .u64 "
+		    "$offload_func_table[] = {");
+      for (comma = "", id = func_ids; id; comma = ",", id = id->next)
+	fprintf (out, "%s\"\n\t\t\"%s", comma,
+		 endswith (id->ptx_name, "$nohost") ? id->ptx_name : "0");
+      fprintf (out, "};\\n\";\n\n");
     }
 
   /* Dump out array of pointers to ptx object strings.  */
@@ -300,7 +396,7 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
   /* Dump out variable idents.  */
   fprintf (out, "static const char *const var_mappings[] = {");
   for (comma = "", id = var_ids; id; comma = ",", id = id->next)
-    fprintf (out, "%s\n\t%s", comma, id->ptx_name);
+    fprintf (out, "%s\n\t\"%s\"", comma, id->ptx_name);
   fprintf (out, "\n};\n\n");
 
   /* Dump out function idents.  */
@@ -309,7 +405,7 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
 	   "  unsigned short dim[%d];\n"
 	   "} func_mappings[] = {\n", GOMP_DIM_MAX);
   for (comma = "", id = func_ids; id; comma = ",", id = id->next)
-    fprintf (out, "%s\n\t{%s}", comma, id->ptx_name);
+    fprintf (out, "%s\n\t{\"%s\"}", comma, id->ptx_name);
   fprintf (out, "\n};\n\n");
 
   fprintf (out,
diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index e4297e2d6c3..3293c096822 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -989,15 +989,15 @@ write_var_marker (FILE *file, bool is_defn, bool globalize, const char *name)
 
 static void
 write_fn_proto_1 (std::stringstream &s, bool is_defn,
-		  const char *name, const_tree decl)
+		  const char *name, const_tree decl, bool force_public)
 {
   if (lookup_attribute ("alias", DECL_ATTRIBUTES (decl)) == NULL)
-    write_fn_marker (s, is_defn, TREE_PUBLIC (decl), name);
+    write_fn_marker (s, is_defn, TREE_PUBLIC (decl) || force_public, name);
 
   /* PTX declaration.  */
   if (DECL_EXTERNAL (decl))
     s << ".extern ";
-  else if (TREE_PUBLIC (decl))
+  else if (TREE_PUBLIC (decl) || force_public)
     s << (DECL_WEAK (decl) ? ".weak " : ".visible ");
   s << (write_as_kernel (DECL_ATTRIBUTES (decl)) ? ".entry " : ".func ");
 
@@ -1086,7 +1086,7 @@ write_fn_proto_1 (std::stringstream &s, bool is_defn,
 
 static void
 write_fn_proto (std::stringstream &s, bool is_defn,
-		const char *name, const_tree decl)
+		const char *name, const_tree decl, bool force_public=false)
 {
   const char *replacement = nvptx_name_replacement (name);
   char *replaced_dots = NULL;
@@ -1103,9 +1103,9 @@ write_fn_proto (std::stringstream &s, bool is_defn,
 
   if (is_defn)
     /* Emit a declaration.  The PTX assembler gets upset without it.  */
-    write_fn_proto_1 (s, false, name, decl);
+    write_fn_proto_1 (s, false, name, decl, force_public);
 
-  write_fn_proto_1 (s, is_defn, name, decl);
+  write_fn_proto_1 (s, is_defn, name, decl, force_public);
 
   if (replaced_dots)
     XDELETE (replaced_dots);
@@ -1481,7 +1481,13 @@ nvptx_declare_function_name (FILE *file, const char *name, const_tree decl)
   tree fntype = TREE_TYPE (decl);
   tree result_type = TREE_TYPE (fntype);
   int argno = 0;
+  bool force_public = false;
 
+  /* For reverse-offload 'nohost' functions: In order to be collectable in
+     '$offload_func_table', cf. mkoffload.cc, the function has to be visible. */
+  if (lookup_attribute ("omp target device_ancestor_nohost",
+			DECL_ATTRIBUTES (decl)))
+    force_public = true;
   if (lookup_attribute ("omp target entrypoint", DECL_ATTRIBUTES (decl))
       && !lookup_attribute ("oacc function", DECL_ATTRIBUTES (decl)))
     {
@@ -1493,7 +1499,7 @@ nvptx_declare_function_name (FILE *file, const char *name, const_tree decl)
   /* We construct the initial part of the function into a string
      stream, in order to share the prototype writing code.  */
   std::stringstream s;
-  write_fn_proto (s, true, name, decl);
+  write_fn_proto (s, true, name, decl, force_public);
   s << "{\n";
 
   bool return_in_mem = write_return_type (s, false, result_type);
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index d130665ed19..ac400fc2a1d 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1273,7 +1273,7 @@ nvptx_set_clocktick (CUmodule module, struct ptx_device *dev)
 int
 GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
 			 struct addr_pair **target_table,
-			 uint64_t **rev_fn_table __attribute__((unused)))
+			 uint64_t **rev_fn_table)
 {
   CUmodule module;
   const char *const *var_names;
@@ -1376,6 +1376,23 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
     targ_tbl->start = targ_tbl->end = 0;
   targ_tbl++;
 
+  if (rev_fn_table && fn_entries == 0)
+    *rev_fn_table = NULL;
+  else if (rev_fn_table)
+    {
+      CUdeviceptr var;
+      size_t bytes;
+      r = CUDA_CALL_NOCHECK (cuModuleGetGlobal, &var, &bytes, module,
+			     "$offload_func_table");
+      if (r != CUDA_SUCCESS)
+	GOMP_PLUGIN_fatal ("cuModuleGetGlobal error: %s", cuda_error (r));
+      assert (bytes == sizeof (uint64_t) * fn_entries);
+      *rev_fn_table = GOMP_PLUGIN_malloc (sizeof (uint64_t) * fn_entries);
+      r = CUDA_CALL_NOCHECK (cuMemcpyDtoH, *rev_fn_table, var, bytes);
+      if (r != CUDA_SUCCESS)
+	GOMP_PLUGIN_fatal ("cuMemcpyDtoH error: %s", cuda_error (r));
+    }
+
   nvptx_set_clocktick (module, dev);
 
   return fn_entries + var_entries + other_entries;

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Patch][2/3] nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup
  2022-08-25 17:30 ` [Patch][2/3] nvptx: " Tobias Burnus
@ 2022-08-25 17:30   ` Tobias Burnus
  2022-08-29 18:43   ` [Patch][2/3][v2] " Tobias Burnus
  1 sibling, 0 replies; 15+ messages in thread
From: Tobias Burnus @ 2022-08-25 17:30 UTC (permalink / raw)
  To: gcc-patches, Jakub Jelinek, Tom de Vries


[-- Attachment #1.1: Type: text/plain, Size: 1192 bytes --]

On 25.08.22 16:54, Tobias Burnus wrote:

The attached patch prepare for reverse-offload device->host
function-address lookup by requesting (if needed) the on-device address.


This patch adds the actual implementation for NVPTX.

Having  array[] = {fn1,fn2};  works with nvptx only since sm_35; hence,
if there is a reverse_offload and sm_30 is used, there will be a compile-time
error.

To avoid incompatibilities, I compile with the same PTX ISA .version and
sm_XX version as the (last) file that contains the reverse offload. While
it should not matter, some newer CUDA might not support, e.g., sm_35 or
do not like a specific ISA version - thus, that seemed to be safer.

This is currently effectively a no op as with [1/3] patch, always NULL
is passed and as GOMP_OFFLOAD_get_num_devices returns <= 0 as soon as
'omp requires reverse_offload' has been specified.

OK for mainline?

Tobias


-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

[-- Attachment #2: fn-lookup-nvptx.diff --]
[-- Type: text/x-patch, Size: 11019 bytes --]

nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup

Add support to nvptx for reverse lookup of function name to prepare for
'omp target device(ancestor:1)'.

gcc/ChangeLog:

	* config/nvptx/mkoffload.cc (record_id): Strip quotations from
	function name.
	(process): For GOMP_REQUIRES_REVERSE_OFFLOAD, check that -march is
	at least sm_35, create '$offload_func_table' global array and init
	with reverse-offload function addresses.
	* config/nvptx/nvptx.cc (write_fn_proto_1, write_fn_proto): New
	force_public attribute to force .visible.
	(nvptx_declare_function_name): For "omp target
	device_ancestor_nohost" attribut, force .visible/TREE_PUBLIC.

libgomp/ChangeLog:

	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Read offload
	function address table '$offload_func_table' if rev_fn_table
	is not NULL.

 gcc/config/nvptx/mkoffload.cc | 104 ++++++++++++++++++++++++++++++++++++++++--
 gcc/config/nvptx/nvptx.cc     |  20 +++++---
 libgomp/plugin/plugin-nvptx.c |  19 +++++++-
 3 files changed, 131 insertions(+), 12 deletions(-)

diff --git a/gcc/config/nvptx/mkoffload.cc b/gcc/config/nvptx/mkoffload.cc
index 3eea0a8f138..c496766b1cc 100644
--- a/gcc/config/nvptx/mkoffload.cc
+++ b/gcc/config/nvptx/mkoffload.cc
@@ -108,12 +108,14 @@ xputenv (const char *string)
 static void
 record_id (const char *p1, id_map ***where)
 {
+  gcc_assert (p1[0] == '"');
+  p1++;
   const char *end = strchr (p1, '\n');
   if (!end)
     fatal_error (input_location, "malformed ptx file");
 
   id_map *v = XNEW (id_map);
-  size_t len = end - p1;
+  size_t len = end - p1 - 1 ; /* remove tailing '"'  */
   v->ptx_name = XNEWVEC (char, len + 1);
   memcpy (v->ptx_name, p1, len);
   v->ptx_name[len] = '\0';
@@ -242,6 +244,10 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
   id_map const *id;
   unsigned obj_count = 0;
   unsigned ix;
+  const char *sm_ver = NULL, *version = NULL;
+  const char *sm_ver2 = NULL, *version2 = NULL;
+  size_t file_cnt = 0;
+  size_t *file_idx = XALLOCAVEC (size_t, len);
 
   fprintf (out, "#include <stdint.h>\n\n");
 
@@ -250,6 +256,8 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
   for (size_t i = 0; i != len;)
     {
       char c;
+      bool output_fn_ptr = false;
+      file_idx[file_cnt++] = i;
 
       fprintf (out, "static const char ptx_code_%u[] =\n\t\"", obj_count++);
       while ((c = input[i++]))
@@ -261,6 +269,16 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
 	    case '\n':
 	      fprintf (out, "\\n\"\n\t\"");
 	      /* Look for mappings on subsequent lines.  */
+	      if (UNLIKELY (startswith (input + i, ".target sm_")))
+		{
+		  sm_ver = input + i + strlen (".target sm_");
+		  continue;
+		}
+	      if (UNLIKELY (startswith (input + i, ".version ")))
+		{
+		  version = input + i + strlen (".version ");
+		  continue;
+		}
 	      while (startswith (input + i, "//:"))
 		{
 		  i += 3;
@@ -268,7 +286,10 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
 		  if (startswith (input + i, "VAR_MAP "))
 		    record_id (input + i + 8, &vars_tail);
 		  else if (startswith (input + i, "FUNC_MAP "))
-		    record_id (input + i + 9, &funcs_tail);
+		    {
+		      output_fn_ptr = true;
+		      record_id (input + i + 9, &funcs_tail);
+		    }
 		  else
 		    abort ();
 		  /* Skip to next line. */
@@ -286,6 +307,81 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
 	  putc (c, out);
 	}
       fprintf (out, "\";\n\n");
+      if (output_fn_ptr
+	  && (omp_requires & GOMP_REQUIRES_REVERSE_OFFLOAD) != 0)
+	{
+	  if (sm_ver && sm_ver[0] == '3' && sm_ver[1] == '0'
+	      && sm_ver[2] == '\n')
+	    fatal_error (input_location,
+			 "%<omp requires reverse_offload%> requires at least "
+			 "%<sm_35%> for %<-misa=%>");
+	  sm_ver2 = sm_ver;
+	  version2 = version;
+	}
+    }
+
+  /* Create function-pointer array, required for reverse
+     offload function-pointer lookup.  */
+
+  if (func_ids && (omp_requires & GOMP_REQUIRES_REVERSE_OFFLOAD) != 0)
+    {
+      const char needle[] = "// BEGIN GLOBAL FUNCTION DECL: ";
+      fprintf (out, "static const char ptx_code_%u[] =\n", obj_count++);
+      fprintf (out, "\t\".version ");
+      for (size_t i = 0; version2[i] != '\0' && version2[i] != '\n'; i++)
+	fputc (version2[i], out);
+      fprintf (out, "\"\n\t\".target sm_");
+      for (size_t i = 0; version2[i] != '\0' && sm_ver2[i] != '\n'; i++)
+	fputc (sm_ver2[i], out);
+      fprintf (out, "\"\n\t\".file 1 \\\"<dummy>\\\"\"\n");
+
+      size_t fidx = 0;
+      for (id = func_ids; id; id = id->next)
+	{
+	  /* Only 'nohost' functions are needed - use NULL for the rest.
+	     Alternatively, besides searching for 'BEGIN FUNCTION DECL',
+	     checking for '.visible .entry ' + id->ptx_name would be
+	     required.  */
+	  if (!endswith (id->ptx_name, "$nohost"))
+	    continue;
+	  fprintf (out, "\t\".extern ");
+	  const char *p = input + file_idx[fidx];
+	  while (true)
+	    {
+	      p = strstr (p, needle);
+	      if (!p)
+		{
+		  fidx++;
+		  if (fidx >= file_cnt)
+		    break;
+		  p = input + file_idx[fidx];
+		  continue;
+		}
+	      p += strlen (needle);
+	      if (!startswith (p, id->ptx_name))
+		continue;
+	      p += strlen (id->ptx_name);
+	      if (*p != '\n')
+		continue;
+	      p++;
+	      gcc_assert (startswith (p, ".visible "));
+	      p += strlen (".visible ");
+	      for (; *p != '\0' && *p != '\n'; p++)
+		fputc (*p, out);
+	      break;
+	    }
+	  fprintf (out, "\"\n");
+	  if (fidx == file_cnt)
+	    fatal_error (input_location,
+			 "Cannot find function declaration for %qs",
+			 id->ptx_name);
+	}
+      fprintf (out, "\t\".visible .global .align 8 .u64 "
+		    "$offload_func_table[] = {");
+      for (comma = "", id = func_ids; id; comma = ",", id = id->next)
+	fprintf (out, "%s\"\n\t\t\"%s", comma,
+		 endswith (id->ptx_name, "$nohost") ? id->ptx_name : "0");
+      fprintf (out, "};\\n\";\n\n");
     }
 
   /* Dump out array of pointers to ptx object strings.  */
@@ -300,7 +396,7 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
   /* Dump out variable idents.  */
   fprintf (out, "static const char *const var_mappings[] = {");
   for (comma = "", id = var_ids; id; comma = ",", id = id->next)
-    fprintf (out, "%s\n\t%s", comma, id->ptx_name);
+    fprintf (out, "%s\n\t\"%s\"", comma, id->ptx_name);
   fprintf (out, "\n};\n\n");
 
   /* Dump out function idents.  */
@@ -309,7 +405,7 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
 	   "  unsigned short dim[%d];\n"
 	   "} func_mappings[] = {\n", GOMP_DIM_MAX);
   for (comma = "", id = func_ids; id; comma = ",", id = id->next)
-    fprintf (out, "%s\n\t{%s}", comma, id->ptx_name);
+    fprintf (out, "%s\n\t{\"%s\"}", comma, id->ptx_name);
   fprintf (out, "\n};\n\n");
 
   fprintf (out,
diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index e4297e2d6c3..3293c096822 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -989,15 +989,15 @@ write_var_marker (FILE *file, bool is_defn, bool globalize, const char *name)
 
 static void
 write_fn_proto_1 (std::stringstream &s, bool is_defn,
-		  const char *name, const_tree decl)
+		  const char *name, const_tree decl, bool force_public)
 {
   if (lookup_attribute ("alias", DECL_ATTRIBUTES (decl)) == NULL)
-    write_fn_marker (s, is_defn, TREE_PUBLIC (decl), name);
+    write_fn_marker (s, is_defn, TREE_PUBLIC (decl) || force_public, name);
 
   /* PTX declaration.  */
   if (DECL_EXTERNAL (decl))
     s << ".extern ";
-  else if (TREE_PUBLIC (decl))
+  else if (TREE_PUBLIC (decl) || force_public)
     s << (DECL_WEAK (decl) ? ".weak " : ".visible ");
   s << (write_as_kernel (DECL_ATTRIBUTES (decl)) ? ".entry " : ".func ");
 
@@ -1086,7 +1086,7 @@ write_fn_proto_1 (std::stringstream &s, bool is_defn,
 
 static void
 write_fn_proto (std::stringstream &s, bool is_defn,
-		const char *name, const_tree decl)
+		const char *name, const_tree decl, bool force_public=false)
 {
   const char *replacement = nvptx_name_replacement (name);
   char *replaced_dots = NULL;
@@ -1103,9 +1103,9 @@ write_fn_proto (std::stringstream &s, bool is_defn,
 
   if (is_defn)
     /* Emit a declaration.  The PTX assembler gets upset without it.  */
-    write_fn_proto_1 (s, false, name, decl);
+    write_fn_proto_1 (s, false, name, decl, force_public);
 
-  write_fn_proto_1 (s, is_defn, name, decl);
+  write_fn_proto_1 (s, is_defn, name, decl, force_public);
 
   if (replaced_dots)
     XDELETE (replaced_dots);
@@ -1481,7 +1481,13 @@ nvptx_declare_function_name (FILE *file, const char *name, const_tree decl)
   tree fntype = TREE_TYPE (decl);
   tree result_type = TREE_TYPE (fntype);
   int argno = 0;
+  bool force_public = false;
 
+  /* For reverse-offload 'nohost' functions: In order to be collectable in
+     '$offload_func_table', cf. mkoffload.cc, the function has to be visible. */
+  if (lookup_attribute ("omp target device_ancestor_nohost",
+			DECL_ATTRIBUTES (decl)))
+    force_public = true;
   if (lookup_attribute ("omp target entrypoint", DECL_ATTRIBUTES (decl))
       && !lookup_attribute ("oacc function", DECL_ATTRIBUTES (decl)))
     {
@@ -1493,7 +1499,7 @@ nvptx_declare_function_name (FILE *file, const char *name, const_tree decl)
   /* We construct the initial part of the function into a string
      stream, in order to share the prototype writing code.  */
   std::stringstream s;
-  write_fn_proto (s, true, name, decl);
+  write_fn_proto (s, true, name, decl, force_public);
   s << "{\n";
 
   bool return_in_mem = write_return_type (s, false, result_type);
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index d130665ed19..ac400fc2a1d 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1273,7 +1273,7 @@ nvptx_set_clocktick (CUmodule module, struct ptx_device *dev)
 int
 GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
 			 struct addr_pair **target_table,
-			 uint64_t **rev_fn_table __attribute__((unused)))
+			 uint64_t **rev_fn_table)
 {
   CUmodule module;
   const char *const *var_names;
@@ -1376,6 +1376,23 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
     targ_tbl->start = targ_tbl->end = 0;
   targ_tbl++;
 
+  if (rev_fn_table && fn_entries == 0)
+    *rev_fn_table = NULL;
+  else if (rev_fn_table)
+    {
+      CUdeviceptr var;
+      size_t bytes;
+      r = CUDA_CALL_NOCHECK (cuModuleGetGlobal, &var, &bytes, module,
+			     "$offload_func_table");
+      if (r != CUDA_SUCCESS)
+	GOMP_PLUGIN_fatal ("cuModuleGetGlobal error: %s", cuda_error (r));
+      assert (bytes == sizeof (uint64_t) * fn_entries);
+      *rev_fn_table = GOMP_PLUGIN_malloc (sizeof (uint64_t) * fn_entries);
+      r = CUDA_CALL_NOCHECK (cuMemcpyDtoH, *rev_fn_table, var, bytes);
+      if (r != CUDA_SUCCESS)
+	GOMP_PLUGIN_fatal ("cuMemcpyDtoH error: %s", cuda_error (r));
+    }
+
   nvptx_set_clocktick (module, dev);
 
   return fn_entries + var_entries + other_entries;

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Patch][2/3][v2] nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup
  2022-08-25 17:30 ` [Patch][2/3] nvptx: " Tobias Burnus
  2022-08-25 17:30   ` Tobias Burnus
@ 2022-08-29 18:43   ` Tobias Burnus
  2022-08-29 18:43     ` Tobias Burnus
                       ` (2 more replies)
  1 sibling, 3 replies; 15+ messages in thread
From: Tobias Burnus @ 2022-08-29 18:43 UTC (permalink / raw)
  To: gcc-patches, Jakub Jelinek, Tom de Vries

[-- Attachment #1: Type: text/plain, Size: 1327 bytes --]

Slightly revised version, fixing some issues in mkoffload.cc. Otherwise, the same applies:

On 25.08.22 19:30, Tobias Burnus wrote:
On 25.08.22 16:54, Tobias Burnus wrote:

The attached patch prepare for reverse-offload device->host
function-address lookup by requesting (if needed) the on-device address.


This patch adds the actual implementation for NVPTX.

Having  array[] = {fn1,fn2};  works with nvptx only since sm_35; hence,
if there is a reverse_offload and sm_30 is used, there will be a compile-time
error.

To avoid incompatibilities, I compile with the same PTX ISA .version and
sm_XX version as the (last) file that contains the reverse offload. While
it should not matter, some newer CUDA might not support, e.g., sm_35 or
do not like a specific ISA version - thus, that seemed to be safer.

This is currently effectively a no op as with [1/3] patch, always NULL
is passed and as GOMP_OFFLOAD_get_num_devices returns <= 0 as soon as
'omp requires reverse_offload' has been specified.

OK for mainline?


Tobias

-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

[-- Attachment #2: fn-lookup-nvptx-v2.diff --]
[-- Type: text/x-patch, Size: 11610 bytes --]

nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup

Add support to nvptx for reverse lookup of function name to prepare for
'omp target device(ancestor:1)'.

gcc/ChangeLog:

	* config/nvptx/mkoffload.cc (struct id_map): Add 'dim' member.
	(record_id): Store func name without quotes, store dim separately.
	(process): For GOMP_REQUIRES_REVERSE_OFFLOAD, check that -march is
	at least sm_35, create '$offload_func_table' global array and init
	with reverse-offload function addresses.
	* config/nvptx/nvptx.cc (write_fn_proto_1, write_fn_proto): New
	force_public attribute to force .visible.
	(nvptx_declare_function_name): For "omp target
	device_ancestor_nohost" attribut, force .visible/TREE_PUBLIC.

libgomp/ChangeLog:

	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Read offload
	function address table '$offload_func_table' if rev_fn_table
	is not NULL.

 gcc/config/nvptx/mkoffload.cc | 119 ++++++++++++++++++++++++++++++++++++++++--
 gcc/config/nvptx/nvptx.cc     |  20 +++++---
 libgomp/plugin/plugin-nvptx.c |  19 +++++++-
 3 files changed, 146 insertions(+), 12 deletions(-)

diff --git a/gcc/config/nvptx/mkoffload.cc b/gcc/config/nvptx/mkoffload.cc
index 3eea0a8f138..834b2059aac 100644
--- a/gcc/config/nvptx/mkoffload.cc
+++ b/gcc/config/nvptx/mkoffload.cc
@@ -47,6 +47,7 @@ struct id_map
 {
   id_map *next;
   char *ptx_name;
+  char *dim;
 };
 
 static id_map *func_ids, **funcs_tail = &func_ids;
@@ -108,8 +109,11 @@ xputenv (const char *string)
 static void
 record_id (const char *p1, id_map ***where)
 {
-  const char *end = strchr (p1, '\n');
-  if (!end)
+  gcc_assert (p1[0] == '"');
+  p1++;
+  const char *end = strchr (p1, '"');
+  const char *end2 = strchr (p1, '\n');
+  if (!end || !end2 || end >= end2)
     fatal_error (input_location, "malformed ptx file");
 
   id_map *v = XNEW (id_map);
@@ -117,6 +121,16 @@ record_id (const char *p1, id_map ***where)
   v->ptx_name = XNEWVEC (char, len + 1);
   memcpy (v->ptx_name, p1, len);
   v->ptx_name[len] = '\0';
+  p1 = end + 1;
+  if (*end != '\n')
+    {
+      len = end2 - p1;
+      v->dim = XNEWVEC (char, len + 1);
+      memcpy (v->dim, p1, len);
+      v->dim[len] = '\0';
+    }
+  else
+    v->dim = NULL;
   v->next = NULL;
   id_map **tail = *where;
   *tail = v;
@@ -242,6 +256,10 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
   id_map const *id;
   unsigned obj_count = 0;
   unsigned ix;
+  const char *sm_ver = NULL, *version = NULL;
+  const char *sm_ver2 = NULL, *version2 = NULL;
+  size_t file_cnt = 0;
+  size_t *file_idx = XALLOCAVEC (size_t, len);
 
   fprintf (out, "#include <stdint.h>\n\n");
 
@@ -250,6 +268,8 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
   for (size_t i = 0; i != len;)
     {
       char c;
+      bool output_fn_ptr = false;
+      file_idx[file_cnt++] = i;
 
       fprintf (out, "static const char ptx_code_%u[] =\n\t\"", obj_count++);
       while ((c = input[i++]))
@@ -261,6 +281,16 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
 	    case '\n':
 	      fprintf (out, "\\n\"\n\t\"");
 	      /* Look for mappings on subsequent lines.  */
+	      if (UNLIKELY (startswith (input + i, ".target sm_")))
+		{
+		  sm_ver = input + i + strlen (".target sm_");
+		  continue;
+		}
+	      if (UNLIKELY (startswith (input + i, ".version ")))
+		{
+		  version = input + i + strlen (".version ");
+		  continue;
+		}
 	      while (startswith (input + i, "//:"))
 		{
 		  i += 3;
@@ -268,7 +298,10 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
 		  if (startswith (input + i, "VAR_MAP "))
 		    record_id (input + i + 8, &vars_tail);
 		  else if (startswith (input + i, "FUNC_MAP "))
-		    record_id (input + i + 9, &funcs_tail);
+		    {
+		      output_fn_ptr = true;
+		      record_id (input + i + 9, &funcs_tail);
+		    }
 		  else
 		    abort ();
 		  /* Skip to next line. */
@@ -286,6 +319,81 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
 	  putc (c, out);
 	}
       fprintf (out, "\";\n\n");
+      if (output_fn_ptr
+	  && (omp_requires & GOMP_REQUIRES_REVERSE_OFFLOAD) != 0)
+	{
+	  if (sm_ver && sm_ver[0] == '3' && sm_ver[1] == '0'
+	      && sm_ver[2] == '\n')
+	    fatal_error (input_location,
+			 "%<omp requires reverse_offload%> requires at least "
+			 "%<sm_35%> for %<-misa=%>");
+	  sm_ver2 = sm_ver;
+	  version2 = version;
+	}
+    }
+
+  /* Create function-pointer array, required for reverse
+     offload function-pointer lookup.  */
+
+  if (func_ids && (omp_requires & GOMP_REQUIRES_REVERSE_OFFLOAD) != 0)
+    {
+      const char needle[] = "// BEGIN GLOBAL FUNCTION DECL: ";
+      fprintf (out, "static const char ptx_code_%u[] =\n", obj_count++);
+      fprintf (out, "\t\".version ");
+      for (size_t i = 0; version2[i] != '\0' && version2[i] != '\n'; i++)
+	fputc (version2[i], out);
+      fprintf (out, "\"\n\t\".target sm_");
+      for (size_t i = 0; version2[i] != '\0' && sm_ver2[i] != '\n'; i++)
+	fputc (sm_ver2[i], out);
+      fprintf (out, "\"\n\t\".file 1 \\\"<dummy>\\\"\"\n");
+
+      size_t fidx = 0;
+      for (id = func_ids; id; id = id->next)
+	{
+	  /* Only 'nohost' functions are needed - use NULL for the rest.
+	     Alternatively, besides searching for 'BEGIN FUNCTION DECL',
+	     checking for '.visible .entry ' + id->ptx_name would be
+	     required.  */
+	  if (!endswith (id->ptx_name, "$nohost"))
+	    continue;
+	  fprintf (out, "\t\".extern ");
+	  const char *p = input + file_idx[fidx];
+	  while (true)
+	    {
+	      p = strstr (p, needle);
+	      if (!p)
+		{
+		  fidx++;
+		  if (fidx >= file_cnt)
+		    break;
+		  p = input + file_idx[fidx];
+		  continue;
+		}
+	      p += strlen (needle);
+	      if (!startswith (p, id->ptx_name))
+		continue;
+	      p += strlen (id->ptx_name);
+	      if (*p != '\n')
+		continue;
+	      p++;
+	      gcc_assert (startswith (p, ".visible "));
+	      p += strlen (".visible ");
+	      for (; *p != '\0' && *p != '\n'; p++)
+		fputc (*p, out);
+	      break;
+	    }
+	  fprintf (out, "\"\n");
+	  if (fidx == file_cnt)
+	    fatal_error (input_location,
+			 "Cannot find function declaration for %qs",
+			 id->ptx_name);
+	}
+      fprintf (out, "\t\".visible .global .align 8 .u64 "
+		    "$offload_func_table[] = {");
+      for (comma = "", id = func_ids; id; comma = ",", id = id->next)
+	fprintf (out, "%s\"\n\t\t\"%s", comma,
+		 endswith (id->ptx_name, "$nohost") ? id->ptx_name : "0");
+      fprintf (out, "};\\n\";\n\n");
     }
 
   /* Dump out array of pointers to ptx object strings.  */
@@ -300,7 +408,7 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
   /* Dump out variable idents.  */
   fprintf (out, "static const char *const var_mappings[] = {");
   for (comma = "", id = var_ids; id; comma = ",", id = id->next)
-    fprintf (out, "%s\n\t%s", comma, id->ptx_name);
+    fprintf (out, "%s\n\t\"%s\"", comma, id->ptx_name);
   fprintf (out, "\n};\n\n");
 
   /* Dump out function idents.  */
@@ -309,7 +417,8 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
 	   "  unsigned short dim[%d];\n"
 	   "} func_mappings[] = {\n", GOMP_DIM_MAX);
   for (comma = "", id = func_ids; id; comma = ",", id = id->next)
-    fprintf (out, "%s\n\t{%s}", comma, id->ptx_name);
+    fprintf (out, "%s\n\t{\"%s\"%s}", comma, id->ptx_name,
+	     id->dim ? id->dim : "");
   fprintf (out, "\n};\n\n");
 
   fprintf (out,
diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index e4297e2d6c3..3293c096822 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -989,15 +989,15 @@ write_var_marker (FILE *file, bool is_defn, bool globalize, const char *name)
 
 static void
 write_fn_proto_1 (std::stringstream &s, bool is_defn,
-		  const char *name, const_tree decl)
+		  const char *name, const_tree decl, bool force_public)
 {
   if (lookup_attribute ("alias", DECL_ATTRIBUTES (decl)) == NULL)
-    write_fn_marker (s, is_defn, TREE_PUBLIC (decl), name);
+    write_fn_marker (s, is_defn, TREE_PUBLIC (decl) || force_public, name);
 
   /* PTX declaration.  */
   if (DECL_EXTERNAL (decl))
     s << ".extern ";
-  else if (TREE_PUBLIC (decl))
+  else if (TREE_PUBLIC (decl) || force_public)
     s << (DECL_WEAK (decl) ? ".weak " : ".visible ");
   s << (write_as_kernel (DECL_ATTRIBUTES (decl)) ? ".entry " : ".func ");
 
@@ -1086,7 +1086,7 @@ write_fn_proto_1 (std::stringstream &s, bool is_defn,
 
 static void
 write_fn_proto (std::stringstream &s, bool is_defn,
-		const char *name, const_tree decl)
+		const char *name, const_tree decl, bool force_public=false)
 {
   const char *replacement = nvptx_name_replacement (name);
   char *replaced_dots = NULL;
@@ -1103,9 +1103,9 @@ write_fn_proto (std::stringstream &s, bool is_defn,
 
   if (is_defn)
     /* Emit a declaration.  The PTX assembler gets upset without it.  */
-    write_fn_proto_1 (s, false, name, decl);
+    write_fn_proto_1 (s, false, name, decl, force_public);
 
-  write_fn_proto_1 (s, is_defn, name, decl);
+  write_fn_proto_1 (s, is_defn, name, decl, force_public);
 
   if (replaced_dots)
     XDELETE (replaced_dots);
@@ -1481,7 +1481,13 @@ nvptx_declare_function_name (FILE *file, const char *name, const_tree decl)
   tree fntype = TREE_TYPE (decl);
   tree result_type = TREE_TYPE (fntype);
   int argno = 0;
+  bool force_public = false;
 
+  /* For reverse-offload 'nohost' functions: In order to be collectable in
+     '$offload_func_table', cf. mkoffload.cc, the function has to be visible. */
+  if (lookup_attribute ("omp target device_ancestor_nohost",
+			DECL_ATTRIBUTES (decl)))
+    force_public = true;
   if (lookup_attribute ("omp target entrypoint", DECL_ATTRIBUTES (decl))
       && !lookup_attribute ("oacc function", DECL_ATTRIBUTES (decl)))
     {
@@ -1493,7 +1499,7 @@ nvptx_declare_function_name (FILE *file, const char *name, const_tree decl)
   /* We construct the initial part of the function into a string
      stream, in order to share the prototype writing code.  */
   std::stringstream s;
-  write_fn_proto (s, true, name, decl);
+  write_fn_proto (s, true, name, decl, force_public);
   s << "{\n";
 
   bool return_in_mem = write_return_type (s, false, result_type);
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index d130665ed19..ac400fc2a1d 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1273,7 +1273,7 @@ nvptx_set_clocktick (CUmodule module, struct ptx_device *dev)
 int
 GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
 			 struct addr_pair **target_table,
-			 uint64_t **rev_fn_table __attribute__((unused)))
+			 uint64_t **rev_fn_table)
 {
   CUmodule module;
   const char *const *var_names;
@@ -1376,6 +1376,23 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
     targ_tbl->start = targ_tbl->end = 0;
   targ_tbl++;
 
+  if (rev_fn_table && fn_entries == 0)
+    *rev_fn_table = NULL;
+  else if (rev_fn_table)
+    {
+      CUdeviceptr var;
+      size_t bytes;
+      r = CUDA_CALL_NOCHECK (cuModuleGetGlobal, &var, &bytes, module,
+			     "$offload_func_table");
+      if (r != CUDA_SUCCESS)
+	GOMP_PLUGIN_fatal ("cuModuleGetGlobal error: %s", cuda_error (r));
+      assert (bytes == sizeof (uint64_t) * fn_entries);
+      *rev_fn_table = GOMP_PLUGIN_malloc (sizeof (uint64_t) * fn_entries);
+      r = CUDA_CALL_NOCHECK (cuMemcpyDtoH, *rev_fn_table, var, bytes);
+      if (r != CUDA_SUCCESS)
+	GOMP_PLUGIN_fatal ("cuMemcpyDtoH error: %s", cuda_error (r));
+    }
+
   nvptx_set_clocktick (module, dev);
 
   return fn_entries + var_entries + other_entries;

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Patch][2/3][v2] nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup
  2022-08-29 18:43   ` [Patch][2/3][v2] " Tobias Burnus
@ 2022-08-29 18:43     ` Tobias Burnus
  2022-09-09 15:36     ` Jakub Jelinek
  2022-09-23 15:40     ` [og12] Come up with {,UN}LIKELY macros (was: [Patch][2/3][v2] nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup) Thomas Schwinge
  2 siblings, 0 replies; 15+ messages in thread
From: Tobias Burnus @ 2022-08-29 18:43 UTC (permalink / raw)
  To: gcc-patches, Jakub Jelinek, Tom de Vries


[-- Attachment #1.1: Type: text/plain, Size: 1327 bytes --]

Slightly revised version, fixing some issues in mkoffload.cc. Otherwise, the same applies:

On 25.08.22 19:30, Tobias Burnus wrote:
On 25.08.22 16:54, Tobias Burnus wrote:

The attached patch prepare for reverse-offload device->host
function-address lookup by requesting (if needed) the on-device address.


This patch adds the actual implementation for NVPTX.

Having  array[] = {fn1,fn2};  works with nvptx only since sm_35; hence,
if there is a reverse_offload and sm_30 is used, there will be a compile-time
error.

To avoid incompatibilities, I compile with the same PTX ISA .version and
sm_XX version as the (last) file that contains the reverse offload. While
it should not matter, some newer CUDA might not support, e.g., sm_35 or
do not like a specific ISA version - thus, that seemed to be safer.

This is currently effectively a no op as with [1/3] patch, always NULL
is passed and as GOMP_OFFLOAD_get_num_devices returns <= 0 as soon as
'omp requires reverse_offload' has been specified.

OK for mainline?


Tobias

-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

[-- Attachment #2: fn-lookup-nvptx-v2.diff --]
[-- Type: text/x-patch, Size: 11610 bytes --]

nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup

Add support to nvptx for reverse lookup of function name to prepare for
'omp target device(ancestor:1)'.

gcc/ChangeLog:

	* config/nvptx/mkoffload.cc (struct id_map): Add 'dim' member.
	(record_id): Store func name without quotes, store dim separately.
	(process): For GOMP_REQUIRES_REVERSE_OFFLOAD, check that -march is
	at least sm_35, create '$offload_func_table' global array and init
	with reverse-offload function addresses.
	* config/nvptx/nvptx.cc (write_fn_proto_1, write_fn_proto): New
	force_public attribute to force .visible.
	(nvptx_declare_function_name): For "omp target
	device_ancestor_nohost" attribut, force .visible/TREE_PUBLIC.

libgomp/ChangeLog:

	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Read offload
	function address table '$offload_func_table' if rev_fn_table
	is not NULL.

 gcc/config/nvptx/mkoffload.cc | 119 ++++++++++++++++++++++++++++++++++++++++--
 gcc/config/nvptx/nvptx.cc     |  20 +++++---
 libgomp/plugin/plugin-nvptx.c |  19 +++++++-
 3 files changed, 146 insertions(+), 12 deletions(-)

diff --git a/gcc/config/nvptx/mkoffload.cc b/gcc/config/nvptx/mkoffload.cc
index 3eea0a8f138..834b2059aac 100644
--- a/gcc/config/nvptx/mkoffload.cc
+++ b/gcc/config/nvptx/mkoffload.cc
@@ -47,6 +47,7 @@ struct id_map
 {
   id_map *next;
   char *ptx_name;
+  char *dim;
 };
 
 static id_map *func_ids, **funcs_tail = &func_ids;
@@ -108,8 +109,11 @@ xputenv (const char *string)
 static void
 record_id (const char *p1, id_map ***where)
 {
-  const char *end = strchr (p1, '\n');
-  if (!end)
+  gcc_assert (p1[0] == '"');
+  p1++;
+  const char *end = strchr (p1, '"');
+  const char *end2 = strchr (p1, '\n');
+  if (!end || !end2 || end >= end2)
     fatal_error (input_location, "malformed ptx file");
 
   id_map *v = XNEW (id_map);
@@ -117,6 +121,16 @@ record_id (const char *p1, id_map ***where)
   v->ptx_name = XNEWVEC (char, len + 1);
   memcpy (v->ptx_name, p1, len);
   v->ptx_name[len] = '\0';
+  p1 = end + 1;
+  if (*end != '\n')
+    {
+      len = end2 - p1;
+      v->dim = XNEWVEC (char, len + 1);
+      memcpy (v->dim, p1, len);
+      v->dim[len] = '\0';
+    }
+  else
+    v->dim = NULL;
   v->next = NULL;
   id_map **tail = *where;
   *tail = v;
@@ -242,6 +256,10 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
   id_map const *id;
   unsigned obj_count = 0;
   unsigned ix;
+  const char *sm_ver = NULL, *version = NULL;
+  const char *sm_ver2 = NULL, *version2 = NULL;
+  size_t file_cnt = 0;
+  size_t *file_idx = XALLOCAVEC (size_t, len);
 
   fprintf (out, "#include <stdint.h>\n\n");
 
@@ -250,6 +268,8 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
   for (size_t i = 0; i != len;)
     {
       char c;
+      bool output_fn_ptr = false;
+      file_idx[file_cnt++] = i;
 
       fprintf (out, "static const char ptx_code_%u[] =\n\t\"", obj_count++);
       while ((c = input[i++]))
@@ -261,6 +281,16 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
 	    case '\n':
 	      fprintf (out, "\\n\"\n\t\"");
 	      /* Look for mappings on subsequent lines.  */
+	      if (UNLIKELY (startswith (input + i, ".target sm_")))
+		{
+		  sm_ver = input + i + strlen (".target sm_");
+		  continue;
+		}
+	      if (UNLIKELY (startswith (input + i, ".version ")))
+		{
+		  version = input + i + strlen (".version ");
+		  continue;
+		}
 	      while (startswith (input + i, "//:"))
 		{
 		  i += 3;
@@ -268,7 +298,10 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
 		  if (startswith (input + i, "VAR_MAP "))
 		    record_id (input + i + 8, &vars_tail);
 		  else if (startswith (input + i, "FUNC_MAP "))
-		    record_id (input + i + 9, &funcs_tail);
+		    {
+		      output_fn_ptr = true;
+		      record_id (input + i + 9, &funcs_tail);
+		    }
 		  else
 		    abort ();
 		  /* Skip to next line. */
@@ -286,6 +319,81 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
 	  putc (c, out);
 	}
       fprintf (out, "\";\n\n");
+      if (output_fn_ptr
+	  && (omp_requires & GOMP_REQUIRES_REVERSE_OFFLOAD) != 0)
+	{
+	  if (sm_ver && sm_ver[0] == '3' && sm_ver[1] == '0'
+	      && sm_ver[2] == '\n')
+	    fatal_error (input_location,
+			 "%<omp requires reverse_offload%> requires at least "
+			 "%<sm_35%> for %<-misa=%>");
+	  sm_ver2 = sm_ver;
+	  version2 = version;
+	}
+    }
+
+  /* Create function-pointer array, required for reverse
+     offload function-pointer lookup.  */
+
+  if (func_ids && (omp_requires & GOMP_REQUIRES_REVERSE_OFFLOAD) != 0)
+    {
+      const char needle[] = "// BEGIN GLOBAL FUNCTION DECL: ";
+      fprintf (out, "static const char ptx_code_%u[] =\n", obj_count++);
+      fprintf (out, "\t\".version ");
+      for (size_t i = 0; version2[i] != '\0' && version2[i] != '\n'; i++)
+	fputc (version2[i], out);
+      fprintf (out, "\"\n\t\".target sm_");
+      for (size_t i = 0; version2[i] != '\0' && sm_ver2[i] != '\n'; i++)
+	fputc (sm_ver2[i], out);
+      fprintf (out, "\"\n\t\".file 1 \\\"<dummy>\\\"\"\n");
+
+      size_t fidx = 0;
+      for (id = func_ids; id; id = id->next)
+	{
+	  /* Only 'nohost' functions are needed - use NULL for the rest.
+	     Alternatively, besides searching for 'BEGIN FUNCTION DECL',
+	     checking for '.visible .entry ' + id->ptx_name would be
+	     required.  */
+	  if (!endswith (id->ptx_name, "$nohost"))
+	    continue;
+	  fprintf (out, "\t\".extern ");
+	  const char *p = input + file_idx[fidx];
+	  while (true)
+	    {
+	      p = strstr (p, needle);
+	      if (!p)
+		{
+		  fidx++;
+		  if (fidx >= file_cnt)
+		    break;
+		  p = input + file_idx[fidx];
+		  continue;
+		}
+	      p += strlen (needle);
+	      if (!startswith (p, id->ptx_name))
+		continue;
+	      p += strlen (id->ptx_name);
+	      if (*p != '\n')
+		continue;
+	      p++;
+	      gcc_assert (startswith (p, ".visible "));
+	      p += strlen (".visible ");
+	      for (; *p != '\0' && *p != '\n'; p++)
+		fputc (*p, out);
+	      break;
+	    }
+	  fprintf (out, "\"\n");
+	  if (fidx == file_cnt)
+	    fatal_error (input_location,
+			 "Cannot find function declaration for %qs",
+			 id->ptx_name);
+	}
+      fprintf (out, "\t\".visible .global .align 8 .u64 "
+		    "$offload_func_table[] = {");
+      for (comma = "", id = func_ids; id; comma = ",", id = id->next)
+	fprintf (out, "%s\"\n\t\t\"%s", comma,
+		 endswith (id->ptx_name, "$nohost") ? id->ptx_name : "0");
+      fprintf (out, "};\\n\";\n\n");
     }
 
   /* Dump out array of pointers to ptx object strings.  */
@@ -300,7 +408,7 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
   /* Dump out variable idents.  */
   fprintf (out, "static const char *const var_mappings[] = {");
   for (comma = "", id = var_ids; id; comma = ",", id = id->next)
-    fprintf (out, "%s\n\t%s", comma, id->ptx_name);
+    fprintf (out, "%s\n\t\"%s\"", comma, id->ptx_name);
   fprintf (out, "\n};\n\n");
 
   /* Dump out function idents.  */
@@ -309,7 +417,8 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
 	   "  unsigned short dim[%d];\n"
 	   "} func_mappings[] = {\n", GOMP_DIM_MAX);
   for (comma = "", id = func_ids; id; comma = ",", id = id->next)
-    fprintf (out, "%s\n\t{%s}", comma, id->ptx_name);
+    fprintf (out, "%s\n\t{\"%s\"%s}", comma, id->ptx_name,
+	     id->dim ? id->dim : "");
   fprintf (out, "\n};\n\n");
 
   fprintf (out,
diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index e4297e2d6c3..3293c096822 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -989,15 +989,15 @@ write_var_marker (FILE *file, bool is_defn, bool globalize, const char *name)
 
 static void
 write_fn_proto_1 (std::stringstream &s, bool is_defn,
-		  const char *name, const_tree decl)
+		  const char *name, const_tree decl, bool force_public)
 {
   if (lookup_attribute ("alias", DECL_ATTRIBUTES (decl)) == NULL)
-    write_fn_marker (s, is_defn, TREE_PUBLIC (decl), name);
+    write_fn_marker (s, is_defn, TREE_PUBLIC (decl) || force_public, name);
 
   /* PTX declaration.  */
   if (DECL_EXTERNAL (decl))
     s << ".extern ";
-  else if (TREE_PUBLIC (decl))
+  else if (TREE_PUBLIC (decl) || force_public)
     s << (DECL_WEAK (decl) ? ".weak " : ".visible ");
   s << (write_as_kernel (DECL_ATTRIBUTES (decl)) ? ".entry " : ".func ");
 
@@ -1086,7 +1086,7 @@ write_fn_proto_1 (std::stringstream &s, bool is_defn,
 
 static void
 write_fn_proto (std::stringstream &s, bool is_defn,
-		const char *name, const_tree decl)
+		const char *name, const_tree decl, bool force_public=false)
 {
   const char *replacement = nvptx_name_replacement (name);
   char *replaced_dots = NULL;
@@ -1103,9 +1103,9 @@ write_fn_proto (std::stringstream &s, bool is_defn,
 
   if (is_defn)
     /* Emit a declaration.  The PTX assembler gets upset without it.  */
-    write_fn_proto_1 (s, false, name, decl);
+    write_fn_proto_1 (s, false, name, decl, force_public);
 
-  write_fn_proto_1 (s, is_defn, name, decl);
+  write_fn_proto_1 (s, is_defn, name, decl, force_public);
 
   if (replaced_dots)
     XDELETE (replaced_dots);
@@ -1481,7 +1481,13 @@ nvptx_declare_function_name (FILE *file, const char *name, const_tree decl)
   tree fntype = TREE_TYPE (decl);
   tree result_type = TREE_TYPE (fntype);
   int argno = 0;
+  bool force_public = false;
 
+  /* For reverse-offload 'nohost' functions: In order to be collectable in
+     '$offload_func_table', cf. mkoffload.cc, the function has to be visible. */
+  if (lookup_attribute ("omp target device_ancestor_nohost",
+			DECL_ATTRIBUTES (decl)))
+    force_public = true;
   if (lookup_attribute ("omp target entrypoint", DECL_ATTRIBUTES (decl))
       && !lookup_attribute ("oacc function", DECL_ATTRIBUTES (decl)))
     {
@@ -1493,7 +1499,7 @@ nvptx_declare_function_name (FILE *file, const char *name, const_tree decl)
   /* We construct the initial part of the function into a string
      stream, in order to share the prototype writing code.  */
   std::stringstream s;
-  write_fn_proto (s, true, name, decl);
+  write_fn_proto (s, true, name, decl, force_public);
   s << "{\n";
 
   bool return_in_mem = write_return_type (s, false, result_type);
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index d130665ed19..ac400fc2a1d 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1273,7 +1273,7 @@ nvptx_set_clocktick (CUmodule module, struct ptx_device *dev)
 int
 GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
 			 struct addr_pair **target_table,
-			 uint64_t **rev_fn_table __attribute__((unused)))
+			 uint64_t **rev_fn_table)
 {
   CUmodule module;
   const char *const *var_names;
@@ -1376,6 +1376,23 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
     targ_tbl->start = targ_tbl->end = 0;
   targ_tbl++;
 
+  if (rev_fn_table && fn_entries == 0)
+    *rev_fn_table = NULL;
+  else if (rev_fn_table)
+    {
+      CUdeviceptr var;
+      size_t bytes;
+      r = CUDA_CALL_NOCHECK (cuModuleGetGlobal, &var, &bytes, module,
+			     "$offload_func_table");
+      if (r != CUDA_SUCCESS)
+	GOMP_PLUGIN_fatal ("cuModuleGetGlobal error: %s", cuda_error (r));
+      assert (bytes == sizeof (uint64_t) * fn_entries);
+      *rev_fn_table = GOMP_PLUGIN_malloc (sizeof (uint64_t) * fn_entries);
+      r = CUDA_CALL_NOCHECK (cuMemcpyDtoH, *rev_fn_table, var, bytes);
+      if (r != CUDA_SUCCESS)
+	GOMP_PLUGIN_fatal ("cuMemcpyDtoH error: %s", cuda_error (r));
+    }
+
   nvptx_set_clocktick (module, dev);
 
   return fn_entries + var_entries + other_entries;

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Patch][1/3] libgomp: Prepare for reverse offload fn lookup
  2022-08-25 14:54 [Patch][1/3] libgomp: Prepare for reverse offload fn lookup Tobias Burnus
                   ` (2 preceding siblings ...)
  2022-08-25 17:30 ` [Patch][2/3] nvptx: " Tobias Burnus
@ 2022-09-09 15:29 ` Jakub Jelinek
  3 siblings, 0 replies; 15+ messages in thread
From: Jakub Jelinek @ 2022-09-09 15:29 UTC (permalink / raw)
  To: Tobias Burnus; +Cc: gcc-patches

On Thu, Aug 25, 2022 at 04:54:51PM +0200, Tobias Burnus wrote:
> Technically, this patch is stand alone, but conceptually it based on the
> submitted but not reviewed patch:
> "[Patch] OpenMP: Support reverse offload (middle end part)"
> https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598662.html
> 
> With that patch, for reverse offloads ('omp target device(ancestor:1)'),
> calls like the following are added:
>  GOMP_target_ext (-2 /* initial device */, omp_fn.1
> where 'omp_fn.1' on nonhost devices a stub function just required for
> looking up the host function pointer via the offload_funcs table.
> 
> The attached patch prepare for reverse-offload device->host
> function-address lookup by requesting (if needed) the on-device address.
> 
> OK for mainline?
> 
> Tobias
> 
> 
> -----------------
> Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

> libgomp: Prepare for reverse offload fn lookup
> 
> Prepare for reverse-offloading function-pointer lookup by passing
> a rev_fn_table argument to GOMP_OFFLOAD_load_image.
> 
> The argument will be NULL, unless GOMP_REQUIRES_REVERSE_OFFLOAD is
> requested and devices not supported it, are filtered out.
> (Up to and including this commit, no non-host device claims such
> support and the caller currently always passes NULL.)
> 
> libgomp/ChangeLog:
> 
> 	* libgomp-plugin.h (GOMP_OFFLOAD_load_image): Add
> 	'uint64_t **rev_fn_table' argument.
> 	* oacc-host.c (host_load_image): Likewise.
> 	* plugin/plugin-gcn.c (GOMP_OFFLOAD_load_image): Likewise;
> 	currently unused.
> 	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Likewise.
> 	* target.c (gomp_load_image_to_device): Update call but pass
> 	NULL for now.
> 
> liboffloadmic/ChangeLog:
> 
> 	* plugin/libgomp-plugin-intelmic.cpp (GOMP_OFFLOAD_load_image):
> 	Add (unused) uint64_t **rev_fn_table argument.

Ok, thanks.

	Jakub


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Patch][2/3] GCN: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup
  2022-08-25 15:38 ` [Patch][2/3] GCN: libgomp+mkoffload.cc: " Tobias Burnus
  2022-08-25 15:38   ` Tobias Burnus
@ 2022-09-09 15:31   ` Jakub Jelinek
  1 sibling, 0 replies; 15+ messages in thread
From: Jakub Jelinek @ 2022-09-09 15:31 UTC (permalink / raw)
  To: Tobias Burnus; +Cc: gcc-patches, Andrew Stubbs

On Thu, Aug 25, 2022 at 05:38:58PM +0200, Tobias Burnus wrote:
> On 25.08.22 16:54, Tobias Burnus wrote:
> 
> The attached patch prepare for reverse-offload device->host
> function-address lookup by requesting (if needed) the on-device address.
> 
> 
> This patch adds the actual implementation for GCN. A variant would be
> to only generate .offload_func_table inside mkoffload when
> OMP_REQUIRES_REVERSE_OFFLOAD has been requested.
> 
> This is currently effectively a no op as with [1/3] patch, always NULL
> is passed and as GOMP_OFFLOAD_get_num_devices returns <= 0 as soon as
> 'omp requires reverse_offload' has been specified.
> 
> OK for mainline?
> 
> Tobias
> 
> 
> -----------------
> Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

> GCN: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup
> 
> Add support to GCN for reverse lookup of function name to prepare for
> 'omp target device(ancestor:1)'.
> 
> gcc/ChangeLog:
> 
> 	* config/gcn/mkoffload.cc (process_asm): Create .offload_func_table,
> 	similar to pre-existing .offload_var_table.
> 
> libgomp/ChangeLog:
> 
> 	* plugin/plugin-gcn.c (GOMP_OFFLOAD_load_image): Read
> 	.offload_func_table to populate rev_fn_table when requested.

Ok.

	Jakub


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Patch][2/3][v2] nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup
  2022-08-29 18:43   ` [Patch][2/3][v2] " Tobias Burnus
  2022-08-29 18:43     ` Tobias Burnus
@ 2022-09-09 15:36     ` Jakub Jelinek
  2022-09-12 12:02       ` [Patch] nvptx/mkoffload.cc: Warn instead of error when reverse offload is not possible (was: Re: [Patch][2/3][v2] nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup) Tobias Burnus
  2022-09-23 15:40     ` [og12] Come up with {,UN}LIKELY macros (was: [Patch][2/3][v2] nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup) Thomas Schwinge
  2 siblings, 1 reply; 15+ messages in thread
From: Jakub Jelinek @ 2022-09-09 15:36 UTC (permalink / raw)
  To: Tobias Burnus; +Cc: gcc-patches, Tom de Vries

On Mon, Aug 29, 2022 at 08:43:26PM +0200, Tobias Burnus wrote:
> Slightly revised version, fixing some issues in mkoffload.cc. Otherwise, the same applies:
> 
> On 25.08.22 19:30, Tobias Burnus wrote:
> On 25.08.22 16:54, Tobias Burnus wrote:
> 
> The attached patch prepare for reverse-offload device->host
> function-address lookup by requesting (if needed) the on-device address.
> 
> 
> This patch adds the actual implementation for NVPTX.
> 
> Having  array[] = {fn1,fn2};  works with nvptx only since sm_35; hence,
> if there is a reverse_offload and sm_30 is used, there will be a compile-time
> error.

Wonder if we instead shouldn't arrange for silent request for no PTX
offloading (or one with warning?) if sm_30 and reverse offload is needed.
Error might be too harsh, the program can still offload to GCN or host
just fine...

Otherwise LGTM.

	Jakub


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Patch] nvptx/mkoffload.cc: Warn instead of error when reverse offload is not possible (was: Re: [Patch][2/3][v2] nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup)
  2022-09-09 15:36     ` Jakub Jelinek
@ 2022-09-12 12:02       ` Tobias Burnus
  2022-09-12 12:10         ` Jakub Jelinek
  2022-10-17 11:59         ` Fix nvptx-specific '-foffload-options' syntax in 'libgomp.c/reverse-offload-sm30.c' (was: [Patch] nvptx/mkoffload.cc: Warn instead of error when reverse offload is not possible) Thomas Schwinge
  0 siblings, 2 replies; 15+ messages in thread
From: Tobias Burnus @ 2022-09-12 12:02 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches, Tom de Vries


[-- Attachment #1.1: Type: text/plain, Size: 1784 bytes --]

On 09.09.22 17:36, Jakub Jelinek wrote:

Wonder if we instead shouldn't arrange for silent request for no PTX
offloading (or one with warning?) if sm_30 and reverse offload is needed.
Error might be too harsh, the program can still offload to GCN or host
just fine...


Attached patch now implements the warning. I think silently failing is
not the proper solution. It is too confusing and without telling the
user, they may not notice this issue.


I also changed the "progname" variable; it seems as if the only user is
the diagnostic machinery and it seems to make sense to have:

nvptx mkoffload: warning: 'omp requires reverse_offload' requires at least
'sm_35' for '-misa=' - disabling offload-code generation for this device type

instead of guessing whether "mkoffload" is for the host, gcn or nvptx. I know
that the common way is to use the binary name ("lto1") or ... but I still
think this prefix – which is the tool_name. Makes sense.

(BTW: gcc/config/i386/intelmic-mkoffload.cc uses "mkoffload-intelmic".)

I also changed "-misa=" in the diagnostic to "-march=" as Tom changed the default
from "-misa=" (now an alias) to "-march=" in GCC 12 (+plus added -march-map= in addition).

I also added a testcase for this – and updated the testsuite for the dg-warning.

OK for mainline?

Tobias

PS: As no code for nvptx is generated, there is no "device present bit not used" warning
with GOMP_DEBUG, but as there is a compile-time warning, I guess that's fine.


-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

[-- Attachment #2: nvptx-warn-if-sm-too-old-v3.diff --]
[-- Type: text/x-patch, Size: 7573 bytes --]

nvptx/mkoffload.cc: Warn instead of error when reverse offload is not possible

Reverse offload requests at least -misa=sm_35; with this patch, a warning
instead of an error is shown, still permitting reverse offload for all
other configured device types. This is achieved by not calling
GOMP_offload_register_ver (and stopping generating pointless 'static const char'
variables, once known.)

The tool_name as progname changes adds "nvptx " and "gcn " to the
"mkoffload: warning/error:" diagnostic.

gcc/ChangeLog:

	* config/nvptx/mkoffload.cc (process): Replace a fatal_error by
	a warning + not enabling offloading if -misa=sm_30 prevents
	reverse offload.
	(main): Use tool_name as progname for diagnostic.
	* config/gcn/mkoffload.cc (main): Likewise.

libgomp/ChangeLog:

	* libgomp.texi (Offload-Target Specifics: nvptx): Document
	that reverse offload requires >= -march=sm_35.
	* testsuite/libgomp.c-c++-common/requires-4.c: Build for nvptx
	with -misa=sm_35.
	* testsuite/libgomp.c-c++-common/requires-5.c: Likewise.
	* testsuite/libgomp.c-c++-common/requires-6.c: Likewise.
	* testsuite/libgomp.c-c++-common/reverse-offload-1.c: Likewise.
	* testsuite/libgomp.fortran/reverse-offload-1.f90: Likewise.
	* testsuite/libgomp.c/reverse-offload-sm30.c: New test.

 gcc/config/gcn/mkoffload.cc                             |  2 +-
 gcc/config/nvptx/mkoffload.cc                           | 17 +++++++++++++----
 libgomp/libgomp.texi                                    |  3 +++
 libgomp/testsuite/libgomp.c-c++-common/requires-4.c     |  1 +
 libgomp/testsuite/libgomp.c-c++-common/requires-5.c     |  1 +
 libgomp/testsuite/libgomp.c-c++-common/requires-6.c     |  2 ++
 .../testsuite/libgomp.c-c++-common/reverse-offload-1.c  |  1 +
 libgomp/testsuite/libgomp.c/reverse-offload-sm30.c      | 15 +++++++++++++++
 libgomp/testsuite/libgomp.fortran/reverse-offload-1.f90 |  1 +
 9 files changed, 38 insertions(+), 5 deletions(-)

diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc
index 24d327355e3..64037806acf 100644
--- a/gcc/config/gcn/mkoffload.cc
+++ b/gcc/config/gcn/mkoffload.cc
@@ -805,7 +805,7 @@ main (int argc, char **argv)
   FILE *cfile = stdout;
   const char *outname = 0;
 
-  progname = "mkoffload";
+  progname = tool_name;
   diagnostic_initialize (global_dc, 0);
 
   obstack_init (&files_to_cleanup);
diff --git a/gcc/config/nvptx/mkoffload.cc b/gcc/config/nvptx/mkoffload.cc
index 834b2059aac..5f3e07ad066 100644
--- a/gcc/config/nvptx/mkoffload.cc
+++ b/gcc/config/nvptx/mkoffload.cc
@@ -324,9 +324,18 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
 	{
 	  if (sm_ver && sm_ver[0] == '3' && sm_ver[1] == '0'
 	      && sm_ver[2] == '\n')
-	    fatal_error (input_location,
-			 "%<omp requires reverse_offload%> requires at least "
-			 "%<sm_35%> for %<-misa=%>");
+	    {
+	      warning_at (input_location, 0,
+			  "%<omp requires reverse_offload%> requires at "
+			  "least %<sm_35%> for %<-march=%> - disabling "
+			  "offload-code generation for this device type");
+	      /* As now an empty file is compiled and there is no call to
+		 GOMP_offload_register_ver, this device type is effectively
+		 disabled.  */
+	      fflush (out);
+	      ftruncate (fileno (out), 0);
+	      return;
+	    }
 	  sm_ver2 = sm_ver;
 	  version2 = version;
 	}
@@ -526,7 +535,7 @@ main (int argc, char **argv)
   FILE *out = stdout;
   const char *outname = 0;
 
-  progname = "mkoffload";
+  progname = tool_name;
   diagnostic_initialize (global_dc, 0);
 
   if (atexit (mkoffload_cleanup) != 0)
diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 4eaad4348bb..1f402d6df79 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -4386,6 +4386,9 @@ The implementation remark:
 @item I/O within OpenMP target regions and OpenACC parallel/kernels is supported
       using the C library @code{printf} functions and the Fortran
       @code{print}/@code{write} statements.
+@item Compilation OpenMP code that contains @code{requires reverse_offload}
+      requires at least @code{-march=sm_35}, compiling for @code{-march=sm_30}
+      is not supported.
 @end itemize
 
 
diff --git a/libgomp/testsuite/libgomp.c-c++-common/requires-4.c b/libgomp/testsuite/libgomp.c-c++-common/requires-4.c
index 6ed5a5f647a..5883eff0d93 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/requires-4.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/requires-4.c
@@ -1,4 +1,5 @@
 /* { dg-additional-options "-flto" } */
+/* { dg-additional-options "-foffload-options=nvptx-none=-misa=sm_35" { target { offload_target_nvptx } } } */
 /* { dg-additional-sources requires-4-aux.c } */
 
 /* Check no diagnostic by device-compiler's or host compiler's lto1.
diff --git a/libgomp/testsuite/libgomp.c-c++-common/requires-5.c b/libgomp/testsuite/libgomp.c-c++-common/requires-5.c
index 7fe0c735d27..d43d78db6fa 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/requires-5.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/requires-5.c
@@ -1,3 +1,4 @@
+/* { dg-additional-options "-foffload-options=nvptx-none=-misa=sm_35" { target { offload_target_nvptx } } } */
 /* { dg-additional-sources requires-5-aux.c } */
 
 /* Depending on offload device capabilities, it may print something like the
diff --git a/libgomp/testsuite/libgomp.c-c++-common/requires-6.c b/libgomp/testsuite/libgomp.c-c++-common/requires-6.c
index b00c7459bbc..a25b4d2dedd 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/requires-6.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/requires-6.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-foffload-options=nvptx-none=-misa=sm_35" { target { offload_target_nvptx } } } */
+
 #pragma omp requires unified_shared_memory, unified_address, reverse_offload
 
 /* The requires line is not active as there is none of:
diff --git a/libgomp/testsuite/libgomp.c-c++-common/reverse-offload-1.c b/libgomp/testsuite/libgomp.c-c++-common/reverse-offload-1.c
index 976e129f560..52d828caf1c 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/reverse-offload-1.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/reverse-offload-1.c
@@ -1,4 +1,5 @@
 /* { dg-do run }  */
+/* { dg-additional-options "-foffload-options=nvptx-none=-misa=sm_35" { target { offload_target_nvptx } } } */
 /* { dg-additional-sources reverse-offload-1-aux.c } */
 
 /* Check that reverse offload works in particular:
diff --git a/libgomp/testsuite/libgomp.c/reverse-offload-sm30.c b/libgomp/testsuite/libgomp.c/reverse-offload-sm30.c
new file mode 100644
index 00000000000..14aed0132b7
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/reverse-offload-sm30.c
@@ -0,0 +1,15 @@
+/* { dg-do link { target { offload_target_nvptx } } } */
+/* { dg-additional-options "-foffload-options=nvptx-none=-march=sm_30 -foffload=-mptx=_" } */
+
+#pragma omp requires reverse_offload
+
+int
+main ()
+{
+  #pragma omp target
+    {
+    }
+  return 0;
+}
+
+/* { dg-warning "'omp requires reverse_offload' requires at least 'sm_35' for '-march=' - disabling offload-code generation for this device type" "" { target *-*-* } 0 } */
diff --git a/libgomp/testsuite/libgomp.fortran/reverse-offload-1.f90 b/libgomp/testsuite/libgomp.fortran/reverse-offload-1.f90
index 7cfb8b6552e..de68011f8f7 100644
--- a/libgomp/testsuite/libgomp.fortran/reverse-offload-1.f90
+++ b/libgomp/testsuite/libgomp.fortran/reverse-offload-1.f90
@@ -1,4 +1,5 @@
 ! { dg-do run }
+! { dg-additional-options "-foffload-options=nvptx-none=-misa=sm_35" { target { offload_target_nvptx } } }
 ! { dg-additional-sources reverse-offload-1-aux.f90 }
 
 ! Check that reverse offload works in particular:

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Patch] nvptx/mkoffload.cc: Warn instead of error when reverse offload is not possible (was: Re: [Patch][2/3][v2] nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup)
  2022-09-12 12:02       ` [Patch] nvptx/mkoffload.cc: Warn instead of error when reverse offload is not possible (was: Re: [Patch][2/3][v2] nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup) Tobias Burnus
@ 2022-09-12 12:10         ` Jakub Jelinek
  2022-10-17 11:59         ` Fix nvptx-specific '-foffload-options' syntax in 'libgomp.c/reverse-offload-sm30.c' (was: [Patch] nvptx/mkoffload.cc: Warn instead of error when reverse offload is not possible) Thomas Schwinge
  1 sibling, 0 replies; 15+ messages in thread
From: Jakub Jelinek @ 2022-09-12 12:10 UTC (permalink / raw)
  To: Tobias Burnus; +Cc: gcc-patches, Tom de Vries

On Mon, Sep 12, 2022 at 02:02:16PM +0200, Tobias Burnus wrote:
> +	    {
> +	      warning_at (input_location, 0,
> +			  "%<omp requires reverse_offload%> requires at "
> +			  "least %<sm_35%> for %<-march=%> - disabling "
> +			  "offload-code generation for this device type");

I wonder whether it shouldn't talk about
-foffload-options=nvptx-none=-march=
instead of just -march=.

Otherwise LGTM.

	Jakub


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [og12] Come up with {,UN}LIKELY macros (was: [Patch][2/3][v2] nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup)
  2022-08-29 18:43   ` [Patch][2/3][v2] " Tobias Burnus
  2022-08-29 18:43     ` Tobias Burnus
  2022-09-09 15:36     ` Jakub Jelinek
@ 2022-09-23 15:40     ` Thomas Schwinge
  2 siblings, 0 replies; 15+ messages in thread
From: Thomas Schwinge @ 2022-09-23 15:40 UTC (permalink / raw)
  To: Tobias Burnus, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2160 bytes --]

Hi!

Since the 2022-09-12 backport of this:

On 2022-08-29T20:43:26+0200, Tobias Burnus <tobias@codesourcery.com> wrote:
> nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup

... to og12 in commit 2b6ad53fd76c7bb9605be417d137a7d9a18f2117, the og12
branch didn't build anymore:

    [...]/gcc/config/nvptx/mkoffload.cc: In function 'void process(FILE*, FILE*, uint32_t)':
    [...]/gcc/config/nvptx/mkoffload.cc:284:59: error: 'UNLIKELY' was not declared in this scope
            if (UNLIKELY (startswith (input + i, ".target sm_")))
                                                               ^
    [...]/gcc/config/nvptx/mkoffload.cc:289:57: error: 'UNLIKELY' was not declared in this scope
            if (UNLIKELY (startswith (input + i, ".version ")))
                                                             ^
    make[2]: *** [[...]/gcc/config/nvptx/t-nvptx:8: mkoffload.o] Error 1

> --- a/gcc/config/nvptx/mkoffload.cc
> +++ b/gcc/config/nvptx/mkoffload.cc

> @@ -261,6 +281,16 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
>           case '\n':
>             fprintf (out, "\\n\"\n\t\"");
>             /* Look for mappings on subsequent lines.  */
> +           if (UNLIKELY (startswith (input + i, ".target sm_")))
> +             {
> +               sm_ver = input + i + strlen (".target sm_");
> +               continue;
> +             }
> +           if (UNLIKELY (startswith (input + i, ".version ")))
> +             {
> +               version = input + i + strlen (".version ");
> +               continue;
> +             }

To fix this, I've pushed a (very much reduced) partial cherry-pick of
commit r13-171-g22d9c8802add09a93308319fc37dd3a0f1125393
"Come up with {,UN}LIKELY macros" to og12 branch in
commit 44b77201a5431450f608b4538fefb1319127de13, see attached.


Grüße
 Thomas


-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Come-up-with-UN-LIKELY-macros.og12.patch --]
[-- Type: text/x-diff, Size: 1429 bytes --]

From 44b77201a5431450f608b4538fefb1319127de13 Mon Sep 17 00:00:00 2001
From: Martin Liska <mliska@suse.cz>
Date: Thu, 3 Feb 2022 10:58:18 +0100
Subject: [PATCH] Come up with {,UN}LIKELY macros.

gcc/ChangeLog:

	* system.h (LIKELY): Define.
	(UNLIKELY): Likewise.

(cherry picked from commit 22d9c8802add09a93308319fc37dd3a0f1125393, partial)
---
 gcc/ChangeLog.omp | 8 ++++++++
 gcc/system.h      | 3 +++
 2 files changed, 11 insertions(+)

diff --git a/gcc/ChangeLog.omp b/gcc/ChangeLog.omp
index 4f80bcbd356..30c3abfc15b 100644
--- a/gcc/ChangeLog.omp
+++ b/gcc/ChangeLog.omp
@@ -1,3 +1,11 @@
+2022-09-23  Thomas Schwinge  <thomas@codesourcery.com>
+
+	Backport from master branch:
+	2022-05-09  Martin Liska  <mliska@suse.cz>
+
+	* system.h (LIKELY): Define.
+	(UNLIKELY): Likewise.
+
 2022-09-12  Tobias Burnus  <tobias@codesourcery.com>
 
 	Backport from mainline:
diff --git a/gcc/system.h b/gcc/system.h
index e10c34f70ec..6b6868d0bbf 100644
--- a/gcc/system.h
+++ b/gcc/system.h
@@ -736,6 +736,9 @@ extern int vsnprintf (char *, size_t, const char *, va_list);
 #define __builtin_expect(a, b) (a)
 #endif
 
+#define LIKELY(x) (__builtin_expect ((x), 1))
+#define UNLIKELY(x) (__builtin_expect ((x), 0))
+
 /* Some of the headers included by <memory> can use "abort" within a
    namespace, e.g. "_VSTD::abort();", which fails after we use the
    preprocessor to redefine "abort" as "fancy_abort" below.  */
-- 
2.35.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Fix nvptx-specific '-foffload-options' syntax in 'libgomp.c/reverse-offload-sm30.c' (was: [Patch] nvptx/mkoffload.cc: Warn instead of error when reverse offload is not possible)
  2022-09-12 12:02       ` [Patch] nvptx/mkoffload.cc: Warn instead of error when reverse offload is not possible (was: Re: [Patch][2/3][v2] nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup) Tobias Burnus
  2022-09-12 12:10         ` Jakub Jelinek
@ 2022-10-17 11:59         ` Thomas Schwinge
  1 sibling, 0 replies; 15+ messages in thread
From: Thomas Schwinge @ 2022-10-17 11:59 UTC (permalink / raw)
  To: Tobias Burnus, gcc-patches; +Cc: Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 1351 bytes --]

Hi!

On 2022-09-12T14:02:16+0200, Tobias Burnus <tobias@codesourcery.com> wrote:
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.c/reverse-offload-sm30.c
> @@ -0,0 +1,15 @@
> +/* { dg-do link { target { offload_target_nvptx } } } */
> +/* { dg-additional-options "-foffload-options=nvptx-none=-march=sm_30 -foffload=-mptx=_" } */

Pushed to master branch
commit b61796663ba1fe8fb83203829398f3f89ec212b7
"Fix nvptx-specific '-foffload-options' syntax in
'libgomp.c/reverse-offload-sm30.c'", see attached.

Cherry-picked pushed to devel/omp/gcc-12 branch in
commit f36ce95ad928578aa6739f61480e6c8fbaf2248e
"Fix nvptx-specific '-foffload-options' syntax in
'libgomp.c/reverse-offload-sm30.c'", see attached.


Grüße
 Thomas


> +
> +#pragma omp requires reverse_offload
> +
> +int
> +main ()
> +{
> +  #pragma omp target
> +    {
> +    }
> +  return 0;
> +}
> +
> +/* { dg-warning "'omp requires reverse_offload' requires at least 'sm_35' for '-march=' - disabling offload-code generation for this device type" "" { target *-*-* } 0 } */


-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Fix-nvptx-specific-foffload-options-syntax-in-libgom.patch --]
[-- Type: text/x-diff, Size: 1327 bytes --]

From b61796663ba1fe8fb83203829398f3f89ec212b7 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Fri, 23 Sep 2022 11:29:50 +0200
Subject: [PATCH] Fix nvptx-specific '-foffload-options' syntax in
 'libgomp.c/reverse-offload-sm30.c'

That is, '-mptx=_' is only valid in '-foffload-options=nvptx-none', too.

Fix test case added in recent
commit r13-2625-g6b43f556f392a7165582aca36a19fe7389d995b2 "nvptx/mkoffload.cc:
Warn instead of error when reverse offload is not possible".

	libgomp/
	* testsuite/libgomp.c/reverse-offload-sm30.c: Fix nvptx-specific
	'-foffload-options' syntax.
---
 libgomp/testsuite/libgomp.c/reverse-offload-sm30.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libgomp/testsuite/libgomp.c/reverse-offload-sm30.c b/libgomp/testsuite/libgomp.c/reverse-offload-sm30.c
index fbfeae1fd41..7f10fd4ded9 100644
--- a/libgomp/testsuite/libgomp.c/reverse-offload-sm30.c
+++ b/libgomp/testsuite/libgomp.c/reverse-offload-sm30.c
@@ -1,5 +1,5 @@
 /* { dg-do link { target { offload_target_nvptx } } } */
-/* { dg-additional-options "-foffload-options=nvptx-none=-march=sm_30 -foffload=-mptx=_" } */
+/* { dg-additional-options "-foffload-options=nvptx-none=-march=sm_30 -foffload-options=nvptx-none=-mptx=_" } */
 
 #pragma omp requires reverse_offload
 
-- 
2.35.1


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #3: 0001-Fix-nvptx-specific-foffload-options-syntax-in-l.og12.patch --]
[-- Type: text/x-diff, Size: 2001 bytes --]

From f36ce95ad928578aa6739f61480e6c8fbaf2248e Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Fri, 23 Sep 2022 11:29:50 +0200
Subject: [PATCH] Fix nvptx-specific '-foffload-options' syntax in
 'libgomp.c/reverse-offload-sm30.c'

That is, '-mptx=_' is only valid in '-foffload-options=nvptx-none', too.

Fix test case added in recent
commit r13-2625-g6b43f556f392a7165582aca36a19fe7389d995b2 "nvptx/mkoffload.cc:
Warn instead of error when reverse offload is not possible".

	libgomp/
	* testsuite/libgomp.c/reverse-offload-sm30.c: Fix nvptx-specific
	'-foffload-options' syntax.

(cherry picked from commit b61796663ba1fe8fb83203829398f3f89ec212b7)
---
 libgomp/ChangeLog.omp                              | 8 ++++++++
 libgomp/testsuite/libgomp.c/reverse-offload-sm30.c | 2 +-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/libgomp/ChangeLog.omp b/libgomp/ChangeLog.omp
index cb3541be378..048314eb1be 100644
--- a/libgomp/ChangeLog.omp
+++ b/libgomp/ChangeLog.omp
@@ -1,3 +1,11 @@
+2022-10-17  Thomas Schwinge  <thomas@codesourcery.com>
+
+	Backported from master:
+	2022-10-17  Thomas Schwinge  <thomas@codesourcery.com>
+
+	* testsuite/libgomp.c/reverse-offload-sm30.c: Fix nvptx-specific
+	'-foffload-options' syntax.
+
 2022-10-14  Julian Brown  <julian@codesourcery.com>
 
 	* testsuite/libgomp.oacc-fortran/declare-1.f90: Adjust scan output.
diff --git a/libgomp/testsuite/libgomp.c/reverse-offload-sm30.c b/libgomp/testsuite/libgomp.c/reverse-offload-sm30.c
index fbfeae1fd41..7f10fd4ded9 100644
--- a/libgomp/testsuite/libgomp.c/reverse-offload-sm30.c
+++ b/libgomp/testsuite/libgomp.c/reverse-offload-sm30.c
@@ -1,5 +1,5 @@
 /* { dg-do link { target { offload_target_nvptx } } } */
-/* { dg-additional-options "-foffload-options=nvptx-none=-march=sm_30 -foffload=-mptx=_" } */
+/* { dg-additional-options "-foffload-options=nvptx-none=-march=sm_30 -foffload-options=nvptx-none=-mptx=_" } */
 
 #pragma omp requires reverse_offload
 
-- 
2.35.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2022-10-17 11:59 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-25 14:54 [Patch][1/3] libgomp: Prepare for reverse offload fn lookup Tobias Burnus
2022-08-25 14:54 ` Tobias Burnus
2022-08-25 15:38 ` [Patch][2/3] GCN: libgomp+mkoffload.cc: " Tobias Burnus
2022-08-25 15:38   ` Tobias Burnus
2022-09-09 15:31   ` Jakub Jelinek
2022-08-25 17:30 ` [Patch][2/3] nvptx: " Tobias Burnus
2022-08-25 17:30   ` Tobias Burnus
2022-08-29 18:43   ` [Patch][2/3][v2] " Tobias Burnus
2022-08-29 18:43     ` Tobias Burnus
2022-09-09 15:36     ` Jakub Jelinek
2022-09-12 12:02       ` [Patch] nvptx/mkoffload.cc: Warn instead of error when reverse offload is not possible (was: Re: [Patch][2/3][v2] nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup) Tobias Burnus
2022-09-12 12:10         ` Jakub Jelinek
2022-10-17 11:59         ` Fix nvptx-specific '-foffload-options' syntax in 'libgomp.c/reverse-offload-sm30.c' (was: [Patch] nvptx/mkoffload.cc: Warn instead of error when reverse offload is not possible) Thomas Schwinge
2022-09-23 15:40     ` [og12] Come up with {,UN}LIKELY macros (was: [Patch][2/3][v2] nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup) Thomas Schwinge
2022-09-09 15:29 ` [Patch][1/3] libgomp: Prepare for reverse offload fn lookup Jakub Jelinek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).