* [Patch][1/3] libgomp: Prepare for reverse offload fn lookup
@ 2022-08-25 14:54 Tobias Burnus
2022-08-25 14:54 ` Tobias Burnus
` (3 more replies)
0 siblings, 4 replies; 16+ messages in thread
From: Tobias Burnus @ 2022-08-25 14:54 UTC (permalink / raw)
To: gcc-patches, Jakub Jelinek
[-- Attachment #1: Type: text/plain, Size: 981 bytes --]
Technically, this patch is stand alone, but conceptually it based on the
submitted but not reviewed patch:
"[Patch] OpenMP: Support reverse offload (middle end part)"
https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598662.html
With that patch, for reverse offloads ('omp target device(ancestor:1)'),
calls like the following are added:
GOMP_target_ext (-2 /* initial device */, omp_fn.1
where 'omp_fn.1' on nonhost devices a stub function just required for
looking up the host function pointer via the offload_funcs table.
The attached patch prepare for reverse-offload device->host
function-address lookup by requesting (if needed) the on-device address.
OK for mainline?
Tobias
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
[-- Attachment #2: fn-lookup-gen.diff --]
[-- Type: text/x-patch, Size: 5324 bytes --]
libgomp: Prepare for reverse offload fn lookup
Prepare for reverse-offloading function-pointer lookup by passing
a rev_fn_table argument to GOMP_OFFLOAD_load_image.
The argument will be NULL, unless GOMP_REQUIRES_REVERSE_OFFLOAD is
requested and devices not supported it, are filtered out.
(Up to and including this commit, no non-host device claims such
support and the caller currently always passes NULL.)
libgomp/ChangeLog:
* libgomp-plugin.h (GOMP_OFFLOAD_load_image): Add
'uint64_t **rev_fn_table' argument.
* oacc-host.c (host_load_image): Likewise.
* plugin/plugin-gcn.c (GOMP_OFFLOAD_load_image): Likewise;
currently unused.
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Likewise.
* target.c (gomp_load_image_to_device): Update call but pass
NULL for now.
liboffloadmic/ChangeLog:
* plugin/libgomp-plugin-intelmic.cpp (GOMP_OFFLOAD_load_image):
Add (unused) uint64_t **rev_fn_table argument.
libgomp/libgomp-plugin.h | 2 +-
libgomp/oacc-host.c | 3 ++-
libgomp/plugin/plugin-gcn.c | 7 +++++--
libgomp/plugin/plugin-nvptx.c | 7 +++++--
libgomp/target.c | 2 +-
liboffloadmic/plugin/libgomp-plugin-intelmic.cpp | 3 ++-
6 files changed, 16 insertions(+), 8 deletions(-)
diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h
index ab3ed638475..57309303e66 100644
--- a/libgomp/libgomp-plugin.h
+++ b/libgomp/libgomp-plugin.h
@@ -130,7 +130,7 @@ extern bool GOMP_OFFLOAD_init_device (int);
extern bool GOMP_OFFLOAD_fini_device (int);
extern unsigned GOMP_OFFLOAD_version (void);
extern int GOMP_OFFLOAD_load_image (int, unsigned, const void *,
- struct addr_pair **);
+ struct addr_pair **, uint64_t **);
extern bool GOMP_OFFLOAD_unload_image (int, unsigned, const void *);
extern void *GOMP_OFFLOAD_alloc (int, size_t);
extern bool GOMP_OFFLOAD_free (int, void *);
diff --git a/libgomp/oacc-host.c b/libgomp/oacc-host.c
index eb11b9cf16a..4e3971ae1a9 100644
--- a/libgomp/oacc-host.c
+++ b/libgomp/oacc-host.c
@@ -81,7 +81,8 @@ static int
host_load_image (int n __attribute__ ((unused)),
unsigned v __attribute__ ((unused)),
const void *t __attribute__ ((unused)),
- struct addr_pair **r __attribute__ ((unused)))
+ struct addr_pair **r __attribute__ ((unused)),
+ uint64_t **f __attribute__ ((unused)))
{
return 0;
}
diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index ea327bf2ca0..363e2950649 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -3346,11 +3346,14 @@ GOMP_OFFLOAD_init_device (int n)
/* Load GCN object-code module described by struct gcn_image_desc in
TARGET_DATA and return references to kernel descriptors in TARGET_TABLE.
- If there are any constructors then run them. */
+ If there are any constructors then run them. If not NULL, REV_FN_TABLE will
+ contain the on-device addresses of the functions for reverse offload. To be
+ freed by the caller. */
int
GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
- struct addr_pair **target_table)
+ struct addr_pair **target_table,
+ uint64_t **rev_fn_table __attribute__((unused)))
{
if (GOMP_VERSION_DEV (version) != GOMP_VERSION_GCN)
{
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index bc63e274cdf..d130665ed19 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1266,11 +1266,14 @@ nvptx_set_clocktick (CUmodule module, struct ptx_device *dev)
}
/* Load the (partial) program described by TARGET_DATA to device
- number ORD. Allocate and return TARGET_TABLE. */
+ number ORD. Allocate and return TARGET_TABLE. If not NULL, REV_FN_TABLE
+ will contain the on-device addresses of the functions for reverse offload.
+ To be freed by the caller. */
int
GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
- struct addr_pair **target_table)
+ struct addr_pair **target_table,
+ uint64_t **rev_fn_table __attribute__((unused)))
{
CUmodule module;
const char *const *var_names;
diff --git a/libgomp/target.c b/libgomp/target.c
index 135db1d88ab..da907a2fb48 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -2137,7 +2137,7 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
num_target_entries
= devicep->load_image_func (devicep->target_id, version,
- target_data, &target_table);
+ target_data, &target_table, NULL);
if (num_target_entries != num_funcs + num_vars
/* Others (device_num) are included as trailing entries in pair list. */
diff --git a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
index 33bae0650b4..7be27f0459d 100644
--- a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
+++ b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
@@ -349,7 +349,8 @@ GOMP_OFFLOAD_version (void)
extern "C" int
GOMP_OFFLOAD_load_image (int device, const unsigned version,
- const void *target_image, addr_pair **result)
+ const void *target_image, addr_pair **result,
+ uint64_t ** /* rev_fn_table */)
{
TRACE ("(device = %d, target_image = %p)", device, target_image);
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Patch][1/3] libgomp: Prepare for reverse offload fn lookup
2022-08-25 14:54 [Patch][1/3] libgomp: Prepare for reverse offload fn lookup Tobias Burnus
@ 2022-08-25 14:54 ` Tobias Burnus
2022-08-25 15:38 ` [Patch][2/3] GCN: libgomp+mkoffload.cc: " Tobias Burnus
` (2 subsequent siblings)
3 siblings, 0 replies; 16+ messages in thread
From: Tobias Burnus @ 2022-08-25 14:54 UTC (permalink / raw)
To: gcc-patches, Jakub Jelinek
[-- Attachment #1.1: Type: text/plain, Size: 981 bytes --]
Technically, this patch is stand alone, but conceptually it based on the
submitted but not reviewed patch:
"[Patch] OpenMP: Support reverse offload (middle end part)"
https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598662.html
With that patch, for reverse offloads ('omp target device(ancestor:1)'),
calls like the following are added:
GOMP_target_ext (-2 /* initial device */, omp_fn.1
where 'omp_fn.1' on nonhost devices a stub function just required for
looking up the host function pointer via the offload_funcs table.
The attached patch prepare for reverse-offload device->host
function-address lookup by requesting (if needed) the on-device address.
OK for mainline?
Tobias
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
[-- Attachment #2: fn-lookup-gen.diff --]
[-- Type: text/x-patch, Size: 5324 bytes --]
libgomp: Prepare for reverse offload fn lookup
Prepare for reverse-offloading function-pointer lookup by passing
a rev_fn_table argument to GOMP_OFFLOAD_load_image.
The argument will be NULL, unless GOMP_REQUIRES_REVERSE_OFFLOAD is
requested and devices not supported it, are filtered out.
(Up to and including this commit, no non-host device claims such
support and the caller currently always passes NULL.)
libgomp/ChangeLog:
* libgomp-plugin.h (GOMP_OFFLOAD_load_image): Add
'uint64_t **rev_fn_table' argument.
* oacc-host.c (host_load_image): Likewise.
* plugin/plugin-gcn.c (GOMP_OFFLOAD_load_image): Likewise;
currently unused.
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Likewise.
* target.c (gomp_load_image_to_device): Update call but pass
NULL for now.
liboffloadmic/ChangeLog:
* plugin/libgomp-plugin-intelmic.cpp (GOMP_OFFLOAD_load_image):
Add (unused) uint64_t **rev_fn_table argument.
libgomp/libgomp-plugin.h | 2 +-
libgomp/oacc-host.c | 3 ++-
libgomp/plugin/plugin-gcn.c | 7 +++++--
libgomp/plugin/plugin-nvptx.c | 7 +++++--
libgomp/target.c | 2 +-
liboffloadmic/plugin/libgomp-plugin-intelmic.cpp | 3 ++-
6 files changed, 16 insertions(+), 8 deletions(-)
diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h
index ab3ed638475..57309303e66 100644
--- a/libgomp/libgomp-plugin.h
+++ b/libgomp/libgomp-plugin.h
@@ -130,7 +130,7 @@ extern bool GOMP_OFFLOAD_init_device (int);
extern bool GOMP_OFFLOAD_fini_device (int);
extern unsigned GOMP_OFFLOAD_version (void);
extern int GOMP_OFFLOAD_load_image (int, unsigned, const void *,
- struct addr_pair **);
+ struct addr_pair **, uint64_t **);
extern bool GOMP_OFFLOAD_unload_image (int, unsigned, const void *);
extern void *GOMP_OFFLOAD_alloc (int, size_t);
extern bool GOMP_OFFLOAD_free (int, void *);
diff --git a/libgomp/oacc-host.c b/libgomp/oacc-host.c
index eb11b9cf16a..4e3971ae1a9 100644
--- a/libgomp/oacc-host.c
+++ b/libgomp/oacc-host.c
@@ -81,7 +81,8 @@ static int
host_load_image (int n __attribute__ ((unused)),
unsigned v __attribute__ ((unused)),
const void *t __attribute__ ((unused)),
- struct addr_pair **r __attribute__ ((unused)))
+ struct addr_pair **r __attribute__ ((unused)),
+ uint64_t **f __attribute__ ((unused)))
{
return 0;
}
diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index ea327bf2ca0..363e2950649 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -3346,11 +3346,14 @@ GOMP_OFFLOAD_init_device (int n)
/* Load GCN object-code module described by struct gcn_image_desc in
TARGET_DATA and return references to kernel descriptors in TARGET_TABLE.
- If there are any constructors then run them. */
+ If there are any constructors then run them. If not NULL, REV_FN_TABLE will
+ contain the on-device addresses of the functions for reverse offload. To be
+ freed by the caller. */
int
GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
- struct addr_pair **target_table)
+ struct addr_pair **target_table,
+ uint64_t **rev_fn_table __attribute__((unused)))
{
if (GOMP_VERSION_DEV (version) != GOMP_VERSION_GCN)
{
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index bc63e274cdf..d130665ed19 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1266,11 +1266,14 @@ nvptx_set_clocktick (CUmodule module, struct ptx_device *dev)
}
/* Load the (partial) program described by TARGET_DATA to device
- number ORD. Allocate and return TARGET_TABLE. */
+ number ORD. Allocate and return TARGET_TABLE. If not NULL, REV_FN_TABLE
+ will contain the on-device addresses of the functions for reverse offload.
+ To be freed by the caller. */
int
GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
- struct addr_pair **target_table)
+ struct addr_pair **target_table,
+ uint64_t **rev_fn_table __attribute__((unused)))
{
CUmodule module;
const char *const *var_names;
diff --git a/libgomp/target.c b/libgomp/target.c
index 135db1d88ab..da907a2fb48 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -2137,7 +2137,7 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
num_target_entries
= devicep->load_image_func (devicep->target_id, version,
- target_data, &target_table);
+ target_data, &target_table, NULL);
if (num_target_entries != num_funcs + num_vars
/* Others (device_num) are included as trailing entries in pair list. */
diff --git a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
index 33bae0650b4..7be27f0459d 100644
--- a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
+++ b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
@@ -349,7 +349,8 @@ GOMP_OFFLOAD_version (void)
extern "C" int
GOMP_OFFLOAD_load_image (int device, const unsigned version,
- const void *target_image, addr_pair **result)
+ const void *target_image, addr_pair **result,
+ uint64_t ** /* rev_fn_table */)
{
TRACE ("(device = %d, target_image = %p)", device, target_image);
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Patch][2/3] GCN: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup
2022-08-25 14:54 [Patch][1/3] libgomp: Prepare for reverse offload fn lookup Tobias Burnus
2022-08-25 14:54 ` Tobias Burnus
@ 2022-08-25 15:38 ` Tobias Burnus
2022-08-25 15:38 ` Tobias Burnus
2022-09-09 15:31 ` Jakub Jelinek
2022-08-25 17:30 ` [Patch][2/3] nvptx: " Tobias Burnus
2022-09-09 15:29 ` [Patch][1/3] libgomp: Prepare for reverse offload fn lookup Jakub Jelinek
3 siblings, 2 replies; 16+ messages in thread
From: Tobias Burnus @ 2022-08-25 15:38 UTC (permalink / raw)
To: gcc-patches, Jakub Jelinek, Andrew Stubbs
[-- Attachment #1: Type: text/plain, Size: 864 bytes --]
On 25.08.22 16:54, Tobias Burnus wrote:
The attached patch prepare for reverse-offload device->host
function-address lookup by requesting (if needed) the on-device address.
This patch adds the actual implementation for GCN. A variant would be
to only generate .offload_func_table inside mkoffload when
OMP_REQUIRES_REVERSE_OFFLOAD has been requested.
This is currently effectively a no op as with [1/3] patch, always NULL
is passed and as GOMP_OFFLOAD_get_num_devices returns <= 0 as soon as
'omp requires reverse_offload' has been specified.
OK for mainline?
Tobias
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
[-- Attachment #2: fn-lookup-gcn.diff --]
[-- Type: text/x-patch, Size: 4673 bytes --]
GCN: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup
Add support to GCN for reverse lookup of function name to prepare for
'omp target device(ancestor:1)'.
gcc/ChangeLog:
* config/gcn/mkoffload.cc (process_asm): Create .offload_func_table,
similar to pre-existing .offload_var_table.
libgomp/ChangeLog:
* plugin/plugin-gcn.c (GOMP_OFFLOAD_load_image): Read
.offload_func_table to populate rev_fn_table when requested.
gcc/config/gcn/mkoffload.cc | 11 ++++++++++-
libgomp/plugin/plugin-gcn.c | 26 +++++++++++++++++++++++++-
2 files changed, 35 insertions(+), 2 deletions(-)
diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc
index 4206448703a..24d327355e3 100644
--- a/gcc/config/gcn/mkoffload.cc
+++ b/gcc/config/gcn/mkoffload.cc
@@ -537,63 +537,72 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
case IN_VARS:
{
char *varname;
unsigned varsize;
if (sscanf (buf, " .8byte %ms\n", &varname))
{
fputs (buf, out);
fgets (buf, sizeof (buf), in);
if (!sscanf (buf, " .8byte %u\n", &varsize))
abort ();
var_count++;
}
break;
}
case IN_FUNCS:
{
char *funcname;
if (sscanf (buf, "\t.8byte\t%ms\n", &funcname))
{
+ fputs (buf, out);
obstack_ptr_grow (&fns_os, funcname);
fn_count++;
continue;
}
break;
}
}
char dummy;
if (sscanf (buf, " .section .gnu.offload_vars%c", &dummy) > 0)
{
state = IN_VARS;
/* Add a global symbol to allow plugin-gcn.c to locate the table
at runtime. It can't use the "offload_var_table.N" emitted by
the compiler because a) they're not global, and b) there's one
for each input file combined into the binary. */
fputs (buf, out);
fputs ("\t.global .offload_var_table\n"
"\t.type .offload_var_table, @object\n"
".offload_var_table:\n",
out);
}
else if (sscanf (buf, " .section .gnu.offload_funcs%c", &dummy) > 0)
- state = IN_FUNCS;
+ {
+ state = IN_FUNCS;
+ /* Likewise for .gnu.offload_vars; used for reverse offload. */
+ fputs (buf, out);
+ fputs ("\t.global .offload_func_table\n"
+ "\t.type .offload_func_table, @object\n"
+ ".offload_func_table:\n",
+ out);
+ }
else if (sscanf (buf, " .amdgpu_metadata%c", &dummy) > 0)
{
state = IN_METADATA;
regcount.kernel_name = NULL;
regcount.sgpr_count = regcount.vgpr_count = -1;
}
else if (sscanf (buf, " .section %c", &dummy) > 0
|| sscanf (buf, " .text%c", &dummy) > 0
|| sscanf (buf, " .bss%c", &dummy) > 0
|| sscanf (buf, " .data%c", &dummy) > 0
|| sscanf (buf, " .ident %c", &dummy) > 0)
state = IN_CODE;
else if (sscanf (buf, " .end_amdgpu_metadata%c", &dummy) > 0)
{
state = IN_CODE;
gcc_assert (regcount.kernel_name != NULL
&& regcount.sgpr_count >= 0
&& regcount.vgpr_count >= 0);
obstack_grow (®counts_os, ®count, sizeof (regcount));
diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 363e2950649..00603981866 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -3353,7 +3353,7 @@ GOMP_OFFLOAD_init_device (int n)
int
GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
struct addr_pair **target_table,
- uint64_t **rev_fn_table __attribute__((unused)))
+ uint64_t **rev_fn_table)
{
if (GOMP_VERSION_DEV (version) != GOMP_VERSION_GCN)
{
@@ -3520,6 +3520,30 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
if (module->fini_array_func)
kernel_count--;
+ if (rev_fn_table != NULL && kernel_count == 0)
+ *rev_fn_table = NULL;
+ else if (rev_fn_table != NULL)
+ {
+ hsa_status_t status;
+ hsa_executable_symbol_t var_symbol;
+ status = hsa_fns.hsa_executable_get_symbol_fn (agent->executable, NULL,
+ ".offload_func_table",
+ agent->id, 0, &var_symbol);
+ if (status != HSA_STATUS_SUCCESS)
+ hsa_fatal ("Could not find symbol for variable in the code object",
+ status);
+ uint64_t fn_table_addr;
+ status = hsa_fns.hsa_executable_symbol_get_info_fn
+ (var_symbol, HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_ADDRESS,
+ &fn_table_addr);
+ if (status != HSA_STATUS_SUCCESS)
+ hsa_fatal ("Could not extract a variable from its symbol", status);
+ *rev_fn_table = GOMP_PLUGIN_malloc (kernel_count * sizeof (uint64_t));
+ GOMP_OFFLOAD_dev2host (agent->device_id, *rev_fn_table,
+ (void*) fn_table_addr,
+ kernel_count * sizeof (uint64_t));
+ }
+
return kernel_count + var_count + other_count;
}
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Patch][2/3] GCN: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup
2022-08-25 15:38 ` [Patch][2/3] GCN: libgomp+mkoffload.cc: " Tobias Burnus
@ 2022-08-25 15:38 ` Tobias Burnus
2022-09-09 15:31 ` Jakub Jelinek
1 sibling, 0 replies; 16+ messages in thread
From: Tobias Burnus @ 2022-08-25 15:38 UTC (permalink / raw)
To: gcc-patches, Jakub Jelinek, Andrew Stubbs
[-- Attachment #1.1: Type: text/plain, Size: 864 bytes --]
On 25.08.22 16:54, Tobias Burnus wrote:
The attached patch prepare for reverse-offload device->host
function-address lookup by requesting (if needed) the on-device address.
This patch adds the actual implementation for GCN. A variant would be
to only generate .offload_func_table inside mkoffload when
OMP_REQUIRES_REVERSE_OFFLOAD has been requested.
This is currently effectively a no op as with [1/3] patch, always NULL
is passed and as GOMP_OFFLOAD_get_num_devices returns <= 0 as soon as
'omp requires reverse_offload' has been specified.
OK for mainline?
Tobias
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
[-- Attachment #2: fn-lookup-gcn.diff --]
[-- Type: text/x-patch, Size: 4673 bytes --]
GCN: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup
Add support to GCN for reverse lookup of function name to prepare for
'omp target device(ancestor:1)'.
gcc/ChangeLog:
* config/gcn/mkoffload.cc (process_asm): Create .offload_func_table,
similar to pre-existing .offload_var_table.
libgomp/ChangeLog:
* plugin/plugin-gcn.c (GOMP_OFFLOAD_load_image): Read
.offload_func_table to populate rev_fn_table when requested.
gcc/config/gcn/mkoffload.cc | 11 ++++++++++-
libgomp/plugin/plugin-gcn.c | 26 +++++++++++++++++++++++++-
2 files changed, 35 insertions(+), 2 deletions(-)
diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc
index 4206448703a..24d327355e3 100644
--- a/gcc/config/gcn/mkoffload.cc
+++ b/gcc/config/gcn/mkoffload.cc
@@ -537,63 +537,72 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
case IN_VARS:
{
char *varname;
unsigned varsize;
if (sscanf (buf, " .8byte %ms\n", &varname))
{
fputs (buf, out);
fgets (buf, sizeof (buf), in);
if (!sscanf (buf, " .8byte %u\n", &varsize))
abort ();
var_count++;
}
break;
}
case IN_FUNCS:
{
char *funcname;
if (sscanf (buf, "\t.8byte\t%ms\n", &funcname))
{
+ fputs (buf, out);
obstack_ptr_grow (&fns_os, funcname);
fn_count++;
continue;
}
break;
}
}
char dummy;
if (sscanf (buf, " .section .gnu.offload_vars%c", &dummy) > 0)
{
state = IN_VARS;
/* Add a global symbol to allow plugin-gcn.c to locate the table
at runtime. It can't use the "offload_var_table.N" emitted by
the compiler because a) they're not global, and b) there's one
for each input file combined into the binary. */
fputs (buf, out);
fputs ("\t.global .offload_var_table\n"
"\t.type .offload_var_table, @object\n"
".offload_var_table:\n",
out);
}
else if (sscanf (buf, " .section .gnu.offload_funcs%c", &dummy) > 0)
- state = IN_FUNCS;
+ {
+ state = IN_FUNCS;
+ /* Likewise for .gnu.offload_vars; used for reverse offload. */
+ fputs (buf, out);
+ fputs ("\t.global .offload_func_table\n"
+ "\t.type .offload_func_table, @object\n"
+ ".offload_func_table:\n",
+ out);
+ }
else if (sscanf (buf, " .amdgpu_metadata%c", &dummy) > 0)
{
state = IN_METADATA;
regcount.kernel_name = NULL;
regcount.sgpr_count = regcount.vgpr_count = -1;
}
else if (sscanf (buf, " .section %c", &dummy) > 0
|| sscanf (buf, " .text%c", &dummy) > 0
|| sscanf (buf, " .bss%c", &dummy) > 0
|| sscanf (buf, " .data%c", &dummy) > 0
|| sscanf (buf, " .ident %c", &dummy) > 0)
state = IN_CODE;
else if (sscanf (buf, " .end_amdgpu_metadata%c", &dummy) > 0)
{
state = IN_CODE;
gcc_assert (regcount.kernel_name != NULL
&& regcount.sgpr_count >= 0
&& regcount.vgpr_count >= 0);
obstack_grow (®counts_os, ®count, sizeof (regcount));
diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 363e2950649..00603981866 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -3353,7 +3353,7 @@ GOMP_OFFLOAD_init_device (int n)
int
GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
struct addr_pair **target_table,
- uint64_t **rev_fn_table __attribute__((unused)))
+ uint64_t **rev_fn_table)
{
if (GOMP_VERSION_DEV (version) != GOMP_VERSION_GCN)
{
@@ -3520,6 +3520,30 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
if (module->fini_array_func)
kernel_count--;
+ if (rev_fn_table != NULL && kernel_count == 0)
+ *rev_fn_table = NULL;
+ else if (rev_fn_table != NULL)
+ {
+ hsa_status_t status;
+ hsa_executable_symbol_t var_symbol;
+ status = hsa_fns.hsa_executable_get_symbol_fn (agent->executable, NULL,
+ ".offload_func_table",
+ agent->id, 0, &var_symbol);
+ if (status != HSA_STATUS_SUCCESS)
+ hsa_fatal ("Could not find symbol for variable in the code object",
+ status);
+ uint64_t fn_table_addr;
+ status = hsa_fns.hsa_executable_symbol_get_info_fn
+ (var_symbol, HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_ADDRESS,
+ &fn_table_addr);
+ if (status != HSA_STATUS_SUCCESS)
+ hsa_fatal ("Could not extract a variable from its symbol", status);
+ *rev_fn_table = GOMP_PLUGIN_malloc (kernel_count * sizeof (uint64_t));
+ GOMP_OFFLOAD_dev2host (agent->device_id, *rev_fn_table,
+ (void*) fn_table_addr,
+ kernel_count * sizeof (uint64_t));
+ }
+
return kernel_count + var_count + other_count;
}
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Patch][2/3] nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup
2022-08-25 14:54 [Patch][1/3] libgomp: Prepare for reverse offload fn lookup Tobias Burnus
2022-08-25 14:54 ` Tobias Burnus
2022-08-25 15:38 ` [Patch][2/3] GCN: libgomp+mkoffload.cc: " Tobias Burnus
@ 2022-08-25 17:30 ` Tobias Burnus
2022-08-25 17:30 ` Tobias Burnus
` (2 more replies)
2022-09-09 15:29 ` [Patch][1/3] libgomp: Prepare for reverse offload fn lookup Jakub Jelinek
3 siblings, 3 replies; 16+ messages in thread
From: Tobias Burnus @ 2022-08-25 17:30 UTC (permalink / raw)
To: gcc-patches, Jakub Jelinek, Tom de Vries
[-- Attachment #1: Type: text/plain, Size: 1192 bytes --]
On 25.08.22 16:54, Tobias Burnus wrote:
The attached patch prepare for reverse-offload device->host
function-address lookup by requesting (if needed) the on-device address.
This patch adds the actual implementation for NVPTX.
Having array[] = {fn1,fn2}; works with nvptx only since sm_35; hence,
if there is a reverse_offload and sm_30 is used, there will be a compile-time
error.
To avoid incompatibilities, I compile with the same PTX ISA .version and
sm_XX version as the (last) file that contains the reverse offload. While
it should not matter, some newer CUDA might not support, e.g., sm_35 or
do not like a specific ISA version - thus, that seemed to be safer.
This is currently effectively a no op as with [1/3] patch, always NULL
is passed and as GOMP_OFFLOAD_get_num_devices returns <= 0 as soon as
'omp requires reverse_offload' has been specified.
OK for mainline?
Tobias
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
[-- Attachment #2: fn-lookup-nvptx.diff --]
[-- Type: text/x-patch, Size: 11019 bytes --]
nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup
Add support to nvptx for reverse lookup of function name to prepare for
'omp target device(ancestor:1)'.
gcc/ChangeLog:
* config/nvptx/mkoffload.cc (record_id): Strip quotations from
function name.
(process): For GOMP_REQUIRES_REVERSE_OFFLOAD, check that -march is
at least sm_35, create '$offload_func_table' global array and init
with reverse-offload function addresses.
* config/nvptx/nvptx.cc (write_fn_proto_1, write_fn_proto): New
force_public attribute to force .visible.
(nvptx_declare_function_name): For "omp target
device_ancestor_nohost" attribut, force .visible/TREE_PUBLIC.
libgomp/ChangeLog:
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Read offload
function address table '$offload_func_table' if rev_fn_table
is not NULL.
gcc/config/nvptx/mkoffload.cc | 104 ++++++++++++++++++++++++++++++++++++++++--
gcc/config/nvptx/nvptx.cc | 20 +++++---
libgomp/plugin/plugin-nvptx.c | 19 +++++++-
3 files changed, 131 insertions(+), 12 deletions(-)
diff --git a/gcc/config/nvptx/mkoffload.cc b/gcc/config/nvptx/mkoffload.cc
index 3eea0a8f138..c496766b1cc 100644
--- a/gcc/config/nvptx/mkoffload.cc
+++ b/gcc/config/nvptx/mkoffload.cc
@@ -108,12 +108,14 @@ xputenv (const char *string)
static void
record_id (const char *p1, id_map ***where)
{
+ gcc_assert (p1[0] == '"');
+ p1++;
const char *end = strchr (p1, '\n');
if (!end)
fatal_error (input_location, "malformed ptx file");
id_map *v = XNEW (id_map);
- size_t len = end - p1;
+ size_t len = end - p1 - 1 ; /* remove tailing '"' */
v->ptx_name = XNEWVEC (char, len + 1);
memcpy (v->ptx_name, p1, len);
v->ptx_name[len] = '\0';
@@ -242,6 +244,10 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
id_map const *id;
unsigned obj_count = 0;
unsigned ix;
+ const char *sm_ver = NULL, *version = NULL;
+ const char *sm_ver2 = NULL, *version2 = NULL;
+ size_t file_cnt = 0;
+ size_t *file_idx = XALLOCAVEC (size_t, len);
fprintf (out, "#include <stdint.h>\n\n");
@@ -250,6 +256,8 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
for (size_t i = 0; i != len;)
{
char c;
+ bool output_fn_ptr = false;
+ file_idx[file_cnt++] = i;
fprintf (out, "static const char ptx_code_%u[] =\n\t\"", obj_count++);
while ((c = input[i++]))
@@ -261,6 +269,16 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
case '\n':
fprintf (out, "\\n\"\n\t\"");
/* Look for mappings on subsequent lines. */
+ if (UNLIKELY (startswith (input + i, ".target sm_")))
+ {
+ sm_ver = input + i + strlen (".target sm_");
+ continue;
+ }
+ if (UNLIKELY (startswith (input + i, ".version ")))
+ {
+ version = input + i + strlen (".version ");
+ continue;
+ }
while (startswith (input + i, "//:"))
{
i += 3;
@@ -268,7 +286,10 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
if (startswith (input + i, "VAR_MAP "))
record_id (input + i + 8, &vars_tail);
else if (startswith (input + i, "FUNC_MAP "))
- record_id (input + i + 9, &funcs_tail);
+ {
+ output_fn_ptr = true;
+ record_id (input + i + 9, &funcs_tail);
+ }
else
abort ();
/* Skip to next line. */
@@ -286,6 +307,81 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
putc (c, out);
}
fprintf (out, "\";\n\n");
+ if (output_fn_ptr
+ && (omp_requires & GOMP_REQUIRES_REVERSE_OFFLOAD) != 0)
+ {
+ if (sm_ver && sm_ver[0] == '3' && sm_ver[1] == '0'
+ && sm_ver[2] == '\n')
+ fatal_error (input_location,
+ "%<omp requires reverse_offload%> requires at least "
+ "%<sm_35%> for %<-misa=%>");
+ sm_ver2 = sm_ver;
+ version2 = version;
+ }
+ }
+
+ /* Create function-pointer array, required for reverse
+ offload function-pointer lookup. */
+
+ if (func_ids && (omp_requires & GOMP_REQUIRES_REVERSE_OFFLOAD) != 0)
+ {
+ const char needle[] = "// BEGIN GLOBAL FUNCTION DECL: ";
+ fprintf (out, "static const char ptx_code_%u[] =\n", obj_count++);
+ fprintf (out, "\t\".version ");
+ for (size_t i = 0; version2[i] != '\0' && version2[i] != '\n'; i++)
+ fputc (version2[i], out);
+ fprintf (out, "\"\n\t\".target sm_");
+ for (size_t i = 0; version2[i] != '\0' && sm_ver2[i] != '\n'; i++)
+ fputc (sm_ver2[i], out);
+ fprintf (out, "\"\n\t\".file 1 \\\"<dummy>\\\"\"\n");
+
+ size_t fidx = 0;
+ for (id = func_ids; id; id = id->next)
+ {
+ /* Only 'nohost' functions are needed - use NULL for the rest.
+ Alternatively, besides searching for 'BEGIN FUNCTION DECL',
+ checking for '.visible .entry ' + id->ptx_name would be
+ required. */
+ if (!endswith (id->ptx_name, "$nohost"))
+ continue;
+ fprintf (out, "\t\".extern ");
+ const char *p = input + file_idx[fidx];
+ while (true)
+ {
+ p = strstr (p, needle);
+ if (!p)
+ {
+ fidx++;
+ if (fidx >= file_cnt)
+ break;
+ p = input + file_idx[fidx];
+ continue;
+ }
+ p += strlen (needle);
+ if (!startswith (p, id->ptx_name))
+ continue;
+ p += strlen (id->ptx_name);
+ if (*p != '\n')
+ continue;
+ p++;
+ gcc_assert (startswith (p, ".visible "));
+ p += strlen (".visible ");
+ for (; *p != '\0' && *p != '\n'; p++)
+ fputc (*p, out);
+ break;
+ }
+ fprintf (out, "\"\n");
+ if (fidx == file_cnt)
+ fatal_error (input_location,
+ "Cannot find function declaration for %qs",
+ id->ptx_name);
+ }
+ fprintf (out, "\t\".visible .global .align 8 .u64 "
+ "$offload_func_table[] = {");
+ for (comma = "", id = func_ids; id; comma = ",", id = id->next)
+ fprintf (out, "%s\"\n\t\t\"%s", comma,
+ endswith (id->ptx_name, "$nohost") ? id->ptx_name : "0");
+ fprintf (out, "};\\n\";\n\n");
}
/* Dump out array of pointers to ptx object strings. */
@@ -300,7 +396,7 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
/* Dump out variable idents. */
fprintf (out, "static const char *const var_mappings[] = {");
for (comma = "", id = var_ids; id; comma = ",", id = id->next)
- fprintf (out, "%s\n\t%s", comma, id->ptx_name);
+ fprintf (out, "%s\n\t\"%s\"", comma, id->ptx_name);
fprintf (out, "\n};\n\n");
/* Dump out function idents. */
@@ -309,7 +405,7 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
" unsigned short dim[%d];\n"
"} func_mappings[] = {\n", GOMP_DIM_MAX);
for (comma = "", id = func_ids; id; comma = ",", id = id->next)
- fprintf (out, "%s\n\t{%s}", comma, id->ptx_name);
+ fprintf (out, "%s\n\t{\"%s\"}", comma, id->ptx_name);
fprintf (out, "\n};\n\n");
fprintf (out,
diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index e4297e2d6c3..3293c096822 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -989,15 +989,15 @@ write_var_marker (FILE *file, bool is_defn, bool globalize, const char *name)
static void
write_fn_proto_1 (std::stringstream &s, bool is_defn,
- const char *name, const_tree decl)
+ const char *name, const_tree decl, bool force_public)
{
if (lookup_attribute ("alias", DECL_ATTRIBUTES (decl)) == NULL)
- write_fn_marker (s, is_defn, TREE_PUBLIC (decl), name);
+ write_fn_marker (s, is_defn, TREE_PUBLIC (decl) || force_public, name);
/* PTX declaration. */
if (DECL_EXTERNAL (decl))
s << ".extern ";
- else if (TREE_PUBLIC (decl))
+ else if (TREE_PUBLIC (decl) || force_public)
s << (DECL_WEAK (decl) ? ".weak " : ".visible ");
s << (write_as_kernel (DECL_ATTRIBUTES (decl)) ? ".entry " : ".func ");
@@ -1086,7 +1086,7 @@ write_fn_proto_1 (std::stringstream &s, bool is_defn,
static void
write_fn_proto (std::stringstream &s, bool is_defn,
- const char *name, const_tree decl)
+ const char *name, const_tree decl, bool force_public=false)
{
const char *replacement = nvptx_name_replacement (name);
char *replaced_dots = NULL;
@@ -1103,9 +1103,9 @@ write_fn_proto (std::stringstream &s, bool is_defn,
if (is_defn)
/* Emit a declaration. The PTX assembler gets upset without it. */
- write_fn_proto_1 (s, false, name, decl);
+ write_fn_proto_1 (s, false, name, decl, force_public);
- write_fn_proto_1 (s, is_defn, name, decl);
+ write_fn_proto_1 (s, is_defn, name, decl, force_public);
if (replaced_dots)
XDELETE (replaced_dots);
@@ -1481,7 +1481,13 @@ nvptx_declare_function_name (FILE *file, const char *name, const_tree decl)
tree fntype = TREE_TYPE (decl);
tree result_type = TREE_TYPE (fntype);
int argno = 0;
+ bool force_public = false;
+ /* For reverse-offload 'nohost' functions: In order to be collectable in
+ '$offload_func_table', cf. mkoffload.cc, the function has to be visible. */
+ if (lookup_attribute ("omp target device_ancestor_nohost",
+ DECL_ATTRIBUTES (decl)))
+ force_public = true;
if (lookup_attribute ("omp target entrypoint", DECL_ATTRIBUTES (decl))
&& !lookup_attribute ("oacc function", DECL_ATTRIBUTES (decl)))
{
@@ -1493,7 +1499,7 @@ nvptx_declare_function_name (FILE *file, const char *name, const_tree decl)
/* We construct the initial part of the function into a string
stream, in order to share the prototype writing code. */
std::stringstream s;
- write_fn_proto (s, true, name, decl);
+ write_fn_proto (s, true, name, decl, force_public);
s << "{\n";
bool return_in_mem = write_return_type (s, false, result_type);
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index d130665ed19..ac400fc2a1d 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1273,7 +1273,7 @@ nvptx_set_clocktick (CUmodule module, struct ptx_device *dev)
int
GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
struct addr_pair **target_table,
- uint64_t **rev_fn_table __attribute__((unused)))
+ uint64_t **rev_fn_table)
{
CUmodule module;
const char *const *var_names;
@@ -1376,6 +1376,23 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
targ_tbl->start = targ_tbl->end = 0;
targ_tbl++;
+ if (rev_fn_table && fn_entries == 0)
+ *rev_fn_table = NULL;
+ else if (rev_fn_table)
+ {
+ CUdeviceptr var;
+ size_t bytes;
+ r = CUDA_CALL_NOCHECK (cuModuleGetGlobal, &var, &bytes, module,
+ "$offload_func_table");
+ if (r != CUDA_SUCCESS)
+ GOMP_PLUGIN_fatal ("cuModuleGetGlobal error: %s", cuda_error (r));
+ assert (bytes == sizeof (uint64_t) * fn_entries);
+ *rev_fn_table = GOMP_PLUGIN_malloc (sizeof (uint64_t) * fn_entries);
+ r = CUDA_CALL_NOCHECK (cuMemcpyDtoH, *rev_fn_table, var, bytes);
+ if (r != CUDA_SUCCESS)
+ GOMP_PLUGIN_fatal ("cuMemcpyDtoH error: %s", cuda_error (r));
+ }
+
nvptx_set_clocktick (module, dev);
return fn_entries + var_entries + other_entries;
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Patch][2/3] nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup
2022-08-25 17:30 ` [Patch][2/3] nvptx: " Tobias Burnus
@ 2022-08-25 17:30 ` Tobias Burnus
2022-08-29 18:43 ` [Patch][2/3][v2] " Tobias Burnus
2025-06-04 17:22 ` [PUSHED] Avoid SIGSEGV in nvptx 'mkoffload' for voluminous PTX code Thomas Schwinge
2 siblings, 0 replies; 16+ messages in thread
From: Tobias Burnus @ 2022-08-25 17:30 UTC (permalink / raw)
To: gcc-patches, Jakub Jelinek, Tom de Vries
[-- Attachment #1.1: Type: text/plain, Size: 1192 bytes --]
On 25.08.22 16:54, Tobias Burnus wrote:
The attached patch prepare for reverse-offload device->host
function-address lookup by requesting (if needed) the on-device address.
This patch adds the actual implementation for NVPTX.
Having array[] = {fn1,fn2}; works with nvptx only since sm_35; hence,
if there is a reverse_offload and sm_30 is used, there will be a compile-time
error.
To avoid incompatibilities, I compile with the same PTX ISA .version and
sm_XX version as the (last) file that contains the reverse offload. While
it should not matter, some newer CUDA might not support, e.g., sm_35 or
do not like a specific ISA version - thus, that seemed to be safer.
This is currently effectively a no op as with [1/3] patch, always NULL
is passed and as GOMP_OFFLOAD_get_num_devices returns <= 0 as soon as
'omp requires reverse_offload' has been specified.
OK for mainline?
Tobias
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
[-- Attachment #2: fn-lookup-nvptx.diff --]
[-- Type: text/x-patch, Size: 11019 bytes --]
nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup
Add support to nvptx for reverse lookup of function name to prepare for
'omp target device(ancestor:1)'.
gcc/ChangeLog:
* config/nvptx/mkoffload.cc (record_id): Strip quotations from
function name.
(process): For GOMP_REQUIRES_REVERSE_OFFLOAD, check that -march is
at least sm_35, create '$offload_func_table' global array and init
with reverse-offload function addresses.
* config/nvptx/nvptx.cc (write_fn_proto_1, write_fn_proto): New
force_public attribute to force .visible.
(nvptx_declare_function_name): For "omp target
device_ancestor_nohost" attribut, force .visible/TREE_PUBLIC.
libgomp/ChangeLog:
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Read offload
function address table '$offload_func_table' if rev_fn_table
is not NULL.
gcc/config/nvptx/mkoffload.cc | 104 ++++++++++++++++++++++++++++++++++++++++--
gcc/config/nvptx/nvptx.cc | 20 +++++---
libgomp/plugin/plugin-nvptx.c | 19 +++++++-
3 files changed, 131 insertions(+), 12 deletions(-)
diff --git a/gcc/config/nvptx/mkoffload.cc b/gcc/config/nvptx/mkoffload.cc
index 3eea0a8f138..c496766b1cc 100644
--- a/gcc/config/nvptx/mkoffload.cc
+++ b/gcc/config/nvptx/mkoffload.cc
@@ -108,12 +108,14 @@ xputenv (const char *string)
static void
record_id (const char *p1, id_map ***where)
{
+ gcc_assert (p1[0] == '"');
+ p1++;
const char *end = strchr (p1, '\n');
if (!end)
fatal_error (input_location, "malformed ptx file");
id_map *v = XNEW (id_map);
- size_t len = end - p1;
+ size_t len = end - p1 - 1 ; /* remove tailing '"' */
v->ptx_name = XNEWVEC (char, len + 1);
memcpy (v->ptx_name, p1, len);
v->ptx_name[len] = '\0';
@@ -242,6 +244,10 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
id_map const *id;
unsigned obj_count = 0;
unsigned ix;
+ const char *sm_ver = NULL, *version = NULL;
+ const char *sm_ver2 = NULL, *version2 = NULL;
+ size_t file_cnt = 0;
+ size_t *file_idx = XALLOCAVEC (size_t, len);
fprintf (out, "#include <stdint.h>\n\n");
@@ -250,6 +256,8 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
for (size_t i = 0; i != len;)
{
char c;
+ bool output_fn_ptr = false;
+ file_idx[file_cnt++] = i;
fprintf (out, "static const char ptx_code_%u[] =\n\t\"", obj_count++);
while ((c = input[i++]))
@@ -261,6 +269,16 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
case '\n':
fprintf (out, "\\n\"\n\t\"");
/* Look for mappings on subsequent lines. */
+ if (UNLIKELY (startswith (input + i, ".target sm_")))
+ {
+ sm_ver = input + i + strlen (".target sm_");
+ continue;
+ }
+ if (UNLIKELY (startswith (input + i, ".version ")))
+ {
+ version = input + i + strlen (".version ");
+ continue;
+ }
while (startswith (input + i, "//:"))
{
i += 3;
@@ -268,7 +286,10 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
if (startswith (input + i, "VAR_MAP "))
record_id (input + i + 8, &vars_tail);
else if (startswith (input + i, "FUNC_MAP "))
- record_id (input + i + 9, &funcs_tail);
+ {
+ output_fn_ptr = true;
+ record_id (input + i + 9, &funcs_tail);
+ }
else
abort ();
/* Skip to next line. */
@@ -286,6 +307,81 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
putc (c, out);
}
fprintf (out, "\";\n\n");
+ if (output_fn_ptr
+ && (omp_requires & GOMP_REQUIRES_REVERSE_OFFLOAD) != 0)
+ {
+ if (sm_ver && sm_ver[0] == '3' && sm_ver[1] == '0'
+ && sm_ver[2] == '\n')
+ fatal_error (input_location,
+ "%<omp requires reverse_offload%> requires at least "
+ "%<sm_35%> for %<-misa=%>");
+ sm_ver2 = sm_ver;
+ version2 = version;
+ }
+ }
+
+ /* Create function-pointer array, required for reverse
+ offload function-pointer lookup. */
+
+ if (func_ids && (omp_requires & GOMP_REQUIRES_REVERSE_OFFLOAD) != 0)
+ {
+ const char needle[] = "// BEGIN GLOBAL FUNCTION DECL: ";
+ fprintf (out, "static const char ptx_code_%u[] =\n", obj_count++);
+ fprintf (out, "\t\".version ");
+ for (size_t i = 0; version2[i] != '\0' && version2[i] != '\n'; i++)
+ fputc (version2[i], out);
+ fprintf (out, "\"\n\t\".target sm_");
+ for (size_t i = 0; version2[i] != '\0' && sm_ver2[i] != '\n'; i++)
+ fputc (sm_ver2[i], out);
+ fprintf (out, "\"\n\t\".file 1 \\\"<dummy>\\\"\"\n");
+
+ size_t fidx = 0;
+ for (id = func_ids; id; id = id->next)
+ {
+ /* Only 'nohost' functions are needed - use NULL for the rest.
+ Alternatively, besides searching for 'BEGIN FUNCTION DECL',
+ checking for '.visible .entry ' + id->ptx_name would be
+ required. */
+ if (!endswith (id->ptx_name, "$nohost"))
+ continue;
+ fprintf (out, "\t\".extern ");
+ const char *p = input + file_idx[fidx];
+ while (true)
+ {
+ p = strstr (p, needle);
+ if (!p)
+ {
+ fidx++;
+ if (fidx >= file_cnt)
+ break;
+ p = input + file_idx[fidx];
+ continue;
+ }
+ p += strlen (needle);
+ if (!startswith (p, id->ptx_name))
+ continue;
+ p += strlen (id->ptx_name);
+ if (*p != '\n')
+ continue;
+ p++;
+ gcc_assert (startswith (p, ".visible "));
+ p += strlen (".visible ");
+ for (; *p != '\0' && *p != '\n'; p++)
+ fputc (*p, out);
+ break;
+ }
+ fprintf (out, "\"\n");
+ if (fidx == file_cnt)
+ fatal_error (input_location,
+ "Cannot find function declaration for %qs",
+ id->ptx_name);
+ }
+ fprintf (out, "\t\".visible .global .align 8 .u64 "
+ "$offload_func_table[] = {");
+ for (comma = "", id = func_ids; id; comma = ",", id = id->next)
+ fprintf (out, "%s\"\n\t\t\"%s", comma,
+ endswith (id->ptx_name, "$nohost") ? id->ptx_name : "0");
+ fprintf (out, "};\\n\";\n\n");
}
/* Dump out array of pointers to ptx object strings. */
@@ -300,7 +396,7 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
/* Dump out variable idents. */
fprintf (out, "static const char *const var_mappings[] = {");
for (comma = "", id = var_ids; id; comma = ",", id = id->next)
- fprintf (out, "%s\n\t%s", comma, id->ptx_name);
+ fprintf (out, "%s\n\t\"%s\"", comma, id->ptx_name);
fprintf (out, "\n};\n\n");
/* Dump out function idents. */
@@ -309,7 +405,7 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
" unsigned short dim[%d];\n"
"} func_mappings[] = {\n", GOMP_DIM_MAX);
for (comma = "", id = func_ids; id; comma = ",", id = id->next)
- fprintf (out, "%s\n\t{%s}", comma, id->ptx_name);
+ fprintf (out, "%s\n\t{\"%s\"}", comma, id->ptx_name);
fprintf (out, "\n};\n\n");
fprintf (out,
diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index e4297e2d6c3..3293c096822 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -989,15 +989,15 @@ write_var_marker (FILE *file, bool is_defn, bool globalize, const char *name)
static void
write_fn_proto_1 (std::stringstream &s, bool is_defn,
- const char *name, const_tree decl)
+ const char *name, const_tree decl, bool force_public)
{
if (lookup_attribute ("alias", DECL_ATTRIBUTES (decl)) == NULL)
- write_fn_marker (s, is_defn, TREE_PUBLIC (decl), name);
+ write_fn_marker (s, is_defn, TREE_PUBLIC (decl) || force_public, name);
/* PTX declaration. */
if (DECL_EXTERNAL (decl))
s << ".extern ";
- else if (TREE_PUBLIC (decl))
+ else if (TREE_PUBLIC (decl) || force_public)
s << (DECL_WEAK (decl) ? ".weak " : ".visible ");
s << (write_as_kernel (DECL_ATTRIBUTES (decl)) ? ".entry " : ".func ");
@@ -1086,7 +1086,7 @@ write_fn_proto_1 (std::stringstream &s, bool is_defn,
static void
write_fn_proto (std::stringstream &s, bool is_defn,
- const char *name, const_tree decl)
+ const char *name, const_tree decl, bool force_public=false)
{
const char *replacement = nvptx_name_replacement (name);
char *replaced_dots = NULL;
@@ -1103,9 +1103,9 @@ write_fn_proto (std::stringstream &s, bool is_defn,
if (is_defn)
/* Emit a declaration. The PTX assembler gets upset without it. */
- write_fn_proto_1 (s, false, name, decl);
+ write_fn_proto_1 (s, false, name, decl, force_public);
- write_fn_proto_1 (s, is_defn, name, decl);
+ write_fn_proto_1 (s, is_defn, name, decl, force_public);
if (replaced_dots)
XDELETE (replaced_dots);
@@ -1481,7 +1481,13 @@ nvptx_declare_function_name (FILE *file, const char *name, const_tree decl)
tree fntype = TREE_TYPE (decl);
tree result_type = TREE_TYPE (fntype);
int argno = 0;
+ bool force_public = false;
+ /* For reverse-offload 'nohost' functions: In order to be collectable in
+ '$offload_func_table', cf. mkoffload.cc, the function has to be visible. */
+ if (lookup_attribute ("omp target device_ancestor_nohost",
+ DECL_ATTRIBUTES (decl)))
+ force_public = true;
if (lookup_attribute ("omp target entrypoint", DECL_ATTRIBUTES (decl))
&& !lookup_attribute ("oacc function", DECL_ATTRIBUTES (decl)))
{
@@ -1493,7 +1499,7 @@ nvptx_declare_function_name (FILE *file, const char *name, const_tree decl)
/* We construct the initial part of the function into a string
stream, in order to share the prototype writing code. */
std::stringstream s;
- write_fn_proto (s, true, name, decl);
+ write_fn_proto (s, true, name, decl, force_public);
s << "{\n";
bool return_in_mem = write_return_type (s, false, result_type);
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index d130665ed19..ac400fc2a1d 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1273,7 +1273,7 @@ nvptx_set_clocktick (CUmodule module, struct ptx_device *dev)
int
GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
struct addr_pair **target_table,
- uint64_t **rev_fn_table __attribute__((unused)))
+ uint64_t **rev_fn_table)
{
CUmodule module;
const char *const *var_names;
@@ -1376,6 +1376,23 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
targ_tbl->start = targ_tbl->end = 0;
targ_tbl++;
+ if (rev_fn_table && fn_entries == 0)
+ *rev_fn_table = NULL;
+ else if (rev_fn_table)
+ {
+ CUdeviceptr var;
+ size_t bytes;
+ r = CUDA_CALL_NOCHECK (cuModuleGetGlobal, &var, &bytes, module,
+ "$offload_func_table");
+ if (r != CUDA_SUCCESS)
+ GOMP_PLUGIN_fatal ("cuModuleGetGlobal error: %s", cuda_error (r));
+ assert (bytes == sizeof (uint64_t) * fn_entries);
+ *rev_fn_table = GOMP_PLUGIN_malloc (sizeof (uint64_t) * fn_entries);
+ r = CUDA_CALL_NOCHECK (cuMemcpyDtoH, *rev_fn_table, var, bytes);
+ if (r != CUDA_SUCCESS)
+ GOMP_PLUGIN_fatal ("cuMemcpyDtoH error: %s", cuda_error (r));
+ }
+
nvptx_set_clocktick (module, dev);
return fn_entries + var_entries + other_entries;
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Patch][2/3][v2] nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup
2022-08-25 17:30 ` [Patch][2/3] nvptx: " Tobias Burnus
2022-08-25 17:30 ` Tobias Burnus
@ 2022-08-29 18:43 ` Tobias Burnus
2022-08-29 18:43 ` Tobias Burnus
` (2 more replies)
2025-06-04 17:22 ` [PUSHED] Avoid SIGSEGV in nvptx 'mkoffload' for voluminous PTX code Thomas Schwinge
2 siblings, 3 replies; 16+ messages in thread
From: Tobias Burnus @ 2022-08-29 18:43 UTC (permalink / raw)
To: gcc-patches, Jakub Jelinek, Tom de Vries
[-- Attachment #1: Type: text/plain, Size: 1327 bytes --]
Slightly revised version, fixing some issues in mkoffload.cc. Otherwise, the same applies:
On 25.08.22 19:30, Tobias Burnus wrote:
On 25.08.22 16:54, Tobias Burnus wrote:
The attached patch prepare for reverse-offload device->host
function-address lookup by requesting (if needed) the on-device address.
This patch adds the actual implementation for NVPTX.
Having array[] = {fn1,fn2}; works with nvptx only since sm_35; hence,
if there is a reverse_offload and sm_30 is used, there will be a compile-time
error.
To avoid incompatibilities, I compile with the same PTX ISA .version and
sm_XX version as the (last) file that contains the reverse offload. While
it should not matter, some newer CUDA might not support, e.g., sm_35 or
do not like a specific ISA version - thus, that seemed to be safer.
This is currently effectively a no op as with [1/3] patch, always NULL
is passed and as GOMP_OFFLOAD_get_num_devices returns <= 0 as soon as
'omp requires reverse_offload' has been specified.
OK for mainline?
Tobias
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
[-- Attachment #2: fn-lookup-nvptx-v2.diff --]
[-- Type: text/x-patch, Size: 11610 bytes --]
nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup
Add support to nvptx for reverse lookup of function name to prepare for
'omp target device(ancestor:1)'.
gcc/ChangeLog:
* config/nvptx/mkoffload.cc (struct id_map): Add 'dim' member.
(record_id): Store func name without quotes, store dim separately.
(process): For GOMP_REQUIRES_REVERSE_OFFLOAD, check that -march is
at least sm_35, create '$offload_func_table' global array and init
with reverse-offload function addresses.
* config/nvptx/nvptx.cc (write_fn_proto_1, write_fn_proto): New
force_public attribute to force .visible.
(nvptx_declare_function_name): For "omp target
device_ancestor_nohost" attribut, force .visible/TREE_PUBLIC.
libgomp/ChangeLog:
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Read offload
function address table '$offload_func_table' if rev_fn_table
is not NULL.
gcc/config/nvptx/mkoffload.cc | 119 ++++++++++++++++++++++++++++++++++++++++--
gcc/config/nvptx/nvptx.cc | 20 +++++---
libgomp/plugin/plugin-nvptx.c | 19 +++++++-
3 files changed, 146 insertions(+), 12 deletions(-)
diff --git a/gcc/config/nvptx/mkoffload.cc b/gcc/config/nvptx/mkoffload.cc
index 3eea0a8f138..834b2059aac 100644
--- a/gcc/config/nvptx/mkoffload.cc
+++ b/gcc/config/nvptx/mkoffload.cc
@@ -47,6 +47,7 @@ struct id_map
{
id_map *next;
char *ptx_name;
+ char *dim;
};
static id_map *func_ids, **funcs_tail = &func_ids;
@@ -108,8 +109,11 @@ xputenv (const char *string)
static void
record_id (const char *p1, id_map ***where)
{
- const char *end = strchr (p1, '\n');
- if (!end)
+ gcc_assert (p1[0] == '"');
+ p1++;
+ const char *end = strchr (p1, '"');
+ const char *end2 = strchr (p1, '\n');
+ if (!end || !end2 || end >= end2)
fatal_error (input_location, "malformed ptx file");
id_map *v = XNEW (id_map);
@@ -117,6 +121,16 @@ record_id (const char *p1, id_map ***where)
v->ptx_name = XNEWVEC (char, len + 1);
memcpy (v->ptx_name, p1, len);
v->ptx_name[len] = '\0';
+ p1 = end + 1;
+ if (*end != '\n')
+ {
+ len = end2 - p1;
+ v->dim = XNEWVEC (char, len + 1);
+ memcpy (v->dim, p1, len);
+ v->dim[len] = '\0';
+ }
+ else
+ v->dim = NULL;
v->next = NULL;
id_map **tail = *where;
*tail = v;
@@ -242,6 +256,10 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
id_map const *id;
unsigned obj_count = 0;
unsigned ix;
+ const char *sm_ver = NULL, *version = NULL;
+ const char *sm_ver2 = NULL, *version2 = NULL;
+ size_t file_cnt = 0;
+ size_t *file_idx = XALLOCAVEC (size_t, len);
fprintf (out, "#include <stdint.h>\n\n");
@@ -250,6 +268,8 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
for (size_t i = 0; i != len;)
{
char c;
+ bool output_fn_ptr = false;
+ file_idx[file_cnt++] = i;
fprintf (out, "static const char ptx_code_%u[] =\n\t\"", obj_count++);
while ((c = input[i++]))
@@ -261,6 +281,16 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
case '\n':
fprintf (out, "\\n\"\n\t\"");
/* Look for mappings on subsequent lines. */
+ if (UNLIKELY (startswith (input + i, ".target sm_")))
+ {
+ sm_ver = input + i + strlen (".target sm_");
+ continue;
+ }
+ if (UNLIKELY (startswith (input + i, ".version ")))
+ {
+ version = input + i + strlen (".version ");
+ continue;
+ }
while (startswith (input + i, "//:"))
{
i += 3;
@@ -268,7 +298,10 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
if (startswith (input + i, "VAR_MAP "))
record_id (input + i + 8, &vars_tail);
else if (startswith (input + i, "FUNC_MAP "))
- record_id (input + i + 9, &funcs_tail);
+ {
+ output_fn_ptr = true;
+ record_id (input + i + 9, &funcs_tail);
+ }
else
abort ();
/* Skip to next line. */
@@ -286,6 +319,81 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
putc (c, out);
}
fprintf (out, "\";\n\n");
+ if (output_fn_ptr
+ && (omp_requires & GOMP_REQUIRES_REVERSE_OFFLOAD) != 0)
+ {
+ if (sm_ver && sm_ver[0] == '3' && sm_ver[1] == '0'
+ && sm_ver[2] == '\n')
+ fatal_error (input_location,
+ "%<omp requires reverse_offload%> requires at least "
+ "%<sm_35%> for %<-misa=%>");
+ sm_ver2 = sm_ver;
+ version2 = version;
+ }
+ }
+
+ /* Create function-pointer array, required for reverse
+ offload function-pointer lookup. */
+
+ if (func_ids && (omp_requires & GOMP_REQUIRES_REVERSE_OFFLOAD) != 0)
+ {
+ const char needle[] = "// BEGIN GLOBAL FUNCTION DECL: ";
+ fprintf (out, "static const char ptx_code_%u[] =\n", obj_count++);
+ fprintf (out, "\t\".version ");
+ for (size_t i = 0; version2[i] != '\0' && version2[i] != '\n'; i++)
+ fputc (version2[i], out);
+ fprintf (out, "\"\n\t\".target sm_");
+ for (size_t i = 0; version2[i] != '\0' && sm_ver2[i] != '\n'; i++)
+ fputc (sm_ver2[i], out);
+ fprintf (out, "\"\n\t\".file 1 \\\"<dummy>\\\"\"\n");
+
+ size_t fidx = 0;
+ for (id = func_ids; id; id = id->next)
+ {
+ /* Only 'nohost' functions are needed - use NULL for the rest.
+ Alternatively, besides searching for 'BEGIN FUNCTION DECL',
+ checking for '.visible .entry ' + id->ptx_name would be
+ required. */
+ if (!endswith (id->ptx_name, "$nohost"))
+ continue;
+ fprintf (out, "\t\".extern ");
+ const char *p = input + file_idx[fidx];
+ while (true)
+ {
+ p = strstr (p, needle);
+ if (!p)
+ {
+ fidx++;
+ if (fidx >= file_cnt)
+ break;
+ p = input + file_idx[fidx];
+ continue;
+ }
+ p += strlen (needle);
+ if (!startswith (p, id->ptx_name))
+ continue;
+ p += strlen (id->ptx_name);
+ if (*p != '\n')
+ continue;
+ p++;
+ gcc_assert (startswith (p, ".visible "));
+ p += strlen (".visible ");
+ for (; *p != '\0' && *p != '\n'; p++)
+ fputc (*p, out);
+ break;
+ }
+ fprintf (out, "\"\n");
+ if (fidx == file_cnt)
+ fatal_error (input_location,
+ "Cannot find function declaration for %qs",
+ id->ptx_name);
+ }
+ fprintf (out, "\t\".visible .global .align 8 .u64 "
+ "$offload_func_table[] = {");
+ for (comma = "", id = func_ids; id; comma = ",", id = id->next)
+ fprintf (out, "%s\"\n\t\t\"%s", comma,
+ endswith (id->ptx_name, "$nohost") ? id->ptx_name : "0");
+ fprintf (out, "};\\n\";\n\n");
}
/* Dump out array of pointers to ptx object strings. */
@@ -300,7 +408,7 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
/* Dump out variable idents. */
fprintf (out, "static const char *const var_mappings[] = {");
for (comma = "", id = var_ids; id; comma = ",", id = id->next)
- fprintf (out, "%s\n\t%s", comma, id->ptx_name);
+ fprintf (out, "%s\n\t\"%s\"", comma, id->ptx_name);
fprintf (out, "\n};\n\n");
/* Dump out function idents. */
@@ -309,7 +417,8 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
" unsigned short dim[%d];\n"
"} func_mappings[] = {\n", GOMP_DIM_MAX);
for (comma = "", id = func_ids; id; comma = ",", id = id->next)
- fprintf (out, "%s\n\t{%s}", comma, id->ptx_name);
+ fprintf (out, "%s\n\t{\"%s\"%s}", comma, id->ptx_name,
+ id->dim ? id->dim : "");
fprintf (out, "\n};\n\n");
fprintf (out,
diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index e4297e2d6c3..3293c096822 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -989,15 +989,15 @@ write_var_marker (FILE *file, bool is_defn, bool globalize, const char *name)
static void
write_fn_proto_1 (std::stringstream &s, bool is_defn,
- const char *name, const_tree decl)
+ const char *name, const_tree decl, bool force_public)
{
if (lookup_attribute ("alias", DECL_ATTRIBUTES (decl)) == NULL)
- write_fn_marker (s, is_defn, TREE_PUBLIC (decl), name);
+ write_fn_marker (s, is_defn, TREE_PUBLIC (decl) || force_public, name);
/* PTX declaration. */
if (DECL_EXTERNAL (decl))
s << ".extern ";
- else if (TREE_PUBLIC (decl))
+ else if (TREE_PUBLIC (decl) || force_public)
s << (DECL_WEAK (decl) ? ".weak " : ".visible ");
s << (write_as_kernel (DECL_ATTRIBUTES (decl)) ? ".entry " : ".func ");
@@ -1086,7 +1086,7 @@ write_fn_proto_1 (std::stringstream &s, bool is_defn,
static void
write_fn_proto (std::stringstream &s, bool is_defn,
- const char *name, const_tree decl)
+ const char *name, const_tree decl, bool force_public=false)
{
const char *replacement = nvptx_name_replacement (name);
char *replaced_dots = NULL;
@@ -1103,9 +1103,9 @@ write_fn_proto (std::stringstream &s, bool is_defn,
if (is_defn)
/* Emit a declaration. The PTX assembler gets upset without it. */
- write_fn_proto_1 (s, false, name, decl);
+ write_fn_proto_1 (s, false, name, decl, force_public);
- write_fn_proto_1 (s, is_defn, name, decl);
+ write_fn_proto_1 (s, is_defn, name, decl, force_public);
if (replaced_dots)
XDELETE (replaced_dots);
@@ -1481,7 +1481,13 @@ nvptx_declare_function_name (FILE *file, const char *name, const_tree decl)
tree fntype = TREE_TYPE (decl);
tree result_type = TREE_TYPE (fntype);
int argno = 0;
+ bool force_public = false;
+ /* For reverse-offload 'nohost' functions: In order to be collectable in
+ '$offload_func_table', cf. mkoffload.cc, the function has to be visible. */
+ if (lookup_attribute ("omp target device_ancestor_nohost",
+ DECL_ATTRIBUTES (decl)))
+ force_public = true;
if (lookup_attribute ("omp target entrypoint", DECL_ATTRIBUTES (decl))
&& !lookup_attribute ("oacc function", DECL_ATTRIBUTES (decl)))
{
@@ -1493,7 +1499,7 @@ nvptx_declare_function_name (FILE *file, const char *name, const_tree decl)
/* We construct the initial part of the function into a string
stream, in order to share the prototype writing code. */
std::stringstream s;
- write_fn_proto (s, true, name, decl);
+ write_fn_proto (s, true, name, decl, force_public);
s << "{\n";
bool return_in_mem = write_return_type (s, false, result_type);
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index d130665ed19..ac400fc2a1d 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1273,7 +1273,7 @@ nvptx_set_clocktick (CUmodule module, struct ptx_device *dev)
int
GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
struct addr_pair **target_table,
- uint64_t **rev_fn_table __attribute__((unused)))
+ uint64_t **rev_fn_table)
{
CUmodule module;
const char *const *var_names;
@@ -1376,6 +1376,23 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
targ_tbl->start = targ_tbl->end = 0;
targ_tbl++;
+ if (rev_fn_table && fn_entries == 0)
+ *rev_fn_table = NULL;
+ else if (rev_fn_table)
+ {
+ CUdeviceptr var;
+ size_t bytes;
+ r = CUDA_CALL_NOCHECK (cuModuleGetGlobal, &var, &bytes, module,
+ "$offload_func_table");
+ if (r != CUDA_SUCCESS)
+ GOMP_PLUGIN_fatal ("cuModuleGetGlobal error: %s", cuda_error (r));
+ assert (bytes == sizeof (uint64_t) * fn_entries);
+ *rev_fn_table = GOMP_PLUGIN_malloc (sizeof (uint64_t) * fn_entries);
+ r = CUDA_CALL_NOCHECK (cuMemcpyDtoH, *rev_fn_table, var, bytes);
+ if (r != CUDA_SUCCESS)
+ GOMP_PLUGIN_fatal ("cuMemcpyDtoH error: %s", cuda_error (r));
+ }
+
nvptx_set_clocktick (module, dev);
return fn_entries + var_entries + other_entries;
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Patch][2/3][v2] nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup
2022-08-29 18:43 ` [Patch][2/3][v2] " Tobias Burnus
@ 2022-08-29 18:43 ` Tobias Burnus
2022-09-09 15:36 ` Jakub Jelinek
2022-09-23 15:40 ` [og12] Come up with {,UN}LIKELY macros (was: [Patch][2/3][v2] nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup) Thomas Schwinge
2 siblings, 0 replies; 16+ messages in thread
From: Tobias Burnus @ 2022-08-29 18:43 UTC (permalink / raw)
To: gcc-patches, Jakub Jelinek, Tom de Vries
[-- Attachment #1.1: Type: text/plain, Size: 1327 bytes --]
Slightly revised version, fixing some issues in mkoffload.cc. Otherwise, the same applies:
On 25.08.22 19:30, Tobias Burnus wrote:
On 25.08.22 16:54, Tobias Burnus wrote:
The attached patch prepare for reverse-offload device->host
function-address lookup by requesting (if needed) the on-device address.
This patch adds the actual implementation for NVPTX.
Having array[] = {fn1,fn2}; works with nvptx only since sm_35; hence,
if there is a reverse_offload and sm_30 is used, there will be a compile-time
error.
To avoid incompatibilities, I compile with the same PTX ISA .version and
sm_XX version as the (last) file that contains the reverse offload. While
it should not matter, some newer CUDA might not support, e.g., sm_35 or
do not like a specific ISA version - thus, that seemed to be safer.
This is currently effectively a no op as with [1/3] patch, always NULL
is passed and as GOMP_OFFLOAD_get_num_devices returns <= 0 as soon as
'omp requires reverse_offload' has been specified.
OK for mainline?
Tobias
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
[-- Attachment #2: fn-lookup-nvptx-v2.diff --]
[-- Type: text/x-patch, Size: 11610 bytes --]
nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup
Add support to nvptx for reverse lookup of function name to prepare for
'omp target device(ancestor:1)'.
gcc/ChangeLog:
* config/nvptx/mkoffload.cc (struct id_map): Add 'dim' member.
(record_id): Store func name without quotes, store dim separately.
(process): For GOMP_REQUIRES_REVERSE_OFFLOAD, check that -march is
at least sm_35, create '$offload_func_table' global array and init
with reverse-offload function addresses.
* config/nvptx/nvptx.cc (write_fn_proto_1, write_fn_proto): New
force_public attribute to force .visible.
(nvptx_declare_function_name): For "omp target
device_ancestor_nohost" attribut, force .visible/TREE_PUBLIC.
libgomp/ChangeLog:
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Read offload
function address table '$offload_func_table' if rev_fn_table
is not NULL.
gcc/config/nvptx/mkoffload.cc | 119 ++++++++++++++++++++++++++++++++++++++++--
gcc/config/nvptx/nvptx.cc | 20 +++++---
libgomp/plugin/plugin-nvptx.c | 19 +++++++-
3 files changed, 146 insertions(+), 12 deletions(-)
diff --git a/gcc/config/nvptx/mkoffload.cc b/gcc/config/nvptx/mkoffload.cc
index 3eea0a8f138..834b2059aac 100644
--- a/gcc/config/nvptx/mkoffload.cc
+++ b/gcc/config/nvptx/mkoffload.cc
@@ -47,6 +47,7 @@ struct id_map
{
id_map *next;
char *ptx_name;
+ char *dim;
};
static id_map *func_ids, **funcs_tail = &func_ids;
@@ -108,8 +109,11 @@ xputenv (const char *string)
static void
record_id (const char *p1, id_map ***where)
{
- const char *end = strchr (p1, '\n');
- if (!end)
+ gcc_assert (p1[0] == '"');
+ p1++;
+ const char *end = strchr (p1, '"');
+ const char *end2 = strchr (p1, '\n');
+ if (!end || !end2 || end >= end2)
fatal_error (input_location, "malformed ptx file");
id_map *v = XNEW (id_map);
@@ -117,6 +121,16 @@ record_id (const char *p1, id_map ***where)
v->ptx_name = XNEWVEC (char, len + 1);
memcpy (v->ptx_name, p1, len);
v->ptx_name[len] = '\0';
+ p1 = end + 1;
+ if (*end != '\n')
+ {
+ len = end2 - p1;
+ v->dim = XNEWVEC (char, len + 1);
+ memcpy (v->dim, p1, len);
+ v->dim[len] = '\0';
+ }
+ else
+ v->dim = NULL;
v->next = NULL;
id_map **tail = *where;
*tail = v;
@@ -242,6 +256,10 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
id_map const *id;
unsigned obj_count = 0;
unsigned ix;
+ const char *sm_ver = NULL, *version = NULL;
+ const char *sm_ver2 = NULL, *version2 = NULL;
+ size_t file_cnt = 0;
+ size_t *file_idx = XALLOCAVEC (size_t, len);
fprintf (out, "#include <stdint.h>\n\n");
@@ -250,6 +268,8 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
for (size_t i = 0; i != len;)
{
char c;
+ bool output_fn_ptr = false;
+ file_idx[file_cnt++] = i;
fprintf (out, "static const char ptx_code_%u[] =\n\t\"", obj_count++);
while ((c = input[i++]))
@@ -261,6 +281,16 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
case '\n':
fprintf (out, "\\n\"\n\t\"");
/* Look for mappings on subsequent lines. */
+ if (UNLIKELY (startswith (input + i, ".target sm_")))
+ {
+ sm_ver = input + i + strlen (".target sm_");
+ continue;
+ }
+ if (UNLIKELY (startswith (input + i, ".version ")))
+ {
+ version = input + i + strlen (".version ");
+ continue;
+ }
while (startswith (input + i, "//:"))
{
i += 3;
@@ -268,7 +298,10 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
if (startswith (input + i, "VAR_MAP "))
record_id (input + i + 8, &vars_tail);
else if (startswith (input + i, "FUNC_MAP "))
- record_id (input + i + 9, &funcs_tail);
+ {
+ output_fn_ptr = true;
+ record_id (input + i + 9, &funcs_tail);
+ }
else
abort ();
/* Skip to next line. */
@@ -286,6 +319,81 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
putc (c, out);
}
fprintf (out, "\";\n\n");
+ if (output_fn_ptr
+ && (omp_requires & GOMP_REQUIRES_REVERSE_OFFLOAD) != 0)
+ {
+ if (sm_ver && sm_ver[0] == '3' && sm_ver[1] == '0'
+ && sm_ver[2] == '\n')
+ fatal_error (input_location,
+ "%<omp requires reverse_offload%> requires at least "
+ "%<sm_35%> for %<-misa=%>");
+ sm_ver2 = sm_ver;
+ version2 = version;
+ }
+ }
+
+ /* Create function-pointer array, required for reverse
+ offload function-pointer lookup. */
+
+ if (func_ids && (omp_requires & GOMP_REQUIRES_REVERSE_OFFLOAD) != 0)
+ {
+ const char needle[] = "// BEGIN GLOBAL FUNCTION DECL: ";
+ fprintf (out, "static const char ptx_code_%u[] =\n", obj_count++);
+ fprintf (out, "\t\".version ");
+ for (size_t i = 0; version2[i] != '\0' && version2[i] != '\n'; i++)
+ fputc (version2[i], out);
+ fprintf (out, "\"\n\t\".target sm_");
+ for (size_t i = 0; version2[i] != '\0' && sm_ver2[i] != '\n'; i++)
+ fputc (sm_ver2[i], out);
+ fprintf (out, "\"\n\t\".file 1 \\\"<dummy>\\\"\"\n");
+
+ size_t fidx = 0;
+ for (id = func_ids; id; id = id->next)
+ {
+ /* Only 'nohost' functions are needed - use NULL for the rest.
+ Alternatively, besides searching for 'BEGIN FUNCTION DECL',
+ checking for '.visible .entry ' + id->ptx_name would be
+ required. */
+ if (!endswith (id->ptx_name, "$nohost"))
+ continue;
+ fprintf (out, "\t\".extern ");
+ const char *p = input + file_idx[fidx];
+ while (true)
+ {
+ p = strstr (p, needle);
+ if (!p)
+ {
+ fidx++;
+ if (fidx >= file_cnt)
+ break;
+ p = input + file_idx[fidx];
+ continue;
+ }
+ p += strlen (needle);
+ if (!startswith (p, id->ptx_name))
+ continue;
+ p += strlen (id->ptx_name);
+ if (*p != '\n')
+ continue;
+ p++;
+ gcc_assert (startswith (p, ".visible "));
+ p += strlen (".visible ");
+ for (; *p != '\0' && *p != '\n'; p++)
+ fputc (*p, out);
+ break;
+ }
+ fprintf (out, "\"\n");
+ if (fidx == file_cnt)
+ fatal_error (input_location,
+ "Cannot find function declaration for %qs",
+ id->ptx_name);
+ }
+ fprintf (out, "\t\".visible .global .align 8 .u64 "
+ "$offload_func_table[] = {");
+ for (comma = "", id = func_ids; id; comma = ",", id = id->next)
+ fprintf (out, "%s\"\n\t\t\"%s", comma,
+ endswith (id->ptx_name, "$nohost") ? id->ptx_name : "0");
+ fprintf (out, "};\\n\";\n\n");
}
/* Dump out array of pointers to ptx object strings. */
@@ -300,7 +408,7 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
/* Dump out variable idents. */
fprintf (out, "static const char *const var_mappings[] = {");
for (comma = "", id = var_ids; id; comma = ",", id = id->next)
- fprintf (out, "%s\n\t%s", comma, id->ptx_name);
+ fprintf (out, "%s\n\t\"%s\"", comma, id->ptx_name);
fprintf (out, "\n};\n\n");
/* Dump out function idents. */
@@ -309,7 +417,8 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
" unsigned short dim[%d];\n"
"} func_mappings[] = {\n", GOMP_DIM_MAX);
for (comma = "", id = func_ids; id; comma = ",", id = id->next)
- fprintf (out, "%s\n\t{%s}", comma, id->ptx_name);
+ fprintf (out, "%s\n\t{\"%s\"%s}", comma, id->ptx_name,
+ id->dim ? id->dim : "");
fprintf (out, "\n};\n\n");
fprintf (out,
diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index e4297e2d6c3..3293c096822 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -989,15 +989,15 @@ write_var_marker (FILE *file, bool is_defn, bool globalize, const char *name)
static void
write_fn_proto_1 (std::stringstream &s, bool is_defn,
- const char *name, const_tree decl)
+ const char *name, const_tree decl, bool force_public)
{
if (lookup_attribute ("alias", DECL_ATTRIBUTES (decl)) == NULL)
- write_fn_marker (s, is_defn, TREE_PUBLIC (decl), name);
+ write_fn_marker (s, is_defn, TREE_PUBLIC (decl) || force_public, name);
/* PTX declaration. */
if (DECL_EXTERNAL (decl))
s << ".extern ";
- else if (TREE_PUBLIC (decl))
+ else if (TREE_PUBLIC (decl) || force_public)
s << (DECL_WEAK (decl) ? ".weak " : ".visible ");
s << (write_as_kernel (DECL_ATTRIBUTES (decl)) ? ".entry " : ".func ");
@@ -1086,7 +1086,7 @@ write_fn_proto_1 (std::stringstream &s, bool is_defn,
static void
write_fn_proto (std::stringstream &s, bool is_defn,
- const char *name, const_tree decl)
+ const char *name, const_tree decl, bool force_public=false)
{
const char *replacement = nvptx_name_replacement (name);
char *replaced_dots = NULL;
@@ -1103,9 +1103,9 @@ write_fn_proto (std::stringstream &s, bool is_defn,
if (is_defn)
/* Emit a declaration. The PTX assembler gets upset without it. */
- write_fn_proto_1 (s, false, name, decl);
+ write_fn_proto_1 (s, false, name, decl, force_public);
- write_fn_proto_1 (s, is_defn, name, decl);
+ write_fn_proto_1 (s, is_defn, name, decl, force_public);
if (replaced_dots)
XDELETE (replaced_dots);
@@ -1481,7 +1481,13 @@ nvptx_declare_function_name (FILE *file, const char *name, const_tree decl)
tree fntype = TREE_TYPE (decl);
tree result_type = TREE_TYPE (fntype);
int argno = 0;
+ bool force_public = false;
+ /* For reverse-offload 'nohost' functions: In order to be collectable in
+ '$offload_func_table', cf. mkoffload.cc, the function has to be visible. */
+ if (lookup_attribute ("omp target device_ancestor_nohost",
+ DECL_ATTRIBUTES (decl)))
+ force_public = true;
if (lookup_attribute ("omp target entrypoint", DECL_ATTRIBUTES (decl))
&& !lookup_attribute ("oacc function", DECL_ATTRIBUTES (decl)))
{
@@ -1493,7 +1499,7 @@ nvptx_declare_function_name (FILE *file, const char *name, const_tree decl)
/* We construct the initial part of the function into a string
stream, in order to share the prototype writing code. */
std::stringstream s;
- write_fn_proto (s, true, name, decl);
+ write_fn_proto (s, true, name, decl, force_public);
s << "{\n";
bool return_in_mem = write_return_type (s, false, result_type);
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index d130665ed19..ac400fc2a1d 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1273,7 +1273,7 @@ nvptx_set_clocktick (CUmodule module, struct ptx_device *dev)
int
GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
struct addr_pair **target_table,
- uint64_t **rev_fn_table __attribute__((unused)))
+ uint64_t **rev_fn_table)
{
CUmodule module;
const char *const *var_names;
@@ -1376,6 +1376,23 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
targ_tbl->start = targ_tbl->end = 0;
targ_tbl++;
+ if (rev_fn_table && fn_entries == 0)
+ *rev_fn_table = NULL;
+ else if (rev_fn_table)
+ {
+ CUdeviceptr var;
+ size_t bytes;
+ r = CUDA_CALL_NOCHECK (cuModuleGetGlobal, &var, &bytes, module,
+ "$offload_func_table");
+ if (r != CUDA_SUCCESS)
+ GOMP_PLUGIN_fatal ("cuModuleGetGlobal error: %s", cuda_error (r));
+ assert (bytes == sizeof (uint64_t) * fn_entries);
+ *rev_fn_table = GOMP_PLUGIN_malloc (sizeof (uint64_t) * fn_entries);
+ r = CUDA_CALL_NOCHECK (cuMemcpyDtoH, *rev_fn_table, var, bytes);
+ if (r != CUDA_SUCCESS)
+ GOMP_PLUGIN_fatal ("cuMemcpyDtoH error: %s", cuda_error (r));
+ }
+
nvptx_set_clocktick (module, dev);
return fn_entries + var_entries + other_entries;
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Patch][1/3] libgomp: Prepare for reverse offload fn lookup
2022-08-25 14:54 [Patch][1/3] libgomp: Prepare for reverse offload fn lookup Tobias Burnus
` (2 preceding siblings ...)
2022-08-25 17:30 ` [Patch][2/3] nvptx: " Tobias Burnus
@ 2022-09-09 15:29 ` Jakub Jelinek
3 siblings, 0 replies; 16+ messages in thread
From: Jakub Jelinek @ 2022-09-09 15:29 UTC (permalink / raw)
To: Tobias Burnus; +Cc: gcc-patches
On Thu, Aug 25, 2022 at 04:54:51PM +0200, Tobias Burnus wrote:
> Technically, this patch is stand alone, but conceptually it based on the
> submitted but not reviewed patch:
> "[Patch] OpenMP: Support reverse offload (middle end part)"
> https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598662.html
>
> With that patch, for reverse offloads ('omp target device(ancestor:1)'),
> calls like the following are added:
> GOMP_target_ext (-2 /* initial device */, omp_fn.1
> where 'omp_fn.1' on nonhost devices a stub function just required for
> looking up the host function pointer via the offload_funcs table.
>
> The attached patch prepare for reverse-offload device->host
> function-address lookup by requesting (if needed) the on-device address.
>
> OK for mainline?
>
> Tobias
>
>
> -----------------
> Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
> libgomp: Prepare for reverse offload fn lookup
>
> Prepare for reverse-offloading function-pointer lookup by passing
> a rev_fn_table argument to GOMP_OFFLOAD_load_image.
>
> The argument will be NULL, unless GOMP_REQUIRES_REVERSE_OFFLOAD is
> requested and devices not supported it, are filtered out.
> (Up to and including this commit, no non-host device claims such
> support and the caller currently always passes NULL.)
>
> libgomp/ChangeLog:
>
> * libgomp-plugin.h (GOMP_OFFLOAD_load_image): Add
> 'uint64_t **rev_fn_table' argument.
> * oacc-host.c (host_load_image): Likewise.
> * plugin/plugin-gcn.c (GOMP_OFFLOAD_load_image): Likewise;
> currently unused.
> * plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Likewise.
> * target.c (gomp_load_image_to_device): Update call but pass
> NULL for now.
>
> liboffloadmic/ChangeLog:
>
> * plugin/libgomp-plugin-intelmic.cpp (GOMP_OFFLOAD_load_image):
> Add (unused) uint64_t **rev_fn_table argument.
Ok, thanks.
Jakub
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Patch][2/3] GCN: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup
2022-08-25 15:38 ` [Patch][2/3] GCN: libgomp+mkoffload.cc: " Tobias Burnus
2022-08-25 15:38 ` Tobias Burnus
@ 2022-09-09 15:31 ` Jakub Jelinek
1 sibling, 0 replies; 16+ messages in thread
From: Jakub Jelinek @ 2022-09-09 15:31 UTC (permalink / raw)
To: Tobias Burnus; +Cc: gcc-patches, Andrew Stubbs
On Thu, Aug 25, 2022 at 05:38:58PM +0200, Tobias Burnus wrote:
> On 25.08.22 16:54, Tobias Burnus wrote:
>
> The attached patch prepare for reverse-offload device->host
> function-address lookup by requesting (if needed) the on-device address.
>
>
> This patch adds the actual implementation for GCN. A variant would be
> to only generate .offload_func_table inside mkoffload when
> OMP_REQUIRES_REVERSE_OFFLOAD has been requested.
>
> This is currently effectively a no op as with [1/3] patch, always NULL
> is passed and as GOMP_OFFLOAD_get_num_devices returns <= 0 as soon as
> 'omp requires reverse_offload' has been specified.
>
> OK for mainline?
>
> Tobias
>
>
> -----------------
> Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
> GCN: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup
>
> Add support to GCN for reverse lookup of function name to prepare for
> 'omp target device(ancestor:1)'.
>
> gcc/ChangeLog:
>
> * config/gcn/mkoffload.cc (process_asm): Create .offload_func_table,
> similar to pre-existing .offload_var_table.
>
> libgomp/ChangeLog:
>
> * plugin/plugin-gcn.c (GOMP_OFFLOAD_load_image): Read
> .offload_func_table to populate rev_fn_table when requested.
Ok.
Jakub
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Patch][2/3][v2] nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup
2022-08-29 18:43 ` [Patch][2/3][v2] " Tobias Burnus
2022-08-29 18:43 ` Tobias Burnus
@ 2022-09-09 15:36 ` Jakub Jelinek
2022-09-12 12:02 ` [Patch] nvptx/mkoffload.cc: Warn instead of error when reverse offload is not possible (was: Re: [Patch][2/3][v2] nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup) Tobias Burnus
2022-09-23 15:40 ` [og12] Come up with {,UN}LIKELY macros (was: [Patch][2/3][v2] nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup) Thomas Schwinge
2 siblings, 1 reply; 16+ messages in thread
From: Jakub Jelinek @ 2022-09-09 15:36 UTC (permalink / raw)
To: Tobias Burnus; +Cc: gcc-patches, Tom de Vries
On Mon, Aug 29, 2022 at 08:43:26PM +0200, Tobias Burnus wrote:
> Slightly revised version, fixing some issues in mkoffload.cc. Otherwise, the same applies:
>
> On 25.08.22 19:30, Tobias Burnus wrote:
> On 25.08.22 16:54, Tobias Burnus wrote:
>
> The attached patch prepare for reverse-offload device->host
> function-address lookup by requesting (if needed) the on-device address.
>
>
> This patch adds the actual implementation for NVPTX.
>
> Having array[] = {fn1,fn2}; works with nvptx only since sm_35; hence,
> if there is a reverse_offload and sm_30 is used, there will be a compile-time
> error.
Wonder if we instead shouldn't arrange for silent request for no PTX
offloading (or one with warning?) if sm_30 and reverse offload is needed.
Error might be too harsh, the program can still offload to GCN or host
just fine...
Otherwise LGTM.
Jakub
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Patch] nvptx/mkoffload.cc: Warn instead of error when reverse offload is not possible (was: Re: [Patch][2/3][v2] nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup)
2022-09-09 15:36 ` Jakub Jelinek
@ 2022-09-12 12:02 ` Tobias Burnus
2022-09-12 12:10 ` Jakub Jelinek
2022-10-17 11:59 ` Fix nvptx-specific '-foffload-options' syntax in 'libgomp.c/reverse-offload-sm30.c' (was: [Patch] nvptx/mkoffload.cc: Warn instead of error when reverse offload is not possible) Thomas Schwinge
0 siblings, 2 replies; 16+ messages in thread
From: Tobias Burnus @ 2022-09-12 12:02 UTC (permalink / raw)
To: Jakub Jelinek; +Cc: gcc-patches, Tom de Vries
[-- Attachment #1.1: Type: text/plain, Size: 1784 bytes --]
On 09.09.22 17:36, Jakub Jelinek wrote:
Wonder if we instead shouldn't arrange for silent request for no PTX
offloading (or one with warning?) if sm_30 and reverse offload is needed.
Error might be too harsh, the program can still offload to GCN or host
just fine...
Attached patch now implements the warning. I think silently failing is
not the proper solution. It is too confusing and without telling the
user, they may not notice this issue.
I also changed the "progname" variable; it seems as if the only user is
the diagnostic machinery and it seems to make sense to have:
nvptx mkoffload: warning: 'omp requires reverse_offload' requires at least
'sm_35' for '-misa=' - disabling offload-code generation for this device type
instead of guessing whether "mkoffload" is for the host, gcn or nvptx. I know
that the common way is to use the binary name ("lto1") or ... but I still
think this prefix – which is the tool_name. Makes sense.
(BTW: gcc/config/i386/intelmic-mkoffload.cc uses "mkoffload-intelmic".)
I also changed "-misa=" in the diagnostic to "-march=" as Tom changed the default
from "-misa=" (now an alias) to "-march=" in GCC 12 (+plus added -march-map= in addition).
I also added a testcase for this – and updated the testsuite for the dg-warning.
OK for mainline?
Tobias
PS: As no code for nvptx is generated, there is no "device present bit not used" warning
with GOMP_DEBUG, but as there is a compile-time warning, I guess that's fine.
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
[-- Attachment #2: nvptx-warn-if-sm-too-old-v3.diff --]
[-- Type: text/x-patch, Size: 7573 bytes --]
nvptx/mkoffload.cc: Warn instead of error when reverse offload is not possible
Reverse offload requests at least -misa=sm_35; with this patch, a warning
instead of an error is shown, still permitting reverse offload for all
other configured device types. This is achieved by not calling
GOMP_offload_register_ver (and stopping generating pointless 'static const char'
variables, once known.)
The tool_name as progname changes adds "nvptx " and "gcn " to the
"mkoffload: warning/error:" diagnostic.
gcc/ChangeLog:
* config/nvptx/mkoffload.cc (process): Replace a fatal_error by
a warning + not enabling offloading if -misa=sm_30 prevents
reverse offload.
(main): Use tool_name as progname for diagnostic.
* config/gcn/mkoffload.cc (main): Likewise.
libgomp/ChangeLog:
* libgomp.texi (Offload-Target Specifics: nvptx): Document
that reverse offload requires >= -march=sm_35.
* testsuite/libgomp.c-c++-common/requires-4.c: Build for nvptx
with -misa=sm_35.
* testsuite/libgomp.c-c++-common/requires-5.c: Likewise.
* testsuite/libgomp.c-c++-common/requires-6.c: Likewise.
* testsuite/libgomp.c-c++-common/reverse-offload-1.c: Likewise.
* testsuite/libgomp.fortran/reverse-offload-1.f90: Likewise.
* testsuite/libgomp.c/reverse-offload-sm30.c: New test.
gcc/config/gcn/mkoffload.cc | 2 +-
gcc/config/nvptx/mkoffload.cc | 17 +++++++++++++----
libgomp/libgomp.texi | 3 +++
libgomp/testsuite/libgomp.c-c++-common/requires-4.c | 1 +
libgomp/testsuite/libgomp.c-c++-common/requires-5.c | 1 +
libgomp/testsuite/libgomp.c-c++-common/requires-6.c | 2 ++
.../testsuite/libgomp.c-c++-common/reverse-offload-1.c | 1 +
libgomp/testsuite/libgomp.c/reverse-offload-sm30.c | 15 +++++++++++++++
libgomp/testsuite/libgomp.fortran/reverse-offload-1.f90 | 1 +
9 files changed, 38 insertions(+), 5 deletions(-)
diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc
index 24d327355e3..64037806acf 100644
--- a/gcc/config/gcn/mkoffload.cc
+++ b/gcc/config/gcn/mkoffload.cc
@@ -805,7 +805,7 @@ main (int argc, char **argv)
FILE *cfile = stdout;
const char *outname = 0;
- progname = "mkoffload";
+ progname = tool_name;
diagnostic_initialize (global_dc, 0);
obstack_init (&files_to_cleanup);
diff --git a/gcc/config/nvptx/mkoffload.cc b/gcc/config/nvptx/mkoffload.cc
index 834b2059aac..5f3e07ad066 100644
--- a/gcc/config/nvptx/mkoffload.cc
+++ b/gcc/config/nvptx/mkoffload.cc
@@ -324,9 +324,18 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
{
if (sm_ver && sm_ver[0] == '3' && sm_ver[1] == '0'
&& sm_ver[2] == '\n')
- fatal_error (input_location,
- "%<omp requires reverse_offload%> requires at least "
- "%<sm_35%> for %<-misa=%>");
+ {
+ warning_at (input_location, 0,
+ "%<omp requires reverse_offload%> requires at "
+ "least %<sm_35%> for %<-march=%> - disabling "
+ "offload-code generation for this device type");
+ /* As now an empty file is compiled and there is no call to
+ GOMP_offload_register_ver, this device type is effectively
+ disabled. */
+ fflush (out);
+ ftruncate (fileno (out), 0);
+ return;
+ }
sm_ver2 = sm_ver;
version2 = version;
}
@@ -526,7 +535,7 @@ main (int argc, char **argv)
FILE *out = stdout;
const char *outname = 0;
- progname = "mkoffload";
+ progname = tool_name;
diagnostic_initialize (global_dc, 0);
if (atexit (mkoffload_cleanup) != 0)
diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 4eaad4348bb..1f402d6df79 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -4386,6 +4386,9 @@ The implementation remark:
@item I/O within OpenMP target regions and OpenACC parallel/kernels is supported
using the C library @code{printf} functions and the Fortran
@code{print}/@code{write} statements.
+@item Compilation OpenMP code that contains @code{requires reverse_offload}
+ requires at least @code{-march=sm_35}, compiling for @code{-march=sm_30}
+ is not supported.
@end itemize
diff --git a/libgomp/testsuite/libgomp.c-c++-common/requires-4.c b/libgomp/testsuite/libgomp.c-c++-common/requires-4.c
index 6ed5a5f647a..5883eff0d93 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/requires-4.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/requires-4.c
@@ -1,4 +1,5 @@
/* { dg-additional-options "-flto" } */
+/* { dg-additional-options "-foffload-options=nvptx-none=-misa=sm_35" { target { offload_target_nvptx } } } */
/* { dg-additional-sources requires-4-aux.c } */
/* Check no diagnostic by device-compiler's or host compiler's lto1.
diff --git a/libgomp/testsuite/libgomp.c-c++-common/requires-5.c b/libgomp/testsuite/libgomp.c-c++-common/requires-5.c
index 7fe0c735d27..d43d78db6fa 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/requires-5.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/requires-5.c
@@ -1,3 +1,4 @@
+/* { dg-additional-options "-foffload-options=nvptx-none=-misa=sm_35" { target { offload_target_nvptx } } } */
/* { dg-additional-sources requires-5-aux.c } */
/* Depending on offload device capabilities, it may print something like the
diff --git a/libgomp/testsuite/libgomp.c-c++-common/requires-6.c b/libgomp/testsuite/libgomp.c-c++-common/requires-6.c
index b00c7459bbc..a25b4d2dedd 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/requires-6.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/requires-6.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-foffload-options=nvptx-none=-misa=sm_35" { target { offload_target_nvptx } } } */
+
#pragma omp requires unified_shared_memory, unified_address, reverse_offload
/* The requires line is not active as there is none of:
diff --git a/libgomp/testsuite/libgomp.c-c++-common/reverse-offload-1.c b/libgomp/testsuite/libgomp.c-c++-common/reverse-offload-1.c
index 976e129f560..52d828caf1c 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/reverse-offload-1.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/reverse-offload-1.c
@@ -1,4 +1,5 @@
/* { dg-do run } */
+/* { dg-additional-options "-foffload-options=nvptx-none=-misa=sm_35" { target { offload_target_nvptx } } } */
/* { dg-additional-sources reverse-offload-1-aux.c } */
/* Check that reverse offload works in particular:
diff --git a/libgomp/testsuite/libgomp.c/reverse-offload-sm30.c b/libgomp/testsuite/libgomp.c/reverse-offload-sm30.c
new file mode 100644
index 00000000000..14aed0132b7
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/reverse-offload-sm30.c
@@ -0,0 +1,15 @@
+/* { dg-do link { target { offload_target_nvptx } } } */
+/* { dg-additional-options "-foffload-options=nvptx-none=-march=sm_30 -foffload=-mptx=_" } */
+
+#pragma omp requires reverse_offload
+
+int
+main ()
+{
+ #pragma omp target
+ {
+ }
+ return 0;
+}
+
+/* { dg-warning "'omp requires reverse_offload' requires at least 'sm_35' for '-march=' - disabling offload-code generation for this device type" "" { target *-*-* } 0 } */
diff --git a/libgomp/testsuite/libgomp.fortran/reverse-offload-1.f90 b/libgomp/testsuite/libgomp.fortran/reverse-offload-1.f90
index 7cfb8b6552e..de68011f8f7 100644
--- a/libgomp/testsuite/libgomp.fortran/reverse-offload-1.f90
+++ b/libgomp/testsuite/libgomp.fortran/reverse-offload-1.f90
@@ -1,4 +1,5 @@
! { dg-do run }
+! { dg-additional-options "-foffload-options=nvptx-none=-misa=sm_35" { target { offload_target_nvptx } } }
! { dg-additional-sources reverse-offload-1-aux.f90 }
! Check that reverse offload works in particular:
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Patch] nvptx/mkoffload.cc: Warn instead of error when reverse offload is not possible (was: Re: [Patch][2/3][v2] nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup)
2022-09-12 12:02 ` [Patch] nvptx/mkoffload.cc: Warn instead of error when reverse offload is not possible (was: Re: [Patch][2/3][v2] nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup) Tobias Burnus
@ 2022-09-12 12:10 ` Jakub Jelinek
2022-10-17 11:59 ` Fix nvptx-specific '-foffload-options' syntax in 'libgomp.c/reverse-offload-sm30.c' (was: [Patch] nvptx/mkoffload.cc: Warn instead of error when reverse offload is not possible) Thomas Schwinge
1 sibling, 0 replies; 16+ messages in thread
From: Jakub Jelinek @ 2022-09-12 12:10 UTC (permalink / raw)
To: Tobias Burnus; +Cc: gcc-patches, Tom de Vries
On Mon, Sep 12, 2022 at 02:02:16PM +0200, Tobias Burnus wrote:
> + {
> + warning_at (input_location, 0,
> + "%<omp requires reverse_offload%> requires at "
> + "least %<sm_35%> for %<-march=%> - disabling "
> + "offload-code generation for this device type");
I wonder whether it shouldn't talk about
-foffload-options=nvptx-none=-march=
instead of just -march=.
Otherwise LGTM.
Jakub
^ permalink raw reply [flat|nested] 16+ messages in thread
* [og12] Come up with {,UN}LIKELY macros (was: [Patch][2/3][v2] nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup)
2022-08-29 18:43 ` [Patch][2/3][v2] " Tobias Burnus
2022-08-29 18:43 ` Tobias Burnus
2022-09-09 15:36 ` Jakub Jelinek
@ 2022-09-23 15:40 ` Thomas Schwinge
2 siblings, 0 replies; 16+ messages in thread
From: Thomas Schwinge @ 2022-09-23 15:40 UTC (permalink / raw)
To: Tobias Burnus, gcc-patches
[-- Attachment #1: Type: text/plain, Size: 2160 bytes --]
Hi!
Since the 2022-09-12 backport of this:
On 2022-08-29T20:43:26+0200, Tobias Burnus <tobias@codesourcery.com> wrote:
> nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup
... to og12 in commit 2b6ad53fd76c7bb9605be417d137a7d9a18f2117, the og12
branch didn't build anymore:
[...]/gcc/config/nvptx/mkoffload.cc: In function 'void process(FILE*, FILE*, uint32_t)':
[...]/gcc/config/nvptx/mkoffload.cc:284:59: error: 'UNLIKELY' was not declared in this scope
if (UNLIKELY (startswith (input + i, ".target sm_")))
^
[...]/gcc/config/nvptx/mkoffload.cc:289:57: error: 'UNLIKELY' was not declared in this scope
if (UNLIKELY (startswith (input + i, ".version ")))
^
make[2]: *** [[...]/gcc/config/nvptx/t-nvptx:8: mkoffload.o] Error 1
> --- a/gcc/config/nvptx/mkoffload.cc
> +++ b/gcc/config/nvptx/mkoffload.cc
> @@ -261,6 +281,16 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
> case '\n':
> fprintf (out, "\\n\"\n\t\"");
> /* Look for mappings on subsequent lines. */
> + if (UNLIKELY (startswith (input + i, ".target sm_")))
> + {
> + sm_ver = input + i + strlen (".target sm_");
> + continue;
> + }
> + if (UNLIKELY (startswith (input + i, ".version ")))
> + {
> + version = input + i + strlen (".version ");
> + continue;
> + }
To fix this, I've pushed a (very much reduced) partial cherry-pick of
commit r13-171-g22d9c8802add09a93308319fc37dd3a0f1125393
"Come up with {,UN}LIKELY macros" to og12 branch in
commit 44b77201a5431450f608b4538fefb1319127de13, see attached.
Grüße
Thomas
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Come-up-with-UN-LIKELY-macros.og12.patch --]
[-- Type: text/x-diff, Size: 1429 bytes --]
From 44b77201a5431450f608b4538fefb1319127de13 Mon Sep 17 00:00:00 2001
From: Martin Liska <mliska@suse.cz>
Date: Thu, 3 Feb 2022 10:58:18 +0100
Subject: [PATCH] Come up with {,UN}LIKELY macros.
gcc/ChangeLog:
* system.h (LIKELY): Define.
(UNLIKELY): Likewise.
(cherry picked from commit 22d9c8802add09a93308319fc37dd3a0f1125393, partial)
---
gcc/ChangeLog.omp | 8 ++++++++
gcc/system.h | 3 +++
2 files changed, 11 insertions(+)
diff --git a/gcc/ChangeLog.omp b/gcc/ChangeLog.omp
index 4f80bcbd356..30c3abfc15b 100644
--- a/gcc/ChangeLog.omp
+++ b/gcc/ChangeLog.omp
@@ -1,3 +1,11 @@
+2022-09-23 Thomas Schwinge <thomas@codesourcery.com>
+
+ Backport from master branch:
+ 2022-05-09 Martin Liska <mliska@suse.cz>
+
+ * system.h (LIKELY): Define.
+ (UNLIKELY): Likewise.
+
2022-09-12 Tobias Burnus <tobias@codesourcery.com>
Backport from mainline:
diff --git a/gcc/system.h b/gcc/system.h
index e10c34f70ec..6b6868d0bbf 100644
--- a/gcc/system.h
+++ b/gcc/system.h
@@ -736,6 +736,9 @@ extern int vsnprintf (char *, size_t, const char *, va_list);
#define __builtin_expect(a, b) (a)
#endif
+#define LIKELY(x) (__builtin_expect ((x), 1))
+#define UNLIKELY(x) (__builtin_expect ((x), 0))
+
/* Some of the headers included by <memory> can use "abort" within a
namespace, e.g. "_VSTD::abort();", which fails after we use the
preprocessor to redefine "abort" as "fancy_abort" below. */
--
2.35.1
^ permalink raw reply [flat|nested] 16+ messages in thread
* Fix nvptx-specific '-foffload-options' syntax in 'libgomp.c/reverse-offload-sm30.c' (was: [Patch] nvptx/mkoffload.cc: Warn instead of error when reverse offload is not possible)
2022-09-12 12:02 ` [Patch] nvptx/mkoffload.cc: Warn instead of error when reverse offload is not possible (was: Re: [Patch][2/3][v2] nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup) Tobias Burnus
2022-09-12 12:10 ` Jakub Jelinek
@ 2022-10-17 11:59 ` Thomas Schwinge
1 sibling, 0 replies; 16+ messages in thread
From: Thomas Schwinge @ 2022-10-17 11:59 UTC (permalink / raw)
To: Tobias Burnus, gcc-patches; +Cc: Jakub Jelinek
[-- Attachment #1: Type: text/plain, Size: 1351 bytes --]
Hi!
On 2022-09-12T14:02:16+0200, Tobias Burnus <tobias@codesourcery.com> wrote:
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.c/reverse-offload-sm30.c
> @@ -0,0 +1,15 @@
> +/* { dg-do link { target { offload_target_nvptx } } } */
> +/* { dg-additional-options "-foffload-options=nvptx-none=-march=sm_30 -foffload=-mptx=_" } */
Pushed to master branch
commit b61796663ba1fe8fb83203829398f3f89ec212b7
"Fix nvptx-specific '-foffload-options' syntax in
'libgomp.c/reverse-offload-sm30.c'", see attached.
Cherry-picked pushed to devel/omp/gcc-12 branch in
commit f36ce95ad928578aa6739f61480e6c8fbaf2248e
"Fix nvptx-specific '-foffload-options' syntax in
'libgomp.c/reverse-offload-sm30.c'", see attached.
Grüße
Thomas
> +
> +#pragma omp requires reverse_offload
> +
> +int
> +main ()
> +{
> + #pragma omp target
> + {
> + }
> + return 0;
> +}
> +
> +/* { dg-warning "'omp requires reverse_offload' requires at least 'sm_35' for '-march=' - disabling offload-code generation for this device type" "" { target *-*-* } 0 } */
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Fix-nvptx-specific-foffload-options-syntax-in-libgom.patch --]
[-- Type: text/x-diff, Size: 1327 bytes --]
From b61796663ba1fe8fb83203829398f3f89ec212b7 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Fri, 23 Sep 2022 11:29:50 +0200
Subject: [PATCH] Fix nvptx-specific '-foffload-options' syntax in
'libgomp.c/reverse-offload-sm30.c'
That is, '-mptx=_' is only valid in '-foffload-options=nvptx-none', too.
Fix test case added in recent
commit r13-2625-g6b43f556f392a7165582aca36a19fe7389d995b2 "nvptx/mkoffload.cc:
Warn instead of error when reverse offload is not possible".
libgomp/
* testsuite/libgomp.c/reverse-offload-sm30.c: Fix nvptx-specific
'-foffload-options' syntax.
---
libgomp/testsuite/libgomp.c/reverse-offload-sm30.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/libgomp/testsuite/libgomp.c/reverse-offload-sm30.c b/libgomp/testsuite/libgomp.c/reverse-offload-sm30.c
index fbfeae1fd41..7f10fd4ded9 100644
--- a/libgomp/testsuite/libgomp.c/reverse-offload-sm30.c
+++ b/libgomp/testsuite/libgomp.c/reverse-offload-sm30.c
@@ -1,5 +1,5 @@
/* { dg-do link { target { offload_target_nvptx } } } */
-/* { dg-additional-options "-foffload-options=nvptx-none=-march=sm_30 -foffload=-mptx=_" } */
+/* { dg-additional-options "-foffload-options=nvptx-none=-march=sm_30 -foffload-options=nvptx-none=-mptx=_" } */
#pragma omp requires reverse_offload
--
2.35.1
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #3: 0001-Fix-nvptx-specific-foffload-options-syntax-in-l.og12.patch --]
[-- Type: text/x-diff, Size: 2001 bytes --]
From f36ce95ad928578aa6739f61480e6c8fbaf2248e Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Fri, 23 Sep 2022 11:29:50 +0200
Subject: [PATCH] Fix nvptx-specific '-foffload-options' syntax in
'libgomp.c/reverse-offload-sm30.c'
That is, '-mptx=_' is only valid in '-foffload-options=nvptx-none', too.
Fix test case added in recent
commit r13-2625-g6b43f556f392a7165582aca36a19fe7389d995b2 "nvptx/mkoffload.cc:
Warn instead of error when reverse offload is not possible".
libgomp/
* testsuite/libgomp.c/reverse-offload-sm30.c: Fix nvptx-specific
'-foffload-options' syntax.
(cherry picked from commit b61796663ba1fe8fb83203829398f3f89ec212b7)
---
libgomp/ChangeLog.omp | 8 ++++++++
libgomp/testsuite/libgomp.c/reverse-offload-sm30.c | 2 +-
2 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/libgomp/ChangeLog.omp b/libgomp/ChangeLog.omp
index cb3541be378..048314eb1be 100644
--- a/libgomp/ChangeLog.omp
+++ b/libgomp/ChangeLog.omp
@@ -1,3 +1,11 @@
+2022-10-17 Thomas Schwinge <thomas@codesourcery.com>
+
+ Backported from master:
+ 2022-10-17 Thomas Schwinge <thomas@codesourcery.com>
+
+ * testsuite/libgomp.c/reverse-offload-sm30.c: Fix nvptx-specific
+ '-foffload-options' syntax.
+
2022-10-14 Julian Brown <julian@codesourcery.com>
* testsuite/libgomp.oacc-fortran/declare-1.f90: Adjust scan output.
diff --git a/libgomp/testsuite/libgomp.c/reverse-offload-sm30.c b/libgomp/testsuite/libgomp.c/reverse-offload-sm30.c
index fbfeae1fd41..7f10fd4ded9 100644
--- a/libgomp/testsuite/libgomp.c/reverse-offload-sm30.c
+++ b/libgomp/testsuite/libgomp.c/reverse-offload-sm30.c
@@ -1,5 +1,5 @@
/* { dg-do link { target { offload_target_nvptx } } } */
-/* { dg-additional-options "-foffload-options=nvptx-none=-march=sm_30 -foffload=-mptx=_" } */
+/* { dg-additional-options "-foffload-options=nvptx-none=-march=sm_30 -foffload-options=nvptx-none=-mptx=_" } */
#pragma omp requires reverse_offload
--
2.35.1
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PUSHED] Avoid SIGSEGV in nvptx 'mkoffload' for voluminous PTX code
2022-08-25 17:30 ` [Patch][2/3] nvptx: " Tobias Burnus
2022-08-25 17:30 ` Tobias Burnus
2022-08-29 18:43 ` [Patch][2/3][v2] " Tobias Burnus
@ 2025-06-04 17:22 ` Thomas Schwinge
2 siblings, 0 replies; 16+ messages in thread
From: Thomas Schwinge @ 2025-06-04 17:22 UTC (permalink / raw)
To: gcc-patches; +Cc: tburnus
In commit 50be486dff4ea2676ed022e9524ef190b92ae2b1
"nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup", some
additional tracking of the PTX code was added, and this assumes that
potentially every single character of PTX code needs to be tracked as a new
chunk of PTX code. That's problematic if we're dealing with voluminous PTX
code (for example, non-trivial C++ code), and the 'file_idx' 'alloca'tion then
causes stack overflow. For example:
FAIL: libgomp.c++/target-std__valarray-1.C (test for excess errors)
UNRESOLVED: libgomp.c++/target-std__valarray-1.C compilation failed to produce executable
lto-wrapper: fatal error: [...]/build-gcc/gcc//accel/nvptx-none/mkoffload terminated with signal 11 [Segmentation fault], core dumped
gcc/
* config/nvptx/mkoffload.cc (process): Use an 'auto_vec' for
'file_idx'.
---
gcc/config/nvptx/mkoffload.cc | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/gcc/config/nvptx/mkoffload.cc b/gcc/config/nvptx/mkoffload.cc
index e7ec0ef4f6a..bb3f0fcee6b 100644
--- a/gcc/config/nvptx/mkoffload.cc
+++ b/gcc/config/nvptx/mkoffload.cc
@@ -260,8 +260,10 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
unsigned ix;
const char *sm_ver = NULL, *version = NULL;
const char *sm_ver2 = NULL, *version2 = NULL;
- size_t file_cnt = 0;
- size_t *file_idx = XALLOCAVEC (size_t, len);
+ /* To reduce the number of reallocations for 'file_idx', guess 'file_cnt'
+ (very roughly...), based on 'len'. */
+ const size_t file_cnt_guessed = 13 + len / 27720;
+ auto_vec<size_t> file_idx (file_cnt_guessed);
fprintf (out, "#include <stdint.h>\n\n");
@@ -269,9 +271,10 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
terminated by a NUL. */
for (size_t i = 0; i != len;)
{
+ file_idx.safe_push (i);
+
char c;
bool output_fn_ptr = false;
- file_idx[file_cnt++] = i;
fprintf (out, "static const char ptx_code_%u[] =\n\t\"", obj_count++);
while ((c = input[i++]))
@@ -349,6 +352,9 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
}
}
+ const size_t file_cnt = file_idx.length ();
+ gcc_checking_assert (file_cnt == obj_count);
+
/* Create function-pointer array, required for reverse
offload function-pointer lookup. */
--
2.34.1
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2025-06-04 17:23 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-25 14:54 [Patch][1/3] libgomp: Prepare for reverse offload fn lookup Tobias Burnus
2022-08-25 14:54 ` Tobias Burnus
2022-08-25 15:38 ` [Patch][2/3] GCN: libgomp+mkoffload.cc: " Tobias Burnus
2022-08-25 15:38 ` Tobias Burnus
2022-09-09 15:31 ` Jakub Jelinek
2022-08-25 17:30 ` [Patch][2/3] nvptx: " Tobias Burnus
2022-08-25 17:30 ` Tobias Burnus
2022-08-29 18:43 ` [Patch][2/3][v2] " Tobias Burnus
2022-08-29 18:43 ` Tobias Burnus
2022-09-09 15:36 ` Jakub Jelinek
2022-09-12 12:02 ` [Patch] nvptx/mkoffload.cc: Warn instead of error when reverse offload is not possible (was: Re: [Patch][2/3][v2] nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup) Tobias Burnus
2022-09-12 12:10 ` Jakub Jelinek
2022-10-17 11:59 ` Fix nvptx-specific '-foffload-options' syntax in 'libgomp.c/reverse-offload-sm30.c' (was: [Patch] nvptx/mkoffload.cc: Warn instead of error when reverse offload is not possible) Thomas Schwinge
2022-09-23 15:40 ` [og12] Come up with {,UN}LIKELY macros (was: [Patch][2/3][v2] nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup) Thomas Schwinge
2025-06-04 17:22 ` [PUSHED] Avoid SIGSEGV in nvptx 'mkoffload' for voluminous PTX code Thomas Schwinge
2022-09-09 15:29 ` [Patch][1/3] libgomp: Prepare for reverse offload fn lookup Jakub Jelinek
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).