* [Patch][1/3] libgomp: Prepare for reverse offload fn lookup @ 2022-08-25 14:54 Tobias Burnus 2022-08-25 14:54 ` Tobias Burnus ` (3 more replies) 0 siblings, 4 replies; 15+ messages in thread From: Tobias Burnus @ 2022-08-25 14:54 UTC (permalink / raw) To: gcc-patches, Jakub Jelinek [-- Attachment #1: Type: text/plain, Size: 981 bytes --] Technically, this patch is stand alone, but conceptually it based on the submitted but not reviewed patch: "[Patch] OpenMP: Support reverse offload (middle end part)" https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598662.html With that patch, for reverse offloads ('omp target device(ancestor:1)'), calls like the following are added: GOMP_target_ext (-2 /* initial device */, omp_fn.1 where 'omp_fn.1' on nonhost devices a stub function just required for looking up the host function pointer via the offload_funcs table. The attached patch prepare for reverse-offload device->host function-address lookup by requesting (if needed) the on-device address. OK for mainline? Tobias ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 [-- Attachment #2: fn-lookup-gen.diff --] [-- Type: text/x-patch, Size: 5324 bytes --] libgomp: Prepare for reverse offload fn lookup Prepare for reverse-offloading function-pointer lookup by passing a rev_fn_table argument to GOMP_OFFLOAD_load_image. The argument will be NULL, unless GOMP_REQUIRES_REVERSE_OFFLOAD is requested and devices not supported it, are filtered out. (Up to and including this commit, no non-host device claims such support and the caller currently always passes NULL.) libgomp/ChangeLog: * libgomp-plugin.h (GOMP_OFFLOAD_load_image): Add 'uint64_t **rev_fn_table' argument. * oacc-host.c (host_load_image): Likewise. * plugin/plugin-gcn.c (GOMP_OFFLOAD_load_image): Likewise; currently unused. * plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Likewise. * target.c (gomp_load_image_to_device): Update call but pass NULL for now. liboffloadmic/ChangeLog: * plugin/libgomp-plugin-intelmic.cpp (GOMP_OFFLOAD_load_image): Add (unused) uint64_t **rev_fn_table argument. libgomp/libgomp-plugin.h | 2 +- libgomp/oacc-host.c | 3 ++- libgomp/plugin/plugin-gcn.c | 7 +++++-- libgomp/plugin/plugin-nvptx.c | 7 +++++-- libgomp/target.c | 2 +- liboffloadmic/plugin/libgomp-plugin-intelmic.cpp | 3 ++- 6 files changed, 16 insertions(+), 8 deletions(-) diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h index ab3ed638475..57309303e66 100644 --- a/libgomp/libgomp-plugin.h +++ b/libgomp/libgomp-plugin.h @@ -130,7 +130,7 @@ extern bool GOMP_OFFLOAD_init_device (int); extern bool GOMP_OFFLOAD_fini_device (int); extern unsigned GOMP_OFFLOAD_version (void); extern int GOMP_OFFLOAD_load_image (int, unsigned, const void *, - struct addr_pair **); + struct addr_pair **, uint64_t **); extern bool GOMP_OFFLOAD_unload_image (int, unsigned, const void *); extern void *GOMP_OFFLOAD_alloc (int, size_t); extern bool GOMP_OFFLOAD_free (int, void *); diff --git a/libgomp/oacc-host.c b/libgomp/oacc-host.c index eb11b9cf16a..4e3971ae1a9 100644 --- a/libgomp/oacc-host.c +++ b/libgomp/oacc-host.c @@ -81,7 +81,8 @@ static int host_load_image (int n __attribute__ ((unused)), unsigned v __attribute__ ((unused)), const void *t __attribute__ ((unused)), - struct addr_pair **r __attribute__ ((unused))) + struct addr_pair **r __attribute__ ((unused)), + uint64_t **f __attribute__ ((unused))) { return 0; } diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c index ea327bf2ca0..363e2950649 100644 --- a/libgomp/plugin/plugin-gcn.c +++ b/libgomp/plugin/plugin-gcn.c @@ -3346,11 +3346,14 @@ GOMP_OFFLOAD_init_device (int n) /* Load GCN object-code module described by struct gcn_image_desc in TARGET_DATA and return references to kernel descriptors in TARGET_TABLE. - If there are any constructors then run them. */ + If there are any constructors then run them. If not NULL, REV_FN_TABLE will + contain the on-device addresses of the functions for reverse offload. To be + freed by the caller. */ int GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data, - struct addr_pair **target_table) + struct addr_pair **target_table, + uint64_t **rev_fn_table __attribute__((unused))) { if (GOMP_VERSION_DEV (version) != GOMP_VERSION_GCN) { diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c index bc63e274cdf..d130665ed19 100644 --- a/libgomp/plugin/plugin-nvptx.c +++ b/libgomp/plugin/plugin-nvptx.c @@ -1266,11 +1266,14 @@ nvptx_set_clocktick (CUmodule module, struct ptx_device *dev) } /* Load the (partial) program described by TARGET_DATA to device - number ORD. Allocate and return TARGET_TABLE. */ + number ORD. Allocate and return TARGET_TABLE. If not NULL, REV_FN_TABLE + will contain the on-device addresses of the functions for reverse offload. + To be freed by the caller. */ int GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data, - struct addr_pair **target_table) + struct addr_pair **target_table, + uint64_t **rev_fn_table __attribute__((unused))) { CUmodule module; const char *const *var_names; diff --git a/libgomp/target.c b/libgomp/target.c index 135db1d88ab..da907a2fb48 100644 --- a/libgomp/target.c +++ b/libgomp/target.c @@ -2137,7 +2137,7 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version, num_target_entries = devicep->load_image_func (devicep->target_id, version, - target_data, &target_table); + target_data, &target_table, NULL); if (num_target_entries != num_funcs + num_vars /* Others (device_num) are included as trailing entries in pair list. */ diff --git a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp index 33bae0650b4..7be27f0459d 100644 --- a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp +++ b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp @@ -349,7 +349,8 @@ GOMP_OFFLOAD_version (void) extern "C" int GOMP_OFFLOAD_load_image (int device, const unsigned version, - const void *target_image, addr_pair **result) + const void *target_image, addr_pair **result, + uint64_t ** /* rev_fn_table */) { TRACE ("(device = %d, target_image = %p)", device, target_image); ^ permalink raw reply [flat|nested] 15+ messages in thread
* [Patch][1/3] libgomp: Prepare for reverse offload fn lookup 2022-08-25 14:54 [Patch][1/3] libgomp: Prepare for reverse offload fn lookup Tobias Burnus @ 2022-08-25 14:54 ` Tobias Burnus 2022-08-25 15:38 ` [Patch][2/3] GCN: libgomp+mkoffload.cc: " Tobias Burnus ` (2 subsequent siblings) 3 siblings, 0 replies; 15+ messages in thread From: Tobias Burnus @ 2022-08-25 14:54 UTC (permalink / raw) To: gcc-patches, Jakub Jelinek [-- Attachment #1.1: Type: text/plain, Size: 981 bytes --] Technically, this patch is stand alone, but conceptually it based on the submitted but not reviewed patch: "[Patch] OpenMP: Support reverse offload (middle end part)" https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598662.html With that patch, for reverse offloads ('omp target device(ancestor:1)'), calls like the following are added: GOMP_target_ext (-2 /* initial device */, omp_fn.1 where 'omp_fn.1' on nonhost devices a stub function just required for looking up the host function pointer via the offload_funcs table. The attached patch prepare for reverse-offload device->host function-address lookup by requesting (if needed) the on-device address. OK for mainline? Tobias ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 [-- Attachment #2: fn-lookup-gen.diff --] [-- Type: text/x-patch, Size: 5324 bytes --] libgomp: Prepare for reverse offload fn lookup Prepare for reverse-offloading function-pointer lookup by passing a rev_fn_table argument to GOMP_OFFLOAD_load_image. The argument will be NULL, unless GOMP_REQUIRES_REVERSE_OFFLOAD is requested and devices not supported it, are filtered out. (Up to and including this commit, no non-host device claims such support and the caller currently always passes NULL.) libgomp/ChangeLog: * libgomp-plugin.h (GOMP_OFFLOAD_load_image): Add 'uint64_t **rev_fn_table' argument. * oacc-host.c (host_load_image): Likewise. * plugin/plugin-gcn.c (GOMP_OFFLOAD_load_image): Likewise; currently unused. * plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Likewise. * target.c (gomp_load_image_to_device): Update call but pass NULL for now. liboffloadmic/ChangeLog: * plugin/libgomp-plugin-intelmic.cpp (GOMP_OFFLOAD_load_image): Add (unused) uint64_t **rev_fn_table argument. libgomp/libgomp-plugin.h | 2 +- libgomp/oacc-host.c | 3 ++- libgomp/plugin/plugin-gcn.c | 7 +++++-- libgomp/plugin/plugin-nvptx.c | 7 +++++-- libgomp/target.c | 2 +- liboffloadmic/plugin/libgomp-plugin-intelmic.cpp | 3 ++- 6 files changed, 16 insertions(+), 8 deletions(-) diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h index ab3ed638475..57309303e66 100644 --- a/libgomp/libgomp-plugin.h +++ b/libgomp/libgomp-plugin.h @@ -130,7 +130,7 @@ extern bool GOMP_OFFLOAD_init_device (int); extern bool GOMP_OFFLOAD_fini_device (int); extern unsigned GOMP_OFFLOAD_version (void); extern int GOMP_OFFLOAD_load_image (int, unsigned, const void *, - struct addr_pair **); + struct addr_pair **, uint64_t **); extern bool GOMP_OFFLOAD_unload_image (int, unsigned, const void *); extern void *GOMP_OFFLOAD_alloc (int, size_t); extern bool GOMP_OFFLOAD_free (int, void *); diff --git a/libgomp/oacc-host.c b/libgomp/oacc-host.c index eb11b9cf16a..4e3971ae1a9 100644 --- a/libgomp/oacc-host.c +++ b/libgomp/oacc-host.c @@ -81,7 +81,8 @@ static int host_load_image (int n __attribute__ ((unused)), unsigned v __attribute__ ((unused)), const void *t __attribute__ ((unused)), - struct addr_pair **r __attribute__ ((unused))) + struct addr_pair **r __attribute__ ((unused)), + uint64_t **f __attribute__ ((unused))) { return 0; } diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c index ea327bf2ca0..363e2950649 100644 --- a/libgomp/plugin/plugin-gcn.c +++ b/libgomp/plugin/plugin-gcn.c @@ -3346,11 +3346,14 @@ GOMP_OFFLOAD_init_device (int n) /* Load GCN object-code module described by struct gcn_image_desc in TARGET_DATA and return references to kernel descriptors in TARGET_TABLE. - If there are any constructors then run them. */ + If there are any constructors then run them. If not NULL, REV_FN_TABLE will + contain the on-device addresses of the functions for reverse offload. To be + freed by the caller. */ int GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data, - struct addr_pair **target_table) + struct addr_pair **target_table, + uint64_t **rev_fn_table __attribute__((unused))) { if (GOMP_VERSION_DEV (version) != GOMP_VERSION_GCN) { diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c index bc63e274cdf..d130665ed19 100644 --- a/libgomp/plugin/plugin-nvptx.c +++ b/libgomp/plugin/plugin-nvptx.c @@ -1266,11 +1266,14 @@ nvptx_set_clocktick (CUmodule module, struct ptx_device *dev) } /* Load the (partial) program described by TARGET_DATA to device - number ORD. Allocate and return TARGET_TABLE. */ + number ORD. Allocate and return TARGET_TABLE. If not NULL, REV_FN_TABLE + will contain the on-device addresses of the functions for reverse offload. + To be freed by the caller. */ int GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data, - struct addr_pair **target_table) + struct addr_pair **target_table, + uint64_t **rev_fn_table __attribute__((unused))) { CUmodule module; const char *const *var_names; diff --git a/libgomp/target.c b/libgomp/target.c index 135db1d88ab..da907a2fb48 100644 --- a/libgomp/target.c +++ b/libgomp/target.c @@ -2137,7 +2137,7 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version, num_target_entries = devicep->load_image_func (devicep->target_id, version, - target_data, &target_table); + target_data, &target_table, NULL); if (num_target_entries != num_funcs + num_vars /* Others (device_num) are included as trailing entries in pair list. */ diff --git a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp index 33bae0650b4..7be27f0459d 100644 --- a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp +++ b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp @@ -349,7 +349,8 @@ GOMP_OFFLOAD_version (void) extern "C" int GOMP_OFFLOAD_load_image (int device, const unsigned version, - const void *target_image, addr_pair **result) + const void *target_image, addr_pair **result, + uint64_t ** /* rev_fn_table */) { TRACE ("(device = %d, target_image = %p)", device, target_image); ^ permalink raw reply [flat|nested] 15+ messages in thread
* [Patch][2/3] GCN: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup 2022-08-25 14:54 [Patch][1/3] libgomp: Prepare for reverse offload fn lookup Tobias Burnus 2022-08-25 14:54 ` Tobias Burnus @ 2022-08-25 15:38 ` Tobias Burnus 2022-08-25 15:38 ` Tobias Burnus 2022-09-09 15:31 ` Jakub Jelinek 2022-08-25 17:30 ` [Patch][2/3] nvptx: " Tobias Burnus 2022-09-09 15:29 ` [Patch][1/3] libgomp: Prepare for reverse offload fn lookup Jakub Jelinek 3 siblings, 2 replies; 15+ messages in thread From: Tobias Burnus @ 2022-08-25 15:38 UTC (permalink / raw) To: gcc-patches, Jakub Jelinek, Andrew Stubbs [-- Attachment #1: Type: text/plain, Size: 864 bytes --] On 25.08.22 16:54, Tobias Burnus wrote: The attached patch prepare for reverse-offload device->host function-address lookup by requesting (if needed) the on-device address. This patch adds the actual implementation for GCN. A variant would be to only generate .offload_func_table inside mkoffload when OMP_REQUIRES_REVERSE_OFFLOAD has been requested. This is currently effectively a no op as with [1/3] patch, always NULL is passed and as GOMP_OFFLOAD_get_num_devices returns <= 0 as soon as 'omp requires reverse_offload' has been specified. OK for mainline? Tobias ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 [-- Attachment #2: fn-lookup-gcn.diff --] [-- Type: text/x-patch, Size: 4673 bytes --] GCN: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup Add support to GCN for reverse lookup of function name to prepare for 'omp target device(ancestor:1)'. gcc/ChangeLog: * config/gcn/mkoffload.cc (process_asm): Create .offload_func_table, similar to pre-existing .offload_var_table. libgomp/ChangeLog: * plugin/plugin-gcn.c (GOMP_OFFLOAD_load_image): Read .offload_func_table to populate rev_fn_table when requested. gcc/config/gcn/mkoffload.cc | 11 ++++++++++- libgomp/plugin/plugin-gcn.c | 26 +++++++++++++++++++++++++- 2 files changed, 35 insertions(+), 2 deletions(-) diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc index 4206448703a..24d327355e3 100644 --- a/gcc/config/gcn/mkoffload.cc +++ b/gcc/config/gcn/mkoffload.cc @@ -537,63 +537,72 @@ process_asm (FILE *in, FILE *out, FILE *cfile) case IN_VARS: { char *varname; unsigned varsize; if (sscanf (buf, " .8byte %ms\n", &varname)) { fputs (buf, out); fgets (buf, sizeof (buf), in); if (!sscanf (buf, " .8byte %u\n", &varsize)) abort (); var_count++; } break; } case IN_FUNCS: { char *funcname; if (sscanf (buf, "\t.8byte\t%ms\n", &funcname)) { + fputs (buf, out); obstack_ptr_grow (&fns_os, funcname); fn_count++; continue; } break; } } char dummy; if (sscanf (buf, " .section .gnu.offload_vars%c", &dummy) > 0) { state = IN_VARS; /* Add a global symbol to allow plugin-gcn.c to locate the table at runtime. It can't use the "offload_var_table.N" emitted by the compiler because a) they're not global, and b) there's one for each input file combined into the binary. */ fputs (buf, out); fputs ("\t.global .offload_var_table\n" "\t.type .offload_var_table, @object\n" ".offload_var_table:\n", out); } else if (sscanf (buf, " .section .gnu.offload_funcs%c", &dummy) > 0) - state = IN_FUNCS; + { + state = IN_FUNCS; + /* Likewise for .gnu.offload_vars; used for reverse offload. */ + fputs (buf, out); + fputs ("\t.global .offload_func_table\n" + "\t.type .offload_func_table, @object\n" + ".offload_func_table:\n", + out); + } else if (sscanf (buf, " .amdgpu_metadata%c", &dummy) > 0) { state = IN_METADATA; regcount.kernel_name = NULL; regcount.sgpr_count = regcount.vgpr_count = -1; } else if (sscanf (buf, " .section %c", &dummy) > 0 || sscanf (buf, " .text%c", &dummy) > 0 || sscanf (buf, " .bss%c", &dummy) > 0 || sscanf (buf, " .data%c", &dummy) > 0 || sscanf (buf, " .ident %c", &dummy) > 0) state = IN_CODE; else if (sscanf (buf, " .end_amdgpu_metadata%c", &dummy) > 0) { state = IN_CODE; gcc_assert (regcount.kernel_name != NULL && regcount.sgpr_count >= 0 && regcount.vgpr_count >= 0); obstack_grow (®counts_os, ®count, sizeof (regcount)); diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c index 363e2950649..00603981866 100644 --- a/libgomp/plugin/plugin-gcn.c +++ b/libgomp/plugin/plugin-gcn.c @@ -3353,7 +3353,7 @@ GOMP_OFFLOAD_init_device (int n) int GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data, struct addr_pair **target_table, - uint64_t **rev_fn_table __attribute__((unused))) + uint64_t **rev_fn_table) { if (GOMP_VERSION_DEV (version) != GOMP_VERSION_GCN) { @@ -3520,6 +3520,30 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data, if (module->fini_array_func) kernel_count--; + if (rev_fn_table != NULL && kernel_count == 0) + *rev_fn_table = NULL; + else if (rev_fn_table != NULL) + { + hsa_status_t status; + hsa_executable_symbol_t var_symbol; + status = hsa_fns.hsa_executable_get_symbol_fn (agent->executable, NULL, + ".offload_func_table", + agent->id, 0, &var_symbol); + if (status != HSA_STATUS_SUCCESS) + hsa_fatal ("Could not find symbol for variable in the code object", + status); + uint64_t fn_table_addr; + status = hsa_fns.hsa_executable_symbol_get_info_fn + (var_symbol, HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_ADDRESS, + &fn_table_addr); + if (status != HSA_STATUS_SUCCESS) + hsa_fatal ("Could not extract a variable from its symbol", status); + *rev_fn_table = GOMP_PLUGIN_malloc (kernel_count * sizeof (uint64_t)); + GOMP_OFFLOAD_dev2host (agent->device_id, *rev_fn_table, + (void*) fn_table_addr, + kernel_count * sizeof (uint64_t)); + } + return kernel_count + var_count + other_count; } ^ permalink raw reply [flat|nested] 15+ messages in thread
* [Patch][2/3] GCN: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup 2022-08-25 15:38 ` [Patch][2/3] GCN: libgomp+mkoffload.cc: " Tobias Burnus @ 2022-08-25 15:38 ` Tobias Burnus 2022-09-09 15:31 ` Jakub Jelinek 1 sibling, 0 replies; 15+ messages in thread From: Tobias Burnus @ 2022-08-25 15:38 UTC (permalink / raw) To: gcc-patches, Jakub Jelinek, Andrew Stubbs [-- Attachment #1.1: Type: text/plain, Size: 864 bytes --] On 25.08.22 16:54, Tobias Burnus wrote: The attached patch prepare for reverse-offload device->host function-address lookup by requesting (if needed) the on-device address. This patch adds the actual implementation for GCN. A variant would be to only generate .offload_func_table inside mkoffload when OMP_REQUIRES_REVERSE_OFFLOAD has been requested. This is currently effectively a no op as with [1/3] patch, always NULL is passed and as GOMP_OFFLOAD_get_num_devices returns <= 0 as soon as 'omp requires reverse_offload' has been specified. OK for mainline? Tobias ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 [-- Attachment #2: fn-lookup-gcn.diff --] [-- Type: text/x-patch, Size: 4673 bytes --] GCN: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup Add support to GCN for reverse lookup of function name to prepare for 'omp target device(ancestor:1)'. gcc/ChangeLog: * config/gcn/mkoffload.cc (process_asm): Create .offload_func_table, similar to pre-existing .offload_var_table. libgomp/ChangeLog: * plugin/plugin-gcn.c (GOMP_OFFLOAD_load_image): Read .offload_func_table to populate rev_fn_table when requested. gcc/config/gcn/mkoffload.cc | 11 ++++++++++- libgomp/plugin/plugin-gcn.c | 26 +++++++++++++++++++++++++- 2 files changed, 35 insertions(+), 2 deletions(-) diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc index 4206448703a..24d327355e3 100644 --- a/gcc/config/gcn/mkoffload.cc +++ b/gcc/config/gcn/mkoffload.cc @@ -537,63 +537,72 @@ process_asm (FILE *in, FILE *out, FILE *cfile) case IN_VARS: { char *varname; unsigned varsize; if (sscanf (buf, " .8byte %ms\n", &varname)) { fputs (buf, out); fgets (buf, sizeof (buf), in); if (!sscanf (buf, " .8byte %u\n", &varsize)) abort (); var_count++; } break; } case IN_FUNCS: { char *funcname; if (sscanf (buf, "\t.8byte\t%ms\n", &funcname)) { + fputs (buf, out); obstack_ptr_grow (&fns_os, funcname); fn_count++; continue; } break; } } char dummy; if (sscanf (buf, " .section .gnu.offload_vars%c", &dummy) > 0) { state = IN_VARS; /* Add a global symbol to allow plugin-gcn.c to locate the table at runtime. It can't use the "offload_var_table.N" emitted by the compiler because a) they're not global, and b) there's one for each input file combined into the binary. */ fputs (buf, out); fputs ("\t.global .offload_var_table\n" "\t.type .offload_var_table, @object\n" ".offload_var_table:\n", out); } else if (sscanf (buf, " .section .gnu.offload_funcs%c", &dummy) > 0) - state = IN_FUNCS; + { + state = IN_FUNCS; + /* Likewise for .gnu.offload_vars; used for reverse offload. */ + fputs (buf, out); + fputs ("\t.global .offload_func_table\n" + "\t.type .offload_func_table, @object\n" + ".offload_func_table:\n", + out); + } else if (sscanf (buf, " .amdgpu_metadata%c", &dummy) > 0) { state = IN_METADATA; regcount.kernel_name = NULL; regcount.sgpr_count = regcount.vgpr_count = -1; } else if (sscanf (buf, " .section %c", &dummy) > 0 || sscanf (buf, " .text%c", &dummy) > 0 || sscanf (buf, " .bss%c", &dummy) > 0 || sscanf (buf, " .data%c", &dummy) > 0 || sscanf (buf, " .ident %c", &dummy) > 0) state = IN_CODE; else if (sscanf (buf, " .end_amdgpu_metadata%c", &dummy) > 0) { state = IN_CODE; gcc_assert (regcount.kernel_name != NULL && regcount.sgpr_count >= 0 && regcount.vgpr_count >= 0); obstack_grow (®counts_os, ®count, sizeof (regcount)); diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c index 363e2950649..00603981866 100644 --- a/libgomp/plugin/plugin-gcn.c +++ b/libgomp/plugin/plugin-gcn.c @@ -3353,7 +3353,7 @@ GOMP_OFFLOAD_init_device (int n) int GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data, struct addr_pair **target_table, - uint64_t **rev_fn_table __attribute__((unused))) + uint64_t **rev_fn_table) { if (GOMP_VERSION_DEV (version) != GOMP_VERSION_GCN) { @@ -3520,6 +3520,30 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data, if (module->fini_array_func) kernel_count--; + if (rev_fn_table != NULL && kernel_count == 0) + *rev_fn_table = NULL; + else if (rev_fn_table != NULL) + { + hsa_status_t status; + hsa_executable_symbol_t var_symbol; + status = hsa_fns.hsa_executable_get_symbol_fn (agent->executable, NULL, + ".offload_func_table", + agent->id, 0, &var_symbol); + if (status != HSA_STATUS_SUCCESS) + hsa_fatal ("Could not find symbol for variable in the code object", + status); + uint64_t fn_table_addr; + status = hsa_fns.hsa_executable_symbol_get_info_fn + (var_symbol, HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_ADDRESS, + &fn_table_addr); + if (status != HSA_STATUS_SUCCESS) + hsa_fatal ("Could not extract a variable from its symbol", status); + *rev_fn_table = GOMP_PLUGIN_malloc (kernel_count * sizeof (uint64_t)); + GOMP_OFFLOAD_dev2host (agent->device_id, *rev_fn_table, + (void*) fn_table_addr, + kernel_count * sizeof (uint64_t)); + } + return kernel_count + var_count + other_count; } ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Patch][2/3] GCN: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup 2022-08-25 15:38 ` [Patch][2/3] GCN: libgomp+mkoffload.cc: " Tobias Burnus 2022-08-25 15:38 ` Tobias Burnus @ 2022-09-09 15:31 ` Jakub Jelinek 1 sibling, 0 replies; 15+ messages in thread From: Jakub Jelinek @ 2022-09-09 15:31 UTC (permalink / raw) To: Tobias Burnus; +Cc: gcc-patches, Andrew Stubbs On Thu, Aug 25, 2022 at 05:38:58PM +0200, Tobias Burnus wrote: > On 25.08.22 16:54, Tobias Burnus wrote: > > The attached patch prepare for reverse-offload device->host > function-address lookup by requesting (if needed) the on-device address. > > > This patch adds the actual implementation for GCN. A variant would be > to only generate .offload_func_table inside mkoffload when > OMP_REQUIRES_REVERSE_OFFLOAD has been requested. > > This is currently effectively a no op as with [1/3] patch, always NULL > is passed and as GOMP_OFFLOAD_get_num_devices returns <= 0 as soon as > 'omp requires reverse_offload' has been specified. > > OK for mainline? > > Tobias > > > ----------------- > Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 > GCN: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup > > Add support to GCN for reverse lookup of function name to prepare for > 'omp target device(ancestor:1)'. > > gcc/ChangeLog: > > * config/gcn/mkoffload.cc (process_asm): Create .offload_func_table, > similar to pre-existing .offload_var_table. > > libgomp/ChangeLog: > > * plugin/plugin-gcn.c (GOMP_OFFLOAD_load_image): Read > .offload_func_table to populate rev_fn_table when requested. Ok. Jakub ^ permalink raw reply [flat|nested] 15+ messages in thread
* [Patch][2/3] nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup 2022-08-25 14:54 [Patch][1/3] libgomp: Prepare for reverse offload fn lookup Tobias Burnus 2022-08-25 14:54 ` Tobias Burnus 2022-08-25 15:38 ` [Patch][2/3] GCN: libgomp+mkoffload.cc: " Tobias Burnus @ 2022-08-25 17:30 ` Tobias Burnus 2022-08-25 17:30 ` Tobias Burnus 2022-08-29 18:43 ` [Patch][2/3][v2] " Tobias Burnus 2022-09-09 15:29 ` [Patch][1/3] libgomp: Prepare for reverse offload fn lookup Jakub Jelinek 3 siblings, 2 replies; 15+ messages in thread From: Tobias Burnus @ 2022-08-25 17:30 UTC (permalink / raw) To: gcc-patches, Jakub Jelinek, Tom de Vries [-- Attachment #1: Type: text/plain, Size: 1192 bytes --] On 25.08.22 16:54, Tobias Burnus wrote: The attached patch prepare for reverse-offload device->host function-address lookup by requesting (if needed) the on-device address. This patch adds the actual implementation for NVPTX. Having array[] = {fn1,fn2}; works with nvptx only since sm_35; hence, if there is a reverse_offload and sm_30 is used, there will be a compile-time error. To avoid incompatibilities, I compile with the same PTX ISA .version and sm_XX version as the (last) file that contains the reverse offload. While it should not matter, some newer CUDA might not support, e.g., sm_35 or do not like a specific ISA version - thus, that seemed to be safer. This is currently effectively a no op as with [1/3] patch, always NULL is passed and as GOMP_OFFLOAD_get_num_devices returns <= 0 as soon as 'omp requires reverse_offload' has been specified. OK for mainline? Tobias ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 [-- Attachment #2: fn-lookup-nvptx.diff --] [-- Type: text/x-patch, Size: 11019 bytes --] nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup Add support to nvptx for reverse lookup of function name to prepare for 'omp target device(ancestor:1)'. gcc/ChangeLog: * config/nvptx/mkoffload.cc (record_id): Strip quotations from function name. (process): For GOMP_REQUIRES_REVERSE_OFFLOAD, check that -march is at least sm_35, create '$offload_func_table' global array and init with reverse-offload function addresses. * config/nvptx/nvptx.cc (write_fn_proto_1, write_fn_proto): New force_public attribute to force .visible. (nvptx_declare_function_name): For "omp target device_ancestor_nohost" attribut, force .visible/TREE_PUBLIC. libgomp/ChangeLog: * plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Read offload function address table '$offload_func_table' if rev_fn_table is not NULL. gcc/config/nvptx/mkoffload.cc | 104 ++++++++++++++++++++++++++++++++++++++++-- gcc/config/nvptx/nvptx.cc | 20 +++++--- libgomp/plugin/plugin-nvptx.c | 19 +++++++- 3 files changed, 131 insertions(+), 12 deletions(-) diff --git a/gcc/config/nvptx/mkoffload.cc b/gcc/config/nvptx/mkoffload.cc index 3eea0a8f138..c496766b1cc 100644 --- a/gcc/config/nvptx/mkoffload.cc +++ b/gcc/config/nvptx/mkoffload.cc @@ -108,12 +108,14 @@ xputenv (const char *string) static void record_id (const char *p1, id_map ***where) { + gcc_assert (p1[0] == '"'); + p1++; const char *end = strchr (p1, '\n'); if (!end) fatal_error (input_location, "malformed ptx file"); id_map *v = XNEW (id_map); - size_t len = end - p1; + size_t len = end - p1 - 1 ; /* remove tailing '"' */ v->ptx_name = XNEWVEC (char, len + 1); memcpy (v->ptx_name, p1, len); v->ptx_name[len] = '\0'; @@ -242,6 +244,10 @@ process (FILE *in, FILE *out, uint32_t omp_requires) id_map const *id; unsigned obj_count = 0; unsigned ix; + const char *sm_ver = NULL, *version = NULL; + const char *sm_ver2 = NULL, *version2 = NULL; + size_t file_cnt = 0; + size_t *file_idx = XALLOCAVEC (size_t, len); fprintf (out, "#include <stdint.h>\n\n"); @@ -250,6 +256,8 @@ process (FILE *in, FILE *out, uint32_t omp_requires) for (size_t i = 0; i != len;) { char c; + bool output_fn_ptr = false; + file_idx[file_cnt++] = i; fprintf (out, "static const char ptx_code_%u[] =\n\t\"", obj_count++); while ((c = input[i++])) @@ -261,6 +269,16 @@ process (FILE *in, FILE *out, uint32_t omp_requires) case '\n': fprintf (out, "\\n\"\n\t\""); /* Look for mappings on subsequent lines. */ + if (UNLIKELY (startswith (input + i, ".target sm_"))) + { + sm_ver = input + i + strlen (".target sm_"); + continue; + } + if (UNLIKELY (startswith (input + i, ".version "))) + { + version = input + i + strlen (".version "); + continue; + } while (startswith (input + i, "//:")) { i += 3; @@ -268,7 +286,10 @@ process (FILE *in, FILE *out, uint32_t omp_requires) if (startswith (input + i, "VAR_MAP ")) record_id (input + i + 8, &vars_tail); else if (startswith (input + i, "FUNC_MAP ")) - record_id (input + i + 9, &funcs_tail); + { + output_fn_ptr = true; + record_id (input + i + 9, &funcs_tail); + } else abort (); /* Skip to next line. */ @@ -286,6 +307,81 @@ process (FILE *in, FILE *out, uint32_t omp_requires) putc (c, out); } fprintf (out, "\";\n\n"); + if (output_fn_ptr + && (omp_requires & GOMP_REQUIRES_REVERSE_OFFLOAD) != 0) + { + if (sm_ver && sm_ver[0] == '3' && sm_ver[1] == '0' + && sm_ver[2] == '\n') + fatal_error (input_location, + "%<omp requires reverse_offload%> requires at least " + "%<sm_35%> for %<-misa=%>"); + sm_ver2 = sm_ver; + version2 = version; + } + } + + /* Create function-pointer array, required for reverse + offload function-pointer lookup. */ + + if (func_ids && (omp_requires & GOMP_REQUIRES_REVERSE_OFFLOAD) != 0) + { + const char needle[] = "// BEGIN GLOBAL FUNCTION DECL: "; + fprintf (out, "static const char ptx_code_%u[] =\n", obj_count++); + fprintf (out, "\t\".version "); + for (size_t i = 0; version2[i] != '\0' && version2[i] != '\n'; i++) + fputc (version2[i], out); + fprintf (out, "\"\n\t\".target sm_"); + for (size_t i = 0; version2[i] != '\0' && sm_ver2[i] != '\n'; i++) + fputc (sm_ver2[i], out); + fprintf (out, "\"\n\t\".file 1 \\\"<dummy>\\\"\"\n"); + + size_t fidx = 0; + for (id = func_ids; id; id = id->next) + { + /* Only 'nohost' functions are needed - use NULL for the rest. + Alternatively, besides searching for 'BEGIN FUNCTION DECL', + checking for '.visible .entry ' + id->ptx_name would be + required. */ + if (!endswith (id->ptx_name, "$nohost")) + continue; + fprintf (out, "\t\".extern "); + const char *p = input + file_idx[fidx]; + while (true) + { + p = strstr (p, needle); + if (!p) + { + fidx++; + if (fidx >= file_cnt) + break; + p = input + file_idx[fidx]; + continue; + } + p += strlen (needle); + if (!startswith (p, id->ptx_name)) + continue; + p += strlen (id->ptx_name); + if (*p != '\n') + continue; + p++; + gcc_assert (startswith (p, ".visible ")); + p += strlen (".visible "); + for (; *p != '\0' && *p != '\n'; p++) + fputc (*p, out); + break; + } + fprintf (out, "\"\n"); + if (fidx == file_cnt) + fatal_error (input_location, + "Cannot find function declaration for %qs", + id->ptx_name); + } + fprintf (out, "\t\".visible .global .align 8 .u64 " + "$offload_func_table[] = {"); + for (comma = "", id = func_ids; id; comma = ",", id = id->next) + fprintf (out, "%s\"\n\t\t\"%s", comma, + endswith (id->ptx_name, "$nohost") ? id->ptx_name : "0"); + fprintf (out, "};\\n\";\n\n"); } /* Dump out array of pointers to ptx object strings. */ @@ -300,7 +396,7 @@ process (FILE *in, FILE *out, uint32_t omp_requires) /* Dump out variable idents. */ fprintf (out, "static const char *const var_mappings[] = {"); for (comma = "", id = var_ids; id; comma = ",", id = id->next) - fprintf (out, "%s\n\t%s", comma, id->ptx_name); + fprintf (out, "%s\n\t\"%s\"", comma, id->ptx_name); fprintf (out, "\n};\n\n"); /* Dump out function idents. */ @@ -309,7 +405,7 @@ process (FILE *in, FILE *out, uint32_t omp_requires) " unsigned short dim[%d];\n" "} func_mappings[] = {\n", GOMP_DIM_MAX); for (comma = "", id = func_ids; id; comma = ",", id = id->next) - fprintf (out, "%s\n\t{%s}", comma, id->ptx_name); + fprintf (out, "%s\n\t{\"%s\"}", comma, id->ptx_name); fprintf (out, "\n};\n\n"); fprintf (out, diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc index e4297e2d6c3..3293c096822 100644 --- a/gcc/config/nvptx/nvptx.cc +++ b/gcc/config/nvptx/nvptx.cc @@ -989,15 +989,15 @@ write_var_marker (FILE *file, bool is_defn, bool globalize, const char *name) static void write_fn_proto_1 (std::stringstream &s, bool is_defn, - const char *name, const_tree decl) + const char *name, const_tree decl, bool force_public) { if (lookup_attribute ("alias", DECL_ATTRIBUTES (decl)) == NULL) - write_fn_marker (s, is_defn, TREE_PUBLIC (decl), name); + write_fn_marker (s, is_defn, TREE_PUBLIC (decl) || force_public, name); /* PTX declaration. */ if (DECL_EXTERNAL (decl)) s << ".extern "; - else if (TREE_PUBLIC (decl)) + else if (TREE_PUBLIC (decl) || force_public) s << (DECL_WEAK (decl) ? ".weak " : ".visible "); s << (write_as_kernel (DECL_ATTRIBUTES (decl)) ? ".entry " : ".func "); @@ -1086,7 +1086,7 @@ write_fn_proto_1 (std::stringstream &s, bool is_defn, static void write_fn_proto (std::stringstream &s, bool is_defn, - const char *name, const_tree decl) + const char *name, const_tree decl, bool force_public=false) { const char *replacement = nvptx_name_replacement (name); char *replaced_dots = NULL; @@ -1103,9 +1103,9 @@ write_fn_proto (std::stringstream &s, bool is_defn, if (is_defn) /* Emit a declaration. The PTX assembler gets upset without it. */ - write_fn_proto_1 (s, false, name, decl); + write_fn_proto_1 (s, false, name, decl, force_public); - write_fn_proto_1 (s, is_defn, name, decl); + write_fn_proto_1 (s, is_defn, name, decl, force_public); if (replaced_dots) XDELETE (replaced_dots); @@ -1481,7 +1481,13 @@ nvptx_declare_function_name (FILE *file, const char *name, const_tree decl) tree fntype = TREE_TYPE (decl); tree result_type = TREE_TYPE (fntype); int argno = 0; + bool force_public = false; + /* For reverse-offload 'nohost' functions: In order to be collectable in + '$offload_func_table', cf. mkoffload.cc, the function has to be visible. */ + if (lookup_attribute ("omp target device_ancestor_nohost", + DECL_ATTRIBUTES (decl))) + force_public = true; if (lookup_attribute ("omp target entrypoint", DECL_ATTRIBUTES (decl)) && !lookup_attribute ("oacc function", DECL_ATTRIBUTES (decl))) { @@ -1493,7 +1499,7 @@ nvptx_declare_function_name (FILE *file, const char *name, const_tree decl) /* We construct the initial part of the function into a string stream, in order to share the prototype writing code. */ std::stringstream s; - write_fn_proto (s, true, name, decl); + write_fn_proto (s, true, name, decl, force_public); s << "{\n"; bool return_in_mem = write_return_type (s, false, result_type); diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c index d130665ed19..ac400fc2a1d 100644 --- a/libgomp/plugin/plugin-nvptx.c +++ b/libgomp/plugin/plugin-nvptx.c @@ -1273,7 +1273,7 @@ nvptx_set_clocktick (CUmodule module, struct ptx_device *dev) int GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data, struct addr_pair **target_table, - uint64_t **rev_fn_table __attribute__((unused))) + uint64_t **rev_fn_table) { CUmodule module; const char *const *var_names; @@ -1376,6 +1376,23 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data, targ_tbl->start = targ_tbl->end = 0; targ_tbl++; + if (rev_fn_table && fn_entries == 0) + *rev_fn_table = NULL; + else if (rev_fn_table) + { + CUdeviceptr var; + size_t bytes; + r = CUDA_CALL_NOCHECK (cuModuleGetGlobal, &var, &bytes, module, + "$offload_func_table"); + if (r != CUDA_SUCCESS) + GOMP_PLUGIN_fatal ("cuModuleGetGlobal error: %s", cuda_error (r)); + assert (bytes == sizeof (uint64_t) * fn_entries); + *rev_fn_table = GOMP_PLUGIN_malloc (sizeof (uint64_t) * fn_entries); + r = CUDA_CALL_NOCHECK (cuMemcpyDtoH, *rev_fn_table, var, bytes); + if (r != CUDA_SUCCESS) + GOMP_PLUGIN_fatal ("cuMemcpyDtoH error: %s", cuda_error (r)); + } + nvptx_set_clocktick (module, dev); return fn_entries + var_entries + other_entries; ^ permalink raw reply [flat|nested] 15+ messages in thread
* [Patch][2/3] nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup 2022-08-25 17:30 ` [Patch][2/3] nvptx: " Tobias Burnus @ 2022-08-25 17:30 ` Tobias Burnus 2022-08-29 18:43 ` [Patch][2/3][v2] " Tobias Burnus 1 sibling, 0 replies; 15+ messages in thread From: Tobias Burnus @ 2022-08-25 17:30 UTC (permalink / raw) To: gcc-patches, Jakub Jelinek, Tom de Vries [-- Attachment #1.1: Type: text/plain, Size: 1192 bytes --] On 25.08.22 16:54, Tobias Burnus wrote: The attached patch prepare for reverse-offload device->host function-address lookup by requesting (if needed) the on-device address. This patch adds the actual implementation for NVPTX. Having array[] = {fn1,fn2}; works with nvptx only since sm_35; hence, if there is a reverse_offload and sm_30 is used, there will be a compile-time error. To avoid incompatibilities, I compile with the same PTX ISA .version and sm_XX version as the (last) file that contains the reverse offload. While it should not matter, some newer CUDA might not support, e.g., sm_35 or do not like a specific ISA version - thus, that seemed to be safer. This is currently effectively a no op as with [1/3] patch, always NULL is passed and as GOMP_OFFLOAD_get_num_devices returns <= 0 as soon as 'omp requires reverse_offload' has been specified. OK for mainline? Tobias ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 [-- Attachment #2: fn-lookup-nvptx.diff --] [-- Type: text/x-patch, Size: 11019 bytes --] nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup Add support to nvptx for reverse lookup of function name to prepare for 'omp target device(ancestor:1)'. gcc/ChangeLog: * config/nvptx/mkoffload.cc (record_id): Strip quotations from function name. (process): For GOMP_REQUIRES_REVERSE_OFFLOAD, check that -march is at least sm_35, create '$offload_func_table' global array and init with reverse-offload function addresses. * config/nvptx/nvptx.cc (write_fn_proto_1, write_fn_proto): New force_public attribute to force .visible. (nvptx_declare_function_name): For "omp target device_ancestor_nohost" attribut, force .visible/TREE_PUBLIC. libgomp/ChangeLog: * plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Read offload function address table '$offload_func_table' if rev_fn_table is not NULL. gcc/config/nvptx/mkoffload.cc | 104 ++++++++++++++++++++++++++++++++++++++++-- gcc/config/nvptx/nvptx.cc | 20 +++++--- libgomp/plugin/plugin-nvptx.c | 19 +++++++- 3 files changed, 131 insertions(+), 12 deletions(-) diff --git a/gcc/config/nvptx/mkoffload.cc b/gcc/config/nvptx/mkoffload.cc index 3eea0a8f138..c496766b1cc 100644 --- a/gcc/config/nvptx/mkoffload.cc +++ b/gcc/config/nvptx/mkoffload.cc @@ -108,12 +108,14 @@ xputenv (const char *string) static void record_id (const char *p1, id_map ***where) { + gcc_assert (p1[0] == '"'); + p1++; const char *end = strchr (p1, '\n'); if (!end) fatal_error (input_location, "malformed ptx file"); id_map *v = XNEW (id_map); - size_t len = end - p1; + size_t len = end - p1 - 1 ; /* remove tailing '"' */ v->ptx_name = XNEWVEC (char, len + 1); memcpy (v->ptx_name, p1, len); v->ptx_name[len] = '\0'; @@ -242,6 +244,10 @@ process (FILE *in, FILE *out, uint32_t omp_requires) id_map const *id; unsigned obj_count = 0; unsigned ix; + const char *sm_ver = NULL, *version = NULL; + const char *sm_ver2 = NULL, *version2 = NULL; + size_t file_cnt = 0; + size_t *file_idx = XALLOCAVEC (size_t, len); fprintf (out, "#include <stdint.h>\n\n"); @@ -250,6 +256,8 @@ process (FILE *in, FILE *out, uint32_t omp_requires) for (size_t i = 0; i != len;) { char c; + bool output_fn_ptr = false; + file_idx[file_cnt++] = i; fprintf (out, "static const char ptx_code_%u[] =\n\t\"", obj_count++); while ((c = input[i++])) @@ -261,6 +269,16 @@ process (FILE *in, FILE *out, uint32_t omp_requires) case '\n': fprintf (out, "\\n\"\n\t\""); /* Look for mappings on subsequent lines. */ + if (UNLIKELY (startswith (input + i, ".target sm_"))) + { + sm_ver = input + i + strlen (".target sm_"); + continue; + } + if (UNLIKELY (startswith (input + i, ".version "))) + { + version = input + i + strlen (".version "); + continue; + } while (startswith (input + i, "//:")) { i += 3; @@ -268,7 +286,10 @@ process (FILE *in, FILE *out, uint32_t omp_requires) if (startswith (input + i, "VAR_MAP ")) record_id (input + i + 8, &vars_tail); else if (startswith (input + i, "FUNC_MAP ")) - record_id (input + i + 9, &funcs_tail); + { + output_fn_ptr = true; + record_id (input + i + 9, &funcs_tail); + } else abort (); /* Skip to next line. */ @@ -286,6 +307,81 @@ process (FILE *in, FILE *out, uint32_t omp_requires) putc (c, out); } fprintf (out, "\";\n\n"); + if (output_fn_ptr + && (omp_requires & GOMP_REQUIRES_REVERSE_OFFLOAD) != 0) + { + if (sm_ver && sm_ver[0] == '3' && sm_ver[1] == '0' + && sm_ver[2] == '\n') + fatal_error (input_location, + "%<omp requires reverse_offload%> requires at least " + "%<sm_35%> for %<-misa=%>"); + sm_ver2 = sm_ver; + version2 = version; + } + } + + /* Create function-pointer array, required for reverse + offload function-pointer lookup. */ + + if (func_ids && (omp_requires & GOMP_REQUIRES_REVERSE_OFFLOAD) != 0) + { + const char needle[] = "// BEGIN GLOBAL FUNCTION DECL: "; + fprintf (out, "static const char ptx_code_%u[] =\n", obj_count++); + fprintf (out, "\t\".version "); + for (size_t i = 0; version2[i] != '\0' && version2[i] != '\n'; i++) + fputc (version2[i], out); + fprintf (out, "\"\n\t\".target sm_"); + for (size_t i = 0; version2[i] != '\0' && sm_ver2[i] != '\n'; i++) + fputc (sm_ver2[i], out); + fprintf (out, "\"\n\t\".file 1 \\\"<dummy>\\\"\"\n"); + + size_t fidx = 0; + for (id = func_ids; id; id = id->next) + { + /* Only 'nohost' functions are needed - use NULL for the rest. + Alternatively, besides searching for 'BEGIN FUNCTION DECL', + checking for '.visible .entry ' + id->ptx_name would be + required. */ + if (!endswith (id->ptx_name, "$nohost")) + continue; + fprintf (out, "\t\".extern "); + const char *p = input + file_idx[fidx]; + while (true) + { + p = strstr (p, needle); + if (!p) + { + fidx++; + if (fidx >= file_cnt) + break; + p = input + file_idx[fidx]; + continue; + } + p += strlen (needle); + if (!startswith (p, id->ptx_name)) + continue; + p += strlen (id->ptx_name); + if (*p != '\n') + continue; + p++; + gcc_assert (startswith (p, ".visible ")); + p += strlen (".visible "); + for (; *p != '\0' && *p != '\n'; p++) + fputc (*p, out); + break; + } + fprintf (out, "\"\n"); + if (fidx == file_cnt) + fatal_error (input_location, + "Cannot find function declaration for %qs", + id->ptx_name); + } + fprintf (out, "\t\".visible .global .align 8 .u64 " + "$offload_func_table[] = {"); + for (comma = "", id = func_ids; id; comma = ",", id = id->next) + fprintf (out, "%s\"\n\t\t\"%s", comma, + endswith (id->ptx_name, "$nohost") ? id->ptx_name : "0"); + fprintf (out, "};\\n\";\n\n"); } /* Dump out array of pointers to ptx object strings. */ @@ -300,7 +396,7 @@ process (FILE *in, FILE *out, uint32_t omp_requires) /* Dump out variable idents. */ fprintf (out, "static const char *const var_mappings[] = {"); for (comma = "", id = var_ids; id; comma = ",", id = id->next) - fprintf (out, "%s\n\t%s", comma, id->ptx_name); + fprintf (out, "%s\n\t\"%s\"", comma, id->ptx_name); fprintf (out, "\n};\n\n"); /* Dump out function idents. */ @@ -309,7 +405,7 @@ process (FILE *in, FILE *out, uint32_t omp_requires) " unsigned short dim[%d];\n" "} func_mappings[] = {\n", GOMP_DIM_MAX); for (comma = "", id = func_ids; id; comma = ",", id = id->next) - fprintf (out, "%s\n\t{%s}", comma, id->ptx_name); + fprintf (out, "%s\n\t{\"%s\"}", comma, id->ptx_name); fprintf (out, "\n};\n\n"); fprintf (out, diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc index e4297e2d6c3..3293c096822 100644 --- a/gcc/config/nvptx/nvptx.cc +++ b/gcc/config/nvptx/nvptx.cc @@ -989,15 +989,15 @@ write_var_marker (FILE *file, bool is_defn, bool globalize, const char *name) static void write_fn_proto_1 (std::stringstream &s, bool is_defn, - const char *name, const_tree decl) + const char *name, const_tree decl, bool force_public) { if (lookup_attribute ("alias", DECL_ATTRIBUTES (decl)) == NULL) - write_fn_marker (s, is_defn, TREE_PUBLIC (decl), name); + write_fn_marker (s, is_defn, TREE_PUBLIC (decl) || force_public, name); /* PTX declaration. */ if (DECL_EXTERNAL (decl)) s << ".extern "; - else if (TREE_PUBLIC (decl)) + else if (TREE_PUBLIC (decl) || force_public) s << (DECL_WEAK (decl) ? ".weak " : ".visible "); s << (write_as_kernel (DECL_ATTRIBUTES (decl)) ? ".entry " : ".func "); @@ -1086,7 +1086,7 @@ write_fn_proto_1 (std::stringstream &s, bool is_defn, static void write_fn_proto (std::stringstream &s, bool is_defn, - const char *name, const_tree decl) + const char *name, const_tree decl, bool force_public=false) { const char *replacement = nvptx_name_replacement (name); char *replaced_dots = NULL; @@ -1103,9 +1103,9 @@ write_fn_proto (std::stringstream &s, bool is_defn, if (is_defn) /* Emit a declaration. The PTX assembler gets upset without it. */ - write_fn_proto_1 (s, false, name, decl); + write_fn_proto_1 (s, false, name, decl, force_public); - write_fn_proto_1 (s, is_defn, name, decl); + write_fn_proto_1 (s, is_defn, name, decl, force_public); if (replaced_dots) XDELETE (replaced_dots); @@ -1481,7 +1481,13 @@ nvptx_declare_function_name (FILE *file, const char *name, const_tree decl) tree fntype = TREE_TYPE (decl); tree result_type = TREE_TYPE (fntype); int argno = 0; + bool force_public = false; + /* For reverse-offload 'nohost' functions: In order to be collectable in + '$offload_func_table', cf. mkoffload.cc, the function has to be visible. */ + if (lookup_attribute ("omp target device_ancestor_nohost", + DECL_ATTRIBUTES (decl))) + force_public = true; if (lookup_attribute ("omp target entrypoint", DECL_ATTRIBUTES (decl)) && !lookup_attribute ("oacc function", DECL_ATTRIBUTES (decl))) { @@ -1493,7 +1499,7 @@ nvptx_declare_function_name (FILE *file, const char *name, const_tree decl) /* We construct the initial part of the function into a string stream, in order to share the prototype writing code. */ std::stringstream s; - write_fn_proto (s, true, name, decl); + write_fn_proto (s, true, name, decl, force_public); s << "{\n"; bool return_in_mem = write_return_type (s, false, result_type); diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c index d130665ed19..ac400fc2a1d 100644 --- a/libgomp/plugin/plugin-nvptx.c +++ b/libgomp/plugin/plugin-nvptx.c @@ -1273,7 +1273,7 @@ nvptx_set_clocktick (CUmodule module, struct ptx_device *dev) int GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data, struct addr_pair **target_table, - uint64_t **rev_fn_table __attribute__((unused))) + uint64_t **rev_fn_table) { CUmodule module; const char *const *var_names; @@ -1376,6 +1376,23 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data, targ_tbl->start = targ_tbl->end = 0; targ_tbl++; + if (rev_fn_table && fn_entries == 0) + *rev_fn_table = NULL; + else if (rev_fn_table) + { + CUdeviceptr var; + size_t bytes; + r = CUDA_CALL_NOCHECK (cuModuleGetGlobal, &var, &bytes, module, + "$offload_func_table"); + if (r != CUDA_SUCCESS) + GOMP_PLUGIN_fatal ("cuModuleGetGlobal error: %s", cuda_error (r)); + assert (bytes == sizeof (uint64_t) * fn_entries); + *rev_fn_table = GOMP_PLUGIN_malloc (sizeof (uint64_t) * fn_entries); + r = CUDA_CALL_NOCHECK (cuMemcpyDtoH, *rev_fn_table, var, bytes); + if (r != CUDA_SUCCESS) + GOMP_PLUGIN_fatal ("cuMemcpyDtoH error: %s", cuda_error (r)); + } + nvptx_set_clocktick (module, dev); return fn_entries + var_entries + other_entries; ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Patch][2/3][v2] nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup 2022-08-25 17:30 ` [Patch][2/3] nvptx: " Tobias Burnus 2022-08-25 17:30 ` Tobias Burnus @ 2022-08-29 18:43 ` Tobias Burnus 2022-08-29 18:43 ` Tobias Burnus ` (2 more replies) 1 sibling, 3 replies; 15+ messages in thread From: Tobias Burnus @ 2022-08-29 18:43 UTC (permalink / raw) To: gcc-patches, Jakub Jelinek, Tom de Vries [-- Attachment #1: Type: text/plain, Size: 1327 bytes --] Slightly revised version, fixing some issues in mkoffload.cc. Otherwise, the same applies: On 25.08.22 19:30, Tobias Burnus wrote: On 25.08.22 16:54, Tobias Burnus wrote: The attached patch prepare for reverse-offload device->host function-address lookup by requesting (if needed) the on-device address. This patch adds the actual implementation for NVPTX. Having array[] = {fn1,fn2}; works with nvptx only since sm_35; hence, if there is a reverse_offload and sm_30 is used, there will be a compile-time error. To avoid incompatibilities, I compile with the same PTX ISA .version and sm_XX version as the (last) file that contains the reverse offload. While it should not matter, some newer CUDA might not support, e.g., sm_35 or do not like a specific ISA version - thus, that seemed to be safer. This is currently effectively a no op as with [1/3] patch, always NULL is passed and as GOMP_OFFLOAD_get_num_devices returns <= 0 as soon as 'omp requires reverse_offload' has been specified. OK for mainline? Tobias ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 [-- Attachment #2: fn-lookup-nvptx-v2.diff --] [-- Type: text/x-patch, Size: 11610 bytes --] nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup Add support to nvptx for reverse lookup of function name to prepare for 'omp target device(ancestor:1)'. gcc/ChangeLog: * config/nvptx/mkoffload.cc (struct id_map): Add 'dim' member. (record_id): Store func name without quotes, store dim separately. (process): For GOMP_REQUIRES_REVERSE_OFFLOAD, check that -march is at least sm_35, create '$offload_func_table' global array and init with reverse-offload function addresses. * config/nvptx/nvptx.cc (write_fn_proto_1, write_fn_proto): New force_public attribute to force .visible. (nvptx_declare_function_name): For "omp target device_ancestor_nohost" attribut, force .visible/TREE_PUBLIC. libgomp/ChangeLog: * plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Read offload function address table '$offload_func_table' if rev_fn_table is not NULL. gcc/config/nvptx/mkoffload.cc | 119 ++++++++++++++++++++++++++++++++++++++++-- gcc/config/nvptx/nvptx.cc | 20 +++++--- libgomp/plugin/plugin-nvptx.c | 19 +++++++- 3 files changed, 146 insertions(+), 12 deletions(-) diff --git a/gcc/config/nvptx/mkoffload.cc b/gcc/config/nvptx/mkoffload.cc index 3eea0a8f138..834b2059aac 100644 --- a/gcc/config/nvptx/mkoffload.cc +++ b/gcc/config/nvptx/mkoffload.cc @@ -47,6 +47,7 @@ struct id_map { id_map *next; char *ptx_name; + char *dim; }; static id_map *func_ids, **funcs_tail = &func_ids; @@ -108,8 +109,11 @@ xputenv (const char *string) static void record_id (const char *p1, id_map ***where) { - const char *end = strchr (p1, '\n'); - if (!end) + gcc_assert (p1[0] == '"'); + p1++; + const char *end = strchr (p1, '"'); + const char *end2 = strchr (p1, '\n'); + if (!end || !end2 || end >= end2) fatal_error (input_location, "malformed ptx file"); id_map *v = XNEW (id_map); @@ -117,6 +121,16 @@ record_id (const char *p1, id_map ***where) v->ptx_name = XNEWVEC (char, len + 1); memcpy (v->ptx_name, p1, len); v->ptx_name[len] = '\0'; + p1 = end + 1; + if (*end != '\n') + { + len = end2 - p1; + v->dim = XNEWVEC (char, len + 1); + memcpy (v->dim, p1, len); + v->dim[len] = '\0'; + } + else + v->dim = NULL; v->next = NULL; id_map **tail = *where; *tail = v; @@ -242,6 +256,10 @@ process (FILE *in, FILE *out, uint32_t omp_requires) id_map const *id; unsigned obj_count = 0; unsigned ix; + const char *sm_ver = NULL, *version = NULL; + const char *sm_ver2 = NULL, *version2 = NULL; + size_t file_cnt = 0; + size_t *file_idx = XALLOCAVEC (size_t, len); fprintf (out, "#include <stdint.h>\n\n"); @@ -250,6 +268,8 @@ process (FILE *in, FILE *out, uint32_t omp_requires) for (size_t i = 0; i != len;) { char c; + bool output_fn_ptr = false; + file_idx[file_cnt++] = i; fprintf (out, "static const char ptx_code_%u[] =\n\t\"", obj_count++); while ((c = input[i++])) @@ -261,6 +281,16 @@ process (FILE *in, FILE *out, uint32_t omp_requires) case '\n': fprintf (out, "\\n\"\n\t\""); /* Look for mappings on subsequent lines. */ + if (UNLIKELY (startswith (input + i, ".target sm_"))) + { + sm_ver = input + i + strlen (".target sm_"); + continue; + } + if (UNLIKELY (startswith (input + i, ".version "))) + { + version = input + i + strlen (".version "); + continue; + } while (startswith (input + i, "//:")) { i += 3; @@ -268,7 +298,10 @@ process (FILE *in, FILE *out, uint32_t omp_requires) if (startswith (input + i, "VAR_MAP ")) record_id (input + i + 8, &vars_tail); else if (startswith (input + i, "FUNC_MAP ")) - record_id (input + i + 9, &funcs_tail); + { + output_fn_ptr = true; + record_id (input + i + 9, &funcs_tail); + } else abort (); /* Skip to next line. */ @@ -286,6 +319,81 @@ process (FILE *in, FILE *out, uint32_t omp_requires) putc (c, out); } fprintf (out, "\";\n\n"); + if (output_fn_ptr + && (omp_requires & GOMP_REQUIRES_REVERSE_OFFLOAD) != 0) + { + if (sm_ver && sm_ver[0] == '3' && sm_ver[1] == '0' + && sm_ver[2] == '\n') + fatal_error (input_location, + "%<omp requires reverse_offload%> requires at least " + "%<sm_35%> for %<-misa=%>"); + sm_ver2 = sm_ver; + version2 = version; + } + } + + /* Create function-pointer array, required for reverse + offload function-pointer lookup. */ + + if (func_ids && (omp_requires & GOMP_REQUIRES_REVERSE_OFFLOAD) != 0) + { + const char needle[] = "// BEGIN GLOBAL FUNCTION DECL: "; + fprintf (out, "static const char ptx_code_%u[] =\n", obj_count++); + fprintf (out, "\t\".version "); + for (size_t i = 0; version2[i] != '\0' && version2[i] != '\n'; i++) + fputc (version2[i], out); + fprintf (out, "\"\n\t\".target sm_"); + for (size_t i = 0; version2[i] != '\0' && sm_ver2[i] != '\n'; i++) + fputc (sm_ver2[i], out); + fprintf (out, "\"\n\t\".file 1 \\\"<dummy>\\\"\"\n"); + + size_t fidx = 0; + for (id = func_ids; id; id = id->next) + { + /* Only 'nohost' functions are needed - use NULL for the rest. + Alternatively, besides searching for 'BEGIN FUNCTION DECL', + checking for '.visible .entry ' + id->ptx_name would be + required. */ + if (!endswith (id->ptx_name, "$nohost")) + continue; + fprintf (out, "\t\".extern "); + const char *p = input + file_idx[fidx]; + while (true) + { + p = strstr (p, needle); + if (!p) + { + fidx++; + if (fidx >= file_cnt) + break; + p = input + file_idx[fidx]; + continue; + } + p += strlen (needle); + if (!startswith (p, id->ptx_name)) + continue; + p += strlen (id->ptx_name); + if (*p != '\n') + continue; + p++; + gcc_assert (startswith (p, ".visible ")); + p += strlen (".visible "); + for (; *p != '\0' && *p != '\n'; p++) + fputc (*p, out); + break; + } + fprintf (out, "\"\n"); + if (fidx == file_cnt) + fatal_error (input_location, + "Cannot find function declaration for %qs", + id->ptx_name); + } + fprintf (out, "\t\".visible .global .align 8 .u64 " + "$offload_func_table[] = {"); + for (comma = "", id = func_ids; id; comma = ",", id = id->next) + fprintf (out, "%s\"\n\t\t\"%s", comma, + endswith (id->ptx_name, "$nohost") ? id->ptx_name : "0"); + fprintf (out, "};\\n\";\n\n"); } /* Dump out array of pointers to ptx object strings. */ @@ -300,7 +408,7 @@ process (FILE *in, FILE *out, uint32_t omp_requires) /* Dump out variable idents. */ fprintf (out, "static const char *const var_mappings[] = {"); for (comma = "", id = var_ids; id; comma = ",", id = id->next) - fprintf (out, "%s\n\t%s", comma, id->ptx_name); + fprintf (out, "%s\n\t\"%s\"", comma, id->ptx_name); fprintf (out, "\n};\n\n"); /* Dump out function idents. */ @@ -309,7 +417,8 @@ process (FILE *in, FILE *out, uint32_t omp_requires) " unsigned short dim[%d];\n" "} func_mappings[] = {\n", GOMP_DIM_MAX); for (comma = "", id = func_ids; id; comma = ",", id = id->next) - fprintf (out, "%s\n\t{%s}", comma, id->ptx_name); + fprintf (out, "%s\n\t{\"%s\"%s}", comma, id->ptx_name, + id->dim ? id->dim : ""); fprintf (out, "\n};\n\n"); fprintf (out, diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc index e4297e2d6c3..3293c096822 100644 --- a/gcc/config/nvptx/nvptx.cc +++ b/gcc/config/nvptx/nvptx.cc @@ -989,15 +989,15 @@ write_var_marker (FILE *file, bool is_defn, bool globalize, const char *name) static void write_fn_proto_1 (std::stringstream &s, bool is_defn, - const char *name, const_tree decl) + const char *name, const_tree decl, bool force_public) { if (lookup_attribute ("alias", DECL_ATTRIBUTES (decl)) == NULL) - write_fn_marker (s, is_defn, TREE_PUBLIC (decl), name); + write_fn_marker (s, is_defn, TREE_PUBLIC (decl) || force_public, name); /* PTX declaration. */ if (DECL_EXTERNAL (decl)) s << ".extern "; - else if (TREE_PUBLIC (decl)) + else if (TREE_PUBLIC (decl) || force_public) s << (DECL_WEAK (decl) ? ".weak " : ".visible "); s << (write_as_kernel (DECL_ATTRIBUTES (decl)) ? ".entry " : ".func "); @@ -1086,7 +1086,7 @@ write_fn_proto_1 (std::stringstream &s, bool is_defn, static void write_fn_proto (std::stringstream &s, bool is_defn, - const char *name, const_tree decl) + const char *name, const_tree decl, bool force_public=false) { const char *replacement = nvptx_name_replacement (name); char *replaced_dots = NULL; @@ -1103,9 +1103,9 @@ write_fn_proto (std::stringstream &s, bool is_defn, if (is_defn) /* Emit a declaration. The PTX assembler gets upset without it. */ - write_fn_proto_1 (s, false, name, decl); + write_fn_proto_1 (s, false, name, decl, force_public); - write_fn_proto_1 (s, is_defn, name, decl); + write_fn_proto_1 (s, is_defn, name, decl, force_public); if (replaced_dots) XDELETE (replaced_dots); @@ -1481,7 +1481,13 @@ nvptx_declare_function_name (FILE *file, const char *name, const_tree decl) tree fntype = TREE_TYPE (decl); tree result_type = TREE_TYPE (fntype); int argno = 0; + bool force_public = false; + /* For reverse-offload 'nohost' functions: In order to be collectable in + '$offload_func_table', cf. mkoffload.cc, the function has to be visible. */ + if (lookup_attribute ("omp target device_ancestor_nohost", + DECL_ATTRIBUTES (decl))) + force_public = true; if (lookup_attribute ("omp target entrypoint", DECL_ATTRIBUTES (decl)) && !lookup_attribute ("oacc function", DECL_ATTRIBUTES (decl))) { @@ -1493,7 +1499,7 @@ nvptx_declare_function_name (FILE *file, const char *name, const_tree decl) /* We construct the initial part of the function into a string stream, in order to share the prototype writing code. */ std::stringstream s; - write_fn_proto (s, true, name, decl); + write_fn_proto (s, true, name, decl, force_public); s << "{\n"; bool return_in_mem = write_return_type (s, false, result_type); diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c index d130665ed19..ac400fc2a1d 100644 --- a/libgomp/plugin/plugin-nvptx.c +++ b/libgomp/plugin/plugin-nvptx.c @@ -1273,7 +1273,7 @@ nvptx_set_clocktick (CUmodule module, struct ptx_device *dev) int GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data, struct addr_pair **target_table, - uint64_t **rev_fn_table __attribute__((unused))) + uint64_t **rev_fn_table) { CUmodule module; const char *const *var_names; @@ -1376,6 +1376,23 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data, targ_tbl->start = targ_tbl->end = 0; targ_tbl++; + if (rev_fn_table && fn_entries == 0) + *rev_fn_table = NULL; + else if (rev_fn_table) + { + CUdeviceptr var; + size_t bytes; + r = CUDA_CALL_NOCHECK (cuModuleGetGlobal, &var, &bytes, module, + "$offload_func_table"); + if (r != CUDA_SUCCESS) + GOMP_PLUGIN_fatal ("cuModuleGetGlobal error: %s", cuda_error (r)); + assert (bytes == sizeof (uint64_t) * fn_entries); + *rev_fn_table = GOMP_PLUGIN_malloc (sizeof (uint64_t) * fn_entries); + r = CUDA_CALL_NOCHECK (cuMemcpyDtoH, *rev_fn_table, var, bytes); + if (r != CUDA_SUCCESS) + GOMP_PLUGIN_fatal ("cuMemcpyDtoH error: %s", cuda_error (r)); + } + nvptx_set_clocktick (module, dev); return fn_entries + var_entries + other_entries; ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Patch][2/3][v2] nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup 2022-08-29 18:43 ` [Patch][2/3][v2] " Tobias Burnus @ 2022-08-29 18:43 ` Tobias Burnus 2022-09-09 15:36 ` Jakub Jelinek 2022-09-23 15:40 ` [og12] Come up with {,UN}LIKELY macros (was: [Patch][2/3][v2] nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup) Thomas Schwinge 2 siblings, 0 replies; 15+ messages in thread From: Tobias Burnus @ 2022-08-29 18:43 UTC (permalink / raw) To: gcc-patches, Jakub Jelinek, Tom de Vries [-- Attachment #1.1: Type: text/plain, Size: 1327 bytes --] Slightly revised version, fixing some issues in mkoffload.cc. Otherwise, the same applies: On 25.08.22 19:30, Tobias Burnus wrote: On 25.08.22 16:54, Tobias Burnus wrote: The attached patch prepare for reverse-offload device->host function-address lookup by requesting (if needed) the on-device address. This patch adds the actual implementation for NVPTX. Having array[] = {fn1,fn2}; works with nvptx only since sm_35; hence, if there is a reverse_offload and sm_30 is used, there will be a compile-time error. To avoid incompatibilities, I compile with the same PTX ISA .version and sm_XX version as the (last) file that contains the reverse offload. While it should not matter, some newer CUDA might not support, e.g., sm_35 or do not like a specific ISA version - thus, that seemed to be safer. This is currently effectively a no op as with [1/3] patch, always NULL is passed and as GOMP_OFFLOAD_get_num_devices returns <= 0 as soon as 'omp requires reverse_offload' has been specified. OK for mainline? Tobias ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 [-- Attachment #2: fn-lookup-nvptx-v2.diff --] [-- Type: text/x-patch, Size: 11610 bytes --] nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup Add support to nvptx for reverse lookup of function name to prepare for 'omp target device(ancestor:1)'. gcc/ChangeLog: * config/nvptx/mkoffload.cc (struct id_map): Add 'dim' member. (record_id): Store func name without quotes, store dim separately. (process): For GOMP_REQUIRES_REVERSE_OFFLOAD, check that -march is at least sm_35, create '$offload_func_table' global array and init with reverse-offload function addresses. * config/nvptx/nvptx.cc (write_fn_proto_1, write_fn_proto): New force_public attribute to force .visible. (nvptx_declare_function_name): For "omp target device_ancestor_nohost" attribut, force .visible/TREE_PUBLIC. libgomp/ChangeLog: * plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Read offload function address table '$offload_func_table' if rev_fn_table is not NULL. gcc/config/nvptx/mkoffload.cc | 119 ++++++++++++++++++++++++++++++++++++++++-- gcc/config/nvptx/nvptx.cc | 20 +++++--- libgomp/plugin/plugin-nvptx.c | 19 +++++++- 3 files changed, 146 insertions(+), 12 deletions(-) diff --git a/gcc/config/nvptx/mkoffload.cc b/gcc/config/nvptx/mkoffload.cc index 3eea0a8f138..834b2059aac 100644 --- a/gcc/config/nvptx/mkoffload.cc +++ b/gcc/config/nvptx/mkoffload.cc @@ -47,6 +47,7 @@ struct id_map { id_map *next; char *ptx_name; + char *dim; }; static id_map *func_ids, **funcs_tail = &func_ids; @@ -108,8 +109,11 @@ xputenv (const char *string) static void record_id (const char *p1, id_map ***where) { - const char *end = strchr (p1, '\n'); - if (!end) + gcc_assert (p1[0] == '"'); + p1++; + const char *end = strchr (p1, '"'); + const char *end2 = strchr (p1, '\n'); + if (!end || !end2 || end >= end2) fatal_error (input_location, "malformed ptx file"); id_map *v = XNEW (id_map); @@ -117,6 +121,16 @@ record_id (const char *p1, id_map ***where) v->ptx_name = XNEWVEC (char, len + 1); memcpy (v->ptx_name, p1, len); v->ptx_name[len] = '\0'; + p1 = end + 1; + if (*end != '\n') + { + len = end2 - p1; + v->dim = XNEWVEC (char, len + 1); + memcpy (v->dim, p1, len); + v->dim[len] = '\0'; + } + else + v->dim = NULL; v->next = NULL; id_map **tail = *where; *tail = v; @@ -242,6 +256,10 @@ process (FILE *in, FILE *out, uint32_t omp_requires) id_map const *id; unsigned obj_count = 0; unsigned ix; + const char *sm_ver = NULL, *version = NULL; + const char *sm_ver2 = NULL, *version2 = NULL; + size_t file_cnt = 0; + size_t *file_idx = XALLOCAVEC (size_t, len); fprintf (out, "#include <stdint.h>\n\n"); @@ -250,6 +268,8 @@ process (FILE *in, FILE *out, uint32_t omp_requires) for (size_t i = 0; i != len;) { char c; + bool output_fn_ptr = false; + file_idx[file_cnt++] = i; fprintf (out, "static const char ptx_code_%u[] =\n\t\"", obj_count++); while ((c = input[i++])) @@ -261,6 +281,16 @@ process (FILE *in, FILE *out, uint32_t omp_requires) case '\n': fprintf (out, "\\n\"\n\t\""); /* Look for mappings on subsequent lines. */ + if (UNLIKELY (startswith (input + i, ".target sm_"))) + { + sm_ver = input + i + strlen (".target sm_"); + continue; + } + if (UNLIKELY (startswith (input + i, ".version "))) + { + version = input + i + strlen (".version "); + continue; + } while (startswith (input + i, "//:")) { i += 3; @@ -268,7 +298,10 @@ process (FILE *in, FILE *out, uint32_t omp_requires) if (startswith (input + i, "VAR_MAP ")) record_id (input + i + 8, &vars_tail); else if (startswith (input + i, "FUNC_MAP ")) - record_id (input + i + 9, &funcs_tail); + { + output_fn_ptr = true; + record_id (input + i + 9, &funcs_tail); + } else abort (); /* Skip to next line. */ @@ -286,6 +319,81 @@ process (FILE *in, FILE *out, uint32_t omp_requires) putc (c, out); } fprintf (out, "\";\n\n"); + if (output_fn_ptr + && (omp_requires & GOMP_REQUIRES_REVERSE_OFFLOAD) != 0) + { + if (sm_ver && sm_ver[0] == '3' && sm_ver[1] == '0' + && sm_ver[2] == '\n') + fatal_error (input_location, + "%<omp requires reverse_offload%> requires at least " + "%<sm_35%> for %<-misa=%>"); + sm_ver2 = sm_ver; + version2 = version; + } + } + + /* Create function-pointer array, required for reverse + offload function-pointer lookup. */ + + if (func_ids && (omp_requires & GOMP_REQUIRES_REVERSE_OFFLOAD) != 0) + { + const char needle[] = "// BEGIN GLOBAL FUNCTION DECL: "; + fprintf (out, "static const char ptx_code_%u[] =\n", obj_count++); + fprintf (out, "\t\".version "); + for (size_t i = 0; version2[i] != '\0' && version2[i] != '\n'; i++) + fputc (version2[i], out); + fprintf (out, "\"\n\t\".target sm_"); + for (size_t i = 0; version2[i] != '\0' && sm_ver2[i] != '\n'; i++) + fputc (sm_ver2[i], out); + fprintf (out, "\"\n\t\".file 1 \\\"<dummy>\\\"\"\n"); + + size_t fidx = 0; + for (id = func_ids; id; id = id->next) + { + /* Only 'nohost' functions are needed - use NULL for the rest. + Alternatively, besides searching for 'BEGIN FUNCTION DECL', + checking for '.visible .entry ' + id->ptx_name would be + required. */ + if (!endswith (id->ptx_name, "$nohost")) + continue; + fprintf (out, "\t\".extern "); + const char *p = input + file_idx[fidx]; + while (true) + { + p = strstr (p, needle); + if (!p) + { + fidx++; + if (fidx >= file_cnt) + break; + p = input + file_idx[fidx]; + continue; + } + p += strlen (needle); + if (!startswith (p, id->ptx_name)) + continue; + p += strlen (id->ptx_name); + if (*p != '\n') + continue; + p++; + gcc_assert (startswith (p, ".visible ")); + p += strlen (".visible "); + for (; *p != '\0' && *p != '\n'; p++) + fputc (*p, out); + break; + } + fprintf (out, "\"\n"); + if (fidx == file_cnt) + fatal_error (input_location, + "Cannot find function declaration for %qs", + id->ptx_name); + } + fprintf (out, "\t\".visible .global .align 8 .u64 " + "$offload_func_table[] = {"); + for (comma = "", id = func_ids; id; comma = ",", id = id->next) + fprintf (out, "%s\"\n\t\t\"%s", comma, + endswith (id->ptx_name, "$nohost") ? id->ptx_name : "0"); + fprintf (out, "};\\n\";\n\n"); } /* Dump out array of pointers to ptx object strings. */ @@ -300,7 +408,7 @@ process (FILE *in, FILE *out, uint32_t omp_requires) /* Dump out variable idents. */ fprintf (out, "static const char *const var_mappings[] = {"); for (comma = "", id = var_ids; id; comma = ",", id = id->next) - fprintf (out, "%s\n\t%s", comma, id->ptx_name); + fprintf (out, "%s\n\t\"%s\"", comma, id->ptx_name); fprintf (out, "\n};\n\n"); /* Dump out function idents. */ @@ -309,7 +417,8 @@ process (FILE *in, FILE *out, uint32_t omp_requires) " unsigned short dim[%d];\n" "} func_mappings[] = {\n", GOMP_DIM_MAX); for (comma = "", id = func_ids; id; comma = ",", id = id->next) - fprintf (out, "%s\n\t{%s}", comma, id->ptx_name); + fprintf (out, "%s\n\t{\"%s\"%s}", comma, id->ptx_name, + id->dim ? id->dim : ""); fprintf (out, "\n};\n\n"); fprintf (out, diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc index e4297e2d6c3..3293c096822 100644 --- a/gcc/config/nvptx/nvptx.cc +++ b/gcc/config/nvptx/nvptx.cc @@ -989,15 +989,15 @@ write_var_marker (FILE *file, bool is_defn, bool globalize, const char *name) static void write_fn_proto_1 (std::stringstream &s, bool is_defn, - const char *name, const_tree decl) + const char *name, const_tree decl, bool force_public) { if (lookup_attribute ("alias", DECL_ATTRIBUTES (decl)) == NULL) - write_fn_marker (s, is_defn, TREE_PUBLIC (decl), name); + write_fn_marker (s, is_defn, TREE_PUBLIC (decl) || force_public, name); /* PTX declaration. */ if (DECL_EXTERNAL (decl)) s << ".extern "; - else if (TREE_PUBLIC (decl)) + else if (TREE_PUBLIC (decl) || force_public) s << (DECL_WEAK (decl) ? ".weak " : ".visible "); s << (write_as_kernel (DECL_ATTRIBUTES (decl)) ? ".entry " : ".func "); @@ -1086,7 +1086,7 @@ write_fn_proto_1 (std::stringstream &s, bool is_defn, static void write_fn_proto (std::stringstream &s, bool is_defn, - const char *name, const_tree decl) + const char *name, const_tree decl, bool force_public=false) { const char *replacement = nvptx_name_replacement (name); char *replaced_dots = NULL; @@ -1103,9 +1103,9 @@ write_fn_proto (std::stringstream &s, bool is_defn, if (is_defn) /* Emit a declaration. The PTX assembler gets upset without it. */ - write_fn_proto_1 (s, false, name, decl); + write_fn_proto_1 (s, false, name, decl, force_public); - write_fn_proto_1 (s, is_defn, name, decl); + write_fn_proto_1 (s, is_defn, name, decl, force_public); if (replaced_dots) XDELETE (replaced_dots); @@ -1481,7 +1481,13 @@ nvptx_declare_function_name (FILE *file, const char *name, const_tree decl) tree fntype = TREE_TYPE (decl); tree result_type = TREE_TYPE (fntype); int argno = 0; + bool force_public = false; + /* For reverse-offload 'nohost' functions: In order to be collectable in + '$offload_func_table', cf. mkoffload.cc, the function has to be visible. */ + if (lookup_attribute ("omp target device_ancestor_nohost", + DECL_ATTRIBUTES (decl))) + force_public = true; if (lookup_attribute ("omp target entrypoint", DECL_ATTRIBUTES (decl)) && !lookup_attribute ("oacc function", DECL_ATTRIBUTES (decl))) { @@ -1493,7 +1499,7 @@ nvptx_declare_function_name (FILE *file, const char *name, const_tree decl) /* We construct the initial part of the function into a string stream, in order to share the prototype writing code. */ std::stringstream s; - write_fn_proto (s, true, name, decl); + write_fn_proto (s, true, name, decl, force_public); s << "{\n"; bool return_in_mem = write_return_type (s, false, result_type); diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c index d130665ed19..ac400fc2a1d 100644 --- a/libgomp/plugin/plugin-nvptx.c +++ b/libgomp/plugin/plugin-nvptx.c @@ -1273,7 +1273,7 @@ nvptx_set_clocktick (CUmodule module, struct ptx_device *dev) int GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data, struct addr_pair **target_table, - uint64_t **rev_fn_table __attribute__((unused))) + uint64_t **rev_fn_table) { CUmodule module; const char *const *var_names; @@ -1376,6 +1376,23 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data, targ_tbl->start = targ_tbl->end = 0; targ_tbl++; + if (rev_fn_table && fn_entries == 0) + *rev_fn_table = NULL; + else if (rev_fn_table) + { + CUdeviceptr var; + size_t bytes; + r = CUDA_CALL_NOCHECK (cuModuleGetGlobal, &var, &bytes, module, + "$offload_func_table"); + if (r != CUDA_SUCCESS) + GOMP_PLUGIN_fatal ("cuModuleGetGlobal error: %s", cuda_error (r)); + assert (bytes == sizeof (uint64_t) * fn_entries); + *rev_fn_table = GOMP_PLUGIN_malloc (sizeof (uint64_t) * fn_entries); + r = CUDA_CALL_NOCHECK (cuMemcpyDtoH, *rev_fn_table, var, bytes); + if (r != CUDA_SUCCESS) + GOMP_PLUGIN_fatal ("cuMemcpyDtoH error: %s", cuda_error (r)); + } + nvptx_set_clocktick (module, dev); return fn_entries + var_entries + other_entries; ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Patch][2/3][v2] nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup 2022-08-29 18:43 ` [Patch][2/3][v2] " Tobias Burnus 2022-08-29 18:43 ` Tobias Burnus @ 2022-09-09 15:36 ` Jakub Jelinek 2022-09-12 12:02 ` [Patch] nvptx/mkoffload.cc: Warn instead of error when reverse offload is not possible (was: Re: [Patch][2/3][v2] nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup) Tobias Burnus 2022-09-23 15:40 ` [og12] Come up with {,UN}LIKELY macros (was: [Patch][2/3][v2] nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup) Thomas Schwinge 2 siblings, 1 reply; 15+ messages in thread From: Jakub Jelinek @ 2022-09-09 15:36 UTC (permalink / raw) To: Tobias Burnus; +Cc: gcc-patches, Tom de Vries On Mon, Aug 29, 2022 at 08:43:26PM +0200, Tobias Burnus wrote: > Slightly revised version, fixing some issues in mkoffload.cc. Otherwise, the same applies: > > On 25.08.22 19:30, Tobias Burnus wrote: > On 25.08.22 16:54, Tobias Burnus wrote: > > The attached patch prepare for reverse-offload device->host > function-address lookup by requesting (if needed) the on-device address. > > > This patch adds the actual implementation for NVPTX. > > Having array[] = {fn1,fn2}; works with nvptx only since sm_35; hence, > if there is a reverse_offload and sm_30 is used, there will be a compile-time > error. Wonder if we instead shouldn't arrange for silent request for no PTX offloading (or one with warning?) if sm_30 and reverse offload is needed. Error might be too harsh, the program can still offload to GCN or host just fine... Otherwise LGTM. Jakub ^ permalink raw reply [flat|nested] 15+ messages in thread
* [Patch] nvptx/mkoffload.cc: Warn instead of error when reverse offload is not possible (was: Re: [Patch][2/3][v2] nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup) 2022-09-09 15:36 ` Jakub Jelinek @ 2022-09-12 12:02 ` Tobias Burnus 2022-09-12 12:10 ` Jakub Jelinek 2022-10-17 11:59 ` Fix nvptx-specific '-foffload-options' syntax in 'libgomp.c/reverse-offload-sm30.c' (was: [Patch] nvptx/mkoffload.cc: Warn instead of error when reverse offload is not possible) Thomas Schwinge 0 siblings, 2 replies; 15+ messages in thread From: Tobias Burnus @ 2022-09-12 12:02 UTC (permalink / raw) To: Jakub Jelinek; +Cc: gcc-patches, Tom de Vries [-- Attachment #1.1: Type: text/plain, Size: 1784 bytes --] On 09.09.22 17:36, Jakub Jelinek wrote: Wonder if we instead shouldn't arrange for silent request for no PTX offloading (or one with warning?) if sm_30 and reverse offload is needed. Error might be too harsh, the program can still offload to GCN or host just fine... Attached patch now implements the warning. I think silently failing is not the proper solution. It is too confusing and without telling the user, they may not notice this issue. I also changed the "progname" variable; it seems as if the only user is the diagnostic machinery and it seems to make sense to have: nvptx mkoffload: warning: 'omp requires reverse_offload' requires at least 'sm_35' for '-misa=' - disabling offload-code generation for this device type instead of guessing whether "mkoffload" is for the host, gcn or nvptx. I know that the common way is to use the binary name ("lto1") or ... but I still think this prefix – which is the tool_name. Makes sense. (BTW: gcc/config/i386/intelmic-mkoffload.cc uses "mkoffload-intelmic".) I also changed "-misa=" in the diagnostic to "-march=" as Tom changed the default from "-misa=" (now an alias) to "-march=" in GCC 12 (+plus added -march-map= in addition). I also added a testcase for this – and updated the testsuite for the dg-warning. OK for mainline? Tobias PS: As no code for nvptx is generated, there is no "device present bit not used" warning with GOMP_DEBUG, but as there is a compile-time warning, I guess that's fine. ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 [-- Attachment #2: nvptx-warn-if-sm-too-old-v3.diff --] [-- Type: text/x-patch, Size: 7573 bytes --] nvptx/mkoffload.cc: Warn instead of error when reverse offload is not possible Reverse offload requests at least -misa=sm_35; with this patch, a warning instead of an error is shown, still permitting reverse offload for all other configured device types. This is achieved by not calling GOMP_offload_register_ver (and stopping generating pointless 'static const char' variables, once known.) The tool_name as progname changes adds "nvptx " and "gcn " to the "mkoffload: warning/error:" diagnostic. gcc/ChangeLog: * config/nvptx/mkoffload.cc (process): Replace a fatal_error by a warning + not enabling offloading if -misa=sm_30 prevents reverse offload. (main): Use tool_name as progname for diagnostic. * config/gcn/mkoffload.cc (main): Likewise. libgomp/ChangeLog: * libgomp.texi (Offload-Target Specifics: nvptx): Document that reverse offload requires >= -march=sm_35. * testsuite/libgomp.c-c++-common/requires-4.c: Build for nvptx with -misa=sm_35. * testsuite/libgomp.c-c++-common/requires-5.c: Likewise. * testsuite/libgomp.c-c++-common/requires-6.c: Likewise. * testsuite/libgomp.c-c++-common/reverse-offload-1.c: Likewise. * testsuite/libgomp.fortran/reverse-offload-1.f90: Likewise. * testsuite/libgomp.c/reverse-offload-sm30.c: New test. gcc/config/gcn/mkoffload.cc | 2 +- gcc/config/nvptx/mkoffload.cc | 17 +++++++++++++---- libgomp/libgomp.texi | 3 +++ libgomp/testsuite/libgomp.c-c++-common/requires-4.c | 1 + libgomp/testsuite/libgomp.c-c++-common/requires-5.c | 1 + libgomp/testsuite/libgomp.c-c++-common/requires-6.c | 2 ++ .../testsuite/libgomp.c-c++-common/reverse-offload-1.c | 1 + libgomp/testsuite/libgomp.c/reverse-offload-sm30.c | 15 +++++++++++++++ libgomp/testsuite/libgomp.fortran/reverse-offload-1.f90 | 1 + 9 files changed, 38 insertions(+), 5 deletions(-) diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc index 24d327355e3..64037806acf 100644 --- a/gcc/config/gcn/mkoffload.cc +++ b/gcc/config/gcn/mkoffload.cc @@ -805,7 +805,7 @@ main (int argc, char **argv) FILE *cfile = stdout; const char *outname = 0; - progname = "mkoffload"; + progname = tool_name; diagnostic_initialize (global_dc, 0); obstack_init (&files_to_cleanup); diff --git a/gcc/config/nvptx/mkoffload.cc b/gcc/config/nvptx/mkoffload.cc index 834b2059aac..5f3e07ad066 100644 --- a/gcc/config/nvptx/mkoffload.cc +++ b/gcc/config/nvptx/mkoffload.cc @@ -324,9 +324,18 @@ process (FILE *in, FILE *out, uint32_t omp_requires) { if (sm_ver && sm_ver[0] == '3' && sm_ver[1] == '0' && sm_ver[2] == '\n') - fatal_error (input_location, - "%<omp requires reverse_offload%> requires at least " - "%<sm_35%> for %<-misa=%>"); + { + warning_at (input_location, 0, + "%<omp requires reverse_offload%> requires at " + "least %<sm_35%> for %<-march=%> - disabling " + "offload-code generation for this device type"); + /* As now an empty file is compiled and there is no call to + GOMP_offload_register_ver, this device type is effectively + disabled. */ + fflush (out); + ftruncate (fileno (out), 0); + return; + } sm_ver2 = sm_ver; version2 = version; } @@ -526,7 +535,7 @@ main (int argc, char **argv) FILE *out = stdout; const char *outname = 0; - progname = "mkoffload"; + progname = tool_name; diagnostic_initialize (global_dc, 0); if (atexit (mkoffload_cleanup) != 0) diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi index 4eaad4348bb..1f402d6df79 100644 --- a/libgomp/libgomp.texi +++ b/libgomp/libgomp.texi @@ -4386,6 +4386,9 @@ The implementation remark: @item I/O within OpenMP target regions and OpenACC parallel/kernels is supported using the C library @code{printf} functions and the Fortran @code{print}/@code{write} statements. +@item Compilation OpenMP code that contains @code{requires reverse_offload} + requires at least @code{-march=sm_35}, compiling for @code{-march=sm_30} + is not supported. @end itemize diff --git a/libgomp/testsuite/libgomp.c-c++-common/requires-4.c b/libgomp/testsuite/libgomp.c-c++-common/requires-4.c index 6ed5a5f647a..5883eff0d93 100644 --- a/libgomp/testsuite/libgomp.c-c++-common/requires-4.c +++ b/libgomp/testsuite/libgomp.c-c++-common/requires-4.c @@ -1,4 +1,5 @@ /* { dg-additional-options "-flto" } */ +/* { dg-additional-options "-foffload-options=nvptx-none=-misa=sm_35" { target { offload_target_nvptx } } } */ /* { dg-additional-sources requires-4-aux.c } */ /* Check no diagnostic by device-compiler's or host compiler's lto1. diff --git a/libgomp/testsuite/libgomp.c-c++-common/requires-5.c b/libgomp/testsuite/libgomp.c-c++-common/requires-5.c index 7fe0c735d27..d43d78db6fa 100644 --- a/libgomp/testsuite/libgomp.c-c++-common/requires-5.c +++ b/libgomp/testsuite/libgomp.c-c++-common/requires-5.c @@ -1,3 +1,4 @@ +/* { dg-additional-options "-foffload-options=nvptx-none=-misa=sm_35" { target { offload_target_nvptx } } } */ /* { dg-additional-sources requires-5-aux.c } */ /* Depending on offload device capabilities, it may print something like the diff --git a/libgomp/testsuite/libgomp.c-c++-common/requires-6.c b/libgomp/testsuite/libgomp.c-c++-common/requires-6.c index b00c7459bbc..a25b4d2dedd 100644 --- a/libgomp/testsuite/libgomp.c-c++-common/requires-6.c +++ b/libgomp/testsuite/libgomp.c-c++-common/requires-6.c @@ -1,3 +1,5 @@ +/* { dg-additional-options "-foffload-options=nvptx-none=-misa=sm_35" { target { offload_target_nvptx } } } */ + #pragma omp requires unified_shared_memory, unified_address, reverse_offload /* The requires line is not active as there is none of: diff --git a/libgomp/testsuite/libgomp.c-c++-common/reverse-offload-1.c b/libgomp/testsuite/libgomp.c-c++-common/reverse-offload-1.c index 976e129f560..52d828caf1c 100644 --- a/libgomp/testsuite/libgomp.c-c++-common/reverse-offload-1.c +++ b/libgomp/testsuite/libgomp.c-c++-common/reverse-offload-1.c @@ -1,4 +1,5 @@ /* { dg-do run } */ +/* { dg-additional-options "-foffload-options=nvptx-none=-misa=sm_35" { target { offload_target_nvptx } } } */ /* { dg-additional-sources reverse-offload-1-aux.c } */ /* Check that reverse offload works in particular: diff --git a/libgomp/testsuite/libgomp.c/reverse-offload-sm30.c b/libgomp/testsuite/libgomp.c/reverse-offload-sm30.c new file mode 100644 index 00000000000..14aed0132b7 --- /dev/null +++ b/libgomp/testsuite/libgomp.c/reverse-offload-sm30.c @@ -0,0 +1,15 @@ +/* { dg-do link { target { offload_target_nvptx } } } */ +/* { dg-additional-options "-foffload-options=nvptx-none=-march=sm_30 -foffload=-mptx=_" } */ + +#pragma omp requires reverse_offload + +int +main () +{ + #pragma omp target + { + } + return 0; +} + +/* { dg-warning "'omp requires reverse_offload' requires at least 'sm_35' for '-march=' - disabling offload-code generation for this device type" "" { target *-*-* } 0 } */ diff --git a/libgomp/testsuite/libgomp.fortran/reverse-offload-1.f90 b/libgomp/testsuite/libgomp.fortran/reverse-offload-1.f90 index 7cfb8b6552e..de68011f8f7 100644 --- a/libgomp/testsuite/libgomp.fortran/reverse-offload-1.f90 +++ b/libgomp/testsuite/libgomp.fortran/reverse-offload-1.f90 @@ -1,4 +1,5 @@ ! { dg-do run } +! { dg-additional-options "-foffload-options=nvptx-none=-misa=sm_35" { target { offload_target_nvptx } } } ! { dg-additional-sources reverse-offload-1-aux.f90 } ! Check that reverse offload works in particular: ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Patch] nvptx/mkoffload.cc: Warn instead of error when reverse offload is not possible (was: Re: [Patch][2/3][v2] nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup) 2022-09-12 12:02 ` [Patch] nvptx/mkoffload.cc: Warn instead of error when reverse offload is not possible (was: Re: [Patch][2/3][v2] nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup) Tobias Burnus @ 2022-09-12 12:10 ` Jakub Jelinek 2022-10-17 11:59 ` Fix nvptx-specific '-foffload-options' syntax in 'libgomp.c/reverse-offload-sm30.c' (was: [Patch] nvptx/mkoffload.cc: Warn instead of error when reverse offload is not possible) Thomas Schwinge 1 sibling, 0 replies; 15+ messages in thread From: Jakub Jelinek @ 2022-09-12 12:10 UTC (permalink / raw) To: Tobias Burnus; +Cc: gcc-patches, Tom de Vries On Mon, Sep 12, 2022 at 02:02:16PM +0200, Tobias Burnus wrote: > + { > + warning_at (input_location, 0, > + "%<omp requires reverse_offload%> requires at " > + "least %<sm_35%> for %<-march=%> - disabling " > + "offload-code generation for this device type"); I wonder whether it shouldn't talk about -foffload-options=nvptx-none=-march= instead of just -march=. Otherwise LGTM. Jakub ^ permalink raw reply [flat|nested] 15+ messages in thread
* Fix nvptx-specific '-foffload-options' syntax in 'libgomp.c/reverse-offload-sm30.c' (was: [Patch] nvptx/mkoffload.cc: Warn instead of error when reverse offload is not possible) 2022-09-12 12:02 ` [Patch] nvptx/mkoffload.cc: Warn instead of error when reverse offload is not possible (was: Re: [Patch][2/3][v2] nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup) Tobias Burnus 2022-09-12 12:10 ` Jakub Jelinek @ 2022-10-17 11:59 ` Thomas Schwinge 1 sibling, 0 replies; 15+ messages in thread From: Thomas Schwinge @ 2022-10-17 11:59 UTC (permalink / raw) To: Tobias Burnus, gcc-patches; +Cc: Jakub Jelinek [-- Attachment #1: Type: text/plain, Size: 1351 bytes --] Hi! On 2022-09-12T14:02:16+0200, Tobias Burnus <tobias@codesourcery.com> wrote: > --- /dev/null > +++ b/libgomp/testsuite/libgomp.c/reverse-offload-sm30.c > @@ -0,0 +1,15 @@ > +/* { dg-do link { target { offload_target_nvptx } } } */ > +/* { dg-additional-options "-foffload-options=nvptx-none=-march=sm_30 -foffload=-mptx=_" } */ Pushed to master branch commit b61796663ba1fe8fb83203829398f3f89ec212b7 "Fix nvptx-specific '-foffload-options' syntax in 'libgomp.c/reverse-offload-sm30.c'", see attached. Cherry-picked pushed to devel/omp/gcc-12 branch in commit f36ce95ad928578aa6739f61480e6c8fbaf2248e "Fix nvptx-specific '-foffload-options' syntax in 'libgomp.c/reverse-offload-sm30.c'", see attached. Grüße Thomas > + > +#pragma omp requires reverse_offload > + > +int > +main () > +{ > + #pragma omp target > + { > + } > + return 0; > +} > + > +/* { dg-warning "'omp requires reverse_offload' requires at least 'sm_35' for '-march=' - disabling offload-code generation for this device type" "" { target *-*-* } 0 } */ ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: 0001-Fix-nvptx-specific-foffload-options-syntax-in-libgom.patch --] [-- Type: text/x-diff, Size: 1327 bytes --] From b61796663ba1fe8fb83203829398f3f89ec212b7 Mon Sep 17 00:00:00 2001 From: Thomas Schwinge <thomas@codesourcery.com> Date: Fri, 23 Sep 2022 11:29:50 +0200 Subject: [PATCH] Fix nvptx-specific '-foffload-options' syntax in 'libgomp.c/reverse-offload-sm30.c' That is, '-mptx=_' is only valid in '-foffload-options=nvptx-none', too. Fix test case added in recent commit r13-2625-g6b43f556f392a7165582aca36a19fe7389d995b2 "nvptx/mkoffload.cc: Warn instead of error when reverse offload is not possible". libgomp/ * testsuite/libgomp.c/reverse-offload-sm30.c: Fix nvptx-specific '-foffload-options' syntax. --- libgomp/testsuite/libgomp.c/reverse-offload-sm30.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libgomp/testsuite/libgomp.c/reverse-offload-sm30.c b/libgomp/testsuite/libgomp.c/reverse-offload-sm30.c index fbfeae1fd41..7f10fd4ded9 100644 --- a/libgomp/testsuite/libgomp.c/reverse-offload-sm30.c +++ b/libgomp/testsuite/libgomp.c/reverse-offload-sm30.c @@ -1,5 +1,5 @@ /* { dg-do link { target { offload_target_nvptx } } } */ -/* { dg-additional-options "-foffload-options=nvptx-none=-march=sm_30 -foffload=-mptx=_" } */ +/* { dg-additional-options "-foffload-options=nvptx-none=-march=sm_30 -foffload-options=nvptx-none=-mptx=_" } */ #pragma omp requires reverse_offload -- 2.35.1 [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #3: 0001-Fix-nvptx-specific-foffload-options-syntax-in-l.og12.patch --] [-- Type: text/x-diff, Size: 2001 bytes --] From f36ce95ad928578aa6739f61480e6c8fbaf2248e Mon Sep 17 00:00:00 2001 From: Thomas Schwinge <thomas@codesourcery.com> Date: Fri, 23 Sep 2022 11:29:50 +0200 Subject: [PATCH] Fix nvptx-specific '-foffload-options' syntax in 'libgomp.c/reverse-offload-sm30.c' That is, '-mptx=_' is only valid in '-foffload-options=nvptx-none', too. Fix test case added in recent commit r13-2625-g6b43f556f392a7165582aca36a19fe7389d995b2 "nvptx/mkoffload.cc: Warn instead of error when reverse offload is not possible". libgomp/ * testsuite/libgomp.c/reverse-offload-sm30.c: Fix nvptx-specific '-foffload-options' syntax. (cherry picked from commit b61796663ba1fe8fb83203829398f3f89ec212b7) --- libgomp/ChangeLog.omp | 8 ++++++++ libgomp/testsuite/libgomp.c/reverse-offload-sm30.c | 2 +- 2 files changed, 9 insertions(+), 1 deletion(-) diff --git a/libgomp/ChangeLog.omp b/libgomp/ChangeLog.omp index cb3541be378..048314eb1be 100644 --- a/libgomp/ChangeLog.omp +++ b/libgomp/ChangeLog.omp @@ -1,3 +1,11 @@ +2022-10-17 Thomas Schwinge <thomas@codesourcery.com> + + Backported from master: + 2022-10-17 Thomas Schwinge <thomas@codesourcery.com> + + * testsuite/libgomp.c/reverse-offload-sm30.c: Fix nvptx-specific + '-foffload-options' syntax. + 2022-10-14 Julian Brown <julian@codesourcery.com> * testsuite/libgomp.oacc-fortran/declare-1.f90: Adjust scan output. diff --git a/libgomp/testsuite/libgomp.c/reverse-offload-sm30.c b/libgomp/testsuite/libgomp.c/reverse-offload-sm30.c index fbfeae1fd41..7f10fd4ded9 100644 --- a/libgomp/testsuite/libgomp.c/reverse-offload-sm30.c +++ b/libgomp/testsuite/libgomp.c/reverse-offload-sm30.c @@ -1,5 +1,5 @@ /* { dg-do link { target { offload_target_nvptx } } } */ -/* { dg-additional-options "-foffload-options=nvptx-none=-march=sm_30 -foffload=-mptx=_" } */ +/* { dg-additional-options "-foffload-options=nvptx-none=-march=sm_30 -foffload-options=nvptx-none=-mptx=_" } */ #pragma omp requires reverse_offload -- 2.35.1 ^ permalink raw reply [flat|nested] 15+ messages in thread
* [og12] Come up with {,UN}LIKELY macros (was: [Patch][2/3][v2] nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup) 2022-08-29 18:43 ` [Patch][2/3][v2] " Tobias Burnus 2022-08-29 18:43 ` Tobias Burnus 2022-09-09 15:36 ` Jakub Jelinek @ 2022-09-23 15:40 ` Thomas Schwinge 2 siblings, 0 replies; 15+ messages in thread From: Thomas Schwinge @ 2022-09-23 15:40 UTC (permalink / raw) To: Tobias Burnus, gcc-patches [-- Attachment #1: Type: text/plain, Size: 2160 bytes --] Hi! Since the 2022-09-12 backport of this: On 2022-08-29T20:43:26+0200, Tobias Burnus <tobias@codesourcery.com> wrote: > nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup ... to og12 in commit 2b6ad53fd76c7bb9605be417d137a7d9a18f2117, the og12 branch didn't build anymore: [...]/gcc/config/nvptx/mkoffload.cc: In function 'void process(FILE*, FILE*, uint32_t)': [...]/gcc/config/nvptx/mkoffload.cc:284:59: error: 'UNLIKELY' was not declared in this scope if (UNLIKELY (startswith (input + i, ".target sm_"))) ^ [...]/gcc/config/nvptx/mkoffload.cc:289:57: error: 'UNLIKELY' was not declared in this scope if (UNLIKELY (startswith (input + i, ".version "))) ^ make[2]: *** [[...]/gcc/config/nvptx/t-nvptx:8: mkoffload.o] Error 1 > --- a/gcc/config/nvptx/mkoffload.cc > +++ b/gcc/config/nvptx/mkoffload.cc > @@ -261,6 +281,16 @@ process (FILE *in, FILE *out, uint32_t omp_requires) > case '\n': > fprintf (out, "\\n\"\n\t\""); > /* Look for mappings on subsequent lines. */ > + if (UNLIKELY (startswith (input + i, ".target sm_"))) > + { > + sm_ver = input + i + strlen (".target sm_"); > + continue; > + } > + if (UNLIKELY (startswith (input + i, ".version "))) > + { > + version = input + i + strlen (".version "); > + continue; > + } To fix this, I've pushed a (very much reduced) partial cherry-pick of commit r13-171-g22d9c8802add09a93308319fc37dd3a0f1125393 "Come up with {,UN}LIKELY macros" to og12 branch in commit 44b77201a5431450f608b4538fefb1319127de13, see attached. Grüße Thomas ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: 0001-Come-up-with-UN-LIKELY-macros.og12.patch --] [-- Type: text/x-diff, Size: 1429 bytes --] From 44b77201a5431450f608b4538fefb1319127de13 Mon Sep 17 00:00:00 2001 From: Martin Liska <mliska@suse.cz> Date: Thu, 3 Feb 2022 10:58:18 +0100 Subject: [PATCH] Come up with {,UN}LIKELY macros. gcc/ChangeLog: * system.h (LIKELY): Define. (UNLIKELY): Likewise. (cherry picked from commit 22d9c8802add09a93308319fc37dd3a0f1125393, partial) --- gcc/ChangeLog.omp | 8 ++++++++ gcc/system.h | 3 +++ 2 files changed, 11 insertions(+) diff --git a/gcc/ChangeLog.omp b/gcc/ChangeLog.omp index 4f80bcbd356..30c3abfc15b 100644 --- a/gcc/ChangeLog.omp +++ b/gcc/ChangeLog.omp @@ -1,3 +1,11 @@ +2022-09-23 Thomas Schwinge <thomas@codesourcery.com> + + Backport from master branch: + 2022-05-09 Martin Liska <mliska@suse.cz> + + * system.h (LIKELY): Define. + (UNLIKELY): Likewise. + 2022-09-12 Tobias Burnus <tobias@codesourcery.com> Backport from mainline: diff --git a/gcc/system.h b/gcc/system.h index e10c34f70ec..6b6868d0bbf 100644 --- a/gcc/system.h +++ b/gcc/system.h @@ -736,6 +736,9 @@ extern int vsnprintf (char *, size_t, const char *, va_list); #define __builtin_expect(a, b) (a) #endif +#define LIKELY(x) (__builtin_expect ((x), 1)) +#define UNLIKELY(x) (__builtin_expect ((x), 0)) + /* Some of the headers included by <memory> can use "abort" within a namespace, e.g. "_VSTD::abort();", which fails after we use the preprocessor to redefine "abort" as "fancy_abort" below. */ -- 2.35.1 ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Patch][1/3] libgomp: Prepare for reverse offload fn lookup 2022-08-25 14:54 [Patch][1/3] libgomp: Prepare for reverse offload fn lookup Tobias Burnus ` (2 preceding siblings ...) 2022-08-25 17:30 ` [Patch][2/3] nvptx: " Tobias Burnus @ 2022-09-09 15:29 ` Jakub Jelinek 3 siblings, 0 replies; 15+ messages in thread From: Jakub Jelinek @ 2022-09-09 15:29 UTC (permalink / raw) To: Tobias Burnus; +Cc: gcc-patches On Thu, Aug 25, 2022 at 04:54:51PM +0200, Tobias Burnus wrote: > Technically, this patch is stand alone, but conceptually it based on the > submitted but not reviewed patch: > "[Patch] OpenMP: Support reverse offload (middle end part)" > https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598662.html > > With that patch, for reverse offloads ('omp target device(ancestor:1)'), > calls like the following are added: > GOMP_target_ext (-2 /* initial device */, omp_fn.1 > where 'omp_fn.1' on nonhost devices a stub function just required for > looking up the host function pointer via the offload_funcs table. > > The attached patch prepare for reverse-offload device->host > function-address lookup by requesting (if needed) the on-device address. > > OK for mainline? > > Tobias > > > ----------------- > Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 > libgomp: Prepare for reverse offload fn lookup > > Prepare for reverse-offloading function-pointer lookup by passing > a rev_fn_table argument to GOMP_OFFLOAD_load_image. > > The argument will be NULL, unless GOMP_REQUIRES_REVERSE_OFFLOAD is > requested and devices not supported it, are filtered out. > (Up to and including this commit, no non-host device claims such > support and the caller currently always passes NULL.) > > libgomp/ChangeLog: > > * libgomp-plugin.h (GOMP_OFFLOAD_load_image): Add > 'uint64_t **rev_fn_table' argument. > * oacc-host.c (host_load_image): Likewise. > * plugin/plugin-gcn.c (GOMP_OFFLOAD_load_image): Likewise; > currently unused. > * plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Likewise. > * target.c (gomp_load_image_to_device): Update call but pass > NULL for now. > > liboffloadmic/ChangeLog: > > * plugin/libgomp-plugin-intelmic.cpp (GOMP_OFFLOAD_load_image): > Add (unused) uint64_t **rev_fn_table argument. Ok, thanks. Jakub ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2022-10-17 11:59 UTC | newest] Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-08-25 14:54 [Patch][1/3] libgomp: Prepare for reverse offload fn lookup Tobias Burnus 2022-08-25 14:54 ` Tobias Burnus 2022-08-25 15:38 ` [Patch][2/3] GCN: libgomp+mkoffload.cc: " Tobias Burnus 2022-08-25 15:38 ` Tobias Burnus 2022-09-09 15:31 ` Jakub Jelinek 2022-08-25 17:30 ` [Patch][2/3] nvptx: " Tobias Burnus 2022-08-25 17:30 ` Tobias Burnus 2022-08-29 18:43 ` [Patch][2/3][v2] " Tobias Burnus 2022-08-29 18:43 ` Tobias Burnus 2022-09-09 15:36 ` Jakub Jelinek 2022-09-12 12:02 ` [Patch] nvptx/mkoffload.cc: Warn instead of error when reverse offload is not possible (was: Re: [Patch][2/3][v2] nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup) Tobias Burnus 2022-09-12 12:10 ` Jakub Jelinek 2022-10-17 11:59 ` Fix nvptx-specific '-foffload-options' syntax in 'libgomp.c/reverse-offload-sm30.c' (was: [Patch] nvptx/mkoffload.cc: Warn instead of error when reverse offload is not possible) Thomas Schwinge 2022-09-23 15:40 ` [og12] Come up with {,UN}LIKELY macros (was: [Patch][2/3][v2] nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup) Thomas Schwinge 2022-09-09 15:29 ` [Patch][1/3] libgomp: Prepare for reverse offload fn lookup Jakub Jelinek
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).