* [PATCH] openmp: Add support for the 'indirect' clause in C/C++
From: Kwok Cheung Yeung @ 2023-10-08 13:13 UTC
To: gcc-patches; +Cc: Jakub Jelinek, tobias
[-- Attachment #1: Type: text/plain, Size: 2048 bytes --]
Hello
This patch adds support for the 'indirect' clause in the 'declare
target' directive in C/C++ (Fortran to follow) and adds the necessary
infrastructure to support indirect calls in target regions. This allows
one to pass pointers to functions that have been declared as indirect
from the host to the target, and then to invoke those functions via the
passed-in pointer on the target device.
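For illustration, here is a minimal sketch of the kind of user code this is meant to enable. It is not taken from the patch's testsuite, and call_on_device is a hypothetical helper name. Without offloading enabled the pragmas are ignored and the call simply runs on the host.

```c
#include <assert.h>
#include <stddef.h>

/* Functions in this region are declared as indirect, so a host pointer
   to them may be called from inside a target region.  */
#pragma omp begin declare target indirect
int add_one (int x) { return x + 1; }
int twice (int x) { return 2 * x; }
#pragma omp end declare target

/* Hypothetical helper: pass a host function pointer into a target
   region and invoke the function through it there.  */
int
call_on_device (int (*fn) (int), int arg)
{
  assert (fn != NULL);
  int result = 0;
#pragma omp target map(from: result)
  result = fn (arg);
  return result;
}
```

Calling call_on_device (add_one, 41) from host code is the case of interest: the pointer value is a host address, so the target side needs some way to translate it before the call.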
This is done by processing the functions declared as indirect in a
similar way to regular kernels - they are added as a separate entry to
the offload tables which are embedded into the target code by mkoffload.
When the image is loaded, the host reads the target version of the
offload table, then combines it with the host version to produce an
address map. This map is then written to the device memory and a pointer
is set to point to it.
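To make the table-combining step concrete, here is a rough sketch. It assumes that both tables list the indirect functions in the same order and that the map is an array of (host, target) address pairs ended by an all-zero entry. struct addr_pair and build_addr_map are illustrative names, not the plugin code in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One entry of the address map (illustrative layout).  */
struct addr_pair
{
  uintptr_t host_addr;
  uintptr_t target_addr;
};

/* Pair entry i of the host table with entry i of the target table.
   Returns a malloc'd array of N + 1 entries, the last one being a
   zero terminator, or NULL if allocation fails.  */
struct addr_pair *
build_addr_map (const uintptr_t *host_tab, const uintptr_t *target_tab,
                size_t n)
{
  assert (host_tab != NULL && target_tab != NULL);
  struct addr_pair *map = malloc ((n + 1) * sizeof *map);
  if (map == NULL)
    return NULL;
  for (size_t i = 0; i < n; i++)
    {
      map[i].host_addr = host_tab[i];
      map[i].target_addr = target_tab[i];
    }
  map[n].host_addr = 0;
  map[n].target_addr = 0;
  return map;
}
```

In the real runtime the finished map would then be copied into device memory. That step is device-specific and is left out of this sketch.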
The omp_device_lower pass now runs if any indirect functions are
present. The pass searches for any indirect function calls, and runs a
new builtin BUILT_IN_GOMP_TARGET_MAP_INDIRECT_PTR to process the
function pointer before making the indirect call.
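Expressed at the source level, the effect of this lowering is roughly as follows. This is only a sketch: the real transformation is done by the compiler pass, and map_indirect_ptr here is a hypothetical stand-in for the builtin that knows about a single hard-coded mapping.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*unary_fn) (int);
typedef void (*generic_fn) (void);

int host_version (int x) { return x + 1; }
int device_version (int x) { return x + 100; }
int unmapped (int x) { return x - 1; }

/* Hypothetical stand-in for the builtin: translate one known host
   address, and pass any other pointer through unchanged.  */
generic_fn
map_indirect_ptr (generic_fn ptr)
{
  if (ptr == (generic_fn) host_version)
    return (generic_fn) device_version;
  return ptr;
}

/* Before lowering the body would be:  return fn (arg);
   After lowering the pointer is translated first.  */
int
call_lowered (unary_fn fn, int arg)
{
  assert (fn != NULL);
  unary_fn mapped = (unary_fn) map_indirect_ptr ((generic_fn) fn);
  return mapped (arg);
}
```

A pointer with no mapping falls through to the original call, so pointers to functions that already live on the device keep working.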
The builtin (implemented by GOMP_target_map_indirect_ptr) searches
through the address map, returning the target address if found, or the
original address if not. I've added two search algorithms - a simple
linear search through the map, and another which builds up a splay tree
from the map and uses that to do the search. I've enabled the splay-tree
version by default, but the linear search is useful for debugging
purposes so I have kept it in.
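The linear-search variant can be sketched as below, assuming a map of (host, target) pairs ended by a zero host address. The names lookup_linear, struct addr_pair and example_map are illustrative and the addresses are made up. The splay-tree variant would return the same results, only with a faster search.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One entry of the address map (illustrative layout).  */
struct addr_pair
{
  uintptr_t host_addr;
  uintptr_t target_addr;
};

/* Made-up sample map; the entry with host_addr == 0 ends the list.  */
const struct addr_pair example_map[] = {
  { 0x1000, 0xA000 },
  { 0x2000, 0xB000 },
  { 0, 0 }
};

/* Return the target address paired with HOST_ADDR, or HOST_ADDR
   itself if the map has no entry for it.  */
uintptr_t
lookup_linear (const struct addr_pair *map, uintptr_t host_addr)
{
  assert (map != NULL);
  for (const struct addr_pair *p = map; p->host_addr != 0; p++)
    if (p->host_addr == host_addr)
      return p->target_addr;
  return host_addr;
}
```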
The C++ support is currently limited to normal indirect calls - virtual
calls on objects do not currently work. I believe the main issue is that
the vtables are not currently copied across to the target. I have added
some handling for OBJ_TYPE_REF to prevent the compiler from ICEing when
it encounters a virtual call, but without the vtable this cannot work
properly.
Tested on an x86_64 host with offloading to NVPTX and AMD GCN, and
bootstrapped on an x86_64 host. Okay for mainline?
Thanks
Kwok
[-- Attachment #2: 0001-openmp-Add-support-for-the-indirect-clause-in-C-C.patch --]
[-- Type: text/plain, Size: 75111 bytes --]
From 46129c254990a9fff4b6d8512f04ad8fa7d61f0e Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung <kcy@codesourcery.com>
Date: Sun, 8 Oct 2023 13:50:25 +0100
Subject: [PATCH] openmp: Add support for the 'indirect' clause in C/C++
This adds support for the 'indirect' clause in the 'declare target'
directive. Functions declared as indirect may be called via function
pointers passed from the host in offloaded code.
Virtual calls to member functions via the object pointer in C++ are
currently not supported in target regions.
2023-10-08 Kwok Cheung Yeung <kcy@codesourcery.com>
gcc/c-family/
* c-attribs.cc (c_common_attribute_table): Add attribute for
indirect functions.
* c-pragma.h (enum pragma_omp_clause): Add entry for indirect clause.
gcc/c/
* c-decl.cc (c_decl_attributes): Add attribute for indirect
functions.
* c-lang.h (c_omp_declare_target_attr): Add indirect field.
* c-parser.cc (c_parser_omp_clause_name): Handle indirect clause.
(c_parser_omp_clause_indirect): New.
(c_parser_omp_all_clauses): Handle indirect clause.
(OMP_DECLARE_TARGET_CLAUSE_MASK): Add indirect clause to mask.
(c_parser_omp_declare_target): Handle indirect clause.
(OMP_BEGIN_DECLARE_TARGET_CLAUSE_MASK): Add indirect clause to mask.
(c_parser_omp_begin): Handle indirect clause.
* c-typeck.cc (c_finish_omp_clauses): Handle indirect clause.
gcc/cp/
* cp-tree.h (cp_omp_declare_target_attr): Add indirect field.
* decl2.cc (cplus_decl_attributes): Add attribute for indirect
functions.
* parser.cc (cp_parser_omp_clause_name): Handle indirect clause.
(cp_parser_omp_clause_indirect): New.
(cp_parser_omp_all_clauses): Handle indirect clause.
(handle_omp_declare_target_clause): Add extra parameter. Add
indirect attribute for indirect functions.
(OMP_DECLARE_TARGET_CLAUSE_MASK): Add indirect clause to mask.
(cp_parser_omp_declare_target): Handle indirect clause.
(OMP_BEGIN_DECLARE_TARGET_CLAUSE_MASK): Add indirect clause to mask.
(cp_parser_omp_begin): Handle indirect clause.
* semantics.cc (finish_omp_clauses): Handle indirect clause.
gcc/
* lto-cgraph.cc (enum LTO_symtab_tags): Add tag for indirect
functions.
(output_offload_tables): Write indirect functions.
(input_offload_tables): Read indirect functions.
* lto-section-names.h (OFFLOAD_IND_FUNC_TABLE_SECTION_NAME): New.
* omp-builtins.def (BUILT_IN_GOMP_TARGET_MAP_INDIRECT_PTR): New.
* omp-offload.cc (offload_ind_funcs): New.
(omp_discover_implicit_declare_target): Add functions marked with
'omp declare target indirect' to indirect functions list.
(omp_finish_file): Add indirect functions to section for offload
indirect functions.
(execute_omp_device_lower): Redirect indirect calls on target by
passing function pointer to BUILT_IN_GOMP_TARGET_MAP_INDIRECT_PTR.
(pass_omp_device_lower::gate): Run pass_omp_device_lower if
indirect functions are present.
* omp-offload.h (offload_ind_funcs): New.
* tree-core.h (omp_clause_code): Add OMP_CLAUSE_INDIRECT.
* tree.cc (omp_clause_num_ops): Add entry for OMP_CLAUSE_INDIRECT.
(omp_clause_code_name): Likewise.
* tree.h (OMP_CLAUSE_INDIRECT_EXPR): New.
* config/gcn/mkoffload.cc (process_asm): Process offload_ind_funcs
section. Count number of indirect functions.
(process_obj): Emit number of indirect functions.
* config/nvptx/mkoffload.cc (ind_func_ids, ind_funcs_tail): New.
(process): Emit offload_ind_func_table in PTX code. Emit indirect
function names and count in image.
* config/nvptx/nvptx.cc (nvptx_record_offload_symbol): Mark
indirect functions in PTX code with IND_FUNC_MAP.
gcc/testsuite/
* c-c++-common/gomp/declare-target-indirect-1.c: New.
* c-c++-common/gomp/declare-target-indirect-2.c: New.
libgcc/
* offloadstuff.c (OFFLOAD_IND_FUNC_TABLE_SECTION_NAME): New.
(__offload_ind_func_table): New.
(__offload_ind_funcs_end): New.
(__OFFLOAD_TABLE__): Add entries for indirect functions.
libgomp/
* Makefile.am (libgomp_la_SOURCES): Add target-indirect.c.
* Makefile.in: Regenerate.
* libgomp-plugin.h (GOMP_INDIRECT_ADDR_MAP): New define.
(GOMP_OFFLOAD_load_image): Add extra argument.
* libgomp.h (struct indirect_splay_tree_key_s): New.
(indirect_splay_tree_node, indirect_splay_tree,
indirect_splay_tree_key): New.
(indirect_splay_compare): New.
* libgomp.map (GOMP_5.1.1): Add GOMP_target_map_indirect_ptr.
* libgomp.texi (OpenMP 5.1): Update documentation on indirect
calls in target region and on indirect clause.
* libgomp_g.h (GOMP_target_map_indirect_ptr): Add prototype.
* oacc-host.c (host_load_image): Add extra argument.
* target.c (gomp_load_image_to_device): Pass host indirect
functions table to load_image_func.
* config/accel/target-indirect.c: New.
* config/linux/target-indirect.c: New.
* config/gcn/team.c (build_indirect_map): Add prototype.
(gomp_gcn_enter_kernel): Initialize support for indirect
function calls on GCN target.
* config/nvptx/team.c (build_indirect_map): Add prototype.
(gomp_nvptx_main): Initialize support for indirect function
calls on NVPTX target.
* plugin/plugin-gcn.c (struct gcn_image_desc): Add field for
indirect functions count.
(GOMP_OFFLOAD_load_image): Add extra argument. Build address
translation table and copy it to target memory.
* plugin/plugin-nvptx.c (nvptx_tdata): Add field for indirect
functions count.
(GOMP_OFFLOAD_load_image): Add extra argument. Build address
translation table and copy it to target memory.
* testsuite/libgomp.c-c++-common/declare-target-indirect-1.c: New.
* testsuite/libgomp.c-c++-common/declare-target-indirect-2.c: New.
---
gcc/c-family/c-attribs.cc | 2 +
gcc/c-family/c-pragma.h | 1 +
gcc/c/c-decl.cc | 8 ++
gcc/c/c-lang.h | 1 +
gcc/c/c-parser.cc | 89 +++++++++++--
gcc/c/c-typeck.cc | 1 +
gcc/config/gcn/mkoffload.cc | 29 +++-
gcc/config/nvptx/mkoffload.cc | 87 +++++++++++-
gcc/config/nvptx/nvptx.cc | 6 +-
gcc/cp/cp-tree.h | 1 +
gcc/cp/decl2.cc | 6 +
gcc/cp/parser.cc | 100 ++++++++++++--
gcc/cp/semantics.cc | 1 +
gcc/lto-cgraph.cc | 27 ++++
gcc/lto-section-names.h | 1 +
gcc/omp-builtins.def | 3 +
gcc/omp-offload.cc | 77 +++++++++--
gcc/omp-offload.h | 1 +
.../gomp/declare-target-indirect-1.c | 51 +++++++
.../gomp/declare-target-indirect-2.c | 20 +++
gcc/tree-core.h | 3 +
gcc/tree.cc | 2 +
gcc/tree.h | 4 +
libgcc/offloadstuff.c | 12 +-
libgomp/Makefile.am | 2 +-
libgomp/Makefile.in | 5 +-
libgomp/config/accel/target-indirect.c | 126 ++++++++++++++++++
libgomp/config/gcn/team.c | 4 +
libgomp/config/linux/target-indirect.c | 31 +++++
libgomp/config/nvptx/team.c | 5 +
libgomp/libgomp-plugin.h | 5 +-
libgomp/libgomp.h | 23 ++++
libgomp/libgomp.map | 1 +
libgomp/libgomp.texi | 4 +-
libgomp/libgomp_g.h | 1 +
libgomp/oacc-host.c | 3 +-
libgomp/plugin/plugin-gcn.c | 87 +++++++++++-
libgomp/plugin/plugin-nvptx.c | 62 ++++++++-
libgomp/target.c | 11 +-
.../declare-target-indirect-1.c | 21 +++
.../declare-target-indirect-2.c | 33 +++++
41 files changed, 909 insertions(+), 48 deletions(-)
create mode 100644 gcc/testsuite/c-c++-common/gomp/declare-target-indirect-1.c
create mode 100644 gcc/testsuite/c-c++-common/gomp/declare-target-indirect-2.c
create mode 100644 libgomp/config/accel/target-indirect.c
create mode 100644 libgomp/config/linux/target-indirect.c
create mode 100644 libgomp/testsuite/libgomp.c-c++-common/declare-target-indirect-1.c
create mode 100644 libgomp/testsuite/libgomp.c-c++-common/declare-target-indirect-2.c
diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index dca7548b2c6..cb409216f69 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -518,6 +518,8 @@ const struct attribute_spec c_common_attribute_table[] =
handle_omp_declare_target_attribute, NULL },
{ "omp declare target implicit", 0, 0, true, false, false, false,
handle_omp_declare_target_attribute, NULL },
+ { "omp declare target indirect", 0, 0, true, false, false, false,
+ handle_omp_declare_target_attribute, NULL },
{ "omp declare target host", 0, 0, true, false, false, false,
handle_omp_declare_target_attribute, NULL },
{ "omp declare target nohost", 0, 0, true, false, false, false,
diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h
index 603c5151978..902a924360b 100644
--- a/gcc/c-family/c-pragma.h
+++ b/gcc/c-family/c-pragma.h
@@ -125,6 +125,7 @@ enum pragma_omp_clause {
PRAGMA_OMP_CLAUSE_IF,
PRAGMA_OMP_CLAUSE_IN_REDUCTION,
PRAGMA_OMP_CLAUSE_INBRANCH,
+ PRAGMA_OMP_CLAUSE_INDIRECT,
PRAGMA_OMP_CLAUSE_IS_DEVICE_PTR,
PRAGMA_OMP_CLAUSE_LASTPRIVATE,
PRAGMA_OMP_CLAUSE_LINEAR,
diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index 5822faf01b4..f0bc1c65621 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -5331,6 +5331,14 @@ c_decl_attributes (tree *node, tree attributes, int flags)
attributes
= tree_cons (get_identifier ("omp declare target nohost"),
NULL_TREE, attributes);
+
+ int indirect
+ = current_omp_declare_target_attribute->last ().indirect;
+ if (indirect && !lookup_attribute ("omp declare target indirect",
+ attributes))
+ attributes
+ = tree_cons (get_identifier ("omp declare target indirect"),
+ NULL_TREE, attributes);
}
}
diff --git a/gcc/c/c-lang.h b/gcc/c/c-lang.h
index 4fea11855f1..cb13e34e80e 100644
--- a/gcc/c/c-lang.h
+++ b/gcc/c/c-lang.h
@@ -62,6 +62,7 @@ struct GTY(()) language_function {
struct GTY(()) c_omp_declare_target_attr {
int device_type;
+ int indirect;
};
/* If non-zero, implicit "omp declare target" attribute is added into the
diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 0d468b86bd8..50120a10a5f 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -13975,6 +13975,8 @@ c_parser_omp_clause_name (c_parser *parser)
result = PRAGMA_OMP_CLAUSE_IN_REDUCTION;
else if (!strcmp ("inbranch", p))
result = PRAGMA_OMP_CLAUSE_INBRANCH;
+ else if (!strcmp ("indirect", p))
+ result = PRAGMA_OMP_CLAUSE_INDIRECT;
else if (!strcmp ("independent", p))
result = PRAGMA_OACC_CLAUSE_INDEPENDENT;
else if (!strcmp ("is_device_ptr", p))
@@ -14835,6 +14837,47 @@ c_parser_omp_clause_final (c_parser *parser, tree list)
return list;
}
+/* OpenMP 5.1:
+ indirect [( expression )]
+*/
+
+static tree
+c_parser_omp_clause_indirect (c_parser *parser, tree list)
+{
+ location_t location = c_parser_peek_token (parser)->location;
+ tree t;
+
+ if (c_parser_peek_token (parser)->type == CPP_OPEN_PAREN)
+ {
+ matching_parens parens;
+ if (!parens.require_open (parser))
+ return list;
+
+ location_t loc = c_parser_peek_token (parser)->location;
+ c_expr expr = c_parser_expr_no_commas (parser, NULL);
+ expr = convert_lvalue_to_rvalue (loc, expr, true, true);
+ t = c_objc_common_truthvalue_conversion (loc, expr.value);
+ t = c_fully_fold (t, false, NULL);
+ if (!INTEGRAL_TYPE_P (TREE_TYPE (t))
+ || TREE_CODE (t) != INTEGER_CST)
+ {
+ c_parser_error (parser, "expected constant integer expression");
+ return list;
+ }
+ parens.skip_until_found_close (parser);
+ }
+ else
+ t = integer_one_node;
+
+ check_no_duplicate_clause (list, OMP_CLAUSE_INDIRECT, "indirect");
+
+ tree c = build_omp_clause (location, OMP_CLAUSE_INDIRECT);
+ OMP_CLAUSE_INDIRECT_EXPR (c) = t;
+ OMP_CLAUSE_CHAIN (c) = list;
+
+ return c;
+}
+
/* OpenACC, OpenMP 2.5:
if ( expression )
@@ -18352,6 +18395,10 @@ c_parser_omp_all_clauses (c_parser *parser, omp_clause_mask mask,
true, clauses);
c_name = "in_reduction";
break;
+ case PRAGMA_OMP_CLAUSE_INDIRECT:
+ clauses = c_parser_omp_clause_indirect (parser, clauses);
+ c_name = "indirect";
+ break;
case PRAGMA_OMP_CLAUSE_LASTPRIVATE:
clauses = c_parser_omp_clause_lastprivate (parser, clauses);
c_name = "lastprivate";
@@ -23748,13 +23795,15 @@ c_finish_omp_declare_simd (c_parser *parser, tree fndecl, tree parms,
( (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_TO) \
| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_ENTER) \
| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_LINK) \
- | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_DEVICE_TYPE))
+ | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_DEVICE_TYPE) \
+ | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_INDIRECT))
static void
c_parser_omp_declare_target (c_parser *parser)
{
tree clauses = NULL_TREE;
int device_type = 0;
+ bool indirect = false;
bool only_device_type = true;
if (c_parser_next_token_is (parser, CPP_NAME)
|| (c_parser_next_token_is (parser, CPP_COMMA)
@@ -23771,16 +23820,21 @@ c_parser_omp_declare_target (c_parser *parser)
else
{
c_parser_skip_to_pragma_eol (parser);
- c_omp_declare_target_attr attr = { -1 };
+ c_omp_declare_target_attr attr = { -1, 0 };
vec_safe_push (current_omp_declare_target_attribute, attr);
return;
}
- for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
- if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_DEVICE_TYPE)
- device_type |= OMP_CLAUSE_DEVICE_TYPE_KIND (c);
for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
{
if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_DEVICE_TYPE)
+ device_type |= OMP_CLAUSE_DEVICE_TYPE_KIND (c);
+ if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_INDIRECT)
+ indirect |= !integer_zerop (OMP_CLAUSE_INDIRECT_EXPR (c));
+ }
+ for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
+ {
+ if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_DEVICE_TYPE
+ || OMP_CLAUSE_CODE (c) == OMP_CLAUSE_INDIRECT)
continue;
tree t = OMP_CLAUSE_DECL (c), id;
tree at1 = lookup_attribute ("omp declare target", DECL_ATTRIBUTES (t));
@@ -23848,6 +23902,17 @@ c_parser_omp_declare_target (c_parser *parser)
= tree_cons (id, NULL_TREE, DECL_ATTRIBUTES (t));
}
}
+ if (indirect)
+ {
+ tree at4 = lookup_attribute ("omp declare target indirect",
+ DECL_ATTRIBUTES (t));
+ if (at4 == NULL_TREE)
+ {
+ id = get_identifier ("omp declare target indirect");
+ DECL_ATTRIBUTES (t)
+ = tree_cons (id, NULL_TREE, DECL_ATTRIBUTES (t));
+ }
+ }
}
if (device_type && only_device_type)
error_at (OMP_CLAUSE_LOCATION (clauses),
@@ -23860,7 +23925,8 @@ c_parser_omp_declare_target (c_parser *parser)
#pragma omp begin declare target clauses[optseq] new-line */
#define OMP_BEGIN_DECLARE_TARGET_CLAUSE_MASK \
- (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_DEVICE_TYPE)
+ ( (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_DEVICE_TYPE) \
+ | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_INDIRECT))
static void
c_parser_omp_begin (c_parser *parser)
@@ -23883,10 +23949,15 @@ c_parser_omp_begin (c_parser *parser)
OMP_BEGIN_DECLARE_TARGET_CLAUSE_MASK,
"#pragma omp begin declare target");
int device_type = 0;
+ int indirect = 0;
for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
- if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_DEVICE_TYPE)
- device_type |= OMP_CLAUSE_DEVICE_TYPE_KIND (c);
- c_omp_declare_target_attr attr = { device_type };
+ {
+ if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_DEVICE_TYPE)
+ device_type |= OMP_CLAUSE_DEVICE_TYPE_KIND (c);
+ if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_INDIRECT)
+ indirect |= !integer_zerop (OMP_CLAUSE_INDIRECT_EXPR (c));
+ }
+ c_omp_declare_target_attr attr = { device_type, indirect };
vec_safe_push (current_omp_declare_target_attribute, attr);
}
else
diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index e55e887da14..ed889dd353a 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -15880,6 +15880,7 @@ c_finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
case OMP_CLAUSE_IF_PRESENT:
case OMP_CLAUSE_FINALIZE:
case OMP_CLAUSE_NOHOST:
+ case OMP_CLAUSE_INDIRECT:
pc = &OMP_CLAUSE_CHAIN (c);
continue;
diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc
index 8b608bf024e..6a109c3a926 100644
--- a/gcc/config/gcn/mkoffload.cc
+++ b/gcc/config/gcn/mkoffload.cc
@@ -477,7 +477,8 @@ copy_early_debug_info (const char *infile, const char *outfile)
static void
process_asm (FILE *in, FILE *out, FILE *cfile)
{
- int fn_count = 0, var_count = 0, dims_count = 0, regcount_count = 0;
+ int fn_count = 0, var_count = 0, ind_fn_count = 0;
+ int dims_count = 0, regcount_count = 0;
struct obstack fns_os, dims_os, regcounts_os;
obstack_init (&fns_os);
obstack_init (&dims_os);
@@ -506,7 +507,8 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
{ IN_CODE,
IN_METADATA,
IN_VARS,
- IN_FUNCS
+ IN_FUNCS,
+ IN_IND_FUNCS,
} state = IN_CODE;
while (fgets (buf, sizeof (buf), in))
{
@@ -568,6 +570,17 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
}
break;
}
+ case IN_IND_FUNCS:
+ {
+ char *funcname;
+ if (sscanf (buf, "\t.8byte\t%ms\n", &funcname) > 0)
+ {
+ fputs (buf, out);
+ ind_fn_count++;
+ continue;
+ }
+ break;
+ }
}
char dummy;
@@ -595,6 +608,15 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
".offload_func_table:\n",
out);
}
+ else if (sscanf (buf, " .section .gnu.offload_ind_funcs%c", &dummy) > 0)
+ {
+ state = IN_IND_FUNCS;
+ fputs (buf, out);
+ fputs ("\t.global .offload_ind_func_table\n"
+ "\t.type .offload_ind_func_table, @object\n"
+ ".offload_ind_func_table:\n",
+ out);
+ }
else if (sscanf (buf, " .amdgpu_metadata%c", &dummy) > 0)
{
state = IN_METADATA;
@@ -632,6 +654,7 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
fprintf (cfile, "#include <stdbool.h>\n\n");
fprintf (cfile, "static const int gcn_num_vars = %d;\n\n", var_count);
+ fprintf (cfile, "static const int gcn_num_ind_funcs = %d;\n\n", ind_fn_count);
/* Dump out function idents. */
fprintf (cfile, "static const struct hsa_kernel_description {\n"
@@ -726,12 +749,14 @@ process_obj (FILE *in, FILE *cfile, uint32_t omp_requires)
" const struct gcn_image *gcn_image;\n"
" unsigned kernel_count;\n"
" const struct hsa_kernel_description *kernel_infos;\n"
+ " unsigned ind_func_count;\n"
" unsigned global_variable_count;\n"
"} gcn_data = {\n"
" %d,\n"
" &gcn_image,\n"
" sizeof (gcn_kernels) / sizeof (gcn_kernels[0]),\n"
" gcn_kernels,\n"
+ " gcn_num_ind_funcs,\n"
" gcn_num_vars\n"
"};\n\n", omp_requires);
diff --git a/gcc/config/nvptx/mkoffload.cc b/gcc/config/nvptx/mkoffload.cc
index aaea9fb320d..fb75ca090df 100644
--- a/gcc/config/nvptx/mkoffload.cc
+++ b/gcc/config/nvptx/mkoffload.cc
@@ -51,6 +51,7 @@ struct id_map
};
static id_map *func_ids, **funcs_tail = &func_ids;
+static id_map *ind_func_ids, **ind_funcs_tail = &ind_func_ids;
static id_map *var_ids, **vars_tail = &var_ids;
/* Files to unlink. */
@@ -302,6 +303,11 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
output_fn_ptr = true;
record_id (input + i + 9, &funcs_tail);
}
+ else if (startswith (input + i, "IND_FUNC_MAP "))
+ {
+ output_fn_ptr = true;
+ record_id (input + i + 13, &ind_funcs_tail);
+ }
else
abort ();
/* Skip to next line. */
@@ -422,6 +428,77 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
fprintf (out, "};\\n\";\n\n");
}
+ if (ind_func_ids)
+ {
+ const char needle[] = "// BEGIN GLOBAL FUNCTION DECL: ";
+
+ fprintf (out, "static const char ptx_code_%u[] =\n", obj_count++);
+ fprintf (out, "\t\".version ");
+ for (size_t i = 0; version[i] != '\0' && version[i] != '\n'; i++)
+ fputc (version[i], out);
+ fprintf (out, "\"\n\t\".target sm_");
+ for (size_t i = 0; sm_ver[i] != '\0' && sm_ver[i] != '\n'; i++)
+ fputc (sm_ver[i], out);
+ fprintf (out, "\"\n\t\".file 2 \\\"<dummy>\\\"\"\n");
+
+ /* WORKAROUND - see PR 108098
+ It seems as if older CUDA JIT compilers optimize the function pointers
+ in offload_func_table to NULL, which can be prevented by adding a
+ dummy procedure. With CUDA 11.1, it seems to work fine without the
+ workaround, while CUDA 10.2, like some older versions, needs the
+ workaround. Assuming CUDA 11.0 fixes it, emitting it could be
+ restricted to 'if (sm_ver2[0] < 8 && version2[0] < 7)' as sm_80 and
+ PTX ISA 7.0 are new in CUDA 11.0; for 11.1 it would be sm_86 and
+ PTX ISA 7.1. */
+ fprintf (out, "\n\t\".func __dummy$func2 ( );\"\n");
+ fprintf (out, "\t\".func __dummy$func2 ( )\"\n");
+ fprintf (out, "\t\"{\"\n");
+ fprintf (out, "\t\"}\"\n");
+
+ size_t fidx = 0;
+ for (id = ind_func_ids; id; id = id->next)
+ {
+ fprintf (out, "\t\".extern ");
+ const char *p = input + file_idx[fidx];
+ while (true)
+ {
+ p = strstr (p, needle);
+ if (!p)
+ {
+ fidx++;
+ if (fidx >= file_cnt)
+ break;
+ p = input + file_idx[fidx];
+ continue;
+ }
+ p += strlen (needle);
+ if (!startswith (p, id->ptx_name))
+ continue;
+ p += strlen (id->ptx_name);
+ if (*p != '\n')
+ continue;
+ p++;
+ /* Skip over any directives. */
+ while (!startswith (p, ".func"))
+ while (*p++ != ' ');
+ for (; *p != '\0' && *p != '\n'; p++)
+ fputc (*p, out);
+ break;
+ }
+ fprintf (out, "\"\n");
+ if (fidx == file_cnt)
+ fatal_error (input_location,
+ "Cannot find function declaration for %qs",
+ id->ptx_name);
+ }
+
+ fprintf (out, "\t\".visible .global .align 8 .u64 "
+ "$offload_ind_func_table[] = {");
+ for (comma = "", id = ind_func_ids; id; comma = ",", id = id->next)
+ fprintf (out, "%s\"\n\t\t\"%s", comma, id->ptx_name);
+ fprintf (out, "};\\n\";\n\n");
+ }
+
/* Dump out array of pointers to ptx object strings. */
fprintf (out, "static const struct ptx_obj {\n"
" const char *code;\n"
@@ -447,6 +524,12 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
id->dim ? id->dim : "");
fprintf (out, "\n};\n\n");
+ /* Dump out indirect function idents. */
+ fprintf (out, "static const char *const ind_func_mappings[] = {");
+ for (comma = "", id = ind_func_ids; id; comma = ",", id = id->next)
+ fprintf (out, "%s\n\t\"%s\"", comma, id->ptx_name);
+ fprintf (out, "\n};\n\n");
+
fprintf (out,
"static const struct nvptx_data {\n"
" uintptr_t omp_requires_mask;\n"
@@ -456,12 +539,14 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
" unsigned var_num;\n"
" const struct nvptx_fn *fn_names;\n"
" unsigned fn_num;\n"
+ " unsigned ind_fn_num;\n"
"} nvptx_data = {\n"
" %d, ptx_objs, sizeof (ptx_objs) / sizeof (ptx_objs[0]),\n"
" var_mappings,"
" sizeof (var_mappings) / sizeof (var_mappings[0]),\n"
" func_mappings,"
- " sizeof (func_mappings) / sizeof (func_mappings[0])\n"
+ " sizeof (func_mappings) / sizeof (func_mappings[0]),\n"
+ " sizeof (ind_func_mappings) / sizeof (ind_func_mappings[0])\n"
"};\n\n", omp_requires);
fprintf (out, "#ifdef __cplusplus\n"
diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index edef39fb5e1..eaff812a520 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -5918,7 +5918,11 @@ nvptx_record_offload_symbol (tree decl)
/* OpenMP offloading does not set this attribute. */
tree dims = attr ? TREE_VALUE (attr) : NULL_TREE;
- fprintf (asm_out_file, "//:FUNC_MAP \"%s\"",
+ fprintf (asm_out_file, "//:");
+ if (lookup_attribute ("omp declare target indirect",
+ DECL_ATTRIBUTES (decl)))
+ fprintf (asm_out_file, "IND_");
+ fprintf (asm_out_file, "FUNC_MAP \"%s\"",
IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl)));
for (; dims; dims = TREE_CHAIN (dims))
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 6e34952da99..8d0ad38c1ab 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -1831,6 +1831,7 @@ union GTY((desc ("cp_tree_node_structure (&%h)"),
struct GTY(()) cp_omp_declare_target_attr {
bool attr_syntax;
int device_type;
+ bool indirect;
};
struct GTY(()) cp_omp_begin_assumes_data {
diff --git a/gcc/cp/decl2.cc b/gcc/cp/decl2.cc
index 344e19ec98b..f89d32950e9 100644
--- a/gcc/cp/decl2.cc
+++ b/gcc/cp/decl2.cc
@@ -1768,6 +1768,12 @@ cplus_decl_attributes (tree *decl, tree attributes, int flags)
attributes
= tree_cons (get_identifier ("omp declare target nohost"),
NULL_TREE, attributes);
+ if (last.indirect
+ && !lookup_attribute ("omp declare target indirect",
+ attributes))
+ attributes
+ = tree_cons (get_identifier ("omp declare target indirect"),
+ NULL_TREE, attributes);
}
}
}
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index f3abae716fe..8ca155a29f8 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -37516,6 +37516,8 @@ cp_parser_omp_clause_name (cp_parser *parser)
result = PRAGMA_OMP_CLAUSE_IN_REDUCTION;
else if (!strcmp ("inbranch", p))
result = PRAGMA_OMP_CLAUSE_INBRANCH;
+ else if (!strcmp ("indirect", p))
+ result = PRAGMA_OMP_CLAUSE_INDIRECT;
else if (!strcmp ("independent", p))
result = PRAGMA_OACC_CLAUSE_INDEPENDENT;
else if (!strcmp ("is_device_ptr", p))
@@ -38548,6 +38550,50 @@ cp_parser_omp_clause_final (cp_parser *parser, tree list, location_t location)
return c;
}
+/* OpenMP 5.1:
+ indirect [( expression )]
+*/
+
+static tree
+cp_parser_omp_clause_indirect (cp_parser *parser, tree list,
+ location_t location)
+{
+ tree t;
+
+ if (cp_lexer_next_token_is (parser->lexer, CPP_OPEN_PAREN))
+ {
+ matching_parens parens;
+ if (!parens.require_open (parser))
+ return list;
+
+ t = cp_parser_assignment_expression (parser);
+
+ if (t != error_mark_node)
+ {
+ t = fold_non_dependent_expr (t);
+ if (!value_dependent_expression_p (t)
+ && (!INTEGRAL_TYPE_P (TREE_TYPE (t))
+ || !tree_fits_shwi_p (t)))
+ error_at (location, "expected constant integer expression");
+ }
+ if (t == error_mark_node
+ || !parens.require_close (parser))
+ cp_parser_skip_to_closing_parenthesis (parser, /*recovering=*/true,
+ /*or_comma=*/false,
+ /*consume_paren=*/true);
+ }
+ else
+ t = integer_one_node;
+
+ check_no_duplicate_clause (list, OMP_CLAUSE_INDIRECT, "indirect", location);
+
+ tree c = build_omp_clause (location, OMP_CLAUSE_INDIRECT);
+ OMP_CLAUSE_INDIRECT_EXPR (c) = t;
+ OMP_CLAUSE_CHAIN (c) = list;
+
+ return c;
+}
+
/* OpenMP 2.5:
if ( expression )
@@ -41572,6 +41618,11 @@ cp_parser_omp_all_clauses (cp_parser *parser, omp_clause_mask mask,
true, clauses);
c_name = "in_reduction";
break;
+ case PRAGMA_OMP_CLAUSE_INDIRECT:
+ clauses = cp_parser_omp_clause_indirect (parser, clauses,
+ token->location);
+ c_name = "indirect";
+ break;
case PRAGMA_OMP_CLAUSE_LASTPRIVATE:
clauses = cp_parser_omp_clause_lastprivate (parser, clauses);
c_name = "lastprivate";
@@ -48109,7 +48160,8 @@ cp_maybe_parse_omp_decl (tree decl, tree d)
on #pragma omp declare target. Return false if errors were reported. */
static bool
-handle_omp_declare_target_clause (tree c, tree t, int device_type)
+handle_omp_declare_target_clause (tree c, tree t, int device_type,
+ bool indirect)
{
tree at1 = lookup_attribute ("omp declare target", DECL_ATTRIBUTES (t));
tree at2 = lookup_attribute ("omp declare target link", DECL_ATTRIBUTES (t));
@@ -48173,6 +48225,17 @@ handle_omp_declare_target_clause (tree c, tree t, int device_type)
DECL_ATTRIBUTES (t) = tree_cons (id, NULL_TREE, DECL_ATTRIBUTES (t));
}
}
+ if (indirect)
+ {
+ tree at4 = lookup_attribute ("omp declare target indirect",
+ DECL_ATTRIBUTES (t));
+ if (at4 == NULL_TREE)
+ {
+ id = get_identifier ("omp declare target indirect");
+ DECL_ATTRIBUTES (t)
+ = tree_cons (id, NULL_TREE, DECL_ATTRIBUTES (t));
+ }
+ }
return true;
}
@@ -48190,13 +48253,15 @@ handle_omp_declare_target_clause (tree c, tree t, int device_type)
( (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_TO) \
| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_ENTER) \
| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_LINK) \
- | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_DEVICE_TYPE))
+ | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_DEVICE_TYPE) \
+ | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_INDIRECT))
static void
cp_parser_omp_declare_target (cp_parser *parser, cp_token *pragma_tok)
{
tree clauses = NULL_TREE;
int device_type = 0;
+ bool indirect = false;
bool only_device_type = true;
if (cp_lexer_next_token_is (parser->lexer, CPP_NAME)
|| (cp_lexer_next_token_is (parser->lexer, CPP_COMMA)
@@ -48215,21 +48280,26 @@ cp_parser_omp_declare_target (cp_parser *parser, cp_token *pragma_tok)
else
{
cp_omp_declare_target_attr a
- = { parser->lexer->in_omp_attribute_pragma, -1 };
+ = { parser->lexer->in_omp_attribute_pragma, -1, false };
vec_safe_push (scope_chain->omp_declare_target_attribute, a);
cp_parser_require_pragma_eol (parser, pragma_tok);
return;
}
- for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
- if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_DEVICE_TYPE)
- device_type |= OMP_CLAUSE_DEVICE_TYPE_KIND (c);
for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
{
if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_DEVICE_TYPE)
+ device_type |= OMP_CLAUSE_DEVICE_TYPE_KIND (c);
+ if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_INDIRECT)
+ indirect |= !integer_zerop (OMP_CLAUSE_INDIRECT_EXPR (c));
+ }
+ for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
+ {
+ if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_DEVICE_TYPE
+ || OMP_CLAUSE_CODE (c) == OMP_CLAUSE_INDIRECT)
continue;
tree t = OMP_CLAUSE_DECL (c);
only_device_type = false;
- if (!handle_omp_declare_target_clause (c, t, device_type))
+ if (!handle_omp_declare_target_clause (c, t, device_type, indirect))
continue;
if (VAR_OR_FUNCTION_DECL_P (t)
&& DECL_LOCAL_DECL_P (t)
@@ -48237,7 +48307,7 @@ cp_parser_omp_declare_target (cp_parser *parser, cp_token *pragma_tok)
&& DECL_LOCAL_DECL_ALIAS (t)
&& DECL_LOCAL_DECL_ALIAS (t) != error_mark_node)
handle_omp_declare_target_clause (c, DECL_LOCAL_DECL_ALIAS (t),
- device_type);
+ device_type, indirect);
}
if (device_type && only_device_type)
error_at (OMP_CLAUSE_LOCATION (clauses),
@@ -48250,7 +48320,8 @@ cp_parser_omp_declare_target (cp_parser *parser, cp_token *pragma_tok)
# pragma omp begin declare target clauses[optseq] new-line */
#define OMP_BEGIN_DECLARE_TARGET_CLAUSE_MASK \
- (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_DEVICE_TYPE)
+ ( (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_DEVICE_TYPE) \
+ | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_INDIRECT))
static void
cp_parser_omp_begin (cp_parser *parser, cp_token *pragma_tok)
@@ -48280,11 +48351,16 @@ cp_parser_omp_begin (cp_parser *parser, cp_token *pragma_tok)
"#pragma omp begin declare target",
pragma_tok);
int device_type = 0;
+ bool indirect = false;
for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
- if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_DEVICE_TYPE)
- device_type |= OMP_CLAUSE_DEVICE_TYPE_KIND (c);
+ {
+ if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_DEVICE_TYPE)
+ device_type |= OMP_CLAUSE_DEVICE_TYPE_KIND (c);
+ if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_INDIRECT)
+ indirect |= !integer_zerop (OMP_CLAUSE_INDIRECT_EXPR (c));
+ }
cp_omp_declare_target_attr a
- = { in_omp_attribute_pragma, device_type };
+ = { in_omp_attribute_pragma, device_type, indirect };
vec_safe_push (scope_chain->omp_declare_target_attribute, a);
}
else
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 80ef1364e33..5fd7ccb187e 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -8830,6 +8830,7 @@ finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
case OMP_CLAUSE_IF_PRESENT:
case OMP_CLAUSE_FINALIZE:
case OMP_CLAUSE_NOHOST:
+ case OMP_CLAUSE_INDIRECT:
break;
case OMP_CLAUSE_MERGEABLE:
diff --git a/gcc/lto-cgraph.cc b/gcc/lto-cgraph.cc
index 32c0f5ac6db..db6a22a444e 100644
--- a/gcc/lto-cgraph.cc
+++ b/gcc/lto-cgraph.cc
@@ -68,6 +68,7 @@ enum LTO_symtab_tags
LTO_symtab_edge,
LTO_symtab_indirect_edge,
LTO_symtab_variable,
+ LTO_symtab_indirect_function,
LTO_symtab_last_tag
};
@@ -1111,6 +1112,18 @@ output_offload_tables (void)
(*offload_vars)[i]);
}
+ for (unsigned i = 0; i < vec_safe_length (offload_ind_funcs); i++)
+ {
+ symtab_node *node = symtab_node::get ((*offload_ind_funcs)[i]);
+ if (!node)
+ continue;
+ node->force_output = true;
+ streamer_write_enum (ob->main_stream, LTO_symtab_tags,
+ LTO_symtab_last_tag, LTO_symtab_indirect_function);
+ lto_output_fn_decl_ref (ob->decl_state, ob->main_stream,
+ (*offload_ind_funcs)[i]);
+ }
+
if (output_requires)
{
HOST_WIDE_INT val = ((HOST_WIDE_INT) omp_requires_mask
@@ -1134,6 +1147,7 @@ output_offload_tables (void)
{
vec_free (offload_funcs);
vec_free (offload_vars);
+ vec_free (offload_ind_funcs);
}
}
@@ -1863,6 +1877,19 @@ input_offload_tables (bool do_force_output)
varpool_node::get (var_decl)->force_output = 1;
tmp_decl = var_decl;
}
+ else if (tag == LTO_symtab_indirect_function)
+ {
+ tree fn_decl
+ = lto_input_fn_decl_ref (ib, file_data);
+ vec_safe_push (offload_ind_funcs, fn_decl);
+
+ /* Prevent IPA from removing fn_decl as unreachable, since there
+ may be no direct references to an indirect function in offload
+ LTO mode. */
+ if (do_force_output)
+ cgraph_node::get (fn_decl)->mark_force_output ();
+ tmp_decl = fn_decl;
+ }
else if (tag == LTO_symtab_edge)
{
static bool error_emitted = false;
diff --git a/gcc/lto-section-names.h b/gcc/lto-section-names.h
index aa1b2f2eeff..f7ed622772f 100644
--- a/gcc/lto-section-names.h
+++ b/gcc/lto-section-names.h
@@ -37,5 +37,6 @@ extern const char *section_name_prefix;
#define OFFLOAD_VAR_TABLE_SECTION_NAME ".gnu.offload_vars"
#define OFFLOAD_FUNC_TABLE_SECTION_NAME ".gnu.offload_funcs"
+#define OFFLOAD_IND_FUNC_TABLE_SECTION_NAME ".gnu.offload_ind_funcs"
#endif /* GCC_LTO_SECTION_NAMES_H */
diff --git a/gcc/omp-builtins.def b/gcc/omp-builtins.def
index e0f03263db0..ed78d49d205 100644
--- a/gcc/omp-builtins.def
+++ b/gcc/omp-builtins.def
@@ -445,6 +445,9 @@ DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TARGET_UPDATE, "GOMP_target_update_ext",
DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TARGET_ENTER_EXIT_DATA,
"GOMP_target_enter_exit_data",
BT_FN_VOID_INT_SIZE_PTR_PTR_PTR_UINT_PTR, ATTR_NOTHROW_LIST)
+DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TARGET_MAP_INDIRECT_PTR,
+ "GOMP_target_map_indirect_ptr",
+ BT_FN_PTR_PTR, ATTR_NOTHROW_LIST)
DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TEAMS4, "GOMP_teams4",
BT_FN_BOOL_UINT_UINT_UINT_BOOL, ATTR_NOTHROW_LIST)
DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TEAMS_REG, "GOMP_teams_reg",
diff --git a/gcc/omp-offload.cc b/gcc/omp-offload.cc
index 0d3c8794d54..3f14595b5b4 100644
--- a/gcc/omp-offload.cc
+++ b/gcc/omp-offload.cc
@@ -86,7 +86,7 @@ struct oacc_loop
};
/* Holds offload tables with decls. */
-vec<tree, va_gc> *offload_funcs, *offload_vars;
+vec<tree, va_gc> *offload_funcs, *offload_vars, *offload_ind_funcs;
/* Return level at which oacc routine may spawn a partitioned loop, or
-1 if it is not a routine (i.e. is an offload fn). */
@@ -351,6 +351,9 @@ omp_discover_implicit_declare_target (void)
if (DECL_SAVED_TREE (node->decl))
{
struct cgraph_node *cgn;
+ if (lookup_attribute ("omp declare target indirect",
+ DECL_ATTRIBUTES (node->decl)))
+ vec_safe_push (offload_ind_funcs, node->decl);
if (omp_declare_target_fn_p (node->decl))
worklist.safe_push (node->decl);
else if (DECL_STRUCT_FUNCTION (node->decl)
@@ -397,49 +400,66 @@ omp_finish_file (void)
{
unsigned num_funcs = vec_safe_length (offload_funcs);
unsigned num_vars = vec_safe_length (offload_vars);
+ unsigned num_ind_funcs = vec_safe_length (offload_ind_funcs);
- if (num_funcs == 0 && num_vars == 0)
+ if (num_funcs == 0 && num_vars == 0 && num_ind_funcs == 0)
return;
if (targetm_common.have_named_sections)
{
- vec<constructor_elt, va_gc> *v_f, *v_v;
+ vec<constructor_elt, va_gc> *v_f, *v_v, *v_if;
vec_alloc (v_f, num_funcs);
vec_alloc (v_v, num_vars * 2);
+ vec_alloc (v_if, num_ind_funcs);
add_decls_addresses_to_decl_constructor (offload_funcs, v_f);
add_decls_addresses_to_decl_constructor (offload_vars, v_v);
+ add_decls_addresses_to_decl_constructor (offload_ind_funcs, v_if);
tree vars_decl_type = build_array_type_nelts (pointer_sized_int_node,
vec_safe_length (v_v));
tree funcs_decl_type = build_array_type_nelts (pointer_sized_int_node,
num_funcs);
+ tree ind_funcs_decl_type = build_array_type_nelts (pointer_sized_int_node,
+ num_ind_funcs);
+
SET_TYPE_ALIGN (vars_decl_type, TYPE_ALIGN (pointer_sized_int_node));
SET_TYPE_ALIGN (funcs_decl_type, TYPE_ALIGN (pointer_sized_int_node));
+ SET_TYPE_ALIGN (ind_funcs_decl_type, TYPE_ALIGN (pointer_sized_int_node));
tree ctor_v = build_constructor (vars_decl_type, v_v);
tree ctor_f = build_constructor (funcs_decl_type, v_f);
- TREE_CONSTANT (ctor_v) = TREE_CONSTANT (ctor_f) = 1;
- TREE_STATIC (ctor_v) = TREE_STATIC (ctor_f) = 1;
+ tree ctor_if = build_constructor (ind_funcs_decl_type, v_if);
+ TREE_CONSTANT (ctor_v) = TREE_CONSTANT (ctor_f) = TREE_CONSTANT (ctor_if) = 1;
+ TREE_STATIC (ctor_v) = TREE_STATIC (ctor_f) = TREE_STATIC (ctor_if) = 1;
tree funcs_decl = build_decl (UNKNOWN_LOCATION, VAR_DECL,
get_identifier (".offload_func_table"),
funcs_decl_type);
tree vars_decl = build_decl (UNKNOWN_LOCATION, VAR_DECL,
get_identifier (".offload_var_table"),
vars_decl_type);
- TREE_STATIC (funcs_decl) = TREE_STATIC (vars_decl) = 1;
+ tree ind_funcs_decl = build_decl (UNKNOWN_LOCATION, VAR_DECL,
+ get_identifier (".offload_ind_func_table"),
+ ind_funcs_decl_type);
+ TREE_STATIC (funcs_decl) = TREE_STATIC (ind_funcs_decl) = 1;
+ TREE_STATIC (vars_decl) = 1;
/* Do not align tables more than TYPE_ALIGN (pointer_sized_int_node),
otherwise a joint table in a binary will contain padding between
tables from multiple object files. */
- DECL_USER_ALIGN (funcs_decl) = DECL_USER_ALIGN (vars_decl) = 1;
+ DECL_USER_ALIGN (funcs_decl) = DECL_USER_ALIGN (ind_funcs_decl) = 1;
+ DECL_USER_ALIGN (vars_decl) = 1;
SET_DECL_ALIGN (funcs_decl, TYPE_ALIGN (funcs_decl_type));
SET_DECL_ALIGN (vars_decl, TYPE_ALIGN (vars_decl_type));
+ SET_DECL_ALIGN (ind_funcs_decl, TYPE_ALIGN (ind_funcs_decl_type));
DECL_INITIAL (funcs_decl) = ctor_f;
DECL_INITIAL (vars_decl) = ctor_v;
+ DECL_INITIAL (ind_funcs_decl) = ctor_if;
set_decl_section_name (funcs_decl, OFFLOAD_FUNC_TABLE_SECTION_NAME);
set_decl_section_name (vars_decl, OFFLOAD_VAR_TABLE_SECTION_NAME);
-
+ set_decl_section_name (ind_funcs_decl,
+ OFFLOAD_IND_FUNC_TABLE_SECTION_NAME);
varpool_node::finalize_decl (vars_decl);
varpool_node::finalize_decl (funcs_decl);
+ varpool_node::finalize_decl (ind_funcs_decl);
}
else
{
@@ -471,6 +491,15 @@ omp_finish_file (void)
#endif
targetm.record_offload_symbol (it);
}
+ for (unsigned i = 0; i < num_ind_funcs; i++)
+ {
+ tree it = (*offload_ind_funcs)[i];
+ /* See also add_decls_addresses_to_decl_constructor
+ and output_offload_tables in lto-cgraph.cc. */
+ if (!in_lto_p && !symtab_node::get (it))
+ continue;
+ targetm.record_offload_symbol (it);
+ }
}
}
@@ -2603,6 +2632,8 @@ execute_omp_device_lower ()
gimple_stmt_iterator gsi;
bool calls_declare_variant_alt
= cgraph_node::get (cfun->decl)->calls_declare_variant_alt;
+ bool omp_redirect_indirect_calls
+ = vec_safe_length (offload_ind_funcs) > 0;
FOR_EACH_BB_FN (bb, cfun)
for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
{
@@ -2621,6 +2652,33 @@ execute_omp_device_lower ()
update_stmt (stmt);
}
}
+ if (omp_redirect_indirect_calls
+ && gimple_call_fndecl (stmt) == NULL_TREE)
+ {
+ gcall *orig_call = dyn_cast <gcall *> (stmt);
+ tree call_fn = gimple_call_fn (stmt);
+ tree map_ptr_fn
+ = builtin_decl_explicit (BUILT_IN_GOMP_TARGET_MAP_INDIRECT_PTR);
+ tree fn_ty = TREE_TYPE (call_fn);
+
+ if (TREE_CODE (call_fn) == OBJ_TYPE_REF)
+ {
+ tree obj_ref = create_tmp_reg (TREE_TYPE (call_fn),
+ ".ind_fn_objref");
+ gimple *gassign = gimple_build_assign (obj_ref, call_fn);
+ gsi_insert_before (&gsi, gassign, GSI_SAME_STMT);
+ call_fn = obj_ref;
+ }
+ tree mapped_fn = create_tmp_reg (fn_ty, ".ind_fn");
+ gimple *map_call
+ = gimple_build_call (map_ptr_fn, 1, call_fn);
+ gimple_set_location (map_call, gimple_location (stmt));
+ gimple_call_set_lhs (map_call, mapped_fn);
+ gsi_insert_before (&gsi, map_call, GSI_SAME_STMT);
+
+ gimple_call_set_fn (orig_call, mapped_fn);
+ update_stmt (orig_call);
+ }
continue;
}
tree lhs = gimple_call_lhs (stmt), rhs = NULL_TREE;
@@ -2759,7 +2817,8 @@ public:
{
return (!(fun->curr_properties & PROP_gimple_lomp_dev)
|| (flag_openmp
- && cgraph_node::get (fun->decl)->calls_declare_variant_alt));
+ && (cgraph_node::get (fun->decl)->calls_declare_variant_alt
+ || vec_safe_length (offload_ind_funcs) > 0)));
}
unsigned int execute (function *) final override
{
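The effect of the omp_device_lower change above can be pictured at the source level. The following is only a rough sketch, not the GIMPLE the pass emits: map_indirect_ptr_stub is a hypothetical stand-in for GOMP_target_map_indirect_ptr that behaves like the host fallback (identity), and twice, call_original and call_lowered are illustrative names that do not appear in the patch.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*op_fn) (int);

/* Illustrative callee; hypothetical, not part of the patch.  */
static int
twice (int x)
{
  return 2 * x;
}

/* Hypothetical stand-in for GOMP_target_map_indirect_ptr.  It returns
   the pointer unchanged, as the host fallback in the patch does.  */
static void *
map_indirect_ptr_stub (void *ptr)
{
  return ptr;
}

/* The indirect call as the user writes it.  */
static int
call_original (op_fn fp, int arg)
{
  return fp (arg);
}

/* Roughly what the call looks like after lowering: the pointer is first
   passed through the mapping function, then the call goes through the
   result.  The casts between function and object pointers are a common
   extension (accepted by GCC), not strict ISO C.  */
static int
call_lowered (op_fn fp, int arg)
{
  op_fn mapped = (op_fn) map_indirect_ptr_stub ((void *) fp);
  return mapped (arg);
}
```

On a real offload target the mapping step would translate a host address into the matching device address, so the two calls would differ in which copy of the function runs.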
diff --git a/gcc/omp-offload.h b/gcc/omp-offload.h
index 73711e74c7d..ae364422417 100644
--- a/gcc/omp-offload.h
+++ b/gcc/omp-offload.h
@@ -28,6 +28,7 @@ extern int oacc_fn_attrib_level (tree attr);
extern GTY(()) vec<tree, va_gc> *offload_funcs;
extern GTY(()) vec<tree, va_gc> *offload_vars;
+extern GTY(()) vec<tree, va_gc> *offload_ind_funcs;
extern void omp_finish_file (void);
extern void omp_discover_implicit_declare_target (void);
diff --git a/gcc/testsuite/c-c++-common/gomp/declare-target-indirect-1.c b/gcc/testsuite/c-c++-common/gomp/declare-target-indirect-1.c
new file mode 100644
index 00000000000..45e3ee598da
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/declare-target-indirect-1.c
@@ -0,0 +1,51 @@
+/* { dg-do compile } */
+/* { dg-options "-fopenmp" } */
+
+extern int a, b;
+#define X 1
+#define Y 0
+
+#pragma omp begin declare target indirect
+void fn1 (void) { }
+#pragma omp end declare target
+
+#pragma omp begin declare target indirect (1)
+void fn2 (void) { }
+#pragma omp end declare target
+
+#pragma omp begin declare target indirect (0)
+void fn3 (void) { }
+#pragma omp end declare target
+
+void fn4 (void) { }
+#pragma omp declare target indirect to (fn4)
+
+void fn5 (void) { }
+#pragma omp declare target indirect (1) to (fn5)
+
+void fn6 (void) { }
+#pragma omp declare target indirect (0) to (fn6)
+
+void fn7 (void) { }
+#pragma omp declare target indirect (-1) to (fn7)
+
+/* Compile-time non-constant expressions are not allowed. */
+void fn8 (void) { }
+#pragma omp declare target indirect (a + b) to (fn8) /* { dg-error "expected constant integer expression" } */
+
+/* Compile-time constant expressions are permissible. */
+void fn9 (void) { }
+#pragma omp declare target indirect (X*Y) to (fn9)
+
+/* 'omp declare target'...'omp end declare target' form cannot take
+ clauses. */
+#pragma omp declare target indirect
+void fn10 (void) { }
+#pragma omp end declare target /* { dg-error "'#pragma omp end declare target' without corresponding '#pragma omp declare target' or '#pragma omp begin declare target'" } */
+
+void fn11 (void) { }
+#pragma omp declare target indirect (1) indirect (0) to (fn11) /* { dg-error "too many .indirect. clauses" } */
+
+/* Indirect on a variable should have no effect. */
+int x;
+#pragma omp declare target indirect to(x)
diff --git a/gcc/testsuite/c-c++-common/gomp/declare-target-indirect-2.c b/gcc/testsuite/c-c++-common/gomp/declare-target-indirect-2.c
new file mode 100644
index 00000000000..48ba4f86362
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/declare-target-indirect-2.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-fopenmp -fdump-tree-gimple" } */
+
+#pragma omp begin declare target indirect
+void fn1 (void) { }
+#pragma omp end declare target
+/* { dg-final { scan-tree-dump "__attribute__\\\(\\\(omp declare target, omp declare target block, omp declare target indirect\\\)\\\)\\\nvoid fn1" "gimple" } } */
+
+#pragma omp begin declare target indirect (0)
+void fn2 (void) { }
+#pragma omp end declare target
+/* { dg-final { scan-tree-dump "__attribute__\\\(\\\(omp declare target, omp declare target block\\\)\\\)\\\nvoid fn2" "gimple" } } */
+
+void fn3 (void) { }
+#pragma omp declare target indirect to (fn3)
+/* { dg-final { scan-tree-dump "__attribute__\\\(\\\(omp declare target indirect, omp declare target\\\)\\\)\\\nvoid fn3" "gimple" } } */
+
+void fn4 (void) { }
+#pragma omp declare target indirect (0) to (fn4)
+/* { dg-final { scan-tree-dump "__attribute__\\\(\\\(omp declare target\\\)\\\)\\\nvoid fn4" "gimple" } } */
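For orientation, the clause tested above would typically be used along the following lines. This is a minimal sketch and not one of the patch's testcases; add_one, add_two and call_on_device are hypothetical names. It assumes a compiler that accepts the 'indirect' clause when built with -fopenmp; without -fopenmp the pragmas are ignored and everything runs on the host.

```c
#include <assert.h>

/* Functions declared 'indirect' may be called through a function
   pointer inside a target region.  */
#pragma omp begin declare target indirect
int add_one (int x) { return x + 1; }
int add_two (int x) { return x + 2; }
#pragma omp end declare target

/* Hypothetical helper: call FP with ARG inside a target region.  FP
   holds a host address; with offloading enabled the runtime is
   expected to translate it to the device address before the call.  */
int
call_on_device (int (*fp) (int), int arg)
{
  int result = 0;

  assert (fp);
#pragma omp target map(tofrom: result)
  result = fp (arg);

  return result;
}
```

A caller would pass the host function pointer directly, for example call_on_device (add_two, 40).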
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index 91551fde900..f7549017df6 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -347,6 +347,9 @@ enum omp_clause_code {
/* OpenMP clause: doacross ({source,sink}:vec). */
OMP_CLAUSE_DOACROSS,
+ /* OpenMP clause: indirect [(constant-integer-expression)]. */
+ OMP_CLAUSE_INDIRECT,
+
/* Internal structure to hold OpenACC cache directive's variable-list.
#pragma acc cache (variable-list). */
OMP_CLAUSE__CACHE_,
diff --git a/gcc/tree.cc b/gcc/tree.cc
index 54ca5e750df..34a03ff04c8 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -269,6 +269,7 @@ unsigned const char omp_clause_num_ops[] =
2, /* OMP_CLAUSE_MAP */
1, /* OMP_CLAUSE_HAS_DEVICE_ADDR */
1, /* OMP_CLAUSE_DOACROSS */
+ 1, /* OMP_CLAUSE_INDIRECT */
2, /* OMP_CLAUSE__CACHE_ */
2, /* OMP_CLAUSE_GANG */
1, /* OMP_CLAUSE_ASYNC */
@@ -360,6 +361,7 @@ const char * const omp_clause_code_name[] =
"map",
"has_device_addr",
"doacross",
+ "indirect",
"_cache_",
"gang",
"async",
diff --git a/gcc/tree.h b/gcc/tree.h
index 005c157e9b0..78b15c24641 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -1842,6 +1842,10 @@ class auto_suppress_location_wrappers
#define OMP_CLAUSE_DEVICE_TYPE_KIND(NODE) \
(OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_DEVICE_TYPE)->omp_clause.subcode.device_type_kind)
+#define OMP_CLAUSE_INDIRECT_EXPR(NODE) \
+ OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_INDIRECT), 0)
+
+
/* True if there is a device clause with a device-modifier 'ancestor'. */
#define OMP_CLAUSE_DEVICE_ANCESTOR(NODE) \
(OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_DEVICE)->base.public_flag)
diff --git a/libgcc/offloadstuff.c b/libgcc/offloadstuff.c
index 4e1c4d41dd5..18c5bf89b69 100644
--- a/libgcc/offloadstuff.c
+++ b/libgcc/offloadstuff.c
@@ -43,6 +43,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
#if defined(HAVE_GAS_HIDDEN) && ENABLE_OFFLOADING == 1
#define OFFLOAD_FUNC_TABLE_SECTION_NAME ".gnu.offload_funcs"
+#define OFFLOAD_IND_FUNC_TABLE_SECTION_NAME ".gnu.offload_ind_funcs"
#define OFFLOAD_VAR_TABLE_SECTION_NAME ".gnu.offload_vars"
#ifdef CRT_BEGIN
@@ -53,6 +54,9 @@ const void *const __offload_func_table[0]
const void *const __offload_var_table[0]
__attribute__ ((__used__, visibility ("hidden"),
section (OFFLOAD_VAR_TABLE_SECTION_NAME))) = { };
+const void *const __offload_ind_func_table[0]
+ __attribute__ ((__used__, visibility ("hidden"),
+ section (OFFLOAD_IND_FUNC_TABLE_SECTION_NAME))) = { };
#elif defined CRT_END
@@ -62,19 +66,25 @@ const void *const __offload_funcs_end[0]
const void *const __offload_vars_end[0]
__attribute__ ((__used__, visibility ("hidden"),
section (OFFLOAD_VAR_TABLE_SECTION_NAME))) = { };
+const void *const __offload_ind_funcs_end[0]
+ __attribute__ ((__used__, visibility ("hidden"),
+ section (OFFLOAD_IND_FUNC_TABLE_SECTION_NAME))) = { };
#elif defined CRT_TABLE
extern const void *const __offload_func_table[];
extern const void *const __offload_var_table[];
+extern const void *const __offload_ind_func_table[];
extern const void *const __offload_funcs_end[];
extern const void *const __offload_vars_end[];
+extern const void *const __offload_ind_funcs_end[];
const void *const __OFFLOAD_TABLE__[]
__attribute__ ((__visibility__ ("hidden"))) =
{
&__offload_func_table, &__offload_funcs_end,
- &__offload_var_table, &__offload_vars_end
+ &__offload_var_table, &__offload_vars_end,
+ &__offload_ind_func_table, &__offload_ind_funcs_end,
};
#else /* ! CRT_BEGIN && ! CRT_END && ! CRT_TABLE */
diff --git a/libgomp/Makefile.am b/libgomp/Makefile.am
index 428f7a9dab5..2549ea8906b 100644
--- a/libgomp/Makefile.am
+++ b/libgomp/Makefile.am
@@ -67,7 +67,7 @@ libgomp_la_SOURCES = alloc.c atomic.c barrier.c critical.c env.c error.c \
target.c splay-tree.c libgomp-plugin.c oacc-parallel.c oacc-host.c \
oacc-init.c oacc-mem.c oacc-async.c oacc-plugin.c oacc-cuda.c \
priority_queue.c affinity-fmt.c teams.c allocator.c oacc-profiling.c \
- oacc-target.c
+ oacc-target.c target-indirect.c
include $(top_srcdir)/plugin/Makefrag.am
diff --git a/libgomp/Makefile.in b/libgomp/Makefile.in
index 431bc87b629..32e64292a6d 100644
--- a/libgomp/Makefile.in
+++ b/libgomp/Makefile.in
@@ -219,7 +219,7 @@ am_libgomp_la_OBJECTS = alloc.lo atomic.lo barrier.lo critical.lo \
oacc-parallel.lo oacc-host.lo oacc-init.lo oacc-mem.lo \
oacc-async.lo oacc-plugin.lo oacc-cuda.lo priority_queue.lo \
affinity-fmt.lo teams.lo allocator.lo oacc-profiling.lo \
- oacc-target.lo $(am__objects_1)
+ oacc-target.lo target-indirect.lo $(am__objects_1)
libgomp_la_OBJECTS = $(am_libgomp_la_OBJECTS)
AM_V_P = $(am__v_P_@AM_V@)
am__v_P_ = $(am__v_P_@AM_DEFAULT_V@)
@@ -549,7 +549,7 @@ libgomp_la_SOURCES = alloc.c atomic.c barrier.c critical.c env.c \
oacc-parallel.c oacc-host.c oacc-init.c oacc-mem.c \
oacc-async.c oacc-plugin.c oacc-cuda.c priority_queue.c \
affinity-fmt.c teams.c allocator.c oacc-profiling.c \
- oacc-target.c $(am__append_3)
+ oacc-target.c target-indirect.c $(am__append_3)
# Nvidia PTX OpenACC plugin.
@PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_version_info = -version-info $(libtool_VERSION)
@@ -777,6 +777,7 @@ distclean-compile:
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/sem.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/single.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/splay-tree.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/target-indirect.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/target.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/task.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/team.Plo@am__quote@
diff --git a/libgomp/config/accel/target-indirect.c b/libgomp/config/accel/target-indirect.c
new file mode 100644
index 00000000000..6ee82a0ebd0
--- /dev/null
+++ b/libgomp/config/accel/target-indirect.c
@@ -0,0 +1,126 @@
+/* Copyright (C) 2023 Free Software Foundation, Inc.
+
+ Contributed by Siemens.
+
+ This file is part of the GNU Offloading and Multi Processing Library
+ (libgomp).
+
+ Libgomp is free software; you can redistribute it and/or modify it
+ under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 3, or (at your option)
+ any later version.
+
+ Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+ WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+ FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ more details.
+
+ Under Section 7 of GPL version 3, you are granted additional
+ permissions described in the GCC Runtime Library Exception, version
+ 3.1, as published by the Free Software Foundation.
+
+ You should have received a copy of the GNU General Public License and
+ a copy of the GCC Runtime Library Exception along with this program;
+ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include <assert.h>
+#include "libgomp.h"
+
+#define splay_tree_prefix indirect
+#define splay_tree_c
+#include "splay-tree.h"
+
+volatile void **GOMP_INDIRECT_ADDR_MAP = NULL;
+
+/* Use a splay tree to lookup the target address instead of using a
+ linear search. */
+#define USE_SPLAY_TREE_LOOKUP
+
+#ifdef USE_SPLAY_TREE_LOOKUP
+
+static struct indirect_splay_tree_s indirect_map;
+static indirect_splay_tree_node indirect_array = NULL;
+
+/* Build the splay tree used for host->target address lookups. */
+
+void
+build_indirect_map (void)
+{
+ size_t num_ind_funcs = 0;
+ volatile void **map_entry;
+ static int lock = 0; /* == gomp_mutex_t lock; gomp_mutex_init (&lock); */
+
+ if (!GOMP_INDIRECT_ADDR_MAP)
+ return;
+
+ gomp_mutex_lock (&lock);
+
+ if (!indirect_array)
+ {
+ /* Count the number of entries in the NULL-terminated address map. */
+ for (map_entry = GOMP_INDIRECT_ADDR_MAP; *map_entry;
+ map_entry += 2, num_ind_funcs++);
+
+ /* Build splay tree for address lookup. */
+ indirect_array = gomp_malloc (num_ind_funcs * sizeof (*indirect_array));
+ indirect_splay_tree_node array = indirect_array;
+ map_entry = GOMP_INDIRECT_ADDR_MAP;
+
+ for (size_t i = 0; i < num_ind_funcs; i++, array++)
+ {
+ indirect_splay_tree_key k = &array->key;
+ k->host_addr = (uint64_t) *map_entry++;
+ k->target_addr = (uint64_t) *map_entry++;
+ array->left = NULL;
+ array->right = NULL;
+ indirect_splay_tree_insert (&indirect_map, array);
+ }
+ }
+
+ gomp_mutex_unlock (&lock);
+}
+
+void *
+GOMP_target_map_indirect_ptr (void *ptr)
+{
+ /* NULL pointers always resolve to NULL. */
+ if (!ptr)
+ return ptr;
+
+ assert (indirect_array);
+
+ struct indirect_splay_tree_key_s k;
+ indirect_splay_tree_key node = NULL;
+
+ k.host_addr = (uint64_t) ptr;
+ node = indirect_splay_tree_lookup (&indirect_map, &k);
+
+ return node ? (void *) node->target_addr : ptr;
+}
+
+#else
+
+void
+build_indirect_map (void)
+{
+}
+
+void *
+GOMP_target_map_indirect_ptr (void *ptr)
+{
+ /* NULL pointers always resolve to NULL. */
+ if (!ptr)
+ return ptr;
+
+ assert (GOMP_INDIRECT_ADDR_MAP);
+
+ for (volatile void **map_entry = GOMP_INDIRECT_ADDR_MAP; *map_entry;
+ map_entry += 2)
+ if (*map_entry == ptr)
+ return (void *) *(map_entry + 1);
+
+ return ptr;
+}
+
+#endif
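The linear-search variant above can be tried on the host in isolation. The sketch below restates it over plain uint64_t values, assuming the same layout as in the patch: a zero-terminated sequence of (host address, target address) pairs. The function name lookup_indirect_addr and the addresses in example_map are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example map: two (host, target) pairs followed by the
   zero terminator.  */
static const uint64_t example_map[] = {
  0x1000, 0xa000,
  0x2000, 0xb000,
  0
};

/* Look up HOST_ADDR in MAP.  Return the paired target address if an
   entry matches, otherwise return HOST_ADDR unchanged.  A zero address
   always maps to zero.  */
static uint64_t
lookup_indirect_addr (const uint64_t *map, uint64_t host_addr)
{
  if (host_addr == 0)
    return 0;

  assert (map);

  for (const uint64_t *entry = map; *entry; entry += 2)
    if (entry[0] == host_addr)
      return entry[1];

  return host_addr;
}
```

Returning the original address on a miss mirrors the patch's behaviour, so pointers to functions that were never declared indirect pass through untouched.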
diff --git a/libgomp/config/gcn/team.c b/libgomp/config/gcn/team.c
index f03207c84e3..fb20cbbcf9f 100644
--- a/libgomp/config/gcn/team.c
+++ b/libgomp/config/gcn/team.c
@@ -30,6 +30,7 @@
#include <string.h>
static void gomp_thread_start (struct gomp_thread_pool *);
+extern void build_indirect_map (void);
/* This externally visible function handles target region entry. It
sets up a per-team thread pool and transfers control by returning to
@@ -45,6 +46,9 @@ gomp_gcn_enter_kernel (void)
{
int threadid = __builtin_gcn_dim_pos (1);
+ /* Initialize indirect function support. */
+ build_indirect_map ();
+
if (threadid == 0)
{
int numthreads = __builtin_gcn_dim_size (1);
diff --git a/libgomp/config/linux/target-indirect.c b/libgomp/config/linux/target-indirect.c
new file mode 100644
index 00000000000..de2c708bb81
--- /dev/null
+++ b/libgomp/config/linux/target-indirect.c
@@ -0,0 +1,31 @@
+/* Copyright (C) 2023 Free Software Foundation, Inc.
+
+ Contributed by Siemens.
+
+ This file is part of the GNU Offloading and Multi Processing Library
+ (libgomp).
+
+ Libgomp is free software; you can redistribute it and/or modify it
+ under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 3, or (at your option)
+ any later version.
+
+ Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+ WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+ FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ more details.
+
+ Under Section 7 of GPL version 3, you are granted additional
+ permissions described in the GCC Runtime Library Exception, version
+ 3.1, as published by the Free Software Foundation.
+
+ You should have received a copy of the GNU General Public License and
+ a copy of the GCC Runtime Library Exception along with this program;
+ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+void *
+GOMP_target_map_indirect_ptr (void *ptr)
+{
+ return ptr;
+}
diff --git a/libgomp/config/nvptx/team.c b/libgomp/config/nvptx/team.c
index af5f3171a47..59521fabd99 100644
--- a/libgomp/config/nvptx/team.c
+++ b/libgomp/config/nvptx/team.c
@@ -35,6 +35,7 @@ struct gomp_thread *nvptx_thrs __attribute__((shared,nocommon));
int __gomp_team_num __attribute__((shared,nocommon));
static void gomp_thread_start (struct gomp_thread_pool *);
+extern void build_indirect_map (void);
/* This externally visible function handles target region entry. It
@@ -52,6 +53,10 @@ gomp_nvptx_main (void (*fn) (void *), void *fn_data)
int tid, ntids;
asm ("mov.u32 %0, %%tid.y;" : "=r" (tid));
asm ("mov.u32 %0, %%ntid.y;" : "=r" (ntids));
+
+ /* Initialize indirect function support. */
+ build_indirect_map ();
+
if (tid == 0)
{
gomp_global_icv.nthreads_var = ntids;
diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h
index dc993882c3b..3ce032c5cc0 100644
--- a/libgomp/libgomp-plugin.h
+++ b/libgomp/libgomp-plugin.h
@@ -107,6 +107,8 @@ struct addr_pair
must be stringified). */
#define GOMP_ADDITIONAL_ICVS __gomp_additional_icvs
+#define GOMP_INDIRECT_ADDR_MAP __gomp_indirect_addr_map
+
/* Miscellaneous functions. */
extern void *GOMP_PLUGIN_malloc (size_t) __attribute__ ((malloc));
extern void *GOMP_PLUGIN_malloc_cleared (size_t) __attribute__ ((malloc));
@@ -132,7 +134,8 @@ extern bool GOMP_OFFLOAD_init_device (int);
extern bool GOMP_OFFLOAD_fini_device (int);
extern unsigned GOMP_OFFLOAD_version (void);
extern int GOMP_OFFLOAD_load_image (int, unsigned, const void *,
- struct addr_pair **, uint64_t **);
+ struct addr_pair **, uint64_t **,
+ uint64_t *);
extern bool GOMP_OFFLOAD_unload_image (int, unsigned, const void *);
extern void *GOMP_OFFLOAD_alloc (int, size_t);
extern bool GOMP_OFFLOAD_free (int, void *);
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 68f20651fbf..15a767cf317 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -1274,6 +1274,29 @@ reverse_splay_compare (reverse_splay_tree_key x, reverse_splay_tree_key y)
#define splay_tree_prefix reverse
#include "splay-tree.h"
+/* Indirect target function splay-tree handling. */
+
+struct indirect_splay_tree_key_s {
+ uint64_t host_addr, target_addr;
+};
+
+typedef struct indirect_splay_tree_node_s *indirect_splay_tree_node;
+typedef struct indirect_splay_tree_s *indirect_splay_tree;
+typedef struct indirect_splay_tree_key_s *indirect_splay_tree_key;
+
+static inline int
+indirect_splay_compare (indirect_splay_tree_key x, indirect_splay_tree_key y)
+{
+ if (x->host_addr < y->host_addr)
+ return -1;
+ if (x->host_addr > y->host_addr)
+ return 1;
+ return 0;
+}
+
+#define splay_tree_prefix indirect
+#include "splay-tree.h"
+
struct target_mem_desc {
/* Reference count. */
uintptr_t refcount;
diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
index ce6b719a57f..90c401453b2 100644
--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -419,6 +419,7 @@ GOMP_5.1 {
GOMP_5.1.1 {
global:
GOMP_taskwait_depend_nowait;
+ GOMP_target_map_indirect_ptr;
} GOMP_5.1;
OACC_2.0 {
diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index d24f590fd84..cd2ef065456 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -304,7 +304,7 @@ The OpenMP 4.5 specification is fully supported.
@item Iterators in @code{target update} motion clauses and @code{map}
clauses @tab N @tab
@item Indirect calls to the device version of a procedure or function in
- @code{target} regions @tab N @tab
+ @code{target} regions @tab P @tab Only C and C++
@item @code{interop} directive @tab N @tab
@item @code{omp_interop_t} object support in runtime routines @tab N @tab
@item @code{nowait} clause in @code{taskwait} directive @tab Y @tab
@@ -353,7 +353,7 @@ to address of matching mapped list item per 5.1, Sect. 2.21.7.2 @tab N @tab
@item For Fortran, diagnose placing declarative before/between @code{USE},
@code{IMPORT}, and @code{IMPLICIT} as invalid @tab N @tab
@item Optional comma between directive and clause in the @code{#pragma} form @tab Y @tab
-@item @code{indirect} clause in @code{declare target} @tab N @tab
+@item @code{indirect} clause in @code{declare target} @tab P @tab Only C and C++
@item @code{device_type(nohost)}/@code{device_type(host)} for variables @tab N @tab
@item @code{present} modifier to the @code{map}, @code{to} and @code{from}
clauses @tab Y @tab
diff --git a/libgomp/libgomp_g.h b/libgomp/libgomp_g.h
index 5c1675c7869..95046312ae9 100644
--- a/libgomp/libgomp_g.h
+++ b/libgomp/libgomp_g.h
@@ -357,6 +357,7 @@ extern void GOMP_target_enter_exit_data (int, size_t, void **, size_t *,
void **);
extern void GOMP_teams (unsigned int, unsigned int);
extern bool GOMP_teams4 (unsigned int, unsigned int, unsigned int, bool);
+extern void *GOMP_target_map_indirect_ptr (void *);
/* teams.c */
diff --git a/libgomp/oacc-host.c b/libgomp/oacc-host.c
index 5980d510838..fbab75d7d43 100644
--- a/libgomp/oacc-host.c
+++ b/libgomp/oacc-host.c
@@ -82,7 +82,8 @@ host_load_image (int n __attribute__ ((unused)),
unsigned v __attribute__ ((unused)),
const void *t __attribute__ ((unused)),
struct addr_pair **r __attribute__ ((unused)),
- uint64_t **f __attribute__ ((unused)))
+ uint64_t **f __attribute__ ((unused)),
+ uint64_t *i __attribute__ ((unused)))
{
return 0;
}
diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index ef22d48da79..597bd75798d 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -365,6 +365,7 @@ struct gcn_image_desc
} *gcn_image;
const unsigned kernel_count;
struct hsa_kernel_description *kernel_infos;
+ const unsigned ind_func_count;
const unsigned global_variable_count;
};
@@ -3359,7 +3360,8 @@ GOMP_OFFLOAD_init_device (int n)
int
GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
struct addr_pair **target_table,
- uint64_t **rev_fn_table)
+ uint64_t **rev_fn_table,
+ uint64_t *host_ind_fn_table)
{
if (GOMP_VERSION_DEV (version) != GOMP_VERSION_GCN)
{
@@ -3375,6 +3377,7 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
struct module_info *module;
struct kernel_info *kernel;
int kernel_count = image_desc->kernel_count;
+ unsigned ind_func_count = image_desc->ind_func_count;
unsigned var_count = image_desc->global_variable_count;
/* Currently, "others" is a struct of ICVS. */
int other_count = 1;
@@ -3393,6 +3396,7 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
return -1;
GCN_DEBUG ("Encountered %d kernels in an image\n", kernel_count);
+ GCN_DEBUG ("Encountered %u indirect functions in an image\n", ind_func_count);
GCN_DEBUG ("Encountered %u global variables in an image\n", var_count);
GCN_DEBUG ("Expect %d other variables in an image\n", other_count);
pair = GOMP_PLUGIN_malloc ((kernel_count + var_count + other_count - 2)
@@ -3474,6 +3478,87 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
}
}
+ if (ind_func_count > 0)
+ {
+ hsa_status_t status;
+
+ /* Read indirect function table from image. */
+ hsa_executable_symbol_t ind_funcs_symbol;
+ status = hsa_fns.hsa_executable_get_symbol_fn (agent->executable, NULL,
+ ".offload_ind_func_table",
+ agent->id,
+ 0, &ind_funcs_symbol);
+
+ if (status != HSA_STATUS_SUCCESS)
+ hsa_fatal ("Could not find .offload_ind_func_table symbol in the "
+ "code object", status);
+
+ uint64_t ind_funcs_table_addr;
+ status = hsa_fns.hsa_executable_symbol_get_info_fn
+ (ind_funcs_symbol, HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_ADDRESS,
+ &ind_funcs_table_addr);
+ if (status != HSA_STATUS_SUCCESS)
+ hsa_fatal ("Could not extract a variable from its symbol", status);
+
+ uint64_t ind_funcs_table[ind_func_count];
+ GOMP_OFFLOAD_dev2host (agent->device_id, ind_funcs_table,
+ (void *) ind_funcs_table_addr,
+ sizeof (ind_funcs_table));
+
+ /* Build host->target address map for indirect functions. */
+ uint64_t ind_fn_map[ind_func_count * 2 + 1];
+ for (unsigned i = 0; i < ind_func_count; i++)
+ {
+ ind_fn_map[i * 2] = host_ind_fn_table[i];
+ ind_fn_map[i * 2 + 1] = ind_funcs_table[i];
+ GCN_DEBUG ("Indirect function %u: %lx->%lx\n",
+ i, host_ind_fn_table[i], ind_funcs_table[i]);
+ }
+ ind_fn_map[ind_func_count * 2] = 0;
+
+ /* Write the map onto the target. */
+ void *map_target_addr
+ = GOMP_OFFLOAD_alloc (agent->device_id, sizeof (ind_fn_map));
+ GCN_DEBUG ("Allocated indirect map at %p\n", map_target_addr);
+
+ GOMP_OFFLOAD_host2dev (agent->device_id, map_target_addr,
+ (void*) ind_fn_map,
+ sizeof (ind_fn_map));
+
+ /* Write address of the map onto the target. */
+ hsa_executable_symbol_t symbol;
+
+ status
+ = hsa_fns.hsa_executable_get_symbol_fn (agent->executable, NULL,
+ XSTRING (GOMP_INDIRECT_ADDR_MAP),
+ agent->id, 0, &symbol);
+ if (status != HSA_STATUS_SUCCESS)
+ hsa_fatal ("Could not find GOMP_INDIRECT_ADDR_MAP in code object",
+ status);
+
+ uint64_t varptr;
+ uint32_t varsize;
+
+ status = hsa_fns.hsa_executable_symbol_get_info_fn
+ (symbol, HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_ADDRESS,
+ &varptr);
+ if (status != HSA_STATUS_SUCCESS)
+ hsa_fatal ("Could not extract a variable from its symbol", status);
+ status = hsa_fns.hsa_executable_symbol_get_info_fn
+ (symbol, HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_SIZE,
+ &varsize);
+ if (status != HSA_STATUS_SUCCESS)
+ hsa_fatal ("Could not extract a variable size from its symbol",
+ status);
+
+ GCN_DEBUG ("Found GOMP_INDIRECT_ADDR_MAP at %lx with size %d\n",
+ varptr, varsize);
+
+ GOMP_OFFLOAD_host2dev (agent->device_id, (void *) varptr,
+ &map_target_addr,
+ sizeof (map_target_addr));
+ }
+
GCN_DEBUG ("Looking for variable %s\n", XSTRING (GOMP_ADDITIONAL_ICVS));
hsa_status_t status;
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 00d4241ae02..78aa95c843a 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -266,6 +266,8 @@ typedef struct nvptx_tdata
const struct targ_fn_launch *fn_descs;
unsigned fn_num;
+
+ unsigned ind_fn_num;
} nvptx_tdata_t;
/* Descriptor of a loaded function. */
@@ -1285,12 +1287,13 @@ nvptx_set_clocktick (CUmodule module, struct ptx_device *dev)
int
GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
struct addr_pair **target_table,
- uint64_t **rev_fn_table)
+ uint64_t **rev_fn_table,
+ uint64_t *host_ind_fn_table)
{
CUmodule module;
const char *const *var_names;
const struct targ_fn_launch *fn_descs;
- unsigned int fn_entries, var_entries, other_entries, i, j;
+ unsigned int fn_entries, var_entries, ind_fn_entries, other_entries, i, j;
struct targ_fn_descriptor *targ_fns;
struct addr_pair *targ_tbl;
const nvptx_tdata_t *img_header = (const nvptx_tdata_t *) target_data;
@@ -1319,6 +1322,7 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
var_names = img_header->var_names;
fn_entries = img_header->fn_num;
fn_descs = img_header->fn_descs;
+ ind_fn_entries = img_header->ind_fn_num;
/* Currently, other_entries contains only the struct of ICVs. */
other_entries = 1;
@@ -1373,6 +1377,60 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
targ_tbl->end = targ_tbl->start + bytes;
}
+ if (ind_fn_entries > 0)
+ {
+ CUdeviceptr var;
+ size_t bytes;
+
+ /* Read indirect function table from image. */
+ CUresult r = CUDA_CALL_NOCHECK (cuModuleGetGlobal, &var, &bytes, module,
+ "$offload_ind_func_table");
+ if (r != CUDA_SUCCESS)
+ GOMP_PLUGIN_fatal ("cuModuleGetGlobal error: %s", cuda_error (r));
+ assert (bytes == sizeof (uint64_t) * ind_fn_entries);
+
+ uint64_t ind_fn_table[ind_fn_entries];
+ r = CUDA_CALL_NOCHECK (cuMemcpyDtoH, ind_fn_table, var, bytes);
+ if (r != CUDA_SUCCESS)
+ GOMP_PLUGIN_fatal ("cuMemcpyDtoH error: %s", cuda_error (r));
+
+ /* Build host->target address map for indirect functions. */
+ uint64_t ind_fn_map[ind_fn_entries * 2 + 1];
+ for (unsigned k = 0; k < ind_fn_entries; k++)
+ {
+ ind_fn_map[k * 2] = host_ind_fn_table[k];
+ ind_fn_map[k * 2 + 1] = ind_fn_table[k];
+ GOMP_PLUGIN_debug (0, "Indirect function %u: %lx->%lx\n",
+ k, host_ind_fn_table[k], ind_fn_table[k]);
+ }
+ ind_fn_map[ind_fn_entries * 2] = 0;
+
+ /* Write the map onto the target. */
+ void *map_target_addr
+ = GOMP_OFFLOAD_alloc (ord, sizeof (ind_fn_map));
+ GOMP_PLUGIN_debug (0, "Allocated indirect map at %p\n", map_target_addr);
+
+ GOMP_OFFLOAD_host2dev (ord, map_target_addr,
+ (void*) ind_fn_map,
+ sizeof (ind_fn_map));
+
+ /* Write address of the map onto the target. */
+ CUdeviceptr varptr;
+ size_t varsize;
+ r = CUDA_CALL_NOCHECK (cuModuleGetGlobal, &varptr, &varsize,
+ module, XSTRING (GOMP_INDIRECT_ADDR_MAP));
+ if (r != CUDA_SUCCESS)
+ GOMP_PLUGIN_fatal ("Indirect map variable not found in image: %s",
+ cuda_error (r));
+
+ GOMP_PLUGIN_debug (0,
+ "Indirect map variable found at %llx with size %ld\n",
+ varptr, varsize);
+
+ GOMP_OFFLOAD_host2dev (ord, (void *) varptr, &map_target_addr,
+ sizeof (map_target_addr));
+ }
+
CUdeviceptr varptr;
size_t varsize;
CUresult r = CUDA_CALL_NOCHECK (cuModuleGetGlobal, &varptr, &varsize,
diff --git a/libgomp/target.c b/libgomp/target.c
index 812674d19a9..ec154222ad5 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -2256,11 +2256,14 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
void **host_funcs_end = ((void ***) host_table)[1];
void **host_var_table = ((void ***) host_table)[2];
void **host_vars_end = ((void ***) host_table)[3];
+ void **host_ind_func_table = ((void ***) host_table)[4];
+ void **host_ind_funcs_end = ((void ***) host_table)[5];
- /* The func table contains only addresses, the var table contains addresses
- and corresponding sizes. */
+ /* The func and ind_func tables contain only addresses, the var table
+ contains addresses and corresponding sizes. */
int num_funcs = host_funcs_end - host_func_table;
int num_vars = (host_vars_end - host_var_table) / 2;
+ int num_ind_funcs = (host_ind_funcs_end - host_ind_func_table);
/* Load image to device and get target addresses for the image. */
struct addr_pair *target_table = NULL;
@@ -2273,7 +2276,9 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
num_target_entries
= devicep->load_image_func (devicep->target_id, version,
target_data, &target_table,
- rev_lookup ? &rev_target_fn_table : NULL);
+ rev_lookup ? &rev_target_fn_table : NULL,
+ num_ind_funcs
+ ? (uint64_t *) host_ind_func_table : NULL);
if (num_target_entries != num_funcs + num_vars
/* "+1" due to the additional ICV struct. */
diff --git a/libgomp/testsuite/libgomp.c-c++-common/declare-target-indirect-1.c b/libgomp/testsuite/libgomp.c-c++-common/declare-target-indirect-1.c
new file mode 100644
index 00000000000..b20bfa64dca
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/declare-target-indirect-1.c
@@ -0,0 +1,21 @@
+/* { dg-do run } */
+
+#pragma omp begin declare target indirect
+int foo(void) { return 5; }
+int bar(void) { return 8; }
+int baz(void) { return 11; }
+#pragma omp end declare target
+
+int main (void)
+{
+ int x;
+ int (*foo_ptr) (void) = &foo;
+ int (*bar_ptr) (void) = &bar;
+ int (*baz_ptr) (void) = &baz;
+ int expected = foo () + bar () + baz ();
+
+#pragma omp target map (to: foo_ptr, bar_ptr, baz_ptr) map (from: x)
+ x = (*foo_ptr) () + (*bar_ptr) () + (*baz_ptr) ();
+
+ return x - expected;
+}
diff --git a/libgomp/testsuite/libgomp.c-c++-common/declare-target-indirect-2.c b/libgomp/testsuite/libgomp.c-c++-common/declare-target-indirect-2.c
new file mode 100644
index 00000000000..9fe190efce8
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/declare-target-indirect-2.c
@@ -0,0 +1,33 @@
+/* { dg-do run } */
+
+#define N 256
+
+#pragma omp begin declare target indirect
+int foo(void) { return 5; }
+int bar(void) { return 8; }
+int baz(void) { return 11; }
+#pragma omp end declare target
+
+int main (void)
+{
+ int i, x = 0, expected = 0;
+ int (*fn_ptr[N])(void);
+
+ for (i = 0; i < N; i++)
+ {
+ switch (i % 3)
+ {
+ case 0: fn_ptr[i] = &foo; break;
+ case 1: fn_ptr[i] = &bar; break;
+ case 2: fn_ptr[i] = &baz; break;
+ }
+ expected += (*fn_ptr[i]) ();
+ }
+
+#pragma omp target teams distribute parallel for reduction(+: x) \
+ map (to: fn_ptr) map (tofrom: x)
+ for (int i = 0; i < N; i++)
+ x += (*fn_ptr[i]) ();
+
+ return x - expected;
+}
--
2.34.1
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] openmp: Add support for the 'indirect' clause in C/C++
2023-10-08 13:13 [PATCH] openmp: Add support for the 'indirect' clause in C/C++ Kwok Cheung Yeung
@ 2023-10-17 13:12 ` Tobias Burnus
2023-10-17 13:34 ` Jakub Jelinek
2023-11-03 19:53 ` Kwok Cheung Yeung
0 siblings, 2 replies; 28+ messages in thread
From: Tobias Burnus @ 2023-10-17 13:12 UTC (permalink / raw)
To: Kwok Cheung Yeung, gcc-patches; +Cc: Jakub Jelinek
Hi Kwok, hi Jakub, hi all,
some first comments based on both playing around and reading the
patch - and some generic comments to any patch reader.
In general, the patch looks good. I just observe:
* There is an issue with [[omp::decl(...)]]
* <clause>(<boolean expression>) - there is a C/C++ inconsistency in
what is expected; it possibly affects more such conditions
* Missed optimization for the host?
* Bunch of minor comments
On 08.10.23 15:13, Kwok Cheung Yeung wrote:
> This patch adds support for the 'indirect' clause in the 'declare
> target' directive in C/C++ (Fortran to follow) and adds the necessary
> infrastructure to support indirect calls in target regions. This allows
> one to pass in pointers to functions that have been declared as indirect
> from the host to the target, then invoked via the passed-in pointer on
> the target device.
> [...]
> The C++ support is currently limited to normal indirect calls - virtual
> calls on objects do not currently work. I believe the main issue is that
> the vtables are not currently copied across to the target. I have added
> some handling for OBJ_TYPE_REF to prevent the compiler from ICEing when
> it encounters a virtual call, but without the vtable this cannot work
> properly.
Side remark: Fortran polymorphic variables are similar. For them also
a vtable needs to be copied.
(For vtables, see also comment to 'libgomp.texi' far below.)
* * *
C++11 (and C23) attributes do not seem to be properly handled:
[[omp::decl (declare target,indirect(1))]]
int foo(void) { return 5; }
[[omp::decl (declare target indirect)]]
int bar(void) { return 8; }
[[omp::directive (begin declare target,indirect)]];
int baz(void) { return 11; }
[[omp::directive (end declare target)]];
While I get for the last one ("baz"):
__attribute__((omp declare target, omp declare target block, omp declare target indirect))
the first two (foo and bar) do not have any attribute; if I remove the "indirect",
I do get "__attribute__((omp declare target))". Hence, the omp::decl support seems to
partially work.
NOTE: C23 omp:: attribute support is still WIP and not yet in mainline.
Recent draft: https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633007.html
The following works - but there is not a testcase for either syntax:
int bar(void) { return 8; }
[[omp::directive(declare target to(bar) , indirect(1))]];
int baz(void) { return 11; }
[[omp::directive ( declare target indirect enter(baz))]];
int bar(void) { return 8; }
#pragma omp declare target to(bar) , indirect(1)
int baz(void) { return 11; }
#pragma omp declare target indirect enter(baz)
(There is one for #pragma + 'to' in gomp/declare-target-indirect-2.c, however.)
Side remark: OpenMP 5.2 renamed 'to' to 'enter' (with deprecated alias 'to');
hence, I also use 'enter' above. The current testcases for indirect use 'enter'.
(Not that it should make a difference as the to/enter do work.)
The following seems to work fine, but I think we do not have
a testcase for it ('bar' has no indirect, foo and baz have it):
#pragma omp begin declare target indirect(1)
int foo(void) { return 5; }
#pragma omp begin declare target indirect(0)
int bar(void) { return 8; }
int baz(void) { return 11; }
#pragma omp declare target indirect enter(baz)
#pragma omp end declare target
#pragma omp end declare target
* * *
Possibly affecting other logical flags as well, but I do notice that
gcc but not g++ accepts the following:
#pragma omp begin declare target indirect("abs")
#pragma omp begin declare target indirect(5.5)
g++ shows: error: expected constant integer expression
OpenMP requires 'constant boolean' expr (OpenMP 5.1) or
'expression of logical type','constant' (OpenMP 5.2), where for the latter it has:
"The OpenMP *logical type* supports logical variables and expressions in any base language.
"[C / C++] Any OpenMP logical expression is a scalar expression. This document uses true as
a generic term for a non-zero integer value and false as a generic term for an integer value
of zero."
I am not quite sure what to expect here; in terms of C++, conv.bool surely permits
those as prvalues subject to "Boolean conversions". For C, I don't find the wording
in the standard but 'if("abc")' and 'if (5.5)' are accepted.
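A minimal sketch of the C behaviour referred to above. As far as I can tell, the relevant C wording is that a controlling expression of 'if' only needs scalar type and counts as true when it compares unequal to 0, so a string literal (a non-null pointer) and 5.5 both count as true. The helper names below are made up for illustration and are not part of the patch or of GCC:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers: report how C treats a scalar value when it is
   used as a controlling expression (true iff it compares unequal to 0).  */
static int
pointer_is_true (const void *p)
{
  return p ? 1 : 0;
}

static int
double_is_true (double d)
{
  return d ? 1 : 0;
}

/* Self-check mirroring the two operands from the message above.  */
static void
check_logical_examples (void)
{
  assert (pointer_is_true ("abs") == 1);  /* String literal: non-null pointer.  */
  assert (double_is_true (5.5) == 1);     /* Non-zero floating-point value.  */
  assert (pointer_is_true (NULL) == 0);
  assert (double_is_true (0.0) == 0);
}
```

Whether an OpenMP 'constant' logical expression should accept such operands is a separate question; the sketch only shows what plain C does with them.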
* * *
I notice that the {__builtin_,}GOMP_target_map_indirect_ptr call is inserted
quite late, i.e. in omp-offload.cc. A dump and also looking at the *.s files
shows that the
__builtin_GOMP_target_map_indirect_ptr / call GOMP_target_map_indirect_ptr
do not only show up for the device but also for the host-fallback code.
I think the latter is not required as a host pointer can be directly executed
on the host - and device -> host pointer like in
omp target device(ancestor:1)
do not need to be supported.
Namely the current glossary (here git version but OpenMP 5.2 is very similar);
note the "other than the host device":
"indirect device invocation - An indirect call to the device version of a
procedure on a device other than the host device, through a function pointer
(C/C++), a pointer to a member function (C++) or a procedure pointer (Fortran)
that refers to the host version of the procedure."
Can't we use #ifdef ACCEL_COMPILER to optimize the host fallback?
That way, we can also avoid generating the splay-tree on the host
cf. LIBGOMP_OFFLOADED_ONLY.
* * *
#pragma omp begin declare target indirect(1) device_type(host)
is accepted but it violates:
OpenMP 5.1: "Restrictions to the declare target directive are as follows:"
"If an indirect clause is present and invoked-by-fptr evaluates to true then
the only permitted device_type clause is device_type(any)" [215:1-2]
In OpenMP 5.2 that's in "7.8.3 indirect Clause" itself.
* * *
OpenMP permits pointers to member functions. Can you
also add a test for those? I bet it simply works but we
should still test those.
(For vtables, see also comment below.)
class Foo {
public:
int f(int x);
};
typedef int (Foo::*FooFptr)(int x);
...
int my_call(Foo &foo)
{
FooFptr fn_ptr = &Foo::f;
...
return std::invoke(fn_ptr, foo, 42);
}
* * *
Side remarks for patch readers:
Besides the existing offload functions/variables tables, a new
indirect-functions table is created; functions there are not
added to the function table (in the compiler-generated code).
While the code obviously affects the performance, it does not hamper
optimizations such as (modified libgomp.c-c++-common/declare-target-indirect-1.c):
#pragma omp target map (from: x)
{
int (*foo_ptr) (void) = &foo;
int (*bar_ptr) (void) = &bar;
int (*baz_ptr) (void) = &baz;
x = (*foo_ptr) () + (*bar_ptr) () + (*baz_ptr) ();
}
which still gives with -O1:
_8 = *.omp_data_i_7(D).x;
*_8 = 24;
There was some discussion at OpenMP spec level to avoid the overhead
via new assumptions which tell that 'indirect' can or cannot be
encountered. That was a more recent side discussion of the generic
questions in my Issue #3540.
I think in general there aren't that many function pointers around
and if the extra function call happens late enough (which seems to be
the case, see above), it shouldn't be a real problem in most code,
also because the indirect-reverse-lookup table is short and hopefully
there won't be too many repeated lookups with either function pointers
or C++ classes/Fortran polymorphic calls.
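To make the lookup cost concrete, here is a rough sketch of the linear-search variant described in the patch submission. It assumes the layout the plugins write to the device: interleaved (host, target) address pairs followed by a single 0 terminator. The function names are hypothetical; this is not the libgomp implementation of GOMP_target_map_indirect_ptr, and the splay-tree variant is not shown:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: interleave host and target addresses into a
   zero-terminated map of (host, target) pairs.  OUT must have room
   for 2 * N + 1 entries.  */
static void
build_indirect_map (const uint64_t *host, const uint64_t *target,
                    size_t n, uint64_t *out)
{
  for (size_t i = 0; i < n; i++)
    {
      /* A zero host address would be mistaken for the terminator.  */
      assert (host[i] != 0);
      out[i * 2] = host[i];
      out[i * 2 + 1] = target[i];
    }
  out[n * 2] = 0;
}

/* Hypothetical linear lookup: return the target address paired with
   HOST_ADDR, or HOST_ADDR itself if it is not in the map.  */
static uint64_t
lookup_indirect_linear (const uint64_t *map, uint64_t host_addr)
{
  for (size_t i = 0; map[i] != 0; i += 2)
    if (map[i] == host_addr)
      return map[i + 1];
  return host_addr;
}
```

Returning the original address on a miss matches the behaviour described in the submission, so pointers that already refer to device code pass through unchanged.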
* * *
> --- a/gcc/omp-offload.cc +++ b/gcc/omp-offload.cc } + if
> (omp_redirect_indirect_calls + && gimple_call_fndecl (stmt) ==
> NULL_TREE) + { + gcall *orig_call = dyn_cast <gcall *> (stmt); + tree
> call_fn = gimple_call_fn (stmt); + tree map_ptr_fn + =
> builtin_decl_explicit (BUILT_IN_GOMP_TARGET_MAP_INDIRECT_PTR);
This line is too long. Maybe use an 'enum built_in_function' temporary?
> --- a/libgomp/libgomp.texi +++ b/libgomp/libgomp.texi @@ -304,7 +304,7
> @@ The OpenMP 4.5 specification is fully supported. @item Iterators in
> @code{target update} motion clauses and @code{map} clauses @tab N @tab
> @item Indirect calls to the device version of a procedure or function
> in - @code{target} regions @tab N @tab + @code{target} regions @tab P
> @tab Only C and C++
I think we need a new entry to handle the virtual part. However, it looks
as if that's a new OpenMP 5.2 feature. Can you add an entry under
"Other new OpenMP 5.2 features"?
At least I cannot find any existing entry and I only see in OpenMP 5.2:
"Invoking a virtual member function of an object on a device other than
the device on which the object was constructed results in unspecified
behavior, unless the object is accessible and was constructed on the
host device." [OpenMP 5.2, 287:10-12] in "Restrictions to the target construct".
> --- a/libgomp/target.c +++ b/libgomp/target.c @@ -2256,11 +2256,14 @@
> gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned
> version, void **host_funcs_end = ((void ***) host_table)[1]; void
> **host_var_table = ((void ***) host_table)[2]; void **host_vars_end =
> ((void ***) host_table)[3]; + void **host_ind_func_table = ((void ***)
> host_table)[4]; + void **host_ind_funcs_end = ((void ***) host_table)[5];
This code assumes that every host table now has 6 entries. But that's not true for old
code. It seems as if you have to bump the version number and only access those values
when the version number is sufficiently large.
Thanks,
Tobias
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] openmp: Add support for the 'indirect' clause in C/C++
2023-10-17 13:12 ` Tobias Burnus
@ 2023-10-17 13:34 ` Jakub Jelinek
2023-10-17 14:41 ` Tobias Burnus
2023-11-03 19:53 ` Kwok Cheung Yeung
1 sibling, 1 reply; 28+ messages in thread
From: Jakub Jelinek @ 2023-10-17 13:34 UTC (permalink / raw)
To: Tobias Burnus; +Cc: Kwok Cheung Yeung, gcc-patches
On Tue, Oct 17, 2023 at 03:12:46PM +0200, Tobias Burnus wrote:
> C++11 (and C23) attributes do not seem to be properly handled:
>
> [[omp::decl (declare target,indirect(1))]]
> int foo(void) { return 5; }
> [[omp::decl (declare target indirect)]]
> int bar(void) { return 8; }
Isn't that correct?
Declare target directive has the forms
declare target (list)
declare target
declare target clauses
The first form is essentially equivalent to declare target enter (list),
the second to begin declare target with no clauses.
Now,
[[omp::decl (declare target)]] int v;
matches the first form and so enter (v) clause is implied.
But say
[[omp::decl (declare target, device_type (any))]] int v;
is the third type and so nothing is implied, so it is equivalent to
int v;
#pragma omp declare target device_type (any)
Don't remember if that is supposed to be an error or just not do anything
because there is no enter or to or link clause.
So, I think if you want to make foo indirect, the above would have to be:
[[omp::decl (declare target,enter,indirect(1))]]
int foo(void) { return 5; }
[[omp::decl (declare target indirect enter)]]
int bar(void) { return 8; }
or so (or to instead of enter, but guess that is either deprecated or
removed but we should support that anyway).
Jakub
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] openmp: Add support for the 'indirect' clause in C/C++
2023-10-17 13:34 ` Jakub Jelinek
@ 2023-10-17 14:41 ` Tobias Burnus
0 siblings, 0 replies; 28+ messages in thread
From: Tobias Burnus @ 2023-10-17 14:41 UTC (permalink / raw)
To: Jakub Jelinek, Tobias Burnus; +Cc: Kwok Cheung Yeung, gcc-patches
On 17.10.23 15:34, Jakub Jelinek wrote:
> On Tue, Oct 17, 2023 at 03:12:46PM +0200, Tobias Burnus wrote:
>> C++11 (and C23) attributes do not seem to be properly handled:
>>
>> [[omp::decl (declare target,indirect(1))]]
>> int foo(void) { return 5; }
>> [[omp::decl (declare target indirect)]]
>> int bar(void) { return 8; }
> Isn't that correct?
No it isn't. Following your argument below, the following is violated:
"If the directive has a clause, it must contain at least one enter
clause, link clause, or local clause."
(OpenMP before 5.2 had 'to' instead of 'enter' and 5.2 had 'enter' with
'to' as alias.)
And the clause above does not have those. Alternatively permitted is:
"If the extended-list argument is specified, no clauses may be specified."
But that cannot be the case as the "indirect" clause has been specified.
* * *
> Declare target directive has the forms
> declare target (list)
> declare target
> declare target clauses
> The first form is essentially equivalent to declare target enter (list),
> the second to begin declare target with no clauses.
> Now,
> [[omp::decl (declare target)]] int v;
> matches the first form and so enter (v) clause is implied.
> But say
> [[omp::decl (declare target, device_type (any))]] int v;
> is the third type and so nothing is implied, so it is equivalent to
> int v;
I have to admit that I failed to read 'omp::decl()' in the spec before
trying it. The TR11 spec states:
"A declarative directive that is declaration-associated may
alternatively be expressed as an attribute specifier where
directive-attr is decl( directive-specification )"
"A declarative directive with an association of none that accepts a
variable list or extended list as a directive argument or clause
argument may alternatively be expressed with an attribute specifier that
also uses the decl attribute, applies to variable and/or function
declarations, and omits the variable list or extended list argument. The
effect is as if the omitted list argument is the list of declared
variables and/or functions to which the attribute specifier applies."
('declare target' is a bit odd as it either accepts a variable list or
supports a form where a clause accepts a variable list, which is confusing.)
> #pragma omp declare target device_type (any)
> Don't remember if that is supposed to be an error or just not do anything
> because there is no enter or to or link clause.
That's invalid per the restriction above - well, except for the
delimited form, which permits 'indirect' and 'device_type' as only
clauses. But that does not apply here as the association with
'omp::decl' must be none (i.e. 'delimited' is not permitted).
> So, I think if you want to make foo indirect, the above would have to be:
> [[omp::decl (declare target,enter,indirect(1))]]
> int foo(void) { return 5; }
I concur - and admit that I missed the 'enter'.
Still, there is an issue as the restriction is not checked for.
Same with:
#pragma omp declare target indirect
(invalid but accepted) while
#pragma omp declare target device_type(any) indirect
fails with: "error: directive with only ‘device_type’ clause". The error
due to 'device_type' is no longer completely correct as I would count
'indirect' as another clause.
Tobias
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] openmp: Add support for the 'indirect' clause in C/C++
2023-10-17 13:12 ` Tobias Burnus
2023-10-17 13:34 ` Jakub Jelinek
@ 2023-11-03 19:53 ` Kwok Cheung Yeung
2023-11-06 8:48 ` Tobias Burnus
` (3 more replies)
1 sibling, 4 replies; 28+ messages in thread
From: Kwok Cheung Yeung @ 2023-11-03 19:53 UTC (permalink / raw)
To: Tobias Burnus, gcc-patches; +Cc: Jakub Jelinek
[-- Attachment #1: Type: text/plain, Size: 8732 bytes --]
On 17/10/2023 2:12 pm, Tobias Burnus wrote:
> C++11 (and C23) attributes do not seem to be properly handled:
>
> [[omp::decl (declare target,indirect(1))]]
> int foo(void) { return 5; }
> [[omp::decl (declare target indirect)]]
> int bar(void) { return 8; }
> [[omp::directive (begin declare target,indirect)]];
> int baz(void) { return 11; }
> [[omp::directive (end declare target)]];
>
> While I get for the last one ("baz"):
>
> __attribute__((omp declare target, omp declare target block, omp declare
> target indirect))
>
> the first two (foo and bar) do not have any attribute; if I remove the
> "indirect",
> I do get "__attribute__((omp declare target))". Hence, the omp::decl
> support seems to
> partially work.
>
If we replace the 'indirect' with 'device_type', we get a 'directive
with only ‘device_type’ clause' error as the affected function has not
been specified. I have updated the parser so that a similar message is
emitted if only 'device_type' or 'indirect' clauses are supplied.
>
>
> The following works - but there is not a testcase for either syntax:
>
> int bar(void) { return 8; }
> [[omp::directive(declare target to(bar) , indirect(1))]];
> int baz(void) { return 11; }
> [[omp::directive ( declare target indirect enter(baz))]];
>
> int bar(void) { return 8; }
> #pragma omp declare target to(bar) , indirect(1)
> int baz(void) { return 11; }
> #pragma omp declare target indirect enter(baz)
>
> (There is one for #pragma + 'to' in gomp/declare-target-indirect-2.c,
> however.)
>
> Side remark: OpenMP 5.2 renamed 'to' to 'enter' (with deprecated alias
> 'to');
> hence, I also use 'enter' above. The current testcases for indirect use
> 'enter'.
> (Not that it should make a difference as the to/enter do work.)
>
Added to g++.dg/gomp/declare-target-indirect-1.C.
> The following seems to work fine, but I think we do not have
> a testcase for it ('bar' has no indirect, foo and baz have it):
>
> #pragma omp begin declare target indirect(1)
> int foo(void) { return 5; }
> #pragma omp begin declare target indirect(0)
> int bar(void) { return 8; }
> int baz(void) { return 11; }
> #pragma omp declare target indirect enter(baz)
> #pragma omp end declare target
> #pragma omp end declare target
>
Added to c-c++-common/gomp/declare-target-indirect-2.c.
> Possibly affecting other logical flags as well, but I do notice that
> gcc but not g++ accepts the following:
>
> #pragma omp begin declare target indirect("abs")
> #pragma omp begin declare target indirect(5.5)
>
> g++ shows: error: expected constant integer expression
>
> OpenMP requires 'constant boolean' expr (OpenMP 5.1) or
> 'expression of logical type','constant' (OpenMP 5.2), where for the
> latter it has:
>
> "The OpenMP *logical type* supports logical variables and expressions in
> any base language.
> "[C / C++] Any OpenMP logical expression is a scalar expression. This
> document uses true as
> a generic term for a non-zero integer value and false as a generic term
> for an integer value
> of zero."
>
> I am not quite sure what to expect here; in terms of C++, conv.bool
> surely permits
> those as prvalues subject to "Boolean conversions". For C, I don't find the
> wording in the
> standard but 'if("abc")' and 'if (5.5)' are accepted.
I've changed the C++ parser to accept these 'unusual' logical values,
and modified the wording of the error to require a 'logical' expression.
> I notice that the {__builtin_,}GOMP_target_map_indirect_ptr call is
> inserted
> quite late, i.e. in omp-offload.cc. A dump and also looking at the *.s
> files
> shows that the
> __builtin_GOMP_target_map_indirect_ptr / call
> GOMP_target_map_indirect_ptr
> do not only show up for the device but also for the host-fallback code.
>
> I think the latter is not required as a host pointer can be directly
> executed
> on the host - and device -> host pointer like in
> omp target device(ancestor:1)
> do not need to be supported.
>
> Namely the current glossary (here git version but OpenMP 5.2 is very
> similar);
> note the "other than the host device":
>
> "indirect device invocation - An indirect call to the device version of a
> procedure on a device other than the host device, through a function
> pointer
> (C/C++), a pointer to a member function (C++) or a procedure pointer
> (Fortran)
> that refers to the host version of the procedure."
>
> Can't we use #ifdef ACCEL_COMPILER to optimize the host fallback?
>
> That way, we can also avoid generating the splay-tree on the host
> cf. LIBGOMP_OFFLOADED_ONLY.
The GOMP_target_map_indirect_ptr call is now only generated by the accel
compilers.
FWIW, the splay-tree was not actually being built on the host-side, and
the host implementation of GOMP_target_map_indirect_ptr just returned
the pointer unchanged. It is now changed to __builtin_unreachable as the
calls should no longer be generated in host code.
> #pragma omp begin declare target indirect(1) device_type(host)
>
> is accepted but it violates:
>
> OpenMP 5.1: "Restrictions to the declare target directive are as follows:"
> "If an indirect clause is present and invoked-by-fptr evaluates to true
> then
> the only permitted device_type clause is device_type(any)" [215:1-2]
>
> In OpenMP 5.2 that's in "7.8.3 indirect Clause" itself.
>
I have added a check for this in the parser which emits an error if this
happens, and some tests in declare-target-indirect-1.c.
> OpenMP permits pointers to member functions. Can you
> also add a test for those? I bet it simply works but we
> should still test those.
>
> (For vtables, see also comment below.)
>
> class Foo {
> public:
> int f(int x);
> };
>
> typedef int (Foo::*FooFptr)(int x);
> ...
> int my_call(Foo &foo)
> {
> FooFptr fn_ptr = &Foo::f;
> ...
> return std::invoke(fn_ptr, foo, 42);
> }
This works, and I have added an execution test in
libgomp.c++/declare-target-indirect-1.C.
>> --- a/gcc/omp-offload.cc +++ b/gcc/omp-offload.cc } + if
>> (omp_redirect_indirect_calls + && gimple_call_fndecl (stmt) ==
>> NULL_TREE) + { + gcall *orig_call = dyn_cast <gcall *> (stmt); + tree
>> call_fn = gimple_call_fn (stmt); + tree map_ptr_fn + =
>> builtin_decl_explicit (BUILT_IN_GOMP_TARGET_MAP_INDIRECT_PTR);
>
> This line is too long. Maybe use an 'enum built_in_function' temporary?
Fixed.
>> --- a/libgomp/libgomp.texi +++ b/libgomp/libgomp.texi @@ -304,7 +304,7
>> @@ The OpenMP 4.5 specification is fully supported. @item Iterators in
>> @code{target update} motion clauses and @code{map} clauses @tab N @tab
>> @item Indirect calls to the device version of a procedure or function
>> in - @code{target} regions @tab N @tab + @code{target} regions @tab P
>> @tab Only C and C++
>
> I think we need a new entry to handle the virtual part. However, it looks
> as if that's a new OpenMP 5.2 feature. Can you add an entry under
> "Other new OpenMP 5.2 features"?
>
> At least I cannot find any existing entry and I only see in OpenMP 5.2:
>
> "Invoking a virtual member function of an object on a device other than
> the device on which the object was constructed results in unspecified
> behavior, unless the object is accessible and was constructed on the
> host device." [OpenMP 5.2, 287:10-12] in "Restrictions to the target
> construct".
I have added a line in the OpenMP 5.2 section to state this.
>> --- a/libgomp/target.c +++ b/libgomp/target.c @@ -2256,11 +2256,14 @@
>> gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned
>> version, void **host_funcs_end = ((void ***) host_table)[1]; void
>> **host_var_table = ((void ***) host_table)[2]; void **host_vars_end =
>> ((void ***) host_table)[3]; + void **host_ind_func_table = ((void ***)
>> host_table)[4]; + void **host_ind_funcs_end = ((void ***) host_table)[5];
>
> This code assumes that every host table now has 6 entries. But that's
> not true for old
> code. It seems as if you have to bump the version number and only access
> those values
> when the version number is sufficiently large.
I have bumped up the GOMP_VERSION to 3, and reading the indirect
functions section of the host table will be skipped if the GOMP_VERSION
is not at least 3.
Also, in the device plugins, the indirect function count will not be
read from the image header if the GOMP_VERSION is too low.
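To illustrate the version check described above, here is a minimal sketch. It is not the code from the patch: the macro and function names are hypothetical stand-ins, and I am assuming that an old host table has only four entries while a new one has six, with the indirect function section bounded by entries [4] and [5].

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the GOMP_VERSION_SUPPORTS_INDIRECT_FUNCS
   macro that the patch adds to include/gomp-constants.h; the real
   definition may differ.  */
#define SKETCH_VERSION_SUPPORTS_INDIRECT_FUNCS(ver) ((ver) >= 3)

/* Return the number of entries in the indirect function section of
   HOST_TABLE, or 0 if the image was built with an older version whose
   host table only has four entries.  */
static size_t
sketch_count_host_ind_funcs (void ***host_table, unsigned version)
{
  assert (host_table != NULL);

  /* Entries [4] and [5] do not exist in old tables, so do not read
     them unless the version says they are present.  */
  if (!SKETCH_VERSION_SUPPORTS_INDIRECT_FUNCS (version))
    return 0;

  void **ind_func_table = host_table[4];
  void **ind_funcs_end = host_table[5];
  return (size_t) (ind_funcs_end - ind_func_table);
}
```

The point of returning a count of zero for old images is that the rest of the loading code can then treat "no indirect functions" and "image too old to have any" identically.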
Okay for mainline, pending successful testing (still in progress)?
Thanks
Kwok
> Thanks,
>
> Tobias
[-- Attachment #2: 0001-openmp-Add-support-for-the-indirect-clause-in-C-C.patch --]
[-- Type: text/plain, Size: 83102 bytes --]
From adcd938b1dee1cc5a9df6efee40d47a2aab254f8 Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung <kcy@codesourcery.com>
Date: Fri, 3 Nov 2023 18:03:50 +0000
Subject: [PATCH] openmp: Add support for the 'indirect' clause in C/C++
This adds support for the 'indirect' clause in the 'declare target'
directive. Functions declared as indirect may be called via function
pointers passed from the host in offloaded code.
Virtual calls to member functions via the object pointer in C++ are
currently not supported in target regions.
2023-11-03 Kwok Cheung Yeung <kcy@codesourcery.com>
gcc/c-family/
* c-attribs.cc (c_common_attribute_table): Add attribute for
indirect functions.
* c-pragma.h (enum pragma_omp_clause): Add entry for indirect clause.
gcc/c/
* c-decl.cc (c_decl_attributes): Add attribute for indirect
functions.
* c-lang.h (c_omp_declare_target_attr): Add indirect field.
* c-parser.cc (c_parser_omp_clause_name): Handle indirect clause.
(c_parser_omp_clause_indirect): New.
(c_parser_omp_all_clauses): Handle indirect clause.
(OMP_DECLARE_TARGET_CLAUSE_MASK): Add indirect clause to mask.
(c_parser_omp_declare_target): Handle indirect clause. Emit error
message if device_type or indirect clauses used alone. Emit error
if indirect clause used with device_type that is not 'any'.
(OMP_BEGIN_DECLARE_TARGET_CLAUSE_MASK): Add indirect clause to mask.
(c_parser_omp_begin): Handle indirect clause.
* c-typeck.cc (c_finish_omp_clauses): Handle indirect clause.
gcc/cp/
* cp-tree.h (cp_omp_declare_target_attr): Add indirect field.
* decl2.cc (cplus_decl_attributes): Add attribute for indirect
functions.
* parser.cc (cp_parser_omp_clause_name): Handle indirect clause.
(cp_parser_omp_clause_indirect): New.
(cp_parser_omp_all_clauses): Handle indirect clause.
(handle_omp_declare_target_clause): Add extra parameter. Add
indirect attribute for indirect functions.
(OMP_DECLARE_TARGET_CLAUSE_MASK): Add indirect clause to mask.
(cp_parser_omp_declare_target): Handle indirect clause. Emit error
message if device_type or indirect clauses used alone. Emit error
if indirect clause used with device_type that is not 'any'.
(OMP_BEGIN_DECLARE_TARGET_CLAUSE_MASK): Add indirect clause to mask.
(cp_parser_omp_begin): Handle indirect clause.
* semantics.cc (finish_omp_clauses): Handle indirect clause.
gcc/
* lto-cgraph.cc (enum LTO_symtab_tags): Add tag for indirect
functions.
(output_offload_tables): Write indirect functions.
(input_offload_tables): Read indirect functions.
* lto-section-names.h (OFFLOAD_IND_FUNC_TABLE_SECTION_NAME): New.
* omp-builtins.def (BUILT_IN_GOMP_TARGET_MAP_INDIRECT_PTR): New.
* omp-offload.cc (offload_ind_funcs): New.
(omp_discover_implicit_declare_target): Add functions marked with
'omp declare target indirect' to indirect functions list.
(omp_finish_file): Add indirect functions to section for offload
indirect functions.
(execute_omp_device_lower): Redirect indirect calls on target by
passing function pointer to BUILT_IN_GOMP_TARGET_MAP_INDIRECT_PTR.
(pass_omp_device_lower::gate): Run pass_omp_device_lower if
indirect functions are present on an accelerator device.
* omp-offload.h (offload_ind_funcs): New.
* tree-core.h (omp_clause_code): Add OMP_CLAUSE_INDIRECT.
* tree.cc (omp_clause_num_ops): Add entry for OMP_CLAUSE_INDIRECT.
(omp_clause_code_name): Likewise.
* tree.h (OMP_CLAUSE_INDIRECT_EXPR): New.
* config/gcn/mkoffload.cc (process_asm): Process offload_ind_funcs
section. Count number of indirect functions.
(process_obj): Emit number of indirect functions.
* config/nvptx/mkoffload.cc (ind_func_ids, ind_funcs_tail): New.
(process): Emit offload_ind_func_table in PTX code. Emit indirect
function names and count in image.
* config/nvptx/nvptx.cc (nvptx_record_offload_symbol): Mark
indirect functions in PTX code with IND_FUNC_MAP.
gcc/testsuite/
* c-c++-common/gomp/declare-target-7.c: Update expected error message.
* c-c++-common/gomp/declare-target-indirect-1.c: New.
* c-c++-common/gomp/declare-target-indirect-2.c: New.
* g++.dg/gomp/declare-target-indirect-1.C: New.
include/
* gomp-constants.h (GOMP_VERSION): Increment to 3.
(GOMP_VERSION_SUPPORTS_INDIRECT_FUNCS): New.
libgcc/
* offloadstuff.c (OFFLOAD_IND_FUNC_TABLE_SECTION_NAME): New.
(__offload_ind_func_table): New.
(__offload_ind_funcs_end): New.
(__OFFLOAD_TABLE__): Add entries for indirect functions.
libgomp/
* Makefile.am (libgomp_la_SOURCES): Add target-indirect.c.
* Makefile.in: Regenerate.
* libgomp-plugin.h (GOMP_INDIRECT_ADDR_MAP): New define.
(GOMP_OFFLOAD_load_image): Add extra argument.
* libgomp.h (struct indirect_splay_tree_key_s): New.
(indirect_splay_tree_node, indirect_splay_tree,
indirect_splay_tree_key): New.
(indirect_splay_compare): New.
* libgomp.map (GOMP_5.1.1): Add GOMP_target_map_indirect_ptr.
* libgomp.texi (OpenMP 5.1): Update documentation on indirect
calls in target region and on indirect clause.
(Other new OpenMP 5.2 features): Add entry for virtual function calls.
* libgomp_g.h (GOMP_target_map_indirect_ptr): Add prototype.
* oacc-host.c (host_load_image): Add extra argument.
* target.c (gomp_load_image_to_device): If the GOMP_VERSION is high
enough, read host indirect functions table and pass to
load_image_func.
* config/accel/target-indirect.c: New.
* config/linux/target-indirect.c: New.
* config/gcn/team.c (build_indirect_map): Add prototype.
(gomp_gcn_enter_kernel): Initialize support for indirect
function calls on GCN target.
* config/nvptx/team.c (build_indirect_map): Add prototype.
(gomp_nvptx_main): Initialize support for indirect function
calls on NVPTX target.
* plugin/plugin-gcn.c (struct gcn_image_desc): Add field for
indirect functions count.
(GOMP_OFFLOAD_load_image): Add extra argument. If the GOMP_VERSION
is high enough, build address translation table and copy it to target
memory.
* plugin/plugin-nvptx.c (nvptx_tdata): Add field for indirect
functions count.
(GOMP_OFFLOAD_load_image): Add extra argument. If the GOMP_VERSION
is high enough, build address translation table and copy it to target
memory.
* testsuite/libgomp.c-c++-common/declare-target-indirect-1.c: New.
* testsuite/libgomp.c-c++-common/declare-target-indirect-2.c: New.
* testsuite/libgomp.c++/declare-target-indirect-1.C: New.
---
gcc/c-family/c-attribs.cc | 2 +
gcc/c-family/c-pragma.h | 1 +
gcc/c/c-decl.cc | 8 ++
gcc/c/c-lang.h | 1 +
gcc/c/c-parser.cc | 101 ++++++++++++--
gcc/c/c-typeck.cc | 1 +
gcc/config/gcn/mkoffload.cc | 29 +++-
gcc/config/nvptx/mkoffload.cc | 87 +++++++++++-
gcc/config/nvptx/nvptx.cc | 6 +-
gcc/cp/cp-tree.h | 1 +
gcc/cp/decl2.cc | 6 +
gcc/cp/parser.cc | 108 ++++++++++++---
gcc/cp/semantics.cc | 1 +
gcc/lto-cgraph.cc | 27 ++++
gcc/lto-section-names.h | 1 +
gcc/omp-builtins.def | 3 +
gcc/omp-offload.cc | 85 ++++++++++--
gcc/omp-offload.h | 1 +
.../c-c++-common/gomp/declare-target-7.c | 2 +-
.../gomp/declare-target-indirect-1.c | 62 +++++++++
.../gomp/declare-target-indirect-2.c | 32 +++++
.../g++.dg/gomp/declare-target-indirect-1.C | 17 +++
gcc/tree-core.h | 3 +
gcc/tree.cc | 2 +
gcc/tree.h | 4 +
include/gomp-constants.h | 4 +-
libgcc/offloadstuff.c | 12 +-
libgomp/Makefile.am | 2 +-
libgomp/Makefile.in | 5 +-
libgomp/config/accel/target-indirect.c | 126 ++++++++++++++++++
libgomp/config/gcn/team.c | 4 +
libgomp/config/linux/target-indirect.c | 32 +++++
libgomp/config/nvptx/team.c | 5 +
libgomp/libgomp-plugin.h | 5 +-
libgomp/libgomp.h | 23 ++++
libgomp/libgomp.map | 1 +
libgomp/libgomp.texi | 6 +-
libgomp/libgomp_g.h | 1 +
libgomp/oacc-host.c | 3 +-
libgomp/plugin/plugin-gcn.c | 88 +++++++++++-
libgomp/plugin/plugin-nvptx.c | 63 ++++++++-
libgomp/target.c | 17 ++-
.../libgomp.c++/declare-target-indirect-1.C | 23 ++++
.../declare-target-indirect-1.c | 21 +++
.../declare-target-indirect-2.c | 33 +++++
45 files changed, 1007 insertions(+), 58 deletions(-)
create mode 100644 gcc/testsuite/c-c++-common/gomp/declare-target-indirect-1.c
create mode 100644 gcc/testsuite/c-c++-common/gomp/declare-target-indirect-2.c
create mode 100644 gcc/testsuite/g++.dg/gomp/declare-target-indirect-1.C
create mode 100644 libgomp/config/accel/target-indirect.c
create mode 100644 libgomp/config/linux/target-indirect.c
create mode 100644 libgomp/testsuite/libgomp.c++/declare-target-indirect-1.C
create mode 100644 libgomp/testsuite/libgomp.c-c++-common/declare-target-indirect-1.c
create mode 100644 libgomp/testsuite/libgomp.c-c++-common/declare-target-indirect-2.c
diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index a041c3b91eb..754cdab2ae8 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -522,6 +522,8 @@ const struct attribute_spec c_common_attribute_table[] =
handle_omp_declare_target_attribute, NULL },
{ "omp declare target implicit", 0, 0, true, false, false, false,
handle_omp_declare_target_attribute, NULL },
+ { "omp declare target indirect", 0, 0, true, false, false, false,
+ handle_omp_declare_target_attribute, NULL },
{ "omp declare target host", 0, 0, true, false, false, false,
handle_omp_declare_target_attribute, NULL },
{ "omp declare target nohost", 0, 0, true, false, false, false,
diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h
index 682157a4517..98177913053 100644
--- a/gcc/c-family/c-pragma.h
+++ b/gcc/c-family/c-pragma.h
@@ -125,6 +125,7 @@ enum pragma_omp_clause {
PRAGMA_OMP_CLAUSE_IF,
PRAGMA_OMP_CLAUSE_IN_REDUCTION,
PRAGMA_OMP_CLAUSE_INBRANCH,
+ PRAGMA_OMP_CLAUSE_INDIRECT,
PRAGMA_OMP_CLAUSE_IS_DEVICE_PTR,
PRAGMA_OMP_CLAUSE_LASTPRIVATE,
PRAGMA_OMP_CLAUSE_LINEAR,
diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index 7a145bed281..3112d46f120 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -5332,6 +5332,14 @@ c_decl_attributes (tree *node, tree attributes, int flags)
attributes
= tree_cons (get_identifier ("omp declare target nohost"),
NULL_TREE, attributes);
+
+ int indirect
+ = current_omp_declare_target_attribute->last ().indirect;
+ if (indirect && !lookup_attribute ("omp declare target indirect",
+ attributes))
+ attributes
+ = tree_cons (get_identifier ("omp declare target indirect"),
+ NULL_TREE, attributes);
}
}
diff --git a/gcc/c/c-lang.h b/gcc/c/c-lang.h
index 4fea11855f1..cb13e34e80e 100644
--- a/gcc/c/c-lang.h
+++ b/gcc/c/c-lang.h
@@ -62,6 +62,7 @@ struct GTY(()) language_function {
struct GTY(()) c_omp_declare_target_attr {
int device_type;
+ int indirect;
};
/* If non-zero, implicit "omp declare target" attribute is added into the
diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 5213a57a1ec..ebfe17ca383 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -13975,6 +13975,8 @@ c_parser_omp_clause_name (c_parser *parser)
result = PRAGMA_OMP_CLAUSE_IN_REDUCTION;
else if (!strcmp ("inbranch", p))
result = PRAGMA_OMP_CLAUSE_INBRANCH;
+ else if (!strcmp ("indirect", p))
+ result = PRAGMA_OMP_CLAUSE_INDIRECT;
else if (!strcmp ("independent", p))
result = PRAGMA_OACC_CLAUSE_INDEPENDENT;
else if (!strcmp ("is_device_ptr", p))
@@ -14837,6 +14839,47 @@ c_parser_omp_clause_final (c_parser *parser, tree list)
return list;
}
+/* OpenMP 5.1:
+ indirect [( expression )]
+*/
+
+static tree
+c_parser_omp_clause_indirect (c_parser *parser, tree list)
+{
+ location_t location = c_parser_peek_token (parser)->location;
+ tree t;
+
+ if (c_parser_peek_token (parser)->type == CPP_OPEN_PAREN)
+ {
+ matching_parens parens;
+ if (!parens.require_open (parser))
+ return list;
+
+ location_t loc = c_parser_peek_token (parser)->location;
+ c_expr expr = c_parser_expr_no_commas (parser, NULL);
+ expr = convert_lvalue_to_rvalue (loc, expr, true, true);
+ t = c_objc_common_truthvalue_conversion (loc, expr.value);
+ t = c_fully_fold (t, false, NULL);
+ if (!INTEGRAL_TYPE_P (TREE_TYPE (t))
+ || TREE_CODE (t) != INTEGER_CST)
+ {
+ c_parser_error (parser, "expected constant logical expression");
+ return list;
+ }
+ parens.skip_until_found_close (parser);
+ }
+ else
+ t = integer_one_node;
+
+ check_no_duplicate_clause (list, OMP_CLAUSE_INDIRECT, "indirect");
+
+ tree c = build_omp_clause (location, OMP_CLAUSE_INDIRECT);
+ OMP_CLAUSE_INDIRECT_EXPR (c) = t;
+ OMP_CLAUSE_CHAIN (c) = list;
+
+ return c;
+}
+
/* OpenACC, OpenMP 2.5:
if ( expression )
@@ -18398,6 +18441,10 @@ c_parser_omp_all_clauses (c_parser *parser, omp_clause_mask mask,
true, clauses);
c_name = "in_reduction";
break;
+ case PRAGMA_OMP_CLAUSE_INDIRECT:
+ clauses = c_parser_omp_clause_indirect (parser, clauses);
+ c_name = "indirect";
+ break;
case PRAGMA_OMP_CLAUSE_LASTPRIVATE:
clauses = c_parser_omp_clause_lastprivate (parser, clauses);
c_name = "lastprivate";
@@ -23798,14 +23845,16 @@ c_finish_omp_declare_simd (c_parser *parser, tree fndecl, tree parms,
( (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_TO) \
| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_ENTER) \
| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_LINK) \
- | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_DEVICE_TYPE))
+ | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_DEVICE_TYPE) \
+ | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_INDIRECT))
static void
c_parser_omp_declare_target (c_parser *parser)
{
tree clauses = NULL_TREE;
int device_type = 0;
- bool only_device_type = true;
+ bool indirect = false;
+ bool only_device_type_or_indirect = true;
if (c_parser_next_token_is (parser, CPP_NAME)
|| (c_parser_next_token_is (parser, CPP_COMMA)
&& c_parser_peek_2nd_token (parser)->type == CPP_NAME))
@@ -23821,22 +23870,27 @@ c_parser_omp_declare_target (c_parser *parser)
else
{
c_parser_skip_to_pragma_eol (parser);
- c_omp_declare_target_attr attr = { -1 };
+ c_omp_declare_target_attr attr = { -1, 0 };
vec_safe_push (current_omp_declare_target_attribute, attr);
return;
}
- for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
- if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_DEVICE_TYPE)
- device_type |= OMP_CLAUSE_DEVICE_TYPE_KIND (c);
for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
{
if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_DEVICE_TYPE)
+ device_type |= OMP_CLAUSE_DEVICE_TYPE_KIND (c);
+ if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_INDIRECT)
+ indirect |= !integer_zerop (OMP_CLAUSE_INDIRECT_EXPR (c));
+ }
+ for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
+ {
+ if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_DEVICE_TYPE
+ || OMP_CLAUSE_CODE (c) == OMP_CLAUSE_INDIRECT)
continue;
tree t = OMP_CLAUSE_DECL (c), id;
tree at1 = lookup_attribute ("omp declare target", DECL_ATTRIBUTES (t));
tree at2 = lookup_attribute ("omp declare target link",
DECL_ATTRIBUTES (t));
- only_device_type = false;
+ only_device_type_or_indirect = false;
if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_LINK)
{
id = get_identifier ("omp declare target link");
@@ -23898,10 +23952,25 @@ c_parser_omp_declare_target (c_parser *parser)
= tree_cons (id, NULL_TREE, DECL_ATTRIBUTES (t));
}
}
+ if (indirect)
+ {
+ tree at4 = lookup_attribute ("omp declare target indirect",
+ DECL_ATTRIBUTES (t));
+ if (at4 == NULL_TREE)
+ {
+ id = get_identifier ("omp declare target indirect");
+ DECL_ATTRIBUTES (t)
+ = tree_cons (id, NULL_TREE, DECL_ATTRIBUTES (t));
+ }
+ }
}
- if (device_type && only_device_type)
+ if ((device_type || indirect) && only_device_type_or_indirect)
error_at (OMP_CLAUSE_LOCATION (clauses),
- "directive with only %<device_type%> clause");
+ "directive with only %<device_type%> or %<indirect%> clauses");
+ if (indirect && device_type && device_type != OMP_CLAUSE_DEVICE_TYPE_ANY)
+ error_at (OMP_CLAUSE_LOCATION (clauses),
+ "%<device_type%> clause must specify 'any' when used with "
+ "an %<indirect%> clause");
}
/* OpenMP 5.1
@@ -23910,7 +23979,8 @@ c_parser_omp_declare_target (c_parser *parser)
#pragma omp begin declare target clauses[optseq] new-line */
#define OMP_BEGIN_DECLARE_TARGET_CLAUSE_MASK \
- (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_DEVICE_TYPE)
+ ( (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_DEVICE_TYPE) \
+ | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_INDIRECT))
static void
c_parser_omp_begin (c_parser *parser)
@@ -23933,10 +24003,15 @@ c_parser_omp_begin (c_parser *parser)
OMP_BEGIN_DECLARE_TARGET_CLAUSE_MASK,
"#pragma omp begin declare target");
int device_type = 0;
+ int indirect = 0;
for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
- if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_DEVICE_TYPE)
- device_type |= OMP_CLAUSE_DEVICE_TYPE_KIND (c);
- c_omp_declare_target_attr attr = { device_type };
+ {
+ if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_DEVICE_TYPE)
+ device_type |= OMP_CLAUSE_DEVICE_TYPE_KIND (c);
+ if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_INDIRECT)
+ indirect |= !integer_zerop (OMP_CLAUSE_INDIRECT_EXPR (c));
+ }
+ c_omp_declare_target_attr attr = { device_type, indirect };
vec_safe_push (current_omp_declare_target_attribute, attr);
}
else
diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index bdd57aae3ff..4c33a30d223 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -15914,6 +15914,7 @@ c_finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
case OMP_CLAUSE_IF_PRESENT:
case OMP_CLAUSE_FINALIZE:
case OMP_CLAUSE_NOHOST:
+ case OMP_CLAUSE_INDIRECT:
pc = &OMP_CLAUSE_CHAIN (c);
continue;
diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc
index f6d56b798e1..0e224ca8f65 100644
--- a/gcc/config/gcn/mkoffload.cc
+++ b/gcc/config/gcn/mkoffload.cc
@@ -479,7 +479,8 @@ copy_early_debug_info (const char *infile, const char *outfile)
static void
process_asm (FILE *in, FILE *out, FILE *cfile)
{
- int fn_count = 0, var_count = 0, dims_count = 0, regcount_count = 0;
+ int fn_count = 0, var_count = 0, ind_fn_count = 0;
+ int dims_count = 0, regcount_count = 0;
struct obstack fns_os, dims_os, regcounts_os;
obstack_init (&fns_os);
obstack_init (&dims_os);
@@ -508,7 +509,8 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
{ IN_CODE,
IN_METADATA,
IN_VARS,
- IN_FUNCS
+ IN_FUNCS,
+ IN_IND_FUNCS,
} state = IN_CODE;
while (fgets (buf, sizeof (buf), in))
{
@@ -570,6 +572,17 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
}
break;
}
+ case IN_IND_FUNCS:
+ {
+ char *funcname;
+ if (sscanf (buf, "\t.8byte\t%ms\n", &funcname))
+ {
+ fputs (buf, out);
+ ind_fn_count++;
+ continue;
+ }
+ break;
+ }
}
char dummy;
@@ -597,6 +610,15 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
".offload_func_table:\n",
out);
}
+ else if (sscanf (buf, " .section .gnu.offload_ind_funcs%c", &dummy) > 0)
+ {
+ state = IN_IND_FUNCS;
+ fputs (buf, out);
+ fputs ("\t.global .offload_ind_func_table\n"
+ "\t.type .offload_ind_func_table, @object\n"
+ ".offload_ind_func_table:\n",
+ out);
+ }
else if (sscanf (buf, " .amdgpu_metadata%c", &dummy) > 0)
{
state = IN_METADATA;
@@ -634,6 +656,7 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
fprintf (cfile, "#include <stdbool.h>\n\n");
fprintf (cfile, "static const int gcn_num_vars = %d;\n\n", var_count);
+ fprintf (cfile, "static const int gcn_num_ind_funcs = %d;\n\n", ind_fn_count);
/* Dump out function idents. */
fprintf (cfile, "static const struct hsa_kernel_description {\n"
@@ -728,12 +751,14 @@ process_obj (FILE *in, FILE *cfile, uint32_t omp_requires)
" const struct gcn_image *gcn_image;\n"
" unsigned kernel_count;\n"
" const struct hsa_kernel_description *kernel_infos;\n"
+ " unsigned ind_func_count;\n"
" unsigned global_variable_count;\n"
"} gcn_data = {\n"
" %d,\n"
" &gcn_image,\n"
" sizeof (gcn_kernels) / sizeof (gcn_kernels[0]),\n"
" gcn_kernels,\n"
+ " gcn_num_ind_funcs,\n"
" gcn_num_vars\n"
"};\n\n", omp_requires);
diff --git a/gcc/config/nvptx/mkoffload.cc b/gcc/config/nvptx/mkoffload.cc
index aaea9fb320d..fb75ca090df 100644
--- a/gcc/config/nvptx/mkoffload.cc
+++ b/gcc/config/nvptx/mkoffload.cc
@@ -51,6 +51,7 @@ struct id_map
};
static id_map *func_ids, **funcs_tail = &func_ids;
+static id_map *ind_func_ids, **ind_funcs_tail = &ind_func_ids;
static id_map *var_ids, **vars_tail = &var_ids;
/* Files to unlink. */
@@ -302,6 +303,11 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
output_fn_ptr = true;
record_id (input + i + 9, &funcs_tail);
}
+ else if (startswith (input + i, "IND_FUNC_MAP "))
+ {
+ output_fn_ptr = true;
+ record_id (input + i + 13, &ind_funcs_tail);
+ }
else
abort ();
/* Skip to next line. */
@@ -422,6 +428,77 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
fprintf (out, "};\\n\";\n\n");
}
+ if (ind_func_ids)
+ {
+ const char needle[] = "// BEGIN GLOBAL FUNCTION DECL: ";
+
+ fprintf (out, "static const char ptx_code_%u[] =\n", obj_count++);
+ fprintf (out, "\t\".version ");
+ for (size_t i = 0; version[i] != '\0' && version[i] != '\n'; i++)
+ fputc (version[i], out);
+ fprintf (out, "\"\n\t\".target sm_");
+ for (size_t i = 0; sm_ver[i] != '\0' && sm_ver[i] != '\n'; i++)
+ fputc (sm_ver[i], out);
+ fprintf (out, "\"\n\t\".file 2 \\\"<dummy>\\\"\"\n");
+
+ /* WORKAROUND - see PR 108098
+ It seems as if older CUDA JIT compilers optimize the function pointers
+ in offload_func_table to NULL, which can be prevented by adding a
+ dummy procedure. With CUDA 11.1, it seems to work fine without the
+ workaround, while CUDA 10.2, as some ancient version, needs the
+ workaround. Assuming CUDA 11.0 fixes it, emitting it could be
+ restricted to 'if (sm_ver2[0] < 8 && version2[0] < 7)' as sm_80 and
+ PTX ISA 7.0 are new in CUDA 11.0; for 11.1 it would be sm_86 and
+ PTX ISA 7.1. */
+ fprintf (out, "\n\t\".func __dummy$func2 ( );\"\n");
+ fprintf (out, "\t\".func __dummy$func2 ( )\"\n");
+ fprintf (out, "\t\"{\"\n");
+ fprintf (out, "\t\"}\"\n");
+
+ size_t fidx = 0;
+ for (id = ind_func_ids; id; id = id->next)
+ {
+ fprintf (out, "\t\".extern ");
+ const char *p = input + file_idx[fidx];
+ while (true)
+ {
+ p = strstr (p, needle);
+ if (!p)
+ {
+ fidx++;
+ if (fidx >= file_cnt)
+ break;
+ p = input + file_idx[fidx];
+ continue;
+ }
+ p += strlen (needle);
+ if (!startswith (p, id->ptx_name))
+ continue;
+ p += strlen (id->ptx_name);
+ if (*p != '\n')
+ continue;
+ p++;
+ /* Skip over any directives. */
+ while (!startswith (p, ".func"))
+ while (*p++ != ' ');
+ for (; *p != '\0' && *p != '\n'; p++)
+ fputc (*p, out);
+ break;
+ }
+ fprintf (out, "\"\n");
+ if (fidx == file_cnt)
+ fatal_error (input_location,
+ "Cannot find function declaration for %qs",
+ id->ptx_name);
+ }
+
+ fprintf (out, "\t\".visible .global .align 8 .u64 "
+ "$offload_ind_func_table[] = {");
+ for (comma = "", id = ind_func_ids; id; comma = ",", id = id->next)
+ fprintf (out, "%s\"\n\t\t\"%s", comma, id->ptx_name);
+ fprintf (out, "};\\n\";\n\n");
+ }
+
/* Dump out array of pointers to ptx object strings. */
fprintf (out, "static const struct ptx_obj {\n"
" const char *code;\n"
@@ -447,6 +524,12 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
id->dim ? id->dim : "");
fprintf (out, "\n};\n\n");
+ /* Dump out indirect function idents. */
+ fprintf (out, "static const char *const ind_func_mappings[] = {");
+ for (comma = "", id = ind_func_ids; id; comma = ",", id = id->next)
+ fprintf (out, "%s\n\t\"%s\"", comma, id->ptx_name);
+ fprintf (out, "\n};\n\n");
+
fprintf (out,
"static const struct nvptx_data {\n"
" uintptr_t omp_requires_mask;\n"
@@ -456,12 +539,14 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
" unsigned var_num;\n"
" const struct nvptx_fn *fn_names;\n"
" unsigned fn_num;\n"
+ " unsigned ind_fn_num;\n"
"} nvptx_data = {\n"
" %d, ptx_objs, sizeof (ptx_objs) / sizeof (ptx_objs[0]),\n"
" var_mappings,"
" sizeof (var_mappings) / sizeof (var_mappings[0]),\n"
" func_mappings,"
- " sizeof (func_mappings) / sizeof (func_mappings[0])\n"
+ " sizeof (func_mappings) / sizeof (func_mappings[0]),\n"
+ " sizeof (ind_func_mappings) / sizeof (ind_func_mappings[0])\n"
"};\n\n", omp_requires);
fprintf (out, "#ifdef __cplusplus\n"
diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index 634c31673be..0eeff95b3f5 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -5919,7 +5919,11 @@ nvptx_record_offload_symbol (tree decl)
/* OpenMP offloading does not set this attribute. */
tree dims = attr ? TREE_VALUE (attr) : NULL_TREE;
- fprintf (asm_out_file, "//:FUNC_MAP \"%s\"",
+ fprintf (asm_out_file, "//:");
+ if (lookup_attribute ("omp declare target indirect",
+ DECL_ATTRIBUTES (decl)))
+ fprintf (asm_out_file, "IND_");
+ fprintf (asm_out_file, "FUNC_MAP \"%s\"",
IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl)));
for (; dims; dims = TREE_CHAIN (dims))
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 98b29e9cf81..b2603d4830e 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -1831,6 +1831,7 @@ union GTY((desc ("cp_tree_node_structure (&%h)"),
struct GTY(()) cp_omp_declare_target_attr {
bool attr_syntax;
int device_type;
+ bool indirect;
};
struct GTY(()) cp_omp_begin_assumes_data {
diff --git a/gcc/cp/decl2.cc b/gcc/cp/decl2.cc
index 0aa1e355972..9e666e5eece 100644
--- a/gcc/cp/decl2.cc
+++ b/gcc/cp/decl2.cc
@@ -1762,6 +1762,12 @@ cplus_decl_attributes (tree *decl, tree attributes, int flags)
attributes
= tree_cons (get_identifier ("omp declare target nohost"),
NULL_TREE, attributes);
+ if (last.indirect
+ && !lookup_attribute ("omp declare target indirect",
+ attributes))
+ attributes
+ = tree_cons (get_identifier ("omp declare target indirect"),
+ NULL_TREE, attributes);
}
}
}
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 20e18365906..cd1d7666ae3 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -37524,6 +37524,8 @@ cp_parser_omp_clause_name (cp_parser *parser)
result = PRAGMA_OMP_CLAUSE_IN_REDUCTION;
else if (!strcmp ("inbranch", p))
result = PRAGMA_OMP_CLAUSE_INBRANCH;
+ else if (!strcmp ("indirect", p))
+ result = PRAGMA_OMP_CLAUSE_INDIRECT;
else if (!strcmp ("independent", p))
result = PRAGMA_OACC_CLAUSE_INDEPENDENT;
else if (!strcmp ("is_device_ptr", p))
@@ -38558,6 +38560,46 @@ cp_parser_omp_clause_final (cp_parser *parser, tree list, location_t location)
return c;
}
+/* OpenMP 5.1:
+ indirect [( expression )]
+*/
+
+static tree
+cp_parser_omp_clause_indirect (cp_parser *parser, tree list,
+ location_t location)
+{
+ tree t;
+
+ if (cp_lexer_next_token_is (parser->lexer, CPP_OPEN_PAREN))
+ {
+ matching_parens parens;
+ if (!parens.require_open (parser))
+ return list;
+
+ bool non_constant_p;
+ t = cp_parser_constant_expression (parser, true, &non_constant_p);
+
+ if (t != error_mark_node && non_constant_p)
+ error_at (location, "expected constant logical expression");
+
+ if (t == error_mark_node
+ || !parens.require_close (parser))
+ cp_parser_skip_to_closing_parenthesis (parser, /*recovering=*/true,
+ /*or_comma=*/false,
+ /*consume_paren=*/true);
+ }
+ else
+ t = integer_one_node;
+
+ check_no_duplicate_clause (list, OMP_CLAUSE_INDIRECT, "indirect", location);
+
+ tree c = build_omp_clause (location, OMP_CLAUSE_INDIRECT);
+ OMP_CLAUSE_INDIRECT_EXPR (c) = t;
+ OMP_CLAUSE_CHAIN (c) = list;
+
+ return c;
+}
+
/* OpenMP 2.5:
if ( expression )
@@ -41629,6 +41671,11 @@ cp_parser_omp_all_clauses (cp_parser *parser, omp_clause_mask mask,
true, clauses);
c_name = "in_reduction";
break;
+ case PRAGMA_OMP_CLAUSE_INDIRECT:
+ clauses = cp_parser_omp_clause_indirect (parser, clauses,
+ token->location);
+ c_name = "indirect";
+ break;
case PRAGMA_OMP_CLAUSE_LASTPRIVATE:
clauses = cp_parser_omp_clause_lastprivate (parser, clauses);
c_name = "lastprivate";
@@ -48170,7 +48217,8 @@ cp_maybe_parse_omp_decl (tree decl, tree d)
on #pragma omp declare target. Return false if errors were reported. */
static bool
-handle_omp_declare_target_clause (tree c, tree t, int device_type)
+handle_omp_declare_target_clause (tree c, tree t, int device_type,
+ bool indirect)
{
tree at1 = lookup_attribute ("omp declare target", DECL_ATTRIBUTES (t));
tree at2 = lookup_attribute ("omp declare target link", DECL_ATTRIBUTES (t));
@@ -48234,6 +48282,17 @@ handle_omp_declare_target_clause (tree c, tree t, int device_type)
DECL_ATTRIBUTES (t) = tree_cons (id, NULL_TREE, DECL_ATTRIBUTES (t));
}
}
+ if (indirect)
+ {
+ tree at4 = lookup_attribute ("omp declare target indirect",
+ DECL_ATTRIBUTES (t));
+ if (at4 == NULL_TREE)
+ {
+ id = get_identifier ("omp declare target indirect");
+ DECL_ATTRIBUTES (t)
+ = tree_cons (id, NULL_TREE, DECL_ATTRIBUTES (t));
+ }
+ }
return true;
}
@@ -48251,14 +48310,16 @@ handle_omp_declare_target_clause (tree c, tree t, int device_type)
( (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_TO) \
| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_ENTER) \
| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_LINK) \
- | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_DEVICE_TYPE))
+ | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_DEVICE_TYPE) \
+ | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_INDIRECT))
static void
cp_parser_omp_declare_target (cp_parser *parser, cp_token *pragma_tok)
{
tree clauses = NULL_TREE;
int device_type = 0;
- bool only_device_type = true;
+ bool indirect = false;
+ bool only_device_type_or_indirect = true;
if (cp_lexer_next_token_is (parser->lexer, CPP_NAME)
|| (cp_lexer_next_token_is (parser->lexer, CPP_COMMA)
&& cp_lexer_nth_token_is (parser->lexer, 2, CPP_NAME)))
@@ -48276,21 +48337,26 @@ cp_parser_omp_declare_target (cp_parser *parser, cp_token *pragma_tok)
else
{
cp_omp_declare_target_attr a
- = { parser->lexer->in_omp_attribute_pragma, -1 };
+ = { parser->lexer->in_omp_attribute_pragma, -1, false };
vec_safe_push (scope_chain->omp_declare_target_attribute, a);
cp_parser_require_pragma_eol (parser, pragma_tok);
return;
}
- for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
- if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_DEVICE_TYPE)
- device_type |= OMP_CLAUSE_DEVICE_TYPE_KIND (c);
for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
{
if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_DEVICE_TYPE)
+ device_type |= OMP_CLAUSE_DEVICE_TYPE_KIND (c);
+ if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_INDIRECT)
+ indirect |= !integer_zerop (OMP_CLAUSE_INDIRECT_EXPR (c));
+ }
+ for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
+ {
+ if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_DEVICE_TYPE
+ || OMP_CLAUSE_CODE (c) == OMP_CLAUSE_INDIRECT)
continue;
tree t = OMP_CLAUSE_DECL (c);
- only_device_type = false;
- if (!handle_omp_declare_target_clause (c, t, device_type))
+ only_device_type_or_indirect = false;
+ if (!handle_omp_declare_target_clause (c, t, device_type, indirect))
continue;
if (VAR_OR_FUNCTION_DECL_P (t)
&& DECL_LOCAL_DECL_P (t)
@@ -48298,11 +48364,15 @@ cp_parser_omp_declare_target (cp_parser *parser, cp_token *pragma_tok)
&& DECL_LOCAL_DECL_ALIAS (t)
&& DECL_LOCAL_DECL_ALIAS (t) != error_mark_node)
handle_omp_declare_target_clause (c, DECL_LOCAL_DECL_ALIAS (t),
- device_type);
+ device_type, indirect);
}
- if (device_type && only_device_type)
+ if ((device_type || indirect) && only_device_type_or_indirect)
+ error_at (OMP_CLAUSE_LOCATION (clauses),
+ "directive with only %<device_type%> or %<indirect%> clauses");
+ if (indirect && device_type && device_type != OMP_CLAUSE_DEVICE_TYPE_ANY)
error_at (OMP_CLAUSE_LOCATION (clauses),
- "directive with only %<device_type%> clause");
+ "%<device_type%> clause must specify 'any' when used with "
+ "an %<indirect%> clause");
}
/* OpenMP 5.1
@@ -48311,7 +48381,8 @@ cp_parser_omp_declare_target (cp_parser *parser, cp_token *pragma_tok)
# pragma omp begin declare target clauses[optseq] new-line */
#define OMP_BEGIN_DECLARE_TARGET_CLAUSE_MASK \
- (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_DEVICE_TYPE)
+ ( (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_DEVICE_TYPE) \
+ | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_INDIRECT))
static void
cp_parser_omp_begin (cp_parser *parser, cp_token *pragma_tok)
@@ -48341,11 +48412,16 @@ cp_parser_omp_begin (cp_parser *parser, cp_token *pragma_tok)
"#pragma omp begin declare target",
pragma_tok);
int device_type = 0;
+ bool indirect = false;
for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
- if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_DEVICE_TYPE)
- device_type |= OMP_CLAUSE_DEVICE_TYPE_KIND (c);
+ {
+ if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_DEVICE_TYPE)
+ device_type |= OMP_CLAUSE_DEVICE_TYPE_KIND (c);
+ if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_INDIRECT)
+ indirect |= !integer_zerop (OMP_CLAUSE_INDIRECT_EXPR (c));
+ }
cp_omp_declare_target_attr a
- = { in_omp_attribute_pragma, device_type };
+ = { in_omp_attribute_pragma, device_type, indirect };
vec_safe_push (scope_chain->omp_declare_target_attribute, a);
}
else
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 37bffca8e55..4059e74bdb7 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -8888,6 +8888,7 @@ finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
case OMP_CLAUSE_IF_PRESENT:
case OMP_CLAUSE_FINALIZE:
case OMP_CLAUSE_NOHOST:
+ case OMP_CLAUSE_INDIRECT:
break;
case OMP_CLAUSE_MERGEABLE:
diff --git a/gcc/lto-cgraph.cc b/gcc/lto-cgraph.cc
index 32c0f5ac6db..db6a22a444e 100644
--- a/gcc/lto-cgraph.cc
+++ b/gcc/lto-cgraph.cc
@@ -68,6 +68,7 @@ enum LTO_symtab_tags
LTO_symtab_edge,
LTO_symtab_indirect_edge,
LTO_symtab_variable,
+ LTO_symtab_indirect_function,
LTO_symtab_last_tag
};
@@ -1111,6 +1112,18 @@ output_offload_tables (void)
(*offload_vars)[i]);
}
+ for (unsigned i = 0; i < vec_safe_length (offload_ind_funcs); i++)
+ {
+ symtab_node *node = symtab_node::get ((*offload_ind_funcs)[i]);
+ if (!node)
+ continue;
+ node->force_output = true;
+ streamer_write_enum (ob->main_stream, LTO_symtab_tags,
+ LTO_symtab_last_tag, LTO_symtab_indirect_function);
+ lto_output_fn_decl_ref (ob->decl_state, ob->main_stream,
+ (*offload_ind_funcs)[i]);
+ }
+
if (output_requires)
{
HOST_WIDE_INT val = ((HOST_WIDE_INT) omp_requires_mask
@@ -1134,6 +1147,7 @@ output_offload_tables (void)
{
vec_free (offload_funcs);
vec_free (offload_vars);
+ vec_free (offload_ind_funcs);
}
}
@@ -1863,6 +1877,19 @@ input_offload_tables (bool do_force_output)
varpool_node::get (var_decl)->force_output = 1;
tmp_decl = var_decl;
}
+ else if (tag == LTO_symtab_indirect_function)
+ {
+ tree fn_decl
+ = lto_input_fn_decl_ref (ib, file_data);
+ vec_safe_push (offload_ind_funcs, fn_decl);
+
+ /* Prevent IPA from removing fn_decl as unreachable, since there
+ may be no direct refs to fn_decl from other functions in offload
+ LTO mode. */
+ if (do_force_output)
+ cgraph_node::get (fn_decl)->mark_force_output ();
+ tmp_decl = fn_decl;
+ }
else if (tag == LTO_symtab_edge)
{
static bool error_emitted = false;
diff --git a/gcc/lto-section-names.h b/gcc/lto-section-names.h
index aa1b2f2eeff..f7ed622772f 100644
--- a/gcc/lto-section-names.h
+++ b/gcc/lto-section-names.h
@@ -37,5 +37,6 @@ extern const char *section_name_prefix;
#define OFFLOAD_VAR_TABLE_SECTION_NAME ".gnu.offload_vars"
#define OFFLOAD_FUNC_TABLE_SECTION_NAME ".gnu.offload_funcs"
+#define OFFLOAD_IND_FUNC_TABLE_SECTION_NAME ".gnu.offload_ind_funcs"
#endif /* GCC_LTO_SECTION_NAMES_H */
diff --git a/gcc/omp-builtins.def b/gcc/omp-builtins.def
index e0f03263db0..ed78d49d205 100644
--- a/gcc/omp-builtins.def
+++ b/gcc/omp-builtins.def
@@ -445,6 +445,9 @@ DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TARGET_UPDATE, "GOMP_target_update_ext",
DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TARGET_ENTER_EXIT_DATA,
"GOMP_target_enter_exit_data",
BT_FN_VOID_INT_SIZE_PTR_PTR_PTR_UINT_PTR, ATTR_NOTHROW_LIST)
+DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TARGET_MAP_INDIRECT_PTR,
+ "GOMP_target_map_indirect_ptr",
+ BT_FN_PTR_PTR, ATTR_NOTHROW_LIST)
DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TEAMS4, "GOMP_teams4",
BT_FN_BOOL_UINT_UINT_UINT_BOOL, ATTR_NOTHROW_LIST)
DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TEAMS_REG, "GOMP_teams_reg",
diff --git a/gcc/omp-offload.cc b/gcc/omp-offload.cc
index 0d3c8794d54..1d6dfef74fc 100644
--- a/gcc/omp-offload.cc
+++ b/gcc/omp-offload.cc
@@ -86,7 +86,7 @@ struct oacc_loop
};
/* Holds offload tables with decls. */
-vec<tree, va_gc> *offload_funcs, *offload_vars;
+vec<tree, va_gc> *offload_funcs, *offload_vars, *offload_ind_funcs;
/* Return level at which oacc routine may spawn a partitioned loop, or
-1 if it is not a routine (i.e. is an offload fn). */
@@ -351,6 +351,9 @@ omp_discover_implicit_declare_target (void)
if (DECL_SAVED_TREE (node->decl))
{
struct cgraph_node *cgn;
+ if (lookup_attribute ("omp declare target indirect",
+ DECL_ATTRIBUTES (node->decl)))
+ vec_safe_push (offload_ind_funcs, node->decl);
if (omp_declare_target_fn_p (node->decl))
worklist.safe_push (node->decl);
else if (DECL_STRUCT_FUNCTION (node->decl)
@@ -397,49 +400,66 @@ omp_finish_file (void)
{
unsigned num_funcs = vec_safe_length (offload_funcs);
unsigned num_vars = vec_safe_length (offload_vars);
+ unsigned num_ind_funcs = vec_safe_length (offload_ind_funcs);
- if (num_funcs == 0 && num_vars == 0)
+ if (num_funcs == 0 && num_vars == 0 && num_ind_funcs == 0)
return;
if (targetm_common.have_named_sections)
{
- vec<constructor_elt, va_gc> *v_f, *v_v;
+ vec<constructor_elt, va_gc> *v_f, *v_v, *v_if;
vec_alloc (v_f, num_funcs);
vec_alloc (v_v, num_vars * 2);
+ vec_alloc (v_if, num_ind_funcs);
add_decls_addresses_to_decl_constructor (offload_funcs, v_f);
add_decls_addresses_to_decl_constructor (offload_vars, v_v);
+ add_decls_addresses_to_decl_constructor (offload_ind_funcs, v_if);
tree vars_decl_type = build_array_type_nelts (pointer_sized_int_node,
vec_safe_length (v_v));
tree funcs_decl_type = build_array_type_nelts (pointer_sized_int_node,
num_funcs);
+ tree ind_funcs_decl_type = build_array_type_nelts (pointer_sized_int_node,
+ num_ind_funcs);
+
SET_TYPE_ALIGN (vars_decl_type, TYPE_ALIGN (pointer_sized_int_node));
SET_TYPE_ALIGN (funcs_decl_type, TYPE_ALIGN (pointer_sized_int_node));
+ SET_TYPE_ALIGN (ind_funcs_decl_type, TYPE_ALIGN (pointer_sized_int_node));
tree ctor_v = build_constructor (vars_decl_type, v_v);
tree ctor_f = build_constructor (funcs_decl_type, v_f);
- TREE_CONSTANT (ctor_v) = TREE_CONSTANT (ctor_f) = 1;
- TREE_STATIC (ctor_v) = TREE_STATIC (ctor_f) = 1;
+ tree ctor_if = build_constructor (ind_funcs_decl_type, v_if);
+ TREE_CONSTANT (ctor_v) = TREE_CONSTANT (ctor_f) = TREE_CONSTANT (ctor_if) = 1;
+ TREE_STATIC (ctor_v) = TREE_STATIC (ctor_f) = TREE_STATIC (ctor_if) = 1;
tree funcs_decl = build_decl (UNKNOWN_LOCATION, VAR_DECL,
get_identifier (".offload_func_table"),
funcs_decl_type);
tree vars_decl = build_decl (UNKNOWN_LOCATION, VAR_DECL,
get_identifier (".offload_var_table"),
vars_decl_type);
- TREE_STATIC (funcs_decl) = TREE_STATIC (vars_decl) = 1;
+ tree ind_funcs_decl = build_decl (UNKNOWN_LOCATION, VAR_DECL,
+ get_identifier (".offload_ind_func_table"),
+ ind_funcs_decl_type);
+ TREE_STATIC (funcs_decl) = TREE_STATIC (ind_funcs_decl) = 1;
+ TREE_STATIC (vars_decl) = 1;
/* Do not align tables more than TYPE_ALIGN (pointer_sized_int_node),
otherwise a joint table in a binary will contain padding between
tables from multiple object files. */
- DECL_USER_ALIGN (funcs_decl) = DECL_USER_ALIGN (vars_decl) = 1;
+ DECL_USER_ALIGN (funcs_decl) = DECL_USER_ALIGN (ind_funcs_decl) = 1;
+ DECL_USER_ALIGN (vars_decl) = 1;
SET_DECL_ALIGN (funcs_decl, TYPE_ALIGN (funcs_decl_type));
SET_DECL_ALIGN (vars_decl, TYPE_ALIGN (vars_decl_type));
+ SET_DECL_ALIGN (ind_funcs_decl, TYPE_ALIGN (ind_funcs_decl_type));
DECL_INITIAL (funcs_decl) = ctor_f;
DECL_INITIAL (vars_decl) = ctor_v;
+ DECL_INITIAL (ind_funcs_decl) = ctor_if;
set_decl_section_name (funcs_decl, OFFLOAD_FUNC_TABLE_SECTION_NAME);
set_decl_section_name (vars_decl, OFFLOAD_VAR_TABLE_SECTION_NAME);
-
+ set_decl_section_name (ind_funcs_decl,
+ OFFLOAD_IND_FUNC_TABLE_SECTION_NAME);
varpool_node::finalize_decl (vars_decl);
varpool_node::finalize_decl (funcs_decl);
+ varpool_node::finalize_decl (ind_funcs_decl);
}
else
{
@@ -471,6 +491,15 @@ omp_finish_file (void)
#endif
targetm.record_offload_symbol (it);
}
+ for (unsigned i = 0; i < num_ind_funcs; i++)
+ {
+ tree it = (*offload_ind_funcs)[i];
+ /* See also add_decls_addresses_to_decl_constructor
+ and output_offload_tables in lto-cgraph.cc. */
+ if (!in_lto_p && !symtab_node::get (it))
+ continue;
+ targetm.record_offload_symbol (it);
+ }
}
}
@@ -2603,6 +2632,11 @@ execute_omp_device_lower ()
gimple_stmt_iterator gsi;
bool calls_declare_variant_alt
= cgraph_node::get (cfun->decl)->calls_declare_variant_alt;
+#ifdef ACCEL_COMPILER
+ bool omp_redirect_indirect_calls = vec_safe_length (offload_ind_funcs) > 0;
+ tree map_ptr_fn
+ = builtin_decl_explicit (BUILT_IN_GOMP_TARGET_MAP_INDIRECT_PTR);
+#endif
FOR_EACH_BB_FN (bb, cfun)
for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
{
@@ -2621,6 +2655,33 @@ execute_omp_device_lower ()
update_stmt (stmt);
}
}
+#ifdef ACCEL_COMPILER
+ if (omp_redirect_indirect_calls
+ && gimple_call_fndecl (stmt) == NULL_TREE)
+ {
+ gcall *orig_call = dyn_cast <gcall *> (stmt);
+ tree call_fn = gimple_call_fn (stmt);
+ tree fn_ty = TREE_TYPE (call_fn);
+
+ if (TREE_CODE (call_fn) == OBJ_TYPE_REF)
+ {
+ tree obj_ref = create_tmp_reg (TREE_TYPE (call_fn),
+ ".ind_fn_objref");
+ gimple *gassign = gimple_build_assign (obj_ref, call_fn);
+ gsi_insert_before (&gsi, gassign, GSI_SAME_STMT);
+ call_fn = obj_ref;
+ }
+ tree mapped_fn = create_tmp_reg (fn_ty, ".ind_fn");
+ gimple *map_call
+ = gimple_build_call (map_ptr_fn, 1, call_fn);
+ gimple_set_location (map_call, gimple_location (stmt));
+ gimple_call_set_lhs (map_call, mapped_fn);
+ gsi_insert_before (&gsi, map_call, GSI_SAME_STMT);
+
+ gimple_call_set_fn (orig_call, mapped_fn);
+ update_stmt (orig_call);
+ }
+#endif
continue;
}
tree lhs = gimple_call_lhs (stmt), rhs = NULL_TREE;
@@ -2757,9 +2818,15 @@ public:
/* opt_pass methods: */
bool gate (function *fun) final override
{
+#ifdef ACCEL_COMPILER
+ bool offload_ind_funcs_p = vec_safe_length (offload_ind_funcs) > 0;
+#else
+ bool offload_ind_funcs_p = false;
+#endif
return (!(fun->curr_properties & PROP_gimple_lomp_dev)
|| (flag_openmp
- && cgraph_node::get (fun->decl)->calls_declare_variant_alt));
+ && (cgraph_node::get (fun->decl)->calls_declare_variant_alt
+ || offload_ind_funcs_p)));
}
unsigned int execute (function *) final override
{
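As a reading aid for the execute_omp_device_lower change above, here is a source-level sketch of the rewrite applied to an indirect call. It is only an illustration under assumptions: `map_indirect_ptr_stub`, `host_impl`, `device_impl`, `call_before` and `call_after` are hypothetical names, and the stub stands in for GOMP_target_map_indirect_ptr with a single hard-coded mapping. The real pass works on GIMPLE, not on C source.

```c
#include <assert.h>

typedef int (*unary_fn) (int);

/* Hypothetical "host" and "device" versions of the same function.  */
static int host_impl (int v) { return v + 1; }
static int device_impl (int v) { return v + 100; }

/* Hypothetical stand-in for GOMP_target_map_indirect_ptr: translate the
   one known host address, return anything else unchanged.  */
static void *
map_indirect_ptr_stub (void *ptr)
{
  return ptr == (void *) host_impl ? (void *) device_impl : ptr;
}

/* The indirect call as written by the user.  */
int
call_before (unary_fn fn, int arg)
{
  assert (fn != 0);
  return fn (arg);
}

/* Roughly the shape after lowering: the pointer is passed through the
   mapping function first, and the call goes through the result.  */
int
call_after (unary_fn fn, int arg)
{
  assert (fn != 0);
  unary_fn mapped = (unary_fn) map_indirect_ptr_stub ((void *) fn);
  return mapped (arg);
}
```

Note that a pointer with no entry in the map is called unchanged, which matches the fallback described in the cover letter.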
diff --git a/gcc/omp-offload.h b/gcc/omp-offload.h
index 73711e74c7d..ae364422417 100644
--- a/gcc/omp-offload.h
+++ b/gcc/omp-offload.h
@@ -28,6 +28,7 @@ extern int oacc_fn_attrib_level (tree attr);
extern GTY(()) vec<tree, va_gc> *offload_funcs;
extern GTY(()) vec<tree, va_gc> *offload_vars;
+extern GTY(()) vec<tree, va_gc> *offload_ind_funcs;
extern void omp_finish_file (void);
extern void omp_discover_implicit_declare_target (void);
diff --git a/gcc/testsuite/c-c++-common/gomp/declare-target-7.c b/gcc/testsuite/c-c++-common/gomp/declare-target-7.c
index 747000a74b9..e37b4652050 100644
--- a/gcc/testsuite/c-c++-common/gomp/declare-target-7.c
+++ b/gcc/testsuite/c-c++-common/gomp/declare-target-7.c
@@ -1,7 +1,7 @@
/* { dg-do compile } */
/* { dg-options "-fopenmp" } */
-#pragma omp declare target device_type (any) /* { dg-error "directive with only 'device_type' clause" } */
+#pragma omp declare target device_type (any) /* { dg-error "directive with only 'device_type' or 'indirect' clauses" } */
void f1 (void) {}
#pragma omp declare target device_type (host) to (f1) device_type (nohost) /* { dg-error "too many 'device_type' clauses" } */
diff --git a/gcc/testsuite/c-c++-common/gomp/declare-target-indirect-1.c b/gcc/testsuite/c-c++-common/gomp/declare-target-indirect-1.c
new file mode 100644
index 00000000000..0fcbb2d04e4
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/declare-target-indirect-1.c
@@ -0,0 +1,62 @@
+/* { dg-do compile } */
+/* { dg-options "-fopenmp" } */
+
+extern int a, b;
+#define X 1
+#define Y 0
+
+#pragma omp begin declare target indirect
+void fn1 (void) { }
+#pragma omp end declare target
+
+#pragma omp begin declare target indirect (1)
+void fn2 (void) { }
+#pragma omp end declare target
+
+#pragma omp begin declare target indirect (0)
+void fn3 (void) { }
+#pragma omp end declare target
+
+void fn4 (void) { }
+#pragma omp declare target indirect to (fn4)
+
+void fn5 (void) { }
+#pragma omp declare target indirect (1) to (fn5)
+
+void fn6 (void) { }
+#pragma omp declare target indirect (0) to (fn6)
+
+void fn7 (void) { }
+#pragma omp declare target indirect (-1) to (fn7)
+
+/* Compile-time non-constant expressions are not allowed. */
+void fn8 (void) { }
+#pragma omp declare target indirect (a + b) to (fn8) /* { dg-error "expected constant logical expression" } */
+
+/* Compile-time constant expressions are permissible. */
+void fn9 (void) { }
+#pragma omp declare target indirect (X*Y) to (fn9)
+
+/* 'omp declare target'...'omp end declare target' form cannot take clauses. */
+#pragma omp declare target indirect /* { dg-error "directive with only 'device_type' or 'indirect' clauses" } */
+void fn10 (void) { }
+#pragma omp end declare target /* { dg-error "'#pragma omp end declare target' without corresponding '#pragma omp declare target' or '#pragma omp begin declare target'" } */
+
+void fn11 (void) { }
+#pragma omp declare target indirect (1) indirect (0) to (fn11) /* { dg-error "too many .indirect. clauses" } */
+
+void fn12 (void) { }
+#pragma omp declare target indirect ("abs") to (fn12)
+
+void fn13 (void) { }
+#pragma omp declare target indirect (5.5) enter (fn13)
+
+void fn14 (void) { }
+#pragma omp declare target indirect (1) device_type (host) enter (fn14) /* { dg-error "'device_type' clause must specify 'any' when used with an 'indirect' clause" } */
+
+void fn15 (void) { }
+#pragma omp declare target indirect (0) device_type (nohost) enter (fn15)
+
+/* Indirect on a variable should have no effect. */
+int x;
+#pragma omp declare target indirect to(x)
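For orientation, a minimal usage sketch of the feature that the compile tests above exercise syntactically. The names `twice`, `thrice` and `call_on_device` are hypothetical and not part of the testsuite. The sketch assumes a compiler with `indirect` support and -fopenmp; without OpenMP the pragmas are ignored and everything runs on the host.

```c
#include <assert.h>

/* Hypothetical functions.  'indirect' requests that they be callable
   through a host function pointer inside a target region.  */
#pragma omp begin declare target indirect
int twice (int v) { return 2 * v; }
int thrice (int v) { return 3 * v; }
#pragma omp end declare target

/* Call FN with ARG inside a target region (or on the host when not
   offloading) and return its result.  */
int
call_on_device (int (*fn) (int), int arg)
{
  int result = 0;
  assert (fn != 0);
#pragma omp target map (to: fn, arg) map (from: result)
  result = fn (arg);
  return result;
}
```

The host address stored in `fn` is what gets passed to the device; the translation to the device address is expected to happen at the call site.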
diff --git a/gcc/testsuite/c-c++-common/gomp/declare-target-indirect-2.c b/gcc/testsuite/c-c++-common/gomp/declare-target-indirect-2.c
new file mode 100644
index 00000000000..6ba278b3ef0
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/declare-target-indirect-2.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-options "-fopenmp -fdump-tree-gimple" } */
+
+#pragma omp begin declare target indirect
+void fn1 (void) { }
+#pragma omp end declare target
+/* { dg-final { scan-tree-dump "__attribute__\\\(\\\(omp declare target, omp declare target block, omp declare target indirect\\\)\\\)\\\nvoid fn1" "gimple" } } */
+
+#pragma omp begin declare target indirect (0)
+void fn2 (void) { }
+#pragma omp end declare target
+/* { dg-final { scan-tree-dump "__attribute__\\\(\\\(omp declare target, omp declare target block\\\)\\\)\\\nvoid fn2" "gimple" } } */
+
+void fn3 (void) { }
+#pragma omp declare target indirect to (fn3)
+/* { dg-final { scan-tree-dump "__attribute__\\\(\\\(omp declare target indirect, omp declare target\\\)\\\)\\\nvoid fn3" "gimple" } } */
+
+void fn4 (void) { }
+#pragma omp declare target indirect (0) to (fn4)
+/* { dg-final { scan-tree-dump "__attribute__\\\(\\\(omp declare target\\\)\\\)\\\nvoid fn4" "gimple" } } */
+
+#pragma omp begin declare target indirect(1)
+ int foo(void) { return 5; }
+ #pragma omp begin declare target indirect(0)
+ int bar(void) { return 8; }
+ int baz(void) { return 11; }
+ #pragma omp declare target indirect enter(baz)
+ #pragma omp end declare target
+#pragma omp end declare target
+/* { dg-final { scan-tree-dump "__attribute__\\\(\\\(omp declare target, omp declare target block, omp declare target indirect\\\)\\\)\\\nint foo" "gimple" } } */
+/* { dg-final { scan-tree-dump "__attribute__\\\(\\\(omp declare target, omp declare target block\\\)\\\)\\\nint bar" "gimple" } } */
+/* { dg-final { scan-tree-dump "__attribute__\\\(\\\(omp declare target indirect, omp declare target, omp declare target block\\\)\\\)\\\nint baz" "gimple" } } */
diff --git a/gcc/testsuite/g++.dg/gomp/declare-target-indirect-1.C b/gcc/testsuite/g++.dg/gomp/declare-target-indirect-1.C
new file mode 100644
index 00000000000..1d66ec9f741
--- /dev/null
+++ b/gcc/testsuite/g++.dg/gomp/declare-target-indirect-1.C
@@ -0,0 +1,17 @@
+// { dg-skip-if "c++98 does not support attributes" { c++98_only } }
+
+[[omp::decl (declare target, indirect(1))]] // { dg-error "directive with only 'device_type' or 'indirect' clause" }
+int f (void) { return 5; }
+
+[[omp::decl (declare target indirect)]] // { dg-error "directive with only 'device_type' or 'indirect' clause" }
+int g (void) { return 8; }
+
+[[omp::directive (begin declare target, indirect)]];
+int h (void) { return 11; }
+[[omp::directive (end declare target)]];
+
+int i (void) { return 8; }
+[[omp::directive (declare target to(i), indirect (1))]];
+
+int j (void) { return 11; }
+[[omp::directive (declare target indirect enter (j))]];
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index 13435344401..65e51b939a2 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -350,6 +350,9 @@ enum omp_clause_code {
/* OpenMP clause: doacross ({source,sink}:vec). */
OMP_CLAUSE_DOACROSS,
+ /* OpenMP clause: indirect [(constant-integer-expression)]. */
+ OMP_CLAUSE_INDIRECT,
+
/* Internal structure to hold OpenACC cache directive's variable-list.
#pragma acc cache (variable-list). */
OMP_CLAUSE__CACHE_,
diff --git a/gcc/tree.cc b/gcc/tree.cc
index cfead156ddf..285ba046a63 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -269,6 +269,7 @@ unsigned const char omp_clause_num_ops[] =
2, /* OMP_CLAUSE_MAP */
1, /* OMP_CLAUSE_HAS_DEVICE_ADDR */
1, /* OMP_CLAUSE_DOACROSS */
+ 1, /* OMP_CLAUSE_INDIRECT */
2, /* OMP_CLAUSE__CACHE_ */
2, /* OMP_CLAUSE_GANG */
1, /* OMP_CLAUSE_ASYNC */
@@ -361,6 +362,7 @@ const char * const omp_clause_code_name[] =
"map",
"has_device_addr",
"doacross",
+ "indirect",
"_cache_",
"gang",
"async",
diff --git a/gcc/tree.h b/gcc/tree.h
index ac94bd7b460..a49de5c58cb 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -1842,6 +1842,10 @@ class auto_suppress_location_wrappers
#define OMP_CLAUSE_DEVICE_TYPE_KIND(NODE) \
(OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_DEVICE_TYPE)->omp_clause.subcode.device_type_kind)
+#define OMP_CLAUSE_INDIRECT_EXPR(NODE) \
+ OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_INDIRECT), 0)
+
+
/* True if there is a device clause with a device-modifier 'ancestor'. */
#define OMP_CLAUSE_DEVICE_ANCESTOR(NODE) \
(OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_DEVICE)->base.public_flag)
diff --git a/include/gomp-constants.h b/include/gomp-constants.h
index 89b966e63c6..f1579bb2519 100644
--- a/include/gomp-constants.h
+++ b/include/gomp-constants.h
@@ -316,7 +316,7 @@ enum gomp_map_kind
/* Versions of libgomp and device-specific plugins. GOMP_VERSION
should be incremented whenever an ABI-incompatible change is introduced
to the plugin interface defined in libgomp/libgomp.h. */
-#define GOMP_VERSION 2
+#define GOMP_VERSION 3
#define GOMP_VERSION_NVIDIA_PTX 1
#define GOMP_VERSION_GCN 3
@@ -324,6 +324,8 @@ enum gomp_map_kind
#define GOMP_VERSION_LIB(PACK) (((PACK) >> 16) & 0xffff)
#define GOMP_VERSION_DEV(PACK) ((PACK) & 0xffff)
+#define GOMP_VERSION_SUPPORTS_INDIRECT_FUNCS(VER) (GOMP_VERSION_LIB(VER) >= 3)
+
#define GOMP_DIM_GANG 0
#define GOMP_DIM_WORKER 1
#define GOMP_DIM_VECTOR 2
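A small standalone sketch of how the new version check behaves on a packed version word. The macros are copied locally so that it compiles on its own. GOMP_VERSION_PACK is an assumption here (it is not shown in this hunk), and `image_supports_indirect` is a hypothetical helper.

```c
#include <assert.h> /* Only needed for the checks in the usage note.  */

/* Local copies so the sketch is self-contained.  GOMP_VERSION_PACK is an
   assumption: library version in the high 16 bits, device version in the
   low 16 bits.  */
#define GOMP_VERSION_PACK(LIB, DEV) (((LIB) << 16) | (DEV))
#define GOMP_VERSION_LIB(PACK) (((PACK) >> 16) & 0xffff)
#define GOMP_VERSION_DEV(PACK) ((PACK) & 0xffff)
#define GOMP_VERSION_SUPPORTS_INDIRECT_FUNCS(VER) \
  (GOMP_VERSION_LIB (VER) >= 3)

/* Hypothetical helper: non-zero if an image whose packed version is VER
   is new enough to carry an indirect function table.  */
int
image_supports_indirect (unsigned ver)
{
  return GOMP_VERSION_SUPPORTS_INDIRECT_FUNCS (ver);
}
```

Under these assumptions, `image_supports_indirect (GOMP_VERSION_PACK (3u, 1u))` is non-zero, while `image_supports_indirect (GOMP_VERSION_PACK (2u, 3u))` is zero, because only the library half of the word is inspected.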
diff --git a/libgcc/offloadstuff.c b/libgcc/offloadstuff.c
index 4e1c4d41dd5..18c5bf89b69 100644
--- a/libgcc/offloadstuff.c
+++ b/libgcc/offloadstuff.c
@@ -43,6 +43,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
#if defined(HAVE_GAS_HIDDEN) && ENABLE_OFFLOADING == 1
#define OFFLOAD_FUNC_TABLE_SECTION_NAME ".gnu.offload_funcs"
+#define OFFLOAD_IND_FUNC_TABLE_SECTION_NAME ".gnu.offload_ind_funcs"
#define OFFLOAD_VAR_TABLE_SECTION_NAME ".gnu.offload_vars"
#ifdef CRT_BEGIN
@@ -53,6 +54,9 @@ const void *const __offload_func_table[0]
const void *const __offload_var_table[0]
__attribute__ ((__used__, visibility ("hidden"),
section (OFFLOAD_VAR_TABLE_SECTION_NAME))) = { };
+const void *const __offload_ind_func_table[0]
+ __attribute__ ((__used__, visibility ("hidden"),
+ section (OFFLOAD_IND_FUNC_TABLE_SECTION_NAME))) = { };
#elif defined CRT_END
@@ -62,19 +66,25 @@ const void *const __offload_funcs_end[0]
const void *const __offload_vars_end[0]
__attribute__ ((__used__, visibility ("hidden"),
section (OFFLOAD_VAR_TABLE_SECTION_NAME))) = { };
+const void *const __offload_ind_funcs_end[0]
+ __attribute__ ((__used__, visibility ("hidden"),
+ section (OFFLOAD_IND_FUNC_TABLE_SECTION_NAME))) = { };
#elif defined CRT_TABLE
extern const void *const __offload_func_table[];
extern const void *const __offload_var_table[];
+extern const void *const __offload_ind_func_table[];
extern const void *const __offload_funcs_end[];
extern const void *const __offload_vars_end[];
+extern const void *const __offload_ind_funcs_end[];
const void *const __OFFLOAD_TABLE__[]
__attribute__ ((__visibility__ ("hidden"))) =
{
&__offload_func_table, &__offload_funcs_end,
- &__offload_var_table, &__offload_vars_end
+ &__offload_var_table, &__offload_vars_end,
+ &__offload_ind_func_table, &__offload_ind_funcs_end,
};
#else /* ! CRT_BEGIN && ! CRT_END && ! CRT_TABLE */
diff --git a/libgomp/Makefile.am b/libgomp/Makefile.am
index ceb8c910abd..1871590596d 100644
--- a/libgomp/Makefile.am
+++ b/libgomp/Makefile.am
@@ -72,7 +72,7 @@ libgomp_la_SOURCES = alloc.c atomic.c barrier.c critical.c env.c error.c \
target.c splay-tree.c libgomp-plugin.c oacc-parallel.c oacc-host.c \
oacc-init.c oacc-mem.c oacc-async.c oacc-plugin.c oacc-cuda.c \
priority_queue.c affinity-fmt.c teams.c allocator.c oacc-profiling.c \
- oacc-target.c
+ oacc-target.c target-indirect.c
include $(top_srcdir)/plugin/Makefrag.am
diff --git a/libgomp/Makefile.in b/libgomp/Makefile.in
index 186937da4e9..56a6beab867 100644
--- a/libgomp/Makefile.in
+++ b/libgomp/Makefile.in
@@ -219,7 +219,7 @@ am_libgomp_la_OBJECTS = alloc.lo atomic.lo barrier.lo critical.lo \
oacc-parallel.lo oacc-host.lo oacc-init.lo oacc-mem.lo \
oacc-async.lo oacc-plugin.lo oacc-cuda.lo priority_queue.lo \
affinity-fmt.lo teams.lo allocator.lo oacc-profiling.lo \
- oacc-target.lo $(am__objects_1)
+ oacc-target.lo target-indirect.lo $(am__objects_1)
libgomp_la_OBJECTS = $(am_libgomp_la_OBJECTS)
AM_V_P = $(am__v_P_@AM_V@)
am__v_P_ = $(am__v_P_@AM_DEFAULT_V@)
@@ -552,7 +552,7 @@ libgomp_la_SOURCES = alloc.c atomic.c barrier.c critical.c env.c \
oacc-parallel.c oacc-host.c oacc-init.c oacc-mem.c \
oacc-async.c oacc-plugin.c oacc-cuda.c priority_queue.c \
affinity-fmt.c teams.c allocator.c oacc-profiling.c \
- oacc-target.c $(am__append_3)
+ oacc-target.c target-indirect.c $(am__append_3)
# Nvidia PTX OpenACC plugin.
@PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_version_info = -version-info $(libtool_VERSION)
@@ -780,6 +780,7 @@ distclean-compile:
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/sem.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/single.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/splay-tree.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/target-indirect.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/target.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/task.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/team.Plo@am__quote@
diff --git a/libgomp/config/accel/target-indirect.c b/libgomp/config/accel/target-indirect.c
new file mode 100644
index 00000000000..6ee82a0ebd0
--- /dev/null
+++ b/libgomp/config/accel/target-indirect.c
@@ -0,0 +1,126 @@
+/* Copyright (C) 2023 Free Software Foundation, Inc.
+
+ Contributed by Siemens.
+
+ This file is part of the GNU Offloading and Multi Processing Library
+ (libgomp).
+
+ Libgomp is free software; you can redistribute it and/or modify it
+ under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 3, or (at your option)
+ any later version.
+
+ Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+ WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+ FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ more details.
+
+ Under Section 7 of GPL version 3, you are granted additional
+ permissions described in the GCC Runtime Library Exception, version
+ 3.1, as published by the Free Software Foundation.
+
+ You should have received a copy of the GNU General Public License and
+ a copy of the GCC Runtime Library Exception along with this program;
+ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include <assert.h>
+#include "libgomp.h"
+
+#define splay_tree_prefix indirect
+#define splay_tree_c
+#include "splay-tree.h"
+
+volatile void **GOMP_INDIRECT_ADDR_MAP = NULL;
+
+/* Use a splay tree to look up the target address instead of using a
+ linear search. */
+#define USE_SPLAY_TREE_LOOKUP
+
+#ifdef USE_SPLAY_TREE_LOOKUP
+
+static struct indirect_splay_tree_s indirect_map;
+static indirect_splay_tree_node indirect_array = NULL;
+
+/* Build the splay tree used for host->target address lookups. */
+
+void
+build_indirect_map (void)
+{
+ size_t num_ind_funcs = 0;
+ volatile void **map_entry;
+ static int lock = 0; /* == gomp_mutex_t lock; gomp_mutex_init (&lock); */
+
+ if (!GOMP_INDIRECT_ADDR_MAP)
+ return;
+
+ gomp_mutex_lock (&lock);
+
+ if (!indirect_array)
+ {
+ /* Count the number of entries in the NULL-terminated address map. */
+ for (map_entry = GOMP_INDIRECT_ADDR_MAP; *map_entry;
+ map_entry += 2, num_ind_funcs++);
+
+ /* Build splay tree for address lookup. */
+ indirect_array = gomp_malloc (num_ind_funcs * sizeof (*indirect_array));
+ indirect_splay_tree_node array = indirect_array;
+ map_entry = GOMP_INDIRECT_ADDR_MAP;
+
+ for (size_t i = 0; i < num_ind_funcs; i++, array++)
+ {
+ indirect_splay_tree_key k = &array->key;
+ k->host_addr = (uint64_t) *map_entry++;
+ k->target_addr = (uint64_t) *map_entry++;
+ array->left = NULL;
+ array->right = NULL;
+ indirect_splay_tree_insert (&indirect_map, array);
+ }
+ }
+
+ gomp_mutex_unlock (&lock);
+}
+
+void *
+GOMP_target_map_indirect_ptr (void *ptr)
+{
+ /* NULL pointers always resolve to NULL. */
+ if (!ptr)
+ return ptr;
+
+ assert (indirect_array);
+
+ struct indirect_splay_tree_key_s k;
+ indirect_splay_tree_key node = NULL;
+
+ k.host_addr = (uint64_t) ptr;
+ node = indirect_splay_tree_lookup (&indirect_map, &k);
+
+ return node ? (void *) node->target_addr : ptr;
+}
+
+#else
+
+void
+build_indirect_map (void)
+{
+}
+
+void *
+GOMP_target_map_indirect_ptr (void *ptr)
+{
+ /* NULL pointers always resolve to NULL. */
+ if (!ptr)
+ return ptr;
+
+ assert (GOMP_INDIRECT_ADDR_MAP);
+
+ for (volatile void **map_entry = GOMP_INDIRECT_ADDR_MAP; *map_entry;
+ map_entry += 2)
+ if (*map_entry == ptr)
+ return (void *) *(map_entry + 1);
+
+ return ptr;
+}
+
+#endif
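To make the map layout used by the file above easier to follow, here is a host-runnable sketch of the linear lookup over a NULL-terminated array of (host address, target address) pairs. It is an illustration only: `lookup_indirect_ptr` and `example_map` are hypothetical names, the addresses are made-up values, and the map is passed as a parameter rather than read from a global.

```c
#include <assert.h>
#include <stddef.h>

/* Look PTR up in MAP, a NULL-terminated array of (host, target) address
   pairs.  Return the target address, or PTR itself if there is no
   matching entry.  NULL always resolves to NULL.  */
void *
lookup_indirect_ptr (void *const *map, void *ptr)
{
  if (!ptr)
    return ptr;

  assert (map != NULL);

  for (void *const *entry = map; *entry; entry += 2)
    if (*entry == ptr)
      return entry[1];

  return ptr;
}

/* Hypothetical map with two entries; the addresses are made up.  */
void *const example_map[] = {
  (void *) 0x1000, (void *) 0x9000,
  (void *) 0x2000, (void *) 0xa000,
  NULL
};
```

The splay-tree variant answers the same query, but it builds the tree from this array once and then searches by host address.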
diff --git a/libgomp/config/gcn/team.c b/libgomp/config/gcn/team.c
index f03207c84e3..fb20cbbcf9f 100644
--- a/libgomp/config/gcn/team.c
+++ b/libgomp/config/gcn/team.c
@@ -30,6 +30,7 @@
#include <string.h>
static void gomp_thread_start (struct gomp_thread_pool *);
+extern void build_indirect_map (void);
/* This externally visible function handles target region entry. It
sets up a per-team thread pool and transfers control by returning to
@@ -45,6 +46,9 @@ gomp_gcn_enter_kernel (void)
{
int threadid = __builtin_gcn_dim_pos (1);
+ /* Initialize indirect function support. */
+ build_indirect_map ();
+
if (threadid == 0)
{
int numthreads = __builtin_gcn_dim_size (1);
diff --git a/libgomp/config/linux/target-indirect.c b/libgomp/config/linux/target-indirect.c
new file mode 100644
index 00000000000..0ab9bc52d79
--- /dev/null
+++ b/libgomp/config/linux/target-indirect.c
@@ -0,0 +1,32 @@
+/* Copyright (C) 2023 Free Software Foundation, Inc.
+
+ Contributed by Siemens.
+
+ This file is part of the GNU Offloading and Multi Processing Library
+ (libgomp).
+
+ Libgomp is free software; you can redistribute it and/or modify it
+ under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 3, or (at your option)
+ any later version.
+
+ Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+ WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+ FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ more details.
+
+ Under Section 7 of GPL version 3, you are granted additional
+ permissions described in the GCC Runtime Library Exception, version
+ 3.1, as published by the Free Software Foundation.
+
+ You should have received a copy of the GNU General Public License and
+ a copy of the GCC Runtime Library Exception along with this program;
+ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+void *
+GOMP_target_map_indirect_ptr (void *ptr)
+{
+ /* Calls to this function should not be generated for host code. */
+ __builtin_unreachable ();
+}
diff --git a/libgomp/config/nvptx/team.c b/libgomp/config/nvptx/team.c
index af5f3171a47..59521fabd99 100644
--- a/libgomp/config/nvptx/team.c
+++ b/libgomp/config/nvptx/team.c
@@ -35,6 +35,7 @@ struct gomp_thread *nvptx_thrs __attribute__((shared,nocommon));
int __gomp_team_num __attribute__((shared,nocommon));
static void gomp_thread_start (struct gomp_thread_pool *);
+extern void build_indirect_map (void);
/* This externally visible function handles target region entry. It
@@ -52,6 +53,10 @@ gomp_nvptx_main (void (*fn) (void *), void *fn_data)
int tid, ntids;
asm ("mov.u32 %0, %%tid.y;" : "=r" (tid));
asm ("mov.u32 %0, %%ntid.y;" : "=r" (ntids));
+
+ /* Initialize indirect function support. */
+ build_indirect_map ();
+
if (tid == 0)
{
gomp_global_icv.nthreads_var = ntids;
diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h
index dc993882c3b..3ce032c5cc0 100644
--- a/libgomp/libgomp-plugin.h
+++ b/libgomp/libgomp-plugin.h
@@ -107,6 +107,8 @@ struct addr_pair
must be stringified). */
#define GOMP_ADDITIONAL_ICVS __gomp_additional_icvs
+#define GOMP_INDIRECT_ADDR_MAP __gomp_indirect_addr_map
+
/* Miscellaneous functions. */
extern void *GOMP_PLUGIN_malloc (size_t) __attribute__ ((malloc));
extern void *GOMP_PLUGIN_malloc_cleared (size_t) __attribute__ ((malloc));
@@ -132,7 +134,8 @@ extern bool GOMP_OFFLOAD_init_device (int);
extern bool GOMP_OFFLOAD_fini_device (int);
extern unsigned GOMP_OFFLOAD_version (void);
extern int GOMP_OFFLOAD_load_image (int, unsigned, const void *,
- struct addr_pair **, uint64_t **);
+ struct addr_pair **, uint64_t **,
+ uint64_t *);
extern bool GOMP_OFFLOAD_unload_image (int, unsigned, const void *);
extern void *GOMP_OFFLOAD_alloc (int, size_t);
extern bool GOMP_OFFLOAD_free (int, void *);
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 68f20651fbf..15a767cf317 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -1274,6 +1274,29 @@ reverse_splay_compare (reverse_splay_tree_key x, reverse_splay_tree_key y)
#define splay_tree_prefix reverse
#include "splay-tree.h"
+/* Indirect target function splay-tree handling. */
+
+struct indirect_splay_tree_key_s {
+ uint64_t host_addr, target_addr;
+};
+
+typedef struct indirect_splay_tree_node_s *indirect_splay_tree_node;
+typedef struct indirect_splay_tree_s *indirect_splay_tree;
+typedef struct indirect_splay_tree_key_s *indirect_splay_tree_key;
+
+static inline int
+indirect_splay_compare (indirect_splay_tree_key x, indirect_splay_tree_key y)
+{
+ if (x->host_addr < y->host_addr)
+ return -1;
+ if (x->host_addr > y->host_addr)
+ return 1;
+ return 0;
+}
+
+#define splay_tree_prefix indirect
+#include "splay-tree.h"
+
struct target_mem_desc {
/* Reference count. */
uintptr_t refcount;
diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
index ce6b719a57f..90c401453b2 100644
--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -419,6 +419,7 @@ GOMP_5.1 {
GOMP_5.1.1 {
global:
GOMP_taskwait_depend_nowait;
+ GOMP_target_map_indirect_ptr;
} GOMP_5.1;
OACC_2.0 {
diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 02f2d0e5767..f6f5827e424 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -311,7 +311,7 @@ The OpenMP 4.5 specification is fully supported.
@item Iterators in @code{target update} motion clauses and @code{map}
clauses @tab N @tab
@item Indirect calls to the device version of a procedure or function in
- @code{target} regions @tab N @tab
+ @code{target} regions @tab P @tab Only C and C++
@item @code{interop} directive @tab N @tab
@item @code{omp_interop_t} object support in runtime routines @tab N @tab
@item @code{nowait} clause in @code{taskwait} directive @tab Y @tab
@@ -360,7 +360,7 @@ to address of matching mapped list item per 5.1, Sect. 2.21.7.2 @tab N @tab
@item For Fortran, diagnose placing declarative before/between @code{USE},
@code{IMPORT}, and @code{IMPLICIT} as invalid @tab N @tab
@item Optional comma between directive and clause in the @code{#pragma} form @tab Y @tab
-@item @code{indirect} clause in @code{declare target} @tab N @tab
+@item @code{indirect} clause in @code{declare target} @tab P @tab Only C and C++
@item @code{device_type(nohost)}/@code{device_type(host)} for variables @tab N @tab
@item @code{present} modifier to the @code{map}, @code{to} and @code{from}
clauses @tab Y @tab
@@ -439,6 +439,8 @@ to address of matching mapped list item per 5.1, Sect. 2.21.7.2 @tab N @tab
@item @code{all} as @emph{implicit-behavior} for @code{defaultmap} @tab Y @tab
@item @emph{interop_types} in any position of the modifier list for the @code{init} clause
of the @code{interop} construct @tab N @tab
+@item Invoke virtual member functions of C++ objects created on the host device
+ on other devices @tab N @tab
@end multitable
diff --git a/libgomp/libgomp_g.h b/libgomp/libgomp_g.h
index 5c1675c7869..95046312ae9 100644
--- a/libgomp/libgomp_g.h
+++ b/libgomp/libgomp_g.h
@@ -357,6 +357,7 @@ extern void GOMP_target_enter_exit_data (int, size_t, void **, size_t *,
void **);
extern void GOMP_teams (unsigned int, unsigned int);
extern bool GOMP_teams4 (unsigned int, unsigned int, unsigned int, bool);
+extern void *GOMP_target_map_indirect_ptr (void *);
/* teams.c */
diff --git a/libgomp/oacc-host.c b/libgomp/oacc-host.c
index 5980d510838..fbab75d7d43 100644
--- a/libgomp/oacc-host.c
+++ b/libgomp/oacc-host.c
@@ -82,7 +82,8 @@ host_load_image (int n __attribute__ ((unused)),
unsigned v __attribute__ ((unused)),
const void *t __attribute__ ((unused)),
struct addr_pair **r __attribute__ ((unused)),
- uint64_t **f __attribute__ ((unused)))
+ uint64_t **f __attribute__ ((unused)),
+ uint64_t *i __attribute__ ((unused)))
{
return 0;
}
diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 4328d3de14e..7e7e2d6edfe 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -365,6 +365,7 @@ struct gcn_image_desc
} *gcn_image;
const unsigned kernel_count;
struct hsa_kernel_description *kernel_infos;
+ const unsigned ind_func_count;
const unsigned global_variable_count;
};
@@ -3366,7 +3367,8 @@ GOMP_OFFLOAD_init_device (int n)
int
GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
struct addr_pair **target_table,
- uint64_t **rev_fn_table)
+ uint64_t **rev_fn_table,
+ uint64_t *host_ind_fn_table)
{
if (GOMP_VERSION_DEV (version) != GOMP_VERSION_GCN)
{
@@ -3382,6 +3384,8 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
struct module_info *module;
struct kernel_info *kernel;
int kernel_count = image_desc->kernel_count;
+ unsigned ind_func_count = GOMP_VERSION_SUPPORTS_INDIRECT_FUNCS (version)
+ ? image_desc->ind_func_count : 0;
unsigned var_count = image_desc->global_variable_count;
/* Currently, "others" is a struct of ICVS. */
int other_count = 1;
@@ -3400,6 +3404,7 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
return -1;
GCN_DEBUG ("Encountered %d kernels in an image\n", kernel_count);
+ GCN_DEBUG ("Encountered %u indirect functions in an image\n", ind_func_count);
GCN_DEBUG ("Encountered %u global variables in an image\n", var_count);
GCN_DEBUG ("Expect %d other variables in an image\n", other_count);
pair = GOMP_PLUGIN_malloc ((kernel_count + var_count + other_count - 2)
@@ -3481,6 +3486,87 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
}
}
+ if (ind_func_count > 0)
+ {
+ hsa_status_t status;
+
+ /* Read indirect function table from image. */
+ hsa_executable_symbol_t ind_funcs_symbol;
+ status = hsa_fns.hsa_executable_get_symbol_fn (agent->executable, NULL,
+ ".offload_ind_func_table",
+ agent->id,
+ 0, &ind_funcs_symbol);
+
+ if (status != HSA_STATUS_SUCCESS)
+ hsa_fatal ("Could not find .offload_ind_func_table symbol in the "
+ "code object", status);
+
+ uint64_t ind_funcs_table_addr;
+ status = hsa_fns.hsa_executable_symbol_get_info_fn
+ (ind_funcs_symbol, HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_ADDRESS,
+ &ind_funcs_table_addr);
+ if (status != HSA_STATUS_SUCCESS)
+ hsa_fatal ("Could not extract a variable from its symbol", status);
+
+ uint64_t ind_funcs_table[ind_func_count];
+ GOMP_OFFLOAD_dev2host (agent->device_id, ind_funcs_table,
+ (void*) ind_funcs_table_addr,
+ sizeof (ind_funcs_table));
+
+ /* Build host->target address map for indirect functions. */
+ uint64_t ind_fn_map[ind_func_count * 2 + 1];
+ for (unsigned i = 0; i < ind_func_count; i++)
+ {
+ ind_fn_map[i * 2] = host_ind_fn_table[i];
+ ind_fn_map[i * 2 + 1] = ind_funcs_table[i];
+ GCN_DEBUG ("Indirect function %d: %lx->%lx\n",
+ i, host_ind_fn_table[i], ind_funcs_table[i]);
+ }
+ ind_fn_map[ind_func_count * 2] = 0;
+
+ /* Write the map onto the target. */
+ void *map_target_addr
+ = GOMP_OFFLOAD_alloc (agent->device_id, sizeof (ind_fn_map));
+ GCN_DEBUG ("Allocated indirect map at %p\n", map_target_addr);
+
+ GOMP_OFFLOAD_host2dev (agent->device_id, map_target_addr,
+ (void*) ind_fn_map,
+ sizeof (ind_fn_map));
+
+ /* Write address of the map onto the target. */
+ hsa_executable_symbol_t symbol;
+
+ status
+ = hsa_fns.hsa_executable_get_symbol_fn (agent->executable, NULL,
+ XSTRING (GOMP_INDIRECT_ADDR_MAP),
+ agent->id, 0, &symbol);
+ if (status != HSA_STATUS_SUCCESS)
+ hsa_fatal ("Could not find GOMP_INDIRECT_ADDR_MAP in code object",
+ status);
+
+ uint64_t varptr;
+ uint32_t varsize;
+
+ status = hsa_fns.hsa_executable_symbol_get_info_fn
+ (symbol, HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_ADDRESS,
+ &varptr);
+ if (status != HSA_STATUS_SUCCESS)
+ hsa_fatal ("Could not extract a variable from its symbol", status);
+ status = hsa_fns.hsa_executable_symbol_get_info_fn
+ (symbol, HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_SIZE,
+ &varsize);
+ if (status != HSA_STATUS_SUCCESS)
+ hsa_fatal ("Could not extract a variable size from its symbol",
+ status);
+
+ GCN_DEBUG ("Found GOMP_INDIRECT_ADDR_MAP at %lx with size %d\n",
+ varptr, varsize);
+
+ GOMP_OFFLOAD_host2dev (agent->device_id, (void *) varptr,
+ &map_target_addr,
+ sizeof (map_target_addr));
+ }
+
GCN_DEBUG ("Looking for variable %s\n", XSTRING (GOMP_ADDITIONAL_ICVS));
hsa_status_t status;
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 00d4241ae02..0548e7e09e5 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -266,6 +266,8 @@ typedef struct nvptx_tdata
const struct targ_fn_launch *fn_descs;
unsigned fn_num;
+
+ unsigned ind_fn_num;
} nvptx_tdata_t;
/* Descriptor of a loaded function. */
@@ -1285,12 +1287,13 @@ nvptx_set_clocktick (CUmodule module, struct ptx_device *dev)
int
GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
struct addr_pair **target_table,
- uint64_t **rev_fn_table)
+ uint64_t **rev_fn_table,
+ uint64_t *host_ind_fn_table)
{
CUmodule module;
const char *const *var_names;
const struct targ_fn_launch *fn_descs;
- unsigned int fn_entries, var_entries, other_entries, i, j;
+ unsigned int fn_entries, var_entries, ind_fn_entries, other_entries, i, j;
struct targ_fn_descriptor *targ_fns;
struct addr_pair *targ_tbl;
const nvptx_tdata_t *img_header = (const nvptx_tdata_t *) target_data;
@@ -1319,6 +1322,8 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
var_names = img_header->var_names;
fn_entries = img_header->fn_num;
fn_descs = img_header->fn_descs;
+ ind_fn_entries = GOMP_VERSION_SUPPORTS_INDIRECT_FUNCS (version)
+ ? img_header->ind_fn_num : 0;
/* Currently, other_entries contains only the struct of ICVs. */
other_entries = 1;
@@ -1373,6 +1378,60 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
targ_tbl->end = targ_tbl->start + bytes;
}
+ if (ind_fn_entries > 0)
+ {
+ CUdeviceptr var;
+ size_t bytes;
+
+ /* Read indirect function table from image. */
+ CUresult r = CUDA_CALL_NOCHECK (cuModuleGetGlobal, &var, &bytes, module,
+ "$offload_ind_func_table");
+ if (r != CUDA_SUCCESS)
+ GOMP_PLUGIN_fatal ("cuModuleGetGlobal error: %s", cuda_error (r));
+ assert (bytes == sizeof (uint64_t) * ind_fn_entries);
+
+ uint64_t ind_fn_table[ind_fn_entries];
+ r = CUDA_CALL_NOCHECK (cuMemcpyDtoH, ind_fn_table, var, bytes);
+ if (r != CUDA_SUCCESS)
+ GOMP_PLUGIN_fatal ("cuMemcpyDtoH error: %s", cuda_error (r));
+
+ /* Build host->target address map for indirect functions. */
+ uint64_t ind_fn_map[ind_fn_entries * 2 + 1];
+ for (unsigned k = 0; k < ind_fn_entries; k++)
+ {
+ ind_fn_map[k * 2] = host_ind_fn_table[k];
+ ind_fn_map[k * 2 + 1] = ind_fn_table[k];
+ GOMP_PLUGIN_debug (0, "Indirect function %d: %lx->%lx\n",
+ k, host_ind_fn_table[k], ind_fn_table[k]);
+ }
+ ind_fn_map[ind_fn_entries * 2] = 0;
+
+ /* Write the map onto the target. */
+ void *map_target_addr
+ = GOMP_OFFLOAD_alloc (ord, sizeof (ind_fn_map));
+ GOMP_PLUGIN_debug (0, "Allocated indirect map at %p\n", map_target_addr);
+
+ GOMP_OFFLOAD_host2dev (ord, map_target_addr,
+ (void*) ind_fn_map,
+ sizeof (ind_fn_map));
+
+ /* Write address of the map onto the target. */
+ CUdeviceptr varptr;
+ size_t varsize;
+ r = CUDA_CALL_NOCHECK (cuModuleGetGlobal, &varptr, &varsize,
+ module, XSTRING (GOMP_INDIRECT_ADDR_MAP));
+ if (r != CUDA_SUCCESS)
+ GOMP_PLUGIN_fatal ("Indirect map variable not found in image: %s",
+ cuda_error (r));
+
+ GOMP_PLUGIN_debug (0,
+ "Indirect map variable found at %llx with size %ld\n",
+ varptr, varsize);
+
+ GOMP_OFFLOAD_host2dev (ord, (void *) varptr, &map_target_addr,
+ sizeof (map_target_addr));
+ }
+
CUdeviceptr varptr;
size_t varsize;
CUresult r = CUDA_CALL_NOCHECK (cuModuleGetGlobal, &varptr, &varsize,
diff --git a/libgomp/target.c b/libgomp/target.c
index 812674d19a9..f30c20255d3 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -2256,11 +2256,20 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
void **host_funcs_end = ((void ***) host_table)[1];
void **host_var_table = ((void ***) host_table)[2];
void **host_vars_end = ((void ***) host_table)[3];
+ void **host_ind_func_table = NULL;
+ void **host_ind_funcs_end = NULL;
- /* The func table contains only addresses, the var table contains addresses
- and corresponding sizes. */
+ if (GOMP_VERSION_SUPPORTS_INDIRECT_FUNCS (version))
+ {
+ host_ind_func_table = ((void ***) host_table)[4];
+ host_ind_funcs_end = ((void ***) host_table)[5];
+ }
+
+ /* The func and ind_func tables contain only addresses, the var table
+ contains addresses and corresponding sizes. */
int num_funcs = host_funcs_end - host_func_table;
int num_vars = (host_vars_end - host_var_table) / 2;
+ int num_ind_funcs = (host_ind_funcs_end - host_ind_func_table);
/* Load image to device and get target addresses for the image. */
struct addr_pair *target_table = NULL;
@@ -2273,7 +2282,9 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
num_target_entries
= devicep->load_image_func (devicep->target_id, version,
target_data, &target_table,
- rev_lookup ? &rev_target_fn_table : NULL);
+ rev_lookup ? &rev_target_fn_table : NULL,
+ num_ind_funcs
+ ? (uint64_t *) host_ind_func_table : NULL);
if (num_target_entries != num_funcs + num_vars
/* "+1" due to the additional ICV struct. */
diff --git a/libgomp/testsuite/libgomp.c++/declare-target-indirect-1.C b/libgomp/testsuite/libgomp.c++/declare-target-indirect-1.C
new file mode 100644
index 00000000000..1eac6b3fa96
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c++/declare-target-indirect-1.C
@@ -0,0 +1,23 @@
+// { dg-do run }
+
+#pragma omp begin declare target indirect
+class C
+{
+public:
+ int y;
+ int f (int x) { return x + y; }
+};
+#pragma omp end declare target
+
+int main (void)
+{
+ C c;
+ c.y = 27;
+ int x;
+ int (C::*fn_ptr) (int) = &C::f;
+
+#pragma omp target map (to: c, fn_ptr) map (from: x)
+ x = (c.*fn_ptr) (42);
+
+ return x != 27 + 42;
+}
diff --git a/libgomp/testsuite/libgomp.c-c++-common/declare-target-indirect-1.c b/libgomp/testsuite/libgomp.c-c++-common/declare-target-indirect-1.c
new file mode 100644
index 00000000000..b20bfa64dca
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/declare-target-indirect-1.c
@@ -0,0 +1,21 @@
+/* { dg-do run } */
+
+#pragma omp begin declare target indirect
+int foo(void) { return 5; }
+int bar(void) { return 8; }
+int baz(void) { return 11; }
+#pragma omp end declare target
+
+int main (void)
+{
+ int x;
+ int (*foo_ptr) (void) = &foo;
+ int (*bar_ptr) (void) = &bar;
+ int (*baz_ptr) (void) = &baz;
+ int expected = foo () + bar () + baz ();
+
+#pragma omp target map (to: foo_ptr, bar_ptr, baz_ptr) map (from: x)
+ x = (*foo_ptr) () + (*bar_ptr) () + (*baz_ptr) ();
+
+ return x - expected;
+}
diff --git a/libgomp/testsuite/libgomp.c-c++-common/declare-target-indirect-2.c b/libgomp/testsuite/libgomp.c-c++-common/declare-target-indirect-2.c
new file mode 100644
index 00000000000..9fe190efce8
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/declare-target-indirect-2.c
@@ -0,0 +1,33 @@
+/* { dg-do run } */
+
+#define N 256
+
+#pragma omp begin declare target indirect
+int foo(void) { return 5; }
+int bar(void) { return 8; }
+int baz(void) { return 11; }
+#pragma omp end declare target
+
+int main (void)
+{
+ int i, x = 0, expected = 0;
+ int (*fn_ptr[N])(void);
+
+ for (i = 0; i < N; i++)
+ {
+ switch (i % 3)
+ {
+ case 0: fn_ptr[i] = &foo;
+ case 1: fn_ptr[i] = &bar;
+ case 2: fn_ptr[i] = &baz;
+ }
+ expected += (*fn_ptr[i]) ();
+ }
+
+#pragma omp target teams distribute parallel for reduction(+: x) \
+ map (to: fn_ptr) map (tofrom: x)
+ for (int i = 0; i < N; i++)
+ x += (*fn_ptr[i]) ();
+
+ return x - expected;
+}
--
2.34.1
* Re: [PATCH] openmp: Add support for the 'indirect' clause in C/C++
2023-11-03 19:53 ` Kwok Cheung Yeung
@ 2023-11-06 8:48 ` Tobias Burnus
2023-11-07 21:37 ` Joseph Myers
2023-11-09 12:24 ` Thomas Schwinge
` (2 subsequent siblings)
3 siblings, 1 reply; 28+ messages in thread
From: Tobias Burnus @ 2023-11-06 8:48 UTC (permalink / raw)
To: Kwok Cheung Yeung, gcc-patches; +Cc: Jakub Jelinek
On 03.11.23 20:53, Kwok Cheung Yeung wrote:
> On 17/10/2023 2:12 pm, Tobias Burnus wrote:
>> C++11 (and C23) attributes do not seem to be properly handled:
(Side remark: Since Saturday, the [[omp::]] attribute syntax is now
also supported in C23.)
[Quoted email text by Kwok: Lots of lines removed that describe how
previously found issues were fixed.]
> Okay for mainline, pending successful testing (still in progress)?
LGTM - thanks for the patch and the follow-up fixes.
Tobias
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
* Re: [PATCH] openmp: Add support for the 'indirect' clause in C/C++
2023-11-06 8:48 ` Tobias Burnus
@ 2023-11-07 21:37 ` Joseph Myers
2023-11-07 21:51 ` Jakub Jelinek
0 siblings, 1 reply; 28+ messages in thread
From: Joseph Myers @ 2023-11-07 21:37 UTC (permalink / raw)
To: Tobias Burnus; +Cc: Kwok Cheung Yeung, gcc-patches, Jakub Jelinek
I'm seeing build failures "make[5]: *** No rule to make target
'target-indirect.c', needed by 'target-indirect.lo'. Stop." for many
targets in my glibc bot.
https://sourceware.org/pipermail/libc-testresults/2023q4/012061.html
FAIL: compilers-arc-linux-gnu gcc build
FAIL: compilers-arc-linux-gnuhf gcc build
FAIL: compilers-arceb-linux-gnu gcc build
FAIL: compilers-csky-linux-gnuabiv2 gcc build
FAIL: compilers-csky-linux-gnuabiv2-soft gcc build
FAIL: compilers-m68k-linux-gnu gcc build
FAIL: compilers-m68k-linux-gnu-coldfire gcc build
FAIL: compilers-m68k-linux-gnu-coldfire-soft gcc build
FAIL: compilers-microblaze-linux-gnu gcc build
FAIL: compilers-microblazeel-linux-gnu gcc build
FAIL: compilers-nios2-linux-gnu gcc build
FAIL: compilers-or1k-linux-gnu-soft gcc build
FAIL: compilers-riscv32-linux-gnu-rv32imac-ilp32 gcc build
FAIL: compilers-riscv32-linux-gnu-rv32imafdc-ilp32 gcc build
FAIL: compilers-riscv32-linux-gnu-rv32imafdc-ilp32d gcc build
FAIL: compilers-sh3-linux-gnu gcc build
FAIL: compilers-sh3eb-linux-gnu gcc build
FAIL: compilers-sh4-linux-gnu gcc build
FAIL: compilers-sh4-linux-gnu-soft gcc build
FAIL: compilers-sh4eb-linux-gnu gcc build
FAIL: compilers-sh4eb-linux-gnu-soft gcc build
FAIL: compilers-x86_64-gnu gcc build
This looks like targets that libgomp/configure.tgt does *not* have any
special handling for, and so never adds "linux" to config_path for.
--
Joseph S. Myers
joseph@codesourcery.com
* Re: [PATCH] openmp: Add support for the 'indirect' clause in C/C++
2023-11-07 21:37 ` Joseph Myers
@ 2023-11-07 21:51 ` Jakub Jelinek
2023-11-07 21:59 ` Kwok Cheung Yeung
0 siblings, 1 reply; 28+ messages in thread
From: Jakub Jelinek @ 2023-11-07 21:51 UTC (permalink / raw)
To: Joseph Myers; +Cc: Tobias Burnus, Kwok Cheung Yeung, gcc-patches
On Tue, Nov 07, 2023 at 09:37:22PM +0000, Joseph Myers wrote:
> This looks like targets that libgomp/configure.tgt does *not* have any
> special handling for, and so never adds "linux" to config_path for.
Indeed, I don't really see anything linux specific about the
libgomp/config/linux/target-indirect.c
so wonder if the right fix isn't
git mv libgomp/{config/linux/,}target-indirect.c
Jakub
* Re: [PATCH] openmp: Add support for the 'indirect' clause in C/C++
2023-11-07 21:51 ` Jakub Jelinek
@ 2023-11-07 21:59 ` Kwok Cheung Yeung
0 siblings, 0 replies; 28+ messages in thread
From: Kwok Cheung Yeung @ 2023-11-07 21:59 UTC (permalink / raw)
To: Jakub Jelinek, Joseph Myers; +Cc: Tobias Burnus, gcc-patches
Yes, I believe that is the right fix. The version in
libgomp/config/accel/ should then override the version in libgomp/ for
accelerator targets.
I'll do a quick check that this works as expected and push it ASAP.
Sorry for breaking the build for so many targets!
Kwok
On 07/11/2023 9:51 pm, Jakub Jelinek wrote:
> On Tue, Nov 07, 2023 at 09:37:22PM +0000, Joseph Myers wrote:
>> This looks like targets that libgomp/configure.tgt does *not* have any
>> special handling for, and so never adds "linux" to config_path for.
>
> Indeed, I don't really see anything linux specific about the
> libgomp/config/linux/target-indirect.c
> so wonder if the right fix isn't
> git mv libgomp/{config/linux/,}target-indirect.c
>
> Jakub
>
* Re: [PATCH] openmp: Add support for the 'indirect' clause in C/C++
2023-11-03 19:53 ` Kwok Cheung Yeung
2023-11-06 8:48 ` Tobias Burnus
@ 2023-11-09 12:24 ` Thomas Schwinge
2023-11-09 16:00 ` Tobias Burnus
` (2 more replies)
2024-01-22 20:33 ` [PATCH] openmp: Change to using a hashtab to lookup offload target addresses for indirect function calls Kwok Cheung Yeung
2024-01-22 20:41 ` [PATCH] openmp, fortran: Add Fortran support for indirect clause on the declare target directive Kwok Cheung Yeung
3 siblings, 3 replies; 28+ messages in thread
From: Thomas Schwinge @ 2023-11-09 12:24 UTC (permalink / raw)
To: Kwok Cheung Yeung; +Cc: Tobias Burnus, gcc-patches, Jakub Jelinek
Hi Kwok!
Nice work! A few comments:
On 2023-11-03T19:53:28+0000, Kwok Cheung Yeung <kcy@codesourcery.com> wrote:
> Subject: [PATCH] openmp: Add support for the 'indirect' clause in C/C++
>
> This adds support for the 'indirect' clause in the 'declare target'
> directive. Functions declared as indirect may be called via function
> pointers passed from the host in offloaded code.
>
> Virtual calls to member functions via the object pointer in C++ are
> currently not supported in target regions.
Similar to how you have it here:
> --- a/gcc/config/nvptx/mkoffload.cc
> +++ b/gcc/config/nvptx/mkoffload.cc
> @@ -51,6 +51,7 @@ struct id_map
> };
>
> static id_map *func_ids, **funcs_tail = &func_ids;
> +static id_map *ind_func_ids, **ind_funcs_tail = &ind_func_ids;
> static id_map *var_ids, **vars_tail = &var_ids;
>
> /* Files to unlink. */
> @@ -302,6 +303,11 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
| else if (startswith (input + i, "FUNC_MAP "))
| {
> output_fn_ptr = true;
> record_id (input + i + 9, &funcs_tail);
> }
> + else if (startswith (input + i, "IND_FUNC_MAP "))
> + {
> + output_fn_ptr = true;
> + record_id (input + i + 13, &ind_funcs_tail);
> + }
> else
> abort ();
> /* Skip to next line. */
..., please also here:
> --- a/gcc/config/nvptx/nvptx.cc
> +++ b/gcc/config/nvptx/nvptx.cc
> @@ -5919,7 +5919,11 @@ nvptx_record_offload_symbol (tree decl)
> /* OpenMP offloading does not set this attribute. */
> tree dims = attr ? TREE_VALUE (attr) : NULL_TREE;
>
> - fprintf (asm_out_file, "//:FUNC_MAP \"%s\"",
> + fprintf (asm_out_file, "//:");
> + if (lookup_attribute ("omp declare target indirect",
> + DECL_ATTRIBUTES (decl)))
> + fprintf (asm_out_file, "IND_");
> + fprintf (asm_out_file, "FUNC_MAP \"%s\"",
> IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl)));
... maintain separate 'if' branches for 'FUNC_MAP' vs. 'IND_FUNC_MAP', so
that we're able to easily locate those with 'grep', for example.
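To illustrate what I mean by separate branches, here is a minimal standalone sketch. It is not the actual 'nvptx_record_offload_symbol' code: the helper name 'write_map_marker' and the 'is_indirect' flag are hypothetical stand-ins for the 'lookup_attribute' check, and it writes into a buffer rather than 'asm_out_file'. The point is only that each marker string appears whole in the source, so a 'grep' for 'IND_FUNC_MAP' or 'FUNC_MAP' finds it.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: IS_INDIRECT stands in for the result of looking up
   the "omp declare target indirect" attribute on the decl.  Each marker
   is spelled out in full in its own branch.  */
static int
write_map_marker (char *buf, size_t len, int is_indirect, const char *name)
{
  if (is_indirect)
    return snprintf (buf, len, "//:IND_FUNC_MAP \"%s\"", name);
  else
    return snprintf (buf, len, "//:FUNC_MAP \"%s\"", name);
}
```

The output is the same as with the prefix-splicing variant; only the source layout differs.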
Also, assuming that the order of appearance of 'IND_FUNC_MAP' does matter
as it does for 'FUNC_MAP', don't we have to handle the former in
nvptx-tools 'as', too? For example, see
<https://github.com/MentorEmbedded/nvptx-tools/pull/29>
"Ensure :VAR_MAP and :FUNC_MAP are output in order", in particular
nvptx-tools commit aa3404ad5a496cda5d79a50bedb1344fd63e8763
"Ensure :VAR_MAP and :FUNC_MAP are output in order, part II [#29]".
Please check that, and submit on
<https://github.com/MentorEmbedded/nvptx-tools/pulls>, if necessary.
If yes, maybe that's another nudge towards:
"Instead of 'K_comment', a point could be made to have these be
represented as their own 'Kind'." That is, '//:' be its own token kind,
and handled generically, instead of '//:VAR_MAP', '//:FUNC_MAP',
'//:IND_FUNC_MAP' specially/only. I shall then look into that, later.
> --- a/gcc/lto-cgraph.cc
> +++ b/gcc/lto-cgraph.cc
> @@ -68,6 +68,7 @@ enum LTO_symtab_tags
> LTO_symtab_edge,
> LTO_symtab_indirect_edge,
> LTO_symtab_variable,
> + LTO_symtab_indirect_function,
> LTO_symtab_last_tag
> };
I did wonder if that new tag should have "offload" in its name, as that's
the only case where it's used? But then I noticed that here
('output_offload_tables'):
> @@ -1111,6 +1112,18 @@ output_offload_tables (void)
> (*offload_vars)[i]);
> }
>
> + for (unsigned i = 0; i < vec_safe_length (offload_ind_funcs); i++)
> + {
> + symtab_node *node = symtab_node::get ((*offload_ind_funcs)[i]);
> + if (!node)
> + continue;
> + node->force_output = true;
> + streamer_write_enum (ob->main_stream, LTO_symtab_tags,
> + LTO_symtab_last_tag, LTO_symtab_indirect_function);
> + lto_output_fn_decl_ref (ob->decl_state, ob->main_stream,
> + (*offload_ind_funcs)[i]);
> + }
> +
..., and correspondingly here ('input_offload_tables'):
> @@ -1863,6 +1877,19 @@ input_offload_tables (bool do_force_output)
> varpool_node::get (var_decl)->force_output = 1;
> tmp_decl = var_decl;
> }
> + else if (tag == LTO_symtab_indirect_function)
> + {
> + tree fn_decl
> + = lto_input_fn_decl_ref (ib, file_data);
> + vec_safe_push (offload_ind_funcs, fn_decl);
> +
> + /* Prevent IPA from removing fn_decl as unreachable, since there
> + may be no refs from the parent function to child_fn in offload
> + LTO mode. */
> + if (do_force_output)
> + cgraph_node::get (fn_decl)->mark_force_output ();
> + tmp_decl = fn_decl;
> + }
> else if (tag == LTO_symtab_edge)
> {
> static bool error_emitted = false;
..., we're currently using 'LTO_symtab_unavail_node' for 'offload_funcs'
and 'LTO_symtab_variable' for 'offload_vars', also with "LTO"
(non-"offload") tags, so 'LTO_symtab_indirect_function' isn't any worse.
Maybe, though, we should generally have separate tags for offloading use?
Possibly aliasing (in value) the LTO ones -- but maybe actually not, to
improve "type safety". I shall look into that, later.
> --- a/gcc/tree-core.h
> +++ b/gcc/tree-core.h
> @@ -350,6 +350,9 @@ enum omp_clause_code {
> /* OpenMP clause: doacross ({source,sink}:vec). */
> OMP_CLAUSE_DOACROSS,
>
> + /* OpenMP clause: indirect [(constant-integer-expression)]. */
> + OMP_CLAUSE_INDIRECT,
> +
> /* Internal structure to hold OpenACC cache directive's variable-list.
> #pragma acc cache (variable-list). */
> OMP_CLAUSE__CACHE_,
In this position here, isn't 'OMP_CLAUSE_INDIRECT' applicable to the
'OMP_CLAUSE_RANGE_CHECK' in 'gcc/tree.h:OMP_CLAUSE_SIZE' and
'gcc/tree.h:OMP_CLAUSE_DECL':
#define OMP_CLAUSE_SIZE(NODE) \
OMP_CLAUSE_OPERAND (OMP_CLAUSE_RANGE_CHECK (OMP_CLAUSE_CHECK (NODE), \
OMP_CLAUSE_FROM, \
OMP_CLAUSE__CACHE_), 1)
#define OMP_CLAUSE_DECL(NODE) \
OMP_CLAUSE_OPERAND (OMP_CLAUSE_RANGE_CHECK (OMP_CLAUSE_CHECK (NODE), \
OMP_CLAUSE_PRIVATE, \
OMP_CLAUSE__SCANTEMP_), 0)
That's probably not intentional? In that case, maybe simply move it to
the end of the clause list? (..., and generally then match that ordering
in any 'switch'es, as applicable, and likewise position
'gcc/tree.h:OMP_CLAUSE_INDIRECT_EXPR' correspondingly.)
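The concern can be shown with a toy model. This is a sketch only: the 'toy_' enum below is a made-up, heavily shortened stand-in for 'enum omp_clause_code', and 'toy_has_size' merely mimics the shape of the 'OMP_CLAUSE_RANGE_CHECK' used by 'OMP_CLAUSE_SIZE'. A range check over enum ordering silently accepts any enumerator inserted between its two bounds.

```c
#include <stdbool.h>

/* Made-up, shortened stand-in for the clause code enum.  */
enum toy_clause_code
{
  TOY_CLAUSE_PRIVATE,
  TOY_CLAUSE_FROM,
  TOY_CLAUSE_DOACROSS,
  TOY_CLAUSE_INDIRECT,	/* Newly inserted just before _CACHE_.  */
  TOY_CLAUSE__CACHE_,
  TOY_CLAUSE_LAST
};

/* True if CODE lies in the inclusive range [LO, HI].  */
static bool
toy_range_check (enum toy_clause_code code, enum toy_clause_code lo,
		 enum toy_clause_code hi)
{
  return code >= lo && code <= hi;
}

/* Mimics a SIZE-style accessor guarded by the range FROM.._CACHE_.  */
static bool
toy_has_size (enum toy_clause_code code)
{
  return toy_range_check (code, TOY_CLAUSE_FROM, TOY_CLAUSE__CACHE_);
}
```

Here 'toy_has_size (TOY_CLAUSE_INDIRECT)' is true although nobody asked for that; placing the new enumerator after 'TOY_CLAUSE__CACHE_' would make it false.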
I would've assumed handling for 'OMP_CLAUSE_INDIRECT' to also be
necessary in the following places:
- 'gcc/c-family/c-omp.cc:c_omp_split_clauses'
- 'gcc/cp/pt.cc:tsubst_omp_clauses',
- 'gcc/gimplify.cc:gimplify_scan_omp_clauses',
'gcc/gimplify.cc:gimplify_adjust_omp_clauses'
- 'gcc/omp-low.cc:scan_sharing_clauses' (twice)
- 'gcc/tree-nested.cc:convert_nonlocal_omp_clauses',
'gcc/tree-nested.cc:convert_local_omp_clauses'
- 'gcc/tree-pretty-print.cc:dump_omp_clause'
Please verify, and add handling as well as test cases as necessary, or,
as applicable, put 'case OMP_CLAUSE_INDIRECT:' next to
'default: gcc_unreachable ();' etc., if indeed that clause is not
expected there.
In this file here:
> --- /dev/null
> +++ b/libgomp/config/accel/target-indirect.c
> @@ -0,0 +1,126 @@
> +/* Copyright (C) 2023 Free Software Foundation, Inc.
> +
> + Contributed by Siemens.
> +
> + This file is part of the GNU Offloading and Multi Processing Library
> + (libgomp).
> +
> + Libgomp is free software; you can redistribute it and/or modify it
> + under the terms of the GNU General Public License as published by
> + the Free Software Foundation; either version 3, or (at your option)
> + any later version.
> +
> + Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
> + WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
> + FOR A PARTICULAR PURPOSE. See the GNU General Public License for
> + more details.
> +
> + Under Section 7 of GPL version 3, you are granted additional
> + permissions described in the GCC Runtime Library Exception, version
> + 3.1, as published by the Free Software Foundation.
> +
> + You should have received a copy of the GNU General Public License and
> + a copy of the GCC Runtime Library Exception along with this program;
> + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
> + <http://www.gnu.org/licenses/>. */
> +
> +#include <assert.h>
> +#include "libgomp.h"
> +
> +#define splay_tree_prefix indirect
> +#define splay_tree_c
> +#include "splay-tree.h"
> +
> +volatile void **GOMP_INDIRECT_ADDR_MAP = NULL;
> +
> +/* Use a splay tree to lookup the target address instead of using a
> + linear search. */
> +#define USE_SPLAY_TREE_LOOKUP
> +
> +#ifdef USE_SPLAY_TREE_LOOKUP
> +
> +static struct indirect_splay_tree_s indirect_map;
> +static indirect_splay_tree_node indirect_array = NULL;
> +
> +/* Build the splay tree used for host->target address lookups. */
> +
> +void
> +build_indirect_map (void)
> +{
> + size_t num_ind_funcs = 0;
> + volatile void **map_entry;
> + static int lock = 0; /* == gomp_mutex_t lock; gomp_mutex_init (&lock); */
> +
> + if (!GOMP_INDIRECT_ADDR_MAP)
> + return;
> +
> + gomp_mutex_lock (&lock);
> +
> + if (!indirect_array)
> + {
> + /* Count the number of entries in the NULL-terminated address map. */
> + for (map_entry = GOMP_INDIRECT_ADDR_MAP; *map_entry;
> + map_entry += 2, num_ind_funcs++);
> +
> + /* Build splay tree for address lookup. */
> + indirect_array = gomp_malloc (num_ind_funcs * sizeof (*indirect_array));
> + indirect_splay_tree_node array = indirect_array;
> + map_entry = GOMP_INDIRECT_ADDR_MAP;
> +
> + for (int i = 0; i < num_ind_funcs; i++, array++)
> + {
> + indirect_splay_tree_key k = &array->key;
> + k->host_addr = (uint64_t) *map_entry++;
> + k->target_addr = (uint64_t) *map_entry++;
> + array->left = NULL;
> + array->right = NULL;
> + indirect_splay_tree_insert (&indirect_map, array);
> + }
> + }
> +
> + gomp_mutex_unlock (&lock);
> +}
> +
> +void *
> +GOMP_target_map_indirect_ptr (void *ptr)
> +{
> + /* NULL pointers always resolve to NULL. */
> + if (!ptr)
> + return ptr;
> +
> + assert (indirect_array);
> +
> + struct indirect_splay_tree_key_s k;
> + indirect_splay_tree_key node = NULL;
> +
> + k.host_addr = (uint64_t) ptr;
> + node = indirect_splay_tree_lookup (&indirect_map, &k);
> +
> + return node ? (void *) node->target_addr : ptr;
> +}
> +
> +#else
> +
> +void
> +build_indirect_map (void)
> +{
> +}
> +
> +void *
> +GOMP_target_map_indirect_ptr (void *ptr)
> +{
> + /* NULL pointers always resolve to NULL. */
> + if (!ptr)
> + return ptr;
> +
> + assert (GOMP_INDIRECT_ADDR_MAP);
> +
> + for (volatile void **map_entry = GOMP_INDIRECT_ADDR_MAP; *map_entry;
> + map_entry += 2)
> + if (*map_entry == ptr)
> + return (void *) *(map_entry + 1);
> +
> + return ptr;
> +}
> +
> +#endif
..., I'm curious why certain variables are declared 'volatile'? Is that
really the right approach for whatever exactly the (concurrency?)
requirements here are?
> --- a/libgomp/config/gcn/team.c
> +++ b/libgomp/config/gcn/team.c
> @@ -30,6 +30,7 @@
> +extern void build_indirect_map (void);
Why not generally have a prototype for this (new
'libgomp/config/accel/target-indirect.h', or maybe just
'libgomp/libgomp.h'?)?
> @@ -45,6 +46,9 @@ gomp_gcn_enter_kernel (void)
> {
> int threadid = __builtin_gcn_dim_pos (1);
>
Shouldn't this:
> + /* Initialize indirect function support. */
> + build_indirect_map ();
> +
... be called inside here:
> if (threadid == 0)
> {
..., so that it's only executed by one thread?
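Roughly what I have in mind, as a host-side sketch with made-up names ('toy_enter_kernel', 'toy_build_indirect_map', and a counter standing in for the real work). It deliberately leaves out whatever synchronization the real device code would need before other threads use the map; that part is an assumption I have not checked.

```c
/* Counts how often the (stand-in) map construction runs.  */
static int toy_map_builds;

/* Stand-in for build_indirect_map.  */
static void
toy_build_indirect_map (void)
{
  toy_map_builds++;
}

/* Stand-in for the kernel entry function: only the thread with id 0
   performs the one-time initialization.  */
static void
toy_enter_kernel (int threadid)
{
  if (threadid == 0)
    {
      /* Initialize indirect function support.  */
      toy_build_indirect_map ();
    }
}
```

Calling 'toy_enter_kernel' for thread ids 0 through 3 runs the build step once rather than four times.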
Also, for my understanding: why is 'build_indirect_map' done at kernel
invocation time (here) instead of at image load time?
> --- a/libgomp/config/nvptx/team.c
> +++ b/libgomp/config/nvptx/team.c
> @@ -35,6 +35,7 @@ struct gomp_thread *nvptx_thrs __attribute__((shared,nocommon));
> +extern void build_indirect_map (void);
Likewise to 'libgomp/config/gcn/team.c'.
> @@ -52,6 +53,10 @@ gomp_nvptx_main (void (*fn) (void *), void *fn_data)
> int tid, ntids;
> asm ("mov.u32 %0, %%tid.y;" : "=r" (tid));
> asm ("mov.u32 %0, %%ntid.y;" : "=r" (ntids));
> +
> + /* Initialize indirect function support. */
> + build_indirect_map ();
> +
> if (tid == 0)
> {
Likewise to 'libgomp/config/gcn/team.c'.
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.c-c++-common/declare-target-indirect-2.c
> @@ -0,0 +1,33 @@
> +/* { dg-do run } */
> +
> +#define N 256
> +
> +#pragma omp begin declare target indirect
> +int foo(void) { return 5; }
> +int bar(void) { return 8; }
> +int baz(void) { return 11; }
> +#pragma omp end declare target
> +
> +int main (void)
> +{
> + int i, x = 0, expected = 0;
> + int (*fn_ptr[N])(void);
> +
> + for (i = 0; i < N; i++)
> + {
> + switch (i % 3)
> + {
> + case 0: fn_ptr[i] = &foo;
> + case 1: fn_ptr[i] = &bar;
> + case 2: fn_ptr[i] = &baz;
> + }
> + expected += (*fn_ptr[i]) ();
> + }
> +
> +#pragma omp target teams distribute parallel for reduction(+: x) \
> + map (to: fn_ptr) map (tofrom: x)
> + for (int i = 0; i < N; i++)
> + x += (*fn_ptr[i]) ();
> +
> + return x - expected;
> +}
[...]/libgomp.c-c++-common/declare-target-indirect-2.c: In function ‘main’:
[...]/libgomp.c-c++-common/declare-target-indirect-2.c:20:27: warning: this statement may fall through [-Wimplicit-fallthrough=]
20 | case 0: fn_ptr[i] = &foo;
| ~~~~~~~~~~^~~~~~
[...]/libgomp.c-c++-common/declare-target-indirect-2.c:21:9: note: here
21 | case 1: fn_ptr[i] = &bar;
| ^~~~
[...]/libgomp.c-c++-common/declare-target-indirect-2.c:21:27: warning: this statement may fall through [-Wimplicit-fallthrough=]
21 | case 1: fn_ptr[i] = &bar;
| ~~~~~~~~~~^~~~~~
[...]/libgomp.c-c++-common/declare-target-indirect-2.c:22:9: note: here
22 | case 2: fn_ptr[i] = &baz;
| ^~~~
..., so I suppose that's effectively testing 'fn_ptr[i] = &baz;' only for
all 'i's?
Grüße
Thomas
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] openmp: Add support for the 'indirect' clause in C/C++
2023-11-09 12:24 ` Thomas Schwinge
@ 2023-11-09 16:00 ` Tobias Burnus
2023-11-13 10:59 ` Thomas Schwinge
2024-01-03 14:47 ` [committed] " Kwok Cheung Yeung
2024-01-03 15:54 ` Kwok Cheung Yeung
2 siblings, 1 reply; 28+ messages in thread
From: Tobias Burnus @ 2023-11-09 16:00 UTC (permalink / raw)
To: Thomas Schwinge, Kwok Cheung Yeung; +Cc: gcc-patches, Jakub Jelinek
Hi Thomas, hi Kwok,
(Skipping some valid (review) comments and bare remarks.)
On 09.11.23 13:24, Thomas Schwinge wrote:
> Also, assuming that the order of appearance of 'IND_FUNC_MAP' does matter
> as it does for 'FUNC_MAP', ... https://github.com/MentorEmbedded/nvptx-tools/pull/29 ...
It should matter. Thus, we should indeed update nvptx-tools for this!
For hello-world it probably does not show up that easily as there are
only very few such tagged functions. But especially once it gets used
for C++ virtual functions, the number of functions will be large enough
that the ordering issue is likely to occur in the real world.
(I shouldn't have missed this – given that I debugged and reported the
original issue.)
[...]
> Maybe, though, we should generally have separate tags for offloading use?
> Possibly aliasing (in value) the LTO ones -- but maybe actually not, to
> improve "type safety". I shall look into that, later.
Regarding LTO, my long-term plan is to make the variables visible to the compiler,
i.e. to actually write something like:
__offload_vars[10] = [&A, &my_var, ... ];
and then set __offload_vars's node to force_output. IPA can then see that 'A's address
is used (such that '&A' does not disappear), but it can still do optimizations that are currently
ruled out because we do set 'force_output'.
Currently, we set force_output on all such nodes, but that prevents several optimizations that
could be done - we just don't want the variable to disappear. (There is a PR about the
missed optimization.)
>> --- a/gcc/tree-core.h
>> +++ b/gcc/tree-core.h
>> @@ -350,6 +350,9 @@ enum omp_clause_code {
>> /* OpenMP clause: doacross ({source,sink}:vec). */
>> OMP_CLAUSE_DOACROSS,
>>
>> + /* OpenMP clause: indirect [(constant-integer-expression)]. */
>> + OMP_CLAUSE_INDIRECT,
>> +
>> /* Internal structure to hold OpenACC cache directive's variable-list.
>> #pragma acc cache (variable-list). */
>> OMP_CLAUSE__CACHE_,
> In this position here, isn't 'OMP_CLAUSE_INDIRECT' applicable to the
> 'OMP_CLAUSE_RANGE_CHECK' in 'gcc/tree.h:OMP_CLAUSE_SIZE' and
> 'gcc/tree.h:OMP_CLAUSE_DECL':
>
> #define OMP_CLAUSE_SIZE(NODE) \
> OMP_CLAUSE_OPERAND (OMP_CLAUSE_RANGE_CHECK (OMP_CLAUSE_CHECK (NODE), \
> OMP_CLAUSE_FROM, \
> OMP_CLAUSE__CACHE_), 1)
>
> #define OMP_CLAUSE_DECL(NODE) \
> OMP_CLAUSE_OPERAND (OMP_CLAUSE_RANGE_CHECK (OMP_CLAUSE_CHECK (NODE), \
> OMP_CLAUSE_PRIVATE, \
> OMP_CLAUSE__SCANTEMP_), 0)
We may need to check whether the range check is correct for the other
item or whether some others sneaked in as well.
But I concur, the OMP_CLAUSE_INDIRECT indeed looks misplaced.
(BTW: OMP_CLAUSE_INDIRECT is only used temporarily in the C/C++ FEs
and not in the ME, as it is soon turned into an attribute string.)
> I would've assumed handling for 'OMP_CLAUSE_INDIRECT' to also be
> necessary in the following places:
>
> - 'gcc/c-family/c-omp.cc:c_omp_split_clauses'
"split_clauses" applies only to combined composite constructs like
'target'+'parallel' +'for' + 'simd' where clauses have to be added to
the right constituent clause(s). Declarative directives cannot be combined.
> - 'gcc/cp/pt.cc:tsubst_omp_clauses',
> - 'gcc/gimplify.cc:gimplify_scan_omp_clauses',
> 'gcc/gimplify.cc:gimplify_adjust_omp_clauses'
> - 'gcc/omp-low.cc:scan_sharing_clauses' (twice)
> - 'gcc/tree-nested.cc:convert_nonlocal_omp_clauses',
> 'gcc/tree-nested.cc:convert_local_omp_clauses'
> - 'gcc/tree-pretty-print.cc:dump_omp_clause'
Most of those seem to relate to executable directives – and not to
declarative ones, where we attach DECL_ATTRIBUTES to a decl and process
them. For functions, the pretty printer prints the attributes.
Here, we use "omp declare target indirect" as attribute.
We use noclone,noinline attributes for 'declare target', thus, there
should be no issue on this side and regarding tsubst_omp_clauses, as the
clause is either present or not (either bare or with a parse-time
constant logical), there is not much post processing needed.
Thus, I bet that there is nothing to do for those.
> Please verify, and add handling as well as test cases as necessary, or,
> as applicable, put 'case OMP_CLAUSE_INDIRECT:' next to
> 'default: gcc_unreachable ();' etc., if indeed that clause is not
> expected there.
What's the point of having it next to default if it is gcc_unreachable?
I mean there are several others which shouldn't be needed like
OMP_CLAUSE_DEVICE_TYPE which also does not show up at gcc/cp/pt.cc.
> In this file here:
>
>> +++ b/libgomp/config/accel/target-indirect.c
>>
>> [...]
>> +volatile void **GOMP_INDIRECT_ADDR_MAP = NULL;
>>
>> [...]
>> +build_indirect_map (void)
>> +{
>> + size_t num_ind_funcs = 0;
>> + volatile void **map_entry;
>> [...]
>> + for (map_entry = GOMP_INDIRECT_ADDR_MAP; *map_entry;
>> + map_entry += 2, num_ind_funcs++);
>> [...]
>> + map_entry = GOMP_INDIRECT_ADDR_MAP;
>> +
>> + for (int i = 0; i < num_ind_funcs; i++, array++)
>> + {
>> + indirect_splay_tree_key k = &array->key;
>> + k->host_addr = (uint64_t) *map_entry++;
>> + k->target_addr = (uint64_t) *map_entry++;
>> [...]
>> +}
>> [...]
>> +#else
>> [...]
>> +void *
>> +GOMP_target_map_indirect_ptr (void *ptr)
>> [...]
>> + for (volatile void **map_entry = GOMP_INDIRECT_ADDR_MAP; *map_entry;
>> + map_entry += 2)
>> + if (*map_entry == ptr)
>> + return (void *) *(map_entry + 1);
>> +
>> + return ptr;
>> +}
>> +
>> +#endif
> ..., I'm curious why certain variables are declared 'volatile'? Is that
> really the right approach for whatever exactly the (concurrency?)
> requirements here are?
The variable GOMP_INDIRECT_ADDR_MAP itself is set non-concurrently via GOMP_OFFLOAD_load_image.
When the kernel is run, it should not be touched.
Thus, I concur that 'volatile' should not be needed at all.
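To illustrate, here is a minimal sketch of the linear lookup without any 'volatile' qualifiers. It is not the patch's code: 'indirect_addr_map' and 'map_indirect_ptr' are hypothetical stand-ins for GOMP_INDIRECT_ADDR_MAP and GOMP_target_map_indirect_ptr, and it assumes the map is fully written before any lookup runs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for GOMP_INDIRECT_ADDR_MAP: a flat array of
   (host address, target address) pairs, terminated by a NULL host
   entry.  Assumed to be set once, before any lookup happens.  */
static void **indirect_addr_map;

/* Return the target address mapped to PTR, or PTR itself if PTR is
   NULL or has no entry in the map.  */
static void *
map_indirect_ptr (void *ptr)
{
  if (!ptr)
    return ptr;

  assert (indirect_addr_map);

  /* Entries come in pairs, so step by two.  */
  for (void **map_entry = indirect_addr_map; *map_entry; map_entry += 2)
    if (*map_entry == ptr)
      return *(map_entry + 1);

  return ptr;
}
```

Without the qualifier the cast on the return value is no longer needed either.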
>> --- a/libgomp/config/gcn/team.c
>> +++ b/libgomp/config/gcn/team.c
>> @@ -30,6 +30,7 @@
>> +extern void build_indirect_map (void);
> Why not generally have a prototype for this (new
> 'libgomp/config/accel/target-indirect.h', or maybe just
> 'libgomp/libgomp.h'?)?
>
>> @@ -45,6 +46,9 @@ gomp_gcn_enter_kernel (void)
>> {
>> int threadid = __builtin_gcn_dim_pos (1);
>>
> Shouldn't this:
>
>> + /* Initialize indirect function support. */
>> + build_indirect_map ();
>> +
> ... be called inside here:
>
>> if (threadid == 0)
>> {
> ..., so that it's only executed by one thread?
(concur)
> Also, for my understanding: why is 'build_indirect_map' done at kernel
> invocation time (here) instead of at image load time?
The splay_tree is generated on the device itself - and we currently do
not start a kernel during GOMP_OFFLOAD_load_image. We could, the
question is whether it makes sense. (Generating the splay_tree on the
host for the device is a hassle and error prone as it needs to use
device pointers at the end.)
>> +++ b/libgomp/testsuite/libgomp.c-c++-common/declare-target-indirect-2.c
>> + switch (i % 3)
>> + {
>> + case 0: fn_ptr[i] = &foo;
>> + case 1: fn_ptr[i] = &bar;
>> + case 2: fn_ptr[i] = &baz;
>> + }
> [...]/libgomp.c-c++-common/declare-target-indirect-2.c:20:27: warning: this statement may fall through [-Wimplicit-fallthrough=]
Indeed a 'break;' would be good.
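For reference, a host-only sketch of the loop with the 'break;' statements added, so that the entries really cycle through foo, bar and baz. The helper name 'fill_fn_ptrs' is made up for this illustration, and the OpenMP pragmas are left out.

```c
#include <assert.h>

#define N 256

static int foo (void) { return 5; }
static int bar (void) { return 8; }
static int baz (void) { return 11; }

/* Fill FN_PTR so that entry i cycles through foo, bar, baz, and
   return the sum of the results as computed on the host.  */
static int
fill_fn_ptrs (int (*fn_ptr[N]) (void))
{
  int expected = 0;

  for (int i = 0; i < N; i++)
    {
      switch (i % 3)
	{
	case 0: fn_ptr[i] = &foo; break;
	case 1: fn_ptr[i] = &bar; break;
	case 2: fn_ptr[i] = &baz; break;
	}
      /* i % 3 is always 0, 1 or 2 here, so the entry is set.  */
      assert (fn_ptr[i]);
      expected += (*fn_ptr[i]) ();
    }

  return expected;
}
```

With N = 256 there are 86 calls to foo and 85 each to bar and baz, so the host sum is 2045 rather than 256 * 11.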
Tobias
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] openmp: Add support for the 'indirect' clause in C/C++
2023-11-09 16:00 ` Tobias Burnus
@ 2023-11-13 10:59 ` Thomas Schwinge
2023-11-13 11:47 ` Tobias Burnus
0 siblings, 1 reply; 28+ messages in thread
From: Thomas Schwinge @ 2023-11-13 10:59 UTC (permalink / raw)
To: Tobias Burnus, Kwok Cheung Yeung; +Cc: gcc-patches, Jakub Jelinek
Hi!
On 2023-11-09T17:00:11+0100, Tobias Burnus <tobias@codesourcery.com> wrote:
> On 09.11.23 13:24, Thomas Schwinge wrote:
>> Also, assuming that the order of appearance of 'IND_FUNC_MAP' does matter
>> as it does for 'FUNC_MAP', ... https://github.com/MentorEmbedded/nvptx-tools/pull/29 ...
>
> It should matter. Thus, we should indeed update nvptx-tools for this!
>
> For hello-world it probably does not show up that easily as there are
> only very few such tagged functions. But especially once it gets used
> for C++ virtual functions, the number of function will be that large
> that the ordering issue is likely to occur in the real world.
>
> (I shouldn't have missed this – given that I debugging and reported the
> original issue.)
Let's blame this on the inadequate general (non-)handling of these
directives in nvptx-tools 'as' -- which, as I said, I'll address once
Kwok has fixed this specific issue (with test case, please).
(Generalize/refactor after fixing specific issue.)
> (BTW: OMP_CLAUSE_INDIRECT is only used intermittendly in the C/C++ FEs
> and not in the ME as it is soon turned into an attribute string.)
OK, that does explain:
>> I would've assumed handling for 'OMP_CLAUSE_INDIRECT' to also be
>> necessary in the following places:
>>
>> - 'gcc/c-family/c-omp.cc:c_omp_split_clauses'
> "split_clauses" applies only to combined composite constructs like
> 'target'+'parallel' +'for' + 'simd' where clauses have to be added to
> the right constituent clause(s). Declarative directives cannot be combined.
>> - 'gcc/cp/pt.cc:tsubst_omp_clauses',
>> - 'gcc/gimplify.cc:gimplify_scan_omp_clauses',
>> 'gcc/gimplify.cc:gimplify_adjust_omp_clauses'
>> - 'gcc/omp-low.cc:scan_sharing_clauses' (twice)
>> - 'gcc/tree-nested.cc:convert_nonlocal_omp_clauses',
>> 'gcc/tree-nested.cc:convert_local_omp_clauses'
>> - 'gcc/tree-pretty-print.cc:dump_omp_clause'
... this.
> Most of those seem to relate to executable directives
(That remark I don't understand.)
> – and not to
> declarative ones, where we attach DECL_ATTRIBUTES to a decl and process
> them. For functions, the pretty printer prints the attributes.
> Here, we use "omp declare target indirect" as attribute.
ACK.
> We use noclone,noinline attributes for 'declare target', thus, there
> should be no issue on this side and regarding tsubst_omp_clauses, as the
> clause is either present or not (either bare or with a parse-time
> constant logical), there is not much post processing needed.
That's not obvious to the casual reader of GCC source code, though.
> Thus, I bet that there is nothing to do for those.
>
>> Please verify, and add handling as well as test cases as necessary, or,
>> as applicable, put 'case OMP_CLAUSE_INDIRECT:' next to
>> 'default: gcc_unreachable ();' etc., if indeed that clause is not
>> expected there.
>
> What's the point of having it next to default if it is gcc_unreachable?
Instead of "bet", I suggest to document intentions: so that it's clear
that 'OMP_CLAUSE_INDIRECT' is not meant to be seen here vs. an accidental
omission.
> I mean there are several others which shouldn't be needed like
> OMP_CLAUSE_DEVICE_TYPE which also does not show up at gcc/cp/pt.cc.
Quite possible. :-) I certainly wouldn't object to "handling" those,
too.
Generally, in my opinion, we should usually see 'case's listed for all
clause codes where we 'switch' on them, for example.
>>> --- a/libgomp/config/gcn/team.c
>>> +++ b/libgomp/config/gcn/team.c
>>> @@ -45,6 +46,9 @@ gomp_gcn_enter_kernel (void)
>>> {
>>> int threadid = __builtin_gcn_dim_pos (1);
>>>
>> Shouldn't this:
>>
>>> + /* Initialize indirect function support. */
>>> + build_indirect_map ();
>>> +
>> ... be called inside here:
>>
>>> if (threadid == 0)
>>> {
>> ..., so that it's only executed by one thread?
> (concur)
>> Also, for my understanding: why is 'build_indirect_map' done at kernel
>> invocation time (here) instead of at image load time?
>
> The splay_tree is generated on the device itself - and we currently do
> not start a kernel during GOMP_OFFLOAD_load_image. We could, the
> question is whether it makes sense. (Generating the splay_tree on the
> host for the device is a hassle and error prone as it needs to use
> device pointers at the end.)
Hmm. It seems conceptually cleaner to me to set this up upfront, and
avoids potentially slowing down every device kernel invocation (at least
another function call, and 'gomp_mutex_lock' check). Though, I agree
this may be "in the noise" with regards to all the other stuff going on
in 'gomp_gcn_enter_kernel' and elsewhere...
What I just realized: what's also unclear to me is how the current
implementation works with regards to several images getting loaded --
don't we then overwrite 'GOMP_INDIRECT_ADDR_MAP' instead of
(conceptually) appending to it?
In the general case, additional images may also get loaded during
execution. We thus need proper locking of the shared data structure, uh?
Or, can we have separate on-device data structures per image? (I've not
yet thought about that in detail.)
Relatedly then, when images are unloaded, we also need to remove stale
items from the table, and release resources (for example, the
'GOMP_OFFLOAD_alloc' for 'map_target_addr').
>>> +++ b/libgomp/testsuite/libgomp.c-c++-common/declare-target-indirect-2.c
Another thing regarding this test case: testing
'-foffload-options=amdgcn-amdhsa=-march=gfx900' offloading on our
amdnano4 system, I see:
+PASS: libgomp.c/../libgomp.c-c++-common/declare-target-indirect-2.c execution test
..., but:
+FAIL: libgomp.c++/../libgomp.c-c++-common/declare-target-indirect-2.c execution test
Memory access fault by GPU node-2 (Agent handle: 0x21b0530) on address 0x401000. Reason: Page not present or supervisor privilege.
Re-running this manually a few times, I got:
pass: 5
fail (as above): 3
hang: 1
Otherwise, that system appears to behave normally, and a reboot did not
cure this.
Grüße
Thomas
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] openmp: Add support for the 'indirect' clause in C/C++
2023-11-13 10:59 ` Thomas Schwinge
@ 2023-11-13 11:47 ` Tobias Burnus
2024-04-11 10:10 ` Thomas Schwinge
0 siblings, 1 reply; 28+ messages in thread
From: Tobias Burnus @ 2023-11-13 11:47 UTC (permalink / raw)
To: Thomas Schwinge, Kwok Cheung Yeung; +Cc: gcc-patches, Jakub Jelinek
Hi Thomas,
On 13.11.23 11:59, Thomas Schwinge wrote:
>>> - 'gcc/cp/pt.cc:tsubst_omp_clauses',
>>> - 'gcc/gimplify.cc:gimplify_scan_omp_clauses',
>>> 'gcc/gimplify.cc:gimplify_adjust_omp_clauses'
>>> - 'gcc/omp-low.cc:scan_sharing_clauses' (twice)
>>> - 'gcc/tree-nested.cc:convert_nonlocal_omp_clauses',
>>> 'gcc/tree-nested.cc:convert_local_omp_clauses'
>>> - 'gcc/tree-pretty-print.cc:dump_omp_clause'
> [...]
>> Most of those seem to relate to executable directives
> (That remark I don't understand.)
OpenMP classifies directives into categories. The following exists
(grep + count of TR12's json/directive/ files.)
2 meta
2 subsidiary
2 utility
4 informational
11 declarative
40 executable
where 'declare target' + 'indirect' (like declare variant, declare simd) is declarative,
i.e.
"declare target directive - A declarative directive that ensures that procedures
and/or variables can be executed or accessed on a device."
For those, we usually add an attribute to the function declaration. And 'executable' is
defined as
"executable directive - A directive that appears in an executable context and results in
implementation code and/or prescribes the manner in which associated user code must execute."
Those either are turned into a libgomp call or transform some code - possibly including
calling the library. That would be, e.g. 'omp target', 'omp parallel', 'omp atomic'.
The listed functions all deal with executable code, i.e. some tree ..._EXPR, possibly associated
with a structured block (at least the scan_* only apply for block-associated directives as they
scan for usage inside that block - affecting the directive/the directive's clauses).
* * *
>> We use noclone,noinline attributes for 'declare target', thus, there
>> should be no issue on this side and regarding tsubst_omp_clauses, as the
>> clause is either present or not (either bare or with a parse-time
>> constant logical), there is not much post processing needed.
> That's not obvious to the casual reader of GCC source code, though.
True, but that's the general problem with code - without background, you
don't always understand the flow in the code and when something is called.
I think there is no good way how this can be solved; or rather, it can
be solved for a specific question but the next person looks for
something else and then has the same problem and the previous "solution"
(like comment) doesn't help.
In some cases, I think it helps to add comments, but I have the feeling
that your question is too specific (you look at a single patch) to be
really helpful here. But I am happy to be proven wrong and see useful
code changes/comments.
>> Thus, I bet that there is nothing to do for those.
>>
>>> Please verify, and add handling as well as test cases as necessary, or,
>>> as applicable, put 'case OMP_CLAUSE_INDIRECT:' next to
>>> 'default: gcc_unreachable ();' etc., if indeed that clause is not
>>> expected there.
>> What's the point of having it next to default if it is gcc_unreachable?
> Instead of "bet", I suggest to document intentions: so that it's clear
> that 'OMP_CLAUSE_INDIRECT' is not meant to be seen here vs. an accidental
> omission.
Done - in this email thread, which can be found by patch archeology.
>> I mean there are several others which shouldn't be needed like
>> OMP_CLAUSE_DEVICE_TYPE which also does not show up at gcc/cp/pt.cc.
> Quite possible. :-) I certainly wouldn't object to "handling" those,
> too.
>
> Generally, in my opinion, we should usually see 'case's listed for all
> clause codes where we 'switch' on them, for example.
If there are plenty of 'default:', I am not sure it makes sense.
But in general, the question is (for a switch where most 'enum' values are used)
whether it would make more sense to remove the 'default: gcc_unreachable();'.
In that case, we do not handle the others - but the -Wswitch-enum will warn about it.
Thus, a bootstrap build will ensure that all values are covered due to -Werror=switch-enum.
Downside is that without -Werror/bootstrap, it will silently fall through but for
normal FE/ME code, it is guaranteed to be bootstrapped and will show up.
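A small sketch of that idea, using a made-up enum rather than GCC's real 'enum omp_clause_code': the switch has no 'default:' label, so GCC's -Wswitch (part of -Wall) and -Wswitch-enum both report any enumerator that is added later without a matching 'case'. All names here are hypothetical.

```c
#include <assert.h>

/* Hypothetical clause codes, not GCC's real enum omp_clause_code.  */
enum clause_code
{
  CLAUSE_PRIVATE,
  CLAUSE_MAP,
  CLAUSE_INDIRECT
};

/* Return a printable name for CODE.  There is deliberately no
   'default:' label, so a newly added enumerator without a 'case'
   triggers a compile-time warning instead of a run-time abort.  */
static const char *
clause_name (enum clause_code code)
{
  switch (code)
    {
    case CLAUSE_PRIVATE:
      return "private";
    case CLAUSE_MAP:
      return "map";
    case CLAUSE_INDIRECT:
      return "indirect";
    }

  /* Only reached for values outside the enum.  */
  assert (!"unknown clause code");
  return "";
}
```

The trade-off is the one described above: the check moves from run time to compile time, so it only helps in builds where the warning is enabled and treated as an error.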
* * *
>>> Also, for my understanding: why is 'build_indirect_map' done at kernel
>>> invocation time (here) instead of at image load time?
>> The splay_tree is generated on the device itself - and we currently do
>> not start a kernel during GOMP_OFFLOAD_load_image. We could, the
>> question is whether it makes sense. (Generating the splay_tree on the
>> host for the device is a hassle and error prone as it needs to use
>> device pointers at the end.)
> Hmm. It seems conceptually cleaner to me to set this up upfront, and
> avoids potentially slowing down every device kernel invocation (at least
> another function call, and 'gomp_mutex_lock' check). Though, I agree
> this may be "in the noise" with regards to all the other stuff going on
> in 'gomp_gcn_enter_kernel' and elsewhere...
I think the most common case is GOMP_INDIRECT_ADDR_MAP == NULL.
The question is whether the lock should/could be moved inside 'if (!indirect_array)'
or not. Probably yes:
* do an atomic load for the outer '!indirect_array' check, work on a local array for
the build-up and only assign it at the end - and re-check '!indirect_array' just
after taking the lock.
That way, it is lock free once built, and there is no race while it is being built.
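Roughly like the following sketch, which uses C11 atomics. Everything in it is an assumption for illustration: the 'atomic_flag' spin lock stands in for gomp_mutex_lock, an int array stands in for the splay tree, and 'build_indirect_map_once' is not a function in the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the libgomp lock and the built map.  */
static atomic_flag build_lock = ATOMIC_FLAG_INIT;
static _Atomic (int *) indirect_array;	/* NULL until published.  */
static int build_count;
static int storage[4];

static void
build_indirect_map_once (void)
{
  /* Fast path: lock free once the array has been published.  */
  if (atomic_load_explicit (&indirect_array, memory_order_acquire))
    return;

  /* Slow path: take the lock (a simple spin lock in this sketch).  */
  while (atomic_flag_test_and_set_explicit (&build_lock,
					    memory_order_acquire))
    ;

  /* Check again under the lock: another thread may have finished
     the build while we were waiting.  */
  if (!atomic_load_explicit (&indirect_array, memory_order_relaxed))
    {
      assert (build_count == 0);

      /* Build into a local pointer first ...  */
      int *local = storage;
      for (int i = 0; i < 4; i++)
	local[i] = i;
      build_count++;

      /* ... and publish only the finished result.  */
      atomic_store_explicit (&indirect_array, local,
			     memory_order_release);
    }

  atomic_flag_clear_explicit (&build_lock, memory_order_release);
}
```

The release store paired with the acquire load on the fast path is what makes the finished array contents visible to threads that skip the lock.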
> What I just realize, what's also unclear to me is how the current
> implementation works with regards to several images getting loaded --
> don't we then overwrite 'GOMP_INDIRECT_ADDR_MAP' instead of
> (conceptually) appending to it?
Yes, I think that will happen - but it looks as if the same issue also
exists in other code? I think that's not the first variable that has that
issue?
I think we should try to cleanup that handling, also to support calling
a device function in a shared library from a target region in the main
program, which currently also fails.
All device routines that are in normal static libraries and in the
object files of the main program should simply work thanks to offload
LTO such that there is only a single GOMP_offload_register_ver call (per
device type) and GOMP_OFFLOAD_load_image call (per device).
Likewise if the offloading is only done via a single shared library. —
Any mixing will currently fail, unfortunately. This patch just adds
another item which does not handle it properly.
(Not good but IMHO also not a showstopper for this patch.)
> In the general case, additional images may also get loaded during
> execution. We thus need proper locking of the shared data structure, uh?
> Or, can we have separate on-device data structures per image? (I've not
> yet thought about that in detail.)
I think we could - but the case of a main-program 'omp target' region
calling a shared-library 'declare target' function means that we need to
handle multiple GOMP_offload_register_ver / load_image calls such that
they can work together.
Obviously, it gets harder if the user keeps doing dlopen() / dlclose()
of libraries containing offload code where a target/compute region is
run before, between, and after those calls (but hopefully not running
when calling dlopen/dlclose).
> Relatedly then, when images are unloaded, we also need to remove stale
> items from the table, and release resources (for example, the
> 'GOMP_OFFLOAD_alloc' for 'map_target_addr').
True. I think the general assumption is that images only get unloaded at
the very end, which matches most but not all code. Yet another work item.
I think we should open a new PR about this topic and collect work items
there.
>>>> +++ b/libgomp/testsuite/libgomp.c-c++-common/declare-target-indirect-2.c
> Another thing regarding this test case: testing
> '-foffload-options=amdgcn-amdhsa=-march=gfx900' offloading on our
> amdnano4 system, I see:
> ...
> Re-running this manually a few times, I got:
> pass: 5
> fail (as above): 3
> hang: 1
Looks like something that needs to be investigated.
Tobias
^ permalink raw reply [flat|nested] 28+ messages in thread
* [committed] Re: [PATCH] openmp: Add support for the 'indirect' clause in C/C++
2023-11-09 12:24 ` Thomas Schwinge
2023-11-09 16:00 ` Tobias Burnus
@ 2024-01-03 14:47 ` Kwok Cheung Yeung
2024-01-03 15:54 ` Kwok Cheung Yeung
2 siblings, 0 replies; 28+ messages in thread
From: Kwok Cheung Yeung @ 2024-01-03 14:47 UTC (permalink / raw)
To: Thomas Schwinge; +Cc: Tobias Burnus, gcc-patches, Jakub Jelinek
[-- Attachment #1: Type: text/plain, Size: 1999 bytes --]
Hello
I have committed the following trivial patch to emit FUNC_MAP or
IND_FUNC_MAP in separate branches of an if statement.
Kwok
On 09/11/2023 12:24 pm, Thomas Schwinge wrote:
> Similar to how you have it here:
>
>> --- a/gcc/config/nvptx/mkoffload.cc
>> +++ b/gcc/config/nvptx/mkoffload.cc
>> @@ -51,6 +51,7 @@ struct id_map
>> };
>>
>> static id_map *func_ids, **funcs_tail = &func_ids;
>> +static id_map *ind_func_ids, **ind_funcs_tail = &ind_func_ids;
>> static id_map *var_ids, **vars_tail = &var_ids;
>>
>> /* Files to unlink. */
>> @@ -302,6 +303,11 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
> | else if (startswith (input + i, "FUNC_MAP "))
> | {
>> output_fn_ptr = true;
>> record_id (input + i + 9, &funcs_tail);
>> }
>> + else if (startswith (input + i, "IND_FUNC_MAP "))
>> + {
>> + output_fn_ptr = true;
>> + record_id (input + i + 13, &ind_funcs_tail);
>> + }
>> else
>> abort ();
>> /* Skip to next line. */
>
> ..., please also here:
>
>> --- a/gcc/config/nvptx/nvptx.cc
>> +++ b/gcc/config/nvptx/nvptx.cc
>> @@ -5919,7 +5919,11 @@ nvptx_record_offload_symbol (tree decl)
>> /* OpenMP offloading does not set this attribute. */
>> tree dims = attr ? TREE_VALUE (attr) : NULL_TREE;
>>
>> - fprintf (asm_out_file, "//:FUNC_MAP \"%s\"",
>> + fprintf (asm_out_file, "//:");
>> + if (lookup_attribute ("omp declare target indirect",
>> + DECL_ATTRIBUTES (decl)))
>> + fprintf (asm_out_file, "IND_");
>> + fprintf (asm_out_file, "FUNC_MAP \"%s\"",
>> IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl)));
>
> ... maintain separate 'if' branches for 'FUNC_MAP' vs. 'IND_FUNC_MAP', so
> that we're able to easily locate those with 'grep', for example.
>
[-- Attachment #2: 0001-nvptx-Restructure-code-generating-function-map-label.patch --]
[-- Type: text/plain, Size: 1341 bytes --]
From 6ae84729940acff598e1a7f49d7b381025082ceb Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung <kcy@codesourcery.com>
Date: Wed, 3 Jan 2024 14:27:39 +0000
Subject: [PATCH] nvptx: Restructure code generating function map labels
This restructures the code generating FUNC_MAP and IND_FUNC_MAP labels
in the assembly code for mkoffload to consume, hopefully making it a
bit clearer and easier to search for.
2024-01-03 Kwok Cheung Yeung <kcy@codesourcery.com>
gcc/
* config/nvptx/nvptx.cc (nvptx_record_offload_symbol): Restucture
printing of FUNC_MAP/IND_FUNC_MAP labels.
---
gcc/config/nvptx/nvptx.cc | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index 724e403a0e9..9363d3ecc6a 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -5921,8 +5921,10 @@ nvptx_record_offload_symbol (tree decl)
fprintf (asm_out_file, "//:");
if (lookup_attribute ("omp declare target indirect",
DECL_ATTRIBUTES (decl)))
- fprintf (asm_out_file, "IND_");
- fprintf (asm_out_file, "FUNC_MAP \"%s\"",
+ fprintf (asm_out_file, "IND_FUNC_MAP");
+ else
+ fprintf (asm_out_file, "FUNC_MAP");
+ fprintf (asm_out_file, " \"%s\"",
IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl)));
for (; dims; dims = TREE_CHAIN (dims))
--
2.34.1
^ permalink raw reply [flat|nested] 28+ messages in thread
* [committed] Re: [PATCH] openmp: Add support for the 'indirect' clause in C/C++
2023-11-09 12:24 ` Thomas Schwinge
2023-11-09 16:00 ` Tobias Burnus
2024-01-03 14:47 ` [committed] " Kwok Cheung Yeung
@ 2024-01-03 15:54 ` Kwok Cheung Yeung
2 siblings, 0 replies; 28+ messages in thread
From: Kwok Cheung Yeung @ 2024-01-03 15:54 UTC (permalink / raw)
To: Thomas Schwinge; +Cc: Tobias Burnus, gcc-patches, Jakub Jelinek
[-- Attachment #1: Type: text/plain, Size: 3047 bytes --]
On 09/11/2023 12:24 pm, Thomas Schwinge wrote:
>> --- a/gcc/tree-core.h
>> +++ b/gcc/tree-core.h
>> @@ -350,6 +350,9 @@ enum omp_clause_code {
>> /* OpenMP clause: doacross ({source,sink}:vec). */
>> OMP_CLAUSE_DOACROSS,
>>
>> + /* OpenMP clause: indirect [(constant-integer-expression)]. */
>> + OMP_CLAUSE_INDIRECT,
>> +
>> /* Internal structure to hold OpenACC cache directive's variable-list.
>> #pragma acc cache (variable-list). */
>> OMP_CLAUSE__CACHE_,
>
> In this position here, isn't 'OMP_CLAUSE_INDIRECT' applicable to the
> 'OMP_CLAUSE_RANGE_CHECK' in 'gcc/tree.h:OMP_CLAUSE_SIZE' and
> 'gcc/tree.h:OMP_CLAUSE_DECL':
>
> #define OMP_CLAUSE_SIZE(NODE) \
> OMP_CLAUSE_OPERAND (OMP_CLAUSE_RANGE_CHECK (OMP_CLAUSE_CHECK (NODE), \
> OMP_CLAUSE_FROM, \
> OMP_CLAUSE__CACHE_), 1)
>
> #define OMP_CLAUSE_DECL(NODE) \
> OMP_CLAUSE_OPERAND (OMP_CLAUSE_RANGE_CHECK (OMP_CLAUSE_CHECK (NODE), \
> OMP_CLAUSE_PRIVATE, \
> OMP_CLAUSE__SCANTEMP_), 0)
>
> That's probably not intentional? In that case, maybe simply move it at
> the end of the clause list? (..., and generally then match that ordering
> in any 'switch'es, as applicable, and likewise position
> 'gcc/tree.h:OMP_CLAUSE_INDIRECT_EXPR' correspondingly.)
I have moved OMP_CLAUSE_INDIRECT to just before OMP_CLAUSE__SIMDUID_ so
that it is outside the range checked by OMP_CLAUSE_SIZE and
OMP_CLAUSE_DECL. I have also moved its handling in
c(p)_parser_omp_clause_name so that the alphabetical ordering is
preserved. Committed as trivial.
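To show why the position in the enum matters, here is a simplified sketch of a range check of the kind OMP_CLAUSE_RANGE_CHECK performs. The enum is a hypothetical, heavily shortened list (GCC's real 'enum omp_clause_code' has many more entries), and the helper names are invented for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, shortened clause list in declaration order.  */
enum clause_code
{
  CLAUSE_PRIVATE,	/* First code that carries a decl.  */
  CLAUSE_FROM,		/* First code that carries a size.  */
  CLAUSE_DOACROSS,
  CLAUSE_CACHE,		/* Last code that carries a size.  */
  CLAUSE_SCANTEMP,	/* Last code that carries a decl.  */
  CLAUSE_INDIRECT,	/* Placed after both ranges.  */
  CLAUSE_SIMDUID
};

/* True if CODE lies in the inclusive range [LO, HI].  */
static bool
clause_in_range (enum clause_code code, enum clause_code lo,
		 enum clause_code hi)
{
  return code >= lo && code <= hi;
}

/* Range used by a decl accessor in this sketch.  */
static bool
clause_has_decl (enum clause_code code)
{
  return clause_in_range (code, CLAUSE_PRIVATE, CLAUSE_SCANTEMP);
}

/* Range used by a size accessor in this sketch.  */
static bool
clause_has_size (enum clause_code code)
{
  return clause_in_range (code, CLAUSE_FROM, CLAUSE_CACHE);
}
```

Had CLAUSE_INDIRECT been declared between CLAUSE_DOACROSS and CLAUSE_CACHE, both checks would have accepted it even though it carries neither a decl nor a size.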
> I would've assumed handling for 'OMP_CLAUSE_INDIRECT' to also be
> necessary in the following places:
>
> - 'gcc/c-family/c-omp.cc:c_omp_split_clauses'
> - 'gcc/cp/pt.cc:tsubst_omp_clauses',
> - 'gcc/gimplify.cc:gimplify_scan_omp_clauses',
> 'gcc/gimplify.cc:gimplify_adjust_omp_clauses'
> - 'gcc/omp-low.cc:scan_sharing_clauses' (twice)
> - 'gcc/tree-nested.cc:convert_nonlocal_omp_clauses',
> 'gcc/tree-nested.cc:convert_local_omp_clauses'
> - 'gcc/tree-pretty-print.cc:dump_omp_clause'
>
> Please verify, and add handling as well as test cases as necessary, or,
> as applicable, put 'case OMP_CLAUSE_INDIRECT:' next to
> 'default: gcc_unreachable ();' etc., if indeed that clause is not
> expected there.
As Tobias noted, OMP_CLAUSE_INDIRECT never makes it into the middle-end.
It may be generated by c(p)_parser_omp_all_clauses, and if present an
attribute is applied to the function declaration, but at no point is it
directly incorporated into the tree structure. I'm not sure whether it
is best to explicitly list such cases as gcc_unreachable (it might imply
that it can reach the ME, but just not at that particular point?) or not
though.
Kwok
[-- Attachment #2: 0001-openmp-Adjust-position-of-OMP_CLAUSE_INDIRECT-in-Ope.patch --]
[-- Type: text/plain, Size: 4258 bytes --]
From a56a693a74dd3bee71b1266b09dbd753694ace94 Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung <kcy@codesourcery.com>
Date: Wed, 3 Jan 2024 14:34:39 +0000
Subject: [PATCH] openmp: Adjust position of OMP_CLAUSE_INDIRECT in OpenMP
clauses
Move OMP_CLAUSE_INDIRECT so that it is outside of the range checked by
OMP_CLAUSE_SIZE and OMP_CLAUSE_DECL.
2024-01-03 Kwok Cheung Yeung <kcy@codesourcery.com>
gcc/c/
* c-parser.cc (c_parser_omp_clause_name): Move handling of indirect
clause to correspond to alphabetical order.
gcc/cp/
* parser.cc (cp_parser_omp_clause_name): Move handling of indirect
clause to correspond to alphabetical order.
gcc/
* tree-core.h (enum omp_clause_code): Move OMP_CLAUSE_INDIRECT to before
OMP_CLAUSE__SIMDUID_.
* tree.cc (omp_clause_num_ops): Update position of entry for
OMP_CLAUSE_INDIRECT to correspond with omp_clause_code.
(omp_clause_code_name): Likewise.
---
gcc/c/c-parser.cc | 4 ++--
gcc/cp/parser.cc | 4 ++--
gcc/tree-core.h | 6 +++---
gcc/tree.cc | 4 ++--
4 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 64e436010d5..e7b74fb07f0 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -14899,10 +14899,10 @@ c_parser_omp_clause_name (c_parser *parser)
result = PRAGMA_OMP_CLAUSE_IN_REDUCTION;
else if (!strcmp ("inbranch", p))
result = PRAGMA_OMP_CLAUSE_INBRANCH;
- else if (!strcmp ("indirect", p))
- result = PRAGMA_OMP_CLAUSE_INDIRECT;
else if (!strcmp ("independent", p))
result = PRAGMA_OACC_CLAUSE_INDEPENDENT;
+ else if (!strcmp ("indirect", p))
+ result = PRAGMA_OMP_CLAUSE_INDIRECT;
else if (!strcmp ("is_device_ptr", p))
result = PRAGMA_OMP_CLAUSE_IS_DEVICE_PTR;
break;
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 1a6b53933a7..37536faf2cf 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -37645,10 +37645,10 @@ cp_parser_omp_clause_name (cp_parser *parser)
result = PRAGMA_OMP_CLAUSE_IN_REDUCTION;
else if (!strcmp ("inbranch", p))
result = PRAGMA_OMP_CLAUSE_INBRANCH;
- else if (!strcmp ("indirect", p))
- result = PRAGMA_OMP_CLAUSE_INDIRECT;
else if (!strcmp ("independent", p))
result = PRAGMA_OACC_CLAUSE_INDEPENDENT;
+ else if (!strcmp ("indirect", p))
+ result = PRAGMA_OMP_CLAUSE_INDIRECT;
else if (!strcmp ("is_device_ptr", p))
result = PRAGMA_OMP_CLAUSE_IS_DEVICE_PTR;
break;
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index d1c7136c204..8a89462bd7e 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -350,9 +350,6 @@ enum omp_clause_code {
/* OpenMP clause: doacross ({source,sink}:vec). */
OMP_CLAUSE_DOACROSS,
- /* OpenMP clause: indirect [(constant-integer-expression)]. */
- OMP_CLAUSE_INDIRECT,
-
/* Internal structure to hold OpenACC cache directive's variable-list.
#pragma acc cache (variable-list). */
OMP_CLAUSE__CACHE_,
@@ -497,6 +494,9 @@ enum omp_clause_code {
/* OpenMP clause: filter (integer-expression). */
OMP_CLAUSE_FILTER,
+ /* OpenMP clause: indirect [(constant-integer-expression)]. */
+ OMP_CLAUSE_INDIRECT,
+
/* Internally used only clause, holding SIMD uid. */
OMP_CLAUSE__SIMDUID_,
diff --git a/gcc/tree.cc b/gcc/tree.cc
index 82eff2bf34d..8aee3ef18d8 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -269,7 +269,6 @@ unsigned const char omp_clause_num_ops[] =
2, /* OMP_CLAUSE_MAP */
1, /* OMP_CLAUSE_HAS_DEVICE_ADDR */
1, /* OMP_CLAUSE_DOACROSS */
- 1, /* OMP_CLAUSE_INDIRECT */
2, /* OMP_CLAUSE__CACHE_ */
2, /* OMP_CLAUSE_GANG */
1, /* OMP_CLAUSE_ASYNC */
@@ -316,6 +315,7 @@ unsigned const char omp_clause_num_ops[] =
0, /* OMP_CLAUSE_ORDER */
0, /* OMP_CLAUSE_BIND */
1, /* OMP_CLAUSE_FILTER */
+ 1, /* OMP_CLAUSE_INDIRECT */
1, /* OMP_CLAUSE__SIMDUID_ */
0, /* OMP_CLAUSE__SIMT_ */
0, /* OMP_CLAUSE_INDEPENDENT */
@@ -362,7 +362,6 @@ const char * const omp_clause_code_name[] =
"map",
"has_device_addr",
"doacross",
- "indirect",
"_cache_",
"gang",
"async",
@@ -409,6 +408,7 @@ const char * const omp_clause_code_name[] =
"order",
"bind",
"filter",
+ "indirect",
"_simduid_",
"_simt_",
"independent",
--
2.34.1
* [PATCH] openmp: Change to using a hashtab to lookup offload target addresses for indirect function calls
2023-11-03 19:53 ` Kwok Cheung Yeung
2023-11-06 8:48 ` Tobias Burnus
2023-11-09 12:24 ` Thomas Schwinge
@ 2024-01-22 20:33 ` Kwok Cheung Yeung
2024-01-24 7:06 ` rep.dot.nop
2024-01-22 20:41 ` [PATCH] openmp, fortran: Add Fortran support for indirect clause on the declare target directive Kwok Cheung Yeung
3 siblings, 1 reply; 28+ messages in thread
From: Kwok Cheung Yeung @ 2024-01-22 20:33 UTC (permalink / raw)
To: gcc-patches, Jakub Jelinek, Tobias Burnus; +Cc: tschwinge
[-- Attachment #1: Type: text/plain, Size: 1366 bytes --]
Hi
The declare-target-indirect-2.c libgomp testcase (which tests indirect
calls in offloaded target regions, spread over multiple teams/threads)
had a bug. Due to an errant fallthrough in a switch statement, only one
indirect function was ever called:
switch (i % 3)
{
case 0: fn_ptr[i] = &foo; // Missing break
case 1: fn_ptr[i] = &bar; // Missing break
case 2: fn_ptr[i] = &baz;
}
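
The effect of the fallthrough can be reproduced on the host alone. The sketch below is a standalone illustration, not the testcase itself: the helper names fill_fallthrough, fill_with_breaks and count_distinct are made up for this example, and the return values 5, 8 and 11 are borrowed from the Fortran tests later in this thread.

```c
typedef int (*fn_t) (void);

static int foo (void) { return 5; }
static int bar (void) { return 8; }
static int baz (void) { return 11; }

/* Buggy version: every case falls through to the last assignment,
   so each element ends up pointing at baz.  */
static void
fill_fallthrough (fn_t *fn_ptr, int n)
{
  for (int i = 0; i < n; i++)
    switch (i % 3)
      {
      case 0: fn_ptr[i] = &foo; /* falls through */
      case 1: fn_ptr[i] = &bar; /* falls through */
      case 2: fn_ptr[i] = &baz;
      }
}

/* Fixed version: each case keeps its own assignment.  */
static void
fill_with_breaks (fn_t *fn_ptr, int n)
{
  for (int i = 0; i < n; i++)
    switch (i % 3)
      {
      case 0: fn_ptr[i] = &foo; break;
      case 1: fn_ptr[i] = &bar; break;
      case 2: fn_ptr[i] = &baz; break;
      }
}

/* Number of different functions (out of foo, bar, baz) in the array.  */
static int
count_distinct (fn_t *fn_ptr, int n)
{
  int seen_foo = 0, seen_bar = 0, seen_baz = 0;
  for (int i = 0; i < n; i++)
    {
      seen_foo |= fn_ptr[i] == &foo;
      seen_bar |= fn_ptr[i] == &bar;
      seen_baz |= fn_ptr[i] == &baz;
    }
  return seen_foo + seen_bar + seen_baz;
}
```

Filling an array with fill_fallthrough and passing it to count_distinct gives 1; with fill_with_breaks it gives 3, which is what exposes the concurrent-lookup problem described next.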
However, when the missing break statements are added, the testcase fails
with an invalid memory access. Upon investigation, this is due to the
use of a splay-tree as the lookup structure for indirect addresses: the
splay-tree moves frequently accessed elements closer to the root node on
lookup, and so needs locking when used from multiple threads. Locking
would partially serialise all the threads and kill performance, so I
have switched the lookup structure from a splay tree to a hashtab, which
avoids locking during lookup.
I have also tidied up the initialisation of the lookup table:
build_indirect_map is now called only from the first thread of the first
team, instead of being called redundantly from every thread with only the
first one to arrive doing the initialisation. This removes the need for
locking during initialisation.
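
As a rough illustration of the idea (build once, then do read-only lookups), here is a self-contained sketch. It is not the libgomp hashtab.h code used in the patch: struct addr_table, hash_ptr, table_build and table_lookup are hypothetical names, and the table is a simple open-addressed one with linear probing. The only things taken from the patch are the NULL-terminated map layout of alternating host/target addresses and the rule that an unknown pointer is returned unchanged.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct addr_pair { void *host_addr; void *target_addr; };

struct addr_table
{
  struct addr_pair *slots;
  size_t size;              /* always a power of two */
};

/* Simple multiplicative pointer hash (an arbitrary choice).  */
static size_t
hash_ptr (const void *p, size_t mask)
{
  uintptr_t v = (uintptr_t) p;
  return (size_t) ((v >> 3) * (uintptr_t) 0x9E3779B1u) & mask;
}

/* Build the table from a NULL-terminated array of host/target pairs.
   Meant to be called once, by a single thread.  Returns 0 on success.  */
static int
table_build (struct addr_table *t, void **map)
{
  size_t n = 0;
  for (void **e = map; *e; e += 2)
    n++;

  /* Keep the table at most half full so probing always terminates.  */
  size_t size = 2;
  while (size < 2 * n)
    size *= 2;

  t->slots = calloc (size, sizeof *t->slots);
  if (!t->slots)
    return -1;
  t->size = size;

  for (void **e = map; *e; e += 2)
    {
      size_t i = hash_ptr (e[0], size - 1);
      while (t->slots[i].host_addr)
        i = (i + 1) & (size - 1);
      t->slots[i].host_addr = e[0];
      t->slots[i].target_addr = e[1];
    }
  return 0;
}

/* Read-only lookup: never modifies the table, so concurrent callers
   need no lock once table_build has finished.  */
static void *
table_lookup (const struct addr_table *t, void *ptr)
{
  if (!ptr)
    return ptr;
  size_t mask = t->size - 1;
  for (size_t i = hash_ptr (ptr, mask); t->slots[i].host_addr;
       i = (i + 1) & mask)
    if (t->slots[i].host_addr == ptr)
      return t->slots[i].target_addr;
  return ptr;   /* not in the map: return the original address */
}
```

The contrast with a splay tree is that table_lookup only reads the slots, whereas a splay-tree lookup rotates nodes and therefore writes to shared state.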
Tested with offloading to NVPTX and GCN with an x86_64 host. Okay for master?
Thanks
Kwok
[-- Attachment #2: 0001-openmp-Change-to-using-a-hashtab-to-lookup-offload-t.patch --]
[-- Type: text/plain, Size: 8229 bytes --]
From 721ec33bec2fddc7ee37e227358e36fec923f8da Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung <kcy@codesourcery.com>
Date: Wed, 17 Jan 2024 16:53:40 +0000
Subject: [PATCH 1/2] openmp: Change to using a hashtab to lookup offload
target addresses for indirect function calls
A splay-tree was previously used to lookup equivalent target addresses
for a given host address on offload targets. However, as splay-trees can
modify their structure on lookup, they are not suitable for concurrent
access from separate teams/threads without some form of locking. This
patch changes the lookup data structure to a hashtab instead, which does
not have these issues.
The call to build_indirect_map to initialize the data structure is now
called from just the first thread of the first team to avoid redundant
calls to this function.
2024-01-19 Kwok Cheung Yeung <kcy@codesourcery.com>
libgomp/
* config/accel/target-indirect.c: Include string.h and hashtab.h.
Remove include of splay-tree.h.
(splay_tree_prefix, splay_tree_c): Delete.
(struct indirect_map_t): New.
(hash_entry_type, htab_alloc, htab_free, htab_hash, htab_eq): New.
(GOMP_INDIRECT_ADDR_MAP): Remove volatile qualifier.
(USE_SPLAY_TREE_LOOKUP): Rename to...
(USE_HASHTAB_LOOKUP): ...this.
(indirect_map, indirect_array): Delete.
(indirect_htab): New.
(build_indirect_map): Remove locking. Build indirect map using
hashtab.
(GOMP_target_map_indirect_ptr): Use indirect_htab to lookup target
address.
* config/gcn/team.c (gomp_gcn_enter_kernel): Call build_indirect_map
from first thread of first team only.
* config/nvptx/team.c (gomp_nvptx_main): Likewise.
* testsuite/libgomp.c-c++-common/declare-target-indirect-2.c (main):
Add missing break statements.
---
libgomp/config/accel/target-indirect.c | 75 +++++++++++--------
libgomp/config/gcn/team.c | 7 +-
libgomp/config/nvptx/team.c | 9 ++-
.../declare-target-indirect-2.c | 14 ++--
4 files changed, 59 insertions(+), 46 deletions(-)
diff --git a/libgomp/config/accel/target-indirect.c b/libgomp/config/accel/target-indirect.c
index c60fd547cb6..6dad85076d6 100644
--- a/libgomp/config/accel/target-indirect.c
+++ b/libgomp/config/accel/target-indirect.c
@@ -25,22 +25,43 @@
<http://www.gnu.org/licenses/>. */
#include <assert.h>
+#include <string.h>
#include "libgomp.h"
-#define splay_tree_prefix indirect
-#define splay_tree_c
-#include "splay-tree.h"
+struct indirect_map_t
+{
+ void *host_addr;
+ void *target_addr;
+};
+
+typedef struct indirect_map_t *hash_entry_type;
+
+static inline void * htab_alloc (size_t size) { return gomp_malloc (size); }
+static inline void htab_free (void *ptr) { free (ptr); }
+
+#include "hashtab.h"
+
+static inline hashval_t
+htab_hash (hash_entry_type element)
+{
+ return hash_pointer (element->host_addr);
+}
-volatile void **GOMP_INDIRECT_ADDR_MAP = NULL;
+static inline bool
+htab_eq (hash_entry_type x, hash_entry_type y)
+{
+ return x->host_addr == y->host_addr;
+}
+
+void **GOMP_INDIRECT_ADDR_MAP = NULL;
/* Use a splay tree to lookup the target address instead of using a
linear search. */
-#define USE_SPLAY_TREE_LOOKUP
+#define USE_HASHTAB_LOOKUP
-#ifdef USE_SPLAY_TREE_LOOKUP
+#ifdef USE_HASHTAB_LOOKUP
-static struct indirect_splay_tree_s indirect_map;
-static indirect_splay_tree_node indirect_array = NULL;
+static htab_t indirect_htab = NULL;
/* Build the splay tree used for host->target address lookups. */
@@ -48,37 +69,29 @@ void
build_indirect_map (void)
{
size_t num_ind_funcs = 0;
- volatile void **map_entry;
- static int lock = 0; /* == gomp_mutex_t lock; gomp_mutex_init (&lock); */
+ void **map_entry;
if (!GOMP_INDIRECT_ADDR_MAP)
return;
- gomp_mutex_lock (&lock);
-
- if (!indirect_array)
+ if (!indirect_htab)
{
/* Count the number of entries in the NULL-terminated address map. */
for (map_entry = GOMP_INDIRECT_ADDR_MAP; *map_entry;
map_entry += 2, num_ind_funcs++);
- /* Build splay tree for address lookup. */
- indirect_array = gomp_malloc (num_ind_funcs * sizeof (*indirect_array));
- indirect_splay_tree_node array = indirect_array;
+ /* Build hashtab for address lookup. */
+ indirect_htab = htab_create (num_ind_funcs);
map_entry = GOMP_INDIRECT_ADDR_MAP;
- for (int i = 0; i < num_ind_funcs; i++, array++)
+ for (int i = 0; i < num_ind_funcs; i++, map_entry += 2)
{
- indirect_splay_tree_key k = &array->key;
- k->host_addr = (uint64_t) *map_entry++;
- k->target_addr = (uint64_t) *map_entry++;
- array->left = NULL;
- array->right = NULL;
- indirect_splay_tree_insert (&indirect_map, array);
+ struct indirect_map_t element = { *map_entry, NULL };
+ hash_entry_type *slot = htab_find_slot (&indirect_htab, &element,
+ INSERT);
+ *slot = (hash_entry_type) map_entry;
}
}
-
- gomp_mutex_unlock (&lock);
}
void *
@@ -88,15 +101,11 @@ GOMP_target_map_indirect_ptr (void *ptr)
if (!ptr)
return ptr;
- assert (indirect_array);
-
- struct indirect_splay_tree_key_s k;
- indirect_splay_tree_key node = NULL;
-
- k.host_addr = (uint64_t) ptr;
- node = indirect_splay_tree_lookup (&indirect_map, &k);
+ assert (indirect_htab);
- return node ? (void *) node->target_addr : ptr;
+ struct indirect_map_t element = { ptr, NULL };
+ hash_entry_type entry = htab_find (indirect_htab, &element);
+ return entry ? entry->target_addr : ptr;
}
#else
diff --git a/libgomp/config/gcn/team.c b/libgomp/config/gcn/team.c
index 61e9c616b67..bd3df448b52 100644
--- a/libgomp/config/gcn/team.c
+++ b/libgomp/config/gcn/team.c
@@ -52,14 +52,15 @@ gomp_gcn_enter_kernel (void)
{
int threadid = __builtin_gcn_dim_pos (1);
- /* Initialize indirect function support. */
- build_indirect_map ();
-
if (threadid == 0)
{
int numthreads = __builtin_gcn_dim_size (1);
int teamid = __builtin_gcn_dim_pos(0);
+ /* Initialize indirect function support. */
+ if (teamid == 0)
+ build_indirect_map ();
+
/* Set up the global state.
Every team will do this, but that should be harmless. */
gomp_global_icv.nthreads_var = 16;
diff --git a/libgomp/config/nvptx/team.c b/libgomp/config/nvptx/team.c
index 0cf5dad39ca..d5361917a24 100644
--- a/libgomp/config/nvptx/team.c
+++ b/libgomp/config/nvptx/team.c
@@ -60,9 +60,6 @@ gomp_nvptx_main (void (*fn) (void *), void *fn_data)
asm ("mov.u32 %0, %%tid.y;" : "=r" (tid));
asm ("mov.u32 %0, %%ntid.y;" : "=r" (ntids));
- /* Initialize indirect function support. */
- build_indirect_map ();
-
if (tid == 0)
{
gomp_global_icv.nthreads_var = ntids;
@@ -74,6 +71,12 @@ gomp_nvptx_main (void (*fn) (void *), void *fn_data)
nvptx_thrs = alloca (ntids * sizeof (*nvptx_thrs));
memset (nvptx_thrs, 0, ntids * sizeof (*nvptx_thrs));
+ /* Initialize indirect function support. */
+ unsigned int block_id;
+ asm ("mov.u32 %0, %%ctaid.x;" : "=r" (block_id));
+ if (block_id == 0)
+ build_indirect_map ();
+
/* Find the low-latency heap details .... */
uint32_t *shared_pool;
uint32_t shared_pool_size = 0;
diff --git a/libgomp/testsuite/libgomp.c-c++-common/declare-target-indirect-2.c b/libgomp/testsuite/libgomp.c-c++-common/declare-target-indirect-2.c
index 9fe190efce8..545f1a9fcbf 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/declare-target-indirect-2.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/declare-target-indirect-2.c
@@ -17,17 +17,17 @@ int main (void)
{
switch (i % 3)
{
- case 0: fn_ptr[i] = &foo;
- case 1: fn_ptr[i] = &bar;
- case 2: fn_ptr[i] = &baz;
+ case 0: fn_ptr[i] = &foo; break;
+ case 1: fn_ptr[i] = &bar; break;
+ case 2: fn_ptr[i] = &baz; break;
}
expected += (*fn_ptr[i]) ();
}
-#pragma omp target teams distribute parallel for reduction(+: x) \
- map (to: fn_ptr) map (tofrom: x)
- for (int i = 0; i < N; i++)
- x += (*fn_ptr[i]) ();
+ #pragma omp target teams distribute parallel for \
+ reduction (+: x) map (to: fn_ptr) map (tofrom: x)
+ for (int i = 0; i < N; i++)
+ x += (*fn_ptr[i]) ();
return x - expected;
}
--
2.34.1
* [PATCH] openmp, fortran: Add Fortran support for indirect clause on the declare target directive
2023-11-03 19:53 ` Kwok Cheung Yeung
` (2 preceding siblings ...)
2024-01-22 20:33 ` [PATCH] openmp: Change to using a hashtab to lookup offload target addresses for indirect function calls Kwok Cheung Yeung
@ 2024-01-22 20:41 ` Kwok Cheung Yeung
2024-01-23 19:14 ` Tobias Burnus
3 siblings, 1 reply; 28+ messages in thread
From: Kwok Cheung Yeung @ 2024-01-22 20:41 UTC (permalink / raw)
To: gcc-patches, fortran, Jakub Jelinek, Tobias Burnus
[-- Attachment #1: Type: text/plain, Size: 341 bytes --]
Hi
This patch adds support for the indirect clause on the OpenMP 'declare
target' directive in Fortran. As with the C and C++ front-ends, this
applies the 'omp declare target indirect' attribute on affected function
declarations. The C test cases have also been translated to Fortran
where appropriate.
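
For comparison, the C-side pattern that the Fortran run-time tests mirror looks roughly like the sketch below. It is written from memory of the C syntax rather than copied from the C testcases, so treat the exact directive spelling as an assumption; call_on_target is a made-up wrapper name, and the return values match the Fortran test in this patch.

```c
/* Functions that may be called through a pointer inside a target region.  */
#pragma omp begin declare target indirect
int foo (void) { return 5; }
int bar (void) { return 8; }
int baz (void) { return 11; }
#pragma omp end declare target

/* Pass host function pointers into a target region and call through
   them there.  Falls back to host execution if there is no device.  */
int
call_on_target (void)
{
  int (*foo_ptr) (void) = &foo;
  int (*bar_ptr) (void) = &bar;
  int (*baz_ptr) (void) = &baz;
  int x = 0;

#pragma omp target map (to: foo_ptr, bar_ptr, baz_ptr) map (from: x)
  x = (*foo_ptr) () + (*bar_ptr) () + (*baz_ptr) ();

  return x;
}
```

When built without OpenMP support the pragmas are ignored and the calls simply run on the host, which gives the same result.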
Okay for mainline?
Thanks
Kwok
[-- Attachment #2: 0002-openmp-fortran-Add-Fortran-support-for-indirect-clau.patch --]
[-- Type: text/plain, Size: 13797 bytes --]
From 545bdb2c8ab9a43e79c7a3a2992bd9edc7d08a6f Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung <kcy@codesourcery.com>
Date: Thu, 11 Jan 2024 19:52:53 +0000
Subject: [PATCH 2/2] openmp, fortran: Add Fortran support for indirect clause
on the declare target directive
2024-01-19 Kwok Cheung Yeung <kcy@codesourcery.com>
gcc/fortran/
* dump-parse-tree.cc (show_attr): Handle omp_declare_target_indirect
attribute.
* f95-lang.cc (gfc_gnu_attributes): Add entry for 'omp declare
target indirect'.
* gfortran.h (symbol_attribute): Add omp_declare_target_indirect
field.
(struct gfc_omp_clauses): Add indirect field.
* openmp.cc (omp_mask2): Add OMP_CLAUSE_INDIRECT.
(gfc_match_omp_clauses): Match indirect clause.
(OMP_DECLARE_TARGET_CLAUSES): Add OMP_CLAUSE_INDIRECT.
(gfc_match_omp_declare_target): Check omp_device_type and apply
omp_declare_target_indirect attribute to symbol if indirect clause
active.
* trans-decl.cc (add_attributes_to_decl): Add 'omp declare target
indirect' attribute if symbol has indirect attribute set.
gcc/testsuite/
* gfortran.dg/gomp/declare-target-indirect-1.f90: New.
* gfortran.dg/gomp/declare-target-indirect-2.f90: New.
libgomp/
* testsuite/libgomp.fortran/declare-target-indirect-1.f90: New.
* testsuite/libgomp.fortran/declare-target-indirect-2.f90: New.
---
gcc/fortran/dump-parse-tree.cc | 2 +
gcc/fortran/f95-lang.cc | 2 +
gcc/fortran/gfortran.h | 3 +-
gcc/fortran/openmp.cc | 45 +++++++++++++-
gcc/fortran/trans-decl.cc | 4 ++
.../gomp/declare-target-indirect-1.f90 | 58 +++++++++++++++++++
.../gomp/declare-target-indirect-2.f90 | 25 ++++++++
.../declare-target-indirect-1.f90 | 39 +++++++++++++
.../declare-target-indirect-2.f90 | 53 +++++++++++++++++
9 files changed, 229 insertions(+), 2 deletions(-)
create mode 100644 gcc/testsuite/gfortran.dg/gomp/declare-target-indirect-1.f90
create mode 100644 gcc/testsuite/gfortran.dg/gomp/declare-target-indirect-2.f90
create mode 100644 libgomp/testsuite/libgomp.fortran/declare-target-indirect-1.f90
create mode 100644 libgomp/testsuite/libgomp.fortran/declare-target-indirect-2.f90
diff --git a/gcc/fortran/dump-parse-tree.cc b/gcc/fortran/dump-parse-tree.cc
index 1563b810b98..7b154eb3ca7 100644
--- a/gcc/fortran/dump-parse-tree.cc
+++ b/gcc/fortran/dump-parse-tree.cc
@@ -914,6 +914,8 @@ show_attr (symbol_attribute *attr, const char * module)
fputs (" OMP-DECLARE-TARGET", dumpfile);
if (attr->omp_declare_target_link)
fputs (" OMP-DECLARE-TARGET-LINK", dumpfile);
+ if (attr->omp_declare_target_indirect)
+ fputs (" OMP-DECLARE-TARGET-INDIRECT", dumpfile);
if (attr->elemental)
fputs (" ELEMENTAL", dumpfile);
if (attr->pure)
diff --git a/gcc/fortran/f95-lang.cc b/gcc/fortran/f95-lang.cc
index 358cb17fce2..67fda27aa3e 100644
--- a/gcc/fortran/f95-lang.cc
+++ b/gcc/fortran/f95-lang.cc
@@ -96,6 +96,8 @@ static const attribute_spec gfc_gnu_attributes[] =
gfc_handle_omp_declare_target_attribute, NULL },
{ "omp declare target link", 0, 0, true, false, false, false,
gfc_handle_omp_declare_target_attribute, NULL },
+ { "omp declare target indirect", 0, 0, true, false, false, false,
+ gfc_handle_omp_declare_target_attribute, NULL },
{ "oacc function", 0, -1, true, false, false, false,
gfc_handle_omp_declare_target_attribute, NULL },
};
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index fd73e4ce431..fd843a3241d 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -999,6 +999,7 @@ typedef struct
/* Mentioned in OMP DECLARE TARGET. */
unsigned omp_declare_target:1;
unsigned omp_declare_target_link:1;
+ unsigned omp_declare_target_indirect:1;
ENUM_BITFIELD (gfc_omp_device_type) omp_device_type:2;
unsigned omp_allocate:1;
@@ -1584,7 +1585,7 @@ typedef struct gfc_omp_clauses
unsigned grainsize_strict:1, num_tasks_strict:1, compare:1, weak:1;
unsigned non_rectangular:1, order_concurrent:1;
unsigned contains_teams_construct:1, target_first_st_is_teams:1;
- unsigned contained_in_target_construct:1;
+ unsigned contained_in_target_construct:1, indirect:1;
ENUM_BITFIELD (gfc_omp_sched_kind) sched_kind:3;
ENUM_BITFIELD (gfc_omp_device_type) device_type:2;
ENUM_BITFIELD (gfc_omp_memorder) memorder:3;
diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 0af80d54fad..d1c5c323c54 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -1096,6 +1096,7 @@ enum omp_mask2
OMP_CLAUSE_DOACROSS, /* OpenMP 5.2 */
OMP_CLAUSE_ASSUMPTIONS, /* OpenMP 5.1. */
OMP_CLAUSE_USES_ALLOCATORS, /* OpenMP 5.0 */
+ OMP_CLAUSE_INDIRECT, /* OpenMP 5.1 */
/* This must come last. */
OMP_MASK2_LAST
};
@@ -2798,6 +2799,32 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask mask,
needs_space = true;
continue;
}
+ if ((mask & OMP_CLAUSE_INDIRECT)
+ && (m = gfc_match_dupl_check (!c->indirect, "indirect"))
+ != MATCH_NO)
+ {
+ if (m == MATCH_ERROR)
+ goto error;
+ gfc_expr *indirect_expr = NULL;
+ m = gfc_match (" ( %e )", &indirect_expr);
+ if (m == MATCH_YES)
+ {
+ if (!gfc_resolve_expr (indirect_expr)
+ || indirect_expr->ts.type != BT_LOGICAL
+ || indirect_expr->expr_type != EXPR_CONSTANT)
+ {
+ gfc_error ("INDIRECT clause at %C requires a constant "
+ "logical expression");
+ gfc_free_expr (indirect_expr);
+ goto error;
+ }
+ c->indirect = indirect_expr->value.logical;
+ gfc_free_expr (indirect_expr);
+ }
+ else
+ c->indirect = 1;
+ continue;
+ }
if ((mask & OMP_CLAUSE_IS_DEVICE_PTR)
&& gfc_match_omp_variable_list
("is_device_ptr (",
@@ -4460,7 +4487,7 @@ cleanup:
(omp_mask (OMP_CLAUSE_THREADS) | OMP_CLAUSE_SIMD)
#define OMP_DECLARE_TARGET_CLAUSES \
(omp_mask (OMP_CLAUSE_ENTER) | OMP_CLAUSE_LINK | OMP_CLAUSE_DEVICE_TYPE \
- | OMP_CLAUSE_TO)
+ | OMP_CLAUSE_TO | OMP_CLAUSE_INDIRECT)
#define OMP_ATOMIC_CLAUSES \
(omp_mask (OMP_CLAUSE_ATOMIC) | OMP_CLAUSE_CAPTURE | OMP_CLAUSE_HINT \
| OMP_CLAUSE_MEMORDER | OMP_CLAUSE_COMPARE | OMP_CLAUSE_FAIL \
@@ -5513,6 +5540,15 @@ gfc_match_omp_declare_target (void)
n->sym->name, &n->where);
n->sym->attr.omp_device_type = c->device_type;
}
+ if (c->indirect)
+ {
+ if (n->sym->attr.omp_device_type != OMP_DEVICE_TYPE_UNSET
+ && n->sym->attr.omp_device_type != OMP_DEVICE_TYPE_ANY)
+ gfc_error_now ("DEVICE_TYPE must be ANY when used with "
+ "INDIRECT at %L", &n->where);
+ n->sym->attr.omp_declare_target_indirect = c->indirect;
+ }
+
n->sym->mark = 1;
}
else if (n->u.common->omp_declare_target
@@ -5558,6 +5594,13 @@ gfc_match_omp_declare_target (void)
" TARGET directive to a different DEVICE_TYPE",
s->name, &n->where);
s->attr.omp_device_type = c->device_type;
+
+ if (c->indirect
+ && s->attr.omp_device_type != OMP_DEVICE_TYPE_UNSET
+ && s->attr.omp_device_type != OMP_DEVICE_TYPE_ANY)
+ gfc_error_now ("DEVICE_TYPE must be ANY when used with "
+ "INDIRECT at %L", &n->where);
+ s->attr.omp_declare_target_indirect = c->indirect;
}
}
if (c->device_type
diff --git a/gcc/fortran/trans-decl.cc b/gcc/fortran/trans-decl.cc
index de162f6cc75..6d463036966 100644
--- a/gcc/fortran/trans-decl.cc
+++ b/gcc/fortran/trans-decl.cc
@@ -1526,6 +1526,10 @@ add_attributes_to_decl (symbol_attribute sym_attr, tree list)
list = tree_cons (get_identifier ("omp declare target"),
clauses, list);
+ if (sym_attr.omp_declare_target_indirect)
+ list = tree_cons (get_identifier ("omp declare target indirect"),
+ clauses, list);
+
return list;
}
diff --git a/gcc/testsuite/gfortran.dg/gomp/declare-target-indirect-1.f90 b/gcc/testsuite/gfortran.dg/gomp/declare-target-indirect-1.f90
new file mode 100644
index 00000000000..560a0541e9a
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/declare-target-indirect-1.f90
@@ -0,0 +1,58 @@
+! { dg-do compile }
+! { dg-options "-fopenmp" }
+
+module m
+ integer :: a
+ integer, parameter :: X = 1
+ integer, parameter :: Y = 2
+
+ ! Indirect on a variable should have no effect.
+ integer :: z
+ !$omp declare target to (z) indirect
+contains
+ subroutine sub1
+ !$omp declare target indirect to (sub1)
+ end subroutine
+
+ subroutine sub2
+ !$omp declare target enter (sub2) indirect (.true.)
+ end subroutine
+
+ subroutine sub3
+ !$omp declare target to (sub3) indirect (.false.)
+ end subroutine
+
+ subroutine sub4
+ !$omp declare target to (sub4) indirect (1) ! { dg-error "INDIRECT clause at .1. requires a constant logical expression" }
+ end subroutine
+
+ ! Compile-time non-constant expressions are not allowed.
+ subroutine sub5
+ !$omp declare target indirect (a > 0) to (sub5) ! { dg-error "INDIRECT clause at .1. requires a constant logical expression" }
+ end subroutine
+
+ ! Compile-time constant expressions are permissible.
+ subroutine sub6
+ !$omp declare target indirect (X .eq. Y) to (sub6)
+ end subroutine
+
+ subroutine sub7
+ !$omp declare target indirect (.true.) indirect (.false.) to (sub7) ! { dg-error "Duplicated .indirect. clause at .1." }
+ end subroutine
+
+ subroutine sub8
+ !$omp declare target to (sub8) indirect ("abs") ! { dg-error "INDIRECT clause at .1. requires a constant logical expression" }
+ end subroutine
+
+ subroutine sub9
+ !$omp declare target to (sub9) indirect (5.5) ! { dg-error "INDIRECT clause at .1. requires a constant logical expression" }
+ end subroutine
+
+ subroutine sub10
+ !$omp declare target indirect (.true.) device_type (host) enter (sub10) ! { dg-error "DEVICE_TYPE must be ANY when used with INDIRECT at .1." }
+ end subroutine
+
+ subroutine sub11
+ !$omp declare target indirect (.false.) device_type (nohost) enter (sub11)
+ end subroutine
+end module
diff --git a/gcc/testsuite/gfortran.dg/gomp/declare-target-indirect-2.f90 b/gcc/testsuite/gfortran.dg/gomp/declare-target-indirect-2.f90
new file mode 100644
index 00000000000..f6b3ae17856
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/declare-target-indirect-2.f90
@@ -0,0 +1,25 @@
+! { dg-do compile }
+! { dg-options "-fopenmp -fdump-tree-gimple" }
+
+module m
+contains
+ subroutine sub1
+ !$omp declare target indirect enter (sub1)
+ end subroutine
+ ! { dg-final { scan-tree-dump "__attribute__\\\(\\\(omp declare target, omp declare target indirect\\\)\\\)\\\n.*\\\nvoid sub1" "gimple" } }
+
+ subroutine sub2
+ !$omp declare target indirect (.false.) to (sub2)
+ end subroutine
+ ! { dg-final { scan-tree-dump "__attribute__\\\(\\\(omp declare target\\\)\\\)\\\n.*\\\nvoid sub2" "gimple" } }
+
+ subroutine sub3
+ !$omp declare target indirect (.true.) to (sub3)
+ end subroutine
+ ! { dg-final { scan-tree-dump "__attribute__\\\(\\\(omp declare target, omp declare target indirect\\\)\\\)\\\n.*\\\nvoid sub3" "gimple" } }
+
+ subroutine sub4
+ !$omp declare target indirect (.false.) enter (sub4)
+ end subroutine
+ ! { dg-final { scan-tree-dump "__attribute__\\\(\\\(omp declare target\\\)\\\)\\\n.*\\\nvoid sub4" "gimple" } }
+end module
diff --git a/libgomp/testsuite/libgomp.fortran/declare-target-indirect-1.f90 b/libgomp/testsuite/libgomp.fortran/declare-target-indirect-1.f90
new file mode 100644
index 00000000000..39a91dfcdca
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/declare-target-indirect-1.f90
@@ -0,0 +1,39 @@
+! { dg-do run }
+
+module m
+contains
+ integer function foo ()
+ !$omp declare target to (foo) indirect
+ foo = 5
+ end function
+
+ integer function bar ()
+ !$omp declare target to (bar) indirect
+ bar = 8
+ end function
+
+ integer function baz ()
+ !$omp declare target to (baz) indirect
+ baz = 11
+ end function
+end module
+
+program main
+ use m
+ implicit none
+
+ integer :: x, expected
+ procedure (foo), pointer :: foo_ptr, bar_ptr, baz_ptr
+
+ foo_ptr => foo
+ bar_ptr => bar
+ baz_ptr => baz
+
+ expected = foo () + bar () + baz ()
+
+ !$omp target map (to: foo_ptr, bar_ptr, baz_ptr) map (from: x)
+ x = foo_ptr () + bar_ptr () + baz_ptr ()
+ !$omp end target
+
+ stop x - expected
+end program
diff --git a/libgomp/testsuite/libgomp.fortran/declare-target-indirect-2.f90 b/libgomp/testsuite/libgomp.fortran/declare-target-indirect-2.f90
new file mode 100644
index 00000000000..d3baa81dd07
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/declare-target-indirect-2.f90
@@ -0,0 +1,53 @@
+! { dg-do run }
+
+module m
+contains
+ integer function foo ()
+ !$omp declare target to (foo) indirect
+ foo = 5
+ end function
+
+ integer function bar ()
+ !$omp declare target to (bar) indirect
+ bar = 8
+ end function
+
+ integer function baz ()
+ !$omp declare target to (baz) indirect
+ baz = 11
+ end function
+end module
+
+program main
+ use m
+ implicit none
+
+ type fp
+ procedure (foo), pointer, nopass :: f => null ()
+ end type
+
+ integer, parameter :: N = 256
+ integer :: i, x = 0, expected = 0;
+ type (fp) :: fn_ptr (N)
+
+ do i = 1, N
+ select case (mod (i, 3))
+ case (0)
+ fn_ptr (i)%f => foo
+ case (1)
+ fn_ptr (i)%f => bar
+ case (2)
+ fn_ptr (i)%f => baz
+ end select
+ expected = expected + fn_ptr (i)%f ()
+ end do
+
+ !$omp target teams distribute parallel do &
+ !$omp & reduction(+: x) map (to: fn_ptr) map (tofrom: x)
+ do i = 1, N
+ x = x + fn_ptr (i)%f ()
+ end do
+ !$omp end target teams distribute parallel do
+
+ stop x - expected
+end program
--
2.34.1
* Re: [PATCH] openmp, fortran: Add Fortran support for indirect clause on the declare target directive
2024-01-22 20:41 ` [PATCH] openmp, fortran: Add Fortran support for indirect clause on the declare target directive Kwok Cheung Yeung
@ 2024-01-23 19:14 ` Tobias Burnus
2024-02-05 21:37 ` [PATCH v2] " Kwok Cheung Yeung
0 siblings, 1 reply; 28+ messages in thread
From: Tobias Burnus @ 2024-01-23 19:14 UTC (permalink / raw)
To: Kwok Cheung Yeung, gcc-patches, fortran, Jakub Jelinek
Kwok Cheung Yeung wrote:
> This patch adds support for the indirect clause on the OpenMP 'declare
> target' directive in Fortran. As with the C and C++ front-ends, this
> applies the 'omp declare target indirect' attribute on affected
> function declarations. The C test cases have also been translated to
> Fortran where appropriate.
>
> Okay for mainline?
LGTM – can you also update the following libgomp.texi entry?
@item @code{indirect} clause in @code{declare target} @tab P @tab Only C
and C++
Thanks,
Tobias
* Re: [PATCH] openmp: Change to using a hashtab to lookup offload target addresses for indirect function calls
2024-01-22 20:33 ` [PATCH] openmp: Change to using a hashtab to lookup offload target addresses for indirect function calls Kwok Cheung Yeung
@ 2024-01-24 7:06 ` rep.dot.nop
2024-01-29 17:48 ` [PATCH v2] " Kwok Cheung Yeung
0 siblings, 1 reply; 28+ messages in thread
From: rep.dot.nop @ 2024-01-24 7:06 UTC (permalink / raw)
To: gcc-patches, Kwok Cheung Yeung, gcc-patches, Jakub Jelinek,
Tobias Burnus
Cc: tschwinge
On 22 January 2024 21:33:17 CET, Kwok Cheung Yeung <kcy@codesourcery.com> wrote:
>Hi
>
>There was a bug in the declare-target-indirect-2.c libgomp testcase (testing indirect calls in offloaded target regions, spread over multiple teams/threads) that due to an errant fallthrough in a switch statement resulted in only one indirect function ever getting called:
>
>switch (i % 3)
> {
> case 0: fn_ptr[i] = &foo; // Missing break
> case 1: fn_ptr[i] = &bar; // Missing break
> case 2: fn_ptr[i] = &baz;
> }
>
>However, when the missing break statements are added, the testcase fails with an invalid memory access. Upon investigation, this is due to the use of a splay-tree as the lookup structure for indirect addresses, as the splay-tree moves frequently accessed elements closer to the root node and so needs locking when used from multiple threads. However, this would end up partially serialising all the threads and kill performance. I have switched the lookup structure from a splay tree to a hashtab instead to avoid locking during lookup.
>
>I have also tidied up the initialisation of the lookup table by calling it only from the first thread of the first team, instead of redundantly calling it from every thread and only having the first one reached do the initialisation. This removes the need for locking during initialisation.
>
>Tested with offloading to NVPTX and GCN with a x86_64 host. Okay for master?
Can you please also update the comments to talk about hashtab instead of splay?
TIA
>
>Thanks
>
>Kwok
* [PATCH v2] openmp: Change to using a hashtab to lookup offload target addresses for indirect function calls
2024-01-24 7:06 ` rep.dot.nop
@ 2024-01-29 17:48 ` Kwok Cheung Yeung
2024-03-08 13:40 ` Thomas Schwinge
2024-03-14 11:38 ` Tobias Burnus
0 siblings, 2 replies; 28+ messages in thread
From: Kwok Cheung Yeung @ 2024-01-29 17:48 UTC (permalink / raw)
To: rep.dot.nop, gcc-patches; +Cc: Tobias Burnus, Jakub Jelinek
[-- Attachment #1: Type: text/plain, Size: 202 bytes --]
> Can you please also update the comments to talk about hashtab instead of splay?
>
Hello
This version has the comments updated and removes a stray 'volatile' in
the #ifdefed out code.
Thanks
Kwok
[-- Attachment #2: 0001-openmp-Change-to-using-a-hashtab-to-lookup-offload-t.patch --]
[-- Type: text/plain, Size: 8766 bytes --]
From 5737298f4f5e5471667b05e207b22c9c91b94ca0 Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung <kcyeung@baylibre.com>
Date: Mon, 29 Jan 2024 17:40:04 +0000
Subject: [PATCH 1/2] openmp: Change to using a hashtab to lookup offload
target addresses for indirect function calls
A splay-tree was previously used to lookup equivalent target addresses
for a given host address on offload targets. However, as splay-trees can
modify their structure on lookup, they are not suitable for concurrent
access from separate teams/threads without some form of locking. This
patch changes the lookup data structure to a hashtab instead, which does
not have these issues.
The call to build_indirect_map to initialize the data structure is now
called from just the first thread of the first team to avoid redundant
calls to this function.
2024-01-29 Kwok Cheung Yeung <kcyeung@baylibre.com>
libgomp/
* config/accel/target-indirect.c: Include string.h and hashtab.h.
Remove include of splay-tree.h. Update comments.
(splay_tree_prefix, splay_tree_c): Delete.
(struct indirect_map_t): New.
(hash_entry_type, htab_alloc, htab_free, htab_hash, htab_eq): New.
(GOMP_INDIRECT_ADDR_MAP): Remove volatile qualifier.
(USE_SPLAY_TREE_LOOKUP): Rename to...
(USE_HASHTAB_LOOKUP): ...this.
(indirect_map, indirect_array): Delete.
(indirect_htab): New.
(build_indirect_map): Remove locking. Build indirect map using
hashtab.
(GOMP_target_map_indirect_ptr): Use indirect_htab to lookup target
address.
(GOMP_target_map_indirect_ptr): Remove volatile qualifier.
* config/gcn/team.c (gomp_gcn_enter_kernel): Call build_indirect_map
from first thread of first team only.
* config/nvptx/team.c (gomp_nvptx_main): Likewise.
* testsuite/libgomp.c-c++-common/declare-target-indirect-2.c (main):
Add missing break statements.
---
libgomp/config/accel/target-indirect.c | 83 ++++++++++---------
libgomp/config/gcn/team.c | 7 +-
libgomp/config/nvptx/team.c | 9 +-
.../declare-target-indirect-2.c | 14 ++--
4 files changed, 63 insertions(+), 50 deletions(-)
diff --git a/libgomp/config/accel/target-indirect.c b/libgomp/config/accel/target-indirect.c
index c60fd547cb6..cfef1ddbc49 100644
--- a/libgomp/config/accel/target-indirect.c
+++ b/libgomp/config/accel/target-indirect.c
@@ -25,60 +25,73 @@
<http://www.gnu.org/licenses/>. */
#include <assert.h>
+#include <string.h>
#include "libgomp.h"
-#define splay_tree_prefix indirect
-#define splay_tree_c
-#include "splay-tree.h"
+struct indirect_map_t
+{
+ void *host_addr;
+ void *target_addr;
+};
+
+typedef struct indirect_map_t *hash_entry_type;
+
+static inline void * htab_alloc (size_t size) { return gomp_malloc (size); }
+static inline void htab_free (void *ptr) { free (ptr); }
+
+#include "hashtab.h"
+
+static inline hashval_t
+htab_hash (hash_entry_type element)
+{
+ return hash_pointer (element->host_addr);
+}
-volatile void **GOMP_INDIRECT_ADDR_MAP = NULL;
+static inline bool
+htab_eq (hash_entry_type x, hash_entry_type y)
+{
+ return x->host_addr == y->host_addr;
+}
-/* Use a splay tree to lookup the target address instead of using a
- linear search. */
-#define USE_SPLAY_TREE_LOOKUP
+void **GOMP_INDIRECT_ADDR_MAP = NULL;
-#ifdef USE_SPLAY_TREE_LOOKUP
+/* Use a hashtab to lookup the target address instead of using a linear
+ search. */
+#define USE_HASHTAB_LOOKUP
-static struct indirect_splay_tree_s indirect_map;
-static indirect_splay_tree_node indirect_array = NULL;
+#ifdef USE_HASHTAB_LOOKUP
-/* Build the splay tree used for host->target address lookups. */
+static htab_t indirect_htab = NULL;
+
+/* Build the hashtab used for host->target address lookups. */
void
build_indirect_map (void)
{
size_t num_ind_funcs = 0;
- volatile void **map_entry;
- static int lock = 0; /* == gomp_mutex_t lock; gomp_mutex_init (&lock); */
+ void **map_entry;
if (!GOMP_INDIRECT_ADDR_MAP)
return;
- gomp_mutex_lock (&lock);
-
- if (!indirect_array)
+ if (!indirect_htab)
{
/* Count the number of entries in the NULL-terminated address map. */
for (map_entry = GOMP_INDIRECT_ADDR_MAP; *map_entry;
map_entry += 2, num_ind_funcs++);
- /* Build splay tree for address lookup. */
- indirect_array = gomp_malloc (num_ind_funcs * sizeof (*indirect_array));
- indirect_splay_tree_node array = indirect_array;
+ /* Build hashtab for address lookup. */
+ indirect_htab = htab_create (num_ind_funcs);
map_entry = GOMP_INDIRECT_ADDR_MAP;
- for (int i = 0; i < num_ind_funcs; i++, array++)
+ for (int i = 0; i < num_ind_funcs; i++, map_entry += 2)
{
- indirect_splay_tree_key k = &array->key;
- k->host_addr = (uint64_t) *map_entry++;
- k->target_addr = (uint64_t) *map_entry++;
- array->left = NULL;
- array->right = NULL;
- indirect_splay_tree_insert (&indirect_map, array);
+ struct indirect_map_t element = { *map_entry, NULL };
+ hash_entry_type *slot = htab_find_slot (&indirect_htab, &element,
+ INSERT);
+ *slot = (hash_entry_type) map_entry;
}
}
-
- gomp_mutex_unlock (&lock);
}
void *
@@ -88,15 +101,11 @@ GOMP_target_map_indirect_ptr (void *ptr)
if (!ptr)
return ptr;
- assert (indirect_array);
-
- struct indirect_splay_tree_key_s k;
- indirect_splay_tree_key node = NULL;
-
- k.host_addr = (uint64_t) ptr;
- node = indirect_splay_tree_lookup (&indirect_map, &k);
+ assert (indirect_htab);
- return node ? (void *) node->target_addr : ptr;
+ struct indirect_map_t element = { ptr, NULL };
+ hash_entry_type entry = htab_find (indirect_htab, &element);
+ return entry ? entry->target_addr : ptr;
}
#else
@@ -115,7 +124,7 @@ GOMP_target_map_indirect_ptr (void *ptr)
assert (GOMP_INDIRECT_ADDR_MAP);
- for (volatile void **map_entry = GOMP_INDIRECT_ADDR_MAP; *map_entry;
+ for (void **map_entry = GOMP_INDIRECT_ADDR_MAP; *map_entry;
map_entry += 2)
if (*map_entry == ptr)
return (void *) *(map_entry + 1);
diff --git a/libgomp/config/gcn/team.c b/libgomp/config/gcn/team.c
index 61e9c616b67..bd3df448b52 100644
--- a/libgomp/config/gcn/team.c
+++ b/libgomp/config/gcn/team.c
@@ -52,14 +52,15 @@ gomp_gcn_enter_kernel (void)
{
int threadid = __builtin_gcn_dim_pos (1);
- /* Initialize indirect function support. */
- build_indirect_map ();
-
if (threadid == 0)
{
int numthreads = __builtin_gcn_dim_size (1);
int teamid = __builtin_gcn_dim_pos(0);
+ /* Initialize indirect function support. */
+ if (teamid == 0)
+ build_indirect_map ();
+
/* Set up the global state.
Every team will do this, but that should be harmless. */
gomp_global_icv.nthreads_var = 16;
diff --git a/libgomp/config/nvptx/team.c b/libgomp/config/nvptx/team.c
index 0cf5dad39ca..d5361917a24 100644
--- a/libgomp/config/nvptx/team.c
+++ b/libgomp/config/nvptx/team.c
@@ -60,9 +60,6 @@ gomp_nvptx_main (void (*fn) (void *), void *fn_data)
asm ("mov.u32 %0, %%tid.y;" : "=r" (tid));
asm ("mov.u32 %0, %%ntid.y;" : "=r" (ntids));
- /* Initialize indirect function support. */
- build_indirect_map ();
-
if (tid == 0)
{
gomp_global_icv.nthreads_var = ntids;
@@ -74,6 +71,12 @@ gomp_nvptx_main (void (*fn) (void *), void *fn_data)
nvptx_thrs = alloca (ntids * sizeof (*nvptx_thrs));
memset (nvptx_thrs, 0, ntids * sizeof (*nvptx_thrs));
+ /* Initialize indirect function support. */
+ unsigned int block_id;
+ asm ("mov.u32 %0, %%ctaid.x;" : "=r" (block_id));
+ if (block_id == 0)
+ build_indirect_map ();
+
/* Find the low-latency heap details .... */
uint32_t *shared_pool;
uint32_t shared_pool_size = 0;
diff --git a/libgomp/testsuite/libgomp.c-c++-common/declare-target-indirect-2.c b/libgomp/testsuite/libgomp.c-c++-common/declare-target-indirect-2.c
index 9fe190efce8..545f1a9fcbf 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/declare-target-indirect-2.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/declare-target-indirect-2.c
@@ -17,17 +17,17 @@ int main (void)
{
switch (i % 3)
{
- case 0: fn_ptr[i] = &foo;
- case 1: fn_ptr[i] = &bar;
- case 2: fn_ptr[i] = &baz;
+ case 0: fn_ptr[i] = &foo; break;
+ case 1: fn_ptr[i] = &bar; break;
+ case 2: fn_ptr[i] = &baz; break;
}
expected += (*fn_ptr[i]) ();
}
-#pragma omp target teams distribute parallel for reduction(+: x) \
- map (to: fn_ptr) map (tofrom: x)
- for (int i = 0; i < N; i++)
- x += (*fn_ptr[i]) ();
+ #pragma omp target teams distribute parallel for \
+ reduction (+: x) map (to: fn_ptr) map (tofrom: x)
+ for (int i = 0; i < N; i++)
+ x += (*fn_ptr[i]) ();
return x - expected;
}
--
2.34.1
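
As an aside on the testcase fix in the last hunk, the effect of the missing break statements can be reproduced in isolation. The sketch below is illustrative only: pick_without_break and pick_with_break are hypothetical helpers, and the return values 5, 8 and 11 are assumed to match the functions of the same names in the Fortran tests later in this thread.

```c
typedef int (*fn_t) (void);

static int foo (void) { return 5; }
static int bar (void) { return 8; }
static int baz (void) { return 11; }

/* Shape of the switch before the fix: every case falls through to the
   last one, so the result is always &baz regardless of I.  */
fn_t
pick_without_break (int i)
{
  fn_t f = 0;
  switch (i % 3)
    {
    case 0: f = &foo; /* falls through */
    case 1: f = &bar; /* falls through */
    case 2: f = &baz;
    }
  return f;
}

/* Shape of the switch after the fix: each case selects its own function.  */
fn_t
pick_with_break (int i)
{
  fn_t f = 0;
  switch (i % 3)
    {
    case 0: f = &foo; break;
    case 1: f = &bar; break;
    case 2: f = &baz; break;
    }
  return f;
}
```

With the fallthrough, every element of fn_ptr ends up pointing at baz, so the target region only ever looks up a single address. Presumably that is why the concurrent-lookup problem only became visible once the breaks were added.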
* [PATCH v2] openmp, fortran: Add Fortran support for indirect clause on the declare target directive
2024-01-23 19:14 ` Tobias Burnus
@ 2024-02-05 21:37 ` Kwok Cheung Yeung
2024-02-06 9:03 ` Tobias Burnus
0 siblings, 1 reply; 28+ messages in thread
From: Kwok Cheung Yeung @ 2024-02-05 21:37 UTC (permalink / raw)
To: tburnus; +Cc: Jakub Jelinek, gcc-patches, fortran
[-- Attachment #1: Type: text/plain, Size: 587 bytes --]
Hi
As previously discussed, this version of the patch adds code to emit a
warning when a directive like this:
!$omp declare target indirect(.true.)
is encountered (i.e. a target directive containing at least one clause,
but no to/enter clause, which appears to violate the OpenMP standard). A
test is also added to gfortran.dg/gomp/declare-target-indirect-1.f90 to
test for this.
I have also added a declare-target-indirect-3.f90 test to libgomp to
check that procedures passed via a dummy argument work properly when
used in an indirect call.
Okay for mainline?
Thanks
Kwok
[-- Attachment #2: 0001-openmp-fortran-Add-Fortran-support-for-indirect-clau.patch --]
[-- Type: text/plain, Size: 16179 bytes --]
From f6662a7bc76d400fecb5013ad6d6ab3b00b8a6e7 Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung <kcyeung@baylibre.com>
Date: Mon, 5 Feb 2024 20:31:49 +0000
Subject: [PATCH] openmp, fortran: Add Fortran support for indirect clause on
the declare target directive
2024-02-05 Kwok Cheung Yeung <kcyeung@baylibre.com>
gcc/fortran/
* dump-parse-tree.cc (show_attr): Handle omp_declare_target_indirect
attribute.
* f95-lang.cc (gfc_gnu_attributes): Add entry for 'omp declare
target indirect'.
* gfortran.h (symbol_attribute): Add omp_declare_target_indirect
field.
(struct gfc_omp_clauses): Add indirect field.
* openmp.cc (omp_mask2): Add OMP_CLAUSE_INDIRECT.
(gfc_match_omp_clauses): Match indirect clause.
(OMP_DECLARE_TARGET_CLAUSES): Add OMP_CLAUSE_INDIRECT.
(gfc_match_omp_declare_target): Check omp_device_type and apply
omp_declare_target_indirect attribute to symbol if indirect clause
active. Show warning if there are only device_type and/or indirect
clauses on the directive.
* trans-decl.cc (add_attributes_to_decl): Add 'omp declare target
indirect' attribute if symbol has indirect attribute set.
gcc/testsuite/
* gfortran.dg/gomp/declare-target-4.f90 (f1): Update expected warning.
* gfortran.dg/gomp/declare-target-indirect-1.f90: New.
* gfortran.dg/gomp/declare-target-indirect-2.f90: New.
libgomp/
* testsuite/libgomp.fortran/declare-target-indirect-1.f90: New.
* testsuite/libgomp.fortran/declare-target-indirect-2.f90: New.
* testsuite/libgomp.fortran/declare-target-indirect-3.f90: New.
---
gcc/fortran/dump-parse-tree.cc | 2 +
gcc/fortran/f95-lang.cc | 2 +
gcc/fortran/gfortran.h | 3 +-
gcc/fortran/openmp.cc | 50 ++++++++++++++-
gcc/fortran/trans-decl.cc | 4 ++
.../gfortran.dg/gomp/declare-target-4.f90 | 2 +-
.../gomp/declare-target-indirect-1.f90 | 62 +++++++++++++++++++
.../gomp/declare-target-indirect-2.f90 | 25 ++++++++
.../declare-target-indirect-1.f90 | 39 ++++++++++++
.../declare-target-indirect-2.f90 | 53 ++++++++++++++++
.../declare-target-indirect-3.f90 | 25 ++++++++
11 files changed, 262 insertions(+), 5 deletions(-)
create mode 100644 gcc/testsuite/gfortran.dg/gomp/declare-target-indirect-1.f90
create mode 100644 gcc/testsuite/gfortran.dg/gomp/declare-target-indirect-2.f90
create mode 100644 libgomp/testsuite/libgomp.fortran/declare-target-indirect-1.f90
create mode 100644 libgomp/testsuite/libgomp.fortran/declare-target-indirect-2.f90
create mode 100644 libgomp/testsuite/libgomp.fortran/declare-target-indirect-3.f90
diff --git a/gcc/fortran/dump-parse-tree.cc b/gcc/fortran/dump-parse-tree.cc
index 1563b810b98..7b154eb3ca7 100644
--- a/gcc/fortran/dump-parse-tree.cc
+++ b/gcc/fortran/dump-parse-tree.cc
@@ -914,6 +914,8 @@ show_attr (symbol_attribute *attr, const char * module)
fputs (" OMP-DECLARE-TARGET", dumpfile);
if (attr->omp_declare_target_link)
fputs (" OMP-DECLARE-TARGET-LINK", dumpfile);
+ if (attr->omp_declare_target_indirect)
+ fputs (" OMP-DECLARE-TARGET-INDIRECT", dumpfile);
if (attr->elemental)
fputs (" ELEMENTAL", dumpfile);
if (attr->pure)
diff --git a/gcc/fortran/f95-lang.cc b/gcc/fortran/f95-lang.cc
index 358cb17fce2..67fda27aa3e 100644
--- a/gcc/fortran/f95-lang.cc
+++ b/gcc/fortran/f95-lang.cc
@@ -96,6 +96,8 @@ static const attribute_spec gfc_gnu_attributes[] =
gfc_handle_omp_declare_target_attribute, NULL },
{ "omp declare target link", 0, 0, true, false, false, false,
gfc_handle_omp_declare_target_attribute, NULL },
+ { "omp declare target indirect", 0, 0, true, false, false, false,
+ gfc_handle_omp_declare_target_attribute, NULL },
{ "oacc function", 0, -1, true, false, false, false,
gfc_handle_omp_declare_target_attribute, NULL },
};
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index fd73e4ce431..fd843a3241d 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -999,6 +999,7 @@ typedef struct
/* Mentioned in OMP DECLARE TARGET. */
unsigned omp_declare_target:1;
unsigned omp_declare_target_link:1;
+ unsigned omp_declare_target_indirect:1;
ENUM_BITFIELD (gfc_omp_device_type) omp_device_type:2;
unsigned omp_allocate:1;
@@ -1584,7 +1585,7 @@ typedef struct gfc_omp_clauses
unsigned grainsize_strict:1, num_tasks_strict:1, compare:1, weak:1;
unsigned non_rectangular:1, order_concurrent:1;
unsigned contains_teams_construct:1, target_first_st_is_teams:1;
- unsigned contained_in_target_construct:1;
+ unsigned contained_in_target_construct:1, indirect:1;
ENUM_BITFIELD (gfc_omp_sched_kind) sched_kind:3;
ENUM_BITFIELD (gfc_omp_device_type) device_type:2;
ENUM_BITFIELD (gfc_omp_memorder) memorder:3;
diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 0af80d54fad..30aba4421ff 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -1096,6 +1096,7 @@ enum omp_mask2
OMP_CLAUSE_DOACROSS, /* OpenMP 5.2 */
OMP_CLAUSE_ASSUMPTIONS, /* OpenMP 5.1. */
OMP_CLAUSE_USES_ALLOCATORS, /* OpenMP 5.0 */
+ OMP_CLAUSE_INDIRECT, /* OpenMP 5.1 */
/* This must come last. */
OMP_MASK2_LAST
};
@@ -2798,6 +2799,32 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask mask,
needs_space = true;
continue;
}
+ if ((mask & OMP_CLAUSE_INDIRECT)
+ && (m = gfc_match_dupl_check (!c->indirect, "indirect"))
+ != MATCH_NO)
+ {
+ if (m == MATCH_ERROR)
+ goto error;
+ gfc_expr *indirect_expr = NULL;
+ m = gfc_match (" ( %e )", &indirect_expr);
+ if (m == MATCH_YES)
+ {
+ if (!gfc_resolve_expr (indirect_expr)
+ || indirect_expr->ts.type != BT_LOGICAL
+ || indirect_expr->expr_type != EXPR_CONSTANT)
+ {
+ gfc_error ("INDIRECT clause at %C requires a constant "
+ "logical expression");
+ gfc_free_expr (indirect_expr);
+ goto error;
+ }
+ c->indirect = indirect_expr->value.logical;
+ gfc_free_expr (indirect_expr);
+ }
+ else
+ c->indirect = 1;
+ continue;
+ }
if ((mask & OMP_CLAUSE_IS_DEVICE_PTR)
&& gfc_match_omp_variable_list
("is_device_ptr (",
@@ -4460,7 +4487,7 @@ cleanup:
(omp_mask (OMP_CLAUSE_THREADS) | OMP_CLAUSE_SIMD)
#define OMP_DECLARE_TARGET_CLAUSES \
(omp_mask (OMP_CLAUSE_ENTER) | OMP_CLAUSE_LINK | OMP_CLAUSE_DEVICE_TYPE \
- | OMP_CLAUSE_TO)
+ | OMP_CLAUSE_TO | OMP_CLAUSE_INDIRECT)
#define OMP_ATOMIC_CLAUSES \
(omp_mask (OMP_CLAUSE_ATOMIC) | OMP_CLAUSE_CAPTURE | OMP_CLAUSE_HINT \
| OMP_CLAUSE_MEMORDER | OMP_CLAUSE_COMPARE | OMP_CLAUSE_FAIL \
@@ -5513,6 +5540,15 @@ gfc_match_omp_declare_target (void)
n->sym->name, &n->where);
n->sym->attr.omp_device_type = c->device_type;
}
+ if (c->indirect)
+ {
+ if (n->sym->attr.omp_device_type != OMP_DEVICE_TYPE_UNSET
+ && n->sym->attr.omp_device_type != OMP_DEVICE_TYPE_ANY)
+ gfc_error_now ("DEVICE_TYPE must be ANY when used with "
+ "INDIRECT at %L", &n->where);
+ n->sym->attr.omp_declare_target_indirect = c->indirect;
+ }
+
n->sym->mark = 1;
}
else if (n->u.common->omp_declare_target
@@ -5558,15 +5594,23 @@ gfc_match_omp_declare_target (void)
" TARGET directive to a different DEVICE_TYPE",
s->name, &n->where);
s->attr.omp_device_type = c->device_type;
+
+ if (c->indirect
+ && s->attr.omp_device_type != OMP_DEVICE_TYPE_UNSET
+ && s->attr.omp_device_type != OMP_DEVICE_TYPE_ANY)
+ gfc_error_now ("DEVICE_TYPE must be ANY when used with "
+ "INDIRECT at %L", &n->where);
+ s->attr.omp_declare_target_indirect = c->indirect;
}
}
- if (c->device_type
+ if ((c->device_type || c->indirect)
&& !c->lists[OMP_LIST_ENTER]
&& !c->lists[OMP_LIST_TO]
&& !c->lists[OMP_LIST_LINK])
gfc_warning_now (OPT_Wopenmp,
"OMP DECLARE TARGET directive at %L with only "
- "DEVICE_TYPE clause is ignored", &old_loc);
+ "DEVICE_TYPE or INDIRECT clauses is ignored",
+ &old_loc);
gfc_buffer_error (true);
diff --git a/gcc/fortran/trans-decl.cc b/gcc/fortran/trans-decl.cc
index de162f6cc75..6d463036966 100644
--- a/gcc/fortran/trans-decl.cc
+++ b/gcc/fortran/trans-decl.cc
@@ -1526,6 +1526,10 @@ add_attributes_to_decl (symbol_attribute sym_attr, tree list)
list = tree_cons (get_identifier ("omp declare target"),
clauses, list);
+ if (sym_attr.omp_declare_target_indirect)
+ list = tree_cons (get_identifier ("omp declare target indirect"),
+ clauses, list);
+
return list;
}
diff --git a/gcc/testsuite/gfortran.dg/gomp/declare-target-4.f90 b/gcc/testsuite/gfortran.dg/gomp/declare-target-4.f90
index 4f5de4bd8c7..55534d8fe99 100644
--- a/gcc/testsuite/gfortran.dg/gomp/declare-target-4.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/declare-target-4.f90
@@ -2,7 +2,7 @@
! { dg-additional-options "-fdump-tree-original" }
subroutine f1
- !$omp declare target device_type (any) ! { dg-warning "OMP DECLARE TARGET directive at .1. with only DEVICE_TYPE clause is ignored" }
+ !$omp declare target device_type (any) ! { dg-warning "OMP DECLARE TARGET directive at .1. with only DEVICE_TYPE or INDIRECT clauses is ignored" }
end subroutine
subroutine f2
diff --git a/gcc/testsuite/gfortran.dg/gomp/declare-target-indirect-1.f90 b/gcc/testsuite/gfortran.dg/gomp/declare-target-indirect-1.f90
new file mode 100644
index 00000000000..504c1a29813
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/declare-target-indirect-1.f90
@@ -0,0 +1,62 @@
+! { dg-do compile }
+! { dg-options "-fopenmp" }
+
+module m
+ integer :: a
+ integer, parameter :: X = 1
+ integer, parameter :: Y = 2
+
+ ! Indirect on a variable should have no effect.
+ integer :: z
+ !$omp declare target to (z) indirect
+contains
+ subroutine sub1
+ !$omp declare target indirect to (sub1)
+ end subroutine
+
+ subroutine sub2
+ !$omp declare target enter (sub2) indirect (.true.)
+ end subroutine
+
+ subroutine sub3
+ !$omp declare target to (sub3) indirect (.false.)
+ end subroutine
+
+ subroutine sub4
+ !$omp declare target to (sub4) indirect (1) ! { dg-error "INDIRECT clause at .1. requires a constant logical expression" }
+ end subroutine
+
+ ! Compile-time non-constant expressions are not allowed.
+ subroutine sub5
+ !$omp declare target indirect (a > 0) to (sub5) ! { dg-error "INDIRECT clause at .1. requires a constant logical expression" }
+ end subroutine
+
+ ! Compile-time constant expressions are permissible.
+ subroutine sub6
+ !$omp declare target indirect (X .eq. Y) to (sub6)
+ end subroutine
+
+ subroutine sub7
+ !$omp declare target indirect ! { dg-warning "OMP DECLARE TARGET directive at .1. with only DEVICE_TYPE or INDIRECT clauses is ignored" }
+ end subroutine
+
+ subroutine sub8
+ !$omp declare target indirect (.true.) indirect (.false.) to (sub8) ! { dg-error "Duplicated .indirect. clause at .1." }
+ end subroutine
+
+ subroutine sub9
+ !$omp declare target to (sub9) indirect ("abs") ! { dg-error "INDIRECT clause at .1. requires a constant logical expression" }
+ end subroutine
+
+ subroutine sub10
+ !$omp declare target to (sub10) indirect (5.5) ! { dg-error "INDIRECT clause at .1. requires a constant logical expression" }
+ end subroutine
+
+ subroutine sub11
+ !$omp declare target indirect (.true.) device_type (host) enter (sub11) ! { dg-error "DEVICE_TYPE must be ANY when used with INDIRECT at .1." }
+ end subroutine
+
+ subroutine sub12
+ !$omp declare target indirect (.false.) device_type (nohost) enter (sub12)
+ end subroutine
+end module
diff --git a/gcc/testsuite/gfortran.dg/gomp/declare-target-indirect-2.f90 b/gcc/testsuite/gfortran.dg/gomp/declare-target-indirect-2.f90
new file mode 100644
index 00000000000..f6b3ae17856
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/declare-target-indirect-2.f90
@@ -0,0 +1,25 @@
+! { dg-do compile }
+! { dg-options "-fopenmp -fdump-tree-gimple" }
+
+module m
+contains
+ subroutine sub1
+ !$omp declare target indirect enter (sub1)
+ end subroutine
+ ! { dg-final { scan-tree-dump "__attribute__\\\(\\\(omp declare target, omp declare target indirect\\\)\\\)\\\n.*\\\nvoid sub1" "gimple" } }
+
+ subroutine sub2
+ !$omp declare target indirect (.false.) to (sub2)
+ end subroutine
+ ! { dg-final { scan-tree-dump "__attribute__\\\(\\\(omp declare target\\\)\\\)\\\n.*\\\nvoid sub2" "gimple" } }
+
+ subroutine sub3
+ !$omp declare target indirect (.true.) to (sub3)
+ end subroutine
+ ! { dg-final { scan-tree-dump "__attribute__\\\(\\\(omp declare target, omp declare target indirect\\\)\\\)\\\n.*\\\nvoid sub3" "gimple" } }
+
+ subroutine sub4
+ !$omp declare target indirect (.false.) enter (sub4)
+ end subroutine
+ ! { dg-final { scan-tree-dump "__attribute__\\\(\\\(omp declare target\\\)\\\)\\\n.*\\\nvoid sub4" "gimple" } }
+end module
diff --git a/libgomp/testsuite/libgomp.fortran/declare-target-indirect-1.f90 b/libgomp/testsuite/libgomp.fortran/declare-target-indirect-1.f90
new file mode 100644
index 00000000000..39a91dfcdca
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/declare-target-indirect-1.f90
@@ -0,0 +1,39 @@
+! { dg-do run }
+
+module m
+contains
+ integer function foo ()
+ !$omp declare target to (foo) indirect
+ foo = 5
+ end function
+
+ integer function bar ()
+ !$omp declare target to (bar) indirect
+ bar = 8
+ end function
+
+ integer function baz ()
+ !$omp declare target to (baz) indirect
+ baz = 11
+ end function
+end module
+
+program main
+ use m
+ implicit none
+
+ integer :: x, expected
+ procedure (foo), pointer :: foo_ptr, bar_ptr, baz_ptr
+
+ foo_ptr => foo
+ bar_ptr => bar
+ baz_ptr => baz
+
+ expected = foo () + bar () + baz ()
+
+ !$omp target map (to: foo_ptr, bar_ptr, baz_ptr) map (from: x)
+ x = foo_ptr () + bar_ptr () + baz_ptr ()
+ !$omp end target
+
+ stop x - expected
+end program
diff --git a/libgomp/testsuite/libgomp.fortran/declare-target-indirect-2.f90 b/libgomp/testsuite/libgomp.fortran/declare-target-indirect-2.f90
new file mode 100644
index 00000000000..d3baa81dd07
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/declare-target-indirect-2.f90
@@ -0,0 +1,53 @@
+! { dg-do run }
+
+module m
+contains
+ integer function foo ()
+ !$omp declare target to (foo) indirect
+ foo = 5
+ end function
+
+ integer function bar ()
+ !$omp declare target to (bar) indirect
+ bar = 8
+ end function
+
+ integer function baz ()
+ !$omp declare target to (baz) indirect
+ baz = 11
+ end function
+end module
+
+program main
+ use m
+ implicit none
+
+ type fp
+ procedure (foo), pointer, nopass :: f => null ()
+ end type
+
+ integer, parameter :: N = 256
+ integer :: i, x = 0, expected = 0;
+ type (fp) :: fn_ptr (N)
+
+ do i = 1, N
+ select case (mod (i, 3))
+ case (0)
+ fn_ptr (i)%f => foo
+ case (1)
+ fn_ptr (i)%f => bar
+ case (2)
+ fn_ptr (i)%f => baz
+ end select
+ expected = expected + fn_ptr (i)%f ()
+ end do
+
+ !$omp target teams distribute parallel do &
+ !$omp & reduction(+: x) map (to: fn_ptr) map (tofrom: x)
+ do i = 1, N
+ x = x + fn_ptr (i)%f ()
+ end do
+ !$omp end target teams distribute parallel do
+
+ stop x - expected
+end program
diff --git a/libgomp/testsuite/libgomp.fortran/declare-target-indirect-3.f90 b/libgomp/testsuite/libgomp.fortran/declare-target-indirect-3.f90
new file mode 100644
index 00000000000..ff99892f25c
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/declare-target-indirect-3.f90
@@ -0,0 +1,25 @@
+! { dg-do run }
+
+! Check that indirect calls work on procedures passed in via a dummy argument
+
+module m
+contains
+ subroutine bar
+ !$omp declare target enter(bar) indirect
+ end subroutine
+
+ subroutine foo(f)
+ procedure(bar) :: f
+
+ !$omp target
+ call f
+ !$omp end target
+ end subroutine
+end module
+
+program main
+ use m
+ implicit none
+
+ call foo(bar)
+end program
--
2.34.1
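
The condition behind the new warning in gfc_match_omp_declare_target can be restated as a small standalone predicate. This is only a sketch of the logic in the openmp.cc hunk above: struct declare_target_clauses and directive_is_ignored are hypothetical names, and the boolean fields stand in for the corresponding gfc_omp_clauses members and clause lists.

```c
#include <stdbool.h>

/* Hypothetical summary of the clauses seen on one DECLARE TARGET
   directive; stands in for the relevant parts of gfc_omp_clauses.  */
struct declare_target_clauses
{
  bool device_type; /* A device_type clause is present.  */
  bool indirect;    /* An indirect clause is present and evaluates to true.  */
  bool has_enter;   /* The enter list is non-empty.  */
  bool has_to;      /* The to list is non-empty.  */
  bool has_link;    /* The link list is non-empty.  */
};

/* Return true if the directive names nothing to act on, i.e. it carries
   device_type and/or indirect but no enter, to or link list.  This is
   the case in which the patch warns that the directive is ignored.  */
bool
directive_is_ignored (const struct declare_target_clauses *c)
{
  return (c->device_type || c->indirect)
         && !c->has_enter && !c->has_to && !c->has_link;
}
```

Because the patch stores the evaluated value of the clause in c->indirect, a lone indirect(.false.) does not satisfy this condition, whereas a lone indirect or indirect(.true.) does.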
* Re: [PATCH v2] openmp, fortran: Add Fortran support for indirect clause on the declare target directive
2024-02-05 21:37 ` [PATCH v2] " Kwok Cheung Yeung
@ 2024-02-06 9:03 ` Tobias Burnus
2024-02-06 9:50 ` Kwok Cheung Yeung
0 siblings, 1 reply; 28+ messages in thread
From: Tobias Burnus @ 2024-02-06 9:03 UTC (permalink / raw)
To: Kwok Cheung Yeung; +Cc: Jakub Jelinek, gcc-patches, fortran
[-- Attachment #1: Type: text/plain, Size: 1627 bytes --]
Kwok Cheung Yeung wrote:
> As previously discussed, this version of the patch adds code to emit a
> warning when a directive like this:
>
> !$omp declare target indirect(.true.)
>
> is encountered (i.e. a target directive containing at least one
> clause, but no to/enter clause, which appears to violate the OpenMP
> standard). A test is also added to
> gfortran.dg/gomp/declare-target-indirect-1.f90 to test for this.
Thanks. And indeed, the 5.1 spec states, under "Restrictions to the
declare target directive are as follows:", that "If the directive has a
clause, it must contain at least one 'to' clause or at least one 'link'
clause." [5.2 replaced 'to' by its alias 'enter', and the 6.0 preview
added 'local' to the list.]
> I have also added a declare-target-indirect-3.f90 test to libgomp to
> check that procedures passed via a dummy argument work properly when
> used in an indirect call.
>
> Okay for mainline?
LGTM. I just wonder whether there should be a value test and not just a
does-not-crash-when-called test for the latter testcase, i.e.
> +++ b/libgomp/testsuite/libgomp.fortran/declare-target-indirect-3.f90
> @@ -0,0 +1,25 @@
> +! { dg-do run }
> +
> +! Check that indirect calls work on procedures passed in via a dummy argument
> +
> +module m
> +contains
> + subroutine bar
> + !$omp declare target enter(bar) indirect
e.g. "integer function bar()" ... " bar = 42"
> + end subroutine
> +
> + subroutine foo(f)
> + procedure(bar) :: f
> +
> + !$omp target
> + call f
And then: if (f() /= 42) stop 1
> + !$omp end target
> + end subroutine
> +end module
Thanks,
Tobias
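
For comparison, a value-checking variant of the same idea in C might look as follows. This is a hedged sketch rather than a testcase from the patch series: call_on_device is a hypothetical helper, and the begin/end declare target form with the indirect clause is assumed to be accepted by the C front end as described in the first patch of this thread. Without OpenMP support the pragmas are ignored and the call simply runs on the host.

```c
/* Mark bar as callable through a function pointer on the device.  */
#pragma omp begin declare target indirect
int
bar (void)
{
  return 42;
}
#pragma omp end declare target

/* Call F inside a target region and hand the result back to the host,
   so the caller can check the value rather than just the absence of a
   crash.  */
int
call_on_device (int (*f) (void))
{
  int result = 0;
  #pragma omp target map (from: result)
    result = f ();
  return result;
}
```

A caller would then check call_on_device (bar) against 42, which corresponds to the "if (f() /= 42) stop 1" suggestion for the Fortran test.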
* Re: [PATCH v2] openmp, fortran: Add Fortran support for indirect clause on the declare target directive
2024-02-06 9:03 ` Tobias Burnus
@ 2024-02-06 9:50 ` Kwok Cheung Yeung
2024-02-12 8:51 ` Tobias Burnus
0 siblings, 1 reply; 28+ messages in thread
From: Kwok Cheung Yeung @ 2024-02-06 9:50 UTC (permalink / raw)
To: Tobias Burnus; +Cc: Jakub Jelinek, gcc-patches, fortran
[-- Attachment #1: Type: text/plain, Size: 912 bytes --]
Oops. I thought exactly the same thing yesterday, but forgot to add the
changes to my commit! Here is the updated version.
Kwok
On 06/02/2024 9:03 am, Tobias Burnus wrote:
> LGTM. I just wonder whether there should be a value test and not just a
> does-not-crash-when-called test for the latter testcase, i.e.
>
>
>> +++ b/libgomp/testsuite/libgomp.fortran/declare-target-indirect-3.f90
>> @@ -0,0 +1,25 @@
>> +! { dg-do run }
>> +
>> +! Check that indirect calls work on procedures passed in via a dummy argument
>> +
>> +module m
>> +contains
>> + subroutine bar
>> + !$omp declare target enter(bar) indirect
> e.g. "integer function bar()" ... " bar = 42"
>> + end subroutine
>> +
>> + subroutine foo(f)
>> + procedure(bar) :: f
>> +
>> + !$omp target
>> + call f
> And then: if (f() /= 42) stop 1
>> + !$omp end target
>> + end subroutine
>> +end module
>
> Thanks,
>
> Tobias
>
[-- Attachment #2: 0001-openmp-fortran-Add-Fortran-support-for-indirect-clau.patch --]
[-- Type: text/plain, Size: 16440 bytes --]
From 83b734aa63aa63ea5bb438bb59ee09b00869e0fd Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung <kcyeung@baylibre.com>
Date: Mon, 5 Feb 2024 20:31:49 +0000
Subject: [PATCH] openmp, fortran: Add Fortran support for indirect clause on
the declare target directive
2024-02-05 Kwok Cheung Yeung <kcyeung@baylibre.com>
gcc/fortran/
* dump-parse-tree.cc (show_attr): Handle omp_declare_target_indirect
attribute.
* f95-lang.cc (gfc_gnu_attributes): Add entry for 'omp declare
target indirect'.
* gfortran.h (symbol_attribute): Add omp_declare_target_indirect
field.
(struct gfc_omp_clauses): Add indirect field.
* openmp.cc (omp_mask2): Add OMP_CLAUSE_INDIRECT.
(gfc_match_omp_clauses): Match indirect clause.
(OMP_DECLARE_TARGET_CLAUSES): Add OMP_CLAUSE_INDIRECT.
(gfc_match_omp_declare_target): Check omp_device_type and apply
omp_declare_target_indirect attribute to symbol if indirect clause
active. Show warning if there are only device_type and/or indirect
clauses on the directive.
* trans-decl.cc (add_attributes_to_decl): Add 'omp declare target
indirect' attribute if symbol has indirect attribute set.
gcc/testsuite/
* gfortran.dg/gomp/declare-target-4.f90 (f1): Update expected warning.
* gfortran.dg/gomp/declare-target-indirect-1.f90: New.
* gfortran.dg/gomp/declare-target-indirect-2.f90: New.
libgomp/
* testsuite/libgomp.fortran/declare-target-indirect-1.f90: New.
* testsuite/libgomp.fortran/declare-target-indirect-2.f90: New.
* testsuite/libgomp.fortran/declare-target-indirect-3.f90: New.
---
gcc/fortran/dump-parse-tree.cc | 2 +
gcc/fortran/f95-lang.cc | 2 +
gcc/fortran/gfortran.h | 3 +-
gcc/fortran/openmp.cc | 50 ++++++++++++++-
gcc/fortran/trans-decl.cc | 4 ++
.../gfortran.dg/gomp/declare-target-4.f90 | 2 +-
.../gomp/declare-target-indirect-1.f90 | 62 +++++++++++++++++++
.../gomp/declare-target-indirect-2.f90 | 25 ++++++++
.../declare-target-indirect-1.f90 | 39 ++++++++++++
.../declare-target-indirect-2.f90 | 53 ++++++++++++++++
.../declare-target-indirect-3.f90 | 35 +++++++++++
11 files changed, 272 insertions(+), 5 deletions(-)
create mode 100644 gcc/testsuite/gfortran.dg/gomp/declare-target-indirect-1.f90
create mode 100644 gcc/testsuite/gfortran.dg/gomp/declare-target-indirect-2.f90
create mode 100644 libgomp/testsuite/libgomp.fortran/declare-target-indirect-1.f90
create mode 100644 libgomp/testsuite/libgomp.fortran/declare-target-indirect-2.f90
create mode 100644 libgomp/testsuite/libgomp.fortran/declare-target-indirect-3.f90
diff --git a/gcc/fortran/dump-parse-tree.cc b/gcc/fortran/dump-parse-tree.cc
index 1563b810b98..7b154eb3ca7 100644
--- a/gcc/fortran/dump-parse-tree.cc
+++ b/gcc/fortran/dump-parse-tree.cc
@@ -914,6 +914,8 @@ show_attr (symbol_attribute *attr, const char * module)
fputs (" OMP-DECLARE-TARGET", dumpfile);
if (attr->omp_declare_target_link)
fputs (" OMP-DECLARE-TARGET-LINK", dumpfile);
+ if (attr->omp_declare_target_indirect)
+ fputs (" OMP-DECLARE-TARGET-INDIRECT", dumpfile);
if (attr->elemental)
fputs (" ELEMENTAL", dumpfile);
if (attr->pure)
diff --git a/gcc/fortran/f95-lang.cc b/gcc/fortran/f95-lang.cc
index 358cb17fce2..67fda27aa3e 100644
--- a/gcc/fortran/f95-lang.cc
+++ b/gcc/fortran/f95-lang.cc
@@ -96,6 +96,8 @@ static const attribute_spec gfc_gnu_attributes[] =
gfc_handle_omp_declare_target_attribute, NULL },
{ "omp declare target link", 0, 0, true, false, false, false,
gfc_handle_omp_declare_target_attribute, NULL },
+ { "omp declare target indirect", 0, 0, true, false, false, false,
+ gfc_handle_omp_declare_target_attribute, NULL },
{ "oacc function", 0, -1, true, false, false, false,
gfc_handle_omp_declare_target_attribute, NULL },
};
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index fd73e4ce431..fd843a3241d 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -999,6 +999,7 @@ typedef struct
/* Mentioned in OMP DECLARE TARGET. */
unsigned omp_declare_target:1;
unsigned omp_declare_target_link:1;
+ unsigned omp_declare_target_indirect:1;
ENUM_BITFIELD (gfc_omp_device_type) omp_device_type:2;
unsigned omp_allocate:1;
@@ -1584,7 +1585,7 @@ typedef struct gfc_omp_clauses
unsigned grainsize_strict:1, num_tasks_strict:1, compare:1, weak:1;
unsigned non_rectangular:1, order_concurrent:1;
unsigned contains_teams_construct:1, target_first_st_is_teams:1;
- unsigned contained_in_target_construct:1;
+ unsigned contained_in_target_construct:1, indirect:1;
ENUM_BITFIELD (gfc_omp_sched_kind) sched_kind:3;
ENUM_BITFIELD (gfc_omp_device_type) device_type:2;
ENUM_BITFIELD (gfc_omp_memorder) memorder:3;
diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 0af80d54fad..30aba4421ff 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -1096,6 +1096,7 @@ enum omp_mask2
OMP_CLAUSE_DOACROSS, /* OpenMP 5.2 */
OMP_CLAUSE_ASSUMPTIONS, /* OpenMP 5.1. */
OMP_CLAUSE_USES_ALLOCATORS, /* OpenMP 5.0 */
+ OMP_CLAUSE_INDIRECT, /* OpenMP 5.1 */
/* This must come last. */
OMP_MASK2_LAST
};
@@ -2798,6 +2799,32 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask mask,
needs_space = true;
continue;
}
+ if ((mask & OMP_CLAUSE_INDIRECT)
+ && (m = gfc_match_dupl_check (!c->indirect, "indirect"))
+ != MATCH_NO)
+ {
+ if (m == MATCH_ERROR)
+ goto error;
+ gfc_expr *indirect_expr = NULL;
+ m = gfc_match (" ( %e )", &indirect_expr);
+ if (m == MATCH_YES)
+ {
+ if (!gfc_resolve_expr (indirect_expr)
+ || indirect_expr->ts.type != BT_LOGICAL
+ || indirect_expr->expr_type != EXPR_CONSTANT)
+ {
+ gfc_error ("INDIRECT clause at %C requires a constant "
+ "logical expression");
+ gfc_free_expr (indirect_expr);
+ goto error;
+ }
+ c->indirect = indirect_expr->value.logical;
+ gfc_free_expr (indirect_expr);
+ }
+ else
+ c->indirect = 1;
+ continue;
+ }
if ((mask & OMP_CLAUSE_IS_DEVICE_PTR)
&& gfc_match_omp_variable_list
("is_device_ptr (",
@@ -4460,7 +4487,7 @@ cleanup:
(omp_mask (OMP_CLAUSE_THREADS) | OMP_CLAUSE_SIMD)
#define OMP_DECLARE_TARGET_CLAUSES \
(omp_mask (OMP_CLAUSE_ENTER) | OMP_CLAUSE_LINK | OMP_CLAUSE_DEVICE_TYPE \
- | OMP_CLAUSE_TO)
+ | OMP_CLAUSE_TO | OMP_CLAUSE_INDIRECT)
#define OMP_ATOMIC_CLAUSES \
(omp_mask (OMP_CLAUSE_ATOMIC) | OMP_CLAUSE_CAPTURE | OMP_CLAUSE_HINT \
| OMP_CLAUSE_MEMORDER | OMP_CLAUSE_COMPARE | OMP_CLAUSE_FAIL \
@@ -5513,6 +5540,15 @@ gfc_match_omp_declare_target (void)
n->sym->name, &n->where);
n->sym->attr.omp_device_type = c->device_type;
}
+ if (c->indirect)
+ {
+ if (n->sym->attr.omp_device_type != OMP_DEVICE_TYPE_UNSET
+ && n->sym->attr.omp_device_type != OMP_DEVICE_TYPE_ANY)
+ gfc_error_now ("DEVICE_TYPE must be ANY when used with "
+ "INDIRECT at %L", &n->where);
+ n->sym->attr.omp_declare_target_indirect = c->indirect;
+ }
+
n->sym->mark = 1;
}
else if (n->u.common->omp_declare_target
@@ -5558,15 +5594,23 @@ gfc_match_omp_declare_target (void)
" TARGET directive to a different DEVICE_TYPE",
s->name, &n->where);
s->attr.omp_device_type = c->device_type;
+
+ if (c->indirect
+ && s->attr.omp_device_type != OMP_DEVICE_TYPE_UNSET
+ && s->attr.omp_device_type != OMP_DEVICE_TYPE_ANY)
+ gfc_error_now ("DEVICE_TYPE must be ANY when used with "
+ "INDIRECT at %L", &n->where);
+ s->attr.omp_declare_target_indirect = c->indirect;
}
}
- if (c->device_type
+ if ((c->device_type || c->indirect)
&& !c->lists[OMP_LIST_ENTER]
&& !c->lists[OMP_LIST_TO]
&& !c->lists[OMP_LIST_LINK])
gfc_warning_now (OPT_Wopenmp,
"OMP DECLARE TARGET directive at %L with only "
- "DEVICE_TYPE clause is ignored", &old_loc);
+ "DEVICE_TYPE or INDIRECT clauses is ignored",
+ &old_loc);
gfc_buffer_error (true);
diff --git a/gcc/fortran/trans-decl.cc b/gcc/fortran/trans-decl.cc
index de162f6cc75..6d463036966 100644
--- a/gcc/fortran/trans-decl.cc
+++ b/gcc/fortran/trans-decl.cc
@@ -1526,6 +1526,10 @@ add_attributes_to_decl (symbol_attribute sym_attr, tree list)
list = tree_cons (get_identifier ("omp declare target"),
clauses, list);
+ if (sym_attr.omp_declare_target_indirect)
+ list = tree_cons (get_identifier ("omp declare target indirect"),
+ clauses, list);
+
return list;
}
diff --git a/gcc/testsuite/gfortran.dg/gomp/declare-target-4.f90 b/gcc/testsuite/gfortran.dg/gomp/declare-target-4.f90
index 4f5de4bd8c7..55534d8fe99 100644
--- a/gcc/testsuite/gfortran.dg/gomp/declare-target-4.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/declare-target-4.f90
@@ -2,7 +2,7 @@
! { dg-additional-options "-fdump-tree-original" }
subroutine f1
- !$omp declare target device_type (any) ! { dg-warning "OMP DECLARE TARGET directive at .1. with only DEVICE_TYPE clause is ignored" }
+ !$omp declare target device_type (any) ! { dg-warning "OMP DECLARE TARGET directive at .1. with only DEVICE_TYPE or INDIRECT clauses is ignored" }
end subroutine
subroutine f2
diff --git a/gcc/testsuite/gfortran.dg/gomp/declare-target-indirect-1.f90 b/gcc/testsuite/gfortran.dg/gomp/declare-target-indirect-1.f90
new file mode 100644
index 00000000000..504c1a29813
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/declare-target-indirect-1.f90
@@ -0,0 +1,62 @@
+! { dg-do compile }
+! { dg-options "-fopenmp" }
+
+module m
+ integer :: a
+ integer, parameter :: X = 1
+ integer, parameter :: Y = 2
+
+ ! Indirect on a variable should have no effect.
+ integer :: z
+ !$omp declare target to (z) indirect
+contains
+ subroutine sub1
+ !$omp declare target indirect to (sub1)
+ end subroutine
+
+ subroutine sub2
+ !$omp declare target enter (sub2) indirect (.true.)
+ end subroutine
+
+ subroutine sub3
+ !$omp declare target to (sub3) indirect (.false.)
+ end subroutine
+
+ subroutine sub4
+ !$omp declare target to (sub4) indirect (1) ! { dg-error "INDIRECT clause at .1. requires a constant logical expression" }
+ end subroutine
+
+ ! Compile-time non-constant expressions are not allowed.
+ subroutine sub5
+ !$omp declare target indirect (a > 0) to (sub5) ! { dg-error "INDIRECT clause at .1. requires a constant logical expression" }
+ end subroutine
+
+ ! Compile-time constant expressions are permissible.
+ subroutine sub6
+ !$omp declare target indirect (X .eq. Y) to (sub6)
+ end subroutine
+
+ subroutine sub7
+ !$omp declare target indirect ! { dg-warning "OMP DECLARE TARGET directive at .1. with only DEVICE_TYPE or INDIRECT clauses is ignored" }
+ end subroutine
+
+ subroutine sub8
+ !$omp declare target indirect (.true.) indirect (.false.) to (sub8) ! { dg-error "Duplicated .indirect. clause at .1." }
+ end subroutine
+
+ subroutine sub9
+ !$omp declare target to (sub9) indirect ("abs") ! { dg-error "INDIRECT clause at .1. requires a constant logical expression" }
+ end subroutine
+
+ subroutine sub10
+ !$omp declare target to (sub10) indirect (5.5) ! { dg-error "INDIRECT clause at .1. requires a constant logical expression" }
+ end subroutine
+
+ subroutine sub11
+ !$omp declare target indirect (.true.) device_type (host) enter (sub11) ! { dg-error "DEVICE_TYPE must be ANY when used with INDIRECT at .1." }
+ end subroutine
+
+ subroutine sub12
+ !$omp declare target indirect (.false.) device_type (nohost) enter (sub12)
+ end subroutine
+end module
diff --git a/gcc/testsuite/gfortran.dg/gomp/declare-target-indirect-2.f90 b/gcc/testsuite/gfortran.dg/gomp/declare-target-indirect-2.f90
new file mode 100644
index 00000000000..f6b3ae17856
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/declare-target-indirect-2.f90
@@ -0,0 +1,25 @@
+! { dg-do compile }
+! { dg-options "-fopenmp -fdump-tree-gimple" }
+
+module m
+contains
+ subroutine sub1
+ !$omp declare target indirect enter (sub1)
+ end subroutine
+ ! { dg-final { scan-tree-dump "__attribute__\\\(\\\(omp declare target, omp declare target indirect\\\)\\\)\\\n.*\\\nvoid sub1" "gimple" } }
+
+ subroutine sub2
+ !$omp declare target indirect (.false.) to (sub2)
+ end subroutine
+ ! { dg-final { scan-tree-dump "__attribute__\\\(\\\(omp declare target\\\)\\\)\\\n.*\\\nvoid sub2" "gimple" } }
+
+ subroutine sub3
+ !$omp declare target indirect (.true.) to (sub3)
+ end subroutine
+ ! { dg-final { scan-tree-dump "__attribute__\\\(\\\(omp declare target, omp declare target indirect\\\)\\\)\\\n.*\\\nvoid sub3" "gimple" } }
+
+ subroutine sub4
+ !$omp declare target indirect (.false.) enter (sub4)
+ end subroutine
+ ! { dg-final { scan-tree-dump "__attribute__\\\(\\\(omp declare target\\\)\\\)\\\n.*\\\nvoid sub4" "gimple" } }
+end module
diff --git a/libgomp/testsuite/libgomp.fortran/declare-target-indirect-1.f90 b/libgomp/testsuite/libgomp.fortran/declare-target-indirect-1.f90
new file mode 100644
index 00000000000..39a91dfcdca
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/declare-target-indirect-1.f90
@@ -0,0 +1,39 @@
+! { dg-do run }
+
+module m
+contains
+ integer function foo ()
+ !$omp declare target to (foo) indirect
+ foo = 5
+ end function
+
+ integer function bar ()
+ !$omp declare target to (bar) indirect
+ bar = 8
+ end function
+
+ integer function baz ()
+ !$omp declare target to (baz) indirect
+ baz = 11
+ end function
+end module
+
+program main
+ use m
+ implicit none
+
+ integer :: x, expected
+ procedure (foo), pointer :: foo_ptr, bar_ptr, baz_ptr
+
+ foo_ptr => foo
+ bar_ptr => bar
+ baz_ptr => baz
+
+ expected = foo () + bar () + baz ()
+
+ !$omp target map (to: foo_ptr, bar_ptr, baz_ptr) map (from: x)
+ x = foo_ptr () + bar_ptr () + baz_ptr ()
+ !$omp end target
+
+ stop x - expected
+end program
diff --git a/libgomp/testsuite/libgomp.fortran/declare-target-indirect-2.f90 b/libgomp/testsuite/libgomp.fortran/declare-target-indirect-2.f90
new file mode 100644
index 00000000000..d3baa81dd07
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/declare-target-indirect-2.f90
@@ -0,0 +1,53 @@
+! { dg-do run }
+
+module m
+contains
+ integer function foo ()
+ !$omp declare target to (foo) indirect
+ foo = 5
+ end function
+
+ integer function bar ()
+ !$omp declare target to (bar) indirect
+ bar = 8
+ end function
+
+ integer function baz ()
+ !$omp declare target to (baz) indirect
+ baz = 11
+ end function
+end module
+
+program main
+ use m
+ implicit none
+
+ type fp
+ procedure (foo), pointer, nopass :: f => null ()
+ end type
+
+ integer, parameter :: N = 256
+ integer :: i, x = 0, expected = 0;
+ type (fp) :: fn_ptr (N)
+
+ do i = 1, N
+ select case (mod (i, 3))
+ case (0)
+ fn_ptr (i)%f => foo
+ case (1)
+ fn_ptr (i)%f => bar
+ case (2)
+ fn_ptr (i)%f => baz
+ end select
+ expected = expected + fn_ptr (i)%f ()
+ end do
+
+ !$omp target teams distribute parallel do &
+ !$omp & reduction(+: x) map (to: fn_ptr) map (tofrom: x)
+ do i = 1, N
+ x = x + fn_ptr (i)%f ()
+ end do
+ !$omp end target teams distribute parallel do
+
+ stop x - expected
+end program
diff --git a/libgomp/testsuite/libgomp.fortran/declare-target-indirect-3.f90 b/libgomp/testsuite/libgomp.fortran/declare-target-indirect-3.f90
new file mode 100644
index 00000000000..00f33bd1170
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/declare-target-indirect-3.f90
@@ -0,0 +1,35 @@
+! { dg-do run }
+
+! Check that indirect calls work on procedures passed in via a dummy argument
+
+module m
+ integer, parameter :: offset = 123
+contains
+ function bar(x)
+ !$omp declare target enter (bar) indirect
+ integer :: bar
+ integer, intent(in) :: x
+ bar = x + offset
+ end function
+
+ function foo(f, x)
+ integer :: foo
+ procedure(bar) :: f
+ integer, intent(in) :: x
+
+ !$omp target map (to: x) map (from: foo)
+ foo = f(x)
+ !$omp end target
+ end function
+end module
+
+program main
+ use m
+ implicit none
+
+ integer :: a = 321
+ integer :: b
+
+ b = foo(bar, a)
+ stop b - (a + offset)
+end program
--
2.34.1
* Re: [PATCH v2] openmp, fortran: Add Fortran support for indirect clause on the declare target directive
2024-02-06 9:50 ` Kwok Cheung Yeung
@ 2024-02-12 8:51 ` Tobias Burnus
2024-02-15 21:37 ` [COMMITTED] libgomp: Update documentation for indirect calls in target regions Kwok Cheung Yeung
0 siblings, 1 reply; 28+ messages in thread
From: Tobias Burnus @ 2024-02-12 8:51 UTC (permalink / raw)
To: Kwok Cheung Yeung; +Cc: Jakub Jelinek, gcc-patches, fortran
Hi Kwok,
Kwok Cheung Yeung wrote:
> Oops. I thought exactly the same thing yesterday, but forgot to add
> the changes to my commit! Here is the updated version.
I regarded this change as obvious, hence I neglected to reply.
But for completeness: LGTM.
I think it would be useful to commit this now with an xfail
for the one failing testcase that depends on the review-pending libgomp
patch.
I mean something like:
--- a/libgomp/testsuite/libgomp.fortran/declare-target-indirect-2.f90
+++ b/libgomp/testsuite/libgomp.fortran/declare-target-indirect-2.f90
@@ -1,2 +1,3 @@
! { dg-do run }
+! { dg-xfail-run-if "Requires libgomp bug fix pending review" { offload_device } }
Thanks,
Tobias
> On 06/02/2024 9:03 am, Tobias Burnus wrote:
>> LGTM. I just wonder whether there should be a value test and not just
>> a does-not-crash-when-called test for the latter testcase, i.e.
>>
>>
>>> +++ b/libgomp/testsuite/libgomp.fortran/declare-target-indirect-3.f90
>>> @@ -0,0 +1,25 @@
>>> +! { dg-do run }
>>> +
>>> +! Check that indirect calls work on procedures passed in via a
>>> dummy argument
>>> +
>>> +module m
>>> +contains
>>> + subroutine bar
>>> + !$omp declare target enter(bar) indirect
>> e.g. "integer function bar()" ... " bar = 42"
>>> + end subroutine
>>> +
>>> + subroutine foo(f)
>>> + procedure(bar) :: f
>>> +
>>> + !$omp target
>>> + call f
>> And then: if (f() /= 42) stop 1
>>> + !$omp end target
>>> + end subroutine
>>> +end module
>>
>> Thanks,
>>
>> Tobias
>>
* [COMMITTED] libgomp: Update documentation for indirect calls in target regions
2024-02-12 8:51 ` Tobias Burnus
@ 2024-02-15 21:37 ` Kwok Cheung Yeung
0 siblings, 0 replies; 28+ messages in thread
From: Kwok Cheung Yeung @ 2024-02-15 21:37 UTC (permalink / raw)
To: Tobias Burnus; +Cc: Jakub Jelinek, gcc-patches
[-- Attachment #1: Type: text/plain, Size: 169 bytes --]
Hi,
I have committed this patch to the libgomp documentation to reflect that
indirect calls in offloaded target regions are now supported in C, C++
and Fortran.
Kwok
[-- Attachment #2: 0001-libgomp-Update-documentation-for-indirect-calls-in-t.patch --]
[-- Type: text/plain, Size: 1944 bytes --]
From b3b3bd250f0a7c22b7d46d3522c8b94c6a35d22a Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung <kcyeung@baylibre.com>
Date: Thu, 15 Feb 2024 21:22:26 +0000
Subject: [PATCH] libgomp: Update documentation for indirect calls in target
regions
Support for indirect calls to procedures/functions in offloaded target
regions is now available for C, C++ and Fortran.
2024-02-15 Kwok Cheung Yeung <kcyeung@baylibre.com>
libgomp/
* libgomp.texi (OpenMP 5.1): Mark indirect call support as fully
implemented.
---
libgomp/libgomp.texi | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 6ee923099b7..f57190f203c 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -313,7 +313,7 @@ The OpenMP 4.5 specification is fully supported.
@item Iterators in @code{target update} motion clauses and @code{map}
clauses @tab N @tab
@item Indirect calls to the device version of a procedure or function in
- @code{target} regions @tab P @tab Only C and C++
+ @code{target} regions @tab Y @tab
@item @code{interop} directive @tab N @tab
@item @code{omp_interop_t} object support in runtime routines @tab N @tab
@item @code{nowait} clause in @code{taskwait} directive @tab Y @tab
@@ -362,7 +362,7 @@ to address of matching mapped list item per 5.1, Sect. 2.21.7.2 @tab N @tab
@item For Fortran, diagnose placing declarative before/between @code{USE},
@code{IMPORT}, and @code{IMPLICIT} as invalid @tab N @tab
@item Optional comma between directive and clause in the @code{#pragma} form @tab Y @tab
-@item @code{indirect} clause in @code{declare target} @tab P @tab Only C and C++
+@item @code{indirect} clause in @code{declare target} @tab Y @tab
@item @code{device_type(nohost)}/@code{device_type(host)} for variables @tab N @tab
@item @code{present} modifier to the @code{map}, @code{to} and @code{from}
clauses @tab Y @tab
--
2.34.1
* Re: [PATCH v2] openmp: Change to using a hashtab to lookup offload target addresses for indirect function calls
2024-01-29 17:48 ` [PATCH v2] " Kwok Cheung Yeung
@ 2024-03-08 13:40 ` Thomas Schwinge
2024-03-14 11:38 ` Tobias Burnus
1 sibling, 0 replies; 28+ messages in thread
From: Thomas Schwinge @ 2024-03-08 13:40 UTC (permalink / raw)
To: Kwok Cheung Yeung, Jakub Jelinek, Tobias Burnus; +Cc: gcc-patches
Hi!
On 2024-01-29T17:48:47+0000, Kwok Cheung Yeung <kcyeung@baylibre.com> wrote:
> A splay-tree was previously used to lookup equivalent target addresses
> for a given host address on offload targets. However, as splay-trees can
> modify their structure on lookup, they are not suitable for concurrent
> access from separate teams/threads without some form of locking.
Heh. ,-)
> This
> patch changes the lookup data structure to a hashtab instead, which does
> not have these issues.
(I've not looked into which data structure is most suitable here; not my
area of expertise.)
> The call to build_indirect_map to initialize the data structure is now
> called from just the first thread of the first team to avoid redundant
> calls to this function.
ACK, and you've also removed a number of 'volatile's, as I had questioned
earlier. What remains open is the question of when to do the initialization,
how to react to dynamic device image load and unload, and possibly other
(but not many?) questions raised during review.
I cannot formally approve this patch, but it seems a good incremental
step forward to me: per my testing so far,
(a) 'libgomp.c-c++-common/declare-target-indirect-2.c' is all-PASS,
with 'warning: this statement may fall through' resolved, and
(b) for 'libgomp.fortran/declare-target-indirect-2.f90': no more timeouts
(applies to nvptx only), and all-PASS execution test (both GCN, nvptx):
PASS: libgomp.fortran/declare-target-indirect-2.f90 -O0 (test for excess errors)
[-WARNING: libgomp.fortran/declare-target-indirect-2.f90 -O0 execution test program timed out.-]
[-XFAIL:-]{+PASS:+} libgomp.fortran/declare-target-indirect-2.f90 -O0 execution test
PASS: libgomp.fortran/declare-target-indirect-2.f90 -O1 (test for excess errors)
[-WARNING: libgomp.fortran/declare-target-indirect-2.f90 -O1 execution test program timed out.-]
[-XFAIL:-]{+PASS:+} libgomp.fortran/declare-target-indirect-2.f90 -O1 execution test
PASS: libgomp.fortran/declare-target-indirect-2.f90 -O2 (test for excess errors)
[-WARNING: libgomp.fortran/declare-target-indirect-2.f90 -O2 execution test program timed out.-]
[-XFAIL:-]{+PASS:+} libgomp.fortran/declare-target-indirect-2.f90 -O2 execution test
PASS: libgomp.fortran/declare-target-indirect-2.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors)
[-WARNING: libgomp.fortran/declare-target-indirect-2.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test program timed out.-]
[-XFAIL:-]{+PASS:+} libgomp.fortran/declare-target-indirect-2.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test
PASS: libgomp.fortran/declare-target-indirect-2.f90 -O3 -g (test for excess errors)
[-WARNING: libgomp.fortran/declare-target-indirect-2.f90 -O3 -g execution test program timed out.-]
[-XFAIL:-]{+PASS:+} libgomp.fortran/declare-target-indirect-2.f90 -O3 -g execution test
PASS: libgomp.fortran/declare-target-indirect-2.f90 -Os (test for excess errors)
[-WARNING: libgomp.fortran/declare-target-indirect-2.f90 -Os execution test program timed out.-]
[-XFAIL:-]{+PASS:+} libgomp.fortran/declare-target-indirect-2.f90 -Os execution test
(Of course, the patch now needs un-XFAILing of
'libgomp.fortran/declare-target-indirect-2.f90' merged in.)
Grüße
Thomas
> libgomp/
> * config/accel/target-indirect.c: Include string.h and hashtab.h.
> Remove include of splay-tree.h. Update comments.
> (splay_tree_prefix, splay_tree_c): Delete.
> (struct indirect_map_t): New.
> (hash_entry_type, htab_alloc, htab_free, htab_hash, htab_eq): New.
> (GOMP_INDIRECT_ADDR_MAP): Remove volatile qualifier.
> (USE_SPLAY_TREE_LOOKUP): Rename to...
> (USE_HASHTAB_LOOKUP): ..this.
> (indirect_map, indirect_array): Delete.
> (indirect_htab): New.
> (build_indirect_map): Remove locking. Build indirect map using
> hashtab.
> (GOMP_target_map_indirect_ptr): Use indirect_htab to lookup target
> address.
> (GOMP_target_map_indirect_ptr): Remove volatile qualifier.
> * config/gcn/team.c (gomp_gcn_enter_kernel): Call build_indirect_map
> from first thread of first team only.
> * config/nvptx/team.c (gomp_nvptx_main): Likewise.
> * testsuite/libgomp.c-c++-common/declare-target-indirect-2.c (main):
> Add missing break statements.
> ---
> libgomp/config/accel/target-indirect.c | 83 ++++++++++---------
> libgomp/config/gcn/team.c | 7 +-
> libgomp/config/nvptx/team.c | 9 +-
> .../declare-target-indirect-2.c | 14 ++--
> 4 files changed, 63 insertions(+), 50 deletions(-)
>
> diff --git a/libgomp/config/accel/target-indirect.c b/libgomp/config/accel/target-indirect.c
> index c60fd547cb6..cfef1ddbc49 100644
> --- a/libgomp/config/accel/target-indirect.c
> +++ b/libgomp/config/accel/target-indirect.c
> @@ -25,60 +25,73 @@
> <http://www.gnu.org/licenses/>. */
>
> #include <assert.h>
> +#include <string.h>
> #include "libgomp.h"
>
> -#define splay_tree_prefix indirect
> -#define splay_tree_c
> -#include "splay-tree.h"
> +struct indirect_map_t
> +{
> + void *host_addr;
> + void *target_addr;
> +};
> +
> +typedef struct indirect_map_t *hash_entry_type;
> +
> +static inline void * htab_alloc (size_t size) { return gomp_malloc (size); }
> +static inline void htab_free (void *ptr) { free (ptr); }
> +
> +#include "hashtab.h"
> +
> +static inline hashval_t
> +htab_hash (hash_entry_type element)
> +{
> + return hash_pointer (element->host_addr);
> +}
>
> -volatile void **GOMP_INDIRECT_ADDR_MAP = NULL;
> +static inline bool
> +htab_eq (hash_entry_type x, hash_entry_type y)
> +{
> + return x->host_addr == y->host_addr;
> +}
>
> -/* Use a splay tree to lookup the target address instead of using a
> - linear search. */
> -#define USE_SPLAY_TREE_LOOKUP
> +void **GOMP_INDIRECT_ADDR_MAP = NULL;
>
> -#ifdef USE_SPLAY_TREE_LOOKUP
> +/* Use a hashtab to lookup the target address instead of using a linear
> + search. */
> +#define USE_HASHTAB_LOOKUP
>
> -static struct indirect_splay_tree_s indirect_map;
> -static indirect_splay_tree_node indirect_array = NULL;
> +#ifdef USE_HASHTAB_LOOKUP
>
> -/* Build the splay tree used for host->target address lookups. */
> +static htab_t indirect_htab = NULL;
> +
> +/* Build the hashtab used for host->target address lookups. */
>
> void
> build_indirect_map (void)
> {
> size_t num_ind_funcs = 0;
> - volatile void **map_entry;
> - static int lock = 0; /* == gomp_mutex_t lock; gomp_mutex_init (&lock); */
> + void **map_entry;
>
> if (!GOMP_INDIRECT_ADDR_MAP)
> return;
>
> - gomp_mutex_lock (&lock);
> -
> - if (!indirect_array)
> + if (!indirect_htab)
> {
> /* Count the number of entries in the NULL-terminated address map. */
> for (map_entry = GOMP_INDIRECT_ADDR_MAP; *map_entry;
> map_entry += 2, num_ind_funcs++);
>
> - /* Build splay tree for address lookup. */
> - indirect_array = gomp_malloc (num_ind_funcs * sizeof (*indirect_array));
> - indirect_splay_tree_node array = indirect_array;
> + /* Build hashtab for address lookup. */
> + indirect_htab = htab_create (num_ind_funcs);
> map_entry = GOMP_INDIRECT_ADDR_MAP;
>
> - for (int i = 0; i < num_ind_funcs; i++, array++)
> + for (int i = 0; i < num_ind_funcs; i++, map_entry += 2)
> {
> - indirect_splay_tree_key k = &array->key;
> - k->host_addr = (uint64_t) *map_entry++;
> - k->target_addr = (uint64_t) *map_entry++;
> - array->left = NULL;
> - array->right = NULL;
> - indirect_splay_tree_insert (&indirect_map, array);
> + struct indirect_map_t element = { *map_entry, NULL };
> + hash_entry_type *slot = htab_find_slot (&indirect_htab, &element,
> + INSERT);
> + *slot = (hash_entry_type) map_entry;
> }
> }
> -
> - gomp_mutex_unlock (&lock);
> }
>
> void *
> @@ -88,15 +101,11 @@ GOMP_target_map_indirect_ptr (void *ptr)
> if (!ptr)
> return ptr;
>
> - assert (indirect_array);
> -
> - struct indirect_splay_tree_key_s k;
> - indirect_splay_tree_key node = NULL;
> -
> - k.host_addr = (uint64_t) ptr;
> - node = indirect_splay_tree_lookup (&indirect_map, &k);
> + assert (indirect_htab);
>
> - return node ? (void *) node->target_addr : ptr;
> + struct indirect_map_t element = { ptr, NULL };
> + hash_entry_type entry = htab_find (indirect_htab, &element);
> + return entry ? entry->target_addr : ptr;
> }
>
> #else
> @@ -115,7 +124,7 @@ GOMP_target_map_indirect_ptr (void *ptr)
>
> assert (GOMP_INDIRECT_ADDR_MAP);
>
> - for (volatile void **map_entry = GOMP_INDIRECT_ADDR_MAP; *map_entry;
> + for (void **map_entry = GOMP_INDIRECT_ADDR_MAP; *map_entry;
> map_entry += 2)
> if (*map_entry == ptr)
> return (void *) *(map_entry + 1);
> diff --git a/libgomp/config/gcn/team.c b/libgomp/config/gcn/team.c
> index 61e9c616b67..bd3df448b52 100644
> --- a/libgomp/config/gcn/team.c
> +++ b/libgomp/config/gcn/team.c
> @@ -52,14 +52,15 @@ gomp_gcn_enter_kernel (void)
> {
> int threadid = __builtin_gcn_dim_pos (1);
>
> - /* Initialize indirect function support. */
> - build_indirect_map ();
> -
> if (threadid == 0)
> {
> int numthreads = __builtin_gcn_dim_size (1);
> int teamid = __builtin_gcn_dim_pos(0);
>
> + /* Initialize indirect function support. */
> + if (teamid == 0)
> + build_indirect_map ();
> +
> /* Set up the global state.
> Every team will do this, but that should be harmless. */
> gomp_global_icv.nthreads_var = 16;
> diff --git a/libgomp/config/nvptx/team.c b/libgomp/config/nvptx/team.c
> index 0cf5dad39ca..d5361917a24 100644
> --- a/libgomp/config/nvptx/team.c
> +++ b/libgomp/config/nvptx/team.c
> @@ -60,9 +60,6 @@ gomp_nvptx_main (void (*fn) (void *), void *fn_data)
> asm ("mov.u32 %0, %%tid.y;" : "=r" (tid));
> asm ("mov.u32 %0, %%ntid.y;" : "=r" (ntids));
>
> - /* Initialize indirect function support. */
> - build_indirect_map ();
> -
> if (tid == 0)
> {
> gomp_global_icv.nthreads_var = ntids;
> @@ -74,6 +71,12 @@ gomp_nvptx_main (void (*fn) (void *), void *fn_data)
> nvptx_thrs = alloca (ntids * sizeof (*nvptx_thrs));
> memset (nvptx_thrs, 0, ntids * sizeof (*nvptx_thrs));
>
> + /* Initialize indirect function support. */
> + unsigned int block_id;
> + asm ("mov.u32 %0, %%ctaid.x;" : "=r" (block_id));
> + if (block_id == 0)
> + build_indirect_map ();
> +
> /* Find the low-latency heap details .... */
> uint32_t *shared_pool;
> uint32_t shared_pool_size = 0;
> diff --git a/libgomp/testsuite/libgomp.c-c++-common/declare-target-indirect-2.c b/libgomp/testsuite/libgomp.c-c++-common/declare-target-indirect-2.c
> index 9fe190efce8..545f1a9fcbf 100644
> --- a/libgomp/testsuite/libgomp.c-c++-common/declare-target-indirect-2.c
> +++ b/libgomp/testsuite/libgomp.c-c++-common/declare-target-indirect-2.c
> @@ -17,17 +17,17 @@ int main (void)
> {
> switch (i % 3)
> {
> - case 0: fn_ptr[i] = &foo;
> - case 1: fn_ptr[i] = &bar;
> - case 2: fn_ptr[i] = &baz;
> + case 0: fn_ptr[i] = &foo; break;
> + case 1: fn_ptr[i] = &bar; break;
> + case 2: fn_ptr[i] = &baz; break;
> }
> expected += (*fn_ptr[i]) ();
> }
>
> -#pragma omp target teams distribute parallel for reduction(+: x) \
> - map (to: fn_ptr) map (tofrom: x)
> - for (int i = 0; i < N; i++)
> - x += (*fn_ptr[i]) ();
> + #pragma omp target teams distribute parallel for \
> + reduction (+: x) map (to: fn_ptr) map (tofrom: x)
> + for (int i = 0; i < N; i++)
> + x += (*fn_ptr[i]) ();
>
> return x - expected;
> }
* Re: [PATCH v2] openmp: Change to using a hashtab to lookup offload target addresses for indirect function calls
2024-01-29 17:48 ` [PATCH v2] " Kwok Cheung Yeung
2024-03-08 13:40 ` Thomas Schwinge
@ 2024-03-14 11:38 ` Tobias Burnus
1 sibling, 0 replies; 28+ messages in thread
From: Tobias Burnus @ 2024-03-14 11:38 UTC (permalink / raw)
To: Kwok Cheung Yeung, gcc-patches; +Cc: Jakub Jelinek, Thomas Schwinge
Hi Kwok,
On January 22, 2024, Kwok Cheung Yeung wrote:
> There was a bug in the declare-target-indirect-2.c libgomp testcase
> (testing indirect calls in offloaded target regions, spread over
> multiple teams/threads) that due to an errant fallthrough in a switch
> statement resulted in only one indirect function ever getting called:
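For illustration only (this is not code from the patch or the testcase): a minimal C sketch of the fallthrough described above. The function names mirror the testcase; the return values 5, 8 and 11 are borrowed from the Fortran tests in this thread and are an assumption for the C version, as are the helper names select_fallthrough and select_fixed.

```c
#include <assert.h>

static int foo (void) { return 5; }
static int bar (void) { return 8; }
static int baz (void) { return 11; }

typedef int (*fn_t) (void);

/* Buggy selection: without 'break', every case runs on into the last
   assignment, so the result is always &baz.  */
static fn_t
select_fallthrough (int i)
{
  fn_t fn = 0;
  switch (i % 3)
    {
    case 0: fn = &foo; /* falls through */
    case 1: fn = &bar; /* falls through */
    case 2: fn = &baz;
    }
  return fn;
}

/* Fixed selection: each case stops after its own assignment.  */
static fn_t
select_fixed (int i)
{
  fn_t fn = 0;
  switch (i % 3)
    {
    case 0: fn = &foo; break;
    case 1: fn = &bar; break;
    case 2: fn = &baz; break;
    }
  return fn;
}
```

With the buggy variant only one function is ever called, which would explain why concurrent lookups of different addresses were not exercised before the fix.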
(When applying, the 'dg-xfail-run-if' also needs to be removed from
libgomp.fortran/declare-target-indirect-2.f90) ...
> However, when the missing break statements are added, the testcase
> fails with an invalid memory access. Upon investigation, this is due
> to the use of a splay-tree as the lookup structure for indirect
> addresses, as the splay-tree moves frequently accessed elements closer
> to the root node and so needs locking when used from multiple threads.
> However, this would end up partially serialising all the threads and
> kill performance. I have switched the lookup structure from a splay
> tree to a hashtab instead to avoid locking during lookup.
>
> I have also tidied up the initialisation of the lookup table by
> calling it only from the first thread of the first team, instead of
> redundantly calling it from every thread and only having the first one
> reached do the initialisation. This removes the need for locking
> during initialisation.
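As a rough host-side illustration of the lookup being discussed (not the libgomp code itself): the address map is a flat, NULL-terminated array of host/target address pairs, and a lookup that only reads it needs no locking. The function name map_indirect_ptr and the addr_map parameter are hypothetical stand-ins for GOMP_target_map_indirect_ptr and GOMP_INDIRECT_ADDR_MAP; returning the pointer unchanged for a missing map is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the device-side lookup.  ADDR_MAP is laid out
   as { host0, target0, host1, target1, ..., NULL }.  The scan only reads
   the map, so concurrent callers cannot disturb one another.  */
static void *
map_indirect_ptr (void **addr_map, void *ptr)
{
  /* Assumption for this sketch: no map or a NULL pointer means there is
     nothing to translate.  */
  if (!ptr || !addr_map)
    return ptr;

  for (void **entry = addr_map; *entry; entry += 2)
    if (entry[0] == ptr)
      return entry[1];

  /* Not a known indirect function: hand back the original address.  */
  return ptr;
}
```

A self-adjusting structure such as a splay tree differs precisely in that its lookup writes to the structure, which is what makes unlocked concurrent use unsafe.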
LGTM, except for the following, which we need to solve
(as suggested or differently (locking, or ...)) or
declare a non-issue (e.g. because of a thinko of mine).
Thoughts about the following?
* * *
Namely, I wonder whether there will be an issue for
#pragma omp target nowait
...
#pragma omp target
...
Once the kernel is started, the gcn_expand_prologue creates some setup code
and then a call to gomp_gcn_enter_kernel. Likewise for
gcc/config/nvptx/nvptx.cc, where nvptx_declare_function_name adds via
write_omp_entry a call to gomp_nvptx_main. And one of the first tasks there
is 'build_indirect_map'.
Assume a very simple kernel for the second item (i.e. it is quickly started)
and a very large number of reverse kernels.
Now, I wonder whether it is possible to have a race between the two kernels;
it seems as if that might happen but is extremely unlikely accounting for all
the overhead of launching and the rather small list of reverse offload items.
As it is unlikely, I wonder whether the following lock-free, opportunistic
approach would be the best solution. Namely, assume that no other kernel updates
the hash but, if that happens by chance, use the one that was created first.
(If we are lucky, the atomic overhead is fully cancelled by using a local
variable in the function but neither should matter much.)
if (!indirect_htab) // or: __atomic_load_n (&indirect_htab, __ATOMIC_RELAXED) ?
{
htab_t local_indirect_htab = htab_create (num_ind_funcs);
...
htab_t expected = NULL;
__atomic_compare_exchange_n (&indirect_htab, &expected,
local_indirect_htab, false, ...);
if (expected) // Other kernel was faster, drop our version
htab_free (local_indirect_htab);
}
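For illustration only, a self-contained host-side sketch of the same "build privately, publish with compare-and-swap, first one wins" idea. Everything here is an assumption rather than libgomp code: struct table stands in for htab_t, make_table and try_publish are hypothetical helpers, and the memory orders (acquire-release on success, acquire on failure) are just one plausible choice for the orders elided above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the hash table built per kernel.  */
struct table
{
  int owner;
};

/* Shared pointer; NULL until some builder publishes its table.  */
static struct table *shared_table = NULL;

/* Build a private table that no other thread can see yet.  */
static struct table *
make_table (int owner)
{
  struct table *t = malloc (sizeof *t);
  if (!t)
    abort ();
  t->owner = owner;
  return t;
}

/* Try to publish LOCAL.  Returns the table that ends up shared: LOCAL if
   we were first, otherwise the earlier winner (LOCAL is then freed).  */
static struct table *
try_publish (struct table *local)
{
  struct table *expected = NULL;
  if (__atomic_compare_exchange_n (&shared_table, &expected, local,
                                   0 /* strong */,
                                   __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE))
    return local;

  /* Somebody else was faster; 'expected' now holds their table.  */
  free (local);
  return expected;
}
```

The cost of this approach is a possibly wasted build by the loser; in exchange, neither the build nor later lookups need a lock.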
On January 29, 2024, Kwok Cheung Yeung wrote:
>> Can you please also update the comments to talk about hashtab instead
>> of splay?
> This version has the comments updated and removes a stray 'volatile'
> in the #ifdefed out code.
Thanks,
Tobias
* Re: [PATCH] openmp: Add support for the 'indirect' clause in C/C++
2023-11-13 11:47 ` Tobias Burnus
@ 2024-04-11 10:10 ` Thomas Schwinge
0 siblings, 0 replies; 28+ messages in thread
From: Thomas Schwinge @ 2024-04-11 10:10 UTC (permalink / raw)
To: Tobias Burnus, Kwok Cheung Yeung; +Cc: gcc-patches, Jakub Jelinek
Hi!
I've filed <https://gcc.gnu.org/PR114690>
"OpenMP 'indirect' clause: dynamic image loading/unloading" for the
following issue:
On 2023-11-13T12:47:04+0100, Tobias Burnus <tobias@codesourcery.com> wrote:
> On 13.11.23 11:59, Thomas Schwinge wrote:
>>>> Also, for my understanding: why is 'build_indirect_map' done at kernel
>>>> invocation time (here) instead of at image load time?
>>> The splay_tree is generated on the device itself - and we currently do
>>> not start a kernel during GOMP_OFFLOAD_load_image. We could, the
>>> question is whether it makes sense. (Generating the splay_tree on the
>>> host for the device is a hassle and error prone as it needs to use
>>> device pointers at the end.)
>> Hmm. It seems conceptually cleaner to me to set this up upfront, and
>> avoids potentially slowing down every device kernel invocation (at least
>> another function call, and 'gomp_mutex_lock' check). Though, I agree
>> this may be "in the noise" with regards to all the other stuff going on
>> in 'gomp_gcn_enter_kernel' and elsewhere...
>
> I think the most common case is GOMP_INDIRECT_ADDR_MAP == NULL.
>
> The question is whether the lock should/could be moved inside if (!indirect_array)
> or not. Probably yes:
> * do an atomic load for the outer '!indirect_array' check, work on a local array
>   for the build-up and only assign it at the end - and just after taking the
>   lock, check again whether '!indirect_array'.
>
> That way, it is lock-free once built, and there is no race while it is being built.
>
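The double-checked scheme described in the quoted paragraph could look roughly like the following host-side sketch. It uses a pthread mutex in place of 'gomp_mutex_lock', and shared_array, build_lock and get_array are hypothetical names chosen for illustration; the device code in libgomp is organized differently.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the libgomp data structures.  */
static int *shared_array;
static pthread_mutex_t build_lock = PTHREAD_MUTEX_INITIALIZER;

/* Return the shared array, building it with N entries on first use.  */
int *
get_array (size_t n)
{
  /* Fast path: no lock is taken once the array has been published.  */
  int *arr = __atomic_load_n (&shared_array, __ATOMIC_ACQUIRE);
  if (arr)
    return arr;

  pthread_mutex_lock (&build_lock);
  /* Check again: another thread may have built it while we waited.  */
  arr = __atomic_load_n (&shared_array, __ATOMIC_RELAXED);
  if (!arr)
    {
      /* Build on a local pointer and publish only the finished array.  */
      int *local = calloc (n, sizeof *local);
      assert (local != NULL);
      for (size_t i = 0; i < n; i++)
        local[i] = (int) i;
      __atomic_store_n (&shared_array, local, __ATOMIC_RELEASE);
      arr = local;
    }
  pthread_mutex_unlock (&build_lock);
  return arr;
}
```

The release store paired with the acquire load on the fast path is what lets readers skip the lock safely after the first build.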
>> What I just realized: what's also unclear to me is how the current
>> implementation works with regards to several images getting loaded --
>> don't we then overwrite 'GOMP_INDIRECT_ADDR_MAP' instead of
>> (conceptually) appending to it?
>
> Yes, I think that will happen - but it looks as if the same issue also
> exists in the other code? I think that's not the first variable that has
> that issue?
>
> I think we should try to clean up that handling, also to support calling
> a device function in a shared library from a target region in the main
> program, which currently also fails.
>
> All device routines that are in normal static libraries and in the
> object files of the main program should simply work thanks to offload
> LTO such that there is only a single GOMP_offload_register_ver call (per
> device type) and GOMP_OFFLOAD_load_image call (per device).
>
> Likewise if the offloading is only done via a single shared library.
> Any mixing will currently fail, unfortunately. This patch just adds
> another item which does not handle it properly.
>
> (Not good but IMHO also not a showstopper for this patch.)
>
>> In the general case, additional images may also get loaded during
>> execution. We thus need proper locking of the shared data structure, uh?
>> Or, can we have separate on-device data structures per image? (I've not
>> yet thought about that in detail.)
>
> I think we could - but the main-program 'omp target' case that calls
> a shared-library 'declare target' function means that we need to handle
> multiple GOMP_offload_register_ver / load_image calls such that they can
> work together.
>
> Obviously, it gets harder if the user keeps doing dlopen() / dlclose()
> of libraries containing offload code where a target/compute region is
> run before, between, and after those calls (but hopefully not running
> when calling dlopen/dlclose).
>
>> Relatedly then, when images are unloaded, we also need to remove stale
>> items from the table, and release resources (for example, the
>> 'GOMP_OFFLOAD_alloc' for 'map_target_addr').
>
> True. I think the general assumption is that images only get unloaded at
> the very end, which matches most but not all code. Yet another work item.
>
> I think we should open a new PR about this topic and collect work items
> there.
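To make the load/unload concern concrete, here is a small sketch of one possible shape: a separate address map per image, added on load and removed on unload, with lookups falling back to the original address. Every name in it (addr_pair, image_map, register_image, unregister_image, map_indirect_ptr) is hypothetical, it runs on the host only, and it omits the locking that concurrent loading would need; it is not a proposal for how libgomp should do this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* All names below are hypothetical; this is not libgomp code.  */
struct addr_pair { const void *host; const void *target; };

struct image_map
{
  const void *image_id;           /* Identifies the loaded image.  */
  const struct addr_pair *pairs;  /* Borrowed; owned by the caller.  */
  size_t n_pairs;
  struct image_map *next;
};

static struct image_map *image_maps;  /* One node per loaded image.  */

/* Add a node instead of overwriting: earlier images stay visible.  */
void
register_image (const void *image_id, const struct addr_pair *pairs, size_t n)
{
  struct image_map *m = malloc (sizeof *m);
  assert (m != NULL);
  m->image_id = image_id;
  m->pairs = pairs;
  m->n_pairs = n;
  m->next = image_maps;
  image_maps = m;
}

/* Drop the node of one image so that no stale addresses remain.  */
void
unregister_image (const void *image_id)
{
  for (struct image_map **p = &image_maps; *p; p = &(*p)->next)
    if ((*p)->image_id == image_id)
      {
        struct image_map *dead = *p;
        *p = dead->next;
        free (dead);
        return;
      }
}

/* Return the target address for HOST, or HOST itself if unmapped.  */
const void *
map_indirect_ptr (const void *host)
{
  for (const struct image_map *m = image_maps; m; m = m->next)
    for (size_t i = 0; i < m->n_pairs; i++)
      if (m->pairs[i].host == host)
        return m->pairs[i].target;
  return host;
}
```

After unregister_image, a pointer that belonged to the removed image maps back to itself, while pointers from other images are unaffected.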
Grüße
Thomas
Thread overview: 28+ messages
2023-10-08 13:13 [PATCH] openmp: Add support for the 'indirect' clause in C/C++ Kwok Cheung Yeung
2023-10-17 13:12 ` Tobias Burnus
2023-10-17 13:34 ` Jakub Jelinek
2023-10-17 14:41 ` Tobias Burnus
2023-11-03 19:53 ` Kwok Cheung Yeung
2023-11-06 8:48 ` Tobias Burnus
2023-11-07 21:37 ` Joseph Myers
2023-11-07 21:51 ` Jakub Jelinek
2023-11-07 21:59 ` Kwok Cheung Yeung
2023-11-09 12:24 ` Thomas Schwinge
2023-11-09 16:00 ` Tobias Burnus
2023-11-13 10:59 ` Thomas Schwinge
2023-11-13 11:47 ` Tobias Burnus
2024-04-11 10:10 ` Thomas Schwinge
2024-01-03 14:47 ` [committed] " Kwok Cheung Yeung
2024-01-03 15:54 ` Kwok Cheung Yeung
2024-01-22 20:33 ` [PATCH] openmp: Change to using a hashtab to lookup offload target addresses for indirect function calls Kwok Cheung Yeung
2024-01-24 7:06 ` rep.dot.nop
2024-01-29 17:48 ` [PATCH v2] " Kwok Cheung Yeung
2024-03-08 13:40 ` Thomas Schwinge
2024-03-14 11:38 ` Tobias Burnus
2024-01-22 20:41 ` [PATCH] openmp, fortran: Add Fortran support for indirect clause on the declare target directive Kwok Cheung Yeung
2024-01-23 19:14 ` Tobias Burnus
2024-02-05 21:37 ` [PATCH v2] " Kwok Cheung Yeung
2024-02-06 9:03 ` Tobias Burnus
2024-02-06 9:50 ` Kwok Cheung Yeung
2024-02-12 8:51 ` Tobias Burnus
2024-02-15 21:37 ` [COMMITTED] libgomp: Update documentation for indirect calls in target regions Kwok Cheung Yeung