* [PATCH] Add attribute((target_clone(...))) to PowerPC
@ 2017-05-25 18:54 Michael Meissner
2017-05-25 20:05 ` Florian Weimer
0 siblings, 1 reply; 13+ messages in thread
From: Michael Meissner @ 2017-05-25 18:54 UTC
To: GCC Patches, Segher Boessenkool, David Edelsohn
This patch adds the initial attribute((target_clone(...))) support to the
PowerPC. It looks at the HWCAP bits for ISA 2.05 (power6), ISA 2.06 (power7),
ISA 2.07 (power8) and ISA 3.0 (power9) to determine which clone function to
run. The implementation used the existing i386/x86_64 support for target_clone
as a template.
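As a rough illustration of that selection (a sketch only, not the code in the
patch): the model below orders the clones from the highest ISA down to the
default and picks the first compiled clone whose feature name the CPU reports.
The helpers cpu_supports and select_clone, and the hw_features list, are
hypothetical stand-ins; in the patch the check goes through
__builtin_cpu_supports inside an ifunc resolver.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Clone ids, ordered from the highest ISA to the default, as in the patch.  */
enum clone_id {
  CLONE_ISA_3_00,
  CLONE_ISA_2_07,
  CLONE_ISA_2_06,
  CLONE_ISA_2_05,
  CLONE_DEFAULT,
  CLONE_MAX
};

/* Feature name tested for each clone; the default clone has no test.  */
static const char *const clone_names[CLONE_MAX] = {
  "arch_3_00", "arch_2_07", "arch_2_06", "arch_2_05", ""
};

/* Hypothetical stand-in for __builtin_cpu_supports.  HW_FEATURES is a
   NULL-terminated list of the names the simulated CPU reports.  */
static int
cpu_supports (const char *const *hw_features, const char *name)
{
  assert (hw_features != NULL && name != NULL);
  for (; *hw_features != NULL; hw_features++)
    if (strcmp (*hw_features, name) == 0)
      return 1;
  return 0;
}

/* Return the id of the clone to run.  AVAILABLE[i] is nonzero if clone i
   was compiled.  The first available clone whose feature is supported
   wins; otherwise fall back to the default clone.  */
static int
select_clone (const int available[CLONE_MAX], const char *const *hw_features)
{
  for (int i = 0; i < CLONE_DEFAULT; i++)
    if (available[i] && cpu_supports (hw_features, clone_names[i]))
      return i;
  return CLONE_DEFAULT;
}
```

With only the power9 and default clones compiled, a simulated power8 machine
falls through to the default clone in this model.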
At the moment, it has the same basic flaw as the i386/x86_64 implementation:
outside of the current module, the default version of the function is what
gets exported. Only the module in which the function is defined can call the
different target clones. I hope to add support in the future to make the
exported function the ifunc handler rather than the default version. However,
I wanted to get the basic framework into the compiler before tackling that
issue.
I have tested this patch on a little-endian power8 system and there were no
regressions. Can I install it into the trunk?
[gcc]
2017-05-24 Michael Meissner <meissner@linux.vnet.ibm.com>
* config/rs6000/rs6000.c (toplevel): Include attribs.h.
(enum clone_list): New enumeration to give the target clones
processors we generate code for.
(rs6000_clone_map): New array to identify which clone processors
the current program is running on.
(TARGET_COMPARE_VERSION_PRIORITY): Define to enable the
target_clone attribute.
(TARGET_GENERATE_VERSION_DISPATCHER_BODY): Likewise.
(TARGET_GET_FUNCTION_VERSIONS_DISPATCHER): Likewise.
(TARGET_OPTION_FUNCTION_VERSIONS): Likewise.
(cpu_expand_builtin): Add support for target_clone attribute.
(rs6000_valid_attribute_p): Allow "default" attribute.
(get_decl_name): New debug function to simplify printing the
current function name in debugging statements.
(rs6000_clone_priority): New functions to support the target_clone
attribute, and be able to generate code to switch between ISA 2.05
through ISA 3.0 (power6 through power9).
(rs6000_compare_version_priority): Likewise.
(rs6000_get_function_versions_dispatcher): Likewise.
(make_resolver_func): Likewise.
(add_condition_to_bb): Likewise.
(dispatch_function_versions): Likewise.
(rs6000_generate_version_dispatcher_body): Likewise.
(rs6000_can_inline_p): Call get_decl_name for debugging usage.
* doc/extend.texi (Common Function Attributes): Document that the
PowerPC supports the target_clone attribute.
[gcc/testsuite]
2017-05-24 Michael Meissner <meissner@linux.vnet.ibm.com>
* gcc.target/powerpc/clone1.c: New test.
--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797
* Re: [PATCH] Add attribute((target_clone(...))) to PowerPC
2017-05-25 18:54 [PATCH] Add attribute((target_clone(...))) to PowerPC Michael Meissner
@ 2017-05-25 20:05 ` Florian Weimer
2017-05-25 20:18 ` Michael Meissner
0 siblings, 1 reply; 13+ messages in thread
From: Florian Weimer @ 2017-05-25 20:05 UTC
To: Michael Meissner, GCC Patches, Segher Boessenkool, David Edelsohn
On Thu, May 25, 2017 at 8:25 PM, Michael Meissner
<meissner@linux.vnet.ibm.com> wrote:
> This patch adds the initial attribute((target_clone(...))) support to the
Patch seems to be missing.
Florian
* Re: [PATCH] Add attribute((target_clone(...))) to PowerPC
2017-05-25 20:05 ` Florian Weimer
@ 2017-05-25 20:18 ` Michael Meissner
2017-05-30 22:04 ` Segher Boessenkool
0 siblings, 1 reply; 13+ messages in thread
From: Michael Meissner @ 2017-05-25 20:18 UTC
To: Florian Weimer, GCC Patches, Segher Boessenkool,
Michael Meissner, David Edelsohn, Bill Schmidt
[-- Attachment #1: Type: text/plain, Size: 2919 bytes --]
On Thu, May 25, 2017 at 09:56:20PM +0200, Florian Weimer wrote:
> On Thu, May 25, 2017 at 8:25 PM, Michael Meissner
> <meissner@linux.vnet.ibm.com> wrote:
> > This patch adds the initial attribute((target_clone(...))) support to the
>
> Patch seems to be missing.
>
> Florian
>
Sorry about that.
[-- Attachment #2: clone.patch03b --]
[-- Type: text/plain, Size: 22303 bytes --]
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000) (revision 248378)
+++ gcc/config/rs6000/rs6000.c (.../gcc/config/rs6000) (working copy)
@@ -42,6 +42,7 @@
#include "flags.h"
#include "alias.h"
#include "fold-const.h"
+#include "attribs.h"
#include "stor-layout.h"
#include "calls.h"
#include "print-tree.h"
@@ -384,6 +385,34 @@ static const struct
{ "ieee128", PPC_FEATURE2_HAS_IEEE128, 1 }
};
+/* On PowerPC, we have a limited number of target clones that we care about
+ which means we can use an array to hold the options, rather than having more
+ elaborate data structures to identify each possible variation. Order the
+ clones from the highest ISA to the least. */
+enum clone_list {
+ CLONE_ISA_3_00, /* ISA 3.00 (power9). */
+ CLONE_ISA_2_07, /* ISA 2.07 (power8). */
+ CLONE_ISA_2_06, /* ISA 2.06 (power7). */
+ CLONE_ISA_2_05, /* ISA 2.05 (power6). */
+ CLONE_DEFAULT, /* default clone. */
+ CLONE_MAX
+};
+
+/* Map compiler ISA bits into HWCAP names. */
+struct clone_map {
+ HOST_WIDE_INT isa_mask; /* rs6000_isa mask */
+ const char *name; /* name to use in __builtin_cpu_supports. */
+};
+
+static const struct clone_map rs6000_clone_map[ (int)CLONE_MAX ] = {
+ { OPTION_MASK_P9_VECTOR, "arch_3_00" }, /* ISA 3.00 (power9). */
+ { OPTION_MASK_P8_VECTOR, "arch_2_07" }, /* ISA 2.07 (power8). */
+ { OPTION_MASK_POPCNTD, "arch_2_06" }, /* ISA 2.06 (power7). */
+ { OPTION_MASK_CMPB, "arch_2_05" }, /* ISA 2.05 (power6). */
+ { 0, "" }, /* Default options. */
+};
+
+
/* Newer LIBCs explicitly export this symbol to declare that they provide
the AT_PLATFORM and AT_HWCAP/AT_HWCAP2 values in the TCB. We emit a
reference to this symbol whenever we expand a CPU builtin, so that
@@ -1969,6 +1998,21 @@ static const struct attribute_spec rs600
#undef TARGET_CUSTOM_FUNCTION_DESCRIPTORS
#define TARGET_CUSTOM_FUNCTION_DESCRIPTORS 1
+
+#undef TARGET_COMPARE_VERSION_PRIORITY
+#define TARGET_COMPARE_VERSION_PRIORITY rs6000_compare_version_priority
+
+#undef TARGET_GENERATE_VERSION_DISPATCHER_BODY
+#define TARGET_GENERATE_VERSION_DISPATCHER_BODY \
+ rs6000_generate_version_dispatcher_body
+
+#undef TARGET_GET_FUNCTION_VERSIONS_DISPATCHER
+#define TARGET_GET_FUNCTION_VERSIONS_DISPATCHER \
+ rs6000_get_function_versions_dispatcher
+
+#undef TARGET_OPTION_FUNCTION_VERSIONS
+#define TARGET_OPTION_FUNCTION_VERSIONS common_function_versions
+
\f
/* Processor table. */
@@ -15616,6 +15660,14 @@ cpu_expand_builtin (enum rs6000_builtins
#ifdef TARGET_LIBC_PROVIDES_HWCAP_IN_TCB
tree arg = TREE_OPERAND (CALL_EXPR_ARG (exp, 0), 0);
+ /* Target clones create an ARRAY_REF instead of a STRING_CST; convert it back
+ to a STRING_CST. */
+ if (TREE_CODE (arg) == ARRAY_REF
+ && TREE_CODE (TREE_OPERAND (arg, 0)) == STRING_CST
+ && TREE_CODE (TREE_OPERAND (arg, 1)) == INTEGER_CST
+ && compare_tree_int (TREE_OPERAND (arg, 1), 0) == 0)
+ arg = TREE_OPERAND (arg, 0);
+
if (TREE_CODE (arg) != STRING_CST)
{
error ("builtin %s only accepts a string argument",
@@ -39743,6 +39795,14 @@ rs6000_valid_attribute_p (tree fndecl,
fprintf (stderr, "--------------------\n");
}
+ /* attribute((target("default"))) does nothing, beyond
+ affecting multi-versioning. */
+ if (TREE_VALUE (args)
+ && TREE_CODE (TREE_VALUE (args)) == STRING_CST
+ && TREE_CHAIN (args) == NULL_TREE
+ && strcmp (TREE_STRING_POINTER (TREE_VALUE (args)), "default") == 0)
+ return true;
+
old_optimize = build_optimization_node (&global_options);
func_optimize = DECL_FUNCTION_SPECIFIC_OPTIMIZATION (fndecl);
@@ -40175,6 +40235,486 @@ rs6000_disable_incompatible_switches (vo
}
\f
+/* Helper function for printing the function name when debugging. */
+
+static inline const char *
+get_decl_name (tree fn)
+{
+ tree name;
+
+ if (!fn)
+ return "<null>";
+
+ name = DECL_NAME (fn);
+ if (!name)
+ return "<no-name>";
+
+ return IDENTIFIER_POINTER (name);
+}
+
+/* Return the clone id of the target we are compiling code for in a target
+ clone. The clone id is ordered from 0 to CLONE_MAX-1 and gives the priority
+ list for the target clones (ordered from highest to lowest). */
+
+static int
+rs6000_clone_priority (tree fndecl)
+{
+ tree fn_opts = DECL_FUNCTION_SPECIFIC_TARGET (fndecl);
+ HOST_WIDE_INT isa_masks;
+ int ret = (int) CLONE_DEFAULT;
+ tree attrs = lookup_attribute ("target", DECL_ATTRIBUTES (fndecl));
+ const char *attrs_str = NULL;
+
+ gcc_assert (attrs != NULL);
+ attrs = TREE_VALUE (TREE_VALUE (attrs));
+
+ gcc_assert (TREE_CODE (attrs) == STRING_CST);
+ attrs_str = TREE_STRING_POINTER (attrs);
+
+ /* Return CLONE_DEFAULT (the lowest priority) for the default function.
+ Return the ISA needed for the function if it is not the default. */
+ if (strcmp (attrs_str, "default") != 0)
+ {
+ if (fn_opts == NULL_TREE)
+ fn_opts = target_option_default_node;
+
+ if (!fn_opts || !TREE_TARGET_OPTION (fn_opts))
+ isa_masks = rs6000_isa_flags;
+ else
+ isa_masks = TREE_TARGET_OPTION (fn_opts)->x_rs6000_isa_flags;
+
+ for (ret = 0; ret < (int) CLONE_DEFAULT; ret++)
+ if ((rs6000_clone_map[ret].isa_mask & isa_masks) != 0)
+ break;
+ }
+
+ if (TARGET_DEBUG_TARGET)
+ fprintf (stderr, "rs6000_get_function_version_priority (%s) => %d\n",
+ get_decl_name (fndecl), (int) ret);
+
+ return ret;
+}
+
+/* This compares the priority of target features in function DECL1 and DECL2.
+ It returns positive value if DECL1 is higher priority, negative value if
+ DECL2 is higher priority and 0 if they are the same. Note, priorities are
+ ordered from highest (0, CLONE_ISA_3_00) to lowest (CLONE_DEFAULT). */
+
+static int
+rs6000_compare_version_priority (tree decl1, tree decl2)
+{
+ int priority1 = rs6000_clone_priority (decl1);
+ int priority2 = rs6000_clone_priority (decl2);
+ int ret = priority2 - priority1;
+
+ if (TARGET_DEBUG_TARGET)
+ fprintf (stderr, "rs6000_compare_version_priority (%s, %s) => %d\n",
+ get_decl_name (decl1), get_decl_name (decl2), ret);
+
+ return ret;
+}
+
+/* Make a dispatcher declaration for the multi-versioned function DECL.
+ Calls to DECL function will be replaced with calls to the dispatcher
+ by the front-end. Returns the decl of the dispatcher function. */
+
+static tree
+rs6000_get_function_versions_dispatcher (void *decl)
+{
+ tree fn = (tree) decl;
+ struct cgraph_node *node = NULL;
+ struct cgraph_node *default_node = NULL;
+ struct cgraph_function_version_info *node_v = NULL;
+ struct cgraph_function_version_info *first_v = NULL;
+
+ tree dispatch_decl = NULL;
+
+ struct cgraph_function_version_info *default_version_info = NULL;
+
+ gcc_assert (fn != NULL && DECL_FUNCTION_VERSIONED (fn));
+
+ if (TARGET_DEBUG_TARGET)
+ fprintf (stderr, "rs6000_get_function_versions_dispatcher (%s)\n",
+ get_decl_name (fn));
+
+ node = cgraph_node::get (fn);
+ gcc_assert (node != NULL);
+
+ node_v = node->function_version ();
+ gcc_assert (node_v != NULL);
+
+ if (node_v->dispatcher_resolver != NULL)
+ return node_v->dispatcher_resolver;
+
+ /* Find the default version and make it the first node. */
+ first_v = node_v;
+ /* Go to the beginning of the chain. */
+ while (first_v->prev != NULL)
+ first_v = first_v->prev;
+
+ default_version_info = first_v;
+ while (default_version_info != NULL)
+ {
+ const tree decl2 = default_version_info->this_node->decl;
+ if (is_function_default_version (decl2))
+ break;
+ default_version_info = default_version_info->next;
+ }
+
+ /* If there is no default node, just return NULL. */
+ if (default_version_info == NULL)
+ return NULL;
+
+ /* Make default info the first node. */
+ if (first_v != default_version_info)
+ {
+ default_version_info->prev->next = default_version_info->next;
+ if (default_version_info->next)
+ default_version_info->next->prev = default_version_info->prev;
+ first_v->prev = default_version_info;
+ default_version_info->next = first_v;
+ default_version_info->prev = NULL;
+ }
+
+ default_node = default_version_info->this_node;
+
+#if defined (ASM_OUTPUT_TYPE_DIRECTIVE)
+ if (targetm.has_ifunc_p ())
+ {
+ struct cgraph_function_version_info *it_v = NULL;
+ struct cgraph_node *dispatcher_node = NULL;
+ struct cgraph_function_version_info *dispatcher_version_info = NULL;
+
+ /* Right now, the dispatching is done via ifunc. */
+ dispatch_decl = make_dispatcher_decl (default_node->decl);
+
+ dispatcher_node = cgraph_node::get_create (dispatch_decl);
+ gcc_assert (dispatcher_node != NULL);
+ dispatcher_node->dispatcher_function = 1;
+ dispatcher_version_info
+ = dispatcher_node->insert_new_function_version ();
+ dispatcher_version_info->next = default_version_info;
+ dispatcher_node->definition = 1;
+
+ /* Set the dispatcher for all the versions. */
+ it_v = default_version_info;
+ while (it_v != NULL)
+ {
+ it_v->dispatcher_resolver = dispatch_decl;
+ it_v = it_v->next;
+ }
+ }
+ else
+#endif
+ {
+ error_at (DECL_SOURCE_LOCATION (default_node->decl),
+ "multiversioning needs ifunc which is not supported "
+ "on this target");
+ }
+
+ return dispatch_decl;
+}
+
+/* Make the resolver function decl to dispatch the versions of
+ a multi-versioned function, DEFAULT_DECL. Create an
+ empty basic block in the resolver and store the pointer in
+ EMPTY_BB. Return the decl of the resolver function. */
+
+static tree
+make_resolver_func (const tree default_decl,
+ const tree dispatch_decl,
+ basic_block *empty_bb)
+{
+ char *resolver_name;
+ tree decl, type, decl_name, t;
+ bool is_uniq = false;
+
+ /* IFUNC's have to be globally visible. So, if the default_decl is
+ not, then the name of the IFUNC should be made unique. */
+ if (TREE_PUBLIC (default_decl) == 0)
+ is_uniq = true;
+
+ /* Append the filename to the resolver function if the versions are
+ not externally visible. This is because the resolver function has
+ to be externally visible for the loader to find it. So, appending
+ the filename will prevent conflicts with a resolver function from
+ another module which is based on the same version name. */
+ resolver_name = make_unique_name (default_decl, "resolver", is_uniq);
+
+ /* The resolver function should return a (void *). */
+ type = build_function_type_list (ptr_type_node, NULL_TREE);
+
+ decl = build_fn_decl (resolver_name, type);
+ decl_name = get_identifier (resolver_name);
+ SET_DECL_ASSEMBLER_NAME (decl, decl_name);
+
+ DECL_NAME (decl) = decl_name;
+ TREE_USED (decl) = 1;
+ DECL_ARTIFICIAL (decl) = 1;
+ DECL_IGNORED_P (decl) = 0;
+ /* IFUNC resolvers have to be externally visible. */
+ TREE_PUBLIC (decl) = 1;
+ DECL_UNINLINABLE (decl) = 1;
+
+ /* Resolver is not external, body is generated. */
+ DECL_EXTERNAL (decl) = 0;
+ DECL_EXTERNAL (dispatch_decl) = 0;
+
+ DECL_CONTEXT (decl) = NULL_TREE;
+ DECL_INITIAL (decl) = make_node (BLOCK);
+ DECL_STATIC_CONSTRUCTOR (decl) = 0;
+
+ if (DECL_COMDAT_GROUP (default_decl)
+ || TREE_PUBLIC (default_decl))
+ {
+ /* In this case, each translation unit with a call to this
+ versioned function will put out a resolver. Ensure it
+ is comdat to keep just one copy. */
+ DECL_COMDAT (decl) = 1;
+ make_decl_one_only (decl, DECL_ASSEMBLER_NAME (decl));
+ }
+ /* Build result decl and add to function_decl. */
+ t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, ptr_type_node);
+ DECL_ARTIFICIAL (t) = 1;
+ DECL_IGNORED_P (t) = 1;
+ DECL_RESULT (decl) = t;
+
+ gimplify_function_tree (decl);
+ push_cfun (DECL_STRUCT_FUNCTION (decl));
+ *empty_bb = init_lowered_empty_function (decl, false, 0);
+
+ cgraph_node::add_new_function (decl, true);
+ symtab->call_cgraph_insertion_hooks (cgraph_node::get_create (decl));
+
+ pop_cfun ();
+
+ gcc_assert (dispatch_decl != NULL);
+ /* Mark dispatch_decl as "ifunc" with resolver as resolver_name. */
+ DECL_ATTRIBUTES (dispatch_decl)
+ = make_attribute ("ifunc", resolver_name, DECL_ATTRIBUTES (dispatch_decl));
+
+ /* Create the alias for dispatch to resolver here. */
+ /*cgraph_create_function_alias (dispatch_decl, decl);*/
+ cgraph_node::create_same_body_alias (dispatch_decl, decl);
+ XDELETEVEC (resolver_name);
+ return decl;
+}
+
+/* This adds a condition to the basic_block NEW_BB in function FUNCTION_DECL to
+ return a pointer to VERSION_DECL if we are running on a machine that
+ supports the index CLONE_ISA hardware architecture bits. This function will
+ be called during version dispatch to decide which function version to
+ execute. It returns the basic block at the end, to which more conditions
+ can be added. */
+
+static basic_block
+add_condition_to_bb (tree function_decl, tree version_decl,
+ int clone_isa, basic_block new_bb)
+{
+ gimple *return_stmt;
+ tree convert_expr, result_var;
+ gimple *convert_stmt;
+ gimple_seq gseq;
+ gimple *call_cond_stmt;
+ gimple *if_else_stmt;
+
+ basic_block bb1, bb2, bb3;
+ edge e12, e23;
+ tree cond_var, predicate_decl, predicate_arg, bool_zero;
+ const char *arg_str;
+
+ push_cfun (DECL_STRUCT_FUNCTION (function_decl));
+
+ gcc_assert (new_bb != NULL);
+ gseq = bb_seq (new_bb);
+
+
+ convert_expr = build1 (CONVERT_EXPR, ptr_type_node,
+ build_fold_addr_expr (version_decl));
+ result_var = create_tmp_var (ptr_type_node);
+ convert_stmt = gimple_build_assign (result_var, convert_expr);
+ return_stmt = gimple_build_return (result_var);
+
+ if (clone_isa == (int)CLONE_DEFAULT)
+ {
+ gimple_seq_add_stmt (&gseq, convert_stmt);
+ gimple_seq_add_stmt (&gseq, return_stmt);
+ set_bb_seq (new_bb, gseq);
+ gimple_set_bb (convert_stmt, new_bb);
+ gimple_set_bb (return_stmt, new_bb);
+ pop_cfun ();
+ return new_bb;
+ }
+
+ bool_zero = build_int_cst (bool_int_type_node, 0);
+ cond_var = create_tmp_var (bool_int_type_node);
+ predicate_decl = rs6000_builtin_decls [(int) RS6000_BUILTIN_CPU_SUPPORTS];
+ arg_str = rs6000_clone_map[clone_isa].name;
+ predicate_arg = build_string_literal (strlen (arg_str) + 1, arg_str);
+ call_cond_stmt = gimple_build_call (predicate_decl, 1, predicate_arg);
+ gimple_call_set_lhs (call_cond_stmt, cond_var);
+
+ gimple_set_block (call_cond_stmt, DECL_INITIAL (function_decl));
+ gimple_set_bb (call_cond_stmt, new_bb);
+ gimple_seq_add_stmt (&gseq, call_cond_stmt);
+
+ if_else_stmt = gimple_build_cond (NE_EXPR, cond_var, bool_zero, NULL_TREE,
+ NULL_TREE);
+ gimple_set_block (if_else_stmt, DECL_INITIAL (function_decl));
+ gimple_set_bb (if_else_stmt, new_bb);
+ gimple_seq_add_stmt (&gseq, if_else_stmt);
+
+ gimple_seq_add_stmt (&gseq, convert_stmt);
+ gimple_seq_add_stmt (&gseq, return_stmt);
+ set_bb_seq (new_bb, gseq);
+
+ bb1 = new_bb;
+ e12 = split_block (bb1, if_else_stmt);
+ bb2 = e12->dest;
+ e12->flags &= ~EDGE_FALLTHRU;
+ e12->flags |= EDGE_TRUE_VALUE;
+
+ e23 = split_block (bb2, return_stmt);
+
+ gimple_set_bb (convert_stmt, bb2);
+ gimple_set_bb (return_stmt, bb2);
+
+ bb3 = e23->dest;
+ make_edge (bb1, bb3, EDGE_FALSE_VALUE);
+
+ remove_edge (e23);
+ make_edge (bb2, EXIT_BLOCK_PTR_FOR_FN (cfun), 0);
+
+ pop_cfun ();
+
+ return bb3;
+}
+
+/* This function generates the dispatch function for multi-versioned functions.
+ DISPATCH_DECL is the function which will contain the dispatch logic.
+ FNDECLS are the function choices for dispatch, and is a tree chain.
+ EMPTY_BB is the basic block pointer in DISPATCH_DECL in which the dispatch
+ code is generated. */
+
+static int
+dispatch_function_versions (tree dispatch_decl,
+ void *fndecls_p,
+ basic_block *empty_bb)
+{
+ int ix;
+ tree ele;
+ vec<tree> *fndecls;
+ tree clones[ (int)CLONE_MAX ];
+
+ if (TARGET_DEBUG_TARGET)
+ fputs ("dispatch_function_versions, top\n", stderr);
+
+ gcc_assert (dispatch_decl != NULL
+ && fndecls_p != NULL
+ && empty_bb != NULL);
+
+ /* fndecls_p is actually a vector. */
+ fndecls = static_cast<vec<tree> *> (fndecls_p);
+
+ /* At least one more version other than the default. */
+ gcc_assert (fndecls->length () >= 2);
+
+ /* The first version in the vector is the default decl. */
+ memset ((void *) clones, '\0', sizeof (clones));
+ clones[ (int)CLONE_DEFAULT ] = (*fndecls)[0];
+
+ /* On the PowerPC, we do not need to call __builtin_cpu_init, if we are using
+ a new enough glibc. If we ever need to call it, we would need to insert
+ the code here to do the call. */
+
+ for (ix = 1; fndecls->iterate (ix, &ele); ++ix)
+ {
+ int priority = rs6000_clone_priority (ele);
+ if (!clones[priority])
+ clones[priority] = ele;
+ }
+
+ for (ix = 0; ix < (int)CLONE_MAX; ix++)
+ if (clones[ix])
+ {
+ if (TARGET_DEBUG_TARGET)
+ fprintf (stderr, "dispatch_function_versions, clone %d, %s\n",
+ ix, get_decl_name (clones[ix]));
+
+ *empty_bb = add_condition_to_bb (dispatch_decl, clones[ix], ix,
+ *empty_bb);
+ }
+
+ return 0;
+}
+
+/* Generate the dispatching code body to dispatch multi-versioned function
+ DECL. The target hook is called to process the "target" attributes and
+ provide the code to dispatch the right function at run-time. NODE points
+ to the dispatcher decl whose body will be created. */
+
+static tree
+rs6000_generate_version_dispatcher_body (void *node_p)
+{
+ tree resolver_decl;
+ basic_block empty_bb;
+ tree default_ver_decl;
+ struct cgraph_node *versn;
+ struct cgraph_node *node;
+
+ struct cgraph_function_version_info *node_version_info = NULL;
+ struct cgraph_function_version_info *versn_info = NULL;
+
+ node = (cgraph_node *)node_p;
+
+ node_version_info = node->function_version ();
+ gcc_assert (node->dispatcher_function
+ && node_version_info != NULL);
+
+ if (node_version_info->dispatcher_resolver)
+ return node_version_info->dispatcher_resolver;
+
+ /* The first version in the chain corresponds to the default version. */
+ default_ver_decl = node_version_info->next->this_node->decl;
+
+ /* node is going to be an alias, so remove the finalized bit. */
+ node->definition = false;
+
+ resolver_decl = make_resolver_func (default_ver_decl,
+ node->decl, &empty_bb);
+
+ node_version_info->dispatcher_resolver = resolver_decl;
+
+ if (TARGET_DEBUG_TARGET)
+ fprintf (stderr, "rs6000_generate_version_dispatcher_body, %s\n",
+ get_decl_name (resolver_decl));
+
+ push_cfun (DECL_STRUCT_FUNCTION (resolver_decl));
+
+ auto_vec<tree, 2> fn_ver_vec;
+
+ for (versn_info = node_version_info->next; versn_info;
+ versn_info = versn_info->next)
+ {
+ versn = versn_info->this_node;
+ /* Check for virtual functions here again, as by this time it should
+ have been determined if this function needs a vtable index or
+ not. This happens for methods in derived classes that override
+ virtual methods in base classes but are not explicitly marked as
+ virtual. */
+ if (DECL_VINDEX (versn->decl))
+ sorry ("Virtual function multiversioning not supported");
+
+ fn_ver_vec.safe_push (versn->decl);
+ }
+
+ dispatch_function_versions (resolver_decl, &fn_ver_vec, &empty_bb);
+ cgraph_edge::rebuild_edges ();
+ pop_cfun ();
+ return resolver_decl;
+}
+
+\f
/* Hook to determine if one function can safely inline another. */
static bool
@@ -40208,12 +40748,7 @@ rs6000_can_inline_p (tree caller, tree c
if (TARGET_DEBUG_TARGET)
fprintf (stderr, "rs6000_can_inline_p:, caller %s, callee %s, %s inline\n",
- (DECL_NAME (caller)
- ? IDENTIFIER_POINTER (DECL_NAME (caller))
- : "<unknown>"),
- (DECL_NAME (callee)
- ? IDENTIFIER_POINTER (DECL_NAME (callee))
- : "<unknown>"),
+ get_decl_name (caller), get_decl_name (callee),
(ret ? "can" : "cannot"));
return ret;
Index: gcc/doc/extend.texi
===================================================================
--- gcc/doc/extend.texi (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/doc) (revision 248378)
+++ gcc/doc/extend.texi (.../gcc/doc) (working copy)
@@ -3257,7 +3257,15 @@ For instance, on an x86, you could compi
@code{target_clones("sse4.1,avx")}. GCC creates two function clones,
one compiled with @option{-msse4.1} and another with @option{-mavx}.
It also creates a resolver function (see the @code{ifunc} attribute
-above) that dynamically selects a clone suitable for current architecture.
+above) that dynamically selects a clone suitable for current
+architecture.
+
+On a PowerPC, you could compile a function with
+@code{target_clones("cpu=power9,default")}. GCC creates two function
+clones, one compiled with @option{-mcpu=power9} and another with the
+default options. It also creates a resolver function (see the
+@code{ifunc} attribute above) that dynamically selects a clone
+suitable for current architecture.
@item unused
@cindex @code{unused} function attribute
Index: gcc/testsuite/gcc.target/powerpc/clone1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/clone1.c (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc) (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/clone1.c (.../gcc/testsuite/gcc.target/powerpc) (revision 248446)
@@ -0,0 +1,19 @@
+/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
+/* { dg-options "-mcpu=power8 -O2" } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+
+__attribute__((target_clones("cpu=power9,default")))
+long mod_func (long a, long b)
+{
+ return a % b;
+}
+
+long mod_func_or (long a, long b, long c)
+{
+ return mod_func (a, b) | c;
+}
+
+/* { dg-final { scan-assembler-times {\mdivd\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mmulld\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mmodsd\M} 1 } } */
* Re: [PATCH] Add attribute((target_clone(...))) to PowerPC
2017-05-25 20:18 ` Michael Meissner
@ 2017-05-30 22:04 ` Segher Boessenkool
2017-05-31 2:22 ` Michael Meissner
2017-05-31 23:15 ` Michael Meissner
0 siblings, 2 replies; 13+ messages in thread
From: Segher Boessenkool @ 2017-05-30 22:04 UTC
To: Michael Meissner, Florian Weimer, GCC Patches, David Edelsohn,
Bill Schmidt
Hi Mike,
On Thu, May 25, 2017 at 04:05:39PM -0400, Michael Meissner wrote:
> +/* On PowerPC, we have a limited number of target clones that we care about
> + which means we can use an array to hold the options, rather than having more
> + elaborate data structures to identify each possible variation. Order the
> + clones from the highest ISA to the least. */
> +enum clone_list {
> + CLONE_ISA_3_00, /* ISA 3.00 (power9). */
> + CLONE_ISA_2_07, /* ISA 2.07 (power8). */
> + CLONE_ISA_2_06, /* ISA 2.06 (power7). */
> + CLONE_ISA_2_05, /* ISA 2.05 (power6). */
> + CLONE_DEFAULT, /* default clone. */
> + CLONE_MAX
> +};
Is this easier than the more natural ordering (from default to higher)?
Also, since you use the enum values as numbers, please make the first
one explicitly "= 0". These go together: default 0 is nice to have.
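A minimal sketch of what such an ordering could look like, assuming the
default clone becomes 0 and the ISA levels ascend from there. The helper names
compare_clone_priority and best_available_clone are made up for illustration
and are not in the patch. With this order a larger id means a higher priority,
so the comparison is a plain difference and the dispatcher walks downward from
the top.

```c
#include <assert.h>

/* Hypothetical reordering: default first, then ascending ISA level.  */
enum {
  CLONE_DEFAULT = 0,	/* default clone.  */
  CLONE_ISA_2_05,	/* ISA 2.05 (power6).  */
  CLONE_ISA_2_06,	/* ISA 2.06 (power7).  */
  CLONE_ISA_2_07,	/* ISA 2.07 (power8).  */
  CLONE_ISA_3_00,	/* ISA 3.00 (power9).  */
  CLONE_MAX
};

/* Return a positive value if clone A has higher priority than clone B,
   a negative value if B has higher priority, and 0 if they are equal.  */
static int
compare_clone_priority (int a, int b)
{
  assert (a >= 0 && a < CLONE_MAX && b >= 0 && b < CLONE_MAX);
  return a - b;
}

/* Return the highest-priority clone that was compiled, walking from the
   highest ISA down; AVAILABLE[i] is nonzero if clone i exists.  */
static int
best_available_clone (const int available[CLONE_MAX])
{
  for (int i = CLONE_MAX - 1; i > CLONE_DEFAULT; i--)
    if (available[i])
      return i;
  return CLONE_DEFAULT;
}
```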
> +static const struct clone_map rs6000_clone_map[ (int)CLONE_MAX ] = {
Space after cast; no spaces inside [].
> +static inline const char *
> +get_decl_name (tree fn)
Please don't use inline unless there is a good reason to.
> + if (TARGET_DEBUG_TARGET)
> + fprintf (stderr, "rs6000_get_function_version_priority (%s) => %d\n",
> + get_decl_name (fndecl), (int) ret);
"ret" already is an int. Similarly, are the casts of the enum values
necessary?
> + struct cgraph_function_version_info *default_version_info = NULL;
You always initialise this variable later on; don't set it to NULL
earlier. You can move the declaration down to where the var is first
initialised.
> + tree dispatch_decl = NULL;
For this one, you can put it inside the if (), and just explicitly
return NULL on the error path (you do that in one case already).
> +#if defined (ASM_OUTPUT_TYPE_DIRECTIVE)
Is this the correct conditional to use? It is not obvious to me why
it would be. Does it have to be an #ifdef anyway? Can't it be an if?
> + if (targetm.has_ifunc_p ())
> + {
> + struct cgraph_function_version_info *it_v = NULL;
> + struct cgraph_node *dispatcher_node = NULL;
> + struct cgraph_function_version_info *dispatcher_version_info = NULL;
No NULL for these either please. If you later add a path where you
forget to initialise one of these vars you will not get a warning
(and if nothing goes wrong these initialisations are distracting noise).
> +/* Make the resolver function decl to dispatch the versions of
> + a multi-versioned function, DEFAULT_DECL. Create an
One space after comma.
> + /* The resolver function should return a (void *). */
And two after a dot.
> + gcc_assert (dispatch_decl != NULL);
> + /* Mark dispatch_decl as "ifunc" with resolver as resolver_name. */
> + DECL_ATTRIBUTES (dispatch_decl)
> + = make_attribute ("ifunc", resolver_name, DECL_ATTRIBUTES (dispatch_decl));
That assert is not very useful: the very next statement would segfault
if the assertion fails, giving just as much information.
> + /* Create the alias for dispatch to resolver here. */
> + /*cgraph_create_function_alias (dispatch_decl, decl);*/
Do you need to keep this line? Please add a comment saying why it is
disabled for now, or such.
> + gcc_assert (new_bb != NULL);
> + gseq = bb_seq (new_bb);
Same as before.
> + convert_expr = build1 (CONVERT_EXPR, ptr_type_node,
> + build_fold_addr_expr (version_decl));
Indent is broken here.
> + result_var = create_tmp_var (ptr_type_node);
> + convert_stmt = gimple_build_assign (result_var, convert_expr);
Space at end of line.
> + if (clone_isa == (int)CLONE_DEFAULT)
Space after cast. Do you need a cast here?
> + predicate_decl = rs6000_builtin_decls [(int) RS6000_BUILTIN_CPU_SUPPORTS];
You don't need a cast here either afaics.
> + make_edge (bb1, bb3, EDGE_FALSE_VALUE);
Space at end of line.
> + /* The first version in the vector is the default decl. */
> + memset ((void *) clones, '\0', sizeof (clones));
memset (clones, 0, sizeof clones);
or just initialise it in the first place:
tree clones[CLONE_MAX] = { 0 };
> + /* On the PowerPC, we do not need to call __builtin_cpu_init, if we are using
> + a new enough glibc. If we ever need to call it, we would need to insert
> + the code here to do the call. */
Are we always using a new enough glibc? If so, please clarify the
comment.
> +static tree
> +rs6000_generate_version_dispatcher_body (void *node_p)
Trailing space.
> + node = (cgraph_node *)node_p;
Space after cast.
> +On a PowerPC, you could compile a function with
> +@code{target_clones("cpu=power9,default")}. GCC creates two function
"For PowerPC you can ..."?
> --- gcc/testsuite/gcc.target/powerpc/clone1.c (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc) (revision 0)
> +++ gcc/testsuite/gcc.target/powerpc/clone1.c (.../gcc/testsuite/gcc.target/powerpc) (revision 248446)
> @@ -0,0 +1,19 @@
> +/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */
s/powerpc64/powerpc/
Looks good so far, just needs some polish ;-) Please consider changing
the clone_list enum to a more natural order (and does the enum need a
name, anyway?), tidy up layout stuff etc., and repost.
Thanks,
Segher
* Re: [PATCH] Add attribute((target_clone(...))) to PowerPC
2017-05-30 22:04 ` Segher Boessenkool
@ 2017-05-31 2:22 ` Michael Meissner
2017-05-31 23:15 ` Michael Meissner
1 sibling, 0 replies; 13+ messages in thread
From: Michael Meissner @ 2017-05-31 2:22 UTC (permalink / raw)
To: Segher Boessenkool
Cc: Michael Meissner, Florian Weimer, GCC Patches, David Edelsohn,
Bill Schmidt
On Tue, May 30, 2017 at 04:51:34PM -0500, Segher Boessenkool wrote:
> Hi Mike,
>
> On Thu, May 25, 2017 at 04:05:39PM -0400, Michael Meissner wrote:
> > +/* On PowerPC, we have a limited number of target clones that we care about
> > + which means we can use an array to hold the options, rather than having more
> > + elaborate data structures to identify each possible variation. Order the
> > + clones from the highest ISA to the least. */
> > +enum clone_list {
> > + CLONE_ISA_3_00, /* ISA 3.00 (power9). */
> > + CLONE_ISA_2_07, /* ISA 2.07 (power8). */
> > + CLONE_ISA_2_06, /* ISA 2.06 (power7). */
> > + CLONE_ISA_2_05, /* ISA 2.05 (power6). */
> > + CLONE_DEFAULT, /* default clone. */
> > + CLONE_MAX
> > +};
>
> Is this easier than the more natural ordering (from default to higher)?
> Also, since you use the enum values as numbers, please make the first
> one explicitly "= 0". These go together: default 0 is nice to have.
It is easier to write the loops going up, but I have changed it to const ints
and deleted the enum.
> > +static const struct clone_map rs6000_clone_map[ (int)CLONE_MAX ] = {
>
> Space after cast; no spaces inside [].
Yep.
> > +static inline const char *
> > +get_decl_name (tree fn)
>
> Please don't use inline unless there is a good reason to.
Ok.
> > + if (TARGET_DEBUG_TARGET)
> > + fprintf (stderr, "rs6000_get_function_version_priority (%s) => %d\n",
> > + get_decl_name (fndecl), (int) ret);
>
> "ret" already is an int. Similarly, are the casts of the enum values
> necessary?
Yep.
> > + struct cgraph_function_version_info *default_version_info = NULL;
>
> You always initialise this variable later on; don't set it to NULL
> earlier. You can move the declaration down to where the var is first
> initialised.
Ok.
> > + tree dispatch_decl = NULL;
>
> For this one, you can put it inside the if (), and just explicitly
> return NULL on the error path (you do that in one case already).
Ok.
> > +#if defined (ASM_OUTPUT_TYPE_DIRECTIVE)
>
> Is this the correct conditional to use? It is not obvious to me why
> it would be. Does it have to be an #ifdef anyway, can't it be an if?
Yes, I believe it is. ASM_OUTPUT_TYPE_DIRECTIVE is only defined in sysv4.h.
You need the .type directive to be able to declare ifunc functions (in addition
to enabling ifunc, which we now do by default). AIX and other non-Linux systems
will not be able to use target_clones.
> > + if (targetm.has_ifunc_p ())
> > + {
> > + struct cgraph_function_version_info *it_v = NULL;
> > + struct cgraph_node *dispatcher_node = NULL;
> > + struct cgraph_function_version_info *dispatcher_version_info = NULL;
>
> No NULL for these either please. If you later add a path where you
> forget to initialise one of these vars you will not get a warning
> (and if nothing goes wrong these initialisations are distracting noise).
I've recoded these.
> > +/* Make the resolver function decl to dispatch the versions of
> > + a multi-versioned function, DEFAULT_DECL. Create an
>
> One space after comma.
Ok.
> > + /* The resolver function should return a (void *). */
>
> And two after a dot.
Ok.
> > + gcc_assert (dispatch_decl != NULL);
> > + /* Mark dispatch_decl as "ifunc" with resolver as resolver_name. */
> > + DECL_ATTRIBUTES (dispatch_decl)
> > + = make_attribute ("ifunc", resolver_name, DECL_ATTRIBUTES (dispatch_decl));
>
> That assert is not very useful: the very next statement would segfault
> if the assertion fails, giving just as much information.
Ok.
> > + /* Create the alias for dispatch to resolver here. */
> > + /*cgraph_create_function_alias (dispatch_decl, decl);*/
>
> Do you need to keep this line? Please add a comment saying why it is
> disabled for now, or such.
I will probably need to call cgraph_create_function_alias in the next round
when I fix what I consider to be the big problem with target_clones (namely,
callers outside of the current module don't use the target clones; the ifunc
support only works within the current module). But I will comment it for now.
>
> > + gcc_assert (new_bb != NULL);
> > + gseq = bb_seq (new_bb);
>
> Same as before.
Ok.
> > + convert_expr = build1 (CONVERT_EXPR, ptr_type_node,
> > + build_fold_addr_expr (version_decl));
>
> Indent is broken here.
Ok.
> > + result_var = create_tmp_var (ptr_type_node);
> > + convert_stmt = gimple_build_assign (result_var, convert_expr);
>
> Space at end of line.
>
> > + if (clone_isa == (int)CLONE_DEFAULT)
>
> Space after cast. Do you need a cast here?
>
> > + predicate_decl = rs6000_builtin_decls [(int) RS6000_BUILTIN_CPU_SUPPORTS];
>
> You don't need a cast here either afaics.
See above.
> > + make_edge (bb1, bb3, EDGE_FALSE_VALUE);
>
> Space at end of line.
>
> > + /* The first version in the vector is the default decl. */
> > + memset ((void *) clones, '\0', sizeof (clones));
>
> memset (clones, 0, sizeof clones);
Ummm, it was my understanding that in C++, you no longer get a free cast to
void *, and when you do need to use it in the mem* functions, you need an
explicit cast.
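For what it's worth, here is a small sketch of the conversion rules in
question, outside of GCC. In C++ the conversion from an object pointer to
void * is still implicit; it is the reverse direction (void * to T *) that
needs an explicit cast. The names tree and CLONE_MAX below are stand-ins for
illustration, not GCC's real definitions, and the memset variant assumes the
usual platforms where an all-zero-bits pointer is a null pointer.

```cpp
#include <cassert>
#include <cstring>

// Stand-ins for illustration only; these are not GCC's real definitions.
struct tree_node;
typedef tree_node *tree;
const int CLONE_MAX = 5;

// Count the non-null entries of an array of CLONE_MAX pointers.
static int count_nonnull (const tree *v)
{
  int n = 0;
  for (int i = 0; i < CLONE_MAX; i++)
    if (v[i])
      n++;
  return n;
}

// Clear with memset.  tree * converts to void * implicitly, so no cast is
// written here.  Assumes all-zero bits represent a null pointer.
int cleared_by_memset ()
{
  tree clones[CLONE_MAX];
  std::memset (clones, 0, sizeof clones);
  return count_nonnull (clones);
}

// Clear by initialising the array in the first place.
int cleared_by_init ()
{
  tree clones[CLONE_MAX] = { 0 };
  return count_nonnull (clones);
}
```

Both helpers should report zero non-null entries, so either form from the
review would do.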
> or just initialise it in the first place:
>
> tree clones[CLONE_MAX] = { 0 };
>
> > + /* On the PowerPC, we do not need to call __builtin_cpu_init, if we are using
> > + a new enough glibc. If we ever need to call it, we would need to insert
> > + the code here to do the call. */
>
> Are we always using a new enough glibc? If so, please clarify the
> comment.
The expansion of __builtin_cpu_supports ensures we have a new enough glibc,
but I can expand on the comment (basically, x86 needs to call
__builtin_cpu_init and we don't).
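As a rough illustration of that difference, here is a hand-written selector
outside of the compiler. It is only a sketch: mod_default, mod_fast and
pick_mod are made-up names, "avx2" is just an example x86 feature string, and
the PowerPC branch assumes a GCC and glibc recent enough for
__builtin_cpu_supports to work. On any other target the sketch simply returns
the default.

```cpp
#include <cassert>

// Made-up functions for illustration; both compute the same result.
static long mod_default (long a, long b) { return a % b; }
static long mod_fast (long a, long b) { return a % b; }

typedef long (*mod_fn) (long, long);

// Pick an implementation at run time, roughly what a resolver does.
mod_fn pick_mod ()
{
#if defined (__GNUC__) && defined (__powerpc__)
  // PowerPC: no __builtin_cpu_init call is made before the check.
  if (__builtin_cpu_supports ("arch_3_00"))
    return mod_fast;
#elif defined (__GNUC__) && (defined (__x86_64__) || defined (__i386__))
  // x86: resolvers call __builtin_cpu_init before checking features.
  __builtin_cpu_init ();
  if (__builtin_cpu_supports ("avx2"))
    return mod_fast;
#endif
  return mod_default;
}
```

Whichever branch is taken, pick_mod () (7, 3) gives the same answer; only the
implementation behind the pointer differs.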
> > +static tree
> > +rs6000_generate_version_dispatcher_body (void *node_p)
>
> Trailing space.
Ok.
> > + node = (cgraph_node *)node_p;
>
> Space after cast.
Ok.
> > +On a PowerPC, you could compile a function with
> > +@code{target_clones("cpu=power9,default")}. GCC creates two function
>
> "For PowerPC you can ..."?
>
> > --- gcc/testsuite/gcc.target/powerpc/clone1.c (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc) (revision 0)
> > +++ gcc/testsuite/gcc.target/powerpc/clone1.c (.../gcc/testsuite/gcc.target/powerpc) (revision 248446)
> > @@ -0,0 +1,19 @@
> > +/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */
>
> s/powerpc64/powerpc/
Ok.
>
> Looks good so far, just needs some polish ;-) Please consider changing
> the clone_list enum to a more natural order (and does the enum need a
> name, anyway?), tidy up layout stuff etc., and repost.
>
> Thanks,
>
>
> Segher
>
--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797
* Re: [PATCH] Add attribute((target_clone(...))) to PowerPC
2017-05-30 22:04 ` Segher Boessenkool
2017-05-31 2:22 ` Michael Meissner
@ 2017-05-31 23:15 ` Michael Meissner
2017-06-01 0:20 ` Michael Meissner
2017-06-01 20:43 ` Segher Boessenkool
1 sibling, 2 replies; 13+ messages in thread
From: Michael Meissner @ 2017-05-31 23:15 UTC (permalink / raw)
To: Segher Boessenkool
Cc: Michael Meissner, Florian Weimer, GCC Patches, David Edelsohn,
Bill Schmidt
[-- Attachment #1: Type: text/plain, Size: 2712 bytes --]
Here is the updated version of the target_clone attribute support.
The changes include:
1) Changed the order of the CLONE priority list so that the default is 0 and
ISA 3.0 is the highest (and eliminated the enum and casts).
2) I tried to fix the various spacing issues. I found one place further down
that had ". */" and I fixed that also.
3) Given we are programming in C++ now, I moved some of the declarations in the
larger functions closer to the first usage and eliminated the extra
initialization to 0/NULL.
4) I took off the inline on the debug helper function.
5) I deleted the comment with 'cgraph_create_function_alias' in it (that
function no longer exists).
6) I clarified that __builtin_cpu_supports requires a recent glibc.
7) I changed the clone1.c test to eliminate powerpc64 but require Linux, since
other OSes might not support ifunc.
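
To illustrate item 1, here is a standalone sketch of how the reordered
priorities are used. It is not the code from the patch: the MASK_* bits are
made-up stand-ins for the OPTION_MASK_* values, and clone_priority and
pick_clone are hypothetical helpers that work on plain bit masks rather than
on decls.

```cpp
#include <cassert>

// Priorities as in the updated patch: default is 0, ISA 3.0 is highest.
const int CLONE_DEFAULT = 0;
const int CLONE_ISA_2_05 = 1;
const int CLONE_ISA_2_06 = 2;
const int CLONE_ISA_2_07 = 3;
const int CLONE_ISA_3_00 = 4;
const int CLONE_MAX = 5;

// Made-up mask bits standing in for OPTION_MASK_CMPB and friends.
const unsigned MASK_CMPB = 1u << 0;
const unsigned MASK_POPCNTD = 1u << 1;
const unsigned MASK_P8_VECTOR = 1u << 2;
const unsigned MASK_P9_VECTOR = 1u << 3;

static const unsigned clone_mask[CLONE_MAX] = {
  0, MASK_CMPB, MASK_POPCNTD, MASK_P8_VECTOR, MASK_P9_VECTOR
};

// Highest priority whose mask bit is set in ISA_MASKS, else CLONE_DEFAULT.
int clone_priority (unsigned isa_masks)
{
  int ret;
  for (ret = CLONE_MAX - 1; ret != 0; ret--)
    if ((clone_mask[ret] & isa_masks) != 0)
      break;
  return ret;
}

// Walk the clones from highest to lowest and take the first one that both
// exists (bit IX set in CLONES_PRESENT) and is supported by the hardware
// (its mask bit set in HW_MASKS).  The default clone is the fallback.
int pick_clone (unsigned clones_present, unsigned hw_masks)
{
  for (int ix = CLONE_MAX - 1; ix > CLONE_DEFAULT; ix--)
    if ((clones_present & (1u << ix)) != 0
        && (clone_mask[ix] & hw_masks) != 0)
      return ix;
  return CLONE_DEFAULT;
}
```

With only a power9 clone and the default present, a machine that reports
nothing above ISA 2.07 falls through to the default clone, which is the
situation the clone1.c test sets up.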
I bootstrapped this on a little endian power8 system and there were no
regressions. Can I install this in the trunk?
[gcc]
2017-05-31 Michael Meissner <meissner@linux.vnet.ibm.com>
* config/rs6000/rs6000.c (toplevel): Include attribs.h.
(CLONE_*): New constants to define the processors we can generate
code for with the target_clone attribute.
(rs6000_clone_map): New array to identify which clone processors
the current program is running on.
(TARGET_COMPARE_VERSION_PRIORITY): Define to enable the
target_clone attribute.
(TARGET_GENERATE_VERSION_DISPATCHER_BODY): Likewise.
(TARGET_GET_FUNCTION_VERSIONS_DISPATCHER): Likewise.
(TARGET_OPTION_FUNCTION_VERSIONS): Likewise.
(cpu_expand_builtin): Add support for target_clone attribute.
(rs6000_valid_attribute_p): Allow "default" attribute.
(get_decl_name): New debug function to simplify printing the
current function name in debugging statements.
(rs6000_clone_priority): New functions to support the target_clone
attribute, and be able to generate code to switch between ISA 2.05
through ISA 3.0 (power6 through power9).
(rs6000_compare_version_priority): Likewise.
(rs6000_get_function_versions_dispatcher): Likewise.
(make_resolver_func): Likewise.
(add_condition_to_bb): Likewise.
(dispatch_function_versions): Likewise.
(rs6000_generate_version_dispatcher_body): Likewise.
(rs6000_can_inline_p): Call get_decl_name for debugging usage.
(fusion_gpr_load_p): Fix a spacing issue.
* doc/extend.texi (Common Function Attributes): Document that the
PowerPC supports the target_clone attribute.
[gcc/testsuite]
2017-05-31 Michael Meissner <meissner@linux.vnet.ibm.com>
* gcc.target/powerpc/clone1.c: New test.
[-- Attachment #2: clone.patch05b --]
[-- Type: text/plain, Size: 22381 bytes --]
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000) (revision 248759)
+++ gcc/config/rs6000/rs6000.c (.../gcc/config/rs6000) (working copy)
@@ -42,6 +42,7 @@
#include "flags.h"
#include "alias.h"
#include "fold-const.h"
+#include "attribs.h"
#include "stor-layout.h"
#include "calls.h"
#include "print-tree.h"
@@ -386,6 +387,32 @@ static const struct
{ "ieee128", PPC_FEATURE2_HAS_IEEE128, 1 }
};
+/* On PowerPC, we have a limited number of target clones that we care about
+ which means we can use an array to hold the options, rather than having more
+ elaborate data structures to identify each possible variation. Order the
+ clones from the default to the highest ISA. */
+const int CLONE_DEFAULT = 0; /* default clone. */
+const int CLONE_ISA_2_05 = 1; /* ISA 2.05 (power6). */
+const int CLONE_ISA_2_06 = 2; /* ISA 2.06 (power7). */
+const int CLONE_ISA_2_07 = 3; /* ISA 2.07 (power8). */
+const int CLONE_ISA_3_00 = 4; /* ISA 3.00 (power9). */
+const int CLONE_MAX = 5;
+
+/* Map compiler ISA bits into HWCAP names. */
+struct clone_map {
+ HOST_WIDE_INT isa_mask; /* rs6000_isa mask */
+ const char *name; /* name to use in __builtin_cpu_supports. */
+};
+
+static const struct clone_map rs6000_clone_map[CLONE_MAX] = {
+ { 0, "" }, /* Default options. */
+ { OPTION_MASK_CMPB, "arch_2_05" }, /* ISA 2.05 (power6). */
+ { OPTION_MASK_POPCNTD, "arch_2_06" }, /* ISA 2.06 (power7). */
+ { OPTION_MASK_P8_VECTOR, "arch_2_07" }, /* ISA 2.07 (power8). */
+ { OPTION_MASK_P9_VECTOR, "arch_3_00" }, /* ISA 3.00 (power9). */
+};
+
+
/* Newer LIBCs explicitly export this symbol to declare that they provide
the AT_PLATFORM and AT_HWCAP/AT_HWCAP2 values in the TCB. We emit a
reference to this symbol whenever we expand a CPU builtin, so that
@@ -1971,6 +1998,21 @@ static const struct attribute_spec rs600
#undef TARGET_CUSTOM_FUNCTION_DESCRIPTORS
#define TARGET_CUSTOM_FUNCTION_DESCRIPTORS 1
+
+#undef TARGET_COMPARE_VERSION_PRIORITY
+#define TARGET_COMPARE_VERSION_PRIORITY rs6000_compare_version_priority
+
+#undef TARGET_GENERATE_VERSION_DISPATCHER_BODY
+#define TARGET_GENERATE_VERSION_DISPATCHER_BODY \
+ rs6000_generate_version_dispatcher_body
+
+#undef TARGET_GET_FUNCTION_VERSIONS_DISPATCHER
+#define TARGET_GET_FUNCTION_VERSIONS_DISPATCHER \
+ rs6000_get_function_versions_dispatcher
+
+#undef TARGET_OPTION_FUNCTION_VERSIONS
+#define TARGET_OPTION_FUNCTION_VERSIONS common_function_versions
+
\f
/* Processor table. */
@@ -15611,6 +15653,14 @@ cpu_expand_builtin (enum rs6000_builtins
#ifdef TARGET_LIBC_PROVIDES_HWCAP_IN_TCB
tree arg = TREE_OPERAND (CALL_EXPR_ARG (exp, 0), 0);
+ /* Target clones create an ARRAY_REF instead of a STRING_CST; convert it back
+ to a STRING_CST. */
+ if (TREE_CODE (arg) == ARRAY_REF
+ && TREE_CODE (TREE_OPERAND (arg, 0)) == STRING_CST
+ && TREE_CODE (TREE_OPERAND (arg, 1)) == INTEGER_CST
+ && compare_tree_int (TREE_OPERAND (arg, 1), 0) == 0)
+ arg = TREE_OPERAND (arg, 0);
+
if (TREE_CODE (arg) != STRING_CST)
{
error ("builtin %s only accepts a string argument",
@@ -39700,6 +39750,14 @@ rs6000_valid_attribute_p (tree fndecl,
fprintf (stderr, "--------------------\n");
}
+ /* attribute((target("default"))) does nothing, beyond
+ affecting multi-versioning. */
+ if (TREE_VALUE (args)
+ && TREE_CODE (TREE_VALUE (args)) == STRING_CST
+ && TREE_CHAIN (args) == NULL_TREE
+ && strcmp (TREE_STRING_POINTER (TREE_VALUE (args)), "default") == 0)
+ return true;
+
old_optimize = build_optimization_node (&global_options);
func_optimize = DECL_FUNCTION_SPECIFIC_OPTIMIZATION (fndecl);
@@ -40132,6 +40190,452 @@ rs6000_disable_incompatible_switches (vo
}
\f
+/* Helper function for printing the function name when debugging. */
+
+static const char *
+get_decl_name (tree fn)
+{
+ tree name;
+
+ if (!fn)
+ return "<null>";
+
+ name = DECL_NAME (fn);
+ if (!name)
+ return "<no-name>";
+
+ return IDENTIFIER_POINTER (name);
+}
+
+/* Return the clone id of the target we are compiling code for in a target
+ clone. The clone id is ordered from 0 (default) to CLONE_MAX-1 and gives
+ the priority list for the target clones (ordered from lowest to
+ highest). */
+
+static int
+rs6000_clone_priority (tree fndecl)
+{
+ tree fn_opts = DECL_FUNCTION_SPECIFIC_TARGET (fndecl);
+ HOST_WIDE_INT isa_masks;
+ int ret = (int) CLONE_DEFAULT;
+ tree attrs = lookup_attribute ("target", DECL_ATTRIBUTES (fndecl));
+ const char *attrs_str = NULL;
+
+ gcc_assert (attrs != NULL);
+ attrs = TREE_VALUE (TREE_VALUE (attrs));
+
+ gcc_assert (TREE_CODE (attrs) == STRING_CST);
+ attrs_str = TREE_STRING_POINTER (attrs);
+
+ /* Return priority zero for default function. Return the ISA needed for the
+ function if it is not the default. */
+ if (strcmp (attrs_str, "default") != 0)
+ {
+ if (fn_opts == NULL_TREE)
+ fn_opts = target_option_default_node;
+
+ if (!fn_opts || !TREE_TARGET_OPTION (fn_opts))
+ isa_masks = rs6000_isa_flags;
+ else
+ isa_masks = TREE_TARGET_OPTION (fn_opts)->x_rs6000_isa_flags;
+
+ for (ret = CLONE_MAX - 1; ret != 0; ret--)
+ if ((rs6000_clone_map[ret].isa_mask & isa_masks) != 0)
+ break;
+ }
+
+ if (TARGET_DEBUG_TARGET)
+ fprintf (stderr, "rs6000_get_function_version_priority (%s) => %d\n",
+ get_decl_name (fndecl), ret);
+
+ return ret;
+}
+
+/* This compares the priority of target features in function DECL1 and DECL2.
+ It returns positive value if DECL1 is higher priority, negative value if
+ DECL2 is higher priority and 0 if they are the same. Note, priorities are
+ ordered from lowest (CLONE_DEFAULT) to highest (currently
+ CLONE_ISA_3_00). */
+
+static int
+rs6000_compare_version_priority (tree decl1, tree decl2)
+{
+ int priority1 = rs6000_clone_priority (decl1);
+ int priority2 = rs6000_clone_priority (decl2);
+ int ret = priority1 - priority2;
+
+ if (TARGET_DEBUG_TARGET)
+ fprintf (stderr, "rs6000_compare_version_priority (%s, %s) => %d\n",
+ get_decl_name (decl1), get_decl_name (decl2), ret);
+
+ return ret;
+}
+
+/* Make a dispatcher declaration for the multi-versioned function DECL.
+ Calls to DECL function will be replaced with calls to the dispatcher
+ by the front-end. Returns the decl of the dispatcher function. */
+
+static tree
+rs6000_get_function_versions_dispatcher (void *decl)
+{
+ tree fn = (tree) decl;
+ struct cgraph_node *node = NULL;
+ struct cgraph_node *default_node = NULL;
+ struct cgraph_function_version_info *node_v = NULL;
+ struct cgraph_function_version_info *first_v = NULL;
+
+ tree dispatch_decl = NULL;
+
+ struct cgraph_function_version_info *default_version_info = NULL;
+ gcc_assert (fn != NULL && DECL_FUNCTION_VERSIONED (fn));
+
+ if (TARGET_DEBUG_TARGET)
+ fprintf (stderr, "rs6000_get_function_versions_dispatcher (%s)\n",
+ get_decl_name (fn));
+
+ node = cgraph_node::get (fn);
+ gcc_assert (node != NULL);
+
+ node_v = node->function_version ();
+ gcc_assert (node_v != NULL);
+
+ if (node_v->dispatcher_resolver != NULL)
+ return node_v->dispatcher_resolver;
+
+ /* Find the default version and make it the first node. */
+ first_v = node_v;
+ /* Go to the beginning of the chain. */
+ while (first_v->prev != NULL)
+ first_v = first_v->prev;
+
+ default_version_info = first_v;
+ while (default_version_info != NULL)
+ {
+ const tree decl2 = default_version_info->this_node->decl;
+ if (is_function_default_version (decl2))
+ break;
+ default_version_info = default_version_info->next;
+ }
+
+ /* If there is no default node, just return NULL. */
+ if (default_version_info == NULL)
+ return NULL;
+
+ /* Make default info the first node. */
+ if (first_v != default_version_info)
+ {
+ default_version_info->prev->next = default_version_info->next;
+ if (default_version_info->next)
+ default_version_info->next->prev = default_version_info->prev;
+ first_v->prev = default_version_info;
+ default_version_info->next = first_v;
+ default_version_info->prev = NULL;
+ }
+
+ default_node = default_version_info->this_node;
+
+#if defined (ASM_OUTPUT_TYPE_DIRECTIVE)
+ if (targetm.has_ifunc_p ())
+ {
+ struct cgraph_function_version_info *it_v = NULL;
+ struct cgraph_node *dispatcher_node = NULL;
+ struct cgraph_function_version_info *dispatcher_version_info = NULL;
+
+ /* Right now, the dispatching is done via ifunc. */
+ dispatch_decl = make_dispatcher_decl (default_node->decl);
+
+ dispatcher_node = cgraph_node::get_create (dispatch_decl);
+ gcc_assert (dispatcher_node != NULL);
+ dispatcher_node->dispatcher_function = 1;
+ dispatcher_version_info
+ = dispatcher_node->insert_new_function_version ();
+ dispatcher_version_info->next = default_version_info;
+ dispatcher_node->definition = 1;
+
+ /* Set the dispatcher for all the versions. */
+ it_v = default_version_info;
+ while (it_v != NULL)
+ {
+ it_v->dispatcher_resolver = dispatch_decl;
+ it_v = it_v->next;
+ }
+ }
+ else
+#endif
+ {
+ error_at (DECL_SOURCE_LOCATION (default_node->decl),
+ "multiversioning needs ifunc which is not supported "
+ "on this target");
+ }
+
+ return dispatch_decl;
+}
+
+/* Make the resolver function decl to dispatch the versions of a multi-
+ versioned function, DEFAULT_DECL. Create an empty basic block in the
+ resolver and store the pointer in EMPTY_BB. Return the decl of the resolver
+ function. */
+
+static tree
+make_resolver_func (const tree default_decl,
+ const tree dispatch_decl,
+ basic_block *empty_bb)
+{
+ /* IFUNC's have to be globally visible. So, if the default_decl is
+ not, then the name of the IFUNC should be made unique. */
+ bool is_uniq = (TREE_PUBLIC (default_decl) == 0);
+
+ /* Append the filename to the resolver function if the versions are
+ not externally visible. This is because the resolver function has
+ to be externally visible for the loader to find it. So, appending
+ the filename will prevent conflicts with a resolver function from
+ another module which is based on the same version name. */
+ char *resolver_name = make_unique_name (default_decl, "resolver", is_uniq);
+
+ /* The resolver function should return a (void *). */
+ tree type = build_function_type_list (ptr_type_node, NULL_TREE);
+ tree decl = build_fn_decl (resolver_name, type);
+ tree decl_name = get_identifier (resolver_name);
+ SET_DECL_ASSEMBLER_NAME (decl, decl_name);
+
+ DECL_NAME (decl) = decl_name;
+ TREE_USED (decl) = 1;
+ DECL_ARTIFICIAL (decl) = 1;
+ DECL_IGNORED_P (decl) = 0;
+ /* IFUNC resolvers have to be externally visible. */
+ TREE_PUBLIC (decl) = 1;
+ DECL_UNINLINABLE (decl) = 1;
+
+ /* Resolver is not external, body is generated. */
+ DECL_EXTERNAL (decl) = 0;
+ DECL_EXTERNAL (dispatch_decl) = 0;
+
+ DECL_CONTEXT (decl) = NULL_TREE;
+ DECL_INITIAL (decl) = make_node (BLOCK);
+ DECL_STATIC_CONSTRUCTOR (decl) = 0;
+
+ if (DECL_COMDAT_GROUP (default_decl) || TREE_PUBLIC (default_decl))
+ {
+ /* In this case, each translation unit with a call to this
+ versioned function will put out a resolver. Ensure it
+ is comdat to keep just one copy. */
+ DECL_COMDAT (decl) = 1;
+ make_decl_one_only (decl, DECL_ASSEMBLER_NAME (decl));
+ }
+
+ /* Build result decl and add to function_decl. */
+ tree t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, ptr_type_node);
+ DECL_ARTIFICIAL (t) = 1;
+ DECL_IGNORED_P (t) = 1;
+ DECL_RESULT (decl) = t;
+
+ gimplify_function_tree (decl);
+ push_cfun (DECL_STRUCT_FUNCTION (decl));
+ *empty_bb = init_lowered_empty_function (decl, false, 0);
+
+ cgraph_node::add_new_function (decl, true);
+ symtab->call_cgraph_insertion_hooks (cgraph_node::get_create (decl));
+
+ pop_cfun ();
+
+ /* Mark dispatch_decl as "ifunc" with resolver as resolver_name. */
+ DECL_ATTRIBUTES (dispatch_decl)
+ = make_attribute ("ifunc", resolver_name, DECL_ATTRIBUTES (dispatch_decl));
+
+ cgraph_node::create_same_body_alias (dispatch_decl, decl);
+ XDELETEVEC (resolver_name);
+ return decl;
+}
+
+/* This adds a condition to the basic_block NEW_BB in function FUNCTION_DECL to
+ return a pointer to VERSION_DECL if we are running on a machine that
+ supports the index CLONE_ISA hardware architecture bits. This function will
+ be called during version dispatch to decide which function version to
+ execute. It returns the basic block at the end, to which more conditions
+ can be added. */
+
+static basic_block
+add_condition_to_bb (tree function_decl, tree version_decl,
+ int clone_isa, basic_block new_bb)
+{
+ push_cfun (DECL_STRUCT_FUNCTION (function_decl));
+
+ gcc_assert (new_bb != NULL);
+ gimple_seq gseq = bb_seq (new_bb);
+
+
+ tree convert_expr = build1 (CONVERT_EXPR, ptr_type_node,
+ build_fold_addr_expr (version_decl));
+ tree result_var = create_tmp_var (ptr_type_node);
+ gimple *convert_stmt = gimple_build_assign (result_var, convert_expr);
+ gimple *return_stmt = gimple_build_return (result_var);
+
+ if (clone_isa == CLONE_DEFAULT)
+ {
+ gimple_seq_add_stmt (&gseq, convert_stmt);
+ gimple_seq_add_stmt (&gseq, return_stmt);
+ set_bb_seq (new_bb, gseq);
+ gimple_set_bb (convert_stmt, new_bb);
+ gimple_set_bb (return_stmt, new_bb);
+ pop_cfun ();
+ return new_bb;
+ }
+
+ tree bool_zero = build_int_cst (bool_int_type_node, 0);
+ tree cond_var = create_tmp_var (bool_int_type_node);
+ tree predicate_decl = rs6000_builtin_decls [(int) RS6000_BUILTIN_CPU_SUPPORTS];
+ const char *arg_str = rs6000_clone_map[clone_isa].name;
+ tree predicate_arg = build_string_literal (strlen (arg_str) + 1, arg_str);
+ gimple *call_cond_stmt = gimple_build_call (predicate_decl, 1, predicate_arg);
+ gimple_call_set_lhs (call_cond_stmt, cond_var);
+
+ gimple_set_block (call_cond_stmt, DECL_INITIAL (function_decl));
+ gimple_set_bb (call_cond_stmt, new_bb);
+ gimple_seq_add_stmt (&gseq, call_cond_stmt);
+
+ gimple *if_else_stmt = gimple_build_cond (NE_EXPR, cond_var, bool_zero,
+ NULL_TREE, NULL_TREE);
+ gimple_set_block (if_else_stmt, DECL_INITIAL (function_decl));
+ gimple_set_bb (if_else_stmt, new_bb);
+ gimple_seq_add_stmt (&gseq, if_else_stmt);
+
+ gimple_seq_add_stmt (&gseq, convert_stmt);
+ gimple_seq_add_stmt (&gseq, return_stmt);
+ set_bb_seq (new_bb, gseq);
+
+ basic_block bb1 = new_bb;
+ edge e12 = split_block (bb1, if_else_stmt);
+ basic_block bb2 = e12->dest;
+ e12->flags &= ~EDGE_FALLTHRU;
+ e12->flags |= EDGE_TRUE_VALUE;
+
+ edge e23 = split_block (bb2, return_stmt);
+ gimple_set_bb (convert_stmt, bb2);
+ gimple_set_bb (return_stmt, bb2);
+
+ basic_block bb3 = e23->dest;
+ make_edge (bb1, bb3, EDGE_FALSE_VALUE);
+
+ remove_edge (e23);
+ make_edge (bb2, EXIT_BLOCK_PTR_FOR_FN (cfun), 0);
+
+ pop_cfun ();
+ return bb3;
+}
+
+/* This function generates the dispatch function for multi-versioned functions.
+ DISPATCH_DECL is the function which will contain the dispatch logic.
+ FNDECLS are the function choices for dispatch, and is a tree chain.
+ EMPTY_BB is the basic block pointer in DISPATCH_DECL in which the dispatch
+ code is generated. */
+
+static int
+dispatch_function_versions (tree dispatch_decl,
+ void *fndecls_p,
+ basic_block *empty_bb)
+{
+ int ix;
+ tree ele;
+ vec<tree> *fndecls;
+ tree clones[CLONE_MAX];
+
+ if (TARGET_DEBUG_TARGET)
+ fputs ("dispatch_function_versions, top\n", stderr);
+
+ gcc_assert (dispatch_decl != NULL
+ && fndecls_p != NULL
+ && empty_bb != NULL);
+
+ /* fndecls_p is actually a vector. */
+ fndecls = static_cast<vec<tree> *> (fndecls_p);
+
+ /* At least one more version other than the default. */
+ gcc_assert (fndecls->length () >= 2);
+
+ /* The first version in the vector is the default decl. */
+ memset ((void *) clones, '\0', sizeof (clones));
+ clones[CLONE_DEFAULT] = (*fndecls)[0];
+
+ /* On the PowerPC, we do not need to call __builtin_cpu_init, which is a NOP
+ on the PowerPC (on the x86_64, it is not a NOP). The builtin function
+ __builtin_cpu_supports ensures that the TOC fields are set up by requiring a
+ recent glibc. If we ever need to call __builtin_cpu_init, we would need
+ to insert the code here to do the call. */
+
+ for (ix = 1; fndecls->iterate (ix, &ele); ++ix)
+ {
+ int priority = rs6000_clone_priority (ele);
+ if (!clones[priority])
+ clones[priority] = ele;
+ }
+
+ for (ix = CLONE_MAX - 1; ix >= 0; ix--)
+ if (clones[ix])
+ {
+ if (TARGET_DEBUG_TARGET)
+ fprintf (stderr, "dispatch_function_versions, clone %d, %s\n",
+ ix, get_decl_name (clones[ix]));
+
+ *empty_bb = add_condition_to_bb (dispatch_decl, clones[ix], ix,
+ *empty_bb);
+ }
+
+ return 0;
+}
+
+/* Generate the dispatching code body to dispatch multi-versioned function
+ DECL. The target hook is called to process the "target" attributes and
+ provide the code to dispatch the right function at run-time. NODE points
+ to the dispatcher decl whose body will be created. */
+
+static tree
+rs6000_generate_version_dispatcher_body (void *node_p)
+{
+ tree resolver;
+ basic_block empty_bb;
+ struct cgraph_node *node = (cgraph_node *) node_p;
+ struct cgraph_function_version_info *ninfo = node->function_version ();
+
+ if (ninfo->dispatcher_resolver)
+ return ninfo->dispatcher_resolver;
+
+ /* node is going to be an alias, so remove the finalized bit. */
+ node->definition = false;
+
+ /* The first version in the chain corresponds to the default version. */
+ ninfo->dispatcher_resolver = resolver
+ = make_resolver_func (ninfo->next->this_node->decl, node->decl, &empty_bb);
+
+ if (TARGET_DEBUG_TARGET)
+ fprintf (stderr, "rs6000_generate_version_dispatcher_body, %s\n",
+ get_decl_name (resolver));
+
+ push_cfun (DECL_STRUCT_FUNCTION (resolver));
+ auto_vec<tree, 2> fn_ver_vec;
+
+ for (struct cgraph_function_version_info *vinfo = ninfo->next;
+ vinfo;
+ vinfo = vinfo->next)
+ {
+ struct cgraph_node *version = vinfo->this_node;
+ /* Check for virtual functions here again, as by this time it should
+ have been determined if this function needs a vtable index or
+ not. This happens for methods in derived classes that override
+ virtual methods in base classes but are not explicitly marked as
+ virtual. */
+ if (DECL_VINDEX (version->decl))
+ sorry ("Virtual function multiversioning not supported");
+
+ fn_ver_vec.safe_push (version->decl);
+ }
+
+ dispatch_function_versions (resolver, &fn_ver_vec, &empty_bb);
+ cgraph_edge::rebuild_edges ();
+ pop_cfun ();
+ return resolver;
+}
+
+\f
/* Hook to determine if one function can safely inline another. */
static bool
@@ -40165,12 +40669,7 @@ rs6000_can_inline_p (tree caller, tree c
if (TARGET_DEBUG_TARGET)
fprintf (stderr, "rs6000_can_inline_p:, caller %s, callee %s, %s inline\n",
- (DECL_NAME (caller)
- ? IDENTIFIER_POINTER (DECL_NAME (caller))
- : "<unknown>"),
- (DECL_NAME (callee)
- ? IDENTIFIER_POINTER (DECL_NAME (callee))
- : "<unknown>"),
+ get_decl_name (caller), get_decl_name (callee),
(ret ? "can" : "cannot"));
return ret;
@@ -40828,7 +41327,7 @@ bool
fusion_gpr_load_p (rtx addis_reg, /* register set via addis. */
rtx addis_value, /* addis value. */
rtx target, /* target register that is loaded. */
- rtx mem) /* bottom part of the memory addr. */
+ rtx mem) /* bottom part of the memory addr. */
{
rtx addr;
rtx base_reg;
Index: gcc/doc/extend.texi
===================================================================
--- gcc/doc/extend.texi (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/doc) (revision 248759)
+++ gcc/doc/extend.texi (.../gcc/doc) (working copy)
@@ -3257,7 +3257,15 @@ For instance, on an x86, you could compi
@code{target_clones("sse4.1,avx")}. GCC creates two function clones,
one compiled with @option{-msse4.1} and another with @option{-mavx}.
It also creates a resolver function (see the @code{ifunc} attribute
-above) that dynamically selects a clone suitable for current architecture.
+above) that dynamically selects a clone suitable for current
+architecture.
+
+On a PowerPC, you can compile a function with
+@code{target_clones("cpu=power9,default")}. GCC will create two
+function clones, one compiled with @option{-mcpu=power9} and another
+with the default options. It also creates a resolver function (see
+the @code{ifunc} attribute above) that dynamically selects a clone
+suitable for current architecture.
@item unused
@cindex @code{unused} function attribute
Index: gcc/testsuite/gcc.target/powerpc/clone1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/clone1.c (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc) (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/clone1.c (.../gcc/testsuite/gcc.target/powerpc) (revision 248762)
@@ -0,0 +1,26 @@
+/* { dg-do compile { target { powerpc*-*-linux* && lp64 } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
+/* { dg-options "-mcpu=power8 -O2" } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+
+/* Power9 (aka, ISA 3.0) has a MODSD instruction to do modulus, while Power8
+ (aka, ISA 2.07) has to do modulus with divide and multiply. Make sure
+ both clone functions are generated.
+
+ Restrict ourselves to Linux, since IFUNC might not be supported in other
+ operating systems. */
+
+__attribute__((target_clones("cpu=power9,default")))
+long mod_func (long a, long b)
+{
+ return a % b;
+}
+
+long mod_func_or (long a, long b, long c)
+{
+ return mod_func (a, b) | c;
+}
+
+/* { dg-final { scan-assembler-times {\mdivd\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mmulld\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mmodsd\M} 1 } } */
* Re: [PATCH] Add attribute((target_clone(...))) to PowerPC
2017-05-31 23:15 ` Michael Meissner
@ 2017-06-01 0:20 ` Michael Meissner
2017-06-01 20:43 ` Segher Boessenkool
1 sibling, 0 replies; 13+ messages in thread
From: Michael Meissner @ 2017-06-01 0:20 UTC (permalink / raw)
To: Michael Meissner, Segher Boessenkool, Florian Weimer,
GCC Patches, David Edelsohn, Bill Schmidt
On Wed, May 31, 2017 at 06:33:37PM -0400, Michael Meissner wrote:
> Here is the updated version of the target_clone attribute support.
>
> The changes include:
>
> 1) Change the order of the CLONE priority list from default being 0 to ISA 3.0
> being the highest (and eliminating the enum and casts).
Just to be clear, I meant that I changed the order of the CLONE priority lists
so that they now start at 0 (default) and go up to ISA 3.0. Previously, ISA
3.0 was 0, and default was the highest.
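As an illustration of the new ordering, here is a minimal standalone sketch. It is not the patch code itself: the mask bits and the helper names clone_priority, compare_priority and check_ordering are hypothetical stand-ins for the rs6000 OPTION_MASK_* flags and for rs6000_clone_priority / rs6000_compare_version_priority.

```cpp
#include <cassert>
#include <cstdint>

// Clone ids, ordered from lowest priority (default) to highest (ISA 3.0).
enum {
  CLONE_DEFAULT = 0,
  CLONE_ISA_2_05,
  CLONE_ISA_2_06,
  CLONE_ISA_2_07,
  CLONE_ISA_3_00,
  CLONE_MAX
};

// Hypothetical ISA mask bits (assumption); the patch uses OPTION_MASK_* values.
static const std::uint64_t clone_mask[CLONE_MAX] = {
  0,        // default options
  1u << 0,  // ISA 2.05 (power6)
  1u << 1,  // ISA 2.06 (power7)
  1u << 2,  // ISA 2.07 (power8)
  1u << 3,  // ISA 3.00 (power9)
};

// Scan from the highest ISA down; the first mask that matches gives the
// priority.  If nothing matches, the loop ends at 0 (CLONE_DEFAULT).
static int clone_priority(std::uint64_t isa_flags)
{
  int ret;
  for (ret = CLONE_MAX - 1; ret != 0; ret--)
    if ((clone_mask[ret] & isa_flags) != 0)
      break;
  return ret;
}

// Positive if the first set of flags has the higher priority, negative if
// the second does, and 0 if they are the same.
static int compare_priority(std::uint64_t flags1, std::uint64_t flags2)
{
  return clone_priority(flags1) - clone_priority(flags2);
}

// Usage: a power9 build outranks a power8 build, which outranks the default.
static void check_ordering()
{
  const std::uint64_t power8 = 0x7;  // ISA 2.05 + 2.06 + 2.07 bits
  const std::uint64_t power9 = 0xF;  // all of the above plus ISA 3.00
  assert(compare_priority(power9, power8) > 0);
  assert(compare_priority(power8, 0) > 0);
  assert(compare_priority(0, 0) == 0);
}
```

With default at 0, the comparison is a plain subtraction and a larger value always means a newer ISA.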
--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797
* Re: [PATCH] Add attribute((target_clone(...))) to PowerPC
2017-05-31 23:15 ` Michael Meissner
2017-06-01 0:20 ` Michael Meissner
@ 2017-06-01 20:43 ` Segher Boessenkool
2017-06-02 14:16 ` Michael Meissner
1 sibling, 1 reply; 13+ messages in thread
From: Segher Boessenkool @ 2017-06-01 20:43 UTC (permalink / raw)
To: Michael Meissner, Florian Weimer, GCC Patches, David Edelsohn,
Bill Schmidt
Hi Mike,
On Wed, May 31, 2017 at 06:33:37PM -0400, Michael Meissner wrote:
> +/* On PowerPC, we have a limited number of target clones that we care about
> + which means we can use an array to hold the options, rather than having more
> + elaborate data structures to identify each possible variation. Order the
> + clones from the default to the highest ISA. */
> +const int CLONE_DEFAULT = 0; /* default clone. */
> +const int CLONE_ISA_2_05 = 1; /* ISA 2.05 (power6). */
> +const int CLONE_ISA_2_06 = 2; /* ISA 2.06 (power7). */
> +const int CLONE_ISA_2_07 = 3; /* ISA 2.07 (power8). */
> +const int CLONE_ISA_3_00 = 4; /* ISA 3.00 (power9). */
> +const int CLONE_MAX = 5;
With "you don't have to give the enum a name" I meant write it as
enum {
CLONE_DEFAULT = 0,
CLONE_ISA_2_05,
[...]
CLONE_MAX
};
If you do "const int", I think it should be "static const int"?
> +/* Helper function for printing the function name when debugging. */
> +
> +static const char *
> +get_decl_name (tree fn)
> +{
> + tree name;
> +
> + if (!fn)
> + return "<null>";
> +
> + name = DECL_NAME (fn);
> + if (!name)
> + return "<no-name>";
> +
> + return IDENTIFIER_POINTER (name);
> +}
Perhaps this would be useful to have in generic code?
> +rs6000_clone_priority (tree fndecl)
> +{
> + tree fn_opts = DECL_FUNCTION_SPECIFIC_TARGET (fndecl);
> + HOST_WIDE_INT isa_masks;
> + int ret = (int) CLONE_DEFAULT;
You don't need this cast afaics.
> + tree attrs = lookup_attribute ("target", DECL_ATTRIBUTES (fndecl));
> + const char *attrs_str = NULL;
> +
> + gcc_assert (attrs != NULL);
> + attrs = TREE_VALUE (TREE_VALUE (attrs));
> +
> + gcc_assert (TREE_CODE (attrs) == STRING_CST);
> + attrs_str = TREE_STRING_POINTER (attrs);
Nor do you need these asserts. There are more of these: if the code
immediately following an assert will obviously fail (in an obvious way)
when the assert is false, then the assert is just noise; it makes reading
the code harder instead of easier.
> +/* This compares the priority of target features in function DECL1 and DECL2.
> + It returns positive value if DECL1 is higher priority, negative value if
> + DECL2 is higher priority and 0 if they are the same. Note, priorities are
> + ordered from lowest (currently CLONE_ISA_3_0) to highest
> + (CLONE_DEFAULT). */
This comment needs updating? Swap CLONE_ISA_3_0 with CLONE_DEFAULT?
> +#if defined (ASM_OUTPUT_TYPE_DIRECTIVE)
> + if (targetm.has_ifunc_p ())
Hrm, I still don't see what you need the #ifdef for. What in the
following code won't compile without it? Or does targetm.has_ifunc_p
return the wrong answer?
> + {
> + struct cgraph_function_version_info *it_v = NULL;
> + struct cgraph_node *dispatcher_node = NULL;
> + struct cgraph_function_version_info *dispatcher_version_info = NULL;
> +
> + /* Right now, the dispatching is done via ifunc. */
> + dispatch_decl = make_dispatcher_decl (default_node->decl);
> +
> + dispatcher_node = cgraph_node::get_create (dispatch_decl);
> + gcc_assert (dispatcher_node != NULL);
> + dispatcher_node->dispatcher_function = 1;
> + dispatcher_version_info
> + = dispatcher_node->insert_new_function_version ();
> + dispatcher_version_info->next = default_version_info;
> + dispatcher_node->definition = 1;
> +
> + /* Set the dispatcher for all the versions. */
> + it_v = default_version_info;
> + while (it_v != NULL)
> + {
> + it_v->dispatcher_resolver = dispatch_decl;
> + it_v = it_v->next;
> + }
> + }
> + else
> +#endif
> + /* On the PowerPC, we do not need to call __builtin_cpu_init, which is a NOP
> + on the PowerPC (on the x86_64, it is not a NOP). The builtin function
> + __builtin_cpu_support ensures that the TOC fields are setup by requiring a
> + recent glibc. If we ever need to call __builtin_cpu_init, we would need
> + to insert the code here to do the call. */
Ah cool, thanks :-)
Segher
* Re: [PATCH] Add attribute((target_clone(...))) to PowerPC
2017-06-01 20:43 ` Segher Boessenkool
@ 2017-06-02 14:16 ` Michael Meissner
2017-06-02 16:56 ` Segher Boessenkool
0 siblings, 1 reply; 13+ messages in thread
From: Michael Meissner @ 2017-06-02 14:16 UTC (permalink / raw)
To: Segher Boessenkool
Cc: Michael Meissner, Florian Weimer, GCC Patches, David Edelsohn,
Bill Schmidt
[-- Attachment #1: Type: text/plain, Size: 5074 bytes --]
On Thu, Jun 01, 2017 at 03:43:22PM -0500, Segher Boessenkool wrote:
> Hi Mike,
>
> On Wed, May 31, 2017 at 06:33:37PM -0400, Michael Meissner wrote:
> > +/* On PowerPC, we have a limited number of target clones that we care about
> > + which means we can use an array to hold the options, rather than having more
> > + elaborate data structures to identify each possible variation. Order the
> > + clones from the default to the highest ISA. */
> > +const int CLONE_DEFAULT = 0; /* default clone. */
> > +const int CLONE_ISA_2_05 = 1; /* ISA 2.05 (power6). */
> > +const int CLONE_ISA_2_06 = 2; /* ISA 2.06 (power7). */
> > +const int CLONE_ISA_2_07 = 3; /* ISA 2.07 (power8). */
> > +const int CLONE_ISA_3_00 = 4; /* ISA 3.00 (power9). */
> > +const int CLONE_MAX = 5;
>
> With "you don't have to give the enum a name" I meant write it as
>
> enum {
> CLONE_DEFAULT = 0,
> CLONE_ISA_2_05,
> [...]
> CLONE_MAX
> };
>
> If you do "const int", I think it should be "static const int"?
Ok. I think I was under the impression that enums were more tightly typed in
C++ than in C, and that you needed explicit casts to and from integers.
> > +/* Helper function for printing the function name when debugging. */
> > +
> > +static const char *
> > +get_decl_name (tree fn)
> > +{
> > + tree name;
> > +
> > + if (!fn)
> > + return "<null>";
> > +
> > + name = DECL_NAME (fn);
> > + if (!name)
> > + return "<no-name>";
> > +
> > + return IDENTIFIER_POINTER (name);
> > +}
>
> Perhaps this would be useful to have in generic code?
Perhaps, but it is just for printing debug messages. I moved it to a separate
function to simplify indentation issues.
> > +rs6000_clone_priority (tree fndecl)
> > +{
> > + tree fn_opts = DECL_FUNCTION_SPECIFIC_TARGET (fndecl);
> > + HOST_WIDE_INT isa_masks;
> > + int ret = (int) CLONE_DEFAULT;
>
> You don't need this cast afaics.
>
> > + tree attrs = lookup_attribute ("target", DECL_ATTRIBUTES (fndecl));
> > + const char *attrs_str = NULL;
> > +
> > + gcc_assert (attrs != NULL);
> > + attrs = TREE_VALUE (TREE_VALUE (attrs));
> > +
> > + gcc_assert (TREE_CODE (attrs) == STRING_CST);
> > + attrs_str = TREE_STRING_POINTER (attrs);
>
> And these asserts neither. There are more of these: if the code
> immediately following an assert will obviously fail (in an obvious way)
> if the assert is false, then the assert is just noise, makes reading
> the code harder instead of easier.
Ok.
> > +/* This compares the priority of target features in function DECL1 and DECL2.
> > + It returns positive value if DECL1 is higher priority, negative value if
> > + DECL2 is higher priority and 0 if they are the same. Note, priorities are
> > + ordered from lowest (currently CLONE_ISA_3_0) to highest
> > + (CLONE_DEFAULT). */
>
> This comment needs updating? Swap CLONE_ISA_3_0 with CLONE_DEFAULT?
Yes, I missed updating this comment.
> > +#if defined (ASM_OUTPUT_TYPE_DIRECTIVE)
> > + if (targetm.has_ifunc_p ())
>
> Hrm, I still don't see what you need the #ifdef for. What in the
> following code won't compile without it? Or does targetm.has_ifunc_p
> return the wrong answer?
Right now, we only enable ifunc by default under Linux, so I removed the
#ifdef. We will see if it breaks on non-Linux systems.
Are these patches ok to install?
[gcc]
2017-06-02 Michael Meissner <meissner@linux.vnet.ibm.com>
* config/rs6000/rs6000.c (toplevel): Include attribs.h.
(CLONE_*): New constants to define the processors we can generate
code for with the target_clone attribute.
(rs6000_clone_map): New array to identify which clone processors
the current program is running on.
(TARGET_COMPARE_VERSION_PRIORITY): Define to enable the
target_clone attribute.
(TARGET_GENERATE_VERSION_DISPATCHER_BODY): Likewise.
(TARGET_GET_FUNCTION_VERSIONS_DISPATCHER): Likewise.
(TARGET_OPTION_FUNCTION_VERSIONS): Likewise.
(cpu_expand_builtin): Add support for target_clone attribute.
(rs6000_valid_attribute_p): Allow "default" attribute.
(get_decl_name): New debug function to simplify printing the
current function name in debugging statements.
(rs6000_clone_priority): New functions to support the target_clone
attribute, and be able to generate code to switch between ISA 2.05
through ISA 3.0 (power6 through power9).
(rs6000_compare_version_priority): Likewise.
(rs6000_get_function_versions_dispatcher): Likewise.
(make_resolver_func): Likewise.
(add_condition_to_bb): Likewise.
(dispatch_function_versions): Likewise.
(rs6000_generate_version_dispatcher_body): Likewise.
(rs6000_can_inline_p): Call get_decl_name for debugging usage.
(fusion_gpr_load_p): Fix a spacing issue.
* doc/extend.texi (Common Function Attributes): Document that the
PowerPC supports the target_clone attribute.
[gcc/testsuite]
2017-06-02 Michael Meissner <meissner@linux.vnet.ibm.com>
* gcc.target/powerpc/clone1.c: New test.
--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797
[-- Attachment #2: clone.patch08b --]
[-- Type: text/plain, Size: 21897 bytes --]
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c (revision 248759)
+++ gcc/config/rs6000/rs6000.c (working copy)
@@ -42,6 +42,7 @@
#include "flags.h"
#include "alias.h"
#include "fold-const.h"
+#include "attribs.h"
#include "stor-layout.h"
#include "calls.h"
#include "print-tree.h"
@@ -386,6 +387,34 @@ static const struct
{ "ieee128", PPC_FEATURE2_HAS_IEEE128, 1 }
};
+/* On PowerPC, we have a limited number of target clones that we care about
+ which means we can use an array to hold the options, rather than having more
+ elaborate data structures to identify each possible variation. Order the
+ clones from the default to the highest ISA. */
+enum {
+ CLONE_DEFAULT = 0, /* default clone. */
+ CLONE_ISA_2_05, /* ISA 2.05 (power6). */
+ CLONE_ISA_2_06, /* ISA 2.06 (power7). */
+ CLONE_ISA_2_07, /* ISA 2.07 (power8). */
+ CLONE_ISA_3_00, /* ISA 3.00 (power9). */
+ CLONE_MAX
+};
+
+/* Map compiler ISA bits into HWCAP names. */
+struct clone_map {
+ HOST_WIDE_INT isa_mask; /* rs6000_isa mask */
+ const char *name; /* name to use in __builtin_cpu_supports. */
+};
+
+static const struct clone_map rs6000_clone_map[CLONE_MAX] = {
+ { 0, "" }, /* Default options. */
+ { OPTION_MASK_CMPB, "arch_2_05" }, /* ISA 2.05 (power6). */
+ { OPTION_MASK_POPCNTD, "arch_2_06" }, /* ISA 2.06 (power7). */
+ { OPTION_MASK_P8_VECTOR, "arch_2_07" }, /* ISA 2.07 (power8). */
+ { OPTION_MASK_P9_VECTOR, "arch_3_00" }, /* ISA 3.00 (power9). */
+};
+
+
/* Newer LIBCs explicitly export this symbol to declare that they provide
the AT_PLATFORM and AT_HWCAP/AT_HWCAP2 values in the TCB. We emit a
reference to this symbol whenever we expand a CPU builtin, so that
@@ -1971,6 +2000,21 @@ static const struct attribute_spec rs600
#undef TARGET_CUSTOM_FUNCTION_DESCRIPTORS
#define TARGET_CUSTOM_FUNCTION_DESCRIPTORS 1
+
+#undef TARGET_COMPARE_VERSION_PRIORITY
+#define TARGET_COMPARE_VERSION_PRIORITY rs6000_compare_version_priority
+
+#undef TARGET_GENERATE_VERSION_DISPATCHER_BODY
+#define TARGET_GENERATE_VERSION_DISPATCHER_BODY \
+ rs6000_generate_version_dispatcher_body
+
+#undef TARGET_GET_FUNCTION_VERSIONS_DISPATCHER
+#define TARGET_GET_FUNCTION_VERSIONS_DISPATCHER \
+ rs6000_get_function_versions_dispatcher
+
+#undef TARGET_OPTION_FUNCTION_VERSIONS
+#define TARGET_OPTION_FUNCTION_VERSIONS common_function_versions
+
\f
/* Processor table. */
@@ -15611,6 +15655,14 @@ cpu_expand_builtin (enum rs6000_builtins
#ifdef TARGET_LIBC_PROVIDES_HWCAP_IN_TCB
tree arg = TREE_OPERAND (CALL_EXPR_ARG (exp, 0), 0);
+ /* Target clones create an ARRAY_REF instead of a STRING_CST; convert it back
+ to a STRING_CST. */
+ if (TREE_CODE (arg) == ARRAY_REF
+ && TREE_CODE (TREE_OPERAND (arg, 0)) == STRING_CST
+ && TREE_CODE (TREE_OPERAND (arg, 1)) == INTEGER_CST
+ && compare_tree_int (TREE_OPERAND (arg, 1), 0) == 0)
+ arg = TREE_OPERAND (arg, 0);
+
if (TREE_CODE (arg) != STRING_CST)
{
error ("builtin %s only accepts a string argument",
@@ -39700,6 +39752,14 @@ rs6000_valid_attribute_p (tree fndecl,
fprintf (stderr, "--------------------\n");
}
+ /* attribute((target("default"))) does nothing, beyond
+ affecting multi-versioning. */
+ if (TREE_VALUE (args)
+ && TREE_CODE (TREE_VALUE (args)) == STRING_CST
+ && TREE_CHAIN (args) == NULL_TREE
+ && strcmp (TREE_STRING_POINTER (TREE_VALUE (args)), "default") == 0)
+ return true;
+
old_optimize = build_optimization_node (&global_options);
func_optimize = DECL_FUNCTION_SPECIFIC_OPTIMIZATION (fndecl);
@@ -40132,6 +40192,446 @@ rs6000_disable_incompatible_switches (vo
}
\f
+/* Helper function for printing the function name when debugging. */
+
+static const char *
+get_decl_name (tree fn)
+{
+ tree name;
+
+ if (!fn)
+ return "<null>";
+
+ name = DECL_NAME (fn);
+ if (!name)
+ return "<no-name>";
+
+ return IDENTIFIER_POINTER (name);
+}
+
+/* Return the clone id of the target we are compiling code for in a target
+ clone. The clone id is ordered from 0 (default) to CLONE_MAX-1 and gives
+ the priority list for the target clones (ordered from lowest to
+ highest). */
+
+static int
+rs6000_clone_priority (tree fndecl)
+{
+ tree fn_opts = DECL_FUNCTION_SPECIFIC_TARGET (fndecl);
+ HOST_WIDE_INT isa_masks;
+ int ret = CLONE_DEFAULT;
+ tree attrs = lookup_attribute ("target", DECL_ATTRIBUTES (fndecl));
+ const char *attrs_str = NULL;
+
+ attrs = TREE_VALUE (TREE_VALUE (attrs));
+ attrs_str = TREE_STRING_POINTER (attrs);
+
+ /* Return priority zero for default function. Return the ISA needed for the
+ function if it is not the default. */
+ if (strcmp (attrs_str, "default") != 0)
+ {
+ if (fn_opts == NULL_TREE)
+ fn_opts = target_option_default_node;
+
+ if (!fn_opts || !TREE_TARGET_OPTION (fn_opts))
+ isa_masks = rs6000_isa_flags;
+ else
+ isa_masks = TREE_TARGET_OPTION (fn_opts)->x_rs6000_isa_flags;
+
+ for (ret = CLONE_MAX - 1; ret != 0; ret--)
+ if ((rs6000_clone_map[ret].isa_mask & isa_masks) != 0)
+ break;
+ }
+
+ if (TARGET_DEBUG_TARGET)
+ fprintf (stderr, "rs6000_clone_priority (%s) => %d\n",
+ get_decl_name (fndecl), ret);
+
+ return ret;
+}
+
+/* This compares the priority of target features in function DECL1 and DECL2.
+ It returns a positive value if DECL1 is higher priority, a negative value if
+ DECL2 is higher priority, and 0 if they are the same. Note, priorities are
+ ordered from lowest (CLONE_DEFAULT) to highest (currently CLONE_ISA_3_00). */
+
+static int
+rs6000_compare_version_priority (tree decl1, tree decl2)
+{
+ int priority1 = rs6000_clone_priority (decl1);
+ int priority2 = rs6000_clone_priority (decl2);
+ int ret = priority1 - priority2;
+
+ if (TARGET_DEBUG_TARGET)
+ fprintf (stderr, "rs6000_compare_version_priority (%s, %s) => %d\n",
+ get_decl_name (decl1), get_decl_name (decl2), ret);
+
+ return ret;
+}
+
+/* Make a dispatcher declaration for the multi-versioned function DECL.
+ Calls to DECL function will be replaced with calls to the dispatcher
+ by the front-end. Returns the decl of the dispatcher function. */
+
+static tree
+rs6000_get_function_versions_dispatcher (void *decl)
+{
+ tree fn = (tree) decl;
+ struct cgraph_node *node = NULL;
+ struct cgraph_node *default_node = NULL;
+ struct cgraph_function_version_info *node_v = NULL;
+ struct cgraph_function_version_info *first_v = NULL;
+
+ tree dispatch_decl = NULL;
+
+ struct cgraph_function_version_info *default_version_info = NULL;
+ gcc_assert (fn != NULL && DECL_FUNCTION_VERSIONED (fn));
+
+ if (TARGET_DEBUG_TARGET)
+ fprintf (stderr, "rs6000_get_function_versions_dispatcher (%s)\n",
+ get_decl_name (fn));
+
+ node = cgraph_node::get (fn);
+ gcc_assert (node != NULL);
+
+ node_v = node->function_version ();
+ gcc_assert (node_v != NULL);
+
+ if (node_v->dispatcher_resolver != NULL)
+ return node_v->dispatcher_resolver;
+
+ /* Find the default version and make it the first node. */
+ first_v = node_v;
+ /* Go to the beginning of the chain. */
+ while (first_v->prev != NULL)
+ first_v = first_v->prev;
+
+ default_version_info = first_v;
+ while (default_version_info != NULL)
+ {
+ const tree decl2 = default_version_info->this_node->decl;
+ if (is_function_default_version (decl2))
+ break;
+ default_version_info = default_version_info->next;
+ }
+
+ /* If there is no default node, just return NULL. */
+ if (default_version_info == NULL)
+ return NULL;
+
+ /* Make default info the first node. */
+ if (first_v != default_version_info)
+ {
+ default_version_info->prev->next = default_version_info->next;
+ if (default_version_info->next)
+ default_version_info->next->prev = default_version_info->prev;
+ first_v->prev = default_version_info;
+ default_version_info->next = first_v;
+ default_version_info->prev = NULL;
+ }
+
+ default_node = default_version_info->this_node;
+
+ if (targetm.has_ifunc_p ())
+ {
+ struct cgraph_function_version_info *it_v = NULL;
+ struct cgraph_node *dispatcher_node = NULL;
+ struct cgraph_function_version_info *dispatcher_version_info = NULL;
+
+ /* Right now, the dispatching is done via ifunc. */
+ dispatch_decl = make_dispatcher_decl (default_node->decl);
+
+ dispatcher_node = cgraph_node::get_create (dispatch_decl);
+ gcc_assert (dispatcher_node != NULL);
+ dispatcher_node->dispatcher_function = 1;
+ dispatcher_version_info
+ = dispatcher_node->insert_new_function_version ();
+ dispatcher_version_info->next = default_version_info;
+ dispatcher_node->definition = 1;
+
+ /* Set the dispatcher for all the versions. */
+ it_v = default_version_info;
+ while (it_v != NULL)
+ {
+ it_v->dispatcher_resolver = dispatch_decl;
+ it_v = it_v->next;
+ }
+ }
+ else
+ {
+ error_at (DECL_SOURCE_LOCATION (default_node->decl),
+ "multiversioning needs ifunc which is not supported "
+ "on this target");
+ }
+
+ return dispatch_decl;
+}
+
+/* Make the resolver function decl to dispatch the versions of a multi-
+ versioned function, DEFAULT_DECL. Create an empty basic block in the
+ resolver and store the pointer in EMPTY_BB. Return the decl of the resolver
+ function. */
+
+static tree
+make_resolver_func (const tree default_decl,
+ const tree dispatch_decl,
+ basic_block *empty_bb)
+{
+ /* IFUNC's have to be globally visible. So, if the default_decl is
+ not, then the name of the IFUNC should be made unique. */
+ bool is_uniq = (TREE_PUBLIC (default_decl) == 0);
+
+ /* Append the filename to the resolver function if the versions are
+ not externally visible. This is because the resolver function has
+ to be externally visible for the loader to find it. So, appending
+ the filename will prevent conflicts with a resolver function from
+ another module which is based on the same version name. */
+ char *resolver_name = make_unique_name (default_decl, "resolver", is_uniq);
+
+ /* The resolver function should return a (void *). */
+ tree type = build_function_type_list (ptr_type_node, NULL_TREE);
+ tree decl = build_fn_decl (resolver_name, type);
+ tree decl_name = get_identifier (resolver_name);
+ SET_DECL_ASSEMBLER_NAME (decl, decl_name);
+
+ DECL_NAME (decl) = decl_name;
+ TREE_USED (decl) = 1;
+ DECL_ARTIFICIAL (decl) = 1;
+ DECL_IGNORED_P (decl) = 0;
+ /* IFUNC resolvers have to be externally visible. */
+ TREE_PUBLIC (decl) = 1;
+ DECL_UNINLINABLE (decl) = 1;
+
+ /* Resolver is not external, body is generated. */
+ DECL_EXTERNAL (decl) = 0;
+ DECL_EXTERNAL (dispatch_decl) = 0;
+
+ DECL_CONTEXT (decl) = NULL_TREE;
+ DECL_INITIAL (decl) = make_node (BLOCK);
+ DECL_STATIC_CONSTRUCTOR (decl) = 0;
+
+ if (DECL_COMDAT_GROUP (default_decl) || TREE_PUBLIC (default_decl))
+ {
+ /* In this case, each translation unit with a call to this
+ versioned function will put out a resolver. Ensure it
+ is comdat to keep just one copy. */
+ DECL_COMDAT (decl) = 1;
+ make_decl_one_only (decl, DECL_ASSEMBLER_NAME (decl));
+ }
+
+ /* Build result decl and add to function_decl. */
+ tree t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, ptr_type_node);
+ DECL_ARTIFICIAL (t) = 1;
+ DECL_IGNORED_P (t) = 1;
+ DECL_RESULT (decl) = t;
+
+ gimplify_function_tree (decl);
+ push_cfun (DECL_STRUCT_FUNCTION (decl));
+ *empty_bb = init_lowered_empty_function (decl, false, 0);
+
+ cgraph_node::add_new_function (decl, true);
+ symtab->call_cgraph_insertion_hooks (cgraph_node::get_create (decl));
+
+ pop_cfun ();
+
+ /* Mark dispatch_decl as "ifunc" with resolver as resolver_name. */
+ DECL_ATTRIBUTES (dispatch_decl)
+ = make_attribute ("ifunc", resolver_name, DECL_ATTRIBUTES (dispatch_decl));
+
+ cgraph_node::create_same_body_alias (dispatch_decl, decl);
+ XDELETEVEC (resolver_name);
+ return decl;
+}
+
+/* This adds a condition to the basic_block NEW_BB in function FUNCTION_DECL to
+ return a pointer to VERSION_DECL if we are running on a machine that
+ supports the index CLONE_ISA hardware architecture bits. This function will
+ be called during version dispatch to decide which function version to
+ execute. It returns the basic block at the end, to which more conditions
+ can be added. */
+
+static basic_block
+add_condition_to_bb (tree function_decl, tree version_decl,
+ int clone_isa, basic_block new_bb)
+{
+ push_cfun (DECL_STRUCT_FUNCTION (function_decl));
+
+ gcc_assert (new_bb != NULL);
+ gimple_seq gseq = bb_seq (new_bb);
+
+
+ tree convert_expr = build1 (CONVERT_EXPR, ptr_type_node,
+ build_fold_addr_expr (version_decl));
+ tree result_var = create_tmp_var (ptr_type_node);
+ gimple *convert_stmt = gimple_build_assign (result_var, convert_expr);
+ gimple *return_stmt = gimple_build_return (result_var);
+
+ if (clone_isa == CLONE_DEFAULT)
+ {
+ gimple_seq_add_stmt (&gseq, convert_stmt);
+ gimple_seq_add_stmt (&gseq, return_stmt);
+ set_bb_seq (new_bb, gseq);
+ gimple_set_bb (convert_stmt, new_bb);
+ gimple_set_bb (return_stmt, new_bb);
+ pop_cfun ();
+ return new_bb;
+ }
+
+ tree bool_zero = build_int_cst (bool_int_type_node, 0);
+ tree cond_var = create_tmp_var (bool_int_type_node);
+ tree predicate_decl = rs6000_builtin_decls [(int) RS6000_BUILTIN_CPU_SUPPORTS];
+ const char *arg_str = rs6000_clone_map[clone_isa].name;
+ tree predicate_arg = build_string_literal (strlen (arg_str) + 1, arg_str);
+ gimple *call_cond_stmt = gimple_build_call (predicate_decl, 1, predicate_arg);
+ gimple_call_set_lhs (call_cond_stmt, cond_var);
+
+ gimple_set_block (call_cond_stmt, DECL_INITIAL (function_decl));
+ gimple_set_bb (call_cond_stmt, new_bb);
+ gimple_seq_add_stmt (&gseq, call_cond_stmt);
+
+ gimple *if_else_stmt = gimple_build_cond (NE_EXPR, cond_var, bool_zero,
+ NULL_TREE, NULL_TREE);
+ gimple_set_block (if_else_stmt, DECL_INITIAL (function_decl));
+ gimple_set_bb (if_else_stmt, new_bb);
+ gimple_seq_add_stmt (&gseq, if_else_stmt);
+
+ gimple_seq_add_stmt (&gseq, convert_stmt);
+ gimple_seq_add_stmt (&gseq, return_stmt);
+ set_bb_seq (new_bb, gseq);
+
+ basic_block bb1 = new_bb;
+ edge e12 = split_block (bb1, if_else_stmt);
+ basic_block bb2 = e12->dest;
+ e12->flags &= ~EDGE_FALLTHRU;
+ e12->flags |= EDGE_TRUE_VALUE;
+
+ edge e23 = split_block (bb2, return_stmt);
+ gimple_set_bb (convert_stmt, bb2);
+ gimple_set_bb (return_stmt, bb2);
+
+ basic_block bb3 = e23->dest;
+ make_edge (bb1, bb3, EDGE_FALSE_VALUE);
+
+ remove_edge (e23);
+ make_edge (bb2, EXIT_BLOCK_PTR_FOR_FN (cfun), 0);
+
+ pop_cfun ();
+ return bb3;
+}
+
+/* This function generates the dispatch function for multi-versioned functions.
+ DISPATCH_DECL is the function which will contain the dispatch logic.
+ FNDECLS are the function choices for dispatch, and is a tree chain.
+ EMPTY_BB is the basic block pointer in DISPATCH_DECL in which the dispatch
+ code is generated. */
+
+static int
+dispatch_function_versions (tree dispatch_decl,
+ void *fndecls_p,
+ basic_block *empty_bb)
+{
+ int ix;
+ tree ele;
+ vec<tree> *fndecls;
+ tree clones[CLONE_MAX];
+
+ if (TARGET_DEBUG_TARGET)
+ fputs ("dispatch_function_versions, top\n", stderr);
+
+ gcc_assert (dispatch_decl != NULL
+ && fndecls_p != NULL
+ && empty_bb != NULL);
+
+ /* fndecls_p is actually a vector. */
+ fndecls = static_cast<vec<tree> *> (fndecls_p);
+
+ /* At least one more version other than the default. */
+ gcc_assert (fndecls->length () >= 2);
+
+ /* The first version in the vector is the default decl. */
+ memset ((void *) clones, '\0', sizeof (clones));
+ clones[CLONE_DEFAULT] = (*fndecls)[0];
+
+ /* On the PowerPC, we do not need to call __builtin_cpu_init, which is a NOP
+ on the PowerPC (on the x86_64, it is not a NOP). The builtin function
+ __builtin_cpu_supports ensures that the TCB fields are set up by requiring a
+ recent glibc. If we ever need to call __builtin_cpu_init, we would need
+ to insert the code here to do the call. */
+
+ for (ix = 1; fndecls->iterate (ix, &ele); ++ix)
+ {
+ int priority = rs6000_clone_priority (ele);
+ if (!clones[priority])
+ clones[priority] = ele;
+ }
+
+ for (ix = CLONE_MAX - 1; ix >= 0; ix--)
+ if (clones[ix])
+ {
+ if (TARGET_DEBUG_TARGET)
+ fprintf (stderr, "dispatch_function_versions, clone %d, %s\n",
+ ix, get_decl_name (clones[ix]));
+
+ *empty_bb = add_condition_to_bb (dispatch_decl, clones[ix], ix,
+ *empty_bb);
+ }
+
+ return 0;
+}
+
+/* Generate the dispatching code body to dispatch multi-versioned function
+ DECL. The target hook is called to process the "target" attributes and
+ provide the code to dispatch the right function at run-time. NODE points
+ to the dispatcher decl whose body will be created. */
+
+static tree
+rs6000_generate_version_dispatcher_body (void *node_p)
+{
+ tree resolver;
+ basic_block empty_bb;
+ struct cgraph_node *node = (cgraph_node *) node_p;
+ struct cgraph_function_version_info *ninfo = node->function_version ();
+
+ if (ninfo->dispatcher_resolver)
+ return ninfo->dispatcher_resolver;
+
+ /* node is going to be an alias, so remove the finalized bit. */
+ node->definition = false;
+
+ /* The first version in the chain corresponds to the default version. */
+ ninfo->dispatcher_resolver = resolver
+ = make_resolver_func (ninfo->next->this_node->decl, node->decl, &empty_bb);
+
+ if (TARGET_DEBUG_TARGET)
+ fprintf (stderr, "rs6000_generate_version_dispatcher_body, %s\n",
+ get_decl_name (resolver));
+
+ push_cfun (DECL_STRUCT_FUNCTION (resolver));
+ auto_vec<tree, 2> fn_ver_vec;
+
+ for (struct cgraph_function_version_info *vinfo = ninfo->next;
+ vinfo;
+ vinfo = vinfo->next)
+ {
+ struct cgraph_node *version = vinfo->this_node;
+ /* Check for virtual functions here again, as by this time it should
+ have been determined if this function needs a vtable index or
+ not. This happens for methods in derived classes that override
+ virtual methods in base classes but are not explicitly marked as
+ virtual. */
+ if (DECL_VINDEX (version->decl))
+ sorry ("Virtual function multiversioning not supported");
+
+ fn_ver_vec.safe_push (version->decl);
+ }
+
+ dispatch_function_versions (resolver, &fn_ver_vec, &empty_bb);
+ cgraph_edge::rebuild_edges ();
+ pop_cfun ();
+ return resolver;
+}
+
+\f
/* Hook to determine if one function can safely inline another. */
static bool
@@ -40165,12 +40665,7 @@ rs6000_can_inline_p (tree caller, tree c
if (TARGET_DEBUG_TARGET)
fprintf (stderr, "rs6000_can_inline_p:, caller %s, callee %s, %s inline\n",
- (DECL_NAME (caller)
- ? IDENTIFIER_POINTER (DECL_NAME (caller))
- : "<unknown>"),
- (DECL_NAME (callee)
- ? IDENTIFIER_POINTER (DECL_NAME (callee))
- : "<unknown>"),
+ get_decl_name (caller), get_decl_name (callee),
(ret ? "can" : "cannot"));
return ret;
@@ -40828,7 +41323,7 @@ bool
fusion_gpr_load_p (rtx addis_reg, /* register set via addis. */
rtx addis_value, /* addis value. */
rtx target, /* target register that is loaded. */
- rtx mem) /* bottom part of the memory addr. */
+ rtx mem) /* bottom part of the memory addr. */
{
rtx addr;
rtx base_reg;
Index: gcc/doc/extend.texi
===================================================================
--- gcc/doc/extend.texi (revision 248759)
+++ gcc/doc/extend.texi (working copy)
@@ -3257,7 +3257,15 @@ For instance, on an x86, you could compi
@code{target_clones("sse4.1,avx")}. GCC creates two function clones,
one compiled with @option{-msse4.1} and another with @option{-mavx}.
It also creates a resolver function (see the @code{ifunc} attribute
-above) that dynamically selects a clone suitable for current architecture.
+above) that dynamically selects a clone suitable for current
+architecture.
+
+On a PowerPC, you can compile a function with
+@code{target_clones("cpu=power9,default")}. GCC will create two
+function clones, one compiled with @option{-mcpu=power9} and another
+with the default options. It also creates a resolver function (see
+the @code{ifunc} attribute above) that dynamically selects a clone
+suitable for current architecture.
@item unused
@cindex @code{unused} function attribute
Index: gcc/testsuite/gcc.target/powerpc/clone1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/clone1.c (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/clone1.c (revision 0)
@@ -0,0 +1,26 @@
+/* { dg-do compile { target { powerpc*-*-linux* && lp64 } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
+/* { dg-options "-mcpu=power8 -O2" } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+
+/* Power9 (aka, ISA 3.0) has a MODSD instruction to do modulus, while Power8
+ (aka, ISA 2.07) has to do modulus with divide and multiply. Make sure
+ both clone functions are generated.
+
+ Restrict ourselves to Linux, since IFUNC might not be supported in other
+ operating systems. */
+
+__attribute__((target_clones("cpu=power9,default")))
+long mod_func (long a, long b)
+{
+ return a % b;
+}
+
+long mod_func_or (long a, long b, long c)
+{
+ return mod_func (a, b) | c;
+}
+
+/* { dg-final { scan-assembler-times {\mdivd\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mmulld\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mmodsd\M} 1 } } */
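
For readers unfamiliar with what the generated resolver does, the selection described in the extend.texi hunk above can be approximated by hand. The following is only a rough sketch, not the code GCC emits: the capability bits, the two hand-written "clones" and resolve_mod are all hypothetical names, and the real dispatcher consults the HWCAP bits rather than a mask passed in by the caller.

```cpp
#include <cassert>

// Hypothetical capability bits standing in for the HWCAP tests.
enum {
  CAP_ISA_2_07 = 1 << 0,   // power8
  CAP_ISA_3_0  = 1 << 1    // power9
};

// Two hand-written stand-ins for the clones of mod_func.  The default
// version spells out the divide/multiply/subtract sequence; the "power9"
// version uses % directly (where the compiler could use modsd).
static long mod_default (long a, long b) { return a - (a / b) * b; }
static long mod_power9 (long a, long b) { return a % b; }

typedef long (*mod_fn) (long, long);

// Hypothetical resolver: return the most capable clone that the given
// capability mask allows, falling back to the default version.
static mod_fn
resolve_mod (unsigned caps)
{
  if (caps & CAP_ISA_3_0)
    return mod_power9;
  return mod_default;
}
```

With a real ifunc the choice is made once, when the symbol is resolved, instead of on every call; the sketch only shows the selection logic.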
* Re: [PATCH] Add attribute((target_clone(...))) to PowerPC
2017-06-02 14:16 ` Michael Meissner
@ 2017-06-02 16:56 ` Segher Boessenkool
2017-06-02 17:39 ` Michael Meissner
0 siblings, 1 reply; 13+ messages in thread
From: Segher Boessenkool @ 2017-06-02 16:56 UTC (permalink / raw)
To: Michael Meissner, Florian Weimer, GCC Patches, David Edelsohn,
Bill Schmidt
Hi!
On Fri, Jun 02, 2017 at 10:16:27AM -0400, Michael Meissner wrote:
> > With "you don't have to give the enum a name" I meant write it as
> >
> > enum {
> > CLONE_DEFAULT = 0,
> > CLONE_ISA_2_05,
> > [...]
> > CLONE_MASK
> > };
> >
> > If you do "const int", I think it should be "static const int"?
>
> Ok. I was under the impression that enums were more tightly typed in C++
> than in C, and that you needed explicit casts to/from integer.
No, conversions from enum to int are still explicitly allowed (but not
the other way around indeed).
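
A minimal C++ sketch of that rule, for illustration only (it is not taken from the patch). CLONE_ISA_2_06 and CLONE_MAX, the named_clone type and both helper functions are hypothetical names introduced here:

```cpp
#include <cassert>

// Unnamed, unscoped enum in the style suggested in the review.
enum {
  CLONE_DEFAULT = 0,
  CLONE_ISA_2_05,
  CLONE_ISA_2_06,
  CLONE_MAX
};

// A named enum, needed only to show the int -> enum direction.
enum named_clone { NAMED_DEFAULT = 0, NAMED_ISA_2_05 };

// enum -> int: the conversion is implicit, no cast is required.
inline int
clone_to_index ()
{
  int idx = CLONE_ISA_2_06;
  return idx;
}

// int -> enum: C++ rejects "named_clone c = idx;", so a cast is needed.
inline named_clone
index_to_clone (int idx)
{
  return static_cast<named_clone> (idx);
}
```

So an unnamed enum works for array indices and loop bounds without any casts; a cast is only needed when an integer has to be turned back into a named enum type.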
> > > +#if defined (ASM_OUTPUT_TYPE_DIRECTIVE)
> > > + if (targetm.has_ifunc_p ())
> >
> > Hrm, I still don't see what you need the #ifdef for. What in the
> > following code won't compile without it? Or does targetm.has_ifunc_p
> > return the wrong answer?
>
> Right now, we only enable ifunc by default under Linux, so I removed the
> #ifdef. We will see if it breaks on non-Linux systems.
Heh, you could test, you know ;-)
The patch is okay for trunk, but please test on AIX.
Segher
* Re: [PATCH] Add attribute((target_clone(...))) to PowerPC
2017-06-02 16:56 ` Segher Boessenkool
@ 2017-06-02 17:39 ` Michael Meissner
2017-06-02 17:41 ` Segher Boessenkool
0 siblings, 1 reply; 13+ messages in thread
From: Michael Meissner @ 2017-06-02 17:39 UTC (permalink / raw)
To: Segher Boessenkool
Cc: Michael Meissner, Florian Weimer, GCC Patches, David Edelsohn,
Bill Schmidt
On Fri, Jun 02, 2017 at 11:55:57AM -0500, Segher Boessenkool wrote:
> Hi!
>
> On Fri, Jun 02, 2017 at 10:16:27AM -0400, Michael Meissner wrote:
> > > With "you don't have to give the enum a name" I meant write it as
> > >
> > > enum {
> > > CLONE_DEFAULT = 0,
> > > CLONE_ISA_2_05,
> > > [...]
> > > CLONE_MASK
> > > };
> > >
> > > If you do "const int", I think it should be "static const int"?
> >
> > Ok. I was under the impression that enums were more tightly typed in C++
> > than in C, and that you needed explicit casts to/from integer.
>
> No, conversions from enum to int are still explicitly allowed (but not
> the other way around indeed).
>
> > > > +#if defined (ASM_OUTPUT_TYPE_DIRECTIVE)
> > > > + if (targetm.has_ifunc_p ())
> > >
> > > Hrm, I still don't see what you need the #ifdef for. What in the
> > > following code won't compile without it? Or does targetm.has_ifunc_p
> > > return the wrong answer?
> >
> > Right now, we only enable ifunc by default under Linux, so I removed the
> > #ifdef. We will see if it breaks on non-Linux systems.
>
> Heh, you could test, you know ;-)
I actually did a bootstrap/make check of everything but the removal of the
#ifdef. There was one test that had failed with my previous base run that now
runs, but it looks like a filesystem problem with the old base run.
> The patch is okay for trunk, but please test on AIX.
You mentioned in private IRC that you would do the run on AIX. Do you want me
to wait until it has finished?
--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797
* Re: [PATCH] Add attribute((target_clone(...))) to PowerPC
2017-06-02 17:39 ` Michael Meissner
@ 2017-06-02 17:41 ` Segher Boessenkool
2017-06-05 21:20 ` Michael Meissner
0 siblings, 1 reply; 13+ messages in thread
From: Segher Boessenkool @ 2017-06-02 17:41 UTC (permalink / raw)
To: Michael Meissner, Florian Weimer, GCC Patches, David Edelsohn,
Bill Schmidt
On Fri, Jun 02, 2017 at 01:39:34PM -0400, Michael Meissner wrote:
> > The patch is okay for trunk, but please test on AIX.
>
> You mentioned in private IRC that you would do the run on AIX. Do you want me
> to wait until it has finished?
Yes please.
Segher
* Re: [PATCH] Add attribute((target_clone(...))) to PowerPC
2017-06-02 17:41 ` Segher Boessenkool
@ 2017-06-05 21:20 ` Michael Meissner
0 siblings, 0 replies; 13+ messages in thread
From: Michael Meissner @ 2017-06-05 21:20 UTC (permalink / raw)
To: Segher Boessenkool
Cc: Michael Meissner, Florian Weimer, GCC Patches, David Edelsohn,
Bill Schmidt
Jan Hubicka <hubicka@ucw.cz> recently changed the calling sequence of
init_lowered_empty_function, and my patch no longer builds. I copied the
change made in the x86_64 port and committed the following fix (subversion
id 248902) so that the rs6000 port builds once again.
2017-06-05 Michael Meissner <meissner@linux.vnet.ibm.com>
* config/rs6000/rs6000.c (make_resolver_func): Update
init_lowered_empty_function call.
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c (revision 248901)
+++ gcc/config/rs6000/rs6000.c (working copy)
@@ -40493,7 +40493,8 @@ make_resolver_func (const tree default_d
gimplify_function_tree (decl);
push_cfun (DECL_STRUCT_FUNCTION (decl));
- *empty_bb = init_lowered_empty_function (decl, false, 0);
+ *empty_bb = init_lowered_empty_function (decl, false,
+ profile_count::uninitialized ());
cgraph_node::add_new_function (decl, true);
symtab->call_cgraph_insertion_hooks (cgraph_node::get_create (decl));
Thread overview: 13+ messages
2017-05-25 18:54 [PATCH] Add attribute((target_clone(...))) to PowerPC Michael Meissner
2017-05-25 20:05 ` Florian Weimer
2017-05-25 20:18 ` Michael Meissner
2017-05-30 22:04 ` Segher Boessenkool
2017-05-31 2:22 ` Michael Meissner
2017-05-31 23:15 ` Michael Meissner
2017-06-01 0:20 ` Michael Meissner
2017-06-01 20:43 ` Segher Boessenkool
2017-06-02 14:16 ` Michael Meissner
2017-06-02 16:56 ` Segher Boessenkool
2017-06-02 17:39 ` Michael Meissner
2017-06-02 17:41 ` Segher Boessenkool
2017-06-05 21:20 ` Michael Meissner