From: Xinliang David Li <davidxl@google.com>
To: Sriraman Tallam <tmsriram@google.com>
Cc: Richard Guenther <richard.guenther@gmail.com>,
reply@codereview.appspotmail.com, gcc-patches@gcc.gnu.org
Subject: Re: User directed Function Multiversioning via Function Overloading (issue5752064)
Date: Thu, 08 Mar 2012 21:37:00 -0000
Message-ID: <CAAkRFZ+s2-fvR5CovaJZF4yJdiwpT1M73ADafAXkeVU9+At+zA@mail.gmail.com>
In-Reply-To: <CAAs8Hmzawe6KhQkTwM0jtmXkK+Cch9EtG5BMwZ6aNzUmtoFhdg@mail.gmail.com>
On Wed, Mar 7, 2012 at 11:08 AM, Sriraman Tallam <tmsriram@google.com> wrote:
> On Wed, Mar 7, 2012 at 6:05 AM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> On Wed, Mar 7, 2012 at 1:46 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>>> User directed Function Multiversioning (MV) via Function Overloading
>>> ====================================================================
>>>
>>> This patch adds support for user directed function MV via function overloading.
>>> For a more detailed description, see:
>>> http://gcc.gnu.org/ml/gcc/2012-03/msg00074.html
>>>
>>>
>>> Here is an example program with function versions:
>>>
>>> int foo (); /* Default version */
>>> int foo () __attribute__ ((targetv("arch=corei7"))); /* Specialized for corei7 */
>>> int foo () __attribute__ ((targetv("arch=core2"))); /* Specialized for core2 */
>>>
>>> int main ()
>>> {
>>> int (*p)() = &foo;
>>> return foo () + (*p)();
>>> }
>>>
>>> int foo ()
>>> {
>>> return 0;
>>> }
>>>
>>> int __attribute__ ((targetv("arch=corei7")))
>>> foo ()
>>> {
>>> return 0;
>>> }
>>>
>>> int __attribute__ ((targetv("arch=core2")))
>>> foo ()
>>> {
>>> return 0;
>>> }
>>>
>>> The above example has foo defined 3 times, but all 3 definitions of foo are
>>> different versions of the same function. The call to foo in main, directly and
>>> via a pointer, are calls to the multi-versioned function foo which is dispatched
>>> to the right foo at run-time.
>>>
>>> Function versions must have the same signature but must differ in the specifier
>>> string provided to a new attribute called "targetv", which is nothing but the
>>> target attribute with an extra specification to indicate a version. Any number
>>> of versions can be created using the targetv attribute but it is mandatory to
>>> have one function without the attribute, which is treated as the default
>>> version.
>>>
>>> The dispatching is done using the IFUNC mechanism to keep the dispatch overhead
>>> low. The compiler creates a dispatcher function which checks the CPU type and
>>> calls the right version of foo. The dispatching code checks for the platform
>>> type and calls the first version that matches. The default function is called if
>>> no specialized version is appropriate for execution.
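To make the "first match wins, default otherwise" rule concrete, here is a minimal sketch in plain C. It is not the IFUNC mechanism and not code from the patch: resolve_foo, the versions table and the string argument standing in for the run-time CPU check are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*foo_fn) (void);

/* Three bodies standing in for the default and specialized versions.
   They return distinct values only so the choice is observable.  */
static int foo_default (void) { return 0; }
static int foo_corei7 (void) { return 1; }
static int foo_core2 (void) { return 2; }

/* Hypothetical version table: one entry per specialized version.  */
struct version
{
  const char *arch;   /* Value of the "arch=" specifier.  */
  foo_fn fn;          /* Body to run when ARCH matches.  */
};

static const struct version versions[] = {
  { "corei7", foo_corei7 },
  { "core2", foo_core2 },
};

/* Return the first version whose arch equals ARCH, else the default.
   ARCH stands in for whatever the real dispatcher learns from the CPU.  */
foo_fn
resolve_foo (const char *arch)
{
  size_t i;

  assert (arch != NULL);
  for (i = 0; i < sizeof versions / sizeof versions[0]; i++)
    if (strcmp (versions[i].arch, arch) == 0)
      return versions[i].fn;
  return foo_default;
}
```

With IFUNC the equivalent of resolve_foo would run once, when the symbol is resolved, rather than on every call, which is where the low dispatch overhead comes from.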
>>>
>>> The pointer to foo is made to be the address of the dispatcher function, so that
>>> it is unique and calls made via the pointer also work correctly. The assembler
>>> names of the various versions of foo are made different, by tagging
>>> the specifier strings, to keep them unique. A specific version can be called
>>> directly by creating an alias to its assembler name. For instance, to call the
>>> corei7 version directly, make an alias:
>>> int foo_corei7 () __attribute__((alias ("_Z3foov.arch_corei7")));
>>> and then call foo_corei7.
>>>
>>> Note that using IFUNC blocks inlining of versioned functions. I had implemented
>>> an optimization earlier to do hot path cloning to allow versioned functions to
>>> be inlined. Please see: http://gcc.gnu.org/ml/gcc-patches/2011-04/msg02285.html
>>> In the next iteration, I plan to merge these two. With that, hot code paths with
>>> versioned functions will be cloned so that versioned functions can be inlined.
>>
>> Note that inlining of functions with the target attribute is limited as well,
>> but your issue is that of the indirect dispatch as ...
>>
>> You don't give an overview of the frontend implementation. Thus I have
>> extracted the following
>>
>> - the FE does not really know about the "overloading", nor can it directly
>> resolve calls from a "sse" function to another "sse" function without going
>> through the 2nd IFUNC
>
> This is a good point but I can change function joust, where the
> overload candidate is selected, to return the decl of the versioned
> function with matching target attributes as that of the callee. That
> will solve this problem. I have to treat the target attributes as an
> additional criterion for a match in overload resolution. The front end
> *does know* about the overloading; it is a question of doing the
> overload resolution correctly, right? This is easy when there is no
> cloning involved.
Should this be covered by a new IFUNC folding rule? FE just needs to
generate dummy code.
>
> When cloning of a version is required, it gets complicated since the
> FE must clone and produce the bodies. Once all the bodies are
> available, the overload resolution can do the right thing.
>
How can you safely clone a function without knowing if the versioned
body is available in another module?
David
>>
>> - cgraph also does not know about the "overloading", so it cannot do such
>> "devirtualization" either
>>
>> you seem to have implemented something in between a pure frontend
>> solution and a proper middle-end solution.
>
> The only thing I delayed is the code generation of the dispatcher. I
> thought it is better to have this come later, after cfg and cgraph are
> generated, so that multiple dispatching mechanisms could be
> implemented.
>
>> For optimization and eventually
>> automatically selecting functions for cloning (like, callees of a manual "sse"
>> versioned function should be cloned?) it would be nice if the cgraph would
>> know about the different versions and their relationships (and the dispatcher).
>> Especially the cgraph code should know the functions are semantically
>> equivalent (I suppose we should require that). The IFUNC should be
>> generated by cgraph / target code, similar to how we generate C++ thunks.
>>
>> Honza, any suggestions on how the FE side of such cgraph infrastructure
>> should look and how we should encode the target bits?
>>
>> Thanks,
>> Richard.
>>
>>> * doc/tm.texi.in: Add description for TARGET_DISPATCH_VERSION.
>>> * doc/tm.texi: Regenerate.
>>> * c-family/c-common.c (handle_targetv_attribute): New function.
>>> * target.def (dispatch_version): New target hook.
>>> * tree.h (DECL_FUNCTION_VERSIONED): New macro.
>>> (tree_function_decl): New bit-field versioned_function.
>>> * tree-pass.h (pass_dispatch_versions): New pass.
>>> * multiversion.c: New file.
>>> * multiversion.h: New file.
>>> * cgraphunit.c: Include multiversion.h
>>> (cgraph_finalize_function): Change assembler names of versioned
>>> functions.
>>> * cp/class.c: Include multiversion.h
>>> (add_method): aggregate function versions. Change assembler names of
>>> versioned functions.
>>> (resolve_address_of_overloaded_function): Match address of function
>>> version with default function. Return address of ifunc dispatcher
>>> for address of versioned functions.
>>> * cp/decl.c (decls_match): Make decls unmatched for versioned
>>> functions.
>>> (duplicate_decls): Remove ambiguity for versioned functions. Notify
>>> of deleted function version decls.
>>> (start_decl): Change assembler name of versioned functions.
>>> (start_function): Change assembler name of versioned functions.
>>> (cxx_comdat_group): Make comdat group of versioned functions be the
>>> same.
>>> * cp/semantics.c (expand_or_defer_fn_1): Mark as needed versioned
>>> functions that are also marked inline.
>>> * cp/decl2.c: Include multiversion.h
>>> (check_classfn): Check attributes of versioned functions for match.
>>> * cp/call.c: Include multiversion.h
>>> (build_over_call): Make calls to multiversioned functions to call the
>>> dispatcher.
>>> (joust): For calls to multi-versioned functions, make the default
>>> function win.
>>> * timevar.def (TV_MULTIVERSION_DISPATCH): New time var.
>>> * varasm.c (finish_aliases_1): Check if the alias points to a function
>>> with a body before giving an error.
>>> * Makefile.in: Add multiversion.o
>>> * passes.c: Add pass_dispatch_versions to the pass list.
>>> * config/i386/i386.c (add_condition_to_bb): New function.
>>> (get_builtin_code_for_version): New function.
>>> (ix86_dispatch_version): New function.
>>> (TARGET_DISPATCH_VERSION): New macro.
>>> * testsuite/g++.dg/mv1.C: New test.
>>>
>>> Index: doc/tm.texi
>>> ===================================================================
>>> --- doc/tm.texi (revision 184971)
>>> +++ doc/tm.texi (working copy)
>>> @@ -10995,6 +10995,14 @@ The result is another tree containing a simplified
>>> call's result. If @var{ignore} is true the value will be ignored.
>>> @end deftypefn
>>>
>>> +@deftypefn {Target Hook} int TARGET_DISPATCH_VERSION (tree @var{dispatch_decl}, void *@var{fndecls}, basic_block *@var{empty_bb})
>>> +For a multi-versioned function, this hook sets up the dispatcher.
>>> +@var{dispatch_decl} is the function that will be used to dispatch the
>>> +version. @var{fndecls} are the function choices for dispatch.
>>> +@var{empty_bb} is a basic block in @var{dispatch_decl} where the
>>> +code to do the dispatch will be added.
>>> +@end deftypefn
>>> +
>>> @deftypefn {Target Hook} {const char *} TARGET_INVALID_WITHIN_DOLOOP (const_rtx @var{insn})
>>>
>>> Take an instruction in @var{insn} and return NULL if it is valid within a
>>> Index: doc/tm.texi.in
>>> ===================================================================
>>> --- doc/tm.texi.in (revision 184971)
>>> +++ doc/tm.texi.in (working copy)
>>> @@ -10873,6 +10873,14 @@ The result is another tree containing a simplified
>>> call's result. If @var{ignore} is true the value will be ignored.
>>> @end deftypefn
>>>
>>> +@hook TARGET_DISPATCH_VERSION
>>> +For a multi-versioned function, this hook sets up the dispatcher.
>>> +@var{dispatch_decl} is the function that will be used to dispatch the
>>> +version. @var{fndecls} are the function choices for dispatch.
>>> +@var{empty_bb} is a basic block in @var{dispatch_decl} where the
>>> +code to do the dispatch will be added.
>>> +@end deftypefn
>>> +
>>> @hook TARGET_INVALID_WITHIN_DOLOOP
>>>
>>> Take an instruction in @var{insn} and return NULL if it is valid within a
>>> Index: c-family/c-common.c
>>> ===================================================================
>>> --- c-family/c-common.c (revision 184971)
>>> +++ c-family/c-common.c (working copy)
>>> @@ -315,6 +315,7 @@ static tree check_case_value (tree);
>>> static bool check_case_bounds (tree, tree, tree *, tree *);
>>>
>>> static tree handle_packed_attribute (tree *, tree, tree, int, bool *);
>>> +static tree handle_targetv_attribute (tree *, tree, tree, int, bool *);
>>> static tree handle_nocommon_attribute (tree *, tree, tree, int, bool *);
>>> static tree handle_common_attribute (tree *, tree, tree, int, bool *);
>>> static tree handle_noreturn_attribute (tree *, tree, tree, int, bool *);
>>> @@ -604,6 +605,8 @@ const struct attribute_spec c_common_attribute_tab
>>> {
>>> /* { name, min_len, max_len, decl_req, type_req, fn_type_req, handler,
>>> affects_type_identity } */
>>> + { "targetv", 1, -1, true, false, false,
>>> + handle_targetv_attribute, false },
>>> { "packed", 0, 0, false, false, false,
>>> handle_packed_attribute , false},
>>> { "nocommon", 0, 0, true, false, false,
>>> @@ -5869,6 +5872,54 @@ handle_packed_attribute (tree *node, tree name, tr
>>> return NULL_TREE;
>>> }
>>>
>>> +/* The targetv attribute is used to specify a function version
>>> + targeted to specific platform types. The "targetv" attributes
>>> + have to be valid "target" attributes. NODE should always point
>>> + to a FUNCTION_DECL. ARGS contain the arguments to "targetv"
>>> + which should be valid arguments to attribute "target" too.
>>> + Check handle_target_attribute for FLAGS and NO_ADD_ATTRS. */
>>> +
>>> +static tree
>>> +handle_targetv_attribute (tree *node, tree name,
>>> + tree args,
>>> + int flags,
>>> + bool *no_add_attrs)
>>> +{
>>> + const char *attr_str = NULL;
>>> + gcc_assert (TREE_CODE (*node) == FUNCTION_DECL);
>>> + gcc_assert (args != NULL);
>>> +
>>> + /* This is a function version. */
>>> + DECL_FUNCTION_VERSIONED (*node) = 1;
>>> +
>>> + attr_str = TREE_STRING_POINTER (TREE_VALUE (args));
>>> +
>>> + /* Check if multiple sets of target attributes are there. This
>>> + is not supported now. In future, this will be supported by
>>> + cloning this function for each set. */
>>> + if (TREE_CHAIN (args) != NULL)
>>> + warning (OPT_Wattributes, "%qE attribute has multiple sets which "
>>> + "is not supported", name);
>>> +
>>> + if (attr_str == NULL
>>> + || strstr (attr_str, "arch=") == NULL)
>>> + error_at (DECL_SOURCE_LOCATION (*node),
>>> + "Versioning supported only on \"arch=\" for now");
>>> +
>>> + /* targetv attributes must translate into target attributes. */
>>> + handle_target_attribute (node, get_identifier ("target"), args, flags,
>>> + no_add_attrs);
>>> +
>>> + if (*no_add_attrs)
>>> + warning (OPT_Wattributes, "%qE attribute has no effect", name);
>>> +
>>> + /* This is necessary to keep the attribute tagged to the decl
>>> + all the time. */
>>> + *no_add_attrs = false;
>>> +
>>> + return NULL_TREE;
>>> +}
>>> +
>>> /* Handle a "nocommon" attribute; arguments as in
>>> struct attribute_spec.handler. */
>>>
>>> Index: target.def
>>> ===================================================================
>>> --- target.def (revision 184971)
>>> +++ target.def (working copy)
>>> @@ -1249,6 +1249,15 @@ DEFHOOK
>>> tree, (tree fndecl, int n_args, tree *argp, bool ignore),
>>> hook_tree_tree_int_treep_bool_null)
>>>
>>> +/* Target hook to generate the dispatching code for calls to multi-versioned
>>> + functions. DISPATCH_DECL is the function that will have the dispatching
>>> + logic. FNDECLS is the list of choices for dispatch and EMPTY_BB is the
>>> + basic block in DISPATCH_DECL which will contain the code. */
>>> +DEFHOOK
>>> +(dispatch_version,
>>> + "",
>>> + int, (tree dispatch_decl, void *fndecls, basic_block *empty_bb), NULL)
>>> +
>>> /* Returns a code for a target-specific builtin that implements
>>> reciprocal of the function, or NULL_TREE if not available. */
>>> DEFHOOK
>>> Index: tree.h
>>> ===================================================================
>>> --- tree.h (revision 184971)
>>> +++ tree.h (working copy)
>>> @@ -3532,6 +3532,12 @@ extern VEC(tree, gc) **decl_debug_args_insert (tre
>>> #define DECL_FUNCTION_SPECIFIC_OPTIMIZATION(NODE) \
>>> (FUNCTION_DECL_CHECK (NODE)->function_decl.function_specific_optimization)
>>>
>>> +/* In FUNCTION_DECL, this is set if this function has other versions generated
>>> + using "targetv" attributes. The default version is the one which does not
>>> + have any "targetv" attribute set. */
>>> +#define DECL_FUNCTION_VERSIONED(NODE)\
>>> + (FUNCTION_DECL_CHECK (NODE)->function_decl.versioned_function)
>>> +
>>> /* FUNCTION_DECL inherits from DECL_NON_COMMON because of the use of the
>>> arguments/result/saved_tree fields by front ends. It was either inherit
>>> FUNCTION_DECL from non_common, or inherit non_common from FUNCTION_DECL,
>>> @@ -3576,8 +3582,8 @@ struct GTY(()) tree_function_decl {
>>> unsigned looping_const_or_pure_flag : 1;
>>> unsigned has_debug_args_flag : 1;
>>> unsigned tm_clone_flag : 1;
>>> -
>>> - /* 1 bit left */
>>> + unsigned versioned_function : 1;
>>> + /* No bits left. */
>>> };
>>>
>>> /* The source language of the translation-unit. */
>>> Index: tree-pass.h
>>> ===================================================================
>>> --- tree-pass.h (revision 184971)
>>> +++ tree-pass.h (working copy)
>>> @@ -455,6 +455,7 @@ extern struct gimple_opt_pass pass_tm_memopt;
>>> extern struct gimple_opt_pass pass_tm_edges;
>>> extern struct gimple_opt_pass pass_split_functions;
>>> extern struct gimple_opt_pass pass_feedback_split_functions;
>>> +extern struct gimple_opt_pass pass_dispatch_versions;
>>>
>>> /* IPA Passes */
>>> extern struct simple_ipa_opt_pass pass_ipa_lower_emutls;
>>> Index: multiversion.c
>>> ===================================================================
>>> --- multiversion.c (revision 0)
>>> +++ multiversion.c (revision 0)
>>> @@ -0,0 +1,798 @@
>>> +/* Function Multiversioning.
>>> + Copyright (C) 2012 Free Software Foundation, Inc.
>>> + Contributed by Sriraman Tallam (tmsriram@google.com)
>>> +
>>> +This file is part of GCC.
>>> +
>>> +GCC is free software; you can redistribute it and/or modify it under
>>> +the terms of the GNU General Public License as published by the Free
>>> +Software Foundation; either version 3, or (at your option) any later
>>> +version.
>>> +
>>> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
>>> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
>>> +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
>>> +for more details.
>>> +
>>> +You should have received a copy of the GNU General Public License
>>> +along with GCC; see the file COPYING3. If not see
>>> +<http://www.gnu.org/licenses/>. */
>>> +
>>> +/* Holds the state for multi-versioned functions here. The front-end
>>> + updates the state as and when function versions are encountered.
>>> + This is then used to generate the dispatch code. Also, the
>>> + optimization passes to clone hot paths involving versioned functions
>>> + will be done here.
>>> +
>>> + Function versions are created by using the same function signature but
>>> + also tagging attribute "targetv" to specify the platform type for which
>>> + the version must be executed. Here is an example:
>>> +
>>> + int foo ()
>>> + {
>>> + printf ("Execute as default");
>>> + return 0;
>>> + }
>>> +
>>> + int __attribute__ ((targetv ("arch=corei7")))
>>> + foo ()
>>> + {
>>> + printf ("Execute for corei7");
>>> + return 0;
>>> + }
>>> +
>>> + int main ()
>>> + {
>>> + return foo ();
>>> + }
>>> +
>>> + The call to foo in main is replaced with a call to an IFUNC function that
>>> + contains the dispatch code to call the correct function version at
>>> + run-time. */
>>> +
>>> +
>>> +#include "config.h"
>>> +#include "system.h"
>>> +#include "coretypes.h"
>>> +#include "tm.h"
>>> +#include "tree.h"
>>> +#include "tree-inline.h"
>>> +#include "langhooks.h"
>>> +#include "flags.h"
>>> +#include "cgraph.h"
>>> +#include "diagnostic.h"
>>> +#include "toplev.h"
>>> +#include "timevar.h"
>>> +#include "params.h"
>>> +#include "fibheap.h"
>>> +#include "intl.h"
>>> +#include "tree-pass.h"
>>> +#include "hashtab.h"
>>> +#include "coverage.h"
>>> +#include "ggc.h"
>>> +#include "tree-flow.h"
>>> +#include "rtl.h"
>>> +#include "ipa-prop.h"
>>> +#include "basic-block.h"
>>> +#include "toplev.h"
>>> +#include "dbgcnt.h"
>>> +#include "tree-dump.h"
>>> +#include "output.h"
>>> +#include "vecprim.h"
>>> +#include "gimple-pretty-print.h"
>>> +#include "ipa-inline.h"
>>> +#include "target.h"
>>> +#include "multiversion.h"
>>> +
>>> +typedef void * void_p;
>>> +
>>> +DEF_VEC_P (void_p);
>>> +DEF_VEC_ALLOC_P (void_p, heap);
>>> +
>>> +/* Each function decl that is a function version gets an instance of this
>>> + structure. Since this is called by the front-end, decl merging can
>>> + happen, where a decl created for a new declaration is merged with
>>> + the old. In this case, the new decl is deleted and the IS_DELETED
>>> + field is set for the struct instance corresponding to the new decl.
>>> + IFUNC_DECL is the decl of the ifunc function for default decls.
>>> + IFUNC_RESOLVER_DECL is the decl of the dispatch function. VERSIONS
>>> + is a vector containing the list of function versions that are
>>> + the candidates for dispatch. */
>>> +
>>> +typedef struct version_function_d {
>>> + tree decl;
>>> + tree ifunc_decl;
>>> + tree ifunc_resolver_decl;
>>> + VEC (void_p, heap) *versions;
>>> + bool is_deleted;
>>> +} version_function;
>>> +
>>> +/* Hashmap has an entry for every function decl that has other function
>>> + versions. For function decls that are the default, it also stores the
>>> + list of all the other function versions. Each entry is a structure
>>> + of type version_function_d. */
>>> +static htab_t decl_version_htab = NULL;
>>> +
>>> +/* Hashtable helpers for decl_version_htab. */
>>> +
>>> +static hashval_t
>>> +decl_version_htab_hash_descriptor (const void *p)
>>> +{
>>> + const version_function *t = (const version_function *) p;
>>> + return htab_hash_pointer (t->decl);
>>> +}
>>> +
>>> +/* Hashtable helper for decl_version_htab. */
>>> +
>>> +static int
>>> +decl_version_htab_eq_descriptor (const void *p1, const void *p2)
>>> +{
>>> + const version_function *t1 = (const version_function *) p1;
>>> + return htab_eq_pointer ((const void_p) t1->decl, p2);
>>> +}
>>> +
>>> +/* Create the decl_version_htab. */
>>> +static void
>>> +create_decl_version_htab (void)
>>> +{
>>> + if (decl_version_htab == NULL)
>>> + decl_version_htab = htab_create (10, decl_version_htab_hash_descriptor,
>>> + decl_version_htab_eq_descriptor, NULL);
>>> +}
>>> +
>>> +/* Creates an instance of version_function for decl DECL. */
>>> +
>>> +static version_function*
>>> +new_version_function (const tree decl)
>>> +{
>>> + version_function *v;
>>> + v = (version_function *) xmalloc (sizeof (version_function));
>>> + v->decl = decl;
>>> + v->ifunc_decl = NULL;
>>> + v->ifunc_resolver_decl = NULL;
>>> + v->versions = NULL;
>>> + v->is_deleted = false;
>>> + return v;
>>> +}
>>> +
>>> +/* Comparator function to be used in qsort routine to sort attribute
>>> + specification strings to "targetv". */
>>> +
>>> +static int
>>> +attr_strcmp (const void *v1, const void *v2)
>>> +{
>>> + const char *c1 = *(char *const*)v1;
>>> + const char *c2 = *(char *const*)v2;
>>> + return strcmp (c1, c2);
>>> +}
>>> +
>>> +/* STR is the argument to targetv attribute. This function tokenizes
>>> + the comma separated arguments, sorts them and returns a string which
>>> + is a unique identifier for the comma separated arguments. */
>>> +
>>> +static char *
>>> +sorted_attr_string (const char *str)
>>> +{
>>> + char **args = NULL;
>>> + char *attr_str, *ret_str;
>>> + char *attr = NULL;
>>> + unsigned int argnum = 1;
>>> + unsigned int i;
>>> +
>>> + for (i = 0; i < strlen (str); i++)
>>> + if (str[i] == ',')
>>> + argnum++;
>>> +
>>> + attr_str = (char *)xmalloc (strlen (str) + 1);
>>> + strcpy (attr_str, str);
>>> +
>>> + for (i = 0; i < strlen (attr_str); i++)
>>> + if (attr_str[i] == '=')
>>> + attr_str[i] = '_';
>>> +
>>> + if (argnum == 1)
>>> + return attr_str;
>>> +
>>> + args = (char **)xmalloc (argnum * sizeof (char *));
>>> +
>>> + i = 0;
>>> + attr = strtok (attr_str, ",");
>>> + while (attr != NULL)
>>> + {
>>> + args[i] = attr;
>>> + i++;
>>> + attr = strtok (NULL, ",");
>>> + }
>>> +
>>> + qsort (args, argnum, sizeof (char*), attr_strcmp);
>>> +
>>> + ret_str = (char *)xmalloc (strlen (str) + 1);
>>> + strcpy (ret_str, args[0]);
>>> + for (i = 1; i < argnum; i++)
>>> + {
>>> + strcat (ret_str, "_");
>>> + strcat (ret_str, args[i]);
>>> + }
>>> +
>>> + free (args);
>>> + free (attr_str);
>>> + return ret_str;
>>> +}
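For readers who want to try the canonicalization outside the compiler, here is a standalone sketch of the same idea: replace '=' with '_', split on commas, sort, and join with '_'. It is an illustration under stated assumptions, not the patch's code: canonical_spec and cmp_str are hypothetical names, it uses plain malloc instead of xmalloc, and it sorts only the tokens strtok actually returns, so an input with an empty entry such as "a,,b" does not leave an unset slot in the array.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator over an array of C strings.  */
static int
cmp_str (const void *a, const void *b)
{
  return strcmp (*(char *const *) a, *(char *const *) b);
}

/* Return a malloc'ed canonical form of the comma-separated SPEC:
   '=' becomes '_', tokens are sorted, then joined with '_'.
   The caller frees the result.  Returns NULL on allocation failure.  */
char *
canonical_spec (const char *spec)
{
  size_t len, n = 1, i, count = 0;
  char *copy, *out, *tok, **toks;

  assert (spec != NULL);
  len = strlen (spec);
  for (i = 0; i < len; i++)
    if (spec[i] == ',')
      n++;

  copy = malloc (len + 1);
  out = malloc (len + 1);
  toks = malloc (n * sizeof *toks);
  if (copy == NULL || out == NULL || toks == NULL)
    {
      free (copy);
      free (out);
      free (toks);
      return NULL;
    }

  strcpy (copy, spec);
  for (i = 0; i < len; i++)
    if (copy[i] == '=')
      copy[i] = '_';

  /* strtok skips empty fields, so COUNT can be smaller than N.  */
  for (tok = strtok (copy, ","); tok != NULL; tok = strtok (NULL, ","))
    toks[count++] = tok;

  qsort (toks, count, sizeof *toks, cmp_str);

  out[0] = '\0';
  for (i = 0; i < count; i++)
    {
      if (i > 0)
        strcat (out, "_");
      strcat (out, toks[i]);
    }

  free (toks);
  free (copy);
  return out;
}
```

Because the tokens are sorted, "sse4.2,arch=corei7" and "arch=corei7,sse4.2" map to the same string, which is what lets two declarations be compared by their attribute text.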
>>> +
>>> +/* Returns true when only one of DECL1 and DECL2 is marked with "targetv"
>>> + or if the "targetv" attribute strings of DECL1 and DECL2 do not match. */
>>> +
>>> +bool
>>> +has_different_version_attributes (const tree decl1, const tree decl2)
>>> +{
>>> + tree attr1, attr2;
>>> + char *c1, *c2;
>>> + bool ret = false;
>>> +
>>> + if (TREE_CODE (decl1) != FUNCTION_DECL
>>> + || TREE_CODE (decl2) != FUNCTION_DECL)
>>> + return false;
>>> +
>>> + attr1 = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl1));
>>> + attr2 = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl2));
>>> +
>>> + if (attr1 == NULL_TREE && attr2 == NULL_TREE)
>>> + return false;
>>> +
>>> + if ((attr1 == NULL_TREE && attr2 != NULL_TREE)
>>> + || (attr1 != NULL_TREE && attr2 == NULL_TREE))
>>> + return true;
>>> +
>>> + c1 = sorted_attr_string (
>>> + TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr1))));
>>> + c2 = sorted_attr_string (
>>> + TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr2))));
>>> +
>>> + if (strcmp (c1, c2) != 0)
>>> + ret = true;
>>> +
>>> + free (c1);
>>> + free (c2);
>>> +
>>> + return ret;
>>> +}
>>> +
>>> +/* If this decl corresponds to a function and has "targetv" attribute,
>>> + append the attribute string to its assembler name. */
>>> +
>>> +void
>>> +version_assembler_name (const tree decl)
>>> +{
>>> + tree version_attr;
>>> + const char *orig_name, *version_string, *attr_str;
>>> + char *assembler_name;
>>> + tree assembler_name_tree;
>>> +
>>> + if (TREE_CODE (decl) != FUNCTION_DECL
>>> + || DECL_ASSEMBLER_NAME_SET_P (decl)
>>> + || !DECL_FUNCTION_VERSIONED (decl))
>>> + return;
>>> +
>>> + if (DECL_DECLARED_INLINE_P (decl)
>>> + && lookup_attribute ("gnu_inline",
>>> + DECL_ATTRIBUTES (decl)))
>>> + error_at (DECL_SOURCE_LOCATION (decl),
>>> + "Function versions cannot be marked as gnu_inline,"
>>> + " bodies have to be generated\n");
>>> +
>>> + if (DECL_VIRTUAL_P (decl)
>>> + || DECL_VINDEX (decl))
>>> + error_at (DECL_SOURCE_LOCATION (decl),
>>> + "Virtual function versioning not supported\n");
>>> +
>>> + version_attr = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl));
>>> + /* targetv attribute string is NULL for default functions. */
>>> + if (version_attr == NULL_TREE)
>>> + return;
>>> +
>>> + orig_name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
>>> + version_string
>>> + = TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (version_attr)));
>>> +
>>> + attr_str = sorted_attr_string (version_string);
>>> + assembler_name = (char *) xmalloc (strlen (orig_name)
>>> + + strlen (attr_str) + 2);
>>> +
>>> + sprintf (assembler_name, "%s.%s", orig_name, attr_str);
>>> + if (dump_file)
>>> + fprintf (dump_file, "Assembler name set to %s for function version %s\n",
>>> + assembler_name, IDENTIFIER_POINTER (DECL_NAME (decl)));
>>> + assembler_name_tree = get_identifier (assembler_name);
>>> + SET_DECL_ASSEMBLER_NAME (decl, assembler_name_tree);
>>> +}
>>> +
>>> +/* Returns true if DECL is multi-versioned and is the default function,
>>> + that is, it is not tagged with the "targetv" attribute. */
>>> +
>>> +bool
>>> +is_default_function (const tree decl)
>>> +{
>>> + return (TREE_CODE (decl) == FUNCTION_DECL
>>> + && DECL_FUNCTION_VERSIONED (decl)
>>> + && (lookup_attribute ("targetv", DECL_ATTRIBUTES (decl))
>>> + == NULL_TREE));
>>> +}
>>> +
>>> +/* For function decl DECL, find the version_function struct in the
>>> + decl_version_htab. */
>>> +
>>> +static version_function *
>>> +find_function_version (const tree decl)
>>> +{
>>> + void *slot;
>>> +
>>> + if (!DECL_FUNCTION_VERSIONED (decl))
>>> + return NULL;
>>> +
>>> + if (!decl_version_htab)
>>> + return NULL;
>>> +
>>> + slot = htab_find_with_hash (decl_version_htab, decl,
>>> + htab_hash_pointer (decl));
>>> +
>>> + if (slot != NULL)
>>> + return (version_function *)slot;
>>> +
>>> + return NULL;
>>> +}
>>> +
>>> +/* Record DECL as a function version by creating a version_function struct
>>> + for it and storing it in the hashtable. */
>>> +
>>> +static version_function *
>>> +add_function_version (const tree decl)
>>> +{
>>> + void **slot;
>>> + version_function *v;
>>> +
>>> + if (!DECL_FUNCTION_VERSIONED (decl))
>>> + return NULL;
>>> +
>>> + create_decl_version_htab ();
>>> +
>>> + slot = htab_find_slot_with_hash (decl_version_htab, (const void_p)decl,
>>> + htab_hash_pointer ((const void_p)decl),
>>> + INSERT);
>>> +
>>> + if (*slot != NULL)
>>> + return (version_function *)*slot;
>>> +
>>> + v = new_version_function (decl);
>>> + *slot = v;
>>> +
>>> + return v;
>>> +}
>>> +
>>> +/* Push V into VEC only if it is not already present. */
>>> +
>>> +static void
>>> +push_function_version (version_function *v, VEC (void_p, heap) *vec)
>>> +{
>>> + int ix;
>>> + void_p ele;
>>> + for (ix = 0; VEC_iterate (void_p, vec, ix, ele); ++ix)
>>> + {
>>> + if (ele == (void_p)v)
>>> + return;
>>> + }
>>> +
>>> + VEC_safe_push (void_p, heap, vec, (void*)v);
>>> +}
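One thing worth double-checking here: if I read vec.h correctly, VEC_safe_push takes the vector as an lvalue and may move the storage when it grows, and since VEC is a by-value parameter the caller's copy of the pointer might not see the new block. The sketch below shows the usual way around that in plain C, with the helper taking the address of the vector pointer. The ptr_vec type and push_unique are hypothetical stand-ins, not GCC's VEC API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable pointer array: header and elements live in one
   allocation, so growing it can change the address of the whole block.  */
struct ptr_vec
{
  size_t len;
  size_t cap;
  void *data[];   /* C99 flexible array member.  */
};

/* Append ELT to *VECP unless it is already present.  Returns 1 if
   appended, 0 if it was a duplicate, -1 on allocation failure.  */
int
push_unique (struct ptr_vec **vecp, void *elt)
{
  struct ptr_vec *v;
  size_t i;

  assert (vecp != NULL);
  v = *vecp;

  for (i = 0; v != NULL && i < v->len; i++)
    if (v->data[i] == elt)
      return 0;

  if (v == NULL || v->len == v->cap)
    {
      size_t ncap = (v != NULL && v->cap > 0) ? v->cap * 2 : 1;
      struct ptr_vec *nv = realloc (v, sizeof *nv + ncap * sizeof (void *));

      if (nv == NULL)
        return -1;
      if (v == NULL)
        nv->len = 0;
      nv->cap = ncap;
      v = nv;
      *vecp = v;   /* Publish the possibly moved block to the caller.  */
    }

  v->data[v->len++] = elt;
  return 1;
}
```

The point of the extra level of indirection is only that the caller's pointer is updated after a reallocation; the duplicate check is the same linear scan as above.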
>>> +
>>> +/* Mark DECL as deleted. This is called by the front-end when a duplicate
>>> + decl is merged with the original decl and the duplicate decl is deleted.
>>> + This function marks the duplicate_decl as invalid. Called by
>>> + duplicate_decls in cp/decl.c. */
>>> +
>>> +void
>>> +mark_delete_decl_version (const tree decl)
>>> +{
>>> + version_function *decl_v;
>>> +
>>> + decl_v = find_function_version (decl);
>>> +
>>> + if (decl_v == NULL)
>>> + return;
>>> +
>>> + decl_v->is_deleted = true;
>>> +
>>> + if (is_default_function (decl)
>>> + && decl_v->versions != NULL)
>>> + {
>>> + VEC_truncate (void_p, decl_v->versions, 0);
>>> + VEC_free (void_p, heap, decl_v->versions);
>>> + }
>>> +}
>>> +
>>> +/* Mark DECL1 and DECL2 to be function versions in the same group. One
>>> + of DECL1 and DECL2 must be the default, otherwise this function does
>>> + nothing. This function aggregates the versions. */
>>> +
>>> +int
>>> +group_function_versions (const tree decl1, const tree decl2)
>>> +{
>>> + tree default_decl, version_decl;
>>> + version_function *default_v, *version_v;
>>> +
>>> + gcc_assert (DECL_FUNCTION_VERSIONED (decl1)
>>> + && DECL_FUNCTION_VERSIONED (decl2));
>>> +
>>> + /* The version decls are added only to the default decl. */
>>> + if (!is_default_function (decl1)
>>> + && !is_default_function (decl2))
>>> + return 0;
>>> +
>>> + /* This can happen with duplicate declarations. Just ignore. */
>>> + if (is_default_function (decl1)
>>> + && is_default_function (decl2))
>>> + return 0;
>>> +
>>> + default_decl = (is_default_function (decl1)) ? decl1 : decl2;
>>> + version_decl = (default_decl == decl1) ? decl2 : decl1;
>>> +
>>> + gcc_assert (default_decl != version_decl);
>>> + create_decl_version_htab ();
>>> +
>>> + /* If the version function is found, it has been added. */
>>> + if (find_function_version (version_decl))
>>> + return 0;
>>> +
>>> + default_v = add_function_version (default_decl);
>>> + version_v = add_function_version (version_decl);
>>> +
>>> + if (default_v->versions == NULL)
>>> + default_v->versions = VEC_alloc (void_p, heap, 1);
>>> +
>>> + push_function_version (version_v, default_v->versions);
>>> + return 0;
>>> +}
>>> +
>>> +/* Makes a function attribute of the form NAME(ARG_NAME) and chains
>>> + it to CHAIN. */
>>> +
>>> +static tree
>>> +make_attribute (const char *name, const char *arg_name, tree chain)
>>> +{
>>> + tree attr_name;
>>> + tree attr_arg_name;
>>> + tree attr_args;
>>> + tree attr;
>>> +
>>> + attr_name = get_identifier (name);
>>> + attr_arg_name = build_string (strlen (arg_name), arg_name);
>>> + attr_args = tree_cons (NULL_TREE, attr_arg_name, NULL_TREE);
>>> + attr = tree_cons (attr_name, attr_args, chain);
>>> + return attr;
>>> +}
>>> +
>>> +/* Return a new name by appending SUFFIX to the DECL name. If
>>> + make_unique is true, append the full path name. */
>>> +
>>> +static char *
>>> +make_name (tree decl, const char *suffix, bool make_unique)
>>> +{
>>> + char *global_var_name;
>>> + int name_len;
>>> + const char *name;
>>> + const char *unique_name = NULL;
>>> +
>>> + name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
>>> +
>>> + /* Get a unique name that can be used globally without any chances
>>> + of collision at link time. */
>>> + if (make_unique)
>>> + unique_name = IDENTIFIER_POINTER (get_file_function_name ("\0"));
>>> +
>>> + name_len = strlen (name) + strlen (suffix) + 2;
>>> +
>>> + if (make_unique)
>>> + name_len += strlen (unique_name) + 1;
>>> + global_var_name = (char *) xmalloc (name_len);
>>> +
>>> + /* Use '.' to concatenate names as it is demangler friendly. */
>>> + if (make_unique)
>>> + snprintf (global_var_name, name_len, "%s.%s.%s", name,
>>> + unique_name, suffix);
>>> + else
>>> + snprintf (global_var_name, name_len, "%s.%s", name, suffix);
>>> +
>>> + return global_var_name;
>>> +}
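The naming scheme is easy to check in isolation. Below is a small sketch of the '.'-joined form used for both the version names and the resolver name. It is an assumption-laden illustration: join_name is a hypothetical helper, and the "unique" argument stands in for whatever get_file_function_name would return, which I have not tried to reproduce.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return malloc'ed "NAME.SUFFIX", or "NAME.UNIQUE.SUFFIX" when UNIQUE is
   non-NULL.  The caller frees the result.  Returns NULL on allocation
   failure.  */
char *
join_name (const char *name, const char *unique, const char *suffix)
{
  size_t len;
  char *out;

  assert (name != NULL && suffix != NULL);

  /* One byte for the '.' separator and one for the terminating NUL.  */
  len = strlen (name) + 1 + strlen (suffix) + 1;
  if (unique != NULL)
    len += strlen (unique) + 1;

  out = malloc (len);
  if (out == NULL)
    return NULL;

  if (unique != NULL)
    snprintf (out, len, "%s.%s.%s", name, unique, suffix);
  else
    snprintf (out, len, "%s.%s", name, suffix);
  return out;
}
```

For the example in the description, join_name ("_Z3foov", NULL, "arch_corei7") gives the "_Z3foov.arch_corei7" name that the alias refers to.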
>>> +
>>> +/* Make the resolver function decl for ifunc (IFUNC_DECL) to dispatch
>>> + the versions of multi-versioned function DEFAULT_DECL. Create an
>>> + empty basic block in the resolver and store the pointer in
>>> + EMPTY_BB. Return the decl of the resolver function. */
>>> +
>>> +static tree
>>> +make_ifunc_resolver_func (const tree default_decl,
>>> + const tree ifunc_decl,
>>> + basic_block *empty_bb)
>>> +{
>>> + char *resolver_name;
>>> + tree decl, type, decl_name, t;
>>> + basic_block new_bb;
>>> + tree old_current_function_decl;
>>> + bool make_unique = false;
>>> +
>>> + /* IFUNCs have to be globally visible. So, if the default_decl is
>>> + not, then the name of the IFUNC should be made unique. */
>>> + if (TREE_PUBLIC (default_decl) == 0)
>>> + make_unique = true;
>>> +
>>> + /* Append the filename to the resolver function if the versions are
>>> + not externally visible. This is because the resolver function has
>>> + to be externally visible for the loader to find it. So, appending
>>> + the filename will prevent conflicts with a resolver function from
>>> + another module which is based on the same version name. */
>>> + resolver_name = make_name (default_decl, "resolver", make_unique);
>>> +
>>> + /* The resolver function should return a (void *). */
>>> + type = build_function_type_list (ptr_type_node, NULL_TREE);
>>> +
>>> + decl = build_fn_decl (resolver_name, type);
>>> + decl_name = get_identifier (resolver_name);
>>> + SET_DECL_ASSEMBLER_NAME (decl, decl_name);
>>> +
>>> + DECL_NAME (decl) = decl_name;
>>> + TREE_USED (decl) = TREE_USED (default_decl);
>>> + DECL_ARTIFICIAL (decl) = 1;
>>> + DECL_IGNORED_P (decl) = 0;
>>> + /* IFUNC resolvers have to be externally visible. */
>>> + TREE_PUBLIC (decl) = 1;
>>> + DECL_UNINLINABLE (decl) = 1;
>>> +
>>> + DECL_EXTERNAL (decl) = DECL_EXTERNAL (default_decl);
>>> + DECL_EXTERNAL (ifunc_decl) = 0;
>>> +
>>> + DECL_CONTEXT (decl) = NULL_TREE;
>>> + DECL_INITIAL (decl) = make_node (BLOCK);
>>> + DECL_STATIC_CONSTRUCTOR (decl) = 0;
>>> + TREE_READONLY (decl) = 0;
>>> + DECL_PURE_P (decl) = 0;
>>> + DECL_COMDAT (decl) = DECL_COMDAT (default_decl);
>>> + if (DECL_COMDAT_GROUP (default_decl))
>>> + {
>>> + make_decl_one_only (decl, DECL_COMDAT_GROUP (default_decl));
>>> + }
>>> + /* Build result decl and add to function_decl. */
>>> + t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, ptr_type_node);
>>> + DECL_ARTIFICIAL (t) = 1;
>>> + DECL_IGNORED_P (t) = 1;
>>> + DECL_RESULT (decl) = t;
>>> +
>>> + gimplify_function_tree (decl);
>>> + old_current_function_decl = current_function_decl;
>>> + push_cfun (DECL_STRUCT_FUNCTION (decl));
>>> + current_function_decl = decl;
>>> + init_empty_tree_cfg_for_function (DECL_STRUCT_FUNCTION (decl));
>>> + cfun->curr_properties |=
>>> + (PROP_gimple_lcf | PROP_gimple_leh | PROP_cfg | PROP_referenced_vars |
>>> + PROP_ssa);
>>> + new_bb = create_empty_bb (ENTRY_BLOCK_PTR);
>>> + make_edge (ENTRY_BLOCK_PTR, new_bb, EDGE_FALLTHRU);
>>> + make_edge (new_bb, EXIT_BLOCK_PTR, 0);
>>> + *empty_bb = new_bb;
>>> +
>>> + cgraph_add_new_function (decl, true);
>>> + cgraph_call_function_insertion_hooks (cgraph_get_create_node (decl));
>>> + cgraph_analyze_function (cgraph_get_create_node (decl));
>>> + cgraph_mark_needed_node (cgraph_get_create_node (decl));
>>> +
>>> + if (DECL_COMDAT_GROUP (default_decl))
>>> + {
>>> + gcc_assert (cgraph_get_node (default_decl));
>>> + cgraph_add_to_same_comdat_group (cgraph_get_node (decl),
>>> + cgraph_get_node (default_decl));
>>> + }
>>> +
>>> + pop_cfun ();
>>> + current_function_decl = old_current_function_decl;
>>> +
>>> + gcc_assert (ifunc_decl != NULL);
>>> + DECL_ATTRIBUTES (ifunc_decl)
>>> + = make_attribute ("ifunc", resolver_name, DECL_ATTRIBUTES (ifunc_decl));
>>> + assemble_alias (ifunc_decl, get_identifier (resolver_name));
>>> + return decl;
>>> +}
>>> +
>>> +/* Make an ifunc declaration for the multi-versioned function DECL. Calls to
>>> + DECL function will be replaced with calls to the ifunc. Return the decl
>>> + of the ifunc created. */
>>> +
>>> +static tree
>>> +make_ifunc_func (const tree decl)
>>> +{
>>> + tree ifunc_decl;
>>> + char *ifunc_name, *resolver_name;
>>> + tree fn_type, ifunc_type;
>>> + bool make_unique = false;
>>> +
>>> + if (TREE_PUBLIC (decl) == 0)
>>> + make_unique = true;
>>> +
>>> + ifunc_name = make_name (decl, "ifunc", make_unique);
>>> + resolver_name = make_name (decl, "resolver", make_unique);
>>> + gcc_assert (resolver_name);
>>> +
>>> + fn_type = TREE_TYPE (decl);
>>> + ifunc_type = build_function_type (TREE_TYPE (fn_type),
>>> + TYPE_ARG_TYPES (fn_type));
>>> +
>>> + ifunc_decl = build_fn_decl (ifunc_name, ifunc_type);
>>> + TREE_USED (ifunc_decl) = 1;
>>> + DECL_CONTEXT (ifunc_decl) = NULL_TREE;
>>> + DECL_INITIAL (ifunc_decl) = error_mark_node;
>>> + DECL_ARTIFICIAL (ifunc_decl) = 1;
>>> + /* Mark this ifunc as external; the resolver will flip it again if
>>> + it gets generated. */
>>> + DECL_EXTERNAL (ifunc_decl) = 1;
>>> + /* IFUNCs have to be externally visible. */
>>> + TREE_PUBLIC (ifunc_decl) = 1;
>>> +
>>> + return ifunc_decl;
>>> +}
>>> +
>>> +/* For multi-versioned function decl, which should also be the default,
>>> + return the decl of the ifunc, creating it if it does not
>>> + exist. */
>>> +
>>> +tree
>>> +get_ifunc_for_version (const tree decl)
>>> +{
>>> + version_function *decl_v;
>>> + int ix;
>>> + void_p ele;
>>> +
>>> + /* DECL has to be the default version, otherwise it is missing and
>>> + that is not allowed. */
>>> + if (!is_default_function (decl))
>>> + {
>>> + error_at (DECL_SOURCE_LOCATION (decl), "Default version not found");
>>> + return decl;
>>> + }
>>> +
>>> + decl_v = find_function_version (decl);
>>> + gcc_assert (decl_v != NULL);
>>> + if (decl_v->ifunc_decl == NULL)
>>> + {
>>> + tree ifunc_decl;
>>> + ifunc_decl = make_ifunc_func (decl);
>>> + decl_v->ifunc_decl = ifunc_decl;
>>> + }
>>> +
>>> + if (cgraph_get_node (decl))
>>> + cgraph_mark_needed_node (cgraph_get_node (decl));
>>> +
>>> + for (ix = 0; VEC_iterate (void_p, decl_v->versions, ix, ele); ++ix)
>>> + {
>>> + version_function *v = (version_function *) ele;
>>> + gcc_assert (v->decl != NULL);
>>> + if (cgraph_get_node (v->decl))
>>> + cgraph_mark_needed_node (cgraph_get_node (v->decl));
>>> + }
>>> +
>>> + return decl_v->ifunc_decl;
>>> +}
>>> +
>>> +/* Generate the dispatching code to dispatch multi-versioned function
>>> + DECL. Make a new function decl for dispatching and call the target
>>> + hook to process the "targetv" attributes and provide the code to
>>> + dispatch the right function at run-time. */
>>> +
>>> +static tree
>>> +make_ifunc_resolver_for_version (const tree decl)
>>> +{
>>> + version_function *decl_v;
>>> + tree ifunc_resolver_decl, ifunc_decl;
>>> + basic_block empty_bb;
>>> + int ix;
>>> + void_p ele;
>>> + VEC (tree, heap) *fn_ver_vec = NULL;
>>> +
>>> + gcc_assert (is_default_function (decl));
>>> +
>>> + decl_v = find_function_version (decl);
>>> + gcc_assert (decl_v != NULL);
>>> +
>>> + if (decl_v->ifunc_resolver_decl != NULL)
>>> + return decl_v->ifunc_resolver_decl;
>>> +
>>> + ifunc_decl = decl_v->ifunc_decl;
>>> +
>>> + if (ifunc_decl == NULL)
>>> + ifunc_decl = decl_v->ifunc_decl = make_ifunc_func (decl);
>>> +
>>> + ifunc_resolver_decl = make_ifunc_resolver_func (decl, ifunc_decl,
>>> + &empty_bb);
>>> +
>>> + fn_ver_vec = VEC_alloc (tree, heap, 2);
>>> + VEC_safe_push (tree, heap, fn_ver_vec, decl);
>>> +
>>> + for (ix = 0; VEC_iterate (void_p, decl_v->versions, ix, ele); ++ix)
>>> + {
>>> + version_function *v = (version_function *) ele;
>>> + gcc_assert (v->decl != NULL);
>>> + /* Check for virtual functions here again, as by this time it should
>>> + have been determined if this function needs a vtable index or
>>> + not. This happens for methods in derived classes that override
>>> + virtual methods in base classes but are not explicitly marked as
>>> + virtual. */
>>> + if (DECL_VINDEX (v->decl))
>>> + error_at (DECL_SOURCE_LOCATION (v->decl),
>>> + "Virtual function versioning not supported");
>>> + if (!v->is_deleted)
>>> + VEC_safe_push (tree, heap, fn_ver_vec, v->decl);
>>> + }
>>> +
>>> + gcc_assert (targetm.dispatch_version);
>>> + targetm.dispatch_version (ifunc_resolver_decl, fn_ver_vec, &empty_bb);
>>> + decl_v->ifunc_resolver_decl = ifunc_resolver_decl;
>>> +
>>> + return ifunc_resolver_decl;
>>> +}
>>> +
>>> +/* Main entry point to pass_dispatch_versions. For multi-versioned functions,
>>> + generate the dispatching code. */
>>> +
>>> +static unsigned int
>>> +do_dispatch_versions (void)
>>> +{
>>> + /* A new pass for generating dispatch code for multi-versioned functions.
>>> + Other forms of dispatch can be added when ifunc support is not available
>>> + like just calling the function directly after checking for target type.
>>> + Currently, dispatching is done through IFUNC. This pass will become
>>> + more meaningful when other dispatch mechanisms are added. */
>>> +
>>> + /* Cloning a function to produce more versions will happen here when the
>>> + user requests that via the targetv attribute. For example,
>>> + int foo () __attribute__ ((targetv(("arch=core2"), ("arch=corei7"))));
>>> + means that the user wants the same body of foo to be versioned for core2
>>> + and corei7. In that case, this function will be cloned during this
>>> + pass. */
>>> +
>>> + if (DECL_FUNCTION_VERSIONED (current_function_decl)
>>> + && is_default_function (current_function_decl))
>>> + {
>>> + tree decl = make_ifunc_resolver_for_version (current_function_decl);
>>> + if (dump_file && decl)
>>> + dump_function_to_file (decl, dump_file, TDF_BLOCKS);
>>> + }
>>> + return 0;
>>> +}
>>> +
>>> +static bool
>>> +gate_dispatch_versions (void)
>>> +{
>>> + return true;
>>> +}
>>> +
>>> +/* A pass to generate the dispatch code to execute the appropriate version
>>> + of a multi-versioned function at run-time. */
>>> +
>>> +struct gimple_opt_pass pass_dispatch_versions =
>>> +{
>>> + {
>>> + GIMPLE_PASS,
>>> + "dispatch_multiversion_functions", /* name */
>>> + gate_dispatch_versions, /* gate */
>>> + do_dispatch_versions, /* execute */
>>> + NULL, /* sub */
>>> + NULL, /* next */
>>> + 0, /* static_pass_number */
>>> + TV_MULTIVERSION_DISPATCH, /* tv_id */
>>> + PROP_cfg, /* properties_required */
>>> + PROP_cfg, /* properties_provided */
>>> + 0, /* properties_destroyed */
>>> + 0, /* todo_flags_start */
>>> + TODO_dump_func | /* todo_flags_finish */
>>> + TODO_cleanup_cfg | TODO_dump_cgraph
>>> + }
>>> +};
>>> Index: cgraphunit.c
>>> ===================================================================
>>> --- cgraphunit.c (revision 184971)
>>> +++ cgraphunit.c (working copy)
>>> @@ -141,6 +141,7 @@ along with GCC; see the file COPYING3. If not see
>>> #include "ipa-inline.h"
>>> #include "ipa-utils.h"
>>> #include "lto-streamer.h"
>>> +#include "multiversion.h"
>>>
>>> static void cgraph_expand_all_functions (void);
>>> static void cgraph_mark_functions_to_output (void);
>>> @@ -343,6 +344,13 @@ cgraph_finalize_function (tree decl, bool nested)
>>> node->local.redefined_extern_inline = true;
>>> }
>>>
>>> + /* If this is a function version and not the default, change the
>>> + assembler name of this function. The DECL names of function
>>> + versions are the same, only the assembler names are made unique.
>>> + The assembler name is changed by appending the string from
>>> + the "targetv" attribute. */
>>> + version_assembler_name (decl);
>>> +
>>> notice_global_symbol (decl);
>>> node->local.finalized = true;
>>> node->lowered = DECL_STRUCT_FUNCTION (decl)->cfg != NULL;
>>> Index: multiversion.h
>>> ===================================================================
>>> --- multiversion.h (revision 0)
>>> +++ multiversion.h (revision 0)
>>> @@ -0,0 +1,52 @@
>>> +/* Function Multiversioning.
>>> + Copyright (C) 2012 Free Software Foundation, Inc.
>>> + Contributed by Sriraman Tallam (tmsriram@google.com)
>>> +
>>> +This file is part of GCC.
>>> +
>>> +GCC is free software; you can redistribute it and/or modify it under
>>> +the terms of the GNU General Public License as published by the Free
>>> +Software Foundation; either version 3, or (at your option) any later
>>> +version.
>>> +
>>> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
>>> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
>>> +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
>>> +for more details.
>>> +
>>> +You should have received a copy of the GNU General Public License
>>> +along with GCC; see the file COPYING3. If not see
>>> +<http://www.gnu.org/licenses/>. */
>>> +
>>> +/* This is the header file which provides the functions to keep track
>>> + of functions that are multi-versioned and to generate the dispatch
>>> + code to call the right version at run-time. */
>>> +
>>> +#ifndef GCC_MULTIVERSION_H
>>> +#define GCC_MULTIVERSION_H
>>> +
>>> +#include "tree.h"
>>> +
>>> +/* Mark DECL1 and DECL2 as function versions. */
>>> +int group_function_versions (const tree decl1, const tree decl2);
>>> +
>>> +/* Mark DECL as deleted and no longer a version. */
>>> +void mark_delete_decl_version (const tree decl);
>>> +
>>> +/* Returns true if DECL is the default version to be executed if all
>>> + other versions are inappropriate at run-time. */
>>> +bool is_default_function (const tree decl);
>>> +
>>> +/* Gets the IFUNC dispatcher for this multi-versioned function DECL. DECL
>>> + must be the default function in the multi-versioned group. */
>>> +tree get_ifunc_for_version (const tree decl);
>>> +
>>> +/* Returns true when only one of DECL1 and DECL2 is marked with "targetv"
>>> + or if the "targetv" attribute strings of DECL1 and DECL2 don't match. */
>>> +bool has_different_version_attributes (const tree decl1, const tree decl2);
>>> +
>>> +/* If DECL is a function version and not the default version, the assembler
>>> + name of DECL is changed to include the attribute string to keep the
>>> + name unambiguous. */
>>> +void version_assembler_name (const tree decl);
>>> +#endif
>>> Index: cp/class.c
>>> ===================================================================
>>> --- cp/class.c (revision 184971)
>>> +++ cp/class.c (working copy)
>>> @@ -38,6 +38,7 @@ along with GCC; see the file COPYING3. If not see
>>> #include "tree-dump.h"
>>> #include "splay-tree.h"
>>> #include "pointer-set.h"
>>> +#include "multiversion.h"
>>>
>>> /* The number of nested classes being processed. If we are not in the
>>> scope of any class, this is zero. */
>>> @@ -1092,7 +1093,20 @@ add_method (tree type, tree method, tree using_dec
>>> || same_type_p (TREE_TYPE (fn_type),
>>> TREE_TYPE (method_type))))
>>> {
>>> - if (using_decl)
>>> + /* For function versions, their parms and types match
>>> + but they are not duplicates. Record function versions
>>> + as and when they are found. */
>>> + if (TREE_CODE (fn) == FUNCTION_DECL
>>> + && TREE_CODE (method) == FUNCTION_DECL
>>> + && (DECL_FUNCTION_VERSIONED (fn)
>>> + || DECL_FUNCTION_VERSIONED (method)))
>>> + {
>>> + DECL_FUNCTION_VERSIONED (fn) = 1;
>>> + DECL_FUNCTION_VERSIONED (method) = 1;
>>> + group_function_versions (fn, method);
>>> + continue;
>>> + }
>>> + else if (using_decl)
>>> {
>>> if (DECL_CONTEXT (fn) == type)
>>> /* Defer to the local function. */
>>> @@ -1150,6 +1164,13 @@ add_method (tree type, tree method, tree using_dec
>>> else
>>> /* Replace the current slot. */
>>> VEC_replace (tree, method_vec, slot, overload);
>>> +
>>> + /* Change the assembler name of method here if it has "targetv"
>>> + attributes. Since all versions have the same mangled name,
>>> + their assembler name is changed by appending the string from
>>> + the "targetv" attribute. */
>>> + version_assembler_name (method);
>>> +
>>> return true;
>>> }
>>>
>>> @@ -6890,8 +6911,11 @@ resolve_address_of_overloaded_function (tree targe
>>> if (DECL_ANTICIPATED (fn))
>>> continue;
>>>
>>> - /* See if there's a match. */
>>> - if (same_type_p (target_fn_type, static_fn_type (fn)))
>>> + /* See if there's a match. For multi-versioned functions,
>>> + match only the default version. */
>>> + if (same_type_p (target_fn_type, static_fn_type (fn))
>>> + && (!DECL_FUNCTION_VERSIONED (fn)
>>> + || is_default_function (fn)))
>>> matches = tree_cons (fn, NULL_TREE, matches);
>>> }
>>> }
>>> @@ -7053,6 +7077,21 @@ resolve_address_of_overloaded_function (tree targe
>>> perform_or_defer_access_check (access_path, fn, fn);
>>> }
>>>
>>> + /* If a pointer to a function that is multi-versioned is requested, the
>>> + pointer to the dispatcher function is returned instead. This works
>>> + well because indirectly calling the function will dispatch the right
>>> + function version at run-time. Also, the function address is kept
>>> + unique. */
>>> + if (DECL_FUNCTION_VERSIONED (fn)
>>> + && is_default_function (fn))
>>> + {
>>> + tree ifunc_decl;
>>> + ifunc_decl = get_ifunc_for_version (fn);
>>> + gcc_assert (ifunc_decl != NULL);
>>> + mark_used (fn);
>>> + return build_fold_addr_expr (ifunc_decl);
>>> + }
>>> +
>>> if (TYPE_PTRFN_P (target_type) || TYPE_PTRMEMFUNC_P (target_type))
>>> return cp_build_addr_expr (fn, flags);
>>> else
>>> Index: cp/decl.c
>>> ===================================================================
>>> --- cp/decl.c (revision 184971)
>>> +++ cp/decl.c (working copy)
>>> @@ -54,6 +54,7 @@ along with GCC; see the file COPYING3. If not see
>>> #include "pointer-set.h"
>>> #include "splay-tree.h"
>>> #include "plugin.h"
>>> +#include "multiversion.h"
>>>
>>> /* Possible cases of bad specifiers type used by bad_specifiers. */
>>> enum bad_spec_place {
>>> @@ -972,6 +973,23 @@ decls_match (tree newdecl, tree olddecl)
>>> if (t1 != t2)
>>> return 0;
>>>
>>> + /* The decls don't match if they correspond to two different versions
>>> + of the same function. */
>>> + if (compparms (p1, p2)
>>> + && same_type_p (TREE_TYPE (f1), TREE_TYPE (f2))
>>> + && (DECL_FUNCTION_VERSIONED (newdecl)
>>> + || DECL_FUNCTION_VERSIONED (olddecl))
>>> + && has_different_version_attributes (newdecl, olddecl))
>>> + {
>>> + /* One of the decls could be the default without the "targetv"
>>> + attribute. Set it to be a versioned function here. */
>>> + DECL_FUNCTION_VERSIONED (newdecl) = 1;
>>> + DECL_FUNCTION_VERSIONED (olddecl) = 1;
>>> + /* Accumulate all the versions of a function. */
>>> + group_function_versions (olddecl, newdecl);
>>> + return 0;
>>> + }
>>> +
>>> if (CP_DECL_CONTEXT (newdecl) != CP_DECL_CONTEXT (olddecl)
>>> && ! (DECL_EXTERN_C_P (newdecl)
>>> && DECL_EXTERN_C_P (olddecl)))
>>> @@ -1482,7 +1500,11 @@ duplicate_decls (tree newdecl, tree olddecl, bool
>>> error ("previous declaration %q+#D here", olddecl);
>>> return NULL_TREE;
>>> }
>>> - else if (compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)),
>>> + /* For function versions, params and types match, but they
>>> + are not ambiguous. */
>>> + else if ((!DECL_FUNCTION_VERSIONED (newdecl)
>>> + && !DECL_FUNCTION_VERSIONED (olddecl))
>>> + && compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)),
>>> TYPE_ARG_TYPES (TREE_TYPE (olddecl))))
>>> {
>>> error ("new declaration %q#D", newdecl);
>>> @@ -2250,6 +2272,16 @@ duplicate_decls (tree newdecl, tree olddecl, bool
>>> else if (DECL_PRESERVE_P (newdecl))
>>> DECL_PRESERVE_P (olddecl) = 1;
>>>
>>> + /* If the olddecl is a version, so is the newdecl. */
>>> + if (TREE_CODE (newdecl) == FUNCTION_DECL
>>> + && DECL_FUNCTION_VERSIONED (olddecl))
>>> + {
>>> + DECL_FUNCTION_VERSIONED (newdecl) = 1;
>>> + /* Record that newdecl is not a valid version and has
>>> + been deleted. */
>>> + mark_delete_decl_version (newdecl);
>>> + }
>>> +
>>> if (TREE_CODE (newdecl) == FUNCTION_DECL)
>>> {
>>> int function_size;
>>> @@ -4512,6 +4544,10 @@ start_decl (const cp_declarator *declarator,
>>> /* Enter this declaration into the symbol table. */
>>> decl = maybe_push_decl (decl);
>>>
>>> + /* If this decl is a function version and not the default, its assembler
>>> + name has to be changed. */
>>> + version_assembler_name (decl);
>>> +
>>> if (processing_template_decl)
>>> decl = push_template_decl (decl);
>>> if (decl == error_mark_node)
>>> @@ -13019,6 +13055,10 @@ start_function (cp_decl_specifier_seq *declspecs,
>>> gcc_assert (same_type_p (TREE_TYPE (TREE_TYPE (decl1)),
>>> integer_type_node));
>>>
>>> + /* If this decl is a function version and not the default, its assembler
>>> + name has to be changed. */
>>> + version_assembler_name (decl1);
>>> +
>>> start_preparsed_function (decl1, attrs, /*flags=*/SF_DEFAULT);
>>>
>>> return 1;
>>> @@ -13960,6 +14000,11 @@ cxx_comdat_group (tree decl)
>>> break;
>>> }
>>> name = DECL_ASSEMBLER_NAME (decl);
>>> + if (TREE_CODE (decl) == FUNCTION_DECL
>>> + && DECL_FUNCTION_VERSIONED (decl))
>>> + name = DECL_NAME (decl);
>>> + else
>>> + name = DECL_ASSEMBLER_NAME (decl);
>>> }
>>>
>>> return name;
>>> Index: cp/semantics.c
>>> ===================================================================
>>> --- cp/semantics.c (revision 184971)
>>> +++ cp/semantics.c (working copy)
>>> @@ -3783,8 +3783,11 @@ expand_or_defer_fn_1 (tree fn)
>>> /* If the user wants us to keep all inline functions, then mark
>>> this function as needed so that finish_file will make sure to
>>> output it later. Similarly, all dllexport'd functions must
>>> - be emitted; there may be callers in other DLLs. */
>>> - if ((flag_keep_inline_functions
>>> + be emitted; there may be callers in other DLLs.
>>> + Also, mark this function as needed if it is marked inline but
>>> + is a multi-versioned function. */
>>> + if (((flag_keep_inline_functions
>>> + || DECL_FUNCTION_VERSIONED (fn))
>>> && DECL_DECLARED_INLINE_P (fn)
>>> && !DECL_REALLY_EXTERN (fn))
>>> || (flag_keep_inline_dllexport
>>> Index: cp/decl2.c
>>> ===================================================================
>>> --- cp/decl2.c (revision 184971)
>>> +++ cp/decl2.c (working copy)
>>> @@ -53,6 +53,7 @@ along with GCC; see the file COPYING3. If not see
>>> #include "splay-tree.h"
>>> #include "langhooks.h"
>>> #include "c-family/c-ada-spec.h"
>>> +#include "multiversion.h"
>>>
>>> extern cpp_reader *parse_in;
>>>
>>> @@ -674,9 +675,13 @@ check_classfn (tree ctype, tree function, tree tem
>>> if (is_template != (TREE_CODE (fndecl) == TEMPLATE_DECL))
>>> continue;
>>>
>>> + /* While finding a match, same types and params are not enough
>>> + if the function is versioned. Also check version ("targetv")
>>> + attributes. */
>>> if (same_type_p (TREE_TYPE (TREE_TYPE (function)),
>>> TREE_TYPE (TREE_TYPE (fndecl)))
>>> && compparms (p1, p2)
>>> + && !has_different_version_attributes (function, fndecl)
>>> && (!is_template
>>> || comp_template_parms (template_parms,
>>> DECL_TEMPLATE_PARMS (fndecl)))
>>> Index: cp/call.c
>>> ===================================================================
>>> --- cp/call.c (revision 184971)
>>> +++ cp/call.c (working copy)
>>> @@ -41,6 +41,7 @@ along with GCC; see the file COPYING3. If not see
>>> #include "langhooks.h"
>>> #include "c-family/c-objc.h"
>>> #include "timevar.h"
>>> +#include "multiversion.h"
>>>
>>> /* The various kinds of conversion. */
>>>
>>> @@ -6730,6 +6731,17 @@ build_over_call (struct z_candidate *cand, int fla
>>> if (!already_used)
>>> mark_used (fn);
>>>
>>> + /* For a call to a multi-versioned function, the call should actually be to
>>> + the dispatcher. */
>>> + if (DECL_FUNCTION_VERSIONED (fn))
>>> + {
>>> + tree ifunc_decl;
>>> + ifunc_decl = get_ifunc_for_version (fn);
>>> + gcc_assert (ifunc_decl != NULL);
>>> + return build_call_expr_loc_array (UNKNOWN_LOCATION, ifunc_decl,
>>> + nargs, argarray);
>>> + }
>>> +
>>> if (DECL_VINDEX (fn) && (flags & LOOKUP_NONVIRTUAL) == 0)
>>> {
>>> tree t;
>>> @@ -7980,6 +7992,30 @@ joust (struct z_candidate *cand1, struct z_candida
>>> size_t i;
>>> size_t len;
>>>
>>> + /* For candidates of a multi-versioned function, the one marked default
>>> + wins. This is because the default decl is used as key to aggregate
>>> + all the other versions provided for it in multiversion.c. When
>>> + generating the actual call, the appropriate dispatcher is created
>>> + to call the right function version at run-time. */
>>> +
>>> + if ((TREE_CODE (cand1->fn) == FUNCTION_DECL
>>> + && DECL_FUNCTION_VERSIONED (cand1->fn))
>>> + || (TREE_CODE (cand2->fn) == FUNCTION_DECL
>>> + && DECL_FUNCTION_VERSIONED (cand2->fn)))
>>> + {
>>> + if (is_default_function (cand1->fn))
>>> + {
>>> + mark_used (cand2->fn);
>>> + return 1;
>>> + }
>>> + if (is_default_function (cand2->fn))
>>> + {
>>> + mark_used (cand1->fn);
>>> + return -1;
>>> + }
>>> + return 0;
>>> + }
>>> +
>>> /* Candidates that involve bad conversions are always worse than those
>>> that don't. */
>>> if (cand1->viable > cand2->viable)
>>> Index: timevar.def
>>> ===================================================================
>>> --- timevar.def (revision 184971)
>>> +++ timevar.def (working copy)
>>> @@ -253,6 +253,7 @@ DEFTIMEVAR (TV_TREE_IFCOMBINE , "tree if-co
>>> DEFTIMEVAR (TV_TREE_UNINIT , "uninit var analysis")
>>> DEFTIMEVAR (TV_PLUGIN_INIT , "plugin initialization")
>>> DEFTIMEVAR (TV_PLUGIN_RUN , "plugin execution")
>>> +DEFTIMEVAR (TV_MULTIVERSION_DISPATCH , "multiversion dispatch")
>>>
>>> /* Everything else in rest_of_compilation not included above. */
>>> DEFTIMEVAR (TV_EARLY_LOCAL , "early local passes")
>>> Index: varasm.c
>>> ===================================================================
>>> --- varasm.c (revision 184971)
>>> +++ varasm.c (working copy)
>>> @@ -5755,6 +5755,8 @@ finish_aliases_1 (void)
>>> }
>>> else if (! (p->emitted_diags & ALIAS_DIAG_TO_EXTERN)
>>> && DECL_EXTERNAL (target_decl)
>>> + && (TREE_CODE (target_decl) != FUNCTION_DECL
>>> + || !DECL_STRUCT_FUNCTION (target_decl))
>>> /* We use local aliases for C++ thunks to force the tailcall
>>> to bind locally. This is a hack - to keep it working do
>>> the following (which is not strictly correct). */
>>> Index: Makefile.in
>>> ===================================================================
>>> --- Makefile.in (revision 184971)
>>> +++ Makefile.in (working copy)
>>> @@ -1298,6 +1298,7 @@ OBJS = \
>>> mcf.o \
>>> mode-switching.o \
>>> modulo-sched.o \
>>> + multiversion.o \
>>> omega.o \
>>> omp-low.o \
>>> optabs.o \
>>> @@ -3030,6 +3031,11 @@ ree.o : ree.c $(CONFIG_H) $(SYSTEM_H) coretypes.h
>>> $(DF_H) $(TIMEVAR_H) tree-pass.h $(RECOG_H) $(EXPR_H) \
>>> $(REGS_H) $(TREE_H) $(TM_P_H) insn-config.h $(INSN_ATTR_H) $(DIAGNOSTIC_CORE_H) \
>>> $(TARGET_H) $(OPTABS_H) insn-codes.h rtlhooks-def.h $(PARAMS_H) $(CGRAPH_H)
>>> +multiversion.o : multiversion.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
>>> + $(TREE_H) langhooks.h $(TREE_INLINE_H) $(FLAGS_H) $(CGRAPH_H) intl.h \
>>> + $(DIAGNOSTIC_H) $(FIBHEAP_H) $(PARAMS_H) $(TIMEVAR_H) tree-pass.h \
>>> + $(HASHTAB_H) $(COVERAGE_H) $(GGC_H) $(TREE_FLOW_H) $(RTL_H) $(IPA_PROP_H) \
>>> + $(BASIC_BLOCK_H) $(TOPLEV_H) $(TREE_DUMP_H) ipa-inline.h
>>> cprop.o : cprop.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
>>> $(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h $(GGC_H) \
>>> $(RECOG_H) $(EXPR_H) $(BASIC_BLOCK_H) $(FUNCTION_H) output.h toplev.h $(DIAGNOSTIC_CORE_H) \
>>> Index: passes.c
>>> ===================================================================
>>> --- passes.c (revision 184971)
>>> +++ passes.c (working copy)
>>> @@ -1190,6 +1190,7 @@ init_optimization_passes (void)
>>> NEXT_PASS (pass_build_cfg);
>>> NEXT_PASS (pass_warn_function_return);
>>> NEXT_PASS (pass_build_cgraph_edges);
>>> + NEXT_PASS (pass_dispatch_versions);
>>> *p = NULL;
>>>
>>> /* Interprocedural optimization passes. */
>>> Index: config/i386/i386.c
>>> ===================================================================
>>> --- config/i386/i386.c (revision 184971)
>>> +++ config/i386/i386.c (working copy)
>>> @@ -27446,6 +27473,593 @@ ix86_init_mmx_sse_builtins (void)
>>> }
>>> }
>>>
>>> +/* This adds a condition to the basic_block NEW_BB in function FUNCTION_DECL
>>> + to return a pointer to VERSION_DECL if the outcome of the function
>>> + PREDICATE_DECL is true. This function will be called during version
>>> + dispatch to decide which function version to execute. It returns the
>>> + basic block at the end to which more conditions can be added. */
>>> +
>>> +static basic_block
>>> +add_condition_to_bb (tree function_decl, tree version_decl,
>>> + basic_block new_bb, tree predicate_decl)
>>> +{
>>> + gimple return_stmt;
>>> + tree convert_expr, result_var;
>>> + gimple convert_stmt;
>>> + gimple call_cond_stmt;
>>> + gimple if_else_stmt;
>>> +
>>> + basic_block bb1, bb2, bb3;
>>> + edge e12, e23;
>>> +
>>> + tree cond_var;
>>> + gimple_seq gseq;
>>> +
>>> + tree old_current_function_decl;
>>> +
>>> + old_current_function_decl = current_function_decl;
>>> + push_cfun (DECL_STRUCT_FUNCTION (function_decl));
>>> + current_function_decl = function_decl;
>>> +
>>> + gcc_assert (new_bb != NULL);
>>> + gseq = bb_seq (new_bb);
>>> +
>>> +
>>> + convert_expr = build1 (CONVERT_EXPR, ptr_type_node,
>>> + build_fold_addr_expr (version_decl));
>>> + result_var = create_tmp_var (ptr_type_node, NULL);
>>> + convert_stmt = gimple_build_assign (result_var, convert_expr);
>>> + return_stmt = gimple_build_return (result_var);
>>> +
>>> + if (predicate_decl == NULL_TREE)
>>> + {
>>> + gimple_seq_add_stmt (&gseq, convert_stmt);
>>> + gimple_seq_add_stmt (&gseq, return_stmt);
>>> + set_bb_seq (new_bb, gseq);
>>> + gimple_set_bb (convert_stmt, new_bb);
>>> + gimple_set_bb (return_stmt, new_bb);
>>> + pop_cfun ();
>>> + current_function_decl = old_current_function_decl;
>>> + return new_bb;
>>> + }
>>> +
>>> + cond_var = create_tmp_var (integer_type_node, NULL);
>>> + call_cond_stmt = gimple_build_call (predicate_decl, 0);
>>> + gimple_call_set_lhs (call_cond_stmt, cond_var);
>>> +
>>> + gimple_set_block (call_cond_stmt, DECL_INITIAL (function_decl));
>>> + gimple_set_bb (call_cond_stmt, new_bb);
>>> + gimple_seq_add_stmt (&gseq, call_cond_stmt);
>>> +
>>> + if_else_stmt = gimple_build_cond (GT_EXPR, cond_var,
>>> + integer_zero_node,
>>> + NULL_TREE, NULL_TREE);
>>> + gimple_set_block (if_else_stmt, DECL_INITIAL (function_decl));
>>> + gimple_set_bb (if_else_stmt, new_bb);
>>> + gimple_seq_add_stmt (&gseq, if_else_stmt);
>>> +
>>> + gimple_seq_add_stmt (&gseq, convert_stmt);
>>> + gimple_seq_add_stmt (&gseq, return_stmt);
>>> + set_bb_seq (new_bb, gseq);
>>> +
>>> + bb1 = new_bb;
>>> + e12 = split_block (bb1, if_else_stmt);
>>> + bb2 = e12->dest;
>>> + e12->flags &= ~EDGE_FALLTHRU;
>>> + e12->flags |= EDGE_TRUE_VALUE;
>>> +
>>> + e23 = split_block (bb2, return_stmt);
>>> +
>>> + gimple_set_bb (convert_stmt, bb2);
>>> + gimple_set_bb (return_stmt, bb2);
>>> +
>>> + bb3 = e23->dest;
>>> + make_edge (bb1, bb3, EDGE_FALSE_VALUE);
>>> +
>>> + remove_edge (e23);
>>> + make_edge (bb2, EXIT_BLOCK_PTR, 0);
>>> +
>>> + rebuild_cgraph_edges ();
>>> +
>>> + pop_cfun ();
>>> + current_function_decl = old_current_function_decl;
>>> +
>>> + return bb3;
>>> +}
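
For readers following the CFG construction above, this is a minimal sketch of the control flow that repeated calls to add_condition_to_bb appear to build in the resolver: each version's predicate is called, the version is returned when the result is greater than zero (the GT_EXPR against integer_zero_node), and otherwise control falls through to the next check. All names below (version_entry, resolve_version, the demo functions) are hypothetical. The order in which versions are tested is an assumption, and the sketch uses plain function pointers rather than the IFUNC mechanism.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*version_fn) (void);
typedef int (*predicate_fn) (void);

/* One specialized version together with the run-time check guarding it.  */
struct version_entry
{
  predicate_fn matches;		/* Returns > 0 when this version applies.  */
  version_fn fn;
};

/* Walk VERSIONS in order and return the first function whose predicate
   returns a value greater than zero.  Fall back to DEFAULT_FN, which
   corresponds to the unconditional return emitted when no predicate
   is given.  */
version_fn
resolve_version (const struct version_entry *versions, size_t n,
		 version_fn default_fn)
{
  size_t i;

  assert (default_fn != NULL);
  for (i = 0; i < n; i++)
    if (versions[i].matches () > 0)
      return versions[i].fn;
  return default_fn;
}

/* Demo predicates and versions, for illustration only.  */
int never (void) { return 0; }
int always (void) { return 1; }
int foo_default (void) { return 0; }
int foo_corei7 (void) { return 1; }
int foo_core2 (void) { return 2; }
```

Calling resolve_version with a table in which only the core2 predicate succeeds returns foo_core2. When no predicate succeeds, the default version is returned.
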
>>> +
>>> +/* This parses the attribute arguments to targetv in DECL and determines
>>> + the right builtin to use to match the platform specification.
>>> + For now, only one target argument ("arch=") is allowed. */
>>> +
>>> +static enum ix86_builtins
>>> +get_builtin_code_for_version (tree decl)
>>> +{
>>> + tree attrs;
>>> + struct cl_target_option cur_target;
>>> + tree target_node;
>>> + struct cl_target_option *new_target;
>>> + enum ix86_builtins builtin_code = IX86_BUILTIN_MAX;
>>> +
>>> + attrs = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl));
>>> + gcc_assert (attrs != NULL);
>>> +
>>> + cl_target_option_save (&cur_target, &global_options);
>>> +
>>> + target_node = ix86_valid_target_attribute_tree
>>> + (TREE_VALUE (TREE_VALUE (attrs)));
>>> +
>>> + gcc_assert (target_node);
>>> + new_target = TREE_TARGET_OPTION (target_node);
>>> + gcc_assert (new_target);
>>> +
>>> + if (new_target->arch_specified && new_target->arch > 0)
>>> + {
>>> + switch (new_target->arch)
>>> + {
>>> + case 1:
>>> + case 2:
>>> + case 3:
>>> + case 4:
>>> + case 5:
>>> + case 6:
>>> + case 7:
>>> + case 8:
>>> + case 9:
>>> + case 10:
>>> + case 11:
>>> + builtin_code = IX86_BUILTIN_CPU_IS_INTEL;
>>> + break;
>>> + case 12:
>>> + builtin_code = IX86_BUILTIN_CPU_IS_INTEL_CORE2;
>>> + break;
>>> + case 13:
>>> + builtin_code = IX86_BUILTIN_CPU_IS_INTEL_COREI7;
>>> + break;
>>> + case 14:
>>> + builtin_code = IX86_BUILTIN_CPU_IS_INTEL_ATOM;
>>> + break;
>>> + case 15:
>>> + case 16:
>>> + case 17:
>>> + case 18:
>>> + case 19:
>>> + case 20:
>>> + case 21:
>>> + builtin_code = IX86_BUILTIN_CPU_IS_AMD;
>>> + break;
>>> + case 22:
>>> + builtin_code = IX86_BUILTIN_CPU_IS_AMDFAM10H;
>>> + break;
>>> + case 23:
>>> + builtin_code = IX86_BUILTIN_CPU_IS_AMDFAM15H_BDVER1;
>>> + break;
>>> + case 24:
>>> + builtin_code = IX86_BUILTIN_CPU_IS_AMDFAM15H_BDVER2;
>>> + break;
>>> + case 25: /* What is btver1 ? */
>>> + builtin_code = IX86_BUILTIN_CPU_IS_AMD;
>>> + break;
>>> + }
>>> + }
>>> +
>>> + cl_target_option_restore (&global_options, &cur_target);
>>> + if (builtin_code == IX86_BUILTIN_MAX)
>>> + error_at (DECL_SOURCE_LOCATION (decl),
>>> + "No dispatcher found for the versioning attributes");
>>> +
>>> + return builtin_code;
>>> +}
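
For reference, the mapping in the switch above can also be written as a range table, which keeps each arch range and its predicate on one line. This is only a sketch under stated assumptions: `cpu_pred`, `arch_range`, `arch_table` and `pred_for_arch` are hypothetical names that do not appear in the patch, the predicate ids merely stand in for the IX86_BUILTIN_CPU_IS_* codes, and the numeric bounds are copied from the case labels above rather than from any named processor enum.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical predicate ids standing in for the IX86_BUILTIN_CPU_IS_*
   codes used in the patch.  PRED_NONE plays the role of IX86_BUILTIN_MAX.  */
enum cpu_pred {
  PRED_NONE,
  PRED_INTEL, PRED_CORE2, PRED_COREI7, PRED_ATOM,
  PRED_AMD, PRED_AMDFAM10H, PRED_BDVER1, PRED_BDVER2
};

/* One row per contiguous range of arch values.  */
struct arch_range { int lo, hi; enum cpu_pred pred; };

/* Bounds copied from the case labels in the quoted switch.  */
const struct arch_range arch_table[] = {
  {  1, 11, PRED_INTEL },
  { 12, 12, PRED_CORE2 },
  { 13, 13, PRED_COREI7 },
  { 14, 14, PRED_ATOM },
  { 15, 21, PRED_AMD },
  { 22, 22, PRED_AMDFAM10H },
  { 23, 23, PRED_BDVER1 },
  { 24, 24, PRED_BDVER2 },
  { 25, 25, PRED_AMD }
};

/* Return the predicate for ARCH, or PRED_NONE when no row matches
   (the case in which the patch reports an error).  */
enum cpu_pred
pred_for_arch (int arch)
{
  size_t i;

  for (i = 0; i < sizeof arch_table / sizeof arch_table[0]; i++)
    {
      /* Sanity check on the table itself.  */
      assert (arch_table[i].lo <= arch_table[i].hi);
      if (arch >= arch_table[i].lo && arch <= arch_table[i].hi)
        return arch_table[i].pred;
    }
  return PRED_NONE;
}
```

With this layout, a value outside 1..25 falls through to PRED_NONE, matching the path on which the patch emits its "No dispatcher found" diagnostic.
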
>>> +
>>> +/* This is the target hook to generate the dispatch function for
>>> + multi-versioned functions. DISPATCH_DECL is the function which will
>>> + contain the dispatch logic. FNDECLS_P points to a vector of the
>>> + function choices for dispatch, with the default version first. EMPTY_BB
>>> + is the basic block pointer in DISPATCH_DECL in which the dispatch code
>>> + is generated. */
>>> +
>>> +static int
>>> +ix86_dispatch_version (tree dispatch_decl,
>>> + void *fndecls_p,
>>> + basic_block *empty_bb)
>>> +{
>>> + tree default_decl;
>>> + gimple ifunc_cpu_init_stmt;
>>> + gimple_seq gseq;
>>> + tree old_current_function_decl;
>>> + int ix;
>>> + tree ele;
>>> + VEC (tree, heap) *fndecls;
>>> +
>>> + gcc_assert (dispatch_decl != NULL
>>> + && fndecls_p != NULL
>>> + && empty_bb != NULL);
>>> +
>>> + /* fndecls_p is actually a vector. */
>>> + fndecls = (VEC (tree, heap) *)fndecls_p;
>>> +
>>> + /* At least one version other than the default. */
>>> + gcc_assert (VEC_length (tree, fndecls) >= 2);
>>> +
>>> + /* The first version in the vector is the default decl. */
>>> + default_decl = VEC_index (tree, fndecls, 0);
>>> +
>>> + old_current_function_decl = current_function_decl;
>>> + push_cfun (DECL_STRUCT_FUNCTION (dispatch_decl));
>>> + current_function_decl = dispatch_decl;
>>> +
>>> + gseq = bb_seq (*empty_bb);
>>> + ifunc_cpu_init_stmt = gimple_build_call_vec (
>>> + ix86_builtins [(int) IX86_BUILTIN_CPU_INIT], NULL);
>>> + gimple_seq_add_stmt (&gseq, ifunc_cpu_init_stmt);
>>> + gimple_set_bb (ifunc_cpu_init_stmt, *empty_bb);
>>> + set_bb_seq (*empty_bb, gseq);
>>> +
>>> + pop_cfun ();
>>> + current_function_decl = old_current_function_decl;
>>> +
>>> +
>>> + for (ix = 1; VEC_iterate (tree, fndecls, ix, ele); ++ix)
>>> + {
>>> + tree version_decl = ele;
>>> + /* Get attribute string, parse it and find the right predicate decl.
>>> + The predicate function could be a lengthy combination of many
>>> + features, like arch-type and various isa-variants. For now, only
>>> + check the arch-type. */
>>> + tree predicate_decl = ix86_builtins [
>>> + get_builtin_code_for_version (version_decl)];
>>> + *empty_bb = add_condition_to_bb (dispatch_decl, version_decl, *empty_bb,
>>> + predicate_decl);
>>> +
>>> + }
>>> + /* Dispatch the default version at the end. */
>>> + *empty_bb = add_condition_to_bb (dispatch_decl, default_decl, *empty_bb,
>>> + NULL);
>>> + return 0;
>>> +}
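
The order in which the hook above chains its conditions can be sketched in plain C: every non-default version is tried in vector order, and the default (element 0) is used only when no predicate matches. This is an illustration under stated assumptions, not the code the hook generates: `struct version`, `select_version` and the sample predicates and versions are hypothetical names, and the initial call to the CPU-init builtin that the hook emits first is omitted.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*version_fn) (void);
typedef int (*predicate_fn) (void);

/* One candidate: a run-time predicate and the version it guards.  */
struct version { predicate_fn matches; version_fn fn; };

/* Element 0 is the default version and its predicate is never called,
   mirroring the patch, where index 0 of the vector is the default decl.  */
version_fn
select_version (const struct version *versions, size_t n)
{
  size_t i;

  /* The default plus at least one other version, as the patch asserts.  */
  assert (versions != NULL && n >= 2);

  for (i = 1; i < n; i++)
    if (versions[i].matches ())
      return versions[i].fn;

  /* No predicate matched: fall back to the default.  */
  return versions[0].fn;
}

/* Sample predicates and versions, for illustration only.  The return
   values differ so that the selected version is observable.  */
int always (void) { return 1; }
int never (void) { return 0; }
int foo_default (void) { return 0; }
int foo_corei7 (void) { return 1; }
int foo_core2 (void) { return 2; }
```

Because the first matching predicate wins, the order of the non-default versions in the vector decides which one is picked when more than one predicate is true.
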
>>>
>>> @@ -38610,6 +39269,12 @@ ix86_autovectorize_vector_sizes (void)
>>> #undef TARGET_BUILD_BUILTIN_VA_LIST
>>> #define TARGET_BUILD_BUILTIN_VA_LIST ix86_build_builtin_va_list
>>>
>>> +#undef TARGET_DISPATCH_VERSION
>>> +#define TARGET_DISPATCH_VERSION ix86_dispatch_version
>>> +
>>> #undef TARGET_ENUM_VA_LIST_P
>>> #define TARGET_ENUM_VA_LIST_P ix86_enum_va_list
>>>
>>> Index: testsuite/g++.dg/mv1.C
>>> ===================================================================
>>> --- testsuite/g++.dg/mv1.C (revision 0)
>>> +++ testsuite/g++.dg/mv1.C (revision 0)
>>> @@ -0,0 +1,23 @@
>>> +/* Simple test case to check if Multiversioning works. */
>>> +/* { dg-do run } */
>>> +/* { dg-options "-O2" } */
>>> +
>>> +int foo ();
>>> +int foo () __attribute__ ((targetv("arch=corei7")));
>>> +
>>> +int main ()
>>> +{
>>> + int (*p)() = &foo;
>>> + return foo () + (*p)();
>>> +}
>>> +
>>> +int foo ()
>>> +{
>>> + return 0;
>>> +}
>>> +
>>> +int __attribute__ ((targetv("arch=corei7")))
>>> +foo ()
>>> +{
>>> + return 0;
>>> +}
>>>
>>>
>>> --
>>> This patch is available for review at http://codereview.appspot.com/5752064