* User directed Function Multiversioning via Function Overloading (issue5752064)
From: Sriraman Tallam @ 2012-03-07  0:47 UTC
  To: reply, gcc-patches

User directed Function Multiversioning (MV) via Function Overloading
====================================================================

This patch adds support for user directed function MV via function overloading.
For a more detailed description, see:
http://gcc.gnu.org/ml/gcc/2012-03/msg00074.html


Here is an example program with function versions:

int foo ();  /* Default version.  */
int foo () __attribute__ ((targetv("arch=corei7"))); /* Specialized for corei7.  */
int foo () __attribute__ ((targetv("arch=core2"))); /* Specialized for core2.  */

int main ()
{
  int (*p)() = &foo;
  return foo () + (*p)();
}

int foo ()
{
  return 0;
}

int __attribute__ ((targetv("arch=corei7")))
foo ()
{
  return 0;
}

int __attribute__ ((targetv("arch=core2")))
foo ()
{
  return 0;
}

The above example has foo defined 3 times, but all 3 definitions of foo are
different versions of the same function. The calls to foo in main, both direct
and via a pointer, are calls to the multi-versioned function foo, which is
dispatched to the right version at run time.

Function versions must have the same signature but must differ in the specifier
string provided to a new attribute called "targetv", which is the target
attribute with an extra specification to indicate a version. Any number of
versions can be created using the targetv attribute, but it is mandatory to
have one function without the attribute, which is treated as the default
version.

The dispatching is done using the IFUNC mechanism to keep the dispatch overhead
low. The compiler creates a dispatcher function which checks the CPU type and
selects the first version of foo that matches. The default function is called if
no specialized version is appropriate for execution.
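
As a rough illustration of the selection rule described above (first matching
version wins, default otherwise), here is a minimal sketch in plain C. It is
not the code the compiler generates: resolve_foo, foo_versions and the
cpu_arch string argument are hypothetical stand-ins for the CPU check that the
real dispatcher performs, and the versions return distinct values only so the
choice is observable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*foo_fn) (void);

/* Stand-ins for the three versions of foo.  */
static int foo_default (void) { return 0; }
static int foo_corei7 (void) { return 1; }
static int foo_core2 (void) { return 2; }

/* Hypothetical table of specialized versions, in dispatch order.  */
struct version
{
  const char *arch;
  foo_fn fn;
};

static const struct version foo_versions[] = {
  { "corei7", foo_corei7 },
  { "core2", foo_core2 },
};

/* Hypothetical resolver: CPU_ARCH stands in for the run-time CPU check.
   Return the first version whose arch matches, else the default.  */
static foo_fn
resolve_foo (const char *cpu_arch)
{
  size_t i;

  assert (cpu_arch != NULL);
  for (i = 0; i < sizeof foo_versions / sizeof foo_versions[0]; i++)
    if (strcmp (cpu_arch, foo_versions[i].arch) == 0)
      return foo_versions[i].fn;

  return foo_default;
}
```

With IFUNC, a resolver of this shape would run once at load time and the
returned address would be used for all later calls, which is what keeps the
per-call overhead low.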

The pointer to foo is made to be the address of the dispatcher function, so that
it is unique and calls made via the pointer also work correctly. The assembler
names of the various versions of foo are made different, by appending the
specifier strings, to keep them unique.  A specific version can be called
directly by creating an alias to its assembler name. For instance, to call the
corei7 version directly, make an alias:
int foo_corei7 () __attribute__((alias ("_Z3foov.arch_corei7")));
and then call foo_corei7.
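
To make the naming scheme concrete, the following standalone sketch builds a
versioned assembler name the way the patch describes it: '=' becomes '_', the
comma separated specifiers are sorted and joined with '_', and the result is
appended to the original name after a '.'. The helper versioned_name and its
fixed-size buffers are assumptions made for this illustration; the patch does
this work in sorted_attr_string and version_assembler_name.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C strings.  */
static int
cmp_str (const void *a, const void *b)
{
  return strcmp (*(char *const *) a, *(char *const *) b);
}

/* Hypothetical helper: write "<base>.<sorted spec>" into OUT.
   Returns 0 on success, -1 if a buffer would overflow.  */
static int
versioned_name (const char *base, const char *spec, char *out, size_t out_len)
{
  char buf[128];
  char *tok[16];
  size_t n = 0, i;
  char *p;
  int r;

  assert (base != NULL && spec != NULL && out != NULL);
  if (strlen (spec) >= sizeof buf)
    return -1;
  strcpy (buf, spec);

  /* '=' is replaced so the result is a plain symbol suffix.  */
  for (p = buf; *p; p++)
    if (*p == '=')
      *p = '_';

  /* Split on commas and sort so the specifier order does not matter.  */
  for (p = strtok (buf, ","); p != NULL; p = strtok (NULL, ","))
    {
      if (n == sizeof tok / sizeof tok[0])
        return -1;
      tok[n++] = p;
    }
  if (n == 0)
    return -1;
  qsort (tok, n, sizeof tok[0], cmp_str);

  r = snprintf (out, out_len, "%s.", base);
  if (r < 0 || (size_t) r >= out_len)
    return -1;
  for (i = 0; i < n; i++)
    {
      /* Room for an optional '_', the token, and the terminating NUL.  */
      if (strlen (out) + strlen (tok[i]) + 2 > out_len)
        return -1;
      if (i > 0)
        strcat (out, "_");
      strcat (out, tok[i]);
    }
  return 0;
}
```

For the example above, versioned_name ("_Z3foov", "arch=corei7", ...) yields
"_Z3foov.arch_corei7", the name used in the alias.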

Note that using IFUNC blocks inlining of versioned functions. I had implemented
an optimization earlier to do hot path cloning to allow versioned functions to
be inlined. Please see: http://gcc.gnu.org/ml/gcc-patches/2011-04/msg02285.html
In the next iteration, I plan to merge these two. With that, hot code paths with
versioned functions will be cloned so that versioned functions can be inlined.

	* doc/tm.texi.in: Add description for TARGET_DISPATCH_VERSION.
	* doc/tm.texi: Regenerate.
	* c-family/c-common.c (handle_targetv_attribute): New function.
	* target.def (dispatch_version): New target hook.
	* tree.h (DECL_FUNCTION_VERSIONED): New macro.
	(tree_function_decl): New bit-field versioned_function.
	* tree-pass.h (pass_dispatch_versions): New pass.
	* multiversion.c: New file.
	* multiversion.h: New file.
	* cgraphunit.c: Include multiversion.h
	(cgraph_finalize_function): Change assembler names of versioned
	functions.
	* cp/class.c: Include multiversion.h.
	(add_method): Aggregate function versions.  Change assembler names of
	versioned functions.
	(resolve_address_of_overloaded_function): Match address of function
	version with default function.  Return address of ifunc dispatcher
	for address of versioned functions.
	* cp/decl.c (decls_match): Make decls unmatched for versioned
	functions.
	(duplicate_decls): Remove ambiguity for versioned functions. Notify
	of deleted function version decls.
	(start_decl): Change assembler name of versioned functions.
	(start_function): Change assembler name of versioned functions.
	(cxx_comdat_group): Make comdat group of versioned functions be the
	same.
	* cp/semantics.c (expand_or_defer_fn_1): Mark as needed versioned
	functions that are also marked inline.
	* cp/decl2.c: Include multiversion.h
	(check_classfn): Check attributes of versioned functions for match.
	* cp/call.c: Include multiversion.h
	(build_over_call): Make calls to multiversioned functions to call the
	dispatcher.
	(joust): For calls to multi-versioned functions, make the default
	function win.
	* timevar.def (TV_MULTIVERSION_DISPATCH): New time var.
	* varasm.c (finish_aliases_1): Check if the alias points to a function
	with a body before giving an error.
	* Makefile.in: Add multiversion.o
	* passes.c: Add pass_dispatch_versions to the pass list.
	* config/i386/i386.c (add_condition_to_bb): New function.
	(get_builtin_code_for_version): New function.
	(ix86_dispatch_version): New function.
	(TARGET_DISPATCH_VERSION): New macro.
	* testsuite/g++.dg/mv1.C: New test.

Index: doc/tm.texi
===================================================================
--- doc/tm.texi	(revision 184971)
+++ doc/tm.texi	(working copy)
@@ -10995,6 +10995,14 @@ The result is another tree containing a simplified
 call's result.  If @var{ignore} is true the value will be ignored.
 @end deftypefn
 
+@deftypefn {Target Hook} int TARGET_DISPATCH_VERSION (tree @var{dispatch_decl}, void *@var{fndecls}, basic_block *@var{empty_bb})
+For a multi-versioned function, this hook sets up the dispatcher.
+@var{dispatch_decl} is the function that will be used to dispatch the
+version. @var{fndecls} are the function choices for dispatch.
+@var{empty_bb} is a basic block in @var{dispatch_decl} where the
+code to do the dispatch will be added.
+@end deftypefn
+
 @deftypefn {Target Hook} {const char *} TARGET_INVALID_WITHIN_DOLOOP (const_rtx @var{insn})
 
 Take an instruction in @var{insn} and return NULL if it is valid within a
Index: doc/tm.texi.in
===================================================================
--- doc/tm.texi.in	(revision 184971)
+++ doc/tm.texi.in	(working copy)
@@ -10873,6 +10873,14 @@ The result is another tree containing a simplified
 call's result.  If @var{ignore} is true the value will be ignored.
 @end deftypefn
 
+@hook TARGET_DISPATCH_VERSION
+For a multi-versioned function, this hook sets up the dispatcher.
+@var{dispatch_decl} is the function that will be used to dispatch the
+version. @var{fndecls} are the function choices for dispatch.
+@var{empty_bb} is a basic block in @var{dispatch_decl} where the
+code to do the dispatch will be added.
+@end deftypefn
+
 @hook TARGET_INVALID_WITHIN_DOLOOP
 
 Take an instruction in @var{insn} and return NULL if it is valid within a
Index: c-family/c-common.c
===================================================================
--- c-family/c-common.c	(revision 184971)
+++ c-family/c-common.c	(working copy)
@@ -315,6 +315,7 @@ static tree check_case_value (tree);
 static bool check_case_bounds (tree, tree, tree *, tree *);
 
 static tree handle_packed_attribute (tree *, tree, tree, int, bool *);
+static tree handle_targetv_attribute (tree *, tree, tree, int, bool *);
 static tree handle_nocommon_attribute (tree *, tree, tree, int, bool *);
 static tree handle_common_attribute (tree *, tree, tree, int, bool *);
 static tree handle_noreturn_attribute (tree *, tree, tree, int, bool *);
@@ -604,6 +605,8 @@ const struct attribute_spec c_common_attribute_tab
 {
   /* { name, min_len, max_len, decl_req, type_req, fn_type_req, handler,
        affects_type_identity } */
+  { "targetv",	      	      1, -1, true, false, false,
+			      handle_targetv_attribute, false },
   { "packed",                 0, 0, false, false, false,
 			      handle_packed_attribute , false},
   { "nocommon",               0, 0, true,  false, false,
@@ -5869,6 +5872,54 @@ handle_packed_attribute (tree *node, tree name, tr
   return NULL_TREE;
 }
 
+/* The targetv attribute is used to specify a function version
+   targeted to specific platform types.  The "targetv" attributes
+   have to be valid "target" attributes.  NODE should always point
+   to a FUNCTION_DECL.  ARGS contain the arguments to "targetv"
+   which should be valid arguments to attribute "target" too.
+   Check handle_target_attribute for FLAGS and NO_ADD_ATTRS.  */
+
+static tree
+handle_targetv_attribute (tree *node, tree name,
+ 		   	  tree args,
+ 			  int flags,
+			  bool *no_add_attrs)
+{
+  const char *attr_str = NULL;
+  gcc_assert (TREE_CODE (*node) == FUNCTION_DECL);
+  gcc_assert (args != NULL);
+
+  /* This is a function version.  */
+  DECL_FUNCTION_VERSIONED (*node) = 1;
+  
+  attr_str = TREE_STRING_POINTER (TREE_VALUE (args));
+
+  /* Check if multiple sets of target attributes are there.  This
+     is not supported now.   In future, this will be supported by
+     cloning this function for each set.  */
+  if (TREE_CHAIN (args) != NULL)
+    warning (OPT_Wattributes, "%qE attribute has multiple sets which "
+	     "is not supported", name);
+
+  if (attr_str == NULL
+      || strstr (attr_str, "arch=") == NULL)
+    error_at (DECL_SOURCE_LOCATION (*node),
+	      "Versioning supported only on \"arch=\" for now");
+
+  /* targetv attributes must translate into target attributes.  */
+  handle_target_attribute (node, get_identifier ("target"), args, flags,
+			   no_add_attrs);
+
+  if (*no_add_attrs)
+    warning (OPT_Wattributes, "%qE attribute has no effect", name);
+
+  /* This is necessary to keep the attribute tagged to the decl
+     all the time.  */
+  *no_add_attrs = false;
+
+  return NULL_TREE;
+}
+
 /* Handle a "nocommon" attribute; arguments as in
    struct attribute_spec.handler.  */
 
Index: target.def
===================================================================
--- target.def	(revision 184971)
+++ target.def	(working copy)
@@ -1249,6 +1249,15 @@ DEFHOOK
  tree, (tree fndecl, int n_args, tree *argp, bool ignore),
  hook_tree_tree_int_treep_bool_null)
 
+/* Target hook to generate the dispatching code for calls to multi-versioned
+   functions.  DISPATCH_DECL is the function that will have the dispatching
+   logic.  FNDECLS are the list of choices for dispatch and EMPTY_BB is the
+   basic block in DISPATCH_DECL which will contain the code.  */
+DEFHOOK
+(dispatch_version,
+ "",
+ int, (tree dispatch_decl, void *fndecls, basic_block *empty_bb), NULL)
+
 /* Returns a code for a target-specific builtin that implements
    reciprocal of the function, or NULL_TREE if not available.  */
 DEFHOOK
Index: tree.h
===================================================================
--- tree.h	(revision 184971)
+++ tree.h	(working copy)
@@ -3532,6 +3532,12 @@ extern VEC(tree, gc) **decl_debug_args_insert (tre
 #define DECL_FUNCTION_SPECIFIC_OPTIMIZATION(NODE) \
    (FUNCTION_DECL_CHECK (NODE)->function_decl.function_specific_optimization)
 
+/* In FUNCTION_DECL, this is set if this function has other versions generated
+   using "targetv" attributes.  The default version is the one which does not
+   have any "targetv" attribute set. */
+#define DECL_FUNCTION_VERSIONED(NODE)\
+   (FUNCTION_DECL_CHECK (NODE)->function_decl.versioned_function)
+
 /* FUNCTION_DECL inherits from DECL_NON_COMMON because of the use of the
    arguments/result/saved_tree fields by front ends.   It was either inherit
    FUNCTION_DECL from non_common, or inherit non_common from FUNCTION_DECL,
@@ -3576,8 +3582,8 @@ struct GTY(()) tree_function_decl {
   unsigned looping_const_or_pure_flag : 1;
   unsigned has_debug_args_flag : 1;
   unsigned tm_clone_flag : 1;
-
-  /* 1 bit left */
+  unsigned versioned_function : 1;
+  /* No bits left.  */
 };
 
 /* The source language of the translation-unit.  */
Index: tree-pass.h
===================================================================
--- tree-pass.h	(revision 184971)
+++ tree-pass.h	(working copy)
@@ -455,6 +455,7 @@ extern struct gimple_opt_pass pass_tm_memopt;
 extern struct gimple_opt_pass pass_tm_edges;
 extern struct gimple_opt_pass pass_split_functions;
 extern struct gimple_opt_pass pass_feedback_split_functions;
+extern struct gimple_opt_pass pass_dispatch_versions;
 
 /* IPA Passes */
 extern struct simple_ipa_opt_pass pass_ipa_lower_emutls;
Index: multiversion.c
===================================================================
--- multiversion.c	(revision 0)
+++ multiversion.c	(revision 0)
@@ -0,0 +1,798 @@
+/* Function Multiversioning.
+   Copyright (C) 2012 Free Software Foundation, Inc.
+   Contributed by Sriraman Tallam (tmsriram@google.com)
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>. */
+
+/* Holds the state for multi-versioned functions here. The front-end
+   updates the state as and when function versions are encountered.
+   This is then used to generate the dispatch code.  Also, the
+   optimization passes to clone hot paths involving versioned functions
+   will be done here.
+
+   Function versions are created by using the same function signature but
+   also tagging attribute "targetv" to specify the platform type for which
+   the version must be executed.  Here is an example:
+
+   int foo ()
+   {
+     printf ("Execute as default");
+     return 0;
+   }
+
+   int  __attribute__ ((targetv ("arch=corei7")))
+   foo ()
+   {
+     printf ("Execute for corei7");
+     return 0;
+   }
+   
+   int main ()
+   {
+     return foo ();
+   } 
+
+   The call to foo in main is replaced with a call to an IFUNC function that
+   contains the dispatch code to call the correct function version at
+   run-time.  */
+
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "tree.h"
+#include "tree-inline.h"
+#include "langhooks.h"
+#include "flags.h"
+#include "cgraph.h"
+#include "diagnostic.h"
+#include "toplev.h"
+#include "timevar.h"
+#include "params.h"
+#include "fibheap.h"
+#include "intl.h"
+#include "tree-pass.h"
+#include "hashtab.h"
+#include "coverage.h"
+#include "ggc.h"
+#include "tree-flow.h"
+#include "rtl.h"
+#include "ipa-prop.h"
+#include "basic-block.h"
+#include "toplev.h"
+#include "dbgcnt.h"
+#include "tree-dump.h"
+#include "output.h"
+#include "vecprim.h"
+#include "gimple-pretty-print.h"
+#include "ipa-inline.h"
+#include "target.h"
+#include "multiversion.h"
+
+typedef void * void_p;
+
+DEF_VEC_P (void_p);
+DEF_VEC_ALLOC_P (void_p, heap);
+
+/* Each function decl that is a function version gets an instance of this
+   structure.   Since this is called by the front-end, decl merging can
+   happen, where a decl created for a new declaration is merged with 
+   the old. In this case, the new decl is deleted and the IS_DELETED
+   field is set for the struct instance corresponding to the new decl.
+   IFUNC_DECL is the decl of the ifunc function for default decls.
+   IFUNC_RESOLVER_DECL is the decl of the dispatch function.  VERSIONS
+   is a vector containing the list of function versions  that are
+   the candidates for dispatch.  */
+
+typedef struct version_function_d {
+  tree decl;
+  tree ifunc_decl;
+  tree ifunc_resolver_decl;
+  VEC (void_p, heap) *versions;
+  bool is_deleted;
+} version_function;
+
+/* Hashmap has an entry for every function decl that has other function
+   versions.  For function decls that are the default, it also stores the
+   list of all the other function versions.  Each entry is a structure
+   of type version_function_d.  */
+static htab_t decl_version_htab = NULL;
+
+/* Hashtable helpers for decl_version_htab. */
+
+static hashval_t
+decl_version_htab_hash_descriptor (const void *p)
+{
+  const version_function *t = (const version_function *) p;
+  return htab_hash_pointer (t->decl);
+}
+
+/* Hashtable helper for decl_version_htab. */
+
+static int
+decl_version_htab_eq_descriptor (const void *p1, const void *p2)
+{
+  const version_function *t1 = (const version_function *) p1;
+  return htab_eq_pointer ((const void_p) t1->decl, p2);
+}
+
+/* Create the decl_version_htab.  */
+static void
+create_decl_version_htab (void)
+{
+  if (decl_version_htab == NULL)
+    decl_version_htab = htab_create (10, decl_version_htab_hash_descriptor,
+				     decl_version_htab_eq_descriptor, NULL);
+}
+
+/* Creates an instance of version_function for decl DECL.  */
+
+static version_function*
+new_version_function (const tree decl)
+{
+  version_function *v;
+  v = (version_function *) xmalloc (sizeof (version_function));
+  v->decl = decl;
+  v->ifunc_decl = NULL;
+  v->ifunc_resolver_decl = NULL;
+  v->versions = NULL;
+  v->is_deleted = false;
+  return v;
+}
+
+/* Comparator function to be used in qsort routine to sort attribute
+   specification strings to "targetv".  */
+
+static int
+attr_strcmp (const void *v1, const void *v2)
+{
+  const char *c1 = *(char *const*)v1;
+  const char *c2 = *(char *const*)v2;
+  return strcmp (c1, c2);
+}
+
+/* STR is the argument to targetv attribute.  This function tokenizes
+   the comma separated arguments, sorts them and returns a string which
+   is a unique identifier for the comma separated arguments.  */
+
+static char *
+sorted_attr_string (const char *str)
+{
+  char **args = NULL;
+  char *attr_str, *ret_str;
+  char *attr = NULL;
+  unsigned int argnum = 1;
+  unsigned int i;
+
+  for (i = 0; i < strlen (str); i++)
+    if (str[i] == ',')
+      argnum++;
+
+  attr_str = (char *)xmalloc (strlen (str) + 1);
+  strcpy (attr_str, str);
+
+  for (i = 0; i < strlen (attr_str); i++)
+    if (attr_str[i] == '=')
+      attr_str[i] = '_';
+
+  if (argnum == 1)
+    return attr_str;
+
+  args = (char **)xmalloc (argnum * sizeof (char *));
+
+  i = 0;
+  attr = strtok (attr_str, ",");
+  while (attr != NULL)
+    {
+      args[i] = attr;
+      i++;
+      attr = strtok (NULL, ",");
+    }
+
+  qsort (args, argnum, sizeof (char*), attr_strcmp);
+
+  ret_str = (char *)xmalloc (strlen (str) + 1);
+  strcpy (ret_str, args[0]);
+  for (i = 1; i < argnum; i++)
+    {
+      strcat (ret_str, "_");
+      strcat (ret_str, args[i]);
+    }
+
+  free (args);
+  free (attr_str);
+  return ret_str;
+}
+
+/* Returns true when only one of DECL1 and DECL2 is marked with "targetv"
+   or if the "targetv" attribute strings of DECL1 and DECL2 do not match.  */
+
+bool
+has_different_version_attributes (const tree decl1, const tree decl2)
+{
+  tree attr1, attr2;
+  char *c1, *c2;
+  bool ret = false;
+
+  if (TREE_CODE (decl1) != FUNCTION_DECL
+      || TREE_CODE (decl2) != FUNCTION_DECL)
+    return false;
+
+  attr1 = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl1));
+  attr2 = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl2));
+
+  if (attr1 == NULL_TREE && attr2 == NULL_TREE)
+    return false;
+
+  if ((attr1 == NULL_TREE && attr2 != NULL_TREE)
+      || (attr1 != NULL_TREE && attr2 == NULL_TREE))
+    return true;
+ 
+  c1 = sorted_attr_string (
+	TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr1))));
+  c2 = sorted_attr_string (
+	TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr2))));
+
+  if (strcmp (c1, c2) != 0)
+     ret = true;
+
+  free (c1);
+  free (c2);
+
+  return ret;
+}
+
+/* If this decl corresponds to a function and has "targetv" attribute,
+   append the attribute string to its assembler name.  */
+
+void
+version_assembler_name (const tree decl)
+{
+  tree version_attr;
+  const char *orig_name, *version_string, *attr_str;
+  char *assembler_name;
+  tree assembler_name_tree;
+  
+  if (TREE_CODE (decl) != FUNCTION_DECL
+      || DECL_ASSEMBLER_NAME_SET_P (decl)
+      || !DECL_FUNCTION_VERSIONED (decl))
+    return;
+
+  if (DECL_DECLARED_INLINE_P (decl)
+      &&lookup_attribute ("gnu_inline",
+			  DECL_ATTRIBUTES (decl)))
+    error_at (DECL_SOURCE_LOCATION (decl),
+	      "Function versions cannot be marked as gnu_inline,"
+	      " bodies have to be generated\n");
+
+  if (DECL_VIRTUAL_P (decl)
+      || DECL_VINDEX (decl))
+    error_at (DECL_SOURCE_LOCATION (decl),
+	      "Virtual function versioning not supported\n");
+
+  version_attr = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl));
+  /* targetv attribute string is NULL for default functions.  */
+  if (version_attr == NULL_TREE)
+    return;
+
+  orig_name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
+  version_string
+    = TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (version_attr)));
+
+  attr_str = sorted_attr_string (version_string);
+  assembler_name = (char *) xmalloc (strlen (orig_name)
+				     + strlen (attr_str) + 2);
+
+  sprintf (assembler_name, "%s.%s", orig_name, attr_str);
+  if (dump_file)
+    fprintf (dump_file, "Assembler name set to %s for function version %s\n",
+	     assembler_name, IDENTIFIER_POINTER (DECL_NAME (decl)));
+  assembler_name_tree = get_identifier (assembler_name);
+  SET_DECL_ASSEMBLER_NAME (decl, assembler_name_tree); 
+}
+
+/* Returns true if decl is multi-versioned and DECL is the default function,
+   that is it is not tagged with "targetv" attribute.  */
+
+bool
+is_default_function (const tree decl)
+{
+  return (TREE_CODE (decl) == FUNCTION_DECL
+	  && DECL_FUNCTION_VERSIONED (decl)
+	  && (lookup_attribute ("targetv", DECL_ATTRIBUTES (decl))
+	      == NULL_TREE));	
+}
+
+/* For function decl DECL, find the version_function struct in the
+   decl_version_htab.  */
+
+static version_function *
+find_function_version (const tree decl)
+{
+  void *slot;
+
+  if (!DECL_FUNCTION_VERSIONED (decl))
+    return NULL;
+
+  if (!decl_version_htab)
+    return NULL;
+
+  slot = htab_find_with_hash (decl_version_htab, decl,
+                              htab_hash_pointer (decl));
+
+  if (slot != NULL)
+    return (version_function *)slot;
+
+  return NULL;
+}
+
+/* Record DECL as a function version by creating a version_function struct
+   for it and storing it in the hashtable.  */
+
+static version_function *
+add_function_version (const tree decl)
+{
+  void **slot;
+  version_function *v;
+
+  if (!DECL_FUNCTION_VERSIONED (decl))
+    return NULL;
+
+  create_decl_version_htab ();
+
+  slot = htab_find_slot_with_hash (decl_version_htab, (const void_p)decl,
+                                   htab_hash_pointer ((const void_p)decl),
+				   INSERT);
+
+  if (*slot != NULL)
+    return (version_function *)*slot;
+
+  v = new_version_function (decl);
+  *slot = v;
+
+  return v;
+}
+
+/* Push V into VEC only if it is not already present.  */
+
+static void
+push_function_version (version_function *v, VEC (void_p, heap) *vec)
+{
+  int ix;
+  void_p ele; 
+  for (ix = 0; VEC_iterate (void_p, vec, ix, ele); ++ix)
+    {
+      if (ele == (void_p)v)
+        return;
+    }
+
+  VEC_safe_push (void_p, heap, vec, (void*)v);
+}
+ 
+/* Mark DECL as deleted.  This is called by the front-end when a duplicate
+   decl is merged with the original decl and the duplicate decl is deleted.
+   This function marks the duplicate_decl as invalid.  Called by
+   duplicate_decls in cp/decl.c.  */
+
+void
+mark_delete_decl_version (const tree decl)
+{
+  version_function *decl_v;
+
+  decl_v = find_function_version (decl);
+
+  if (decl_v == NULL)
+    return;
+
+  decl_v->is_deleted = true;
+
+  if (is_default_function (decl)
+      && decl_v->versions != NULL)
+    {
+      VEC_truncate (void_p, decl_v->versions, 0);
+      VEC_free (void_p, heap, decl_v->versions);
+    }
+}
+
+/* Mark DECL1 and DECL2 to be function versions in the same group.  One
+   of DECL1 and DECL2 must be the default, otherwise this function does
+   nothing.  This function aggregates the versions.  */
+
+int
+group_function_versions (const tree decl1, const tree decl2)
+{
+  tree default_decl, version_decl;
+  version_function *default_v, *version_v;
+
+  gcc_assert (DECL_FUNCTION_VERSIONED (decl1)
+	      && DECL_FUNCTION_VERSIONED (decl2));
+
+  /* The version decls are added only to the default decl.  */
+  if (!is_default_function (decl1)
+      && !is_default_function (decl2))
+    return 0;
+
+  /* This can happen with duplicate declarations.  Just ignore.  */
+  if (is_default_function (decl1)
+      && is_default_function (decl2))
+    return 0;
+
+  default_decl = (is_default_function (decl1)) ? decl1 : decl2;
+  version_decl = (default_decl == decl1) ? decl2 : decl1;
+
+  gcc_assert (default_decl != version_decl);
+  create_decl_version_htab ();
+
+  /* If the version function is found, it has been added.  */
+  if (find_function_version (version_decl))
+    return 0;
+
+  default_v = add_function_version (default_decl);
+  version_v = add_function_version (version_decl);
+
+  if (default_v->versions == NULL)
+    default_v->versions = VEC_alloc (void_p, heap, 1);
+
+  push_function_version (version_v, default_v->versions);
+  return 0;
+}
+
+/* Makes a function attribute of the form NAME(ARG_NAME) and chains
+   it to CHAIN.  */
+
+static tree
+make_attribute (const char *name, const char *arg_name, tree chain)
+{
+  tree attr_name;
+  tree attr_arg_name;
+  tree attr_args;
+  tree attr;
+
+  attr_name = get_identifier (name);
+  attr_arg_name = build_string (strlen (arg_name), arg_name);
+  attr_args = tree_cons (NULL_TREE, attr_arg_name, NULL_TREE);
+  attr = tree_cons (attr_name, attr_args, chain);
+  return attr;
+}
+
+/* Return a new name by appending SUFFIX to the DECL name.  If
+   make_unique is true, append the full path name.  */
+
+static char *
+make_name (tree decl, const char *suffix, bool make_unique)
+{
+  char *global_var_name;
+  int name_len;
+  const char *name;
+  const char *unique_name = NULL;
+
+  name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
+
+  /* Get a unique name that can be used globally without any chances
+     of collision at link time.  */
+  if (make_unique)
+    unique_name = IDENTIFIER_POINTER (get_file_function_name ("\0"));
+
+  name_len = strlen (name) + strlen (suffix) + 2;
+
+  if (make_unique)
+    name_len += strlen (unique_name) + 1;
+  global_var_name = (char *) xmalloc (name_len);
+
+  /* Use '.' to concatenate names as it is demangler friendly.  */
+  if (make_unique)
+      snprintf (global_var_name, name_len, "%s.%s.%s", name,
+		unique_name, suffix);
+  else
+      snprintf (global_var_name, name_len, "%s.%s", name, suffix);
+
+  return global_var_name;
+}
+
+/* Make the resolver function decl for ifunc (IFUNC_DECL) to dispatch
+   the versions of multi-versioned function DEFAULT_DECL.  Create an
+   empty basic block in the resolver and store the pointer in
+   EMPTY_BB.  Return the decl of the resolver function.  */
+
+static tree
+make_ifunc_resolver_func (const tree default_decl,
+			  const tree ifunc_decl,
+			  basic_block *empty_bb)
+{
+  char *resolver_name;
+  tree decl, type, decl_name, t;
+  basic_block new_bb;
+  tree old_current_function_decl;
+  bool make_unique = false;
+
+  /* IFUNC's have to be globally visible.  So, if the default_decl is
+     not, then the name of the IFUNC should be made unique.  */
+  if (TREE_PUBLIC (default_decl) == 0)
+    make_unique = true;
+
+  /* Append the filename to the resolver function if the versions are
+     not externally visible.  This is because the resolver function has
+     to be externally visible for the loader to find it.  So, appending
+     the filename will prevent conflicts with a resolver function from
+     another module which is based on the same version name.  */
+  resolver_name = make_name (default_decl, "resolver", make_unique);
+
+  /* The resolver function should return a (void *). */
+  type = build_function_type_list (ptr_type_node, NULL_TREE);
+
+  decl = build_fn_decl (resolver_name, type);
+  decl_name = get_identifier (resolver_name);
+  SET_DECL_ASSEMBLER_NAME (decl, decl_name);
+
+  DECL_NAME (decl) = decl_name;
+  TREE_USED (decl) = TREE_USED (default_decl);
+  DECL_ARTIFICIAL (decl) = 1;
+  DECL_IGNORED_P (decl) = 0;
+  /* IFUNC resolvers have to be externally visible.  */
+  TREE_PUBLIC (decl) = 1;
+  DECL_UNINLINABLE (decl) = 1;
+
+  DECL_EXTERNAL (decl) = DECL_EXTERNAL (default_decl);
+  DECL_EXTERNAL (ifunc_decl) = 0;
+
+  DECL_CONTEXT (decl) = NULL_TREE;
+  DECL_INITIAL (decl) = make_node (BLOCK);
+  DECL_STATIC_CONSTRUCTOR (decl) = 0;
+  TREE_READONLY (decl) = 0;
+  DECL_PURE_P (decl) = 0;
+  DECL_COMDAT (decl) = DECL_COMDAT (default_decl);
+  if (DECL_COMDAT_GROUP (default_decl))
+    {
+      make_decl_one_only (decl, DECL_COMDAT_GROUP (default_decl));
+    }
+  /* Build result decl and add to function_decl. */
+  t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, ptr_type_node);
+  DECL_ARTIFICIAL (t) = 1;
+  DECL_IGNORED_P (t) = 1;
+  DECL_RESULT (decl) = t;
+
+  gimplify_function_tree (decl);
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (decl));
+  current_function_decl = decl;
+  init_empty_tree_cfg_for_function (DECL_STRUCT_FUNCTION (decl));
+  cfun->curr_properties |=
+    (PROP_gimple_lcf | PROP_gimple_leh | PROP_cfg | PROP_referenced_vars |
+     PROP_ssa);
+  new_bb = create_empty_bb (ENTRY_BLOCK_PTR);
+  make_edge (ENTRY_BLOCK_PTR, new_bb, EDGE_FALLTHRU);
+  make_edge (new_bb, EXIT_BLOCK_PTR, 0);
+  *empty_bb = new_bb;
+
+  cgraph_add_new_function (decl, true);
+  cgraph_call_function_insertion_hooks (cgraph_get_create_node (decl));
+  cgraph_analyze_function (cgraph_get_create_node (decl));
+  cgraph_mark_needed_node (cgraph_get_create_node (decl));
+
+  if (DECL_COMDAT_GROUP (default_decl))
+    {
+      gcc_assert (cgraph_get_node (default_decl));
+      cgraph_add_to_same_comdat_group (cgraph_get_node (decl),
+				       cgraph_get_node (default_decl));
+    }
+
+  pop_cfun ();
+  current_function_decl = old_current_function_decl;
+
+  gcc_assert (ifunc_decl != NULL);
+  DECL_ATTRIBUTES (ifunc_decl) 
+    = make_attribute ("ifunc", resolver_name, DECL_ATTRIBUTES (ifunc_decl));
+  assemble_alias (ifunc_decl, get_identifier (resolver_name));
+  return decl;
+}
+
+/* Make an ifunc declaration for the multi-versioned function DECL.  Calls to
+   DECL function will be replaced with calls to the ifunc.   Return the decl
+   of the ifunc created.  */
+
+static tree
+make_ifunc_func (const tree decl)
+{
+  tree ifunc_decl;
+  char *ifunc_name, *resolver_name;
+  tree fn_type, ifunc_type;
+  bool make_unique = false;
+
+  if (TREE_PUBLIC (decl) == 0)
+    make_unique = true;
+
+  ifunc_name = make_name (decl, "ifunc", make_unique);
+  resolver_name = make_name (decl, "resolver", make_unique);
+  gcc_assert (resolver_name);
+
+  fn_type = TREE_TYPE (decl);
+  ifunc_type = build_function_type (TREE_TYPE (fn_type),
+				    TYPE_ARG_TYPES (fn_type));
+  
+  ifunc_decl = build_fn_decl (ifunc_name, ifunc_type);
+  TREE_USED (ifunc_decl) = 1;
+  DECL_CONTEXT (ifunc_decl) = NULL_TREE;
+  DECL_INITIAL (ifunc_decl) = error_mark_node;
+  DECL_ARTIFICIAL (ifunc_decl) = 1;
+  /* Mark this ifunc as external, the resolver will flip it again if
+     it gets generated.  */
+  DECL_EXTERNAL (ifunc_decl) = 1;
+  /* IFUNCs have to be externally visible.  */
+  TREE_PUBLIC (ifunc_decl) = 1;
+
+  return ifunc_decl;  
+}
+
+/* For multi-versioned function decl, which should also be the default,
+   return the decl of the ifunc, creating it if it does not
+   exist.  */
+
+tree
+get_ifunc_for_version (const tree decl)
+{
+  version_function *decl_v;
+  int ix;
+  void_p ele;
+
+  /* DECL has to be the default version, otherwise it is missing and
+     that is not allowed.  */
+  if (!is_default_function (decl))
+    {
+      error_at (DECL_SOURCE_LOCATION (decl), "Default version not found");
+      return decl;
+    }
+
+  decl_v = find_function_version (decl);
+  gcc_assert (decl_v != NULL);
+  if (decl_v->ifunc_decl == NULL)
+    {
+      tree ifunc_decl;
+      ifunc_decl = make_ifunc_func (decl);
+      decl_v->ifunc_decl = ifunc_decl;
+    }
+
+  if (cgraph_get_node (decl))
+    cgraph_mark_needed_node (cgraph_get_node (decl));
+
+  for (ix = 0; VEC_iterate (void_p, decl_v->versions, ix, ele); ++ix)
+    {
+      version_function *v = (version_function *) ele;
+      gcc_assert (v->decl != NULL);
+      if (cgraph_get_node (v->decl))
+	cgraph_mark_needed_node (cgraph_get_node (v->decl));
+    }
+
+  return decl_v->ifunc_decl;
+}
+
+/* Generate the dispatching code to dispatch multi-versioned function
+   DECL.  Make a new function decl for dispatching and call the target
+   hook to process the "targetv" attributes and provide the code to
+   dispatch the right function at run-time.  */
+
+static tree
+make_ifunc_resolver_for_version (const tree decl)
+{
+  version_function *decl_v;
+  tree ifunc_resolver_decl, ifunc_decl;
+  basic_block empty_bb;
+  int ix;
+  void_p ele;
+  VEC (tree, heap) *fn_ver_vec = NULL;
+
+  gcc_assert (is_default_function (decl));
+
+  decl_v = find_function_version (decl);
+  gcc_assert (decl_v != NULL);
+
+  if (decl_v->ifunc_resolver_decl != NULL)
+    return decl_v->ifunc_resolver_decl;
+
+  ifunc_decl = decl_v->ifunc_decl;
+
+  if (ifunc_decl == NULL)
+    ifunc_decl = decl_v->ifunc_decl = make_ifunc_func (decl);
+
+  ifunc_resolver_decl = make_ifunc_resolver_func (decl, ifunc_decl,
+						  &empty_bb);
+
+  fn_ver_vec = VEC_alloc (tree, heap, 2);
+  VEC_safe_push (tree, heap, fn_ver_vec, decl);
+
+  for (ix = 0; VEC_iterate (void_p, decl_v->versions, ix, ele); ++ix)
+    {
+      version_function *v = (version_function *) ele;
+      gcc_assert (v->decl != NULL);
+      /* Check for virtual functions here again, as by this time it should
+	 have been determined if this function needs a vtable index or
+	 not.  This happens for methods in derived classes that override
+	 virtual methods in base classes but are not explicitly marked as
+	 virtual.  */
+      if (DECL_VINDEX (v->decl))
+        error_at (DECL_SOURCE_LOCATION (v->decl),
+		  "Virtual function versioning not supported");
+      if (!v->is_deleted)
+	VEC_safe_push (tree, heap, fn_ver_vec, v->decl);
+    }
+
+  gcc_assert (targetm.dispatch_version);
+  targetm.dispatch_version (ifunc_resolver_decl, fn_ver_vec, &empty_bb);
+  decl_v->ifunc_resolver_decl = ifunc_resolver_decl;
+
+  return ifunc_resolver_decl;
+}
+
+/* Main entry point to pass_dispatch_versions. For multi-versioned functions,
+   generate the dispatching code.  */
+
+static unsigned int
+do_dispatch_versions (void)
+{
+  /* A new pass for generating dispatch code for multi-versioned functions.
+     Other forms of dispatch can be added for targets without ifunc support,
+     such as calling the function directly after checking the target type.
+     Currently, dispatching is done through IFUNC.  This pass will become
+     more meaningful when other dispatch mechanisms are added.  */
+
+  /* Cloning a function to produce more versions will happen here when the
+     user requests that via the targetv attribute. For example,
+     int foo () __attribute__ ((targetv(("arch=core2"), ("arch=corei7"))));
+     means that the user wants the same body of foo to be versioned for core2
+     and corei7.  In that case, this function will be cloned during this
+     pass.  */
+  
+  if (DECL_FUNCTION_VERSIONED (current_function_decl)
+      && is_default_function (current_function_decl))
+    {
+      tree decl = make_ifunc_resolver_for_version (current_function_decl);
+      if (dump_file && decl)
+	dump_function_to_file (decl, dump_file, TDF_BLOCKS);
+    }
+  return 0;
+}
+
+static bool
+gate_dispatch_versions (void)
+{
+  return true;
+}
+
+/* A pass to generate the dispatch code to execute the appropriate version
+   of a multi-versioned function at run-time.  */
+
+struct gimple_opt_pass pass_dispatch_versions =
+{
+ {
+  GIMPLE_PASS,
+  "dispatch_multiversion_functions",    /* name */
+  gate_dispatch_versions,		/* gate */
+  do_dispatch_versions,			/* execute */
+  NULL,					/* sub */
+  NULL,					/* next */
+  0,					/* static_pass_number */
+  TV_MULTIVERSION_DISPATCH,		/* tv_id */
+  PROP_cfg,				/* properties_required */
+  PROP_cfg,				/* properties_provided */
+  0,					/* properties_destroyed */
+  0,					/* todo_flags_start */
+  TODO_dump_func |			/* todo_flags_finish */
+  TODO_cleanup_cfg | TODO_dump_cgraph
+ }
+};
Index: cgraphunit.c
===================================================================
--- cgraphunit.c	(revision 184971)
+++ cgraphunit.c	(working copy)
@@ -141,6 +141,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-inline.h"
 #include "ipa-utils.h"
 #include "lto-streamer.h"
+#include "multiversion.h"
 
 static void cgraph_expand_all_functions (void);
 static void cgraph_mark_functions_to_output (void);
@@ -343,6 +344,13 @@ cgraph_finalize_function (tree decl, bool nested)
       node->local.redefined_extern_inline = true;
     }
 
+  /* If this is a function version and not the default, change the
+     assembler name of this function.  The DECL names of function
+     versions are the same, only the assembler names are made unique.
+     The assembler name is changed by appending the string from
+     the "targetv" attribute.  */
+  version_assembler_name (decl);
+
   notice_global_symbol (decl);
   node->local.finalized = true;
   node->lowered = DECL_STRUCT_FUNCTION (decl)->cfg != NULL;
Index: multiversion.h
===================================================================
--- multiversion.h	(revision 0)
+++ multiversion.h	(revision 0)
@@ -0,0 +1,52 @@
+/* Function Multiversioning.
+   Copyright (C) 2012 Free Software Foundation, Inc.
+   Contributed by Sriraman Tallam (tmsriram@google.com)
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>. */
+
+/* This is the header file which provides the functions to keep track
+   of functions that are multi-versioned and to generate the dispatch
+   code to call the right version at run-time.  */
+
+#ifndef GCC_MULTIVERSION_H
+#define GCC_MULTIVERSION_H
+
+#include "tree.h"
+
+/* Mark DECL1 and DECL2 as function versions.  */
+int group_function_versions (const tree decl1, const tree decl2);
+
+/* Mark DECL as deleted and no longer a version.  */
+void mark_delete_decl_version (const tree decl);
+
+/* Returns true if DECL is the default version to be executed if all
+   other versions are inappropriate at run-time.  */
+bool is_default_function (const tree decl);
+
+/* Gets the IFUNC dispatcher for this multi-versioned function DECL. DECL
+   must be the default function in the multi-versioned group.  */
+tree get_ifunc_for_version (const tree decl);
+
+/* Returns true when only one of DECL1 and DECL2 is marked with "targetv"
+   or if the "targetv" attribute strings of DECL1 and DECL2 do not match.  */
+bool has_different_version_attributes (const tree decl1, const tree decl2);
+
+/* If DECL is a function version and not the default version, the assembler
+   name of DECL is changed to include the attribute string to keep the
+   name unambiguous.  */
+void version_assembler_name (const tree decl);
+#endif
Index: cp/class.c
===================================================================
--- cp/class.c	(revision 184971)
+++ cp/class.c	(working copy)
@@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-dump.h"
 #include "splay-tree.h"
 #include "pointer-set.h"
+#include "multiversion.h"
 
 /* The number of nested classes being processed.  If we are not in the
    scope of any class, this is zero.  */
@@ -1092,7 +1093,20 @@ add_method (tree type, tree method, tree using_dec
 	      || same_type_p (TREE_TYPE (fn_type),
 			      TREE_TYPE (method_type))))
 	{
-	  if (using_decl)
+	  /* For function versions, their parms and types match
+	     but they are not duplicates.  Record function versions
+	     as and when they are found.  */
+	  if (TREE_CODE (fn) == FUNCTION_DECL
+	      && TREE_CODE (method) == FUNCTION_DECL
+	      && (DECL_FUNCTION_VERSIONED (fn)
+	          || DECL_FUNCTION_VERSIONED (method)))
+ 	    {
+	      DECL_FUNCTION_VERSIONED (fn) = 1;
+	      DECL_FUNCTION_VERSIONED (method) = 1;
+	      group_function_versions (fn, method);
+	      continue;
+	    }
+	  else if (using_decl)
 	    {
 	      if (DECL_CONTEXT (fn) == type)
 		/* Defer to the local function.  */
@@ -1150,6 +1164,13 @@ add_method (tree type, tree method, tree using_dec
   else
     /* Replace the current slot.  */
     VEC_replace (tree, method_vec, slot, overload);
+
+  /* Change the assembler name of method here if it has "targetv"
+     attributes.  Since all versions have the same mangled name,
+     their assembler name is changed by appending the string from
+     the "targetv" attribute. */
+  version_assembler_name (method);
+
   return true;
 }
 
@@ -6890,8 +6911,11 @@ resolve_address_of_overloaded_function (tree targe
 	  if (DECL_ANTICIPATED (fn))
 	    continue;
 
-	  /* See if there's a match.  */
-	  if (same_type_p (target_fn_type, static_fn_type (fn)))
+	  /* See if there's a match.  For multi-versioned functions, match
+	     only the default function.  */
+	  if (same_type_p (target_fn_type, static_fn_type (fn))
+	      && (!DECL_FUNCTION_VERSIONED (fn)
+		  || is_default_function (fn)))
 	    matches = tree_cons (fn, NULL_TREE, matches);
 	}
     }
@@ -7053,6 +7077,21 @@ resolve_address_of_overloaded_function (tree targe
       perform_or_defer_access_check (access_path, fn, fn);
     }
 
+  /* If a pointer to a function that is multi-versioned is requested, the
+     pointer to the dispatcher function is returned instead.  This works
+     well because indirectly calling the function will dispatch the right
+     function version at run-time. Also, the function address is kept
+     unique.  */
+  if (DECL_FUNCTION_VERSIONED (fn)
+      && is_default_function (fn))
+    {
+      tree ifunc_decl;
+      ifunc_decl = get_ifunc_for_version (fn);
+      gcc_assert (ifunc_decl != NULL);
+      mark_used (fn);
+      return build_fold_addr_expr (ifunc_decl);
+    }
+
   if (TYPE_PTRFN_P (target_type) || TYPE_PTRMEMFUNC_P (target_type))
     return cp_build_addr_expr (fn, flags);
   else
Index: cp/decl.c
===================================================================
--- cp/decl.c	(revision 184971)
+++ cp/decl.c	(working copy)
@@ -54,6 +54,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "pointer-set.h"
 #include "splay-tree.h"
 #include "plugin.h"
+#include "multiversion.h"
 
 /* Possible cases of bad specifiers type used by bad_specifiers. */
 enum bad_spec_place {
@@ -972,6 +973,23 @@ decls_match (tree newdecl, tree olddecl)
       if (t1 != t2)
 	return 0;
 
+      /* The decls do not match if they correspond to two different versions
+	 of the same function.  */
+      if (compparms (p1, p2)
+	  && same_type_p (TREE_TYPE (f1), TREE_TYPE (f2)) 
+	  && (DECL_FUNCTION_VERSIONED (newdecl)
+	      || DECL_FUNCTION_VERSIONED (olddecl))
+	  && has_different_version_attributes (newdecl, olddecl))
+	{
+	  /* One of the decls could be the default without the "targetv"
+	     attribute. Set it to be a versioned function here.  */
+	  DECL_FUNCTION_VERSIONED (newdecl) = 1;
+	  DECL_FUNCTION_VERSIONED (olddecl) = 1;
+	  /* Accumulate all the versions of a function.  */
+	  group_function_versions (olddecl, newdecl);
+	  return 0;
+	}
+
       if (CP_DECL_CONTEXT (newdecl) != CP_DECL_CONTEXT (olddecl)
 	  && ! (DECL_EXTERN_C_P (newdecl)
 		&& DECL_EXTERN_C_P (olddecl)))
@@ -1482,7 +1500,11 @@ duplicate_decls (tree newdecl, tree olddecl, bool
 	      error ("previous declaration %q+#D here", olddecl);
 	      return NULL_TREE;
 	    }
-	  else if (compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)),
+	  /* For function versions, params and types match, but they
+	     are not ambiguous.  */
+	  else if ((!DECL_FUNCTION_VERSIONED (newdecl)
+		    && !DECL_FUNCTION_VERSIONED (olddecl))
+		   && compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)),
 			      TYPE_ARG_TYPES (TREE_TYPE (olddecl))))
 	    {
 	      error ("new declaration %q#D", newdecl);
@@ -2250,6 +2272,16 @@ duplicate_decls (tree newdecl, tree olddecl, bool
   else if (DECL_PRESERVE_P (newdecl))
     DECL_PRESERVE_P (olddecl) = 1;
 
+  /* If the olddecl is a version, so is the newdecl.  */
+  if (TREE_CODE (newdecl) == FUNCTION_DECL
+      && DECL_FUNCTION_VERSIONED (olddecl))
+    {
+      DECL_FUNCTION_VERSIONED (newdecl) = 1;
+      /* Record that newdecl is not a valid version and has
+	 been deleted.  */
+      mark_delete_decl_version (newdecl);
+    }
+
   if (TREE_CODE (newdecl) == FUNCTION_DECL)
     {
       int function_size;
@@ -4512,6 +4544,10 @@ start_decl (const cp_declarator *declarator,
   /* Enter this declaration into the symbol table.  */
   decl = maybe_push_decl (decl);
 
+  /* If this decl is a function version and not the default, its assembler
+     name has to be changed.  */
+  version_assembler_name (decl);
+
   if (processing_template_decl)
     decl = push_template_decl (decl);
   if (decl == error_mark_node)
@@ -13019,6 +13055,10 @@ start_function (cp_decl_specifier_seq *declspecs,
     gcc_assert (same_type_p (TREE_TYPE (TREE_TYPE (decl1)),
 			     integer_type_node));
 
+  /* If this decl is a function version and not the default, its assembler
+     name has to be changed.  */
+  version_assembler_name (decl1);
+
   start_preparsed_function (decl1, attrs, /*flags=*/SF_DEFAULT);
 
   return 1;
@@ -13960,6 +14000,11 @@ cxx_comdat_group (tree decl)
 	    break;
 	}
       name = DECL_ASSEMBLER_NAME (decl);
+      if (TREE_CODE (decl) == FUNCTION_DECL
+	  && DECL_FUNCTION_VERSIONED (decl))
+	name = DECL_NAME (decl);
+      else
+        name = DECL_ASSEMBLER_NAME (decl);
     }
 
   return name;
Index: cp/semantics.c
===================================================================
--- cp/semantics.c	(revision 184971)
+++ cp/semantics.c	(working copy)
@@ -3783,8 +3783,11 @@ expand_or_defer_fn_1 (tree fn)
       /* If the user wants us to keep all inline functions, then mark
 	 this function as needed so that finish_file will make sure to
 	 output it later.  Similarly, all dllexport'd functions must
-	 be emitted; there may be callers in other DLLs.  */
-      if ((flag_keep_inline_functions
+	 be emitted; there may be callers in other DLLs.
+	 Also, mark this function as needed if it is marked inline but
+	 is a multi-versioned function.  */
+      if (((flag_keep_inline_functions
+	    || DECL_FUNCTION_VERSIONED (fn))
 	   && DECL_DECLARED_INLINE_P (fn)
 	   && !DECL_REALLY_EXTERN (fn))
 	  || (flag_keep_inline_dllexport
Index: cp/decl2.c
===================================================================
--- cp/decl2.c	(revision 184971)
+++ cp/decl2.c	(working copy)
@@ -53,6 +53,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "splay-tree.h"
 #include "langhooks.h"
 #include "c-family/c-ada-spec.h"
+#include "multiversion.h"
 
 extern cpp_reader *parse_in;
 
@@ -674,9 +675,13 @@ check_classfn (tree ctype, tree function, tree tem
 	  if (is_template != (TREE_CODE (fndecl) == TEMPLATE_DECL))
 	    continue;
 
+	  /* While finding a match, same types and params are not enough
+	     if the function is versioned.  Also check version ("targetv")
+	     attributes.  */
 	  if (same_type_p (TREE_TYPE (TREE_TYPE (function)),
 			   TREE_TYPE (TREE_TYPE (fndecl)))
 	      && compparms (p1, p2)
+	      && !has_different_version_attributes (function, fndecl)
 	      && (!is_template
 		  || comp_template_parms (template_parms,
 					  DECL_TEMPLATE_PARMS (fndecl)))
Index: cp/call.c
===================================================================
--- cp/call.c	(revision 184971)
+++ cp/call.c	(working copy)
@@ -41,6 +41,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "langhooks.h"
 #include "c-family/c-objc.h"
 #include "timevar.h"
+#include "multiversion.h"
 
 /* The various kinds of conversion.  */
 
@@ -6730,6 +6731,17 @@ build_over_call (struct z_candidate *cand, int fla
   if (!already_used)
     mark_used (fn);
 
+  /* For a call to a multi-versioned function, the call should actually be to
+     the dispatcher.  */
+  if (DECL_FUNCTION_VERSIONED (fn))
+    {
+      tree ifunc_decl;
+      ifunc_decl = get_ifunc_for_version (fn);
+      gcc_assert (ifunc_decl != NULL);
+      return build_call_expr_loc_array (UNKNOWN_LOCATION, ifunc_decl,
+					nargs, argarray);
+    }
+
   if (DECL_VINDEX (fn) && (flags & LOOKUP_NONVIRTUAL) == 0)
     {
       tree t;
@@ -7980,6 +7992,30 @@ joust (struct z_candidate *cand1, struct z_candida
   size_t i;
   size_t len;
 
+  /* For candidates of a multi-versioned function, the one marked default
+     wins.  This is because the default decl is used as key to aggregate
+     all the other versions provided for it in multiversion.c.  When
+     generating the actual call, the appropriate dispatcher is created
+     to call the right function version at run-time.  */
+
+  if ((TREE_CODE (cand1->fn) == FUNCTION_DECL
+       && DECL_FUNCTION_VERSIONED (cand1->fn))
+      || (TREE_CODE (cand2->fn) == FUNCTION_DECL
+	  && DECL_FUNCTION_VERSIONED (cand2->fn)))
+    {	
+      if (is_default_function (cand1->fn))
+	{
+          mark_used (cand2->fn);
+	  return 1;
+	}
+      if (is_default_function (cand2->fn))
+	{
+          mark_used (cand1->fn);
+	  return -1;
+	}
+      return 0;
+    }
+
   /* Candidates that involve bad conversions are always worse than those
      that don't.  */
   if (cand1->viable > cand2->viable)
Index: timevar.def
===================================================================
--- timevar.def	(revision 184971)
+++ timevar.def	(working copy)
@@ -253,6 +253,7 @@ DEFTIMEVAR (TV_TREE_IFCOMBINE        , "tree if-co
 DEFTIMEVAR (TV_TREE_UNINIT           , "uninit var analysis")
 DEFTIMEVAR (TV_PLUGIN_INIT           , "plugin initialization")
 DEFTIMEVAR (TV_PLUGIN_RUN            , "plugin execution")
+DEFTIMEVAR (TV_MULTIVERSION_DISPATCH , "multiversion dispatch")
 
 /* Everything else in rest_of_compilation not included above.  */
 DEFTIMEVAR (TV_EARLY_LOCAL	     , "early local passes")
Index: varasm.c
===================================================================
--- varasm.c	(revision 184971)
+++ varasm.c	(working copy)
@@ -5755,6 +5755,8 @@ finish_aliases_1 (void)
 	}
       else if (! (p->emitted_diags & ALIAS_DIAG_TO_EXTERN)
 	       && DECL_EXTERNAL (target_decl)
+	       && (TREE_CODE (target_decl) != FUNCTION_DECL
+		   || !DECL_STRUCT_FUNCTION (target_decl))
 	       /* We use local aliases for C++ thunks to force the tailcall
 		  to bind locally.  This is a hack - to keep it working do
 		  the following (which is not strictly correct).  */
Index: Makefile.in
===================================================================
--- Makefile.in	(revision 184971)
+++ Makefile.in	(working copy)
@@ -1298,6 +1298,7 @@ OBJS = \
 	mcf.o \
 	mode-switching.o \
 	modulo-sched.o \
+	multiversion.o \
 	omega.o \
 	omp-low.o \
 	optabs.o \
@@ -3030,6 +3031,11 @@ ree.o : ree.c $(CONFIG_H) $(SYSTEM_H) coretypes.h
    $(DF_H) $(TIMEVAR_H) tree-pass.h $(RECOG_H) $(EXPR_H) \
    $(REGS_H) $(TREE_H) $(TM_P_H) insn-config.h $(INSN_ATTR_H) $(DIAGNOSTIC_CORE_H) \
    $(TARGET_H) $(OPTABS_H) insn-codes.h rtlhooks-def.h $(PARAMS_H) $(CGRAPH_H)
+multiversion.o : multiversion.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
+   $(TREE_H) langhooks.h $(TREE_INLINE_H) $(FLAGS_H) $(CGRAPH_H) intl.h \
+   $(DIAGNOSTIC_H) $(FIBHEAP_H) $(PARAMS_H) $(TIMEVAR_H) tree-pass.h \
+   $(HASHTAB_H) $(COVERAGE_H) $(GGC_H) $(TREE_FLOW_H) $(RTL_H) $(IPA_PROP_H) \
+   $(BASIC_BLOCK_H) $(TOPLEV_H) $(TREE_DUMP_H) ipa-inline.h
 cprop.o : cprop.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
    $(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h $(GGC_H) \
    $(RECOG_H) $(EXPR_H) $(BASIC_BLOCK_H) $(FUNCTION_H) output.h toplev.h $(DIAGNOSTIC_CORE_H) \
Index: passes.c
===================================================================
--- passes.c	(revision 184971)
+++ passes.c	(working copy)
@@ -1190,6 +1190,7 @@ init_optimization_passes (void)
   NEXT_PASS (pass_build_cfg);
   NEXT_PASS (pass_warn_function_return);
   NEXT_PASS (pass_build_cgraph_edges);
+  NEXT_PASS (pass_dispatch_versions);
   *p = NULL;
 
   /* Interprocedural optimization passes.  */
Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c	(revision 184971)
+++ config/i386/i386.c	(working copy)
@@ -27446,6 +27473,593 @@ ix86_init_mmx_sse_builtins (void)
     }
 }

+/* This adds a condition to the basic_block NEW_BB in function FUNCTION_DECL
+   to return a pointer to VERSION_DECL if the outcome of the function
+   PREDICATE_DECL is true.  This function will be called during version
+   dispatch to decide which function version to execute.  It returns the
+   basic block at the end to which more conditions can be added.  */
+
+static basic_block
+add_condition_to_bb (tree function_decl, tree version_decl,
+		     basic_block new_bb, tree predicate_decl)
+{
+  gimple return_stmt;
+  tree convert_expr, result_var;
+  gimple convert_stmt;
+  gimple call_cond_stmt;
+  gimple if_else_stmt;
+
+  basic_block bb1, bb2, bb3;
+  edge e12, e23;
+
+  tree cond_var;
+  gimple_seq gseq;
+
+  tree old_current_function_decl;
+
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (function_decl));
+  current_function_decl = function_decl;
+
+  gcc_assert (new_bb != NULL);
+  gseq = bb_seq (new_bb);
+
+
+  convert_expr = build1 (CONVERT_EXPR, ptr_type_node,
+	     		 build_fold_addr_expr (version_decl));
+  result_var = create_tmp_var (ptr_type_node, NULL);
+  convert_stmt = gimple_build_assign (result_var, convert_expr); 
+  return_stmt = gimple_build_return (result_var);
+
+  if (predicate_decl == NULL_TREE)
+    {
+      gimple_seq_add_stmt (&gseq, convert_stmt);
+      gimple_seq_add_stmt (&gseq, return_stmt);
+      set_bb_seq (new_bb, gseq);
+      gimple_set_bb (convert_stmt, new_bb);
+      gimple_set_bb (return_stmt, new_bb);
+      pop_cfun ();
+      current_function_decl = old_current_function_decl;
+      return new_bb;
+    }
+
+  cond_var = create_tmp_var (integer_type_node, NULL);
+  call_cond_stmt = gimple_build_call (predicate_decl, 0);
+  gimple_call_set_lhs (call_cond_stmt, cond_var);
+
+  gimple_set_block (call_cond_stmt, DECL_INITIAL (function_decl));
+  gimple_set_bb (call_cond_stmt, new_bb);
+  gimple_seq_add_stmt (&gseq, call_cond_stmt);
+
+  if_else_stmt = gimple_build_cond (GT_EXPR, cond_var,
+	  		            integer_zero_node,
+				    NULL_TREE, NULL_TREE);
+  gimple_set_block (if_else_stmt, DECL_INITIAL (function_decl));
+  gimple_set_bb (if_else_stmt, new_bb);
+  gimple_seq_add_stmt (&gseq, if_else_stmt);
+
+  gimple_seq_add_stmt (&gseq, convert_stmt);
+  gimple_seq_add_stmt (&gseq, return_stmt);
+  set_bb_seq (new_bb, gseq);
+
+  bb1 = new_bb;
+  e12 = split_block (bb1, if_else_stmt);
+  bb2 = e12->dest;
+  e12->flags &= ~EDGE_FALLTHRU;
+  e12->flags |= EDGE_TRUE_VALUE;
+
+  e23 = split_block (bb2, return_stmt);
+
+  gimple_set_bb (convert_stmt, bb2);
+  gimple_set_bb (return_stmt, bb2);
+
+  bb3 = e23->dest;
+  make_edge (bb1, bb3, EDGE_FALSE_VALUE); 
+
+  remove_edge (e23);
+  make_edge (bb2, EXIT_BLOCK_PTR, 0);
+
+  rebuild_cgraph_edges ();
+
+  pop_cfun ();
+  current_function_decl = old_current_function_decl;
+
+  return bb3;
+}
+
+/* This parses the attribute arguments to targetv in DECL and determines
+   the right builtin to use to match the platform specification.
+   For now, only one target argument ("arch=") is allowed.  */
+
+static enum ix86_builtins
+get_builtin_code_for_version (tree decl)
+{
+  tree attrs;
+  struct cl_target_option cur_target;
+  tree target_node;
+  struct cl_target_option *new_target;
+  enum ix86_builtins builtin_code = IX86_BUILTIN_MAX;
+
+  attrs = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl));
+  gcc_assert (attrs != NULL);
+
+  cl_target_option_save (&cur_target, &global_options);
+
+  target_node = ix86_valid_target_attribute_tree
+		  (TREE_VALUE (TREE_VALUE (attrs)));
+
+  gcc_assert (target_node);
+  new_target = TREE_TARGET_OPTION (target_node);
+  gcc_assert (new_target);
+  
+  if (new_target->arch_specified && new_target->arch > 0)
+    {
+      switch (new_target->arch)
+        {
+	case 1:
+	case 2:
+	case 3:
+	case 4:
+	case 5:
+	case 6:
+	case 7:
+	case 8:
+	case 9:
+	case 10:
+	case 11:
+	  builtin_code = IX86_BUILTIN_CPU_IS_INTEL;
+	  break;
+	case 12:
+	  builtin_code = IX86_BUILTIN_CPU_IS_INTEL_CORE2;
+	  break;
+	case 13:
+	  builtin_code = IX86_BUILTIN_CPU_IS_INTEL_COREI7;
+	  break;
+	case 14:
+	  builtin_code = IX86_BUILTIN_CPU_IS_INTEL_ATOM;
+	  break;
+	case 15:
+	case 16:
+	case 17:
+	case 18:
+	case 19:
+	case 20:
+	case 21:
+	  builtin_code = IX86_BUILTIN_CPU_IS_AMD;
+	  break;
+	case 22:
+	  builtin_code = IX86_BUILTIN_CPU_IS_AMDFAM10H;
+	  break;
+	case 23:
+	  builtin_code = IX86_BUILTIN_CPU_IS_AMDFAM15H_BDVER1;
+	  break;
+	case 24:
+	  builtin_code = IX86_BUILTIN_CPU_IS_AMDFAM15H_BDVER2;
+	  break;
+	case 25: /* What is btver1 ? */
+	  builtin_code = IX86_BUILTIN_CPU_IS_AMD;
+	  break;
+	}  
+    }    
+
+  cl_target_option_restore (&global_options, &cur_target);
+  if (builtin_code == IX86_BUILTIN_MAX)
+      error_at (DECL_SOURCE_LOCATION (decl),
+		"No dispatcher found for the versioning attributes");
+
+  return builtin_code;
+} 
+
+/* This is the target hook to generate the dispatch function for
+   multi-versioned functions.  DISPATCH_DECL is the function which will
+   contain the dispatch logic.  FNDECLS_P is a vector of the function
+   choices for dispatch, default version first.  EMPTY_BB is the basic block
+   pointer in DISPATCH_DECL in which the dispatch code is generated.  */
+
+static int
+ix86_dispatch_version (tree dispatch_decl,
+		       void *fndecls_p,
+		       basic_block *empty_bb)
+{
+  tree default_decl;
+  gimple ifunc_cpu_init_stmt;
+  gimple_seq gseq;
+  tree old_current_function_decl;
+  int ix;
+  tree ele;
+  VEC (tree, heap) *fndecls;
+
+  gcc_assert (dispatch_decl != NULL
+	      && fndecls_p != NULL
+	      && empty_bb != NULL);
+
+  /* FNDECLS_P is actually a vector.  */
+  fndecls = (VEC (tree, heap) *)fndecls_p;
+
+  /* At least one more version other than the default.  */
+  gcc_assert (VEC_length (tree, fndecls) >= 2);
+
+  /* The first version in the vector is the default decl.  */
+  default_decl = VEC_index (tree, fndecls, 0);
+
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (dispatch_decl));
+  current_function_decl = dispatch_decl;
+
+  gseq = bb_seq (*empty_bb);
+  ifunc_cpu_init_stmt = gimple_build_call_vec (
+                     ix86_builtins [(int) IX86_BUILTIN_CPU_INIT], NULL);
+  gimple_seq_add_stmt (&gseq, ifunc_cpu_init_stmt);
+  gimple_set_bb (ifunc_cpu_init_stmt, *empty_bb);
+  set_bb_seq (*empty_bb, gseq);
+
+  pop_cfun ();
+  current_function_decl = old_current_function_decl;
+
+
+  for (ix = 1; VEC_iterate (tree, fndecls, ix, ele); ++ix)
+    {
+      tree version_decl = ele;
+      /* Get attribute string, parse it and find the right predicate decl.
+         The predicate function could be a lengthy combination of many
+	 features, like arch-type and various isa-variants.  For now, only
+	 check the arch-type.  */
+      tree predicate_decl = ix86_builtins [
+			get_builtin_code_for_version (version_decl)];
+      *empty_bb = add_condition_to_bb (dispatch_decl, version_decl, *empty_bb,
+				       predicate_decl);
+
+    }
+  /* Dispatch the default version at the end.  */
+  *empty_bb = add_condition_to_bb (dispatch_decl, default_decl, *empty_bb,
+				   NULL);
+  return 0;
+}

@@ -38610,6 +39269,12 @@ ix86_autovectorize_vector_sizes (void)
 #undef TARGET_BUILD_BUILTIN_VA_LIST
 #define TARGET_BUILD_BUILTIN_VA_LIST ix86_build_builtin_va_list
 
+#undef TARGET_DISPATCH_VERSION
+#define TARGET_DISPATCH_VERSION ix86_dispatch_version
+
 #undef TARGET_ENUM_VA_LIST_P
 #define TARGET_ENUM_VA_LIST_P ix86_enum_va_list

Index: testsuite/g++.dg/mv1.C
===================================================================
--- testsuite/g++.dg/mv1.C	(revision 0)
+++ testsuite/g++.dg/mv1.C	(revision 0)
@@ -0,0 +1,23 @@
+/* Simple test case to check if Multiversioning works.  */
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+int foo ();
+int foo () __attribute__ ((targetv("arch=corei7")));
+
+int main ()
+{
+  int (*p)() = &foo;
+  return foo () + (*p)();
+}
+
+int foo ()
+{
+  return 0;
+}
+
+int __attribute__ ((targetv("arch=corei7")))
+foo ()
+{
+  return 0;
+}
 

--
This patch is available for review at http://codereview.appspot.com/5752064


* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-03-07  0:47 User directed Function Multiversioning via Function Overloading (issue5752064) Sriraman Tallam
@ 2012-03-07 14:05 ` Richard Guenther
  2012-03-07 19:08   ` Sriraman Tallam
                     ` (2 more replies)
  0 siblings, 3 replies; 93+ messages in thread
From: Richard Guenther @ 2012-03-07 14:05 UTC (permalink / raw)
  To: Sriraman Tallam; +Cc: reply, gcc-patches

On Wed, Mar 7, 2012 at 1:46 AM, Sriraman Tallam <tmsriram@google.com> wrote:
> User directed Function Multiversioning (MV) via Function Overloading
> ====================================================================
>
> This patch adds support for user directed function MV via function overloading.
> For more detailed description:
> http://gcc.gnu.org/ml/gcc/2012-03/msg00074.html
>
>
> Here is an example program with function versions:
>
> int foo ();  /* Default version */
> int foo () __attribute__ ((targetv("arch=corei7")));/*Specialized for corei7 */
> int foo () __attribute__ ((targetv("arch=core2")));/*Specialized for core2 */
>
> int main ()
> {
>  int (*p)() = &foo;
>  return foo () + (*p)();
> }
>
> int foo ()
> {
>  return 0;
> }
>
> int __attribute__ ((targetv("arch=corei7")))
> foo ()
> {
>  return 0;
> }
>
> int __attribute__ ((targetv("arch=core2")))
> foo ()
> {
>  return 0;
> }
>
> The above example has foo defined 3 times, but all 3 definitions of foo are
> different versions of the same function. The call to foo in main, directly and
> via a pointer, are calls to the multi-versioned function foo which is dispatched
> to the right foo at run-time.
>
> Function versions must have the same signature but must differ in the specifier
> string provided to a new attribute called "targetv", which is nothing but the
> target attribute with an extra specification to indicate a version. Any number
> of versions can be created using the targetv attribute but it is mandatory to
> have one function without the attribute, which is treated as the default
> version.
>
> The dispatching is done using the IFUNC mechanism to keep the dispatch overhead
> low. The compiler creates a dispatcher function which checks the CPU type and
> calls the right version of foo. The dispatching code checks for the platform
> type and calls the first version that matches. The default function is called if
> no specialized version is appropriate for execution.
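
The selection rule described above (first matching version wins, default
otherwise) can be sketched in portable C.  This is only an illustration, not
the patch's implementation: the real dispatcher is an IFUNC resolver generated
by the compiler, whereas resolve_foo, foo_versions and the three stand-in
bodies here are hypothetical names, and the CPU type is passed in as a string
instead of being read from the hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the three versions of foo; each returns a
   distinct value so the selected version is observable.  */
static int foo_default (void) { return 0; }
static int foo_corei7 (void) { return 1; }
static int foo_core2 (void) { return 2; }

typedef int (*foo_fn) (void);

/* One dispatch candidate: the "arch=" value it targets and its body.  */
struct version_entry
{
  const char *arch;
  foo_fn fn;
};

static const struct version_entry foo_versions[] =
{
  { "corei7", foo_corei7 },
  { "core2", foo_core2 },
};

/* Return the first version whose arch matches CPU, or the default
   version if none matches.  CPU is a parameter so the sketch stays
   portable; a real resolver would query the hardware instead.  */
foo_fn
resolve_foo (const char *cpu)
{
  size_t i;

  assert (cpu != NULL);
  for (i = 0; i < sizeof (foo_versions) / sizeof (foo_versions[0]); i++)
    if (strcmp (foo_versions[i].arch, cpu) == 0)
      return foo_versions[i].fn;
  return foo_default;
}
```

Calling resolve_foo ("corei7") () runs the corei7 stand-in, while a CPU name
that is not in the table falls back to foo_default.
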
>
> The pointer to foo is made to be the address of the dispatcher function, so that
> it is unique and calls made via the pointer also work correctly. The assembler
> names of the various versions of foo are made different, by appending
> the specifier strings, to keep them unique.  A specific version can be called
> directly by creating an alias to its assembler name. For instance, to call the
> corei7 version directly, make an alias:
> int foo_corei7 () __attribute__((alias ("_Z3foov.arch_corei7")));
> and then call foo_corei7.
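
The naming scheme can be sketched as follows, modelled on sorted_attr_string
and version_assembler_name in the patch below: '=' becomes '_', comma-separated
specifiers are sorted and joined with '_', and the result is appended to the
original assembler name after a '.'.  The helper names versioned_name and
cmp_tokens are hypothetical, and the sketch uses plain malloc rather than
GCC's xmalloc.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C strings.  */
static int
cmp_tokens (const void *a, const void *b)
{
  return strcmp (*(char *const *) a, *(char *const *) b);
}

/* Return a malloc'd string of the form BASE.SPEC', where SPEC' is SPEC
   with '=' replaced by '_' and its comma-separated parts sorted and
   joined with '_'.  Returns NULL on allocation failure.  The caller
   frees the result.  */
char *
versioned_name (const char *base, const char *spec)
{
  size_t len, n = 0, i;
  char *copy, *out, *p;
  char **tok;

  assert (base != NULL && spec != NULL);
  len = strlen (spec);

  copy = malloc (len + 1);
  tok = malloc ((len + 1) * sizeof *tok);  /* Upper bound on token count.  */
  out = malloc (strlen (base) + len + 2);  /* base + '.' + spec + NUL.  */
  if (copy == NULL || tok == NULL || out == NULL)
    {
      free (copy);
      free (tok);
      free (out);
      return NULL;
    }

  /* Copy SPEC (including the terminator), mapping '=' to '_'.  */
  for (i = 0; i <= len; i++)
    copy[i] = spec[i] == '=' ? '_' : spec[i];

  /* Split on commas and sort so the order of specifiers is irrelevant.  */
  for (p = strtok (copy, ","); p != NULL; p = strtok (NULL, ","))
    tok[n++] = p;
  qsort (tok, n, sizeof *tok, cmp_tokens);

  strcpy (out, base);
  strcat (out, ".");
  for (i = 0; i < n; i++)
    {
      if (i > 0)
        strcat (out, "_");
      strcat (out, tok[i]);
    }

  free (tok);
  free (copy);
  return out;
}
```

With base "_Z3foov" and specifier "arch=corei7" this yields the assembler name
used in the alias example above.
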
>
> Note that using IFUNC blocks inlining of versioned functions. I had implemented
> an optimization earlier to do hot path cloning to allow versioned functions to
> be inlined. Please see: http://gcc.gnu.org/ml/gcc-patches/2011-04/msg02285.html
> In the next iteration, I plan to merge these two. With that, hot code paths with
> versioned functions will be cloned so that versioned functions can be inlined.

Note that inlining of functions with the target attribute is limited as well,
but your issue is that of the indirect dispatch as ...

You don't give an overview of the frontend implementation.  Thus I have
extracted the following:

 - the FE does not really know about the "overloading", nor can it directly
   resolve calls from a "sse" function to another "sse" function without going
   through the 2nd IFUNC

 - cgraph also does not know about the "overloading", so it cannot do such
   "devirtualization" either

You seem to have implemented something in between a pure frontend
solution and a proper middle-end solution.  For optimization and eventually
automatically selecting functions for cloning (like, callees of a manual "sse"
versioned function should be cloned?) it would be nice if the cgraph would
know about the different versions and their relationships (and the dispatcher).
Especially the cgraph code should know the functions are semantically
equivalent (I suppose we should require that).  The IFUNC should be
generated by cgraph / target code, similar to how we generate C++ thunks.

Honza, any suggestions on what the FE side of such cgraph infrastructure
should look like and how we should encode the target bits?

Thanks,
Richard.

>        * doc/tm.texi.in: Add description for TARGET_DISPATCH_VERSION.
>        * doc/tm.texi: Regenerate.
>        * c-family/c-common.c (handle_targetv_attribute): New function.
>        * target.def (dispatch_version): New target hook.
>        * tree.h (DECL_FUNCTION_VERSIONED): New macro.
>        (tree_function_decl): New bit-field versioned_function.
>        * tree-pass.h (pass_dispatch_versions): New pass.
>        * multiversion.c: New file.
>        * multiversion.h: New file.
>        * cgraphunit.c: Include multiversion.h
>        (cgraph_finalize_function): Change assembler names of versioned
>        functions.
>        * cp/class.c: Include multiversion.h
>        (add_method): Aggregate function versions. Change assembler names of
>        versioned functions.
>        (resolve_address_of_overloaded_function): Match address of function
>        version with default function.  Return address of ifunc dispatcher
>        for address of versioned functions.
>        * cp/decl.c (decls_match): Make decls unmatched for versioned
>        functions.
>        (duplicate_decls): Remove ambiguity for versioned functions. Notify
>        of deleted function version decls.
>        (start_decl): Change assembler name of versioned functions.
>        (start_function): Change assembler name of versioned functions.
>        (cxx_comdat_group): Make comdat group of versioned functions be the
>        same.
>        * cp/semantics.c (expand_or_defer_fn_1): Mark as needed versioned
>        functions that are also marked inline.
>        * cp/decl2.c: Include multiversion.h
>        (check_classfn): Check attributes of versioned functions for match.
>        * cp/call.c: Include multiversion.h
>        (build_over_call): Make calls to multiversioned functions to call the
>        dispatcher.
>        (joust): For calls to multi-versioned functions, make the default
>        function win.
>        * timevar.def (TV_MULTIVERSION_DISPATCH): New time var.
>        * varasm.c (finish_aliases_1): Check if the alias points to a function
>        with a body before giving an error.
>        * Makefile.in: Add multiversion.o
>        * passes.c: Add pass_dispatch_versions to the pass list.
>        * config/i386/i386.c (add_condition_to_bb): New function.
>        (get_builtin_code_for_version): New function.
>        (ix86_dispatch_version): New function.
>        (TARGET_DISPATCH_VERSION): New macro.
>        * testsuite/g++.dg/mv1.C: New test.
>
> Index: doc/tm.texi
> ===================================================================
> --- doc/tm.texi (revision 184971)
> +++ doc/tm.texi (working copy)
> @@ -10995,6 +10995,14 @@ The result is another tree containing a simplified
>  call's result.  If @var{ignore} is true the value will be ignored.
>  @end deftypefn
>
> +@deftypefn {Target Hook} int TARGET_DISPATCH_VERSION (tree @var{dispatch_decl}, void *@var{fndecls}, basic_block *@var{empty_bb})
> +For a multi-versioned function, this hook sets up the dispatcher.
> +@var{dispatch_decl} is the function that will be used to dispatch the
> +version. @var{fndecls} are the function choices for dispatch.
> +@var{empty_bb} is a basic block in @var{dispatch_decl} where the
> +code to do the dispatch will be added.
> +@end deftypefn
> +
>  @deftypefn {Target Hook} {const char *} TARGET_INVALID_WITHIN_DOLOOP (const_rtx @var{insn})
>
>  Take an instruction in @var{insn} and return NULL if it is valid within a
> Index: doc/tm.texi.in
> ===================================================================
> --- doc/tm.texi.in      (revision 184971)
> +++ doc/tm.texi.in      (working copy)
> @@ -10873,6 +10873,14 @@ The result is another tree containing a simplified
>  call's result.  If @var{ignore} is true the value will be ignored.
>  @end deftypefn
>
> +@hook TARGET_DISPATCH_VERSION
> +For a multi-versioned function, this hook sets up the dispatcher.
> +@var{dispatch_decl} is the function that will be used to dispatch the
> +version. @var{fndecls} are the function choices for dispatch.
> +@var{empty_bb} is a basic block in @var{dispatch_decl} where the
> +code to do the dispatch will be added.
> +@end deftypefn
> +
>  @hook TARGET_INVALID_WITHIN_DOLOOP
>
>  Take an instruction in @var{insn} and return NULL if it is valid within a
> Index: c-family/c-common.c
> ===================================================================
> --- c-family/c-common.c (revision 184971)
> +++ c-family/c-common.c (working copy)
> @@ -315,6 +315,7 @@ static tree check_case_value (tree);
>  static bool check_case_bounds (tree, tree, tree *, tree *);
>
>  static tree handle_packed_attribute (tree *, tree, tree, int, bool *);
> +static tree handle_targetv_attribute (tree *, tree, tree, int, bool *);
>  static tree handle_nocommon_attribute (tree *, tree, tree, int, bool *);
>  static tree handle_common_attribute (tree *, tree, tree, int, bool *);
>  static tree handle_noreturn_attribute (tree *, tree, tree, int, bool *);
> @@ -604,6 +605,8 @@ const struct attribute_spec c_common_attribute_tab
>  {
>   /* { name, min_len, max_len, decl_req, type_req, fn_type_req, handler,
>        affects_type_identity } */
> +  { "targetv",               1, -1, true, false, false,
> +                             handle_targetv_attribute, false },
>   { "packed",                 0, 0, false, false, false,
>                              handle_packed_attribute , false},
>   { "nocommon",               0, 0, true,  false, false,
> @@ -5869,6 +5872,54 @@ handle_packed_attribute (tree *node, tree name, tr
>   return NULL_TREE;
>  }
>
> +/* The targetv attribute is used to specify a function version
> +   targeted to specific platform types.  The "targetv" attributes
> +   have to be valid "target" attributes.  NODE should always point
> +   to a FUNCTION_DECL.  ARGS contain the arguments to "targetv"
> +   which should be valid arguments to attribute "target" too.
> +   Check handle_target_attribute for FLAGS and NO_ADD_ATTRS.  */
> +
> +static tree
> +handle_targetv_attribute (tree *node, tree name,
> +                         tree args,
> +                         int flags,
> +                         bool *no_add_attrs)
> +{
> +  const char *attr_str = NULL;
> +  gcc_assert (TREE_CODE (*node) == FUNCTION_DECL);
> +  gcc_assert (args != NULL);
> +
> +  /* This is a function version.  */
> +  DECL_FUNCTION_VERSIONED (*node) = 1;
> +
> +  attr_str = TREE_STRING_POINTER (TREE_VALUE (args));
> +
> +  /* Check if multiple sets of target attributes are there.  This
> +     is not supported now.   In future, this will be supported by
> +     cloning this function for each set.  */
> +  if (TREE_CHAIN (args) != NULL)
> +    warning (OPT_Wattributes, "%qE attribute has multiple sets which "
> +            "is not supported", name);
> +
> +  if (attr_str == NULL
> +      || strstr (attr_str, "arch=") == NULL)
> +    error_at (DECL_SOURCE_LOCATION (*node),
> +             "Versioning supported only on \"arch=\" for now");
> +
> +  /* targetv attributes must translate into target attributes.  */
> +  handle_target_attribute (node, get_identifier ("target"), args, flags,
> +                          no_add_attrs);
> +
> +  if (*no_add_attrs)
> +    warning (OPT_Wattributes, "%qE attribute has no effect", name);
> +
> +  /* This is necessary to keep the attribute tagged to the decl
> +     all the time.  */
> +  *no_add_attrs = false;
> +
> +  return NULL_TREE;
> +}
> +
>  /* Handle a "nocommon" attribute; arguments as in
>    struct attribute_spec.handler.  */
>
> Index: target.def
> ===================================================================
> --- target.def  (revision 184971)
> +++ target.def  (working copy)
> @@ -1249,6 +1249,15 @@ DEFHOOK
>  tree, (tree fndecl, int n_args, tree *argp, bool ignore),
>  hook_tree_tree_int_treep_bool_null)
>
> +/* Target hook to generate the dispatching code for calls to multi-versioned
> +   functions.  DISPATCH_DECL is the function that will have the dispatching
> +   logic.  FNDECLS is the list of choices for dispatch and EMPTY_BB is the
> +   basic block in DISPATCH_DECL which will contain the code.  */
> +DEFHOOK
> +(dispatch_version,
> + "",
> + int, (tree dispatch_decl, void *fndecls, basic_block *empty_bb), NULL)
> +
>  /* Returns a code for a target-specific builtin that implements
>    reciprocal of the function, or NULL_TREE if not available.  */
>  DEFHOOK
> Index: tree.h
> ===================================================================
> --- tree.h      (revision 184971)
> +++ tree.h      (working copy)
> @@ -3532,6 +3532,12 @@ extern VEC(tree, gc) **decl_debug_args_insert (tre
>  #define DECL_FUNCTION_SPECIFIC_OPTIMIZATION(NODE) \
>    (FUNCTION_DECL_CHECK (NODE)->function_decl.function_specific_optimization)
>
> +/* In FUNCTION_DECL, this is set if this function has other versions generated
> +   using "targetv" attributes.  The default version is the one which does not
> +   have any "targetv" attribute set. */
> +#define DECL_FUNCTION_VERSIONED(NODE)\
> +   (FUNCTION_DECL_CHECK (NODE)->function_decl.versioned_function)
> +
>  /* FUNCTION_DECL inherits from DECL_NON_COMMON because of the use of the
>    arguments/result/saved_tree fields by front ends.   It was either inherit
>    FUNCTION_DECL from non_common, or inherit non_common from FUNCTION_DECL,
> @@ -3576,8 +3582,8 @@ struct GTY(()) tree_function_decl {
>   unsigned looping_const_or_pure_flag : 1;
>   unsigned has_debug_args_flag : 1;
>   unsigned tm_clone_flag : 1;
> -
> -  /* 1 bit left */
> +  unsigned versioned_function : 1;
> +  /* No bits left.  */
>  };
>
>  /* The source language of the translation-unit.  */
> Index: tree-pass.h
> ===================================================================
> --- tree-pass.h (revision 184971)
> +++ tree-pass.h (working copy)
> @@ -455,6 +455,7 @@ extern struct gimple_opt_pass pass_tm_memopt;
>  extern struct gimple_opt_pass pass_tm_edges;
>  extern struct gimple_opt_pass pass_split_functions;
>  extern struct gimple_opt_pass pass_feedback_split_functions;
> +extern struct gimple_opt_pass pass_dispatch_versions;
>
>  /* IPA Passes */
>  extern struct simple_ipa_opt_pass pass_ipa_lower_emutls;
> Index: multiversion.c
> ===================================================================
> --- multiversion.c      (revision 0)
> +++ multiversion.c      (revision 0)
> @@ -0,0 +1,798 @@
> +/* Function Multiversioning.
> +   Copyright (C) 2012 Free Software Foundation, Inc.
> +   Contributed by Sriraman Tallam (tmsriram@google.com)
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +<http://www.gnu.org/licenses/>. */
> +
> +/* This file holds the state for multi-versioned functions.  The front-end
> +   updates the state as and when function versions are encountered.
> +   This is then used to generate the dispatch code.  Also, the
> +   optimization passes to clone hot paths involving versioned functions
> +   will be done here.
> +
> +   Function versions are created by using the same function signature but
> +   also tagging attribute "targetv" to specify the platform type for which
> +   the version must be executed.  Here is an example:
> +
> +   int foo ()
> +   {
> +     printf ("Execute as default");
> +     return 0;
> +   }
> +
> +   int  __attribute__ ((targetv ("arch=corei7")))
> +   foo ()
> +   {
> +     printf ("Execute for corei7");
> +     return 0;
> +   }
> +
> +   int main ()
> +   {
> +     return foo ();
> +   }
> +
> +   The call to foo in main is replaced with a call to an IFUNC function that
> +   contains the dispatch code to call the correct function version at
> +   run-time.  */
> +
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "tm.h"
> +#include "tree.h"
> +#include "tree-inline.h"
> +#include "langhooks.h"
> +#include "flags.h"
> +#include "cgraph.h"
> +#include "diagnostic.h"
> +#include "toplev.h"
> +#include "timevar.h"
> +#include "params.h"
> +#include "fibheap.h"
> +#include "intl.h"
> +#include "tree-pass.h"
> +#include "hashtab.h"
> +#include "coverage.h"
> +#include "ggc.h"
> +#include "tree-flow.h"
> +#include "rtl.h"
> +#include "ipa-prop.h"
> +#include "basic-block.h"
> +#include "toplev.h"
> +#include "dbgcnt.h"
> +#include "tree-dump.h"
> +#include "output.h"
> +#include "vecprim.h"
> +#include "gimple-pretty-print.h"
> +#include "ipa-inline.h"
> +#include "target.h"
> +#include "multiversion.h"
> +
> +typedef void * void_p;
> +
> +DEF_VEC_P (void_p);
> +DEF_VEC_ALLOC_P (void_p, heap);
> +
> +/* Each function decl that is a function version gets an instance of this
> +   structure.   Since this is called by the front-end, decl merging can
> +   happen, where a decl created for a new declaration is merged with
> +   the old. In this case, the new decl is deleted and the IS_DELETED
> +   field is set for the struct instance corresponding to the new decl.
> +   IFUNC_DECL is the decl of the ifunc function for default decls.
> +   IFUNC_RESOLVER_DECL is the decl of the dispatch function.  VERSIONS
> +   is a vector containing the list of function versions that are
> +   the candidates for dispatch.  */
> +
> +typedef struct version_function_d {
> +  tree decl;
> +  tree ifunc_decl;
> +  tree ifunc_resolver_decl;
> +  VEC (void_p, heap) *versions;
> +  bool is_deleted;
> +} version_function;
> +
> +/* Hashmap has an entry for every function decl that has other function
> +   versions.  For function decls that are the default, it also stores the
> +   list of all the other function versions.  Each entry is a structure
> +   of type version_function_d.  */
> +static htab_t decl_version_htab = NULL;
> +
> +/* Hashtable helpers for decl_version_htab. */
> +
> +static hashval_t
> +decl_version_htab_hash_descriptor (const void *p)
> +{
> +  const version_function *t = (const version_function *) p;
> +  return htab_hash_pointer (t->decl);
> +}
> +
> +/* Hashtable helper for decl_version_htab. */
> +
> +static int
> +decl_version_htab_eq_descriptor (const void *p1, const void *p2)
> +{
> +  const version_function *t1 = (const version_function *) p1;
> +  return htab_eq_pointer ((const void_p) t1->decl, p2);
> +}
> +
> +/* Create the decl_version_htab.  */
> +static void
> +create_decl_version_htab (void)
> +{
> +  if (decl_version_htab == NULL)
> +    decl_version_htab = htab_create (10, decl_version_htab_hash_descriptor,
> +                                    decl_version_htab_eq_descriptor, NULL);
> +}
> +
> +/* Creates an instance of version_function for decl DECL.  */
> +
> +static version_function*
> +new_version_function (const tree decl)
> +{
> +  version_function *v;
> +  v = (version_function *)xmalloc(sizeof (version_function));
> +  v->decl = decl;
> +  v->ifunc_decl = NULL;
> +  v->ifunc_resolver_decl = NULL;
> +  v->versions = NULL;
> +  v->is_deleted = false;
> +  return v;
> +}
> +
> +/* Comparator function to be used in qsort routine to sort attribute
> +   specification strings to "targetv".  */
> +
> +static int
> +attr_strcmp (const void *v1, const void *v2)
> +{
> +  const char *c1 = *(char *const*)v1;
> +  const char *c2 = *(char *const*)v2;
> +  return strcmp (c1, c2);
> +}
> +
> +/* STR is the argument to targetv attribute.  This function tokenizes
> +   the comma separated arguments, sorts them and returns a string which
> +   is a unique identifier for the comma separated arguments.  */
> +
> +static char *
> +sorted_attr_string (const char *str)
> +{
> +  char **args = NULL;
> +  char *attr_str, *ret_str;
> +  char *attr = NULL;
> +  unsigned int argnum = 1;
> +  unsigned int i;
> +
> +  for (i = 0; i < strlen (str); i++)
> +    if (str[i] == ',')
> +      argnum++;
> +
> +  attr_str = (char *)xmalloc (strlen (str) + 1);
> +  strcpy (attr_str, str);
> +
> +  for (i = 0; i < strlen (attr_str); i++)
> +    if (attr_str[i] == '=')
> +      attr_str[i] = '_';
> +
> +  if (argnum == 1)
> +    return attr_str;
> +
> +  args = (char **)xmalloc (argnum * sizeof (char *));
> +
> +  i = 0;
> +  attr = strtok (attr_str, ",");
> +  while (attr != NULL)
> +    {
> +      args[i] = attr;
> +      i++;
> +      attr = strtok (NULL, ",");
> +    }
> +
> +  qsort (args, argnum, sizeof (char*), attr_strcmp);
> +
> +  ret_str = (char *)xmalloc (strlen (str) + 1);
> +  strcpy (ret_str, args[0]);
> +  for (i = 1; i < argnum; i++)
> +    {
> +      strcat (ret_str, "_");
> +      strcat (ret_str, args[i]);
> +    }
> +
> +  free (args);
> +  free (attr_str);
> +  return ret_str;
> +}
> +
> +/* Returns true when only one of DECL1 and DECL2 is marked with "targetv"
> +   or if the "targetv" attribute strings of DECL1 and DECL2 don't match.  */
> +
> +bool
> +has_different_version_attributes (const tree decl1, const tree decl2)
> +{
> +  tree attr1, attr2;
> +  char *c1, *c2;
> +  bool ret = false;
> +
> +  if (TREE_CODE (decl1) != FUNCTION_DECL
> +      || TREE_CODE (decl2) != FUNCTION_DECL)
> +    return false;
> +
> +  attr1 = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl1));
> +  attr2 = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl2));
> +
> +  if (attr1 == NULL_TREE && attr2 == NULL_TREE)
> +    return false;
> +
> +  if ((attr1 == NULL_TREE && attr2 != NULL_TREE)
> +      || (attr1 != NULL_TREE && attr2 == NULL_TREE))
> +    return true;
> +
> +  c1 = sorted_attr_string (
> +       TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr1))));
> +  c2 = sorted_attr_string (
> +       TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr2))));
> +
> +  if (strcmp (c1, c2) != 0)
> +     ret = true;
> +
> +  free (c1);
> +  free (c2);
> +
> +  return ret;
> +}
> +
> +/* If this decl corresponds to a function and has "targetv" attribute,
> +   append the attribute string to its assembler name.  */
> +
> +void
> +version_assembler_name (const tree decl)
> +{
> +  tree version_attr;
> +  const char *orig_name, *version_string, *attr_str;
> +  char *assembler_name;
> +  tree assembler_name_tree;
> +
> +  if (TREE_CODE (decl) != FUNCTION_DECL
> +      || DECL_ASSEMBLER_NAME_SET_P (decl)
> +      || !DECL_FUNCTION_VERSIONED (decl))
> +    return;
> +
> +  if (DECL_DECLARED_INLINE_P (decl)
> +      && lookup_attribute ("gnu_inline",
> +                          DECL_ATTRIBUTES (decl)))
> +    error_at (DECL_SOURCE_LOCATION (decl),
> +             "Function versions cannot be marked as gnu_inline,"
> +             " bodies have to be generated\n");
> +
> +  if (DECL_VIRTUAL_P (decl)
> +      || DECL_VINDEX (decl))
> +    error_at (DECL_SOURCE_LOCATION (decl),
> +             "Virtual function versioning not supported\n");
> +
> +  version_attr = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl));
> +  /* targetv attribute string is NULL for default functions.  */
> +  if (version_attr == NULL_TREE)
> +    return;
> +
> +  orig_name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
> +  version_string
> +    = TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (version_attr)));
> +
> +  attr_str = sorted_attr_string (version_string);
> +  assembler_name = (char *) xmalloc (strlen (orig_name)
> +                                    + strlen (attr_str) + 2);
> +
> +  sprintf (assembler_name, "%s.%s", orig_name, attr_str);
> +  if (dump_file)
> +    fprintf (dump_file, "Assembler name set to %s for function version %s\n",
> +            assembler_name, IDENTIFIER_POINTER (DECL_NAME (decl)));
> +  assembler_name_tree = get_identifier (assembler_name);
> +  SET_DECL_ASSEMBLER_NAME (decl, assembler_name_tree);
> +}
> +
> +/* Returns true if DECL is multi-versioned and is the default function,
> +   that is, it is not tagged with the "targetv" attribute.  */
> +
> +bool
> +is_default_function (const tree decl)
> +{
> +  return (TREE_CODE (decl) == FUNCTION_DECL
> +         && DECL_FUNCTION_VERSIONED (decl)
> +         && (lookup_attribute ("targetv", DECL_ATTRIBUTES (decl))
> +             == NULL_TREE));
> +}
> +
> +/* For function decl DECL, find the version_function struct in the
> +   decl_version_htab.  */
> +
> +static version_function *
> +find_function_version (const tree decl)
> +{
> +  void *slot;
> +
> +  if (!DECL_FUNCTION_VERSIONED (decl))
> +    return NULL;
> +
> +  if (!decl_version_htab)
> +    return NULL;
> +
> +  slot = htab_find_with_hash (decl_version_htab, decl,
> +                              htab_hash_pointer (decl));
> +
> +  if (slot != NULL)
> +    return (version_function *)slot;
> +
> +  return NULL;
> +}
> +
> +/* Record DECL as a function version by creating a version_function struct
> +   for it and storing it in the hashtable.  */
> +
> +static version_function *
> +add_function_version (const tree decl)
> +{
> +  void **slot;
> +  version_function *v;
> +
> +  if (!DECL_FUNCTION_VERSIONED (decl))
> +    return NULL;
> +
> +  create_decl_version_htab ();
> +
> +  slot = htab_find_slot_with_hash (decl_version_htab, (const void_p)decl,
> +                                   htab_hash_pointer ((const void_p)decl),
> +                                  INSERT);
> +
> +  if (*slot != NULL)
> +    return (version_function *)*slot;
> +
> +  v = new_version_function (decl);
> +  *slot = v;
> +
> +  return v;
> +}
> +
> +/* Push V into VEC only if it is not already present.  */
> +
> +static void
> +push_function_version (version_function *v, VEC (void_p, heap) *vec)
> +{
> +  int ix;
> +  void_p ele;
> +  for (ix = 0; VEC_iterate (void_p, vec, ix, ele); ++ix)
> +    {
> +      if (ele == (void_p)v)
> +        return;
> +    }
> +
> +  VEC_safe_push (void_p, heap, vec, (void*)v);
> +}
> +
> +/* Mark DECL as deleted.  This is called by the front-end when a duplicate
> +   decl is merged with the original decl and the duplicate decl is deleted.
> +   This function marks the duplicate_decl as invalid.  Called by
> +   duplicate_decls in cp/decl.c.  */
> +
> +void
> +mark_delete_decl_version (const tree decl)
> +{
> +  version_function *decl_v;
> +
> +  decl_v = find_function_version (decl);
> +
> +  if (decl_v == NULL)
> +    return;
> +
> +  decl_v->is_deleted = true;
> +
> +  if (is_default_function (decl)
> +      && decl_v->versions != NULL)
> +    {
> +      VEC_truncate (void_p, decl_v->versions, 0);
> +      VEC_free (void_p, heap, decl_v->versions);
> +    }
> +}
> +
> +/* Mark DECL1 and DECL2 to be function versions in the same group.  One
> +   of DECL1 and DECL2 must be the default, otherwise this function does
> +   nothing.  This function aggregates the versions.  */
> +
> +int
> +group_function_versions (const tree decl1, const tree decl2)
> +{
> +  tree default_decl, version_decl;
> +  version_function *default_v, *version_v;
> +
> +  gcc_assert (DECL_FUNCTION_VERSIONED (decl1)
> +             && DECL_FUNCTION_VERSIONED (decl2));
> +
> +  /* The version decls are added only to the default decl.  */
> +  if (!is_default_function (decl1)
> +      && !is_default_function (decl2))
> +    return 0;
> +
> +  /* This can happen with duplicate declarations.  Just ignore.  */
> +  if (is_default_function (decl1)
> +      && is_default_function (decl2))
> +    return 0;
> +
> +  default_decl = (is_default_function (decl1)) ? decl1 : decl2;
> +  version_decl = (default_decl == decl1) ? decl2 : decl1;
> +
> +  gcc_assert (default_decl != version_decl);
> +  create_decl_version_htab ();
> +
> +  /* If the version function is found, it has been added.  */
> +  if (find_function_version (version_decl))
> +    return 0;
> +
> +  default_v = add_function_version (default_decl);
> +  version_v = add_function_version (version_decl);
> +
> +  if (default_v->versions == NULL)
> +    default_v->versions = VEC_alloc (void_p, heap, 1);
> +
> +  push_function_version (version_v, default_v->versions);
> +  return 0;
> +}
> +
> +/* Makes a function attribute of the form NAME(ARG_NAME) and chains
> +   it to CHAIN.  */
> +
> +static tree
> +make_attribute (const char *name, const char *arg_name, tree chain)
> +{
> +  tree attr_name;
> +  tree attr_arg_name;
> +  tree attr_args;
> +  tree attr;
> +
> +  attr_name = get_identifier (name);
> +  attr_arg_name = build_string (strlen (arg_name), arg_name);
> +  attr_args = tree_cons (NULL_TREE, attr_arg_name, NULL_TREE);
> +  attr = tree_cons (attr_name, attr_args, chain);
> +  return attr;
> +}
> +
> +/* Return a new name by appending SUFFIX to the DECL name.  If
> +   make_unique is true, append the full path name.  */
> +
> +static char *
> +make_name (tree decl, const char *suffix, bool make_unique)
> +{
> +  char *global_var_name;
> +  int name_len;
> +  const char *name;
> +  const char *unique_name = NULL;
> +
> +  name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
> +
> +  /* Get a unique name that can be used globally without any chances
> +     of collision at link time.  */
> +  if (make_unique)
> +    unique_name = IDENTIFIER_POINTER (get_file_function_name ("\0"));
> +
> +  name_len = strlen (name) + strlen (suffix) + 2;
> +
> +  if (make_unique)
> +    name_len += strlen (unique_name) + 1;
> +  global_var_name = (char *) xmalloc (name_len);
> +
> +  /* Use '.' to concatenate names as it is demangler friendly.  */
> +  if (make_unique)
> +      snprintf (global_var_name, name_len, "%s.%s.%s", name,
> +               unique_name, suffix);
> +  else
> +      snprintf (global_var_name, name_len, "%s.%s", name, suffix);
> +
> +  return global_var_name;
> +}
> +
> +/* Make the resolver function decl for ifunc (IFUNC_DECL) to dispatch
> +   the versions of multi-versioned function DEFAULT_DECL.  Create an
> +   empty basic block in the resolver and store the pointer in
> +   EMPTY_BB.  Return the decl of the resolver function.  */
> +
> +static tree
> +make_ifunc_resolver_func (const tree default_decl,
> +                         const tree ifunc_decl,
> +                         basic_block *empty_bb)
> +{
> +  char *resolver_name;
> +  tree decl, type, decl_name, t;
> +  basic_block new_bb;
> +  tree old_current_function_decl;
> +  bool make_unique = false;
> +
> +  /* IFUNCs have to be globally visible.  So, if the default_decl is
> +     not, then the name of the IFUNC should be made unique.  */
> +  if (TREE_PUBLIC (default_decl) == 0)
> +    make_unique = true;
> +
> +  /* Append the filename to the resolver function if the versions are
> +     not externally visible.  This is because the resolver function has
> +     to be externally visible for the loader to find it.  So, appending
> +     the filename will prevent conflicts with a resolver function from
> +     another module which is based on the same version name.  */
> +  resolver_name = make_name (default_decl, "resolver", make_unique);
> +
> +  /* The resolver function should return a (void *). */
> +  type = build_function_type_list (ptr_type_node, NULL_TREE);
> +
> +  decl = build_fn_decl (resolver_name, type);
> +  decl_name = get_identifier (resolver_name);
> +  SET_DECL_ASSEMBLER_NAME (decl, decl_name);
> +
> +  DECL_NAME (decl) = decl_name;
> +  TREE_USED (decl) = TREE_USED (default_decl);
> +  DECL_ARTIFICIAL (decl) = 1;
> +  DECL_IGNORED_P (decl) = 0;
> +  /* IFUNC resolvers have to be externally visible.  */
> +  TREE_PUBLIC (decl) = 1;
> +  DECL_UNINLINABLE (decl) = 1;
> +
> +  DECL_EXTERNAL (decl) = DECL_EXTERNAL (default_decl);
> +  DECL_EXTERNAL (ifunc_decl) = 0;
> +
> +  DECL_CONTEXT (decl) = NULL_TREE;
> +  DECL_INITIAL (decl) = make_node (BLOCK);
> +  DECL_STATIC_CONSTRUCTOR (decl) = 0;
> +  TREE_READONLY (decl) = 0;
> +  DECL_PURE_P (decl) = 0;
> +  DECL_COMDAT (decl) = DECL_COMDAT (default_decl);
> +  if (DECL_COMDAT_GROUP (default_decl))
> +    {
> +      make_decl_one_only (decl, DECL_COMDAT_GROUP (default_decl));
> +    }
> +  /* Build result decl and add to function_decl. */
> +  t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, ptr_type_node);
> +  DECL_ARTIFICIAL (t) = 1;
> +  DECL_IGNORED_P (t) = 1;
> +  DECL_RESULT (decl) = t;
> +
> +  gimplify_function_tree (decl);
> +  old_current_function_decl = current_function_decl;
> +  push_cfun (DECL_STRUCT_FUNCTION (decl));
> +  current_function_decl = decl;
> +  init_empty_tree_cfg_for_function (DECL_STRUCT_FUNCTION (decl));
> +  cfun->curr_properties |=
> +    (PROP_gimple_lcf | PROP_gimple_leh | PROP_cfg | PROP_referenced_vars |
> +     PROP_ssa);
> +  new_bb = create_empty_bb (ENTRY_BLOCK_PTR);
> +  make_edge (ENTRY_BLOCK_PTR, new_bb, EDGE_FALLTHRU);
> +  make_edge (new_bb, EXIT_BLOCK_PTR, 0);
> +  *empty_bb = new_bb;
> +
> +  cgraph_add_new_function (decl, true);
> +  cgraph_call_function_insertion_hooks (cgraph_get_create_node (decl));
> +  cgraph_analyze_function (cgraph_get_create_node (decl));
> +  cgraph_mark_needed_node (cgraph_get_create_node (decl));
> +
> +  if (DECL_COMDAT_GROUP (default_decl))
> +    {
> +      gcc_assert (cgraph_get_node (default_decl));
> +      cgraph_add_to_same_comdat_group (cgraph_get_node (decl),
> +                                      cgraph_get_node (default_decl));
> +    }
> +
> +  pop_cfun ();
> +  current_function_decl = old_current_function_decl;
> +
> +  gcc_assert (ifunc_decl != NULL);
> +  DECL_ATTRIBUTES (ifunc_decl)
> +    = make_attribute ("ifunc", resolver_name, DECL_ATTRIBUTES (ifunc_decl));
> +  assemble_alias (ifunc_decl, get_identifier (resolver_name));
> +  return decl;
> +}
> +
> +/* Make an ifunc declaration for the multi-versioned function DECL.  Calls to
> +   DECL function will be replaced with calls to the ifunc.  Return the decl
> +   of the ifunc created.  */
> +
> +static tree
> +make_ifunc_func (const tree decl)
> +{
> +  tree ifunc_decl;
> +  char *ifunc_name, *resolver_name;
> +  tree fn_type, ifunc_type;
> +  bool make_unique = false;
> +
> +  if (TREE_PUBLIC (decl) == 0)
> +    make_unique = true;
> +
> +  ifunc_name = make_name (decl, "ifunc", make_unique);
> +  resolver_name = make_name (decl, "resolver", make_unique);
> +  gcc_assert (resolver_name);
> +
> +  fn_type = TREE_TYPE (decl);
> +  ifunc_type = build_function_type (TREE_TYPE (fn_type),
> +                                   TYPE_ARG_TYPES (fn_type));
> +
> +  ifunc_decl = build_fn_decl (ifunc_name, ifunc_type);
> +  TREE_USED (ifunc_decl) = 1;
> +  DECL_CONTEXT (ifunc_decl) = NULL_TREE;
> +  DECL_INITIAL (ifunc_decl) = error_mark_node;
> +  DECL_ARTIFICIAL (ifunc_decl) = 1;
> +  /* Mark this ifunc as external; the resolver will flip it again if
> +     it gets generated.  */
> +  DECL_EXTERNAL (ifunc_decl) = 1;
> +  /* IFUNCs have to be externally visible.  */
> +  TREE_PUBLIC (ifunc_decl) = 1;
> +
> +  return ifunc_decl;
> +}
> +
> +/* For multi-versioned function DECL, which should also be the default,
> +   return the decl of the ifunc, creating it if it does not
> +   exist.  */
> +
> +tree
> +get_ifunc_for_version (const tree decl)
> +{
> +  version_function *decl_v;
> +  int ix;
> +  void_p ele;
> +
> +  /* DECL has to be the default version, otherwise it is missing and
> +     that is not allowed.  */
> +  if (!is_default_function (decl))
> +    {
> +      error_at (DECL_SOURCE_LOCATION (decl), "default version not found");
> +      return decl;
> +    }
> +
> +  decl_v = find_function_version (decl);
> +  gcc_assert (decl_v != NULL);
> +  if (decl_v->ifunc_decl == NULL)
> +    {
> +      tree ifunc_decl;
> +      ifunc_decl = make_ifunc_func (decl);
> +      decl_v->ifunc_decl = ifunc_decl;
> +    }
> +
> +  if (cgraph_get_node (decl))
> +    cgraph_mark_needed_node (cgraph_get_node (decl));
> +
> +  for (ix = 0; VEC_iterate (void_p, decl_v->versions, ix, ele); ++ix)
> +    {
> +      version_function *v = (version_function *) ele;
> +      gcc_assert (v->decl != NULL);
> +      if (cgraph_get_node (v->decl))
> +       cgraph_mark_needed_node (cgraph_get_node (v->decl));
> +    }
> +
> +  return decl_v->ifunc_decl;
> +}
> +
> +/* Generate the dispatching code to dispatch multi-versioned function
> +   DECL.  Make a new function decl for dispatching and call the target
> +   hook to process the "targetv" attributes and provide the code to
> +   dispatch the right function at run-time.  */
> +
> +static tree
> +make_ifunc_resolver_for_version (const tree decl)
> +{
> +  version_function *decl_v;
> +  tree ifunc_resolver_decl, ifunc_decl;
> +  basic_block empty_bb;
> +  int ix;
> +  void_p ele;
> +  VEC (tree, heap) *fn_ver_vec = NULL;
> +
> +  gcc_assert (is_default_function (decl));
> +
> +  decl_v = find_function_version (decl);
> +  gcc_assert (decl_v != NULL);
> +
> +  if (decl_v->ifunc_resolver_decl != NULL)
> +    return decl_v->ifunc_resolver_decl;
> +
> +  ifunc_decl = decl_v->ifunc_decl;
> +
> +  if (ifunc_decl == NULL)
> +    ifunc_decl = decl_v->ifunc_decl = make_ifunc_func (decl);
> +
> +  ifunc_resolver_decl = make_ifunc_resolver_func (decl, ifunc_decl,
> +                                                 &empty_bb);
> +
> +  fn_ver_vec = VEC_alloc (tree, heap, 2);
> +  VEC_safe_push (tree, heap, fn_ver_vec, decl);
> +
> +  for (ix = 0; VEC_iterate (void_p, decl_v->versions, ix, ele); ++ix)
> +    {
> +      version_function *v = (version_function *) ele;
> +      gcc_assert (v->decl != NULL);
> +      /* Check for virtual functions here again, as by this time it should
> +        have been determined if this function needs a vtable index or
> +        not.  This happens for methods in derived classes that override
> +        virtual methods in base classes but are not explicitly marked as
> +        virtual.  */
> +      if (DECL_VINDEX (v->decl))
> +        error_at (DECL_SOURCE_LOCATION (v->decl),
> +                 "virtual function versioning not supported");
> +      if (!v->is_deleted)
> +       VEC_safe_push (tree, heap, fn_ver_vec, v->decl);
> +    }
> +
> +  gcc_assert (targetm.dispatch_version);
> +  targetm.dispatch_version (ifunc_resolver_decl, fn_ver_vec, &empty_bb);
> +  decl_v->ifunc_resolver_decl = ifunc_resolver_decl;
> +
> +  return ifunc_resolver_decl;
> +}
> +
> +/* Main entry point to pass_dispatch_versions. For multi-versioned functions,
> +   generate the dispatching code.  */
> +
> +static unsigned int
> +do_dispatch_versions (void)
> +{
> +  /* A new pass for generating dispatch code for multi-versioned functions.
> +     Currently, dispatching is done only through IFUNC.  Other forms of
> +     dispatch, such as calling the function directly after checking the
> +     target type, can be added for targets without IFUNC support; this pass
> +     will become more meaningful once they are.  */
> +
> +  /* Cloning a function to produce more versions will happen here when the
> +     user requests that via the targetv attribute. For example,
> +     int foo () __attribute__ ((targetv(("arch=core2"), ("arch=corei7"))));
> +     means that the user wants the same body of foo to be versioned for core2
> +     and corei7.  In that case, this function will be cloned during this
> +     pass.  */
> +
> +  if (DECL_FUNCTION_VERSIONED (current_function_decl)
> +      && is_default_function (current_function_decl))
> +    {
> +      tree decl = make_ifunc_resolver_for_version (current_function_decl);
> +      if (dump_file && decl)
> +       dump_function_to_file (decl, dump_file, TDF_BLOCKS);
> +    }
> +  return 0;
> +}
> +
> +static bool
> +gate_dispatch_versions (void)
> +{
> +  return true;
> +}
> +
> +/* A pass to generate the dispatch code to execute the appropriate version
> +   of a multi-versioned function at run-time.  */
> +
> +struct gimple_opt_pass pass_dispatch_versions =
> +{
> + {
> +  GIMPLE_PASS,
> +  "dispatch_multiversion_functions",    /* name */
> +  gate_dispatch_versions,              /* gate */
> +  do_dispatch_versions,                        /* execute */
> +  NULL,                                        /* sub */
> +  NULL,                                        /* next */
> +  0,                                   /* static_pass_number */
> +  TV_MULTIVERSION_DISPATCH,            /* tv_id */
> +  PROP_cfg,                            /* properties_required */
> +  PROP_cfg,                            /* properties_provided */
> +  0,                                   /* properties_destroyed */
> +  0,                                   /* todo_flags_start */
> +  TODO_dump_func |                     /* todo_flags_finish */
> +  TODO_cleanup_cfg | TODO_dump_cgraph
> + }
> +};
> Index: cgraphunit.c
> ===================================================================
> --- cgraphunit.c        (revision 184971)
> +++ cgraphunit.c        (working copy)
> @@ -141,6 +141,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "ipa-inline.h"
>  #include "ipa-utils.h"
>  #include "lto-streamer.h"
> +#include "multiversion.h"
>
>  static void cgraph_expand_all_functions (void);
>  static void cgraph_mark_functions_to_output (void);
> @@ -343,6 +344,13 @@ cgraph_finalize_function (tree decl, bool nested)
>       node->local.redefined_extern_inline = true;
>     }
>
> +  /* If this is a function version and not the default, change the
> +     assembler name of this function.  The DECL names of function
> +     versions are the same, only the assembler names are made unique.
> +     The assembler name is changed by appending the string from
> +     the "targetv" attribute.  */
> +  version_assembler_name (decl);
> +
>   notice_global_symbol (decl);
>   node->local.finalized = true;
>   node->lowered = DECL_STRUCT_FUNCTION (decl)->cfg != NULL;
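
The comment in the hunk above says that non-default versions keep the same
DECL name and only get a unique assembler name, formed by appending the
"targetv" string.  As a rough illustration only: the body of
version_assembler_name is not shown here, so the '.' separator and the
replacement of non-alphanumeric characters below are assumptions, and
sketch_version_name is a hypothetical helper, not a function from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: write BASE into OUT (capacity OUT_SIZE).  If ATTR is
   non-NULL, also append a '.' and ATTR, with every non-alphanumeric character
   of ATTR replaced by '_'.  A NULL ATTR models the default version, whose
   name is left unchanged.  The separator and the replacement rule are
   assumptions.  Returns 0 on success, -1 if OUT is too small.  */
static int
sketch_version_name (const char *base, const char *attr,
                     char *out, size_t out_size)
{
  size_t base_len, attr_len, i;

  assert (base != NULL && out != NULL);
  base_len = strlen (base);
  attr_len = attr ? strlen (attr) : 0;

  /* Room for BASE, the optional separator, ATTR, and the terminating NUL.  */
  if (base_len + (attr ? 1 : 0) + attr_len + 1 > out_size)
    return -1;

  memcpy (out, base, base_len);
  if (attr == NULL)
    {
      out[base_len] = '\0';
      return 0;
    }

  out[base_len] = '.';
  for (i = 0; i < attr_len; ++i)
    {
      char c = attr[i];
      int ok = ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9'));
      out[base_len + 1 + i] = ok ? c : '_';
    }
  out[base_len + 1 + attr_len] = '\0';
  return 0;
}
```

With these assumptions, the corei7 version of a function whose mangled name
is _Z3foov would be emitted under a distinct symbol, while the default
version keeps _Z3foov.
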
> Index: multiversion.h
> ===================================================================
> --- multiversion.h      (revision 0)
> +++ multiversion.h      (revision 0)
> @@ -0,0 +1,52 @@
> +/* Function Multiversioning.
> +   Copyright (C) 2012 Free Software Foundation, Inc.
> +   Contributed by Sriraman Tallam (tmsriram@google.com)
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +<http://www.gnu.org/licenses/>. */
> +
> +/* This is the header file which provides the functions to keep track
> +   of functions that are multi-versioned and to generate the dispatch
> +   code to call the right version at run-time.  */
> +
> +#ifndef GCC_MULTIVERSION_H
> +#define GCC_MULTIVERSION_H
> +
> +#include "tree.h"
> +
> +/* Mark DECL1 and DECL2 as function versions.  */
> +int group_function_versions (const tree decl1, const tree decl2);
> +
> +/* Mark DECL as deleted and no longer a version.  */
> +void mark_delete_decl_version (const tree decl);
> +
> +/* Returns true if DECL is the default version to be executed if all
> +   other versions are inappropriate at run-time.  */
> +bool is_default_function (const tree decl);
> +
> +/* Gets the IFUNC dispatcher for this multi-versioned function DECL. DECL
> +   must be the default function in the multi-versioned group.  */
> +tree get_ifunc_for_version (const tree decl);
> +
> +/* Returns true when only one of DECL1 and DECL2 is marked with "targetv"
> +   or if the "targetv" attribute strings of DECL1 and DECL2 do not match.  */
> +bool has_different_version_attributes (const tree decl1, const tree decl2);
> +
> +/* If DECL is a function version and not the default version, the assembler
> +   name of DECL is changed to include the attribute string to keep the
> +   name unambiguous.  */
> +void version_assembler_name (const tree decl);
> +#endif
> Index: cp/class.c
> ===================================================================
> --- cp/class.c  (revision 184971)
> +++ cp/class.c  (working copy)
> @@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-dump.h"
>  #include "splay-tree.h"
>  #include "pointer-set.h"
> +#include "multiversion.h"
>
>  /* The number of nested classes being processed.  If we are not in the
>    scope of any class, this is zero.  */
> @@ -1092,7 +1093,20 @@ add_method (tree type, tree method, tree using_dec
>              || same_type_p (TREE_TYPE (fn_type),
>                              TREE_TYPE (method_type))))
>        {
> -         if (using_decl)
> +         /* For function versions, their parms and types match
> +            but they are not duplicates.  Record function versions
> +            as and when they are found.  */
> +         if (TREE_CODE (fn) == FUNCTION_DECL
> +             && TREE_CODE (method) == FUNCTION_DECL
> +             && (DECL_FUNCTION_VERSIONED (fn)
> +                 || DECL_FUNCTION_VERSIONED (method)))
> +           {
> +             DECL_FUNCTION_VERSIONED (fn) = 1;
> +             DECL_FUNCTION_VERSIONED (method) = 1;
> +             group_function_versions (fn, method);
> +             continue;
> +           }
> +         else if (using_decl)
>            {
>              if (DECL_CONTEXT (fn) == type)
>                /* Defer to the local function.  */
> @@ -1150,6 +1164,13 @@ add_method (tree type, tree method, tree using_dec
>   else
>     /* Replace the current slot.  */
>     VEC_replace (tree, method_vec, slot, overload);
> +
> +  /* Change the assembler name of method here if it has "targetv"
> +     attributes.  Since all versions have the same mangled name,
> +     their assembler name is changed by appending the string from
> +     the "targetv" attribute. */
> +  version_assembler_name (method);
> +
>   return true;
>  }
>
> @@ -6890,8 +6911,11 @@ resolve_address_of_overloaded_function (tree targe
>          if (DECL_ANTICIPATED (fn))
>            continue;
>
> -         /* See if there's a match.  */
> -         if (same_type_p (target_fn_type, static_fn_type (fn)))
> +         /* See if there's a match.  For multi-versioned functions,
> +            match only the default function.  */
> +         if (same_type_p (target_fn_type, static_fn_type (fn))
> +             && (!DECL_FUNCTION_VERSIONED (fn)
> +                 || is_default_function (fn)))
>            matches = tree_cons (fn, NULL_TREE, matches);
>        }
>     }
> @@ -7053,6 +7077,21 @@ resolve_address_of_overloaded_function (tree targe
>       perform_or_defer_access_check (access_path, fn, fn);
>     }
>
> +  /* If a pointer to a function that is multi-versioned is requested, the
> +     pointer to the dispatcher function is returned instead.  This works
> +     well because indirectly calling the function will dispatch the right
> +     function version at run-time. Also, the function address is kept
> +     unique.  */
> +  if (DECL_FUNCTION_VERSIONED (fn)
> +      && is_default_function (fn))
> +    {
> +      tree ifunc_decl;
> +      ifunc_decl = get_ifunc_for_version (fn);
> +      gcc_assert (ifunc_decl != NULL);
> +      mark_used (fn);
> +      return build_fold_addr_expr (ifunc_decl);
> +    }
> +
>   if (TYPE_PTRFN_P (target_type) || TYPE_PTRMEMFUNC_P (target_type))
>     return cp_build_addr_expr (fn, flags);
>   else
> Index: cp/decl.c
> ===================================================================
> --- cp/decl.c   (revision 184971)
> +++ cp/decl.c   (working copy)
> @@ -54,6 +54,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "pointer-set.h"
>  #include "splay-tree.h"
>  #include "plugin.h"
> +#include "multiversion.h"
>
>  /* Possible cases of bad specifiers type used by bad_specifiers. */
>  enum bad_spec_place {
> @@ -972,6 +973,23 @@ decls_match (tree newdecl, tree olddecl)
>       if (t1 != t2)
>        return 0;
>
> +      /* The decls do not match if they correspond to two different versions
> +        of the same function.  */
> +      if (compparms (p1, p2)
> +         && same_type_p (TREE_TYPE (f1), TREE_TYPE (f2))
> +         && (DECL_FUNCTION_VERSIONED (newdecl)
> +             || DECL_FUNCTION_VERSIONED (olddecl))
> +         && has_different_version_attributes (newdecl, olddecl))
> +       {
> +         /* One of the decls could be the default without the "targetv"
> +            attribute. Set it to be a versioned function here.  */
> +         DECL_FUNCTION_VERSIONED (newdecl) = 1;
> +         DECL_FUNCTION_VERSIONED (olddecl) = 1;
> +         /* Accumulate all the versions of a function.  */
> +         group_function_versions (olddecl, newdecl);
> +         return 0;
> +       }
> +
>       if (CP_DECL_CONTEXT (newdecl) != CP_DECL_CONTEXT (olddecl)
>          && ! (DECL_EXTERN_C_P (newdecl)
>                && DECL_EXTERN_C_P (olddecl)))
> @@ -1482,7 +1500,11 @@ duplicate_decls (tree newdecl, tree olddecl, bool
>              error ("previous declaration %q+#D here", olddecl);
>              return NULL_TREE;
>            }
> -         else if (compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)),
> +         /* For function versions, params and types match, but they
> +            are not ambiguous.  */
> +         else if ((!DECL_FUNCTION_VERSIONED (newdecl)
> +                   && !DECL_FUNCTION_VERSIONED (olddecl))
> +                  && compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)),
>                              TYPE_ARG_TYPES (TREE_TYPE (olddecl))))
>            {
>              error ("new declaration %q#D", newdecl);
> @@ -2250,6 +2272,16 @@ duplicate_decls (tree newdecl, tree olddecl, bool
>   else if (DECL_PRESERVE_P (newdecl))
>     DECL_PRESERVE_P (olddecl) = 1;
>
> +  /* If the olddecl is a version, so is the newdecl.  */
> +  if (TREE_CODE (newdecl) == FUNCTION_DECL
> +      && DECL_FUNCTION_VERSIONED (olddecl))
> +    {
> +      DECL_FUNCTION_VERSIONED (newdecl) = 1;
> +      /* Record that newdecl is not a valid version and has
> +        been deleted.  */
> +      mark_delete_decl_version (newdecl);
> +    }
> +
>   if (TREE_CODE (newdecl) == FUNCTION_DECL)
>     {
>       int function_size;
> @@ -4512,6 +4544,10 @@ start_decl (const cp_declarator *declarator,
>   /* Enter this declaration into the symbol table.  */
>   decl = maybe_push_decl (decl);
>
> +  /* If this decl is a function version and not the default, its assembler
> +     name has to be changed.  */
> +  version_assembler_name (decl);
> +
>   if (processing_template_decl)
>     decl = push_template_decl (decl);
>   if (decl == error_mark_node)
> @@ -13019,6 +13055,10 @@ start_function (cp_decl_specifier_seq *declspecs,
>     gcc_assert (same_type_p (TREE_TYPE (TREE_TYPE (decl1)),
>                             integer_type_node));
>
> +  /* If this decl is a function version and not the default, its assembler
> +     name has to be changed.  */
> +  version_assembler_name (decl1);
> +
>   start_preparsed_function (decl1, attrs, /*flags=*/SF_DEFAULT);
>
>   return 1;
> @@ -13960,6 +14000,11 @@ cxx_comdat_group (tree decl)
>            break;
>        }
>       name = DECL_ASSEMBLER_NAME (decl);
> +      if (TREE_CODE (decl) == FUNCTION_DECL
> +         && DECL_FUNCTION_VERSIONED (decl))
> +       name = DECL_NAME (decl);
> +      else
> +        name = DECL_ASSEMBLER_NAME (decl);
>     }
>
>   return name;
> Index: cp/semantics.c
> ===================================================================
> --- cp/semantics.c      (revision 184971)
> +++ cp/semantics.c      (working copy)
> @@ -3783,8 +3783,11 @@ expand_or_defer_fn_1 (tree fn)
>       /* If the user wants us to keep all inline functions, then mark
>         this function as needed so that finish_file will make sure to
>         output it later.  Similarly, all dllexport'd functions must
> -        be emitted; there may be callers in other DLLs.  */
> -      if ((flag_keep_inline_functions
> +        be emitted; there may be callers in other DLLs.
> +        Also, mark this function as needed if it is marked inline but
> +        is a multi-versioned function.  */
> +      if (((flag_keep_inline_functions
> +           || DECL_FUNCTION_VERSIONED (fn))
>           && DECL_DECLARED_INLINE_P (fn)
>           && !DECL_REALLY_EXTERN (fn))
>          || (flag_keep_inline_dllexport
> Index: cp/decl2.c
> ===================================================================
> --- cp/decl2.c  (revision 184971)
> +++ cp/decl2.c  (working copy)
> @@ -53,6 +53,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "splay-tree.h"
>  #include "langhooks.h"
>  #include "c-family/c-ada-spec.h"
> +#include "multiversion.h"
>
>  extern cpp_reader *parse_in;
>
> @@ -674,9 +675,13 @@ check_classfn (tree ctype, tree function, tree tem
>          if (is_template != (TREE_CODE (fndecl) == TEMPLATE_DECL))
>            continue;
>
> +         /* While finding a match, same types and params are not enough
> +            if the function is versioned.  Also check version ("targetv")
> +            attributes.  */
>          if (same_type_p (TREE_TYPE (TREE_TYPE (function)),
>                           TREE_TYPE (TREE_TYPE (fndecl)))
>              && compparms (p1, p2)
> +             && !has_different_version_attributes (function, fndecl)
>              && (!is_template
>                  || comp_template_parms (template_parms,
>                                          DECL_TEMPLATE_PARMS (fndecl)))
> Index: cp/call.c
> ===================================================================
> --- cp/call.c   (revision 184971)
> +++ cp/call.c   (working copy)
> @@ -41,6 +41,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "langhooks.h"
>  #include "c-family/c-objc.h"
>  #include "timevar.h"
> +#include "multiversion.h"
>
>  /* The various kinds of conversion.  */
>
> @@ -6730,6 +6731,17 @@ build_over_call (struct z_candidate *cand, int fla
>   if (!already_used)
>     mark_used (fn);
>
> +  /* For a call to a multi-versioned function, the call should actually be to
> +     the dispatcher.  */
> +  if (DECL_FUNCTION_VERSIONED (fn))
> +    {
> +      tree ifunc_decl;
> +      ifunc_decl = get_ifunc_for_version (fn);
> +      gcc_assert (ifunc_decl != NULL);
> +      return build_call_expr_loc_array (UNKNOWN_LOCATION, ifunc_decl,
> +                                       nargs, argarray);
> +    }
> +
>   if (DECL_VINDEX (fn) && (flags & LOOKUP_NONVIRTUAL) == 0)
>     {
>       tree t;
> @@ -7980,6 +7992,30 @@ joust (struct z_candidate *cand1, struct z_candida
>   size_t i;
>   size_t len;
>
> +  /* For candidates of a multi-versioned function, the one marked default
> +     wins.  This is because the default decl is used as key to aggregate
> +     all the other versions provided for it in multiversion.c.  When
> +     generating the actual call, the appropriate dispatcher is created
> +     to call the right function version at run-time.  */
> +
> +  if ((TREE_CODE (cand1->fn) == FUNCTION_DECL
> +       && DECL_FUNCTION_VERSIONED (cand1->fn))
> +      || (TREE_CODE (cand2->fn) == FUNCTION_DECL
> +        && DECL_FUNCTION_VERSIONED (cand2->fn)))
> +    {
> +      if (is_default_function (cand1->fn))
> +       {
> +          mark_used (cand2->fn);
> +         return 1;
> +       }
> +      if (is_default_function (cand2->fn))
> +       {
> +          mark_used (cand1->fn);
> +         return -1;
> +       }
> +      return 0;
> +    }
> +
>   /* Candidates that involve bad conversions are always worse than those
>      that don't.  */
>   if (cand1->viable > cand2->viable)
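
To make the new joust rule easier to follow, here is a small stand-alone
model of it.  This is a sketch, not code from the patch: struct
sketch_candidate and sketch_joust_versions are hypothetical names, and the
two flags stand in for DECL_FUNCTION_VERSIONED and is_default_function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two properties of a candidate that the
   rule inspects.  */
struct sketch_candidate
{
  bool versioned;   /* Models DECL_FUNCTION_VERSIONED.  */
  bool is_default;  /* Models is_default_function.  */
};

/* Returns false if neither candidate is versioned, in which case the
   ordinary overload rules would decide.  Otherwise returns true and stores
   1 in *WINNER if C1 is the default, -1 if C2 is the default, and 0 if
   neither is.  */
static bool
sketch_joust_versions (const struct sketch_candidate *c1,
                       const struct sketch_candidate *c2, int *winner)
{
  assert (c1 != NULL && c2 != NULL && winner != NULL);

  if (!c1->versioned && !c2->versioned)
    return false;

  if (c1->is_default)
    *winner = 1;
  else if (c2->is_default)
    *winner = -1;
  else
    *winner = 0;
  return true;
}
```

In this model, two non-default versions tie, which matches the "return 0"
at the end of the block added above.
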
> Index: timevar.def
> ===================================================================
> --- timevar.def (revision 184971)
> +++ timevar.def (working copy)
> @@ -253,6 +253,7 @@ DEFTIMEVAR (TV_TREE_IFCOMBINE        , "tree if-co
>  DEFTIMEVAR (TV_TREE_UNINIT           , "uninit var analysis")
>  DEFTIMEVAR (TV_PLUGIN_INIT           , "plugin initialization")
>  DEFTIMEVAR (TV_PLUGIN_RUN            , "plugin execution")
> +DEFTIMEVAR (TV_MULTIVERSION_DISPATCH , "multiversion dispatch")
>
>  /* Everything else in rest_of_compilation not included above.  */
>  DEFTIMEVAR (TV_EARLY_LOCAL          , "early local passes")
> Index: varasm.c
> ===================================================================
> --- varasm.c    (revision 184971)
> +++ varasm.c    (working copy)
> @@ -5755,6 +5755,8 @@ finish_aliases_1 (void)
>        }
>       else if (! (p->emitted_diags & ALIAS_DIAG_TO_EXTERN)
>               && DECL_EXTERNAL (target_decl)
> +              && (TREE_CODE (target_decl) != FUNCTION_DECL
> +                  || !DECL_STRUCT_FUNCTION (target_decl))
>               /* We use local aliases for C++ thunks to force the tailcall
>                  to bind locally.  This is a hack - to keep it working do
>                  the following (which is not strictly correct).  */
> Index: Makefile.in
> ===================================================================
> --- Makefile.in (revision 184971)
> +++ Makefile.in (working copy)
> @@ -1298,6 +1298,7 @@ OBJS = \
>        mcf.o \
>        mode-switching.o \
>        modulo-sched.o \
> +       multiversion.o \
>        omega.o \
>        omp-low.o \
>        optabs.o \
> @@ -3030,6 +3031,11 @@ ree.o : ree.c $(CONFIG_H) $(SYSTEM_H) coretypes.h
>    $(DF_H) $(TIMEVAR_H) tree-pass.h $(RECOG_H) $(EXPR_H) \
>    $(REGS_H) $(TREE_H) $(TM_P_H) insn-config.h $(INSN_ATTR_H) $(DIAGNOSTIC_CORE_H) \
>    $(TARGET_H) $(OPTABS_H) insn-codes.h rtlhooks-def.h $(PARAMS_H) $(CGRAPH_H)
> +multiversion.o : multiversion.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
> +   $(TREE_H) langhooks.h $(TREE_INLINE_H) $(FLAGS_H) $(CGRAPH_H) intl.h \
> +   $(DIAGNOSTIC_H) $(FIBHEAP_H) $(PARAMS_H) $(TIMEVAR_H) tree-pass.h \
> +   $(HASHTAB_H) $(COVERAGE_H) $(GGC_H) $(TREE_FLOW_H) $(RTL_H) $(IPA_PROP_H) \
> +   $(BASIC_BLOCK_H) $(TOPLEV_H) $(TREE_DUMP_H) ipa-inline.h
>  cprop.o : cprop.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
>    $(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h $(GGC_H) \
>    $(RECOG_H) $(EXPR_H) $(BASIC_BLOCK_H) $(FUNCTION_H) output.h toplev.h $(DIAGNOSTIC_CORE_H) \
> Index: passes.c
> ===================================================================
> --- passes.c    (revision 184971)
> +++ passes.c    (working copy)
> @@ -1190,6 +1190,7 @@ init_optimization_passes (void)
>   NEXT_PASS (pass_build_cfg);
>   NEXT_PASS (pass_warn_function_return);
>   NEXT_PASS (pass_build_cgraph_edges);
> +  NEXT_PASS (pass_dispatch_versions);
>   *p = NULL;
>
>   /* Interprocedural optimization passes.  */
> Index: config/i386/i386.c
> ===================================================================
> --- config/i386/i386.c  (revision 184971)
> +++ config/i386/i386.c  (working copy)
> @@ -27446,6 +27473,593 @@ ix86_init_mmx_sse_builtins (void)
>     }
>  }
>
> +/* This adds a condition to the basic_block NEW_BB in function FUNCTION_DECL
> +   to return a pointer to VERSION_DECL if the outcome of the function
> +   PREDICATE_DECL is true.  This function will be called during version
> +   dispatch to decide which function version to execute.  It returns the
> +   basic block at the end to which more conditions can be added.  */
> +
> +static basic_block
> +add_condition_to_bb (tree function_decl, tree version_decl,
> +                    basic_block new_bb, tree predicate_decl)
> +{
> +  gimple return_stmt;
> +  tree convert_expr, result_var;
> +  gimple convert_stmt;
> +  gimple call_cond_stmt;
> +  gimple if_else_stmt;
> +
> +  basic_block bb1, bb2, bb3;
> +  edge e12, e23;
> +
> +  tree cond_var;
> +  gimple_seq gseq;
> +
> +  tree old_current_function_decl;
> +
> +  old_current_function_decl = current_function_decl;
> +  push_cfun (DECL_STRUCT_FUNCTION (function_decl));
> +  current_function_decl = function_decl;
> +
> +  gcc_assert (new_bb != NULL);
> +  gseq = bb_seq (new_bb);
> +
> +
> +  convert_expr = build1 (CONVERT_EXPR, ptr_type_node,
> +                        build_fold_addr_expr (version_decl));
> +  result_var = create_tmp_var (ptr_type_node, NULL);
> +  convert_stmt = gimple_build_assign (result_var, convert_expr);
> +  return_stmt = gimple_build_return (result_var);
> +
> +  if (predicate_decl == NULL_TREE)
> +    {
> +      gimple_seq_add_stmt (&gseq, convert_stmt);
> +      gimple_seq_add_stmt (&gseq, return_stmt);
> +      set_bb_seq (new_bb, gseq);
> +      gimple_set_bb (convert_stmt, new_bb);
> +      gimple_set_bb (return_stmt, new_bb);
> +      pop_cfun ();
> +      current_function_decl = old_current_function_decl;
> +      return new_bb;
> +    }
> +
> +  cond_var = create_tmp_var (integer_type_node, NULL);
> +  call_cond_stmt = gimple_build_call (predicate_decl, 0);
> +  gimple_call_set_lhs (call_cond_stmt, cond_var);
> +
> +  gimple_set_block (call_cond_stmt, DECL_INITIAL (function_decl));
> +  gimple_set_bb (call_cond_stmt, new_bb);
> +  gimple_seq_add_stmt (&gseq, call_cond_stmt);
> +
> +  if_else_stmt = gimple_build_cond (GT_EXPR, cond_var,
> +                                   integer_zero_node,
> +                                   NULL_TREE, NULL_TREE);
> +  gimple_set_block (if_else_stmt, DECL_INITIAL (function_decl));
> +  gimple_set_bb (if_else_stmt, new_bb);
> +  gimple_seq_add_stmt (&gseq, if_else_stmt);
> +
> +  gimple_seq_add_stmt (&gseq, convert_stmt);
> +  gimple_seq_add_stmt (&gseq, return_stmt);
> +  set_bb_seq (new_bb, gseq);
> +
> +  bb1 = new_bb;
> +  e12 = split_block (bb1, if_else_stmt);
> +  bb2 = e12->dest;
> +  e12->flags &= ~EDGE_FALLTHRU;
> +  e12->flags |= EDGE_TRUE_VALUE;
> +
> +  e23 = split_block (bb2, return_stmt);
> +
> +  gimple_set_bb (convert_stmt, bb2);
> +  gimple_set_bb (return_stmt, bb2);
> +
> +  bb3 = e23->dest;
> +  make_edge (bb1, bb3, EDGE_FALSE_VALUE);
> +
> +  remove_edge (e23);
> +  make_edge (bb2, EXIT_BLOCK_PTR, 0);
> +
> +  rebuild_cgraph_edges ();
> +
> +  pop_cfun ();
> +  current_function_decl = old_current_function_decl;
> +
> +  return bb3;
> +}
> +
> +/* This parses the attribute arguments to targetv in DECL and determines
> +   the right builtin to use to match the platform specification.
> +   For now, only one target argument ("arch=") is allowed.  */
> +
> +static enum ix86_builtins
> +get_builtin_code_for_version (tree decl)
> +{
> +  tree attrs;
> +  struct cl_target_option cur_target;
> +  tree target_node;
> +  struct cl_target_option *new_target;
> +  enum ix86_builtins builtin_code = IX86_BUILTIN_MAX;
> +
> +  attrs = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl));
> +  gcc_assert (attrs != NULL);
> +
> +  cl_target_option_save (&cur_target, &global_options);
> +
> +  target_node = ix86_valid_target_attribute_tree
> +                 (TREE_VALUE (TREE_VALUE (attrs)));
> +
> +  gcc_assert (target_node);
> +  new_target = TREE_TARGET_OPTION (target_node);
> +  gcc_assert (new_target);
> +
> +  if (new_target->arch_specified && new_target->arch > 0)
> +    {
> +      switch (new_target->arch)
> +        {
> +       case 1:
> +       case 2:
> +       case 3:
> +       case 4:
> +       case 5:
> +       case 6:
> +       case 7:
> +       case 8:
> +       case 9:
> +       case 10:
> +       case 11:
> +         builtin_code = IX86_BUILTIN_CPU_IS_INTEL;
> +         break;
> +       case 12:
> +         builtin_code = IX86_BUILTIN_CPU_IS_INTEL_CORE2;
> +         break;
> +       case 13:
> +         builtin_code = IX86_BUILTIN_CPU_IS_INTEL_COREI7;
> +         break;
> +       case 14:
> +         builtin_code = IX86_BUILTIN_CPU_IS_INTEL_ATOM;
> +         break;
> +       case 15:
> +       case 16:
> +       case 17:
> +       case 18:
> +       case 19:
> +       case 20:
> +       case 21:
> +         builtin_code = IX86_BUILTIN_CPU_IS_AMD;
> +         break;
> +       case 22:
> +         builtin_code = IX86_BUILTIN_CPU_IS_AMDFAM10H;
> +         break;
> +       case 23:
> +         builtin_code = IX86_BUILTIN_CPU_IS_AMDFAM15H_BDVER1;
> +         break;
> +       case 24:
> +         builtin_code = IX86_BUILTIN_CPU_IS_AMDFAM15H_BDVER2;
> +         break;
> +       case 25: /* What is btver1 ? */
> +         builtin_code = IX86_BUILTIN_CPU_IS_AMD;
> +         break;
> +       }
> +    }
> +
> +  cl_target_option_restore (&global_options, &cur_target);
> +  if (builtin_code == IX86_BUILTIN_MAX)
> +      error_at (DECL_SOURCE_LOCATION (decl),
> +               "no dispatcher found for the versioning attributes");
> +
> +  return builtin_code;
> +}
> +
> +/* This is the target hook to generate the dispatch function for
> +   multi-versioned functions.  DISPATCH_DECL is the function which will
> +   contain the dispatch logic.  FNDECLS_P points to a vector of the function
> +   choices for dispatch, default version first.  EMPTY_BB is the basic block
> +   pointer in DISPATCH_DECL in which the dispatch code is generated.  */
> +
> +static int
> +ix86_dispatch_version (tree dispatch_decl,
> +                      void *fndecls_p,
> +                      basic_block *empty_bb)
> +{
> +  tree default_decl;
> +  gimple ifunc_cpu_init_stmt;
> +  gimple_seq gseq;
> +  tree old_current_function_decl;
> +  int ix;
> +  tree ele;
> +  VEC (tree, heap) *fndecls;
> +
> +  gcc_assert (dispatch_decl != NULL
> +             && fndecls_p != NULL
> +             && empty_bb != NULL);
> +
> +  /* fndecls_p is actually a vector.  */
> +  fndecls = (VEC (tree, heap) *)fndecls_p;
> +
> +  /* At least one more version other than the default.  */
> +  gcc_assert (VEC_length (tree, fndecls) >= 2);
> +
> +  /* The first version in the vector is the default decl.  */
> +  default_decl = VEC_index (tree, fndecls, 0);
> +
> +  old_current_function_decl = current_function_decl;
> +  push_cfun (DECL_STRUCT_FUNCTION (dispatch_decl));
> +  current_function_decl = dispatch_decl;
> +
> +  gseq = bb_seq (*empty_bb);
> +  ifunc_cpu_init_stmt = gimple_build_call_vec (
> +                     ix86_builtins [(int) IX86_BUILTIN_CPU_INIT], NULL);
> +  gimple_seq_add_stmt (&gseq, ifunc_cpu_init_stmt);
> +  gimple_set_bb (ifunc_cpu_init_stmt, *empty_bb);
> +  set_bb_seq (*empty_bb, gseq);
> +
> +  pop_cfun ();
> +  current_function_decl = old_current_function_decl;
> +
> +
> +  for (ix = 1; VEC_iterate (tree, fndecls, ix, ele); ++ix)
> +    {
> +      tree version_decl = ele;
> +      /* Get attribute string, parse it and find the right predicate decl.
> +         The predicate function could be a lengthy combination of many
> +        features, like arch-type and various isa-variants.  For now, only
> +        check the arch-type.  */
> +      tree predicate_decl = ix86_builtins [
> +                       get_builtin_code_for_version (version_decl)];
> +      *empty_bb = add_condition_to_bb (dispatch_decl, version_decl, *empty_bb,
> +                                      predicate_decl);
> +
> +    }
> +  /* Dispatch the default version at the end.  */
> +  *empty_bb = add_condition_to_bb (dispatch_decl, default_decl, *empty_bb,
> +                                  NULL);
> +  return 0;
> +}
>
> @@ -38610,6 +39269,12 @@ ix86_autovectorize_vector_sizes (void)
>  #undef TARGET_BUILD_BUILTIN_VA_LIST
>  #define TARGET_BUILD_BUILTIN_VA_LIST ix86_build_builtin_va_list
>
> +#undef TARGET_DISPATCH_VERSION
> +#define TARGET_DISPATCH_VERSION ix86_dispatch_version
> +
>  #undef TARGET_ENUM_VA_LIST_P
>  #define TARGET_ENUM_VA_LIST_P ix86_enum_va_list
>
> Index: testsuite/g++.dg/mv1.C
> ===================================================================
> --- testsuite/g++.dg/mv1.C      (revision 0)
> +++ testsuite/g++.dg/mv1.C      (revision 0)
> @@ -0,0 +1,23 @@
> +/* Simple test case to check if Multiversioning works.  */
> +/* { dg-do run } */
> +/* { dg-options "-O2" } */
> +
> +int foo ();
> +int foo () __attribute__ ((targetv("arch=corei7")));
> +
> +int main ()
> +{
> +  int (*p)() = &foo;
> +  return foo () + (*p)();
> +}
> +
> +int foo ()
> +{
> +  return 0;
> +}
> +
> +int __attribute__ ((targetv("arch=corei7")))
> +foo ()
> +{
> +  return 0;
> +}
>
>
> --
> This patch is available for review at http://codereview.appspot.com/5752064

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-03-07 14:05 ` Richard Guenther
@ 2012-03-07 19:08   ` Sriraman Tallam
  2012-03-08 21:37     ` Xinliang David Li
  2012-03-08 21:00   ` Xinliang David Li
  2012-03-09 20:04   ` Sriraman Tallam
  2 siblings, 1 reply; 93+ messages in thread
From: Sriraman Tallam @ 2012-03-07 19:08 UTC (permalink / raw)
  To: Richard Guenther; +Cc: reply, gcc-patches

On Wed, Mar 7, 2012 at 6:05 AM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Wed, Mar 7, 2012 at 1:46 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>> User directed Function Multiversioning (MV) via Function Overloading
>> ====================================================================
>>
>> This patch adds support for user directed function MV via function overloading.
>> For more detailed description:
>> http://gcc.gnu.org/ml/gcc/2012-03/msg00074.html
>>
>>
>> Here is an example program with function versions:
>>
>> int foo ();  /* Default version */
>> int foo () __attribute__ ((targetv("arch=corei7")));/*Specialized for corei7 */
>> int foo () __attribute__ ((targetv("arch=core2")));/*Specialized for core2 */
>>
>> int main ()
>> {
>>  int (*p)() = &foo;
>>  return foo () + (*p)();
>> }
>>
>> int foo ()
>> {
>>  return 0;
>> }
>>
>> int __attribute__ ((targetv("arch=corei7")))
>> foo ()
>> {
>>  return 0;
>> }
>>
>> int __attribute__ ((targetv("arch=core2")))
>> foo ()
>> {
>>  return 0;
>> }
>>
>> The above example has foo defined 3 times, but all 3 definitions of foo are
>> different versions of the same function. The call to foo in main, directly and
>> via a pointer, are calls to the multi-versioned function foo which is dispatched
>> to the right foo at run-time.
>>
>> Function versions must have the same signature but must differ in the specifier
>> string provided to a new attribute called "targetv", which is nothing but the
>> target attribute with an extra specification to indicate a version. Any number
>> of versions can be created using the targetv attribute but it is mandatory to
>> have one function without the attribute, which is treated as the default
>> version.
>>
>> The dispatching is done using the IFUNC mechanism to keep the dispatch overhead
>> low. The compiler creates a dispatcher function which checks the CPU type and
>> calls the right version of foo. The dispatching code checks for the platform
>> type and calls the first version that matches. The default function is called if
>> no specialized version is appropriate for execution.
>>
>> The pointer to foo is made to be the address of the dispatcher function, so that
>> it is unique and calls made via the pointer also work correctly. The assembler
>> names of the various versions of foo is made different, by tagging
>> the specifier strings, to keep them unique.  A specific version can be called
>> directly by creating an alias to its assembler name. For instance, to call the
>> corei7 version directly, make an alias :
>> int foo_corei7 () __attribute__((alias ("_Z3foov.arch_corei7")));
>> and then call foo_corei7.
>>
>> Note that using IFUNC  blocks inlining of versioned functions. I had implemented
>> an optimization earlier to do hot path cloning to allow versioned functions to
>> be inlined. Please see : http://gcc.gnu.org/ml/gcc-patches/2011-04/msg02285.html
>> In the next iteration, I plan to merge these two. With that, hot code paths with
>> versioned functions will be cloned so that versioned functions can be inlined.
>
> Note that inlining of functions with the target attribute is limited as well,
> but your issue is that of the indirect dispatch as ...
>
> You don't give an overview of the frontend implementation.  Thus I have
> extracted the following
>
>  - the FE does not really know about the "overloading", nor can it directly
>   resolve calls from a "sse" function to another "sse" function without going
>   through the 2nd IFUNC

This is a good point, but I can change the function joust, where the
overload candidate is selected, to return the decl of the versioned
function whose target attributes match those of the caller. That
will solve this problem. I have to treat the target attributes as an
additional criterion for a match in overload resolution. The front end
*does know* about the overloading; it is a question of doing the
overload resolution correctly, right?  This is easy when there is no
cloning involved.

When cloning of a version is required, it gets complicated since the
FE must clone and produce the bodies. Once all the bodies are
available, the overload resolution can do the right thing.
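
A minimal sketch of the matching rule described above, written as a
standalone C++ model rather than GCC front-end code. The names Version,
normalize_attr and select_version are hypothetical; the real change would
live in joust and operate on decls. The sketch assumes that attribute
strings are compared after sorting their comma-separated tokens, and that
a caller with no matching version falls back to the default version,
which still goes through the dispatcher.

```cpp
#include <algorithm>
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a function version.  An empty attribute
// string marks the default version.
struct Version {
  std::string name;
  std::string target_attr;  // e.g. "arch=corei7"; "" for the default
};

// Sort the comma-separated tokens so that "a,b" and "b,a" compare equal.
std::string normalize_attr(const std::string &attr) {
  std::vector<std::string> tokens;
  std::stringstream ss(attr);
  std::string tok;
  while (std::getline(ss, tok, ','))
    tokens.push_back(tok);
  std::sort(tokens.begin(), tokens.end());

  std::string out;
  for (const std::string &t : tokens) {
    if (!out.empty())
      out += ',';
    out += t;
  }
  return out;
}

// Return the version whose attributes match the caller's attributes.
// If none matches, return the default version.
const Version &select_version(const std::vector<Version> &candidates,
                              const std::string &caller_attr) {
  const Version *fallback = nullptr;
  const std::string want = normalize_attr(caller_attr);

  for (const Version &v : candidates) {
    if (v.target_attr.empty()) {
      fallback = &v;  // remember the default version
      continue;
    }
    if (!want.empty() && normalize_attr(v.target_attr) == want)
      return v;  // direct call, no dispatcher needed
  }

  assert(fallback != nullptr && "a default version is mandatory");
  return *fallback;
}
```

With candidates for the default, "arch=corei7" and "arch=core2", a caller
tagged "arch=corei7" resolves directly to the corei7 version, while an
untagged caller gets the default.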

>
>  - cgraph also does not know about the "overloading", so it cannot do such
>   "devirtualization" either
>
> you seem to have implemented something inbetween a pure frontend
> solution and a proper middle-end solution.

The only thing I delayed is the code generation of the dispatcher. I
thought it was better to have this come later, after the cfg and cgraph
are generated, so that multiple dispatching mechanisms could be
implemented.
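
To make the dispatcher's job concrete, here is a rough run-time model of
the logic it has to encode: test each version's predicate in order,
return the first match, and fall back to the default. This is only a
sketch under assumptions; Candidate, resolve and the foo_* functions are
hypothetical names, and the predicates stand in for the CPU checks that
the target hook would emit.

```cpp
#include <cassert>
#include <functional>
#include <vector>

using VersionFn = int (*)();

// Hypothetical pairing of a CPU check with the version it guards.
struct Candidate {
  std::function<bool()> predicate;  // stand-in for a CPU-type check
  VersionFn fn;
};

// The first version whose predicate holds wins; the default comes last.
VersionFn resolve(const std::vector<Candidate> &versions,
                  VersionFn default_fn) {
  assert(default_fn != nullptr);
  for (const Candidate &c : versions)
    if (c.predicate && c.predicate())
      return c.fn;
  return default_fn;
}

// Placeholder versions of foo, distinguishable by return value.
int foo_default() { return 0; }
int foo_corei7() { return 1; }
int foo_core2() { return 2; }
```

Because the order of the candidates decides which version wins when
several predicates hold, a different dispatching mechanism only needs to
replace resolve and can leave the versions themselves untouched.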

> For optimization and eventually
> automatically selecting functions for cloning (like, callees of a manual "sse"
> versioned function should be cloned?) it would be nice if the cgraph would
> know about the different versions and their relationships (and the dispatcher).
> Especially the cgraph code should know the functions are semantically
> equivalent (I suppose we should require that).  The IFUNC should be
> generated by cgraph / target code, similar to how we generate C++ thunks.
>
> Honza, any suggestions on how the FE side of such cgraph infrastructure
> should look like and how we should encode the target bits?
>
> Thanks,
> Richard.
>
>>        * doc/tm.texi.in: Add description for TARGET_DISPATCH_VERSION.
>>        * doc/tm.texi: Regenerate.
>>        * c-family/c-common.c (handle_targetv_attribute): New function.
>>        * target.def (dispatch_version): New target hook.
>>        * tree.h (DECL_FUNCTION_VERSIONED): New macro.
>>        (tree_function_decl): New bit-field versioned_function.
>>        * tree-pass.h (pass_dispatch_versions): New pass.
>>        * multiversion.c: New file.
>>        * multiversion.h: New file.
>>        * cgraphunit.c: Include multiversion.h
>>        (cgraph_finalize_function): Change assembler names of versioned
>>        functions.
>>        * cp/class.c: Include multiversion.h
>>        (add_method): aggregate function versions. Change assembler names of
>>        versioned functions.
>>        (resolve_address_of_overloaded_function): Match address of function
>>        version with default function.  Return address of ifunc dispatcher
>>        for address of versioned functions.
>>        * cp/decl.c (decls_match): Make decls unmatched for versioned
>>        functions.
>>        (duplicate_decls): Remove ambiguity for versioned functions. Notify
>>        of deleted function version decls.
>>        (start_decl): Change assembler name of versioned functions.
>>        (start_function): Change assembler name of versioned functions.
>>        (cxx_comdat_group): Make comdat group of versioned functions be the
>>        same.
>>        * cp/semantics.c (expand_or_defer_fn_1): Mark as needed versioned
>>        functions that are also marked inline.
>>        * cp/decl2.c: Include multiversion.h
>>        (check_classfn): Check attributes of versioned functions for match.
>>        * cp/call.c: Include multiversion.h
>>        (build_over_call): Make calls to multiversioned functions to call the
>>        dispatcher.
>>        (joust): For calls to multi-versioned functions, make the default
>>        function win.
>>        * timevar.def (TV_MULTIVERSION_DISPATCH): New time var.
>>        * varasm.c (finish_aliases_1): Check if the alias points to a function
>>        with a body before giving an error.
>>        * Makefile.in: Add multiversion.o
>>        * passes.c: Add pass_dispatch_versions to the pass list.
>>        * config/i386/i386.c (add_condition_to_bb): New function.
>>        (get_builtin_code_for_version): New function.
>>        (ix86_dispatch_version): New function.
>>        (TARGET_DISPATCH_VERSION): New macro.
>>        * testsuite/g++.dg/mv1.C: New test.
>>
>> Index: doc/tm.texi
>> ===================================================================
>> --- doc/tm.texi (revision 184971)
>> +++ doc/tm.texi (working copy)
>> @@ -10995,6 +10995,14 @@ The result is another tree containing a simplified
>>  call's result.  If @var{ignore} is true the value will be ignored.
>>  @end deftypefn
>>
>> +@deftypefn {Target Hook} int TARGET_DISPATCH_VERSION (tree @var{dispatch_decl}, void *@var{fndecls}, basic_block *@var{empty_bb})
>> +For multi-versioned functions, this hook sets up the dispatcher.
>> +@var{dispatch_decl} is the function that will be used to dispatch the
>> +version. @var{fndecls} are the function choices for dispatch.
>> +@var{empty_bb} is a basic block in @var{dispatch_decl} where the
>> +code to do the dispatch will be added.
>> +@end deftypefn
>> +
>>  @deftypefn {Target Hook} {const char *} TARGET_INVALID_WITHIN_DOLOOP (const_rtx @var{insn})
>>
>>  Take an instruction in @var{insn} and return NULL if it is valid within a
>> Index: doc/tm.texi.in
>> ===================================================================
>> --- doc/tm.texi.in      (revision 184971)
>> +++ doc/tm.texi.in      (working copy)
>> @@ -10873,6 +10873,14 @@ The result is another tree containing a simplified
>>  call's result.  If @var{ignore} is true the value will be ignored.
>>  @end deftypefn
>>
>> +@hook TARGET_DISPATCH_VERSION
>> +For multi-versioned functions, this hook sets up the dispatcher.
>> +@var{dispatch_decl} is the function that will be used to dispatch the
>> +version. @var{fndecls} are the function choices for dispatch.
>> +@var{empty_bb} is a basic block in @var{dispatch_decl} where the
>> +code to do the dispatch will be added.
>> +@end deftypefn
>> +
>>  @hook TARGET_INVALID_WITHIN_DOLOOP
>>
>>  Take an instruction in @var{insn} and return NULL if it is valid within a
>> Index: c-family/c-common.c
>> ===================================================================
>> --- c-family/c-common.c (revision 184971)
>> +++ c-family/c-common.c (working copy)
>> @@ -315,6 +315,7 @@ static tree check_case_value (tree);
>>  static bool check_case_bounds (tree, tree, tree *, tree *);
>>
>>  static tree handle_packed_attribute (tree *, tree, tree, int, bool *);
>> +static tree handle_targetv_attribute (tree *, tree, tree, int, bool *);
>>  static tree handle_nocommon_attribute (tree *, tree, tree, int, bool *);
>>  static tree handle_common_attribute (tree *, tree, tree, int, bool *);
>>  static tree handle_noreturn_attribute (tree *, tree, tree, int, bool *);
>> @@ -604,6 +605,8 @@ const struct attribute_spec c_common_attribute_tab
>>  {
>>   /* { name, min_len, max_len, decl_req, type_req, fn_type_req, handler,
>>        affects_type_identity } */
>> +  { "targetv",               1, -1, true, false, false,
>> +                             handle_targetv_attribute, false },
>>   { "packed",                 0, 0, false, false, false,
>>                              handle_packed_attribute , false},
>>   { "nocommon",               0, 0, true,  false, false,
>> @@ -5869,6 +5872,54 @@ handle_packed_attribute (tree *node, tree name, tr
>>   return NULL_TREE;
>>  }
>>
>> +/* The targetv attribute is used to specify a function version
>> +   targeted to specific platform types.  The "targetv" attributes
>> +   have to be valid "target" attributes.  NODE should always point
>> +   to a FUNCTION_DECL.  ARGS contain the arguments to "targetv"
>> +   which should be valid arguments to attribute "target" too.
>> +   Check handle_target_attribute for FLAGS and NO_ADD_ATTRS.  */
>> +
>> +static tree
>> +handle_targetv_attribute (tree *node, tree name,
>> +                         tree args,
>> +                         int flags,
>> +                         bool *no_add_attrs)
>> +{
>> +  const char *attr_str = NULL;
>> +  gcc_assert (TREE_CODE (*node) == FUNCTION_DECL);
>> +  gcc_assert (args != NULL);
>> +
>> +  /* This is a function version.  */
>> +  DECL_FUNCTION_VERSIONED (*node) = 1;
>> +
>> +  attr_str = TREE_STRING_POINTER (TREE_VALUE (args));
>> +
>> +  /* Check if multiple sets of target attributes are there.  This
>> +     is not supported now.   In future, this will be supported by
>> +     cloning this function for each set.  */
>> +  if (TREE_CHAIN (args) != NULL)
>> +    warning (OPT_Wattributes, "%qE attribute has multiple sets which "
>> +            "is not supported", name);
>> +
>> +  if (attr_str == NULL
>> +      || strstr (attr_str, "arch=") == NULL)
>> +    error_at (DECL_SOURCE_LOCATION (*node),
>> +             "Versioning supported only on \"arch=\" for now");
>> +
>> +  /* targetv attributes must translate into target attributes.  */
>> +  handle_target_attribute (node, get_identifier ("target"), args, flags,
>> +                          no_add_attrs);
>> +
>> +  if (*no_add_attrs)
>> +    warning (OPT_Wattributes, "%qE attribute has no effect", name);
>> +
>> +  /* This is necessary to keep the attribute tagged to the decl
>> +     all the time.  */
>> +  *no_add_attrs = false;
>> +
>> +  return NULL_TREE;
>> +}
>> +
>>  /* Handle a "nocommon" attribute; arguments as in
>>    struct attribute_spec.handler.  */
>>
>> Index: target.def
>> ===================================================================
>> --- target.def  (revision 184971)
>> +++ target.def  (working copy)
>> @@ -1249,6 +1249,15 @@ DEFHOOK
>>  tree, (tree fndecl, int n_args, tree *argp, bool ignore),
>>  hook_tree_tree_int_treep_bool_null)
>>
>> +/* Target hook to generate the dispatching code for calls to multi-versioned
>> +   functions.  DISPATCH_DECL is the function that will have the dispatching
>> +   logic.  FNDECLS are the list of choices for dispatch and EMPTY_BB is the
>> +   basic block in DISPATCH_DECL which will contain the code.  */
>> +DEFHOOK
>> +(dispatch_version,
>> + "",
>> + int, (tree dispatch_decl, void *fndecls, basic_block *empty_bb), NULL)
>> +
>>  /* Returns a code for a target-specific builtin that implements
>>    reciprocal of the function, or NULL_TREE if not available.  */
>>  DEFHOOK
>> Index: tree.h
>> ===================================================================
>> --- tree.h      (revision 184971)
>> +++ tree.h      (working copy)
>> @@ -3532,6 +3532,12 @@ extern VEC(tree, gc) **decl_debug_args_insert (tre
>>  #define DECL_FUNCTION_SPECIFIC_OPTIMIZATION(NODE) \
>>    (FUNCTION_DECL_CHECK (NODE)->function_decl.function_specific_optimization)
>>
>> +/* In FUNCTION_DECL, this is set if this function has other versions generated
>> +   using "targetv" attributes.  The default version is the one which does not
>> +   have any "targetv" attribute set. */
>> +#define DECL_FUNCTION_VERSIONED(NODE)\
>> +   (FUNCTION_DECL_CHECK (NODE)->function_decl.versioned_function)
>> +
>>  /* FUNCTION_DECL inherits from DECL_NON_COMMON because of the use of the
>>    arguments/result/saved_tree fields by front ends.   It was either inherit
>>    FUNCTION_DECL from non_common, or inherit non_common from FUNCTION_DECL,
>> @@ -3576,8 +3582,8 @@ struct GTY(()) tree_function_decl {
>>   unsigned looping_const_or_pure_flag : 1;
>>   unsigned has_debug_args_flag : 1;
>>   unsigned tm_clone_flag : 1;
>> -
>> -  /* 1 bit left */
>> +  unsigned versioned_function : 1;
>> +  /* No bits left.  */
>>  };
>>
>>  /* The source language of the translation-unit.  */
>> Index: tree-pass.h
>> ===================================================================
>> --- tree-pass.h (revision 184971)
>> +++ tree-pass.h (working copy)
>> @@ -455,6 +455,7 @@ extern struct gimple_opt_pass pass_tm_memopt;
>>  extern struct gimple_opt_pass pass_tm_edges;
>>  extern struct gimple_opt_pass pass_split_functions;
>>  extern struct gimple_opt_pass pass_feedback_split_functions;
>> +extern struct gimple_opt_pass pass_dispatch_versions;
>>
>>  /* IPA Passes */
>>  extern struct simple_ipa_opt_pass pass_ipa_lower_emutls;
>> Index: multiversion.c
>> ===================================================================
>> --- multiversion.c      (revision 0)
>> +++ multiversion.c      (revision 0)
>> @@ -0,0 +1,798 @@
>> +/* Function Multiversioning.
>> +   Copyright (C) 2012 Free Software Foundation, Inc.
>> +   Contributed by Sriraman Tallam (tmsriram@google.com)
>> +
>> +This file is part of GCC.
>> +
>> +GCC is free software; you can redistribute it and/or modify it under
>> +the terms of the GNU General Public License as published by the Free
>> +Software Foundation; either version 3, or (at your option) any later
>> +version.
>> +
>> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
>> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
>> +for more details.
>> +
>> +You should have received a copy of the GNU General Public License
>> +along with GCC; see the file COPYING3.  If not see
>> +<http://www.gnu.org/licenses/>. */
>> +
>> +/* Holds the state for multi-versioned functions here. The front-end
>> +   updates the state as and when function versions are encountered.
>> +   This is then used to generate the dispatch code.  Also, the
>> +   optimization passes to clone hot paths involving versioned functions
>> +   will be done here.
>> +
>> +   Function versions are created by using the same function signature but
>> +   also tagging attribute "targetv" to specify the platform type for which
>> +   the version must be executed.  Here is an example:
>> +
>> +   int foo ()
>> +   {
>> +     printf ("Execute as default");
>> +     return 0;
>> +   }
>> +
>> +   int  __attribute__ ((targetv ("arch=corei7")))
>> +   foo ()
>> +   {
>> +     printf ("Execute for corei7");
>> +     return 0;
>> +   }
>> +
>> +   int main ()
>> +   {
>> +     return foo ();
>> +   }
>> +
>> +   The call to foo in main is replaced with a call to an IFUNC function that
>> +   contains the dispatch code to call the correct function version at
>> +   run-time.  */
>> +
>> +
>> +#include "config.h"
>> +#include "system.h"
>> +#include "coretypes.h"
>> +#include "tm.h"
>> +#include "tree.h"
>> +#include "tree-inline.h"
>> +#include "langhooks.h"
>> +#include "flags.h"
>> +#include "cgraph.h"
>> +#include "diagnostic.h"
>> +#include "toplev.h"
>> +#include "timevar.h"
>> +#include "params.h"
>> +#include "fibheap.h"
>> +#include "intl.h"
>> +#include "tree-pass.h"
>> +#include "hashtab.h"
>> +#include "coverage.h"
>> +#include "ggc.h"
>> +#include "tree-flow.h"
>> +#include "rtl.h"
>> +#include "ipa-prop.h"
>> +#include "basic-block.h"
>> +#include "toplev.h"
>> +#include "dbgcnt.h"
>> +#include "tree-dump.h"
>> +#include "output.h"
>> +#include "vecprim.h"
>> +#include "gimple-pretty-print.h"
>> +#include "ipa-inline.h"
>> +#include "target.h"
>> +#include "multiversion.h"
>> +
>> +typedef void * void_p;
>> +
>> +DEF_VEC_P (void_p);
>> +DEF_VEC_ALLOC_P (void_p, heap);
>> +
>> +/* Each function decl that is a function version gets an instance of this
>> +   structure.   Since this is called by the front-end, decl merging can
>> +   happen, where a decl created for a new declaration is merged with
>> +   the old. In this case, the new decl is deleted and the IS_DELETED
>> +   field is set for the struct instance corresponding to the new decl.
>> +   IFUNC_DECL is the decl of the ifunc function for default decls.
>> +   IFUNC_RESOLVER_DECL is the decl of the dispatch function.  VERSIONS
>> +   is a vector containing the list of function versions  that are
>> +   the candidates for dispatch.  */
>> +
>> +typedef struct version_function_d {
>> +  tree decl;
>> +  tree ifunc_decl;
>> +  tree ifunc_resolver_decl;
>> +  VEC (void_p, heap) *versions;
>> +  bool is_deleted;
>> +} version_function;
>> +
>> +/* Hashmap has an entry for every function decl that has other function
>> +   versions.  For function decls that are the default, it also stores the
>> +   list of all the other function versions.  Each entry is a structure
>> +   of type version_function_d.  */
>> +static htab_t decl_version_htab = NULL;
>> +
>> +/* Hashtable helpers for decl_version_htab. */
>> +
>> +static hashval_t
>> +decl_version_htab_hash_descriptor (const void *p)
>> +{
>> +  const version_function *t = (const version_function *) p;
>> +  return htab_hash_pointer (t->decl);
>> +}
>> +
>> +/* Hashtable helper for decl_version_htab. */
>> +
>> +static int
>> +decl_version_htab_eq_descriptor (const void *p1, const void *p2)
>> +{
>> +  const version_function *t1 = (const version_function *) p1;
>> +  return htab_eq_pointer ((const void_p) t1->decl, p2);
>> +}
>> +
>> +/* Create the decl_version_htab.  */
>> +static void
>> +create_decl_version_htab (void)
>> +{
>> +  if (decl_version_htab == NULL)
>> +    decl_version_htab = htab_create (10, decl_version_htab_hash_descriptor,
>> +                                    decl_version_htab_eq_descriptor, NULL);
>> +}
>> +
>> +/* Creates an instance of version_function for decl DECL.  */
>> +
>> +static version_function*
>> +new_version_function (const tree decl)
>> +{
>> +  version_function *v;
>> +  v = (version_function *)xmalloc(sizeof (version_function));
>> +  v->decl = decl;
>> +  v->ifunc_decl = NULL;
>> +  v->ifunc_resolver_decl = NULL;
>> +  v->versions = NULL;
>> +  v->is_deleted = false;
>> +  return v;
>> +}
>> +
>> +/* Comparator function to be used in qsort routine to sort attribute
>> +   specification strings to "targetv".  */
>> +
>> +static int
>> +attr_strcmp (const void *v1, const void *v2)
>> +{
>> +  const char *c1 = *(char *const*)v1;
>> +  const char *c2 = *(char *const*)v2;
>> +  return strcmp (c1, c2);
>> +}
>> +
>> +/* STR is the argument to targetv attribute.  This function tokenizes
>> +   the comma separated arguments, sorts them and returns a string which
>> +   is a unique identifier for the comma separated arguments.  */
>> +
>> +static char *
>> +sorted_attr_string (const char *str)
>> +{
>> +  char **args = NULL;
>> +  char *attr_str, *ret_str;
>> +  char *attr = NULL;
>> +  unsigned int argnum = 1;
>> +  unsigned int i;
>> +
>> +  for (i = 0; i < strlen (str); i++)
>> +    if (str[i] == ',')
>> +      argnum++;
>> +
>> +  attr_str = (char *)xmalloc (strlen (str) + 1);
>> +  strcpy (attr_str, str);
>> +
>> +  for (i = 0; i < strlen (attr_str); i++)
>> +    if (attr_str[i] == '=')
>> +      attr_str[i] = '_';
>> +
>> +  if (argnum == 1)
>> +    return attr_str;
>> +
>> +  args = (char **)xmalloc (argnum * sizeof (char *));
>> +
>> +  i = 0;
>> +  attr = strtok (attr_str, ",");
>> +  while (attr != NULL)
>> +    {
>> +      args[i] = attr;
>> +      i++;
>> +      attr = strtok (NULL, ",");
>> +    }
>> +
>> +  qsort (args, argnum, sizeof (char*), attr_strcmp);
>> +
>> +  ret_str = (char *)xmalloc (strlen (str) + 1);
>> +  strcpy (ret_str, args[0]);
>> +  for (i = 1; i < argnum; i++)
>> +    {
>> +      strcat (ret_str, "_");
>> +      strcat (ret_str, args[i]);
>> +    }
>> +
>> +  free (args);
>> +  free (attr_str);
>> +  return ret_str;
>> +}
>> +
>> +/* Returns true when only one of DECL1 and DECL2 is marked with "targetv"
>> +   or if the "targetv" attribute strings of DECL1 and DECL2 do not match.  */
>> +
>> +bool
>> +has_different_version_attributes (const tree decl1, const tree decl2)
>> +{
>> +  tree attr1, attr2;
>> +  char *c1, *c2;
>> +  bool ret = false;
>> +
>> +  if (TREE_CODE (decl1) != FUNCTION_DECL
>> +      || TREE_CODE (decl2) != FUNCTION_DECL)
>> +    return false;
>> +
>> +  attr1 = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl1));
>> +  attr2 = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl2));
>> +
>> +  if (attr1 == NULL_TREE && attr2 == NULL_TREE)
>> +    return false;
>> +
>> +  if ((attr1 == NULL_TREE && attr2 != NULL_TREE)
>> +      || (attr1 != NULL_TREE && attr2 == NULL_TREE))
>> +    return true;
>> +
>> +  c1 = sorted_attr_string (
>> +       TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr1))));
>> +  c2 = sorted_attr_string (
>> +       TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr2))));
>> +
>> +  if (strcmp (c1, c2) != 0)
>> +     ret = true;
>> +
>> +  free (c1);
>> +  free (c2);
>> +
>> +  return ret;
>> +}
>> +
>> +/* If this decl corresponds to a function and has "targetv" attribute,
>> +   append the attribute string to its assembler name.  */
>> +
>> +void
>> +version_assembler_name (const tree decl)
>> +{
>> +  tree version_attr;
>> +  const char *orig_name, *version_string, *attr_str;
>> +  char *assembler_name;
>> +  tree assembler_name_tree;
>> +
>> +  if (TREE_CODE (decl) != FUNCTION_DECL
>> +      || DECL_ASSEMBLER_NAME_SET_P (decl)
>> +      || !DECL_FUNCTION_VERSIONED (decl))
>> +    return;
>> +
>> +  if (DECL_DECLARED_INLINE_P (decl)
>> +      &&lookup_attribute ("gnu_inline",
>> +                         DECL_ATTRIBUTES (decl)))
>> +    error_at (DECL_SOURCE_LOCATION (decl),
>> +             "Function versions cannot be marked as gnu_inline,"
>> +             " bodies have to be generated\n");
>> +
>> +  if (DECL_VIRTUAL_P (decl)
>> +      || DECL_VINDEX (decl))
>> +    error_at (DECL_SOURCE_LOCATION (decl),
>> +             "Virtual function versioning not supported\n");
>> +
>> +  version_attr = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl));
>> +  /* targetv attribute string is NULL for default functions.  */
>> +  if (version_attr == NULL_TREE)
>> +    return;
>> +
>> +  orig_name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
>> +  version_string
>> +    = TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (version_attr)));
>> +
>> +  attr_str = sorted_attr_string (version_string);
>> +  assembler_name = (char *) xmalloc (strlen (orig_name)
>> +                                    + strlen (attr_str) + 2);
>> +
>> +  sprintf (assembler_name, "%s.%s", orig_name, attr_str);
>> +  if (dump_file)
>> +    fprintf (dump_file, "Assembler name set to %s for function version %s\n",
>> +            assembler_name, IDENTIFIER_POINTER (DECL_NAME (decl)));
>> +  assembler_name_tree = get_identifier (assembler_name);
>> +  SET_DECL_ASSEMBLER_NAME (decl, assembler_name_tree);
>> +}
>> +
>> +/* Returns true if decl is multi-versioned and DECL is the default function,
>> +   that is it is not tagged with "targetv" attribute.  */
>> +
>> +bool
>> +is_default_function (const tree decl)
>> +{
>> +  return (TREE_CODE (decl) == FUNCTION_DECL
>> +         && DECL_FUNCTION_VERSIONED (decl)
>> +         && (lookup_attribute ("targetv", DECL_ATTRIBUTES (decl))
>> +             == NULL_TREE));
>> +}
>> +
>> +/* For function decl DECL, find the version_function struct in the
>> +   decl_version_htab.  */
>> +
>> +static version_function *
>> +find_function_version (const tree decl)
>> +{
>> +  void *slot;
>> +
>> +  if (!DECL_FUNCTION_VERSIONED (decl))
>> +    return NULL;
>> +
>> +  if (!decl_version_htab)
>> +    return NULL;
>> +
>> +  slot = htab_find_with_hash (decl_version_htab, decl,
>> +                              htab_hash_pointer (decl));
>> +
>> +  if (slot != NULL)
>> +    return (version_function *)slot;
>> +
>> +  return NULL;
>> +}
>> +
>> +/* Record DECL as a function version by creating a version_function struct
>> +   for it and storing it in the hashtable.  */
>> +
>> +static version_function *
>> +add_function_version (const tree decl)
>> +{
>> +  void **slot;
>> +  version_function *v;
>> +
>> +  if (!DECL_FUNCTION_VERSIONED (decl))
>> +    return NULL;
>> +
>> +  create_decl_version_htab ();
>> +
>> +  slot = htab_find_slot_with_hash (decl_version_htab, (const void_p)decl,
>> +                                   htab_hash_pointer ((const void_p)decl),
>> +                                  INSERT);
>> +
>> +  if (*slot != NULL)
>> +    return (version_function *)*slot;
>> +
>> +  v = new_version_function (decl);
>> +  *slot = v;
>> +
>> +  return v;
>> +}
>> +
>> +/* Push V into VEC only if it is not already present.  */
>> +
>> +static void
>> +push_function_version (version_function *v, VEC (void_p, heap) *vec)
>> +{
>> +  int ix;
>> +  void_p ele;
>> +  for (ix = 0; VEC_iterate (void_p, vec, ix, ele); ++ix)
>> +    {
>> +      if (ele == (void_p)v)
>> +        return;
>> +    }
>> +
>> +  VEC_safe_push (void_p, heap, vec, (void*)v);
>> +}
>> +
>> +/* Mark DECL as deleted.  This is called by the front-end when a duplicate
>> +   decl is merged with the original decl and the duplicate decl is deleted.
>> +   This function marks the duplicate_decl as invalid.  Called by
>> +   duplicate_decls in cp/decl.c.  */
>> +
>> +void
>> +mark_delete_decl_version (const tree decl)
>> +{
>> +  version_function *decl_v;
>> +
>> +  decl_v = find_function_version (decl);
>> +
>> +  if (decl_v == NULL)
>> +    return;
>> +
>> +  decl_v->is_deleted = true;
>> +
>> +  if (is_default_function (decl)
>> +      && decl_v->versions != NULL)
>> +    {
>> +      VEC_truncate (void_p, decl_v->versions, 0);
>> +      VEC_free (void_p, heap, decl_v->versions);
>> +    }
>> +}
>> +
>> +/* Mark DECL1 and DECL2 to be function versions in the same group.  One
>> +   of DECL1 and DECL2 must be the default, otherwise this function does
>> +   nothing.  This function aggregates the versions.  */
>> +
>> +int
>> +group_function_versions (const tree decl1, const tree decl2)
>> +{
>> +  tree default_decl, version_decl;
>> +  version_function *default_v, *version_v;
>> +
>> +  gcc_assert (DECL_FUNCTION_VERSIONED (decl1)
>> +             && DECL_FUNCTION_VERSIONED (decl2));
>> +
>> +  /* The version decls are added only to the default decl.  */
>> +  if (!is_default_function (decl1)
>> +      && !is_default_function (decl2))
>> +    return 0;
>> +
>> +  /* This can happen with duplicate declarations.  Just ignore.  */
>> +  if (is_default_function (decl1)
>> +      && is_default_function (decl2))
>> +    return 0;
>> +
>> +  default_decl = (is_default_function (decl1)) ? decl1 : decl2;
>> +  version_decl = (default_decl == decl1) ? decl2 : decl1;
>> +
>> +  gcc_assert (default_decl != version_decl);
>> +  create_decl_version_htab ();
>> +
>> +  /* If the version function is found, it has been added.  */
>> +  if (find_function_version (version_decl))
>> +    return 0;
>> +
>> +  default_v = add_function_version (default_decl);
>> +  version_v = add_function_version (version_decl);
>> +
>> +  if (default_v->versions == NULL)
>> +    default_v->versions = VEC_alloc (void_p, heap, 1);
>> +
>> +  push_function_version (version_v, default_v->versions);
>> +  return 0;
>> +}
>> +
>> +/* Makes a function attribute of the form NAME(ARG_NAME) and chains
>> +   it to CHAIN.  */
>> +
>> +static tree
>> +make_attribute (const char *name, const char *arg_name, tree chain)
>> +{
>> +  tree attr_name;
>> +  tree attr_arg_name;
>> +  tree attr_args;
>> +  tree attr;
>> +
>> +  attr_name = get_identifier (name);
>> +  attr_arg_name = build_string (strlen (arg_name), arg_name);
>> +  attr_args = tree_cons (NULL_TREE, attr_arg_name, NULL_TREE);
>> +  attr = tree_cons (attr_name, attr_args, chain);
>> +  return attr;
>> +}
>> +
>> +/* Return a new name by appending SUFFIX to the DECL name.  If
>> +   make_unique is true, append the full path name.  */
>> +
>> +static char *
>> +make_name (tree decl, const char *suffix, bool make_unique)
>> +{
>> +  char *global_var_name;
>> +  int name_len;
>> +  const char *name;
>> +  const char *unique_name = NULL;
>> +
>> +  name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
>> +
>> +  /* Get a unique name that can be used globally without any chances
>> +     of collision at link time.  */
>> +  if (make_unique)
>> +    unique_name = IDENTIFIER_POINTER (get_file_function_name ("\0"));
>> +
>> +  name_len = strlen (name) + strlen (suffix) + 2;
>> +
>> +  if (make_unique)
>> +    name_len += strlen (unique_name) + 1;
>> +  global_var_name = (char *) xmalloc (name_len);
>> +
>> +  /* Use '.' to concatenate names as it is demangler friendly.  */
>> +  if (make_unique)
>> +    snprintf (global_var_name, name_len, "%s.%s.%s", name,
>> +              unique_name, suffix);
>> +  else
>> +    snprintf (global_var_name, name_len, "%s.%s", name, suffix);
>> +
>> +  return global_var_name;
>> +}
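
For readers following the naming scheme in make_name, here is a standalone
sketch of the same string construction outside of GCC.  The function name
join_version_name and the sample strings are hypothetical; only the
"name.unique.suffix" / "name.suffix" layout and the length computation are
taken from the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for make_name: returns a malloc'd string
   "NAME.UNIQUE.SUFFIX" when UNIQUE is non-NULL, else "NAME.SUFFIX".
   The caller frees the result.  Returns NULL on allocation failure.  */
static char *
join_version_name (const char *name, const char *unique, const char *suffix)
{
  size_t len;
  char *out;

  assert (name != NULL && suffix != NULL);

  /* One byte for the '.' separator and one for the terminating NUL.  */
  len = strlen (name) + strlen (suffix) + 2;

  if (unique != NULL)
    len += strlen (unique) + 1;  /* Extra '.' before UNIQUE.  */

  out = (char *) malloc (len);
  if (out == NULL)
    return NULL;

  if (unique != NULL)
    snprintf (out, len, "%s.%s.%s", name, unique, suffix);
  else
    snprintf (out, len, "%s.%s", name, suffix);

  return out;
}
```

With these inputs, join_version_name ("_Z3foov", NULL, "resolver") produces
"_Z3foov.resolver".  Unlike this sketch, the patch's callers never free the
returned buffer.
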
>> +
>> +/* Make the resolver function decl for ifunc (IFUNC_DECL) to dispatch
>> +   the versions of multi-versioned function DEFAULT_DECL.  Create an
>> +   empty basic block in the resolver and store the pointer in
>> +   EMPTY_BB.  Return the decl of the resolver function.  */
>> +
>> +static tree
>> +make_ifunc_resolver_func (const tree default_decl,
>> +                         const tree ifunc_decl,
>> +                         basic_block *empty_bb)
>> +{
>> +  char *resolver_name;
>> +  tree decl, type, decl_name, t;
>> +  basic_block new_bb;
>> +  tree old_current_function_decl;
>> +  bool make_unique = false;
>> +
>> +  /* IFUNCs have to be globally visible.  So, if the default_decl is
>> +     not, then the name of the IFUNC should be made unique.  */
>> +  if (TREE_PUBLIC (default_decl) == 0)
>> +    make_unique = true;
>> +
>> +  /* Append the filename to the resolver function if the versions are
>> +     not externally visible.  This is because the resolver function has
>> +     to be externally visible for the loader to find it.  So, appending
>> +     the filename will prevent conflicts with a resolver function from
>> +     another module which is based on the same version name.  */
>> +  resolver_name = make_name (default_decl, "resolver", make_unique);
>> +
>> +  /* The resolver function should return a (void *). */
>> +  type = build_function_type_list (ptr_type_node, NULL_TREE);
>> +
>> +  decl = build_fn_decl (resolver_name, type);
>> +  decl_name = get_identifier (resolver_name);
>> +  SET_DECL_ASSEMBLER_NAME (decl, decl_name);
>> +
>> +  DECL_NAME (decl) = decl_name;
>> +  TREE_USED (decl) = TREE_USED (default_decl);
>> +  DECL_ARTIFICIAL (decl) = 1;
>> +  DECL_IGNORED_P (decl) = 0;
>> +  /* IFUNC resolvers have to be externally visible.  */
>> +  TREE_PUBLIC (decl) = 1;
>> +  DECL_UNINLINABLE (decl) = 1;
>> +
>> +  DECL_EXTERNAL (decl) = DECL_EXTERNAL (default_decl);
>> +  DECL_EXTERNAL (ifunc_decl) = 0;
>> +
>> +  DECL_CONTEXT (decl) = NULL_TREE;
>> +  DECL_INITIAL (decl) = make_node (BLOCK);
>> +  DECL_STATIC_CONSTRUCTOR (decl) = 0;
>> +  TREE_READONLY (decl) = 0;
>> +  DECL_PURE_P (decl) = 0;
>> +  DECL_COMDAT (decl) = DECL_COMDAT (default_decl);
>> +  if (DECL_COMDAT_GROUP (default_decl))
>> +    {
>> +      make_decl_one_only (decl, DECL_COMDAT_GROUP (default_decl));
>> +    }
>> +  /* Build result decl and add to function_decl. */
>> +  t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, ptr_type_node);
>> +  DECL_ARTIFICIAL (t) = 1;
>> +  DECL_IGNORED_P (t) = 1;
>> +  DECL_RESULT (decl) = t;
>> +
>> +  gimplify_function_tree (decl);
>> +  old_current_function_decl = current_function_decl;
>> +  push_cfun (DECL_STRUCT_FUNCTION (decl));
>> +  current_function_decl = decl;
>> +  init_empty_tree_cfg_for_function (DECL_STRUCT_FUNCTION (decl));
>> +  cfun->curr_properties |=
>> +    (PROP_gimple_lcf | PROP_gimple_leh | PROP_cfg | PROP_referenced_vars |
>> +     PROP_ssa);
>> +  new_bb = create_empty_bb (ENTRY_BLOCK_PTR);
>> +  make_edge (ENTRY_BLOCK_PTR, new_bb, EDGE_FALLTHRU);
>> +  make_edge (new_bb, EXIT_BLOCK_PTR, 0);
>> +  *empty_bb = new_bb;
>> +
>> +  cgraph_add_new_function (decl, true);
>> +  cgraph_call_function_insertion_hooks (cgraph_get_create_node (decl));
>> +  cgraph_analyze_function (cgraph_get_create_node (decl));
>> +  cgraph_mark_needed_node (cgraph_get_create_node (decl));
>> +
>> +  if (DECL_COMDAT_GROUP (default_decl))
>> +    {
>> +      gcc_assert (cgraph_get_node (default_decl));
>> +      cgraph_add_to_same_comdat_group (cgraph_get_node (decl),
>> +                                      cgraph_get_node (default_decl));
>> +    }
>> +
>> +  pop_cfun ();
>> +  current_function_decl = old_current_function_decl;
>> +
>> +  gcc_assert (ifunc_decl != NULL);
>> +  DECL_ATTRIBUTES (ifunc_decl)
>> +    = make_attribute ("ifunc", resolver_name, DECL_ATTRIBUTES (ifunc_decl));
>> +  assemble_alias (ifunc_decl, get_identifier (resolver_name));
>> +  return decl;
>> +}
>> +
>> +/* Make an ifunc declaration for the multi-versioned function DECL.  Calls to
>> +   DECL function will be replaced with calls to the ifunc.   Return the decl
>> +   of the ifunc created.  */
>> +
>> +static tree
>> +make_ifunc_func (const tree decl)
>> +{
>> +  tree ifunc_decl;
>> +  char *ifunc_name, *resolver_name;
>> +  tree fn_type, ifunc_type;
>> +  bool make_unique = false;
>> +
>> +  if (TREE_PUBLIC (decl) == 0)
>> +    make_unique = true;
>> +
>> +  ifunc_name = make_name (decl, "ifunc", make_unique);
>> +  resolver_name = make_name (decl, "resolver", make_unique);
>> +  gcc_assert (resolver_name);
>> +
>> +  fn_type = TREE_TYPE (decl);
>> +  ifunc_type = build_function_type (TREE_TYPE (fn_type),
>> +                                   TYPE_ARG_TYPES (fn_type));
>> +
>> +  ifunc_decl = build_fn_decl (ifunc_name, ifunc_type);
>> +  TREE_USED (ifunc_decl) = 1;
>> +  DECL_CONTEXT (ifunc_decl) = NULL_TREE;
>> +  DECL_INITIAL (ifunc_decl) = error_mark_node;
>> +  DECL_ARTIFICIAL (ifunc_decl) = 1;
>> +  /* Mark this ifunc as external, the resolver will flip it again if
>> +     it gets generated.  */
>> +  DECL_EXTERNAL (ifunc_decl) = 1;
>> +  /* IFUNCs have to be externally visible.  */
>> +  TREE_PUBLIC (ifunc_decl) = 1;
>> +
>> +  return ifunc_decl;
>> +}
>> +
>> +/* For multi-versioned function DECL, which should also be the default,
>> +   return the decl of the ifunc, creating it if it does not
>> +   exist.  */
>> +
>> +tree
>> +get_ifunc_for_version (const tree decl)
>> +{
>> +  version_function *decl_v;
>> +  int ix;
>> +  void_p ele;
>> +
>> +  /* DECL has to be the default version, otherwise it is missing and
>> +     that is not allowed.  */
>> +  if (!is_default_function (decl))
>> +    {
>> +      error_at (DECL_SOURCE_LOCATION (decl), "default version not found");
>> +      return decl;
>> +    }
>> +
>> +  decl_v = find_function_version (decl);
>> +  gcc_assert (decl_v != NULL);
>> +  if (decl_v->ifunc_decl == NULL)
>> +    {
>> +      tree ifunc_decl;
>> +      ifunc_decl = make_ifunc_func (decl);
>> +      decl_v->ifunc_decl = ifunc_decl;
>> +    }
>> +
>> +  if (cgraph_get_node (decl))
>> +    cgraph_mark_needed_node (cgraph_get_node (decl));
>> +
>> +  for (ix = 0; VEC_iterate (void_p, decl_v->versions, ix, ele); ++ix)
>> +    {
>> +      version_function *v = (version_function *) ele;
>> +      gcc_assert (v->decl != NULL);
>> +      if (cgraph_get_node (v->decl))
>> +       cgraph_mark_needed_node (cgraph_get_node (v->decl));
>> +    }
>> +
>> +  return decl_v->ifunc_decl;
>> +}
>> +
>> +/* Generate the dispatching code to dispatch multi-versioned function
>> +   DECL.  Make a new function decl for dispatching and call the target
>> +   hook to process the "targetv" attributes and provide the code to
>> +   dispatch the right function at run-time.  */
>> +
>> +static tree
>> +make_ifunc_resolver_for_version (const tree decl)
>> +{
>> +  version_function *decl_v;
>> +  tree ifunc_resolver_decl, ifunc_decl;
>> +  basic_block empty_bb;
>> +  int ix;
>> +  void_p ele;
>> +  VEC (tree, heap) *fn_ver_vec = NULL;
>> +
>> +  gcc_assert (is_default_function (decl));
>> +
>> +  decl_v = find_function_version (decl);
>> +  gcc_assert (decl_v != NULL);
>> +
>> +  if (decl_v->ifunc_resolver_decl != NULL)
>> +    return decl_v->ifunc_resolver_decl;
>> +
>> +  ifunc_decl = decl_v->ifunc_decl;
>> +
>> +  if (ifunc_decl == NULL)
>> +    ifunc_decl = decl_v->ifunc_decl = make_ifunc_func (decl);
>> +
>> +  ifunc_resolver_decl = make_ifunc_resolver_func (decl, ifunc_decl,
>> +                                                 &empty_bb);
>> +
>> +  fn_ver_vec = VEC_alloc (tree, heap, 2);
>> +  VEC_safe_push (tree, heap, fn_ver_vec, decl);
>> +
>> +  for (ix = 0; VEC_iterate (void_p, decl_v->versions, ix, ele); ++ix)
>> +    {
>> +      version_function *v = (version_function *) ele;
>> +      gcc_assert (v->decl != NULL);
>> +      /* Check for virtual functions here again, as by this time it should
>> +        have been determined if this function needs a vtable index or
>> +        not.  This happens for methods in derived classes that override
>> +        virtual methods in base classes but are not explicitly marked as
>> +        virtual.  */
>> +      if (DECL_VINDEX (v->decl))
>> +        error_at (DECL_SOURCE_LOCATION (v->decl),
>> +                 "virtual function versioning not supported");
>> +      if (!v->is_deleted)
>> +       VEC_safe_push (tree, heap, fn_ver_vec, v->decl);
>> +    }
>> +
>> +  gcc_assert (targetm.dispatch_version);
>> +  targetm.dispatch_version (ifunc_resolver_decl, fn_ver_vec, &empty_bb);
>> +  decl_v->ifunc_resolver_decl = ifunc_resolver_decl;
>> +
>> +  return ifunc_resolver_decl;
>> +}
>> +
>> +/* Main entry point to pass_dispatch_versions. For multi-versioned functions,
>> +   generate the dispatching code.  */
>> +
>> +static unsigned int
>> +do_dispatch_versions (void)
>> +{
>> +  /* A new pass for generating dispatch code for multi-versioned functions.
>> +     Other forms of dispatch can be added when ifunc support is not available
>> +     like just calling the function directly after checking for target type.
>> +     Currently, dispatching is done through IFUNC.  This pass will become
>> +     more meaningful when other dispatch mechanisms are added.  */
>> +
>> +  /* Cloning a function to produce more versions will happen here when the
>> +     user requests that via the targetv attribute. For example,
>> +     int foo () __attribute__ ((targetv(("arch=core2"), ("arch=corei7"))));
>> +     means that the user wants the same body of foo to be versioned for core2
>> +     and corei7.  In that case, this function will be cloned during this
>> +     pass.  */
>> +
>> +  if (DECL_FUNCTION_VERSIONED (current_function_decl)
>> +      && is_default_function (current_function_decl))
>> +    {
>> +      tree decl = make_ifunc_resolver_for_version (current_function_decl);
>> +      if (dump_file && decl)
>> +       dump_function_to_file (decl, dump_file, TDF_BLOCKS);
>> +    }
>> +  return 0;
>> +}
>> +
>> +static bool
>> +gate_dispatch_versions (void)
>> +{
>> +  return true;
>> +}
>> +
>> +/* A pass to generate the dispatch code to execute the appropriate version
>> +   of a multi-versioned function at run-time.  */
>> +
>> +struct gimple_opt_pass pass_dispatch_versions =
>> +{
>> + {
>> +  GIMPLE_PASS,
>> +  "dispatch_multiversion_functions",    /* name */
>> +  gate_dispatch_versions,              /* gate */
>> +  do_dispatch_versions,                        /* execute */
>> +  NULL,                                        /* sub */
>> +  NULL,                                        /* next */
>> +  0,                                   /* static_pass_number */
>> +  TV_MULTIVERSION_DISPATCH,            /* tv_id */
>> +  PROP_cfg,                            /* properties_required */
>> +  PROP_cfg,                            /* properties_provided */
>> +  0,                                   /* properties_destroyed */
>> +  0,                                   /* todo_flags_start */
>> +  TODO_dump_func |                     /* todo_flags_finish */
>> +  TODO_cleanup_cfg | TODO_dump_cgraph
>> + }
>> +};
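
As a rough illustration of what the generated resolver amounts to, the sketch
below emulates it in plain C without IFUNC: a resolver walks a list of
(predicate, version) pairs and returns the first version whose predicate
holds, falling back to the default.  All names here (struct version,
resolve_version, the cpu_is_* predicates) are hypothetical and the predicates
are stubs; in the patch the real checks are emitted by the target hook
targetm.dispatch_version.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*foo_fn) (void);

/* Hypothetical versions of foo; the return values only identify them.  */
static int foo_default (void) { return 0; }
static int foo_corei7 (void) { return 1; }
static int foo_core2 (void) { return 2; }

/* Stub predicates standing in for the target's CPU checks.  */
static int cpu_is_corei7 (void) { return 0; }
static int cpu_is_core2 (void) { return 1; }

struct version
{
  int (*predicate) (void);
  foo_fn fn;
};

/* Return the first version whose predicate is true, else DEFAULT_FN.  */
static foo_fn
resolve_version (const struct version *versions, size_t n, foo_fn default_fn)
{
  size_t i;

  assert (default_fn != NULL);

  for (i = 0; i < n; i++)
    if (versions[i].predicate ())
      return versions[i].fn;

  return default_fn;
}

/* Resolve on first use and cache the result, loosely mirroring how an
   IFUNC symbol is bound once rather than on every call.  */
static int
foo (void)
{
  static foo_fn resolved = NULL;
  static const struct version versions[] = {
    { cpu_is_corei7, foo_corei7 },
    { cpu_is_core2, foo_core2 },
  };

  if (resolved == NULL)
    resolved = resolve_version (versions,
                                sizeof versions / sizeof versions[0],
                                foo_default);
  return resolved ();
}
```

With the stub predicates above, foo () ends up calling foo_core2.  The point
of the real IFUNC mechanism is that this selection happens in the loader, so
callers pay no per-call check.
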
>> Index: cgraphunit.c
>> ===================================================================
>> --- cgraphunit.c        (revision 184971)
>> +++ cgraphunit.c        (working copy)
>> @@ -141,6 +141,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "ipa-inline.h"
>>  #include "ipa-utils.h"
>>  #include "lto-streamer.h"
>> +#include "multiversion.h"
>>
>>  static void cgraph_expand_all_functions (void);
>>  static void cgraph_mark_functions_to_output (void);
>> @@ -343,6 +344,13 @@ cgraph_finalize_function (tree decl, bool nested)
>>       node->local.redefined_extern_inline = true;
>>     }
>>
>> +  /* If this is a function version and not the default, change the
>> +     assembler name of this function.  The DECL names of function
>> +     versions are the same, only the assembler names are made unique.
>> +     The assembler name is changed by appending the string from
>> +     the "targetv" attribute.  */
>> +  version_assembler_name (decl);
>> +
>>   notice_global_symbol (decl);
>>   node->local.finalized = true;
>>   node->lowered = DECL_STRUCT_FUNCTION (decl)->cfg != NULL;
>> Index: multiversion.h
>> ===================================================================
>> --- multiversion.h      (revision 0)
>> +++ multiversion.h      (revision 0)
>> @@ -0,0 +1,52 @@
>> +/* Function Multiversioning.
>> +   Copyright (C) 2012 Free Software Foundation, Inc.
>> +   Contributed by Sriraman Tallam (tmsriram@google.com)
>> +
>> +This file is part of GCC.
>> +
>> +GCC is free software; you can redistribute it and/or modify it under
>> +the terms of the GNU General Public License as published by the Free
>> +Software Foundation; either version 3, or (at your option) any later
>> +version.
>> +
>> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
>> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
>> +for more details.
>> +
>> +You should have received a copy of the GNU General Public License
>> +along with GCC; see the file COPYING3.  If not see
>> +<http://www.gnu.org/licenses/>. */
>> +
>> +/* This is the header file which provides the functions to keep track
>> +   of functions that are multi-versioned and to generate the dispatch
>> +   code to call the right version at run-time.  */
>> +
>> +#ifndef GCC_MULTIVERSION_H
>> +#define GCC_MULTIVERSION_H
>> +
>> +#include "tree.h"
>> +
>> +/* Mark DECL1 and DECL2 as function versions.  */
>> +int group_function_versions (const tree decl1, const tree decl2);
>> +
>> +/* Mark DECL as deleted and no longer a version.  */
>> +void mark_delete_decl_version (const tree decl);
>> +
>> +/* Returns true if DECL is the default version to be executed if all
>> +   other versions are inappropriate at run-time.  */
>> +bool is_default_function (const tree decl);
>> +
>> +/* Gets the IFUNC dispatcher for this multi-versioned function DECL. DECL
>> +   must be the default function in the multi-versioned group.  */
>> +tree get_ifunc_for_version (const tree decl);
>> +
>> +/* Returns true when only one of DECL1 and DECL2 is marked with "targetv"
>> +   or if the "targetv" attribute strings of DECL1 and DECL2 don't match.  */
>> +bool has_different_version_attributes (const tree decl1, const tree decl2);
>> +
>> +/* If DECL is a function version and not the default version, the assembler
>> +   name of DECL is changed to include the attribute string to keep the
>> +   name unambiguous.  */
>> +void version_assembler_name (const tree decl);
>> +#endif
>> Index: cp/class.c
>> ===================================================================
>> --- cp/class.c  (revision 184971)
>> +++ cp/class.c  (working copy)
>> @@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "tree-dump.h"
>>  #include "splay-tree.h"
>>  #include "pointer-set.h"
>> +#include "multiversion.h"
>>
>>  /* The number of nested classes being processed.  If we are not in the
>>    scope of any class, this is zero.  */
>> @@ -1092,7 +1093,20 @@ add_method (tree type, tree method, tree using_dec
>>              || same_type_p (TREE_TYPE (fn_type),
>>                              TREE_TYPE (method_type))))
>>        {
>> -         if (using_decl)
>> +         /* For function versions, their parms and types match
>> +            but they are not duplicates.  Record function versions
>> +            as and when they are found.  */
>> +         if (TREE_CODE (fn) == FUNCTION_DECL
>> +             && TREE_CODE (method) == FUNCTION_DECL
>> +             && (DECL_FUNCTION_VERSIONED (fn)
>> +                 || DECL_FUNCTION_VERSIONED (method)))
>> +           {
>> +             DECL_FUNCTION_VERSIONED (fn) = 1;
>> +             DECL_FUNCTION_VERSIONED (method) = 1;
>> +             group_function_versions (fn, method);
>> +             continue;
>> +           }
>> +         else if (using_decl)
>>            {
>>              if (DECL_CONTEXT (fn) == type)
>>                /* Defer to the local function.  */
>> @@ -1150,6 +1164,13 @@ add_method (tree type, tree method, tree using_dec
>>   else
>>     /* Replace the current slot.  */
>>     VEC_replace (tree, method_vec, slot, overload);
>> +
>> +  /* Change the assembler name of method here if it has "targetv"
>> +     attributes.  Since all versions have the same mangled name,
>> +     their assembler name is changed by appending the string from
>> +     the "targetv" attribute. */
>> +  version_assembler_name (method);
>> +
>>   return true;
>>  }
>>
>> @@ -6890,8 +6911,11 @@ resolve_address_of_overloaded_function (tree targe
>>          if (DECL_ANTICIPATED (fn))
>>            continue;
>>
>> -         /* See if there's a match.  */
>> -         if (same_type_p (target_fn_type, static_fn_type (fn)))
>> +         /* See if there's a match.  For functions that are multi-versioned,
>> +            match only the default function.  */
>> +         if (same_type_p (target_fn_type, static_fn_type (fn))
>> +             && (!DECL_FUNCTION_VERSIONED (fn)
>> +                 || is_default_function (fn)))
>>            matches = tree_cons (fn, NULL_TREE, matches);
>>        }
>>     }
>> @@ -7053,6 +7077,21 @@ resolve_address_of_overloaded_function (tree targe
>>       perform_or_defer_access_check (access_path, fn, fn);
>>     }
>>
>> +  /* If a pointer to a function that is multi-versioned is requested, the
>> +     pointer to the dispatcher function is returned instead.  This works
>> +     well because indirectly calling the function will dispatch the right
>> +     function version at run-time. Also, the function address is kept
>> +     unique.  */
>> +  if (DECL_FUNCTION_VERSIONED (fn)
>> +      && is_default_function (fn))
>> +    {
>> +      tree ifunc_decl;
>> +      ifunc_decl = get_ifunc_for_version (fn);
>> +      gcc_assert (ifunc_decl != NULL);
>> +      mark_used (fn);
>> +      return build_fold_addr_expr (ifunc_decl);
>> +    }
>> +
>>   if (TYPE_PTRFN_P (target_type) || TYPE_PTRMEMFUNC_P (target_type))
>>     return cp_build_addr_expr (fn, flags);
>>   else
>> Index: cp/decl.c
>> ===================================================================
>> --- cp/decl.c   (revision 184971)
>> +++ cp/decl.c   (working copy)
>> @@ -54,6 +54,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "pointer-set.h"
>>  #include "splay-tree.h"
>>  #include "plugin.h"
>> +#include "multiversion.h"
>>
>>  /* Possible cases of bad specifiers type used by bad_specifiers. */
>>  enum bad_spec_place {
>> @@ -972,6 +973,23 @@ decls_match (tree newdecl, tree olddecl)
>>       if (t1 != t2)
>>        return 0;
>>
>> +      /* The decls don't match if they correspond to two different versions
>> +        of the same function.  */
>> +      if (compparms (p1, p2)
>> +         && same_type_p (TREE_TYPE (f1), TREE_TYPE (f2))
>> +         && (DECL_FUNCTION_VERSIONED (newdecl)
>> +             || DECL_FUNCTION_VERSIONED (olddecl))
>> +         && has_different_version_attributes (newdecl, olddecl))
>> +       {
>> +         /* One of the decls could be the default without the "targetv"
>> +            attribute. Set it to be a versioned function here.  */
>> +         DECL_FUNCTION_VERSIONED (newdecl) = 1;
>> +         DECL_FUNCTION_VERSIONED (olddecl) = 1;
>> +         /* Accumulate all the versions of a function.  */
>> +         group_function_versions (olddecl, newdecl);
>> +         return 0;
>> +       }
>> +
>>       if (CP_DECL_CONTEXT (newdecl) != CP_DECL_CONTEXT (olddecl)
>>          && ! (DECL_EXTERN_C_P (newdecl)
>>                && DECL_EXTERN_C_P (olddecl)))
>> @@ -1482,7 +1500,11 @@ duplicate_decls (tree newdecl, tree olddecl, bool
>>              error ("previous declaration %q+#D here", olddecl);
>>              return NULL_TREE;
>>            }
>> -         else if (compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)),
>> +         /* For function versions, params and types match, but they
>> +            are not ambiguous.  */
>> +         else if ((!DECL_FUNCTION_VERSIONED (newdecl)
>> +                   && !DECL_FUNCTION_VERSIONED (olddecl))
>> +                  && compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)),
>>                              TYPE_ARG_TYPES (TREE_TYPE (olddecl))))
>>            {
>>              error ("new declaration %q#D", newdecl);
>> @@ -2250,6 +2272,16 @@ duplicate_decls (tree newdecl, tree olddecl, bool
>>   else if (DECL_PRESERVE_P (newdecl))
>>     DECL_PRESERVE_P (olddecl) = 1;
>>
>> +  /* If the olddecl is a version, so is the newdecl.  */
>> +  if (TREE_CODE (newdecl) == FUNCTION_DECL
>> +      && DECL_FUNCTION_VERSIONED (olddecl))
>> +    {
>> +      DECL_FUNCTION_VERSIONED (newdecl) = 1;
>> +      /* Record that newdecl is not a valid version and has
>> +        been deleted.  */
>> +      mark_delete_decl_version (newdecl);
>> +    }
>> +
>>   if (TREE_CODE (newdecl) == FUNCTION_DECL)
>>     {
>>       int function_size;
>> @@ -4512,6 +4544,10 @@ start_decl (const cp_declarator *declarator,
>>   /* Enter this declaration into the symbol table.  */
>>   decl = maybe_push_decl (decl);
>>
>> +  /* If this decl is a function version and not the default, its assembler
>> +     name has to be changed.  */
>> +  version_assembler_name (decl);
>> +
>>   if (processing_template_decl)
>>     decl = push_template_decl (decl);
>>   if (decl == error_mark_node)
>> @@ -13019,6 +13055,10 @@ start_function (cp_decl_specifier_seq *declspecs,
>>     gcc_assert (same_type_p (TREE_TYPE (TREE_TYPE (decl1)),
>>                             integer_type_node));
>>
>> +  /* If this decl is a function version and not the default, its assembler
>> +     name has to be changed.  */
>> +  version_assembler_name (decl1);
>> +
>>   start_preparsed_function (decl1, attrs, /*flags=*/SF_DEFAULT);
>>
>>   return 1;
>> @@ -13960,6 +14000,11 @@ cxx_comdat_group (tree decl)
>>            break;
>>        }
>>       name = DECL_ASSEMBLER_NAME (decl);
>> +      if (TREE_CODE (decl) == FUNCTION_DECL
>> +         && DECL_FUNCTION_VERSIONED (decl))
>> +       name = DECL_NAME (decl);
>> +      else
>> +        name = DECL_ASSEMBLER_NAME (decl);
>>     }
>>
>>   return name;
>> Index: cp/semantics.c
>> ===================================================================
>> --- cp/semantics.c      (revision 184971)
>> +++ cp/semantics.c      (working copy)
>> @@ -3783,8 +3783,11 @@ expand_or_defer_fn_1 (tree fn)
>>       /* If the user wants us to keep all inline functions, then mark
>>         this function as needed so that finish_file will make sure to
>>         output it later.  Similarly, all dllexport'd functions must
>> -        be emitted; there may be callers in other DLLs.  */
>> -      if ((flag_keep_inline_functions
>> +        be emitted; there may be callers in other DLLs.
>> +        Also, mark this function as needed if it is marked inline but
>> +        is a multi-versioned function.  */
>> +      if (((flag_keep_inline_functions
>> +           || DECL_FUNCTION_VERSIONED (fn))
>>           && DECL_DECLARED_INLINE_P (fn)
>>           && !DECL_REALLY_EXTERN (fn))
>>          || (flag_keep_inline_dllexport
>> Index: cp/decl2.c
>> ===================================================================
>> --- cp/decl2.c  (revision 184971)
>> +++ cp/decl2.c  (working copy)
>> @@ -53,6 +53,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "splay-tree.h"
>>  #include "langhooks.h"
>>  #include "c-family/c-ada-spec.h"
>> +#include "multiversion.h"
>>
>>  extern cpp_reader *parse_in;
>>
>> @@ -674,9 +675,13 @@ check_classfn (tree ctype, tree function, tree tem
>>          if (is_template != (TREE_CODE (fndecl) == TEMPLATE_DECL))
>>            continue;
>>
>> +         /* While finding a match, same types and params are not enough
>> +            if the function is versioned.  Also check version ("targetv")
>> +            attributes.  */
>>          if (same_type_p (TREE_TYPE (TREE_TYPE (function)),
>>                           TREE_TYPE (TREE_TYPE (fndecl)))
>>              && compparms (p1, p2)
>> +             && !has_different_version_attributes (function, fndecl)
>>              && (!is_template
>>                  || comp_template_parms (template_parms,
>>                                          DECL_TEMPLATE_PARMS (fndecl)))
>> Index: cp/call.c
>> ===================================================================
>> --- cp/call.c   (revision 184971)
>> +++ cp/call.c   (working copy)
>> @@ -41,6 +41,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "langhooks.h"
>>  #include "c-family/c-objc.h"
>>  #include "timevar.h"
>> +#include "multiversion.h"
>>
>>  /* The various kinds of conversion.  */
>>
>> @@ -6730,6 +6731,17 @@ build_over_call (struct z_candidate *cand, int fla
>>   if (!already_used)
>>     mark_used (fn);
>>
>> +  /* For a call to a multi-versioned function, the call should actually be to
>> +     the dispatcher.  */
>> +  if (DECL_FUNCTION_VERSIONED (fn))
>> +    {
>> +      tree ifunc_decl;
>> +      ifunc_decl = get_ifunc_for_version (fn);
>> +      gcc_assert (ifunc_decl != NULL);
>> +      return build_call_expr_loc_array (UNKNOWN_LOCATION, ifunc_decl,
>> +                                       nargs, argarray);
>> +    }
>> +
>>   if (DECL_VINDEX (fn) && (flags & LOOKUP_NONVIRTUAL) == 0)
>>     {
>>       tree t;
>> @@ -7980,6 +7992,30 @@ joust (struct z_candidate *cand1, struct z_candida
>>   size_t i;
>>   size_t len;
>>
>> +  /* For candidates of a multi-versioned function, the one marked default
>> +     wins.  This is because the default decl is used as key to aggregate
>> +     all the other versions provided for it in multiversion.c.  When
>> +     generating the actual call, the appropriate dispatcher is created
>> +     to call the right function version at run-time.  */
>> +
>> +  if ((TREE_CODE (cand1->fn) == FUNCTION_DECL
>> +       && DECL_FUNCTION_VERSIONED (cand1->fn))
>> +      || (TREE_CODE (cand2->fn) == FUNCTION_DECL
>> +          && DECL_FUNCTION_VERSIONED (cand2->fn)))
>> +    {
>> +      if (is_default_function (cand1->fn))
>> +       {
>> +          mark_used (cand2->fn);
>> +         return 1;
>> +       }
>> +      if (is_default_function (cand2->fn))
>> +       {
>> +          mark_used (cand1->fn);
>> +         return -1;
>> +       }
>> +      return 0;
>> +    }
>> +
>>   /* Candidates that involve bad conversions are always worse than those
>>      that don't.  */
>>   if (cand1->viable > cand2->viable)
>> Index: timevar.def
>> ===================================================================
>> --- timevar.def (revision 184971)
>> +++ timevar.def (working copy)
>> @@ -253,6 +253,7 @@ DEFTIMEVAR (TV_TREE_IFCOMBINE        , "tree if-co
>>  DEFTIMEVAR (TV_TREE_UNINIT           , "uninit var analysis")
>>  DEFTIMEVAR (TV_PLUGIN_INIT           , "plugin initialization")
>>  DEFTIMEVAR (TV_PLUGIN_RUN            , "plugin execution")
>> +DEFTIMEVAR (TV_MULTIVERSION_DISPATCH , "multiversion dispatch")
>>
>>  /* Everything else in rest_of_compilation not included above.  */
>>  DEFTIMEVAR (TV_EARLY_LOCAL          , "early local passes")
>> Index: varasm.c
>> ===================================================================
>> --- varasm.c    (revision 184971)
>> +++ varasm.c    (working copy)
>> @@ -5755,6 +5755,8 @@ finish_aliases_1 (void)
>>        }
>>       else if (! (p->emitted_diags & ALIAS_DIAG_TO_EXTERN)
>>               && DECL_EXTERNAL (target_decl)
>> +              && (TREE_CODE (target_decl) != FUNCTION_DECL
>> +                  || !DECL_STRUCT_FUNCTION (target_decl))
>>               /* We use local aliases for C++ thunks to force the tailcall
>>                  to bind locally.  This is a hack - to keep it working do
>>                  the following (which is not strictly correct).  */
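
On the FUNCTION_DECL test in this varasm.c hunk: in C, unary `!` binds
tighter than `==`, so `!TREE_CODE (d) == FUNCTION_DECL` parses as
`(!TREE_CODE (d)) == FUNCTION_DECL`, which is not the same test as
`TREE_CODE (d) != FUNCTION_DECL`.  A small standalone demonstration, using a
hypothetical enum in place of tree codes (the values are made up; the only
assumption is that the function code is greater than 1, which I believe
holds for FUNCTION_DECL):

```c
/* Hypothetical stand-ins for tree codes.  */
enum kind { ERROR_KIND = 0, VAR_KIND = 1, FUNCTION_KIND = 2 };

/* How the compiler parses `!k == FUNCTION_KIND'.  !k is 0 or 1, so the
   comparison is never true while FUNCTION_KIND is 2.  */
static int
not_function_buggy (enum kind k)
{
  return (!k) == FUNCTION_KIND;
}

/* The intended test.  */
static int
not_function (enum kind k)
{
  return k != FUNCTION_KIND;
}
```

Here not_function_buggy returns 0 for every input, while not_function returns
1 for VAR_KIND and 0 for FUNCTION_KIND.
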
>> Index: Makefile.in
>> ===================================================================
>> --- Makefile.in (revision 184971)
>> +++ Makefile.in (working copy)
>> @@ -1298,6 +1298,7 @@ OBJS = \
>>        mcf.o \
>>        mode-switching.o \
>>        modulo-sched.o \
>> +       multiversion.o \
>>        omega.o \
>>        omp-low.o \
>>        optabs.o \
>> @@ -3030,6 +3031,11 @@ ree.o : ree.c $(CONFIG_H) $(SYSTEM_H) coretypes.h
>>    $(DF_H) $(TIMEVAR_H) tree-pass.h $(RECOG_H) $(EXPR_H) \
>>    $(REGS_H) $(TREE_H) $(TM_P_H) insn-config.h $(INSN_ATTR_H) $(DIAGNOSTIC_CORE_H) \
>>    $(TARGET_H) $(OPTABS_H) insn-codes.h rtlhooks-def.h $(PARAMS_H) $(CGRAPH_H)
>> +multiversion.o : multiversion.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
>> +   $(TREE_H) langhooks.h $(TREE_INLINE_H) $(FLAGS_H) $(CGRAPH_H) intl.h \
>> +   $(DIAGNOSTIC_H) $(FIBHEAP_H) $(PARAMS_H) $(TIMEVAR_H) tree-pass.h \
>> +   $(HASHTAB_H) $(COVERAGE_H) $(GGC_H) $(TREE_FLOW_H) $(RTL_H) $(IPA_PROP_H) \
>> +   $(BASIC_BLOCK_H) $(TOPLEV_H) $(TREE_DUMP_H) ipa-inline.h
>>  cprop.o : cprop.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
>>    $(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h $(GGC_H) \
>>    $(RECOG_H) $(EXPR_H) $(BASIC_BLOCK_H) $(FUNCTION_H) output.h toplev.h $(DIAGNOSTIC_CORE_H) \
>> Index: passes.c
>> ===================================================================
>> --- passes.c    (revision 184971)
>> +++ passes.c    (working copy)
>> @@ -1190,6 +1190,7 @@ init_optimization_passes (void)
>>   NEXT_PASS (pass_build_cfg);
>>   NEXT_PASS (pass_warn_function_return);
>>   NEXT_PASS (pass_build_cgraph_edges);
>> +  NEXT_PASS (pass_dispatch_versions);
>>   *p = NULL;
>>
>>   /* Interprocedural optimization passes.  */
>> Index: config/i386/i386.c
>> ===================================================================
>> --- config/i386/i386.c  (revision 184971)
>> +++ config/i386/i386.c  (working copy)
>> @@ -27446,6 +27473,593 @@ ix86_init_mmx_sse_builtins (void)
>>     }
>>  }
>>
>> +/* This adds a condition to the basic_block NEW_BB in function FUNCTION_DECL
>> +   to return a pointer to VERSION_DECL if the outcome of the function
>> +   PREDICATE_DECL is true.  This function will be called during version
>> +   dispatch to decide which function version to execute.  It returns the
>> +   basic block at the end to which more conditions can be added.  */
>> +
>> +static basic_block
>> +add_condition_to_bb (tree function_decl, tree version_decl,
>> +                    basic_block new_bb, tree predicate_decl)
>> +{
>> +  gimple return_stmt;
>> +  tree convert_expr, result_var;
>> +  gimple convert_stmt;
>> +  gimple call_cond_stmt;
>> +  gimple if_else_stmt;
>> +
>> +  basic_block bb1, bb2, bb3;
>> +  edge e12, e23;
>> +
>> +  tree cond_var;
>> +  gimple_seq gseq;
>> +
>> +  tree old_current_function_decl;
>> +
>> +  old_current_function_decl = current_function_decl;
>> +  push_cfun (DECL_STRUCT_FUNCTION (function_decl));
>> +  current_function_decl = function_decl;
>> +
>> +  gcc_assert (new_bb != NULL);
>> +  gseq = bb_seq (new_bb);
>> +
>> +
>> +  convert_expr = build1 (CONVERT_EXPR, ptr_type_node,
>> +                        build_fold_addr_expr (version_decl));
>> +  result_var = create_tmp_var (ptr_type_node, NULL);
>> +  convert_stmt = gimple_build_assign (result_var, convert_expr);
>> +  return_stmt = gimple_build_return (result_var);
>> +
>> +  if (predicate_decl == NULL_TREE)
>> +    {
>> +      gimple_seq_add_stmt (&gseq, convert_stmt);
>> +      gimple_seq_add_stmt (&gseq, return_stmt);
>> +      set_bb_seq (new_bb, gseq);
>> +      gimple_set_bb (convert_stmt, new_bb);
>> +      gimple_set_bb (return_stmt, new_bb);
>> +      pop_cfun ();
>> +      current_function_decl = old_current_function_decl;
>> +      return new_bb;
>> +    }
>> +
>> +  cond_var = create_tmp_var (integer_type_node, NULL);
>> +  call_cond_stmt = gimple_build_call (predicate_decl, 0);
>> +  gimple_call_set_lhs (call_cond_stmt, cond_var);
>> +
>> +  gimple_set_block (call_cond_stmt, DECL_INITIAL (function_decl));
>> +  gimple_set_bb (call_cond_stmt, new_bb);
>> +  gimple_seq_add_stmt (&gseq, call_cond_stmt);
>> +
>> +  if_else_stmt = gimple_build_cond (GT_EXPR, cond_var,
>> +                                   integer_zero_node,
>> +                                   NULL_TREE, NULL_TREE);
>> +  gimple_set_block (if_else_stmt, DECL_INITIAL (function_decl));
>> +  gimple_set_bb (if_else_stmt, new_bb);
>> +  gimple_seq_add_stmt (&gseq, if_else_stmt);
>> +
>> +  gimple_seq_add_stmt (&gseq, convert_stmt);
>> +  gimple_seq_add_stmt (&gseq, return_stmt);
>> +  set_bb_seq (new_bb, gseq);
>> +
>> +  bb1 = new_bb;
>> +  e12 = split_block (bb1, if_else_stmt);
>> +  bb2 = e12->dest;
>> +  e12->flags &= ~EDGE_FALLTHRU;
>> +  e12->flags |= EDGE_TRUE_VALUE;
>> +
>> +  e23 = split_block (bb2, return_stmt);
>> +
>> +  gimple_set_bb (convert_stmt, bb2);
>> +  gimple_set_bb (return_stmt, bb2);
>> +
>> +  bb3 = e23->dest;
>> +  make_edge (bb1, bb3, EDGE_FALSE_VALUE);
>> +
>> +  remove_edge (e23);
>> +  make_edge (bb2, EXIT_BLOCK_PTR, 0);
>> +
>> +  rebuild_cgraph_edges ();
>> +
>> +  pop_cfun ();
>> +  current_function_decl = old_current_function_decl;
>> +
>> +  return bb3;
>> +}
>> +
>> +/* This parses the attribute arguments to targetv in DECL and determines
>> +   the right builtin to use to match the platform specification.
>> +   For now, only one target argument ("arch=") is allowed.  */
>> +
>> +static enum ix86_builtins
>> +get_builtin_code_for_version (tree decl)
>> +{
>> +  tree attrs;
>> +  struct cl_target_option cur_target;
>> +  tree target_node;
>> +  struct cl_target_option *new_target;
>> +  enum ix86_builtins builtin_code = IX86_BUILTIN_MAX;
>> +
>> +  attrs = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl));
>> +  gcc_assert (attrs != NULL);
>> +
>> +  cl_target_option_save (&cur_target, &global_options);
>> +
>> +  target_node = ix86_valid_target_attribute_tree
>> +                 (TREE_VALUE (TREE_VALUE (attrs)));
>> +
>> +  gcc_assert (target_node);
>> +  new_target = TREE_TARGET_OPTION (target_node);
>> +  gcc_assert (new_target);
>> +
>> +  if (new_target->arch_specified && new_target->arch > 0)
>> +    {
>> +      switch (new_target->arch)
>> +        {
>> +       case 1:
>> +       case 2:
>> +       case 3:
>> +       case 4:
>> +       case 5:
>> +       case 6:
>> +       case 7:
>> +       case 8:
>> +       case 9:
>> +       case 10:
>> +       case 11:
>> +         builtin_code = IX86_BUILTIN_CPU_IS_INTEL;
>> +         break;
>> +       case 12:
>> +         builtin_code = IX86_BUILTIN_CPU_IS_INTEL_CORE2;
>> +         break;
>> +       case 13:
>> +         builtin_code = IX86_BUILTIN_CPU_IS_INTEL_COREI7;
>> +         break;
>> +       case 14:
>> +         builtin_code = IX86_BUILTIN_CPU_IS_INTEL_ATOM;
>> +         break;
>> +       case 15:
>> +       case 16:
>> +       case 17:
>> +       case 18:
>> +       case 19:
>> +       case 20:
>> +       case 21:
>> +         builtin_code = IX86_BUILTIN_CPU_IS_AMD;
>> +         break;
>> +       case 22:
>> +         builtin_code = IX86_BUILTIN_CPU_IS_AMDFAM10H;
>> +         break;
>> +       case 23:
>> +         builtin_code = IX86_BUILTIN_CPU_IS_AMDFAM15H_BDVER1;
>> +         break;
>> +       case 24:
>> +         builtin_code = IX86_BUILTIN_CPU_IS_AMDFAM15H_BDVER2;
>> +         break;
>> +       case 25: /* What is btver1 ? */
>> +         builtin_code = IX86_BUILTIN_CPU_IS_AMD;
>> +         break;
>> +       }
>> +    }
>> +
>> +  cl_target_option_restore (&global_options, &cur_target);
>> +  if (builtin_code == IX86_BUILTIN_MAX)
>> +      error_at (DECL_SOURCE_LOCATION (decl),
>> +               "No dispatcher found for the versioning attributes");
>> +
>> +  return builtin_code;
>> +}
>> +
>> +/* This is the target hook to generate the dispatch function for
>> +   multi-versioned functions.  DISPATCH_DECL is the function which will
>> +   contain the dispatch logic.  FNDECLS are the function choices for
>> +   dispatch, and is a tree chain.  EMPTY_BB is the basic block pointer
>> +   in DISPATCH_DECL in which the dispatch code is generated.  */
>> +
>> +static int
>> +ix86_dispatch_version (tree dispatch_decl,
>> +                      void *fndecls_p,
>> +                      basic_block *empty_bb)
>> +{
>> +  tree default_decl;
>> +  gimple ifunc_cpu_init_stmt;
>> +  gimple_seq gseq;
>> +  tree old_current_function_decl;
>> +  int ix;
>> +  tree ele;
>> +  VEC (tree, heap) *fndecls;
>> +
>> +  gcc_assert (dispatch_decl != NULL
>> +             && fndecls_p != NULL
>> +             && empty_bb != NULL);
>> +
>> +  /* fndecls_p is actually a vector.  */
>> +  fndecls = (VEC (tree, heap) *)fndecls_p;
>> +
>> +  /* There must be at least one version other than the default.  */
>> +  gcc_assert (VEC_length (tree, fndecls) >= 2);
>> +
>> +  /* The first version in the vector is the default decl.  */
>> +  default_decl = VEC_index (tree, fndecls, 0);
>> +
>> +  old_current_function_decl = current_function_decl;
>> +  push_cfun (DECL_STRUCT_FUNCTION (dispatch_decl));
>> +  current_function_decl = dispatch_decl;
>> +
>> +  gseq = bb_seq (*empty_bb);
>> +  ifunc_cpu_init_stmt = gimple_build_call_vec (
>> +                     ix86_builtins [(int) IX86_BUILTIN_CPU_INIT], NULL);
>> +  gimple_seq_add_stmt (&gseq, ifunc_cpu_init_stmt);
>> +  gimple_set_bb (ifunc_cpu_init_stmt, *empty_bb);
>> +  set_bb_seq (*empty_bb, gseq);
>> +
>> +  pop_cfun ();
>> +  current_function_decl = old_current_function_decl;
>> +
>> +
>> +  for (ix = 1; VEC_iterate (tree, fndecls, ix, ele); ++ix)
>> +    {
>> +      tree version_decl = ele;
>> +      /* Get attribute string, parse it and find the right predicate decl.
>> +         The predicate function could be a lengthy combination of many
>> +        features, like arch-type and various isa-variants.  For now, only
>> +        check the arch-type.  */
>> +      tree predicate_decl = ix86_builtins [
>> +                       get_builtin_code_for_version (version_decl)];
>> +      *empty_bb = add_condition_to_bb (dispatch_decl, version_decl, *empty_bb,
>> +                                      predicate_decl);
>> +
>> +    }
>> +  /* Dispatch the default version at the end.  */
>> +  *empty_bb = add_condition_to_bb (dispatch_decl, default_decl, *empty_bb,
>> +                                  NULL);
>> +  return 0;
>> +}
>>
>> @@ -38610,6 +39269,12 @@ ix86_autovectorize_vector_sizes (void)
>>  #undef TARGET_BUILD_BUILTIN_VA_LIST
>>  #define TARGET_BUILD_BUILTIN_VA_LIST ix86_build_builtin_va_list
>>
>> +#undef TARGET_DISPATCH_VERSION
>> +#define TARGET_DISPATCH_VERSION ix86_dispatch_version
>> +
>>  #undef TARGET_ENUM_VA_LIST_P
>>  #define TARGET_ENUM_VA_LIST_P ix86_enum_va_list
>>
>> Index: testsuite/g++.dg/mv1.C
>> ===================================================================
>> --- testsuite/g++.dg/mv1.C      (revision 0)
>> +++ testsuite/g++.dg/mv1.C      (revision 0)
>> @@ -0,0 +1,23 @@
>> +/* Simple test case to check if Multiversioning works.  */
>> +/* { dg-do run } */
>> +/* { dg-options "-O2" } */
>> +
>> +int foo ();
>> +int foo () __attribute__ ((targetv("arch=corei7")));
>> +
>> +int main ()
>> +{
>> +  int (*p)() = &foo;
>> +  return foo () + (*p)();
>> +}
>> +
>> +int foo ()
>> +{
>> +  return 0;
>> +}
>> +
>> +int __attribute__ ((targetv("arch=corei7")))
>> +foo ()
>> +{
>> +  return 0;
>> +}
>>
>>
>> --
>> This patch is available for review at http://codereview.appspot.com/5752064


* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-03-07 14:05 ` Richard Guenther
  2012-03-07 19:08   ` Sriraman Tallam
@ 2012-03-08 21:00   ` Xinliang David Li
  2012-03-09 20:04   ` Sriraman Tallam
  2 siblings, 0 replies; 93+ messages in thread
From: Xinliang David Li @ 2012-03-08 21:00 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Sriraman Tallam, reply, gcc-patches

> You don't give an overview of the frontend implementation.  Thus I have
> extracted the following
>
>  - the FE does not really know about the "overloading", nor can it directly
>   resolve calls from a "sse" function to another "sse" function without going
>   through the 2nd IFUNC
>
>  - cgraph also does not know about the "overloading", so it cannot do such
>   "devirtualization" either
>
> you seem to have implemented something inbetween a pure frontend
> solution and a proper middle-end solution.  For optimization and eventually
> automatically selecting functions for cloning (like, callees of a manual "sse"
> versioned function should be cloned?) it would be nice if the cgraph would
> know about the different versions and their relationships (and the dispatcher).
> Especially the cgraph code should know the functions are semantically
> equivalent (I suppose we should require that).  The IFUNC should be
> generated by cgraph / target code, similar to how we generate C++ thunks.

The implementation is very similar to the case where the user writes
their own ifunc and resolver.  The difference here is that the
resolver/dispatcher is synthesized by the compiler.  A thunk is
different, since it is completely invisible to the user.
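
For reference, the hand-written pattern I am comparing against looks
roughly like the sketch below.  This is only an illustration, not code
from the patch: the names foo_default, foo_corei7, resolve_foo and
cpu_is_corei7 are made up, the two bodies return different values only
so the choice is observable, and the dispatch is emulated with a cached
function pointer rather than a real IFUNC symbol.  On an ELF/glibc
target the user would instead declare foo with
__attribute__ ((ifunc ("resolve_foo"))) and let the dynamic loader call
the resolver once.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*foo_fn) (void);

/* Default version: must always exist.  */
static int foo_default (void) { return 0; }

/* Version specialized for corei7 (hypothetical body).  */
static int foo_corei7 (void) { return 1; }

/* Hypothetical predicate.  On x86 with GCC-compatible compilers it uses
   the CPU-detection builtins; elsewhere it reports "no match" so the
   default version is chosen.  */
static int
cpu_is_corei7 (void)
{
#if defined (__GNUC__) && (defined (__x86_64__) || defined (__i386__))
  __builtin_cpu_init ();
  return __builtin_cpu_is ("corei7") > 0;
#else
  return 0;
#endif
}

/* The resolver: check each specialized version in turn and fall back
   to the default at the end.  */
static foo_fn
resolve_foo (void)
{
  if (cpu_is_corei7 ())
    return foo_corei7;
  return foo_default;
}

/* Cached result of the resolver; stands in for the IFUNC binding.  */
static foo_fn resolved_foo = NULL;

/* What callers see: a single foo that forwards to the chosen version.  */
int
foo (void)
{
  if (resolved_foo == NULL)
    resolved_foo = resolve_foo ();
  assert (resolved_foo != NULL);
  return resolved_foo ();
}
```

A caller simply writes foo () or takes &foo; which body runs depends on
the machine.  In the patch the equivalent of resolve_foo is generated
by the compiler from the targetv attributes instead of being written by
the user.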

Promoting ifunc to the cgraph level has its advantages, but it can also
burden the IPA passes, since each of them would have to understand it.

thanks,

David
>
> Honza, any suggestions on how the FE side of such cgraph infrastructure
> should look like and how we should encode the target bits?
>
> Thanks,
> Richard.
>
>>        * doc/tm.texi.in: Add description for TARGET_DISPATCH_VERSION.
>>        * doc/tm.texi: Regenerate.
>>        * c-family/c-common.c (handle_targetv_attribute): New function.
>>        * target.def (dispatch_version): New target hook.
>>        * tree.h (DECL_FUNCTION_VERSIONED): New macro.
>>        (tree_function_decl): New bit-field versioned_function.
>>        * tree-pass.h (pass_dispatch_versions): New pass.
>>        * multiversion.c: New file.
>>        * multiversion.h: New file.
>>        * cgraphunit.c: Include multiversion.h
>>        (cgraph_finalize_function): Change assembler names of versioned
>>        functions.
>>        * cp/class.c: Include multiversion.h
>>        (add_method): aggregate function versions. Change assembler names of
>>        versioned functions.
>>        (resolve_address_of_overloaded_function): Match address of function
>>        version with default function.  Return address of ifunc dispatcher
>>        for address of versioned functions.
>>        * cp/decl.c (decls_match): Make decls unmatched for versioned
>>        functions.
>>        (duplicate_decls): Remove ambiguity for versioned functions. Notify
>>        of deleted function version decls.
>>        (start_decl): Change assembler name of versioned functions.
>>        (start_function): Change assembler name of versioned functions.
>>        (cxx_comdat_group): Make comdat group of versioned functions be the
>>        same.
>>        * cp/semantics.c (expand_or_defer_fn_1): Mark as needed versioned
>>        functions that are also marked inline.
>>        * cp/decl2.c: Include multiversion.h
>>        (check_classfn): Check attributes of versioned functions for match.
>>        * cp/call.c: Include multiversion.h
>>        (build_over_call): Make calls to multiversioned functions to call the
>>        dispatcher.
>>        (joust): For calls to multi-versioned functions, make the default
>>        function win.
>>        * timevar.def (TV_MULTIVERSION_DISPATCH): New time var.
>>        * varasm.c (finish_aliases_1): Check if the alias points to a function
>>        with a body before giving an error.
>>        * Makefile.in: Add multiversion.o
>>        * passes.c: Add pass_dispatch_versions to the pass list.
>>        * config/i386/i386.c (add_condition_to_bb): New function.
>>        (get_builtin_code_for_version): New function.
>>        (ix86_dispatch_version): New function.
>>        (TARGET_DISPATCH_VERSION): New macro.
>>        * testsuite/g++.dg/mv1.C: New test.
>>
>> Index: doc/tm.texi
>> ===================================================================
>> --- doc/tm.texi (revision 184971)
>> +++ doc/tm.texi (working copy)
>> @@ -10995,6 +10995,14 @@ The result is another tree containing a simplified
>>  call's result.  If @var{ignore} is true the value will be ignored.
>>  @end deftypefn
>>
>> +@deftypefn {Target Hook} int TARGET_DISPATCH_VERSION (tree @var{dispatch_decl}, void *@var{fndecls}, basic_block *@var{empty_bb})
>> +For multi-versioned functions, this hook sets up the dispatcher.
>> +@var{dispatch_decl} is the function that will be used to dispatch the
>> +version.  @var{fndecls} are the function choices for dispatch.
>> +@var{empty_bb} is a basic block in @var{dispatch_decl} where the
>> +code to do the dispatch will be added.
>> +@end deftypefn
>> +
>>  @deftypefn {Target Hook} {const char *} TARGET_INVALID_WITHIN_DOLOOP (const_rtx @var{insn})
>>
>>  Take an instruction in @var{insn} and return NULL if it is valid within a
>> Index: doc/tm.texi.in
>> ===================================================================
>> --- doc/tm.texi.in      (revision 184971)
>> +++ doc/tm.texi.in      (working copy)
>> @@ -10873,6 +10873,14 @@ The result is another tree containing a simplified
>>  call's result.  If @var{ignore} is true the value will be ignored.
>>  @end deftypefn
>>
>> +@hook TARGET_DISPATCH_VERSION
>> +For multi-versioned functions, this hook sets up the dispatcher.
>> +@var{dispatch_decl} is the function that will be used to dispatch the
>> +version.  @var{fndecls} are the function choices for dispatch.
>> +@var{empty_bb} is a basic block in @var{dispatch_decl} where the
>> +code to do the dispatch will be added.
>> +@end deftypefn
>> +
>>  @hook TARGET_INVALID_WITHIN_DOLOOP
>>
>>  Take an instruction in @var{insn} and return NULL if it is valid within a
>> Index: c-family/c-common.c
>> ===================================================================
>> --- c-family/c-common.c (revision 184971)
>> +++ c-family/c-common.c (working copy)
>> @@ -315,6 +315,7 @@ static tree check_case_value (tree);
>>  static bool check_case_bounds (tree, tree, tree *, tree *);
>>
>>  static tree handle_packed_attribute (tree *, tree, tree, int, bool *);
>> +static tree handle_targetv_attribute (tree *, tree, tree, int, bool *);
>>  static tree handle_nocommon_attribute (tree *, tree, tree, int, bool *);
>>  static tree handle_common_attribute (tree *, tree, tree, int, bool *);
>>  static tree handle_noreturn_attribute (tree *, tree, tree, int, bool *);
>> @@ -604,6 +605,8 @@ const struct attribute_spec c_common_attribute_tab
>>  {
>>   /* { name, min_len, max_len, decl_req, type_req, fn_type_req, handler,
>>        affects_type_identity } */
>> +  { "targetv",               1, -1, true, false, false,
>> +                             handle_targetv_attribute, false },
>>   { "packed",                 0, 0, false, false, false,
>>                              handle_packed_attribute , false},
>>   { "nocommon",               0, 0, true,  false, false,
>> @@ -5869,6 +5872,54 @@ handle_packed_attribute (tree *node, tree name, tr
>>   return NULL_TREE;
>>  }
>>
>> +/* The targetv attribute is used to specify a function version
>> +   targeted to specific platform types.  The "targetv" attributes
>> +   have to be valid "target" attributes.  NODE should always point
>> +   to a FUNCTION_DECL.  ARGS contain the arguments to "targetv"
>> +   which should be valid arguments to attribute "target" too.
>> +   Check handle_target_attribute for FLAGS and NO_ADD_ATTRS.  */
>> +
>> +static tree
>> +handle_targetv_attribute (tree *node, tree name,
>> +                         tree args,
>> +                         int flags,
>> +                         bool *no_add_attrs)
>> +{
>> +  const char *attr_str = NULL;
>> +  gcc_assert (TREE_CODE (*node) == FUNCTION_DECL);
>> +  gcc_assert (args != NULL);
>> +
>> +  /* This is a function version.  */
>> +  DECL_FUNCTION_VERSIONED (*node) = 1;
>> +
>> +  attr_str = TREE_STRING_POINTER (TREE_VALUE (args));
>> +
>> +  /* Check if multiple sets of target attributes are there.  This
>> +     is not supported now.   In future, this will be supported by
>> +     cloning this function for each set.  */
>> +  if (TREE_CHAIN (args) != NULL)
>> +    warning (OPT_Wattributes, "%qE attribute has multiple sets which "
>> +            "is not supported", name);
>> +
>> +  if (attr_str == NULL
>> +      || strstr (attr_str, "arch=") == NULL)
>> +    error_at (DECL_SOURCE_LOCATION (*node),
>> +             "Versioning supported only on \"arch=\" for now");
>> +
>> +  /* targetv attributes must translate into target attributes.  */
>> +  handle_target_attribute (node, get_identifier ("target"), args, flags,
>> +                          no_add_attrs);
>> +
>> +  if (*no_add_attrs)
>> +    warning (OPT_Wattributes, "%qE attribute has no effect", name);
>> +
>> +  /* This is necessary to keep the attribute tagged to the decl
>> +     all the time.  */
>> +  *no_add_attrs = false;
>> +
>> +  return NULL_TREE;
>> +}
>> +
>>  /* Handle a "nocommon" attribute; arguments as in
>>    struct attribute_spec.handler.  */
>>
>> Index: target.def
>> ===================================================================
>> --- target.def  (revision 184971)
>> +++ target.def  (working copy)
>> @@ -1249,6 +1249,15 @@ DEFHOOK
>>  tree, (tree fndecl, int n_args, tree *argp, bool ignore),
>>  hook_tree_tree_int_treep_bool_null)
>>
>> +/* Target hook to generate the dispatching code for calls to multi-versioned
>> +   functions.  DISPATCH_DECL is the function that will have the dispatching
>> +   logic.  FNDECLS is the list of choices for dispatch and EMPTY_BB is the
>> +   basic block in DISPATCH_DECL which will contain the code.  */
>> +DEFHOOK
>> +(dispatch_version,
>> + "",
>> + int, (tree dispatch_decl, void *fndecls, basic_block *empty_bb), NULL)
>> +
>>  /* Returns a code for a target-specific builtin that implements
>>    reciprocal of the function, or NULL_TREE if not available.  */
>>  DEFHOOK
>> Index: tree.h
>> ===================================================================
>> --- tree.h      (revision 184971)
>> +++ tree.h      (working copy)
>> @@ -3532,6 +3532,12 @@ extern VEC(tree, gc) **decl_debug_args_insert (tre
>>  #define DECL_FUNCTION_SPECIFIC_OPTIMIZATION(NODE) \
>>    (FUNCTION_DECL_CHECK (NODE)->function_decl.function_specific_optimization)
>>
>> +/* In FUNCTION_DECL, this is set if this function has other versions generated
>> +   using "targetv" attributes.  The default version is the one which does not
>> +   have any "targetv" attribute set. */
>> +#define DECL_FUNCTION_VERSIONED(NODE)\
>> +   (FUNCTION_DECL_CHECK (NODE)->function_decl.versioned_function)
>> +
>>  /* FUNCTION_DECL inherits from DECL_NON_COMMON because of the use of the
>>    arguments/result/saved_tree fields by front ends.   It was either inherit
>>    FUNCTION_DECL from non_common, or inherit non_common from FUNCTION_DECL,
>> @@ -3576,8 +3582,8 @@ struct GTY(()) tree_function_decl {
>>   unsigned looping_const_or_pure_flag : 1;
>>   unsigned has_debug_args_flag : 1;
>>   unsigned tm_clone_flag : 1;
>> -
>> -  /* 1 bit left */
>> +  unsigned versioned_function : 1;
>> +  /* No bits left.  */
>>  };
>>
>>  /* The source language of the translation-unit.  */
>> Index: tree-pass.h
>> ===================================================================
>> --- tree-pass.h (revision 184971)
>> +++ tree-pass.h (working copy)
>> @@ -455,6 +455,7 @@ extern struct gimple_opt_pass pass_tm_memopt;
>>  extern struct gimple_opt_pass pass_tm_edges;
>>  extern struct gimple_opt_pass pass_split_functions;
>>  extern struct gimple_opt_pass pass_feedback_split_functions;
>> +extern struct gimple_opt_pass pass_dispatch_versions;
>>
>>  /* IPA Passes */
>>  extern struct simple_ipa_opt_pass pass_ipa_lower_emutls;
>> Index: multiversion.c
>> ===================================================================
>> --- multiversion.c      (revision 0)
>> +++ multiversion.c      (revision 0)
>> @@ -0,0 +1,798 @@
>> +/* Function Multiversioning.
>> +   Copyright (C) 2012 Free Software Foundation, Inc.
>> +   Contributed by Sriraman Tallam (tmsriram@google.com)
>> +
>> +This file is part of GCC.
>> +
>> +GCC is free software; you can redistribute it and/or modify it under
>> +the terms of the GNU General Public License as published by the Free
>> +Software Foundation; either version 3, or (at your option) any later
>> +version.
>> +
>> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
>> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
>> +for more details.
>> +
>> +You should have received a copy of the GNU General Public License
>> +along with GCC; see the file COPYING3.  If not see
>> +<http://www.gnu.org/licenses/>. */
>> +
>> +/* Holds the state for multi-versioned functions here. The front-end
>> +   updates the state as and when function versions are encountered.
>> +   This is then used to generate the dispatch code.  Also, the
>> +   optimization passes to clone hot paths involving versioned functions
>> +   will be done here.
>> +
>> +   Function versions are created by using the same function signature but
>> +   also tagging attribute "targetv" to specify the platform type for which
>> +   the version must be executed.  Here is an example:
>> +
>> +   int foo ()
>> +   {
>> +     printf ("Execute as default");
>> +     return 0;
>> +   }
>> +
>> +   int  __attribute__ ((targetv ("arch=corei7")))
>> +   foo ()
>> +   {
>> +     printf ("Execute for corei7");
>> +     return 0;
>> +   }
>> +
>> +   int main ()
>> +   {
>> +     return foo ();
>> +   }
>> +
>> +   The call to foo in main is replaced with a call to an IFUNC function that
>> +   contains the dispatch code to call the correct function version at
>> +   run-time.  */
>> +
>> +
>> +#include "config.h"
>> +#include "system.h"
>> +#include "coretypes.h"
>> +#include "tm.h"
>> +#include "tree.h"
>> +#include "tree-inline.h"
>> +#include "langhooks.h"
>> +#include "flags.h"
>> +#include "cgraph.h"
>> +#include "diagnostic.h"
>> +#include "toplev.h"
>> +#include "timevar.h"
>> +#include "params.h"
>> +#include "fibheap.h"
>> +#include "intl.h"
>> +#include "tree-pass.h"
>> +#include "hashtab.h"
>> +#include "coverage.h"
>> +#include "ggc.h"
>> +#include "tree-flow.h"
>> +#include "rtl.h"
>> +#include "ipa-prop.h"
>> +#include "basic-block.h"
>> +#include "toplev.h"
>> +#include "dbgcnt.h"
>> +#include "tree-dump.h"
>> +#include "output.h"
>> +#include "vecprim.h"
>> +#include "gimple-pretty-print.h"
>> +#include "ipa-inline.h"
>> +#include "target.h"
>> +#include "multiversion.h"
>> +
>> +typedef void * void_p;
>> +
>> +DEF_VEC_P (void_p);
>> +DEF_VEC_ALLOC_P (void_p, heap);
>> +
>> +/* Each function decl that is a function version gets an instance of this
>> +   structure.   Since this is called by the front-end, decl merging can
>> +   happen, where a decl created for a new declaration is merged with
>> +   the old. In this case, the new decl is deleted and the IS_DELETED
>> +   field is set for the struct instance corresponding to the new decl.
>> +   IFUNC_DECL is the decl of the ifunc function for default decls.
>> +   IFUNC_RESOLVER_DECL is the decl of the dispatch function.  VERSIONS
>> +   is a vector containing the list of function versions  that are
>> +   the candidates for dispatch.  */
>> +
>> +typedef struct version_function_d {
>> +  tree decl;
>> +  tree ifunc_decl;
>> +  tree ifunc_resolver_decl;
>> +  VEC (void_p, heap) *versions;
>> +  bool is_deleted;
>> +} version_function;
>> +
>> +/* Hashmap has an entry for every function decl that has other function
>> +   versions.  For function decls that are the default, it also stores the
>> +   list of all the other function versions.  Each entry is a structure
>> +   of type version_function_d.  */
>> +static htab_t decl_version_htab = NULL;
>> +
>> +/* Hashtable helpers for decl_version_htab. */
>> +
>> +static hashval_t
>> +decl_version_htab_hash_descriptor (const void *p)
>> +{
>> +  const version_function *t = (const version_function *) p;
>> +  return htab_hash_pointer (t->decl);
>> +}
>> +
>> +/* Hashtable helper for decl_version_htab. */
>> +
>> +static int
>> +decl_version_htab_eq_descriptor (const void *p1, const void *p2)
>> +{
>> +  const version_function *t1 = (const version_function *) p1;
>> +  return htab_eq_pointer ((const void_p) t1->decl, p2);
>> +}
>> +
>> +/* Create the decl_version_htab.  */
>> +static void
>> +create_decl_version_htab (void)
>> +{
>> +  if (decl_version_htab == NULL)
>> +    decl_version_htab = htab_create (10, decl_version_htab_hash_descriptor,
>> +                                    decl_version_htab_eq_descriptor, NULL);
>> +}
>> +
>> +/* Creates an instance of version_function for decl DECL.  */
>> +
>> +static version_function*
>> +new_version_function (const tree decl)
>> +{
>> +  version_function *v;
>> +  v = (version_function *) xmalloc (sizeof (version_function));
>> +  v->decl = decl;
>> +  v->ifunc_decl = NULL;
>> +  v->ifunc_resolver_decl = NULL;
>> +  v->versions = NULL;
>> +  v->is_deleted = false;
>> +  return v;
>> +}
>> +
>> +/* Comparator function to be used in qsort routine to sort attribute
>> +   specification strings to "targetv".  */
>> +
>> +static int
>> +attr_strcmp (const void *v1, const void *v2)
>> +{
>> +  const char *c1 = *(char *const*)v1;
>> +  const char *c2 = *(char *const*)v2;
>> +  return strcmp (c1, c2);
>> +}
>> +
>> +/* STR is the argument to targetv attribute.  This function tokenizes
>> +   the comma separated arguments, sorts them and returns a string which
>> +   is a unique identifier for the comma separated arguments.  */
>> +
>> +static char *
>> +sorted_attr_string (const char *str)
>> +{
>> +  char **args = NULL;
>> +  char *attr_str, *ret_str;
>> +  char *attr = NULL;
>> +  unsigned int argnum = 1;
>> +  unsigned int i;
>> +
>> +  for (i = 0; i < strlen (str); i++)
>> +    if (str[i] == ',')
>> +      argnum++;
>> +
>> +  attr_str = (char *)xmalloc (strlen (str) + 1);
>> +  strcpy (attr_str, str);
>> +
>> +  for (i = 0; i < strlen (attr_str); i++)
>> +    if (attr_str[i] == '=')
>> +      attr_str[i] = '_';
>> +
>> +  if (argnum == 1)
>> +    return attr_str;
>> +
>> +  args = (char **)xmalloc (argnum * sizeof (char *));
>> +
>> +  i = 0;
>> +  attr = strtok (attr_str, ",");
>> +  while (attr != NULL)
>> +    {
>> +      args[i] = attr;
>> +      i++;
>> +      attr = strtok (NULL, ",");
>> +    }
>> +
>> +  qsort (args, argnum, sizeof (char*), attr_strcmp);
>> +
>> +  ret_str = (char *)xmalloc (strlen (str) + 1);
>> +  strcpy (ret_str, args[0]);
>> +  for (i = 1; i < argnum; i++)
>> +    {
>> +      strcat (ret_str, "_");
>> +      strcat (ret_str, args[i]);
>> +    }
>> +
>> +  free (args);
>> +  free (attr_str);
>> +  return ret_str;
>> +}
>> +
>> +/* Returns true when only one of DECL1 and DECL2 is marked with "targetv"
>> +   or if the "targetv" attribute strings of DECL1 and DECL2 don't match.  */
>> +
>> +bool
>> +has_different_version_attributes (const tree decl1, const tree decl2)
>> +{
>> +  tree attr1, attr2;
>> +  char *c1, *c2;
>> +  bool ret = false;
>> +
>> +  if (TREE_CODE (decl1) != FUNCTION_DECL
>> +      || TREE_CODE (decl2) != FUNCTION_DECL)
>> +    return false;
>> +
>> +  attr1 = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl1));
>> +  attr2 = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl2));
>> +
>> +  if (attr1 == NULL_TREE && attr2 == NULL_TREE)
>> +    return false;
>> +
>> +  if ((attr1 == NULL_TREE && attr2 != NULL_TREE)
>> +      || (attr1 != NULL_TREE && attr2 == NULL_TREE))
>> +    return true;
>> +
>> +  c1 = sorted_attr_string (
>> +       TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr1))));
>> +  c2 = sorted_attr_string (
>> +       TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr2))));
>> +
>> +  if (strcmp (c1, c2) != 0)
>> +     ret = true;
>> +
>> +  free (c1);
>> +  free (c2);
>> +
>> +  return ret;
>> +}
>> +
>> +/* If this decl corresponds to a function and has "targetv" attribute,
>> +   append the attribute string to its assembler name.  */
>> +
>> +void
>> +version_assembler_name (const tree decl)
>> +{
>> +  tree version_attr;
>> +  const char *orig_name, *version_string, *attr_str;
>> +  char *assembler_name;
>> +  tree assembler_name_tree;
>> +
>> +  if (TREE_CODE (decl) != FUNCTION_DECL
>> +      || DECL_ASSEMBLER_NAME_SET_P (decl)
>> +      || !DECL_FUNCTION_VERSIONED (decl))
>> +    return;
>> +
>> +  if (DECL_DECLARED_INLINE_P (decl)
>> +      && lookup_attribute ("gnu_inline",
>> +                         DECL_ATTRIBUTES (decl)))
>> +    error_at (DECL_SOURCE_LOCATION (decl),
>> +             "Function versions cannot be marked as gnu_inline,"
>> +             " bodies have to be generated\n");
>> +
>> +  if (DECL_VIRTUAL_P (decl)
>> +      || DECL_VINDEX (decl))
>> +    error_at (DECL_SOURCE_LOCATION (decl),
>> +             "Virtual function versioning not supported\n");
>> +
>> +  version_attr = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl));
>> +  /* targetv attribute string is NULL for default functions.  */
>> +  if (version_attr == NULL_TREE)
>> +    return;
>> +
>> +  orig_name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
>> +  version_string
>> +    = TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (version_attr)));
>> +
>> +  attr_str = sorted_attr_string (version_string);
>> +  assembler_name = (char *) xmalloc (strlen (orig_name)
>> +                                    + strlen (attr_str) + 2);
>> +
>> +  sprintf (assembler_name, "%s.%s", orig_name, attr_str);
>> +  if (dump_file)
>> +    fprintf (dump_file, "Assembler name set to %s for function version %s\n",
>> +            assembler_name, IDENTIFIER_POINTER (DECL_NAME (decl)));
>> +  assembler_name_tree = get_identifier (assembler_name);
>> +  SET_DECL_ASSEMBLER_NAME (decl, assembler_name_tree);
>> +}
>> +
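
For illustration, a minimal stand-alone sketch of the "%s.%s" naming scheme
that version_assembler_name applies. The helper name make_versioned_name is
hypothetical, the attribute string is used as passed in (the patch first
normalizes it with sorted_attr_string, which is not reproduced here), and
plain malloc stands in for xmalloc.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper mirroring the "%s.%s" scheme used by
   version_assembler_name.  Returns a malloc'ed string that the caller
   must free, or NULL if allocation fails.  */
char *
make_versioned_name (const char *orig_name, const char *attr_str)
{
  size_t len;
  char *name;

  assert (orig_name != NULL && attr_str != NULL);

  /* One byte for the '.' separator and one for the terminating NUL.  */
  len = strlen (orig_name) + strlen (attr_str) + 2;
  name = (char *) malloc (len);
  if (name == NULL)
    return NULL;

  snprintf (name, len, "%s.%s", orig_name, attr_str);
  return name;
}
```

With the C++ mangled name of foo () and the corei7 specifier, this would
yield a symbol such as _Z3foov.arch=corei7, so each version gets a distinct
assembler name while the DECL names stay identical.
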
>> +/* Returns true if DECL is multi-versioned and is the default function,
>> +   that is, it is not tagged with the "targetv" attribute.  */
>> +
>> +bool
>> +is_default_function (const tree decl)
>> +{
>> +  return (TREE_CODE (decl) == FUNCTION_DECL
>> +         && DECL_FUNCTION_VERSIONED (decl)
>> +         && (lookup_attribute ("targetv", DECL_ATTRIBUTES (decl))
>> +             == NULL_TREE));
>> +}
>> +
>> +/* For function decl DECL, find the version_function struct in the
>> +   decl_version_htab.  */
>> +
>> +static version_function *
>> +find_function_version (const tree decl)
>> +{
>> +  void *slot;
>> +
>> +  if (!DECL_FUNCTION_VERSIONED (decl))
>> +    return NULL;
>> +
>> +  if (!decl_version_htab)
>> +    return NULL;
>> +
>> +  slot = htab_find_with_hash (decl_version_htab, decl,
>> +                              htab_hash_pointer (decl));
>> +
>> +  if (slot != NULL)
>> +    return (version_function *)slot;
>> +
>> +  return NULL;
>> +}
>> +
>> +/* Record DECL as a function version by creating a version_function struct
>> +   for it and storing it in the hashtable.  */
>> +
>> +static version_function *
>> +add_function_version (const tree decl)
>> +{
>> +  void **slot;
>> +  version_function *v;
>> +
>> +  if (!DECL_FUNCTION_VERSIONED (decl))
>> +    return NULL;
>> +
>> +  create_decl_version_htab ();
>> +
>> +  slot = htab_find_slot_with_hash (decl_version_htab, (const void_p)decl,
>> +                                   htab_hash_pointer ((const void_p)decl),
>> +                                  INSERT);
>> +
>> +  if (*slot != NULL)
>> +    return (version_function *)*slot;
>> +
>> +  v = new_version_function (decl);
>> +  *slot = v;
>> +
>> +  return v;
>> +}
>> +
>> +/* Push V into VEC only if it is not already present.  */
>> +
>> +static void
>> +push_function_version (version_function *v, VEC (void_p, heap) *vec)
>> +{
>> +  int ix;
>> +  void_p ele;
>> +  for (ix = 0; VEC_iterate (void_p, vec, ix, ele); ++ix)
>> +    {
>> +      if (ele == (void_p)v)
>> +        return;
>> +    }
>> +
>> +  VEC_safe_push (void_p, heap, vec, (void*)v);
>> +}
>> +
>> +/* Mark DECL as deleted.  This is called by the front-end when a duplicate
>> +   decl is merged with the original decl and the duplicate decl is deleted.
>> +   This function marks the duplicate_decl as invalid.  Called by
>> +   duplicate_decls in cp/decl.c.  */
>> +
>> +void
>> +mark_delete_decl_version (const tree decl)
>> +{
>> +  version_function *decl_v;
>> +
>> +  decl_v = find_function_version (decl);
>> +
>> +  if (decl_v == NULL)
>> +    return;
>> +
>> +  decl_v->is_deleted = true;
>> +
>> +  if (is_default_function (decl)
>> +      && decl_v->versions != NULL)
>> +    {
>> +      VEC_truncate (void_p, decl_v->versions, 0);
>> +      VEC_free (void_p, heap, decl_v->versions);
>> +    }
>> +}
>> +
>> +/* Mark DECL1 and DECL2 to be function versions in the same group.  One
>> +   of DECL1 and DECL2 must be the default, otherwise this function does
>> +   nothing.  This function aggregates the versions.  */
>> +
>> +int
>> +group_function_versions (const tree decl1, const tree decl2)
>> +{
>> +  tree default_decl, version_decl;
>> +  version_function *default_v, *version_v;
>> +
>> +  gcc_assert (DECL_FUNCTION_VERSIONED (decl1)
>> +             && DECL_FUNCTION_VERSIONED (decl2));
>> +
>> +  /* The version decls are added only to the default decl.  */
>> +  if (!is_default_function (decl1)
>> +      && !is_default_function (decl2))
>> +    return 0;
>> +
>> +  /* This can happen with duplicate declarations.  Just ignore.  */
>> +  if (is_default_function (decl1)
>> +      && is_default_function (decl2))
>> +    return 0;
>> +
>> +  default_decl = (is_default_function (decl1)) ? decl1 : decl2;
>> +  version_decl = (default_decl == decl1) ? decl2 : decl1;
>> +
>> +  gcc_assert (default_decl != version_decl);
>> +  create_decl_version_htab ();
>> +
>> +  /* If the version function is found, it has been added.  */
>> +  if (find_function_version (version_decl))
>> +    return 0;
>> +
>> +  default_v = add_function_version (default_decl);
>> +  version_v = add_function_version (version_decl);
>> +
>> +  if (default_v->versions == NULL)
>> +    default_v->versions = VEC_alloc (void_p, heap, 1);
>> +
>> +  push_function_version (version_v, default_v->versions);
>> +  return 0;
>> +}
>> +
>> +/* Makes a function attribute of the form NAME(ARG_NAME) and chains
>> +   it to CHAIN.  */
>> +
>> +static tree
>> +make_attribute (const char *name, const char *arg_name, tree chain)
>> +{
>> +  tree attr_name;
>> +  tree attr_arg_name;
>> +  tree attr_args;
>> +  tree attr;
>> +
>> +  attr_name = get_identifier (name);
>> +  attr_arg_name = build_string (strlen (arg_name), arg_name);
>> +  attr_args = tree_cons (NULL_TREE, attr_arg_name, NULL_TREE);
>> +  attr = tree_cons (attr_name, attr_args, chain);
>> +  return attr;
>> +}
>> +
>> +/* Return a new name by appending SUFFIX to the DECL name.  If
>> +   make_unique is true, append the full path name.  */
>> +
>> +static char *
>> +make_name (tree decl, const char *suffix, bool make_unique)
>> +{
>> +  char *global_var_name;
>> +  int name_len;
>> +  const char *name;
>> +  const char *unique_name = NULL;
>> +
>> +  name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
>> +
>> +  /* Get a unique name that can be used globally without any chances
>> +     of collision at link time.  */
>> +  if (make_unique)
>> +    unique_name = IDENTIFIER_POINTER (get_file_function_name ("\0"));
>> +
>> +  name_len = strlen (name) + strlen (suffix) + 2;
>> +
>> +  if (make_unique)
>> +    name_len += strlen (unique_name) + 1;
>> +  global_var_name = (char *) xmalloc (name_len);
>> +
>> +  /* Use '.' to concatenate names as it is demangler friendly.  */
>> +  if (make_unique)
>> +      snprintf (global_var_name, name_len, "%s.%s.%s", name,
>> +               unique_name, suffix);
>> +  else
>> +      snprintf (global_var_name, name_len, "%s.%s", name, suffix);
>> +
>> +  return global_var_name;
>> +}
>> +
>> +/* Make the resolver function decl for ifunc (IFUNC_DECL) to dispatch
>> +   the versions of multi-versioned function DEFAULT_DECL.  Create an
>> +   empty basic block in the resolver and store the pointer in
>> +   EMPTY_BB.  Return the decl of the resolver function.  */
>> +
>> +static tree
>> +make_ifunc_resolver_func (const tree default_decl,
>> +                         const tree ifunc_decl,
>> +                         basic_block *empty_bb)
>> +{
>> +  char *resolver_name;
>> +  tree decl, type, decl_name, t;
>> +  basic_block new_bb;
>> +  tree old_current_function_decl;
>> +  bool make_unique = false;
>> +
>> +  /* IFUNCs have to be globally visible.  So, if the default_decl is
>> +     not, then the name of the IFUNC should be made unique.  */
>> +  if (TREE_PUBLIC (default_decl) == 0)
>> +    make_unique = true;
>> +
>> +  /* Append the filename to the resolver function if the versions are
>> +     not externally visible.  This is because the resolver function has
>> +     to be externally visible for the loader to find it.  So, appending
>> +     the filename will prevent conflicts with a resolver function from
>> +     another module which is based on the same version name.  */
>> +  resolver_name = make_name (default_decl, "resolver", make_unique);
>> +
>> +  /* The resolver function should return a (void *). */
>> +  type = build_function_type_list (ptr_type_node, NULL_TREE);
>> +
>> +  decl = build_fn_decl (resolver_name, type);
>> +  decl_name = get_identifier (resolver_name);
>> +  SET_DECL_ASSEMBLER_NAME (decl, decl_name);
>> +
>> +  DECL_NAME (decl) = decl_name;
>> +  TREE_USED (decl) = TREE_USED (default_decl);
>> +  DECL_ARTIFICIAL (decl) = 1;
>> +  DECL_IGNORED_P (decl) = 0;
>> +  /* IFUNC resolvers have to be externally visible.  */
>> +  TREE_PUBLIC (decl) = 1;
>> +  DECL_UNINLINABLE (decl) = 1;
>> +
>> +  DECL_EXTERNAL (decl) = DECL_EXTERNAL (default_decl);
>> +  DECL_EXTERNAL (ifunc_decl) = 0;
>> +
>> +  DECL_CONTEXT (decl) = NULL_TREE;
>> +  DECL_INITIAL (decl) = make_node (BLOCK);
>> +  DECL_STATIC_CONSTRUCTOR (decl) = 0;
>> +  TREE_READONLY (decl) = 0;
>> +  DECL_PURE_P (decl) = 0;
>> +  DECL_COMDAT (decl) = DECL_COMDAT (default_decl);
>> +  if (DECL_COMDAT_GROUP (default_decl))
>> +    {
>> +      make_decl_one_only (decl, DECL_COMDAT_GROUP (default_decl));
>> +    }
>> +  /* Build result decl and add to function_decl. */
>> +  t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, ptr_type_node);
>> +  DECL_ARTIFICIAL (t) = 1;
>> +  DECL_IGNORED_P (t) = 1;
>> +  DECL_RESULT (decl) = t;
>> +
>> +  gimplify_function_tree (decl);
>> +  old_current_function_decl = current_function_decl;
>> +  push_cfun (DECL_STRUCT_FUNCTION (decl));
>> +  current_function_decl = decl;
>> +  init_empty_tree_cfg_for_function (DECL_STRUCT_FUNCTION (decl));
>> +  cfun->curr_properties |=
>> +    (PROP_gimple_lcf | PROP_gimple_leh | PROP_cfg | PROP_referenced_vars |
>> +     PROP_ssa);
>> +  new_bb = create_empty_bb (ENTRY_BLOCK_PTR);
>> +  make_edge (ENTRY_BLOCK_PTR, new_bb, EDGE_FALLTHRU);
>> +  make_edge (new_bb, EXIT_BLOCK_PTR, 0);
>> +  *empty_bb = new_bb;
>> +
>> +  cgraph_add_new_function (decl, true);
>> +  cgraph_call_function_insertion_hooks (cgraph_get_create_node (decl));
>> +  cgraph_analyze_function (cgraph_get_create_node (decl));
>> +  cgraph_mark_needed_node (cgraph_get_create_node (decl));
>> +
>> +  if (DECL_COMDAT_GROUP (default_decl))
>> +    {
>> +      gcc_assert (cgraph_get_node (default_decl));
>> +      cgraph_add_to_same_comdat_group (cgraph_get_node (decl),
>> +                                      cgraph_get_node (default_decl));
>> +    }
>> +
>> +  pop_cfun ();
>> +  current_function_decl = old_current_function_decl;
>> +
>> +  gcc_assert (ifunc_decl != NULL);
>> +  DECL_ATTRIBUTES (ifunc_decl)
>> +    = make_attribute ("ifunc", resolver_name, DECL_ATTRIBUTES (ifunc_decl));
>> +  assemble_alias (ifunc_decl, get_identifier (resolver_name));
>> +  return decl;
>> +}
>> +
>> +/* Make an ifunc declaration for the multi-versioned function DECL.  Calls to
>> +   DECL function will be replaced with calls to the ifunc.  Return the decl
>> +   of the ifunc created.  */
>> +
>> +static tree
>> +make_ifunc_func (const tree decl)
>> +{
>> +  tree ifunc_decl;
>> +  char *ifunc_name, *resolver_name;
>> +  tree fn_type, ifunc_type;
>> +  bool make_unique = false;
>> +
>> +  if (TREE_PUBLIC (decl) == 0)
>> +    make_unique = true;
>> +
>> +  ifunc_name = make_name (decl, "ifunc", make_unique);
>> +  resolver_name = make_name (decl, "resolver", make_unique);
>> +  gcc_assert (resolver_name);
>> +
>> +  fn_type = TREE_TYPE (decl);
>> +  ifunc_type = build_function_type (TREE_TYPE (fn_type),
>> +                                   TYPE_ARG_TYPES (fn_type));
>> +
>> +  ifunc_decl = build_fn_decl (ifunc_name, ifunc_type);
>> +  TREE_USED (ifunc_decl) = 1;
>> +  DECL_CONTEXT (ifunc_decl) = NULL_TREE;
>> +  DECL_INITIAL (ifunc_decl) = error_mark_node;
>> +  DECL_ARTIFICIAL (ifunc_decl) = 1;
>> +  /* Mark this ifunc as external; the resolver will flip it again if
>> +     it gets generated.  */
>> +  DECL_EXTERNAL (ifunc_decl) = 1;
>> +  /* IFUNCs have to be externally visible.  */
>> +  TREE_PUBLIC (ifunc_decl) = 1;
>> +
>> +  return ifunc_decl;
>> +}
>> +
>> +/* For multi-versioned function DECL, which should also be the default,
>> +   return the decl of the ifunc, creating it if it does not
>> +   exist.  */
>> +
>> +tree
>> +get_ifunc_for_version (const tree decl)
>> +{
>> +  version_function *decl_v;
>> +  int ix;
>> +  void_p ele;
>> +
>> +  /* DECL has to be the default version, otherwise it is missing and
>> +     that is not allowed.  */
>> +  if (!is_default_function (decl))
>> +    {
>> +      error_at (DECL_SOURCE_LOCATION (decl), "default version not found");
>> +      return decl;
>> +    }
>> +
>> +  decl_v = find_function_version (decl);
>> +  gcc_assert (decl_v != NULL);
>> +  if (decl_v->ifunc_decl == NULL)
>> +    {
>> +      tree ifunc_decl;
>> +      ifunc_decl = make_ifunc_func (decl);
>> +      decl_v->ifunc_decl = ifunc_decl;
>> +    }
>> +
>> +  if (cgraph_get_node (decl))
>> +    cgraph_mark_needed_node (cgraph_get_node (decl));
>> +
>> +  for (ix = 0; VEC_iterate (void_p, decl_v->versions, ix, ele); ++ix)
>> +    {
>> +      version_function *v = (version_function *) ele;
>> +      gcc_assert (v->decl != NULL);
>> +      if (cgraph_get_node (v->decl))
>> +       cgraph_mark_needed_node (cgraph_get_node (v->decl));
>> +    }
>> +
>> +  return decl_v->ifunc_decl;
>> +}
>> +
>> +/* Generate the dispatching code to dispatch multi-versioned function
>> +   DECL.  Make a new function decl for dispatching and call the target
>> +   hook to process the "targetv" attributes and provide the code to
>> +   dispatch the right function at run-time.  */
>> +
>> +static tree
>> +make_ifunc_resolver_for_version (const tree decl)
>> +{
>> +  version_function *decl_v;
>> +  tree ifunc_resolver_decl, ifunc_decl;
>> +  basic_block empty_bb;
>> +  int ix;
>> +  void_p ele;
>> +  VEC (tree, heap) *fn_ver_vec = NULL;
>> +
>> +  gcc_assert (is_default_function (decl));
>> +
>> +  decl_v = find_function_version (decl);
>> +  gcc_assert (decl_v != NULL);
>> +
>> +  if (decl_v->ifunc_resolver_decl != NULL)
>> +    return decl_v->ifunc_resolver_decl;
>> +
>> +  ifunc_decl = decl_v->ifunc_decl;
>> +
>> +  if (ifunc_decl == NULL)
>> +    ifunc_decl = decl_v->ifunc_decl = make_ifunc_func (decl);
>> +
>> +  ifunc_resolver_decl = make_ifunc_resolver_func (decl, ifunc_decl,
>> +                                                 &empty_bb);
>> +
>> +  fn_ver_vec = VEC_alloc (tree, heap, 2);
>> +  VEC_safe_push (tree, heap, fn_ver_vec, decl);
>> +
>> +  for (ix = 0; VEC_iterate (void_p, decl_v->versions, ix, ele); ++ix)
>> +    {
>> +      version_function *v = (version_function *) ele;
>> +      gcc_assert (v->decl != NULL);
>> +      /* Check for virtual functions here again, as by this time it should
>> +        have been determined if this function needs a vtable index or
>> +        not.  This happens for methods in derived classes that override
>> +        virtual methods in base classes but are not explicitly marked as
>> +        virtual.  */
>> +      if (DECL_VINDEX (v->decl))
>> +        error_at (DECL_SOURCE_LOCATION (v->decl),
>> +                 "virtual function versioning not supported");
>> +      if (!v->is_deleted)
>> +       VEC_safe_push (tree, heap, fn_ver_vec, v->decl);
>> +    }
>> +
>> +  gcc_assert (targetm.dispatch_version);
>> +  targetm.dispatch_version (ifunc_resolver_decl, fn_ver_vec, &empty_bb);
>> +  decl_v->ifunc_resolver_decl = ifunc_resolver_decl;
>> +
>> +  return ifunc_resolver_decl;
>> +}
>> +
>> +/* Main entry point to pass_dispatch_versions. For multi-versioned functions,
>> +   generate the dispatching code.  */
>> +
>> +static unsigned int
>> +do_dispatch_versions (void)
>> +{
>> +  /* A new pass for generating dispatch code for multi-versioned functions.
>> +     Other forms of dispatch can be added when ifunc support is not available
>> +     like just calling the function directly after checking for target type.
>> +     Currently, dispatching is done through IFUNC.  This pass will become
>> +     more meaningful when other dispatch mechanisms are added.  */
>> +
>> +  /* Cloning a function to produce more versions will happen here when the
>> +     user requests that via the targetv attribute. For example,
>> +     int foo () __attribute__ ((targetv(("arch=core2"), ("arch=corei7"))));
>> +     means that the user wants the same body of foo to be versioned for core2
>> +     and corei7.  In that case, this function will be cloned during this
>> +     pass.  */
>> +
>> +  if (DECL_FUNCTION_VERSIONED (current_function_decl)
>> +      && is_default_function (current_function_decl))
>> +    {
>> +      tree decl = make_ifunc_resolver_for_version (current_function_decl);
>> +      if (dump_file && decl)
>> +       dump_function_to_file (decl, dump_file, TDF_BLOCKS);
>> +    }
>> +  return 0;
>> +}
>> +
>> +static bool
>> +gate_dispatch_versions (void)
>> +{
>> +  return true;
>> +}
>> +
>> +/* A pass to generate the dispatch code to execute the appropriate version
>> +   of a multi-versioned function at run-time.  */
>> +
>> +struct gimple_opt_pass pass_dispatch_versions =
>> +{
>> + {
>> +  GIMPLE_PASS,
>> +  "dispatch_multiversion_functions",    /* name */
>> +  gate_dispatch_versions,              /* gate */
>> +  do_dispatch_versions,                        /* execute */
>> +  NULL,                                        /* sub */
>> +  NULL,                                        /* next */
>> +  0,                                   /* static_pass_number */
>> +  TV_MULTIVERSION_DISPATCH,            /* tv_id */
>> +  PROP_cfg,                            /* properties_required */
>> +  PROP_cfg,                            /* properties_provided */
>> +  0,                                   /* properties_destroyed */
>> +  0,                                   /* todo_flags_start */
>> +  TODO_dump_func |                     /* todo_flags_finish */
>> +  TODO_cleanup_cfg | TODO_dump_cgraph
>> + }
>> +};
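
A plain-C analogue of what the generated resolver amounts to, for
illustration only. It is not IFUNC: selection below goes through a function
pointer cached on the first call, whereas with IFUNC the dynamic loader runs
the resolver when it binds the symbol. All names here (foo_default,
resolve_foo, cpu_kind and so on) are made up for the sketch, and the CPU kind
is passed in as a parameter rather than detected.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*foo_fn) (void);

/* Hypothetical versions; in the patch these would all be named foo.  */
int foo_default (void) { return 0; }
int foo_corei7 (void) { return 1; }
int foo_core2 (void) { return 2; }

/* Hypothetical CPU identifiers; a real resolver would query the
   hardware instead of taking a parameter.  */
enum cpu_kind { CPU_OTHER, CPU_CORE2, CPU_COREI7 };

/* Plays the role of the resolver: pick one version for this CPU,
   falling back to the default.  */
foo_fn
resolve_foo (enum cpu_kind cpu)
{
  switch (cpu)
    {
    case CPU_COREI7:
      return foo_corei7;
    case CPU_CORE2:
      return foo_core2;
    default:
      return foo_default;
    }
}

static foo_fn foo_cached;  /* Filled in on the first call.  */

/* Plays the role of the ifunc symbol: callers and address-takers see
   this one function, whichever version ends up running.  */
int
foo_dispatch (enum cpu_kind cpu)
{
  if (foo_cached == NULL)
    foo_cached = resolve_foo (cpu);
  assert (foo_cached != NULL);
  return foo_cached ();
}
```

The point of the single entry point is the same as in the patch: direct
calls and calls through a pointer both go to one address, and the choice of
version is made once rather than at every call.
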
>> Index: cgraphunit.c
>> ===================================================================
>> --- cgraphunit.c        (revision 184971)
>> +++ cgraphunit.c        (working copy)
>> @@ -141,6 +141,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "ipa-inline.h"
>>  #include "ipa-utils.h"
>>  #include "lto-streamer.h"
>> +#include "multiversion.h"
>>
>>  static void cgraph_expand_all_functions (void);
>>  static void cgraph_mark_functions_to_output (void);
>> @@ -343,6 +344,13 @@ cgraph_finalize_function (tree decl, bool nested)
>>       node->local.redefined_extern_inline = true;
>>     }
>>
>> +  /* If this is a function version and not the default, change the
>> +     assembler name of this function.  The DECL names of function
>> +     versions are the same, only the assembler names are made unique.
>> +     The assembler name is changed by appending the string from
>> +     the "targetv" attribute.  */
>> +  version_assembler_name (decl);
>> +
>>   notice_global_symbol (decl);
>>   node->local.finalized = true;
>>   node->lowered = DECL_STRUCT_FUNCTION (decl)->cfg != NULL;
>> Index: multiversion.h
>> ===================================================================
>> --- multiversion.h      (revision 0)
>> +++ multiversion.h      (revision 0)
>> @@ -0,0 +1,52 @@
>> +/* Function Multiversioning.
>> +   Copyright (C) 2012 Free Software Foundation, Inc.
>> +   Contributed by Sriraman Tallam (tmsriram@google.com)
>> +
>> +This file is part of GCC.
>> +
>> +GCC is free software; you can redistribute it and/or modify it under
>> +the terms of the GNU General Public License as published by the Free
>> +Software Foundation; either version 3, or (at your option) any later
>> +version.
>> +
>> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
>> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
>> +for more details.
>> +
>> +You should have received a copy of the GNU General Public License
>> +along with GCC; see the file COPYING3.  If not see
>> +<http://www.gnu.org/licenses/>. */
>> +
>> +/* This is the header file which provides the functions to keep track
>> +   of functions that are multi-versioned and to generate the dispatch
>> +   code to call the right version at run-time.  */
>> +
>> +#ifndef GCC_MULTIVERSION_H
>> +#define GCC_MULTIVERSION_H
>> +
>> +#include "tree.h"
>> +
>> +/* Mark DECL1 and DECL2 as function versions.  */
>> +int group_function_versions (const tree decl1, const tree decl2);
>> +
>> +/* Mark DECL as deleted and no longer a version.  */
>> +void mark_delete_decl_version (const tree decl);
>> +
>> +/* Returns true if DECL is the default version to be executed if all
>> +   other versions are inappropriate at run-time.  */
>> +bool is_default_function (const tree decl);
>> +
>> +/* Gets the IFUNC dispatcher for this multi-versioned function DECL. DECL
>> +   must be the default function in the multi-versioned group.  */
>> +tree get_ifunc_for_version (const tree decl);
>> +
>> +/* Returns true when only one of DECL1 and DECL2 is marked with "targetv"
>> +   or if the "targetv" attribute strings of DECL1 and DECL2 don't match.  */
>> +bool has_different_version_attributes (const tree decl1, const tree decl2);
>> +
>> +/* If DECL is a function version and not the default version, the assembler
>> +   name of DECL is changed to include the attribute string to keep the
>> +   name unambiguous.  */
>> +void version_assembler_name (const tree decl);
>> +#endif
>> Index: cp/class.c
>> ===================================================================
>> --- cp/class.c  (revision 184971)
>> +++ cp/class.c  (working copy)
>> @@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "tree-dump.h"
>>  #include "splay-tree.h"
>>  #include "pointer-set.h"
>> +#include "multiversion.h"
>>
>>  /* The number of nested classes being processed.  If we are not in the
>>    scope of any class, this is zero.  */
>> @@ -1092,7 +1093,20 @@ add_method (tree type, tree method, tree using_dec
>>              || same_type_p (TREE_TYPE (fn_type),
>>                              TREE_TYPE (method_type))))
>>        {
>> -         if (using_decl)
>> +         /* For function versions, their parms and types match
>> +            but they are not duplicates.  Record function versions
>> +            as and when they are found.  */
>> +         if (TREE_CODE (fn) == FUNCTION_DECL
>> +             && TREE_CODE (method) == FUNCTION_DECL
>> +             && (DECL_FUNCTION_VERSIONED (fn)
>> +                 || DECL_FUNCTION_VERSIONED (method)))
>> +           {
>> +             DECL_FUNCTION_VERSIONED (fn) = 1;
>> +             DECL_FUNCTION_VERSIONED (method) = 1;
>> +             group_function_versions (fn, method);
>> +             continue;
>> +           }
>> +         else if (using_decl)
>>            {
>>              if (DECL_CONTEXT (fn) == type)
>>                /* Defer to the local function.  */
>> @@ -1150,6 +1164,13 @@ add_method (tree type, tree method, tree using_dec
>>   else
>>     /* Replace the current slot.  */
>>     VEC_replace (tree, method_vec, slot, overload);
>> +
>> +  /* Change the assembler name of method here if it has "targetv"
>> +     attributes.  Since all versions have the same mangled name,
>> +     their assembler name is changed by appending the string from
>> +     the "targetv" attribute. */
>> +  version_assembler_name (method);
>> +
>>   return true;
>>  }
>>
>> @@ -6890,8 +6911,11 @@ resolve_address_of_overloaded_function (tree targe
>>          if (DECL_ANTICIPATED (fn))
>>            continue;
>>
>> -         /* See if there's a match.  */
>> -         if (same_type_p (target_fn_type, static_fn_type (fn)))
>> +         /* See if there's a match.  For functions that are multi-versioned,
>> +            match only the default function.  */
>> +         if (same_type_p (target_fn_type, static_fn_type (fn))
>> +             && (!DECL_FUNCTION_VERSIONED (fn)
>> +                 || is_default_function (fn)))
>>            matches = tree_cons (fn, NULL_TREE, matches);
>>        }
>>     }
>> @@ -7053,6 +7077,21 @@ resolve_address_of_overloaded_function (tree targe
>>       perform_or_defer_access_check (access_path, fn, fn);
>>     }
>>
>> +  /* If a pointer to a function that is multi-versioned is requested, the
>> +     pointer to the dispatcher function is returned instead.  This works
>> +     well because indirectly calling the function will dispatch the right
>> +     function version at run-time. Also, the function address is kept
>> +     unique.  */
>> +  if (DECL_FUNCTION_VERSIONED (fn)
>> +      && is_default_function (fn))
>> +    {
>> +      tree ifunc_decl;
>> +      ifunc_decl = get_ifunc_for_version (fn);
>> +      gcc_assert (ifunc_decl != NULL);
>> +      mark_used (fn);
>> +      return build_fold_addr_expr (ifunc_decl);
>> +    }
>> +
>>   if (TYPE_PTRFN_P (target_type) || TYPE_PTRMEMFUNC_P (target_type))
>>     return cp_build_addr_expr (fn, flags);
>>   else
>> Index: cp/decl.c
>> ===================================================================
>> --- cp/decl.c   (revision 184971)
>> +++ cp/decl.c   (working copy)
>> @@ -54,6 +54,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "pointer-set.h"
>>  #include "splay-tree.h"
>>  #include "plugin.h"
>> +#include "multiversion.h"
>>
>>  /* Possible cases of bad specifiers type used by bad_specifiers. */
>>  enum bad_spec_place {
>> @@ -972,6 +973,23 @@ decls_match (tree newdecl, tree olddecl)
>>       if (t1 != t2)
>>        return 0;
>>
>> +      /* The decls don't match if they correspond to two different versions
>> +        of the same function.  */
>> +      if (compparms (p1, p2)
>> +         && same_type_p (TREE_TYPE (f1), TREE_TYPE (f2))
>> +         && (DECL_FUNCTION_VERSIONED (newdecl)
>> +             || DECL_FUNCTION_VERSIONED (olddecl))
>> +         && has_different_version_attributes (newdecl, olddecl))
>> +       {
>> +         /* One of the decls could be the default without the "targetv"
>> +            attribute. Set it to be a versioned function here.  */
>> +         DECL_FUNCTION_VERSIONED (newdecl) = 1;
>> +         DECL_FUNCTION_VERSIONED (olddecl) = 1;
>> +         /* Accumulate all the versions of a function.  */
>> +         group_function_versions (olddecl, newdecl);
>> +         return 0;
>> +       }
>> +
>>       if (CP_DECL_CONTEXT (newdecl) != CP_DECL_CONTEXT (olddecl)
>>          && ! (DECL_EXTERN_C_P (newdecl)
>>                && DECL_EXTERN_C_P (olddecl)))
>> @@ -1482,7 +1500,11 @@ duplicate_decls (tree newdecl, tree olddecl, bool
>>              error ("previous declaration %q+#D here", olddecl);
>>              return NULL_TREE;
>>            }
>> -         else if (compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)),
>> +         /* For function versions, params and types match, but they
>> +            are not ambiguous.  */
>> +         else if ((!DECL_FUNCTION_VERSIONED (newdecl)
>> +                   && !DECL_FUNCTION_VERSIONED (olddecl))
>> +                  && compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)),
>>                              TYPE_ARG_TYPES (TREE_TYPE (olddecl))))
>>            {
>>              error ("new declaration %q#D", newdecl);
>> @@ -2250,6 +2272,16 @@ duplicate_decls (tree newdecl, tree olddecl, bool
>>   else if (DECL_PRESERVE_P (newdecl))
>>     DECL_PRESERVE_P (olddecl) = 1;
>>
>> +  /* If the olddecl is a version, so is the newdecl.  */
>> +  if (TREE_CODE (newdecl) == FUNCTION_DECL
>> +      && DECL_FUNCTION_VERSIONED (olddecl))
>> +    {
>> +      DECL_FUNCTION_VERSIONED (newdecl) = 1;
>> +      /* Record that newdecl is not a valid version and has
>> +        been deleted.  */
>> +      mark_delete_decl_version (newdecl);
>> +    }
>> +
>>   if (TREE_CODE (newdecl) == FUNCTION_DECL)
>>     {
>>       int function_size;
>> @@ -4512,6 +4544,10 @@ start_decl (const cp_declarator *declarator,
>>   /* Enter this declaration into the symbol table.  */
>>   decl = maybe_push_decl (decl);
>>
>> +  /* If this decl is a function version and not the default, its assembler
>> +     name has to be changed.  */
>> +  version_assembler_name (decl);
>> +
>>   if (processing_template_decl)
>>     decl = push_template_decl (decl);
>>   if (decl == error_mark_node)
>> @@ -13019,6 +13055,10 @@ start_function (cp_decl_specifier_seq *declspecs,
>>     gcc_assert (same_type_p (TREE_TYPE (TREE_TYPE (decl1)),
>>                             integer_type_node));
>>
>> +  /* If this decl is a function version and not the default, its assembler
>> +     name has to be changed.  */
>> +  version_assembler_name (decl1);
>> +
>>   start_preparsed_function (decl1, attrs, /*flags=*/SF_DEFAULT);
>>
>>   return 1;
>> @@ -13960,6 +14000,11 @@ cxx_comdat_group (tree decl)
>>            break;
>>        }
>>       name = DECL_ASSEMBLER_NAME (decl);
>> +      if (TREE_CODE (decl) == FUNCTION_DECL
>> +         && DECL_FUNCTION_VERSIONED (decl))
>> +       name = DECL_NAME (decl);
>> +      else
>> +        name = DECL_ASSEMBLER_NAME (decl);
>>     }
>>
>>   return name;
>> Index: cp/semantics.c
>> ===================================================================
>> --- cp/semantics.c      (revision 184971)
>> +++ cp/semantics.c      (working copy)
>> @@ -3783,8 +3783,11 @@ expand_or_defer_fn_1 (tree fn)
>>       /* If the user wants us to keep all inline functions, then mark
>>         this function as needed so that finish_file will make sure to
>>         output it later.  Similarly, all dllexport'd functions must
>> -        be emitted; there may be callers in other DLLs.  */
>> -      if ((flag_keep_inline_functions
>> +        be emitted; there may be callers in other DLLs.
>> +        Also, mark this function as needed if it is marked inline but
>> +        is a multi-versioned function.  */
>> +      if (((flag_keep_inline_functions
>> +           || DECL_FUNCTION_VERSIONED (fn))
>>           && DECL_DECLARED_INLINE_P (fn)
>>           && !DECL_REALLY_EXTERN (fn))
>>          || (flag_keep_inline_dllexport
>> Index: cp/decl2.c
>> ===================================================================
>> --- cp/decl2.c  (revision 184971)
>> +++ cp/decl2.c  (working copy)
>> @@ -53,6 +53,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "splay-tree.h"
>>  #include "langhooks.h"
>>  #include "c-family/c-ada-spec.h"
>> +#include "multiversion.h"
>>
>>  extern cpp_reader *parse_in;
>>
>> @@ -674,9 +675,13 @@ check_classfn (tree ctype, tree function, tree tem
>>          if (is_template != (TREE_CODE (fndecl) == TEMPLATE_DECL))
>>            continue;
>>
>> +         /* While finding a match, same types and params are not enough
>> +            if the function is versioned.  Also check version ("targetv")
>> +            attributes.  */
>>          if (same_type_p (TREE_TYPE (TREE_TYPE (function)),
>>                           TREE_TYPE (TREE_TYPE (fndecl)))
>>              && compparms (p1, p2)
>> +             && !has_different_version_attributes (function, fndecl)
>>              && (!is_template
>>                  || comp_template_parms (template_parms,
>>                                          DECL_TEMPLATE_PARMS (fndecl)))
>> Index: cp/call.c
>> ===================================================================
>> --- cp/call.c   (revision 184971)
>> +++ cp/call.c   (working copy)
>> @@ -41,6 +41,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "langhooks.h"
>>  #include "c-family/c-objc.h"
>>  #include "timevar.h"
>> +#include "multiversion.h"
>>
>>  /* The various kinds of conversion.  */
>>
>> @@ -6730,6 +6731,17 @@ build_over_call (struct z_candidate *cand, int fla
>>   if (!already_used)
>>     mark_used (fn);
>>
>> +  /* For a call to a multi-versioned function, the call should actually be to
>> +     the dispatcher.  */
>> +  if (DECL_FUNCTION_VERSIONED (fn))
>> +    {
>> +      tree ifunc_decl;
>> +      ifunc_decl = get_ifunc_for_version (fn);
>> +      gcc_assert (ifunc_decl != NULL);
>> +      return build_call_expr_loc_array (UNKNOWN_LOCATION, ifunc_decl,
>> +                                       nargs, argarray);
>> +    }
>> +
>>   if (DECL_VINDEX (fn) && (flags & LOOKUP_NONVIRTUAL) == 0)
>>     {
>>       tree t;
>> @@ -7980,6 +7992,30 @@ joust (struct z_candidate *cand1, struct z_candida
>>   size_t i;
>>   size_t len;
>>
>> +  /* For candidates of a multi-versioned function, the one marked default
>> +     wins.  This is because the default decl is used as key to aggregate
>> +     all the other versions provided for it in multiversion.c.  When
>> +     generating the actual call, the appropriate dispatcher is created
>> +     to call the right function version at run-time.  */
>> +
>> +  if ((TREE_CODE (cand1->fn) == FUNCTION_DECL
>> +       && DECL_FUNCTION_VERSIONED (cand1->fn))
>> +      ||(TREE_CODE (cand2->fn) == FUNCTION_DECL
>> +        && DECL_FUNCTION_VERSIONED (cand2->fn)))
>> +    {
>> +      if (is_default_function (cand1->fn))
>> +       {
>> +          mark_used (cand2->fn);
>> +         return 1;
>> +       }
>> +      if (is_default_function (cand2->fn))
>> +       {
>> +          mark_used (cand1->fn);
>> +         return -1;
>> +       }
>> +      return 0;
>> +    }
>> +
>>   /* Candidates that involve bad conversions are always worse than those
>>      that don't.  */
>>   if (cand1->viable > cand2->viable)
>> Index: timevar.def
>> ===================================================================
>> --- timevar.def (revision 184971)
>> +++ timevar.def (working copy)
>> @@ -253,6 +253,7 @@ DEFTIMEVAR (TV_TREE_IFCOMBINE        , "tree if-co
>>  DEFTIMEVAR (TV_TREE_UNINIT           , "uninit var analysis")
>>  DEFTIMEVAR (TV_PLUGIN_INIT           , "plugin initialization")
>>  DEFTIMEVAR (TV_PLUGIN_RUN            , "plugin execution")
>> +DEFTIMEVAR (TV_MULTIVERSION_DISPATCH , "multiversion dispatch")
>>
>>  /* Everything else in rest_of_compilation not included above.  */
>>  DEFTIMEVAR (TV_EARLY_LOCAL          , "early local passes")
>> Index: varasm.c
>> ===================================================================
>> --- varasm.c    (revision 184971)
>> +++ varasm.c    (working copy)
>> @@ -5755,6 +5755,8 @@ finish_aliases_1 (void)
>>        }
>>       else if (! (p->emitted_diags & ALIAS_DIAG_TO_EXTERN)
>>               && DECL_EXTERNAL (target_decl)
>> +              && (TREE_CODE (target_decl) != FUNCTION_DECL
>> +                  || !DECL_STRUCT_FUNCTION (target_decl))
>>               /* We use local aliases for C++ thunks to force the tailcall
>>                  to bind locally.  This is a hack - to keep it working do
>>                  the following (which is not strictly correct).  */
>> Index: Makefile.in
>> ===================================================================
>> --- Makefile.in (revision 184971)
>> +++ Makefile.in (working copy)
>> @@ -1298,6 +1298,7 @@ OBJS = \
>>        mcf.o \
>>        mode-switching.o \
>>        modulo-sched.o \
>> +       multiversion.o \
>>        omega.o \
>>        omp-low.o \
>>        optabs.o \
>> @@ -3030,6 +3031,11 @@ ree.o : ree.c $(CONFIG_H) $(SYSTEM_H) coretypes.h
>>    $(DF_H) $(TIMEVAR_H) tree-pass.h $(RECOG_H) $(EXPR_H) \
>>    $(REGS_H) $(TREE_H) $(TM_P_H) insn-config.h $(INSN_ATTR_H) $(DIAGNOSTIC_CORE_H) \
>>    $(TARGET_H) $(OPTABS_H) insn-codes.h rtlhooks-def.h $(PARAMS_H) $(CGRAPH_H)
>> +multiversion.o : multiversion.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
>> +   $(TREE_H) langhooks.h $(TREE_INLINE_H) $(FLAGS_H) $(CGRAPH_H) intl.h \
>> +   $(DIAGNOSTIC_H) $(FIBHEAP_H) $(PARAMS_H) $(TIMEVAR_H) tree-pass.h \
>> +   $(HASHTAB_H) $(COVERAGE_H) $(GGC_H) $(TREE_FLOW_H) $(RTL_H) $(IPA_PROP_H) \
>> +   $(BASIC_BLOCK_H) $(TOPLEV_H) $(TREE_DUMP_H) ipa-inline.h
>>  cprop.o : cprop.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
>>    $(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h $(GGC_H) \
>>    $(RECOG_H) $(EXPR_H) $(BASIC_BLOCK_H) $(FUNCTION_H) output.h toplev.h $(DIAGNOSTIC_CORE_H) \
>> Index: passes.c
>> ===================================================================
>> --- passes.c    (revision 184971)
>> +++ passes.c    (working copy)
>> @@ -1190,6 +1190,7 @@ init_optimization_passes (void)
>>   NEXT_PASS (pass_build_cfg);
>>   NEXT_PASS (pass_warn_function_return);
>>   NEXT_PASS (pass_build_cgraph_edges);
>> +  NEXT_PASS (pass_dispatch_versions);
>>   *p = NULL;
>>
>>   /* Interprocedural optimization passes.  */
>> Index: config/i386/i386.c
>> ===================================================================
>> --- config/i386/i386.c  (revision 184971)
>> +++ config/i386/i386.c  (working copy)
>> @@ -27446,6 +27473,593 @@ ix86_init_mmx_sse_builtins (void)
>>     }
>>  }
>>
>> +/* This adds a condition to the basic_block NEW_BB in function FUNCTION_DECL
>> +   to return a pointer to VERSION_DECL if the outcome of the function
>> +   PREDICATE_DECL is true.  This function will be called during version
>> +   dispatch to decide which function version to execute.  It returns the
>> +   basic block at the end to which more conditions can be added.  */
>> +
>> +static basic_block
>> +add_condition_to_bb (tree function_decl, tree version_decl,
>> +                    basic_block new_bb, tree predicate_decl)
>> +{
>> +  gimple return_stmt;
>> +  tree convert_expr, result_var;
>> +  gimple convert_stmt;
>> +  gimple call_cond_stmt;
>> +  gimple if_else_stmt;
>> +
>> +  basic_block bb1, bb2, bb3;
>> +  edge e12, e23;
>> +
>> +  tree cond_var;
>> +  gimple_seq gseq;
>> +
>> +  tree old_current_function_decl;
>> +
>> +  old_current_function_decl = current_function_decl;
>> +  push_cfun (DECL_STRUCT_FUNCTION (function_decl));
>> +  current_function_decl = function_decl;
>> +
>> +  gcc_assert (new_bb != NULL);
>> +  gseq = bb_seq (new_bb);
>> +
>> +
>> +  convert_expr = build1 (CONVERT_EXPR, ptr_type_node,
>> +                        build_fold_addr_expr (version_decl));
>> +  result_var = create_tmp_var (ptr_type_node, NULL);
>> +  convert_stmt = gimple_build_assign (result_var, convert_expr);
>> +  return_stmt = gimple_build_return (result_var);
>> +
>> +  if (predicate_decl == NULL_TREE)
>> +    {
>> +      gimple_seq_add_stmt (&gseq, convert_stmt);
>> +      gimple_seq_add_stmt (&gseq, return_stmt);
>> +      set_bb_seq (new_bb, gseq);
>> +      gimple_set_bb (convert_stmt, new_bb);
>> +      gimple_set_bb (return_stmt, new_bb);
>> +      pop_cfun ();
>> +      current_function_decl = old_current_function_decl;
>> +      return new_bb;
>> +    }
>> +
>> +  cond_var = create_tmp_var (integer_type_node, NULL);
>> +  call_cond_stmt = gimple_build_call (predicate_decl, 0);
>> +  gimple_call_set_lhs (call_cond_stmt, cond_var);
>> +
>> +  gimple_set_block (call_cond_stmt, DECL_INITIAL (function_decl));
>> +  gimple_set_bb (call_cond_stmt, new_bb);
>> +  gimple_seq_add_stmt (&gseq, call_cond_stmt);
>> +
>> +  if_else_stmt = gimple_build_cond (GT_EXPR, cond_var,
>> +                                   integer_zero_node,
>> +                                   NULL_TREE, NULL_TREE);
>> +  gimple_set_block (if_else_stmt, DECL_INITIAL (function_decl));
>> +  gimple_set_bb (if_else_stmt, new_bb);
>> +  gimple_seq_add_stmt (&gseq, if_else_stmt);
>> +
>> +  gimple_seq_add_stmt (&gseq, convert_stmt);
>> +  gimple_seq_add_stmt (&gseq, return_stmt);
>> +  set_bb_seq (new_bb, gseq);
>> +
>> +  bb1 = new_bb;
>> +  e12 = split_block (bb1, if_else_stmt);
>> +  bb2 = e12->dest;
>> +  e12->flags &= ~EDGE_FALLTHRU;
>> +  e12->flags |= EDGE_TRUE_VALUE;
>> +
>> +  e23 = split_block (bb2, return_stmt);
>> +
>> +  gimple_set_bb (convert_stmt, bb2);
>> +  gimple_set_bb (return_stmt, bb2);
>> +
>> +  bb3 = e23->dest;
>> +  make_edge (bb1, bb3, EDGE_FALSE_VALUE);
>> +
>> +  remove_edge (e23);
>> +  make_edge (bb2, EXIT_BLOCK_PTR, 0);
>> +
>> +  rebuild_cgraph_edges ();
>> +
>> +  pop_cfun ();
>> +  current_function_decl = old_current_function_decl;
>> +
>> +  return bb3;
>> +}
>> +
>> +/* This parses the attribute arguments to targetv in DECL and determines
>> +   the right builtin to use to match the platform specification.
>> +   For now, only one target argument ("arch=") is allowed.  */
>> +
>> +static enum ix86_builtins
>> +get_builtin_code_for_version (tree decl)
>> +{
>> +  tree attrs;
>> +  struct cl_target_option cur_target;
>> +  tree target_node;
>> +  struct cl_target_option *new_target;
>> +  enum ix86_builtins builtin_code = IX86_BUILTIN_MAX;
>> +
>> +  attrs = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl));
>> +  gcc_assert (attrs != NULL);
>> +
>> +  cl_target_option_save (&cur_target, &global_options);
>> +
>> +  target_node = ix86_valid_target_attribute_tree
>> +                 (TREE_VALUE (TREE_VALUE (attrs)));
>> +
>> +  gcc_assert (target_node);
>> +  new_target = TREE_TARGET_OPTION (target_node);
>> +  gcc_assert (new_target);
>> +
>> +  if (new_target->arch_specified && new_target->arch > 0)
>> +    {
>> +      switch (new_target->arch)
>> +        {
>> +       case 1:
>> +       case 2:
>> +       case 3:
>> +       case 4:
>> +       case 5:
>> +       case 6:
>> +       case 7:
>> +       case 8:
>> +       case 9:
>> +       case 10:
>> +       case 11:
>> +         builtin_code = IX86_BUILTIN_CPU_IS_INTEL;
>> +         break;
>> +       case 12:
>> +         builtin_code = IX86_BUILTIN_CPU_IS_INTEL_CORE2;
>> +         break;
>> +       case 13:
>> +         builtin_code = IX86_BUILTIN_CPU_IS_INTEL_COREI7;
>> +         break;
>> +       case 14:
>> +         builtin_code = IX86_BUILTIN_CPU_IS_INTEL_ATOM;
>> +         break;
>> +       case 15:
>> +       case 16:
>> +       case 17:
>> +       case 18:
>> +       case 19:
>> +       case 20:
>> +       case 21:
>> +         builtin_code = IX86_BUILTIN_CPU_IS_AMD;
>> +         break;
>> +       case 22:
>> +         builtin_code = IX86_BUILTIN_CPU_IS_AMDFAM10H;
>> +         break;
>> +       case 23:
>> +         builtin_code = IX86_BUILTIN_CPU_IS_AMDFAM15H_BDVER1;
>> +         break;
>> +       case 24:
>> +         builtin_code = IX86_BUILTIN_CPU_IS_AMDFAM15H_BDVER2;
>> +         break;
>> +       case 25: /* What is btver1 ? */
>> +         builtin_code = IX86_BUILTIN_CPU_IS_AMD;
>> +         break;
>> +       }
>> +    }
>> +
>> +  cl_target_option_restore (&global_options, &cur_target);
>> +  if (builtin_code == IX86_BUILTIN_MAX)
>> +      error_at (DECL_SOURCE_LOCATION (decl),
>> +               "No dispatcher found for the versioning attributes");
>> +
>> +  return builtin_code;
>> +}
>> +
>> +/* This is the target hook to generate the dispatch function for
>> +   multi-versioned functions.  DISPATCH_DECL is the function which will
>> +   contain the dispatch logic.  FNDECLS are the function choices for
>> +   dispatch, and is a tree chain.  EMPTY_BB is the basic block pointer
>> +   in DISPATCH_DECL in which the dispatch code is generated.  */
>> +
>> +static int
>> +ix86_dispatch_version (tree dispatch_decl,
>> +                      void *fndecls_p,
>> +                      basic_block *empty_bb)
>> +{
>> +  tree default_decl;
>> +  gimple ifunc_cpu_init_stmt;
>> +  gimple_seq gseq;
>> +  tree old_current_function_decl;
>> +  int ix;
>> +  tree ele;
>> +  VEC (tree, heap) *fndecls;
>> +
>> +  gcc_assert (dispatch_decl != NULL
>> +             && fndecls_p != NULL
>> +             && empty_bb != NULL);
>> +
>> +  /* fndecls_p is actually a vector.  */
>> +  fndecls = (VEC (tree, heap) *)fndecls_p;
>> +
>> +  /* At least one more version other than the default.  */
>> +  gcc_assert (VEC_length (tree, fndecls) >= 2);
>> +
>> +  /* The first version in the vector is the default decl.  */
>> +  default_decl = VEC_index (tree, fndecls, 0);
>> +
>> +  old_current_function_decl = current_function_decl;
>> +  push_cfun (DECL_STRUCT_FUNCTION (dispatch_decl));
>> +  current_function_decl = dispatch_decl;
>> +
>> +  gseq = bb_seq (*empty_bb);
>> +  ifunc_cpu_init_stmt = gimple_build_call_vec (
>> +                     ix86_builtins [(int) IX86_BUILTIN_CPU_INIT], NULL);
>> +  gimple_seq_add_stmt (&gseq, ifunc_cpu_init_stmt);
>> +  gimple_set_bb (ifunc_cpu_init_stmt, *empty_bb);
>> +  set_bb_seq (*empty_bb, gseq);
>> +
>> +  pop_cfun ();
>> +  current_function_decl = old_current_function_decl;
>> +
>> +
>> +  for (ix = 1; VEC_iterate (tree, fndecls, ix, ele); ++ix)
>> +    {
>> +      tree version_decl = ele;
>> +      /* Get attribute string, parse it and find the right predicate decl.
>> +         The predicate function could be a lengthy combination of many
>> +        features, like arch-type and various isa-variants.  For now, only
>> +        check the arch-type.  */
>> +      tree predicate_decl = ix86_builtins [
>> +                       get_builtin_code_for_version (version_decl)];
>> +      *empty_bb = add_condition_to_bb (dispatch_decl, version_decl, *empty_bb,
>> +                                      predicate_decl);
>> +
>> +    }
>> +  /* Dispatch the default version at the end.  */
>> +  *empty_bb = add_condition_to_bb (dispatch_decl, default_decl, *empty_bb,
>> +                                  NULL);
>> +  return 0;
>> +}
>>
>> @@ -38610,6 +39269,12 @@ ix86_autovectorize_vector_sizes (void)
>>  #undef TARGET_BUILD_BUILTIN_VA_LIST
>>  #define TARGET_BUILD_BUILTIN_VA_LIST ix86_build_builtin_va_list
>>
>> +#undef TARGET_DISPATCH_VERSION
>> +#define TARGET_DISPATCH_VERSION ix86_dispatch_version
>> +
>>  #undef TARGET_ENUM_VA_LIST_P
>>  #define TARGET_ENUM_VA_LIST_P ix86_enum_va_list
>>
>> Index: testsuite/g++.dg/mv1.C
>> ===================================================================
>> --- testsuite/g++.dg/mv1.C      (revision 0)
>> +++ testsuite/g++.dg/mv1.C      (revision 0)
>> @@ -0,0 +1,23 @@
>> +/* Simple test case to check if Multiversioning works.  */
>> +/* { dg-do run } */
>> +/* { dg-options "-O2" } */
>> +
>> +int foo ();
>> +int foo () __attribute__ ((targetv("arch=corei7")));
>> +
>> +int main ()
>> +{
>> +  int (*p)() = &foo;
>> +  return foo () + (*p)();
>> +}
>> +
>> +int foo ()
>> +{
>> +  return 0;
>> +}
>> +
>> +int __attribute__ ((targetv("arch=corei7")))
>> +foo ()
>> +{
>> +  return 0;
>> +}
>>
>>
>> --
>> This patch is available for review at http://codereview.appspot.com/5752064


* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-03-07 19:08   ` Sriraman Tallam
@ 2012-03-08 21:37     ` Xinliang David Li
  0 siblings, 0 replies; 93+ messages in thread
From: Xinliang David Li @ 2012-03-08 21:37 UTC (permalink / raw)
  To: Sriraman Tallam; +Cc: Richard Guenther, reply, gcc-patches

On Wed, Mar 7, 2012 at 11:08 AM, Sriraman Tallam <tmsriram@google.com> wrote:
> On Wed, Mar 7, 2012 at 6:05 AM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> On Wed, Mar 7, 2012 at 1:46 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>>> User directed Function Multiversioning (MV) via Function Overloading
>>> ====================================================================
>>>
>>> This patch adds support for user directed function MV via function overloading.
>>> For more detailed description:
>>> http://gcc.gnu.org/ml/gcc/2012-03/msg00074.html
>>>
>>>
>>> Here is an example program with function versions:
>>>
>>> int foo ();  /* Default version */
>>> int foo () __attribute__ ((targetv("arch=corei7")));/*Specialized for corei7 */
>>> int foo () __attribute__ ((targetv("arch=core2")));/*Specialized for core2 */
>>>
>>> int main ()
>>> {
>>>  int (*p)() = &foo;
>>>  return foo () + (*p)();
>>> }
>>>
>>> int foo ()
>>> {
>>>  return 0;
>>> }
>>>
>>> int __attribute__ ((targetv("arch=corei7")))
>>> foo ()
>>> {
>>>  return 0;
>>> }
>>>
>>> int __attribute__ ((targetv("arch=core2")))
>>> foo ()
>>> {
>>>  return 0;
>>> }
>>>
>>> The above example has foo defined 3 times, but all 3 definitions of foo are
>>> different versions of the same function. The calls to foo in main, directly and
>>> via a pointer, are calls to the multi-versioned function foo which is dispatched
>>> to the right foo at run-time.
>>>
>>> Function versions must have the same signature but must differ in the specifier
>>> string provided to a new attribute called "targetv", which is nothing but the
>>> target attribute with an extra specification to indicate a version. Any number
>>> of versions can be created using the targetv attribute but it is mandatory to
>>> have one function without the attribute, which is treated as the default
>>> version.
>>>
>>> The dispatching is done using the IFUNC mechanism to keep the dispatch overhead
>>> low. The compiler creates a dispatcher function which checks the CPU type and
>>> calls the right version of foo. The dispatching code checks for the platform
>>> type and calls the first version that matches. The default function is called if
>>> no specialized version is appropriate for execution.
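
A minimal sketch of the first-match dispatch order described above, written
in plain C rather than with IFUNC. Every name in it (resolve_foo, the
cpu_is_* stubs, foo_versions) is hypothetical, and the stub predicates simply
pretend that the program runs on a core2 machine. The patch itself queries
the CPU through target builtins and emits the dispatcher as GIMPLE, so this
only illustrates the selection logic, not the implementation.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*version_fn) (void);

/* Hypothetical stand-ins for the three versions of foo.  */
static int foo_default (void) { return 0; }
static int foo_corei7 (void) { return 1; }
static int foo_core2 (void) { return 2; }

/* Hypothetical CPU predicates.  These stubs pretend that we are
   running on a core2 machine.  */
static int cpu_is_corei7 (void) { return 0; }
static int cpu_is_core2 (void) { return 1; }

struct version_entry
{
  int (*predicate) (void);   /* Returns > 0 if the version may run here.  */
  version_fn version;        /* The specialized version to dispatch to.  */
};

/* Specialized versions, in the order in which they are tried.  */
static const struct version_entry foo_versions[] =
{
  { cpu_is_corei7, foo_corei7 },
  { cpu_is_core2, foo_core2 },
};

/* Return the first version whose predicate holds, else the default.  */
version_fn
resolve_foo (void)
{
  size_t i;

  for (i = 0; i < sizeof foo_versions / sizeof foo_versions[0]; ++i)
    if (foo_versions[i].predicate () > 0)
      return foo_versions[i].version;
  return foo_default;
}

/* Resolve once and cache the result, so later calls only pay for an
   indirect call.  */
int
foo (void)
{
  static version_fn resolved;

  if (resolved == NULL)
    resolved = resolve_foo ();
  assert (resolved != NULL);
  return resolved ();
}
```

With these stubs, calling foo () reaches foo_core2, and the default is only
reached when no predicate holds. With IFUNC the resolution happens once at
load time instead of on the first call.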
>>>
>>> The pointer to foo is made to be the address of the dispatcher function, so that
>>> it is unique and calls made via the pointer also work correctly. The assembler
>>> names of the various versions of foo are made different, by tagging
>>> the specifier strings, to keep them unique.  A specific version can be called
>>> directly by creating an alias to its assembler name. For instance, to call the
>>> corei7 version directly, make an alias:
>>> int foo_corei7 () __attribute__((alias ("_Z3foov.arch_corei7")));
>>> and then call foo_corei7.
>>>
>>> Note that using IFUNC blocks inlining of versioned functions. I had implemented
>>> an optimization earlier to do hot path cloning to allow versioned functions to
>>> be inlined. Please see: http://gcc.gnu.org/ml/gcc-patches/2011-04/msg02285.html
>>> In the next iteration, I plan to merge these two. With that, hot code paths with
>>> versioned functions will be cloned so that versioned functions can be inlined.
>>
>> Note that inlining of functions with the target attribute is limited as well,
>> but your issue is that of the indirect dispatch as ...
>>
>> You don't give an overview of the frontend implementation.  Thus I have
>> extracted the following
>>
>>  - the FE does not really know about the "overloading", nor can it directly
>>   resolve calls from a "sse" function to another "sse" function without going
>>   through the 2nd IFUNC
>
> This is a good point but I can change function joust, where the
> overload candidate is selected, to return the decl of the versioned
> function with matching target attributes as that of the callee. That
> will solve this problem. I have to treat the target attributes as an
> additional criterion for a match in overload resolution. The front end
> *does know* about the overloading; it is a question of doing the
> overload resolution correctly, right?  This is easy when there is no
> cloning involved.

Should this be covered by a new IFUNC folding rule? The FE just needs to
generate dummy code.

>
> When cloning of a version is required, it gets complicated since the
> FE must clone and produce the bodies. Once all the bodies are
> available, the overload resolution can do the right thing.
>

How can you safely clone a function without knowing if the versioned
body is available in another module?

David

>>
>>  - cgraph also does not know about the "overloading", so it cannot do such
>>   "devirtualization" either
>>
>> you seem to have implemented something in between a pure frontend
>> solution and a proper middle-end solution.
>
> The only thing I delayed is the code generation of the dispatcher. I
> thought it is better to have this come later, after cfg and cgraph are
> generated, so that multiple dispatching mechanisms could be
> implemented.
>
>> For optimization and eventually
>> automatically selecting functions for cloning (like, callees of a manual "sse"
>> versioned function should be cloned?) it would be nice if the cgraph would
>> know about the different versions and their relationships (and the dispatcher).
>> Especially the cgraph code should know the functions are semantically
>> equivalent (I suppose we should require that).  The IFUNC should be
>> generated by cgraph / target code, similar to how we generate C++ thunks.
>>
>> Honza, any suggestions on how the FE side of such cgraph infrastructure
>> should look like and how we should encode the target bits?
>>
>> Thanks,
>> Richard.
>>
>>>        * doc/tm.texi.in: Add description for TARGET_DISPATCH_VERSION.
>>>        * doc/tm.texi: Regenerate.
>>>        * c-family/c-common.c (handle_targetv_attribute): New function.
>>>        * target.def (dispatch_version): New target hook.
>>>        * tree.h (DECL_FUNCTION_VERSIONED): New macro.
>>>        (tree_function_decl): New bit-field versioned_function.
>>>        * tree-pass.h (pass_dispatch_versions): New pass.
>>>        * multiversion.c: New file.
>>>        * multiversion.h: New file.
>>>        * cgraphunit.c: Include multiversion.h
>>>        (cgraph_finalize_function): Change assembler names of versioned
>>>        functions.
>>>        * cp/class.c: Include multiversion.h
>>>        (add_method): Aggregate function versions. Change assembler names of
>>>        versioned functions.
>>>        (resolve_address_of_overloaded_function): Match address of function
>>>        version with default function.  Return address of ifunc dispatcher
>>>        for address of versioned functions.
>>>        * cp/decl.c (decls_match): Make decls unmatched for versioned
>>>        functions.
>>>        (duplicate_decls): Remove ambiguity for versioned functions. Notify
>>>        of deleted function version decls.
>>>        (start_decl): Change assembler name of versioned functions.
>>>        (start_function): Change assembler name of versioned functions.
>>>        (cxx_comdat_group): Make comdat group of versioned functions be the
>>>        same.
>>>        * cp/semantics.c (expand_or_defer_fn_1): Mark as needed versioned
>>>        functions that are also marked inline.
>>>        * cp/decl2.c: Include multiversion.h
>>>        (check_classfn): Check attributes of versioned functions for match.
>>>        * cp/call.c: Include multiversion.h
>>>        (build_over_call): Make calls to multiversioned functions to call the
>>>        dispatcher.
>>>        (joust): For calls to multi-versioned functions, make the default
>>>        function win.
>>>        * timevar.def (TV_MULTIVERSION_DISPATCH): New time var.
>>>        * varasm.c (finish_aliases_1): Check if the alias points to a function
>>>        with a body before giving an error.
>>>        * Makefile.in: Add multiversion.o
>>>        * passes.c: Add pass_dispatch_versions to the pass list.
>>>        * config/i386/i386.c (add_condition_to_bb): New function.
>>>        (get_builtin_code_for_version): New function.
>>>        (ix86_dispatch_version): New function.
>>>        (TARGET_DISPATCH_VERSION): New macro.
>>>        * testsuite/g++.dg/mv1.C: New test.
>>>
>>> Index: doc/tm.texi
>>> ===================================================================
>>> --- doc/tm.texi (revision 184971)
>>> +++ doc/tm.texi (working copy)
>>> @@ -10995,6 +10995,14 @@ The result is another tree containing a simplified
>>>  call's result.  If @var{ignore} is true the value will be ignored.
>>>  @end deftypefn
>>>
>>> +@deftypefn {Target Hook} int TARGET_DISPATCH_VERSION (tree @var{dispatch_decl}, void *@var{fndecls}, basic_block *@var{empty_bb})
>>> +For a multi-versioned function, this hook sets up the dispatcher.
>>> +@var{dispatch_decl} is the function that will be used to dispatch the
>>> +version. @var{fndecls} are the function choices for dispatch.
>>> +@var{empty_bb} is a basic block in @var{dispatch_decl} where the
>>> +code to do the dispatch will be added.
>>> +@end deftypefn
>>> +
>>>  @deftypefn {Target Hook} {const char *} TARGET_INVALID_WITHIN_DOLOOP (const_rtx @var{insn})
>>>
>>>  Take an instruction in @var{insn} and return NULL if it is valid within a
>>> Index: doc/tm.texi.in
>>> ===================================================================
>>> --- doc/tm.texi.in      (revision 184971)
>>> +++ doc/tm.texi.in      (working copy)
>>> @@ -10873,6 +10873,14 @@ The result is another tree containing a simplified
>>>  call's result.  If @var{ignore} is true the value will be ignored.
>>>  @end deftypefn
>>>
>>> +@hook TARGET_DISPATCH_VERSION
>>> +For a multi-versioned function, this hook sets up the dispatcher.
>>> +@var{dispatch_decl} is the function that will be used to dispatch the
>>> +version. @var{fndecls} are the function choices for dispatch.
>>> +@var{empty_bb} is a basic block in @var{dispatch_decl} where the
>>> +code to do the dispatch will be added.
>>> +@end deftypefn
>>> +
>>>  @hook TARGET_INVALID_WITHIN_DOLOOP
>>>
>>>  Take an instruction in @var{insn} and return NULL if it is valid within a
>>> Index: c-family/c-common.c
>>> ===================================================================
>>> --- c-family/c-common.c (revision 184971)
>>> +++ c-family/c-common.c (working copy)
>>> @@ -315,6 +315,7 @@ static tree check_case_value (tree);
>>>  static bool check_case_bounds (tree, tree, tree *, tree *);
>>>
>>>  static tree handle_packed_attribute (tree *, tree, tree, int, bool *);
>>> +static tree handle_targetv_attribute (tree *, tree, tree, int, bool *);
>>>  static tree handle_nocommon_attribute (tree *, tree, tree, int, bool *);
>>>  static tree handle_common_attribute (tree *, tree, tree, int, bool *);
>>>  static tree handle_noreturn_attribute (tree *, tree, tree, int, bool *);
>>> @@ -604,6 +605,8 @@ const struct attribute_spec c_common_attribute_tab
>>>  {
>>>   /* { name, min_len, max_len, decl_req, type_req, fn_type_req, handler,
>>>        affects_type_identity } */
>>> +  { "targetv",               1, -1, true, false, false,
>>> +                             handle_targetv_attribute, false },
>>>   { "packed",                 0, 0, false, false, false,
>>>                              handle_packed_attribute , false},
>>>   { "nocommon",               0, 0, true,  false, false,
>>> @@ -5869,6 +5872,54 @@ handle_packed_attribute (tree *node, tree name, tr
>>>   return NULL_TREE;
>>>  }
>>>
>>> +/* The targetv attribute is used to specify a function version
>>> +   targeted to specific platform types.  The "targetv" attributes
>>> +   have to be valid "target" attributes.  NODE should always point
>>> +   to a FUNCTION_DECL.  ARGS contain the arguments to "targetv"
>>> +   which should be valid arguments to attribute "target" too.
>>> +   Check handle_target_attribute for FLAGS and NO_ADD_ATTRS.  */
>>> +
>>> +static tree
>>> +handle_targetv_attribute (tree *node, tree name,
>>> +                         tree args,
>>> +                         int flags,
>>> +                         bool *no_add_attrs)
>>> +{
>>> +  const char *attr_str = NULL;
>>> +  gcc_assert (TREE_CODE (*node) == FUNCTION_DECL);
>>> +  gcc_assert (args != NULL);
>>> +
>>> +  /* This is a function version.  */
>>> +  DECL_FUNCTION_VERSIONED (*node) = 1;
>>> +
>>> +  attr_str = TREE_STRING_POINTER (TREE_VALUE (args));
>>> +
>>> +  /* Check if multiple sets of target attributes are there.  This
>>> +     is not supported now.   In future, this will be supported by
>>> +     cloning this function for each set.  */
>>> +  if (TREE_CHAIN (args) != NULL)
>>> +    warning (OPT_Wattributes, "%qE attribute has multiple sets which "
>>> +            "is not supported", name);
>>> +
>>> +  if (attr_str == NULL
>>> +      || strstr (attr_str, "arch=") == NULL)
>>> +    error_at (DECL_SOURCE_LOCATION (*node),
>>> +             "Versioning supported only on \"arch=\" for now");
>>> +
>>> +  /* targetv attributes must translate into target attributes.  */
>>> +  handle_target_attribute (node, get_identifier ("target"), args, flags,
>>> +                          no_add_attrs);
>>> +
>>> +  if (*no_add_attrs)
>>> +    warning (OPT_Wattributes, "%qE attribute has no effect", name);
>>> +
>>> +  /* This is necessary to keep the attribute tagged to the decl
>>> +     all the time.  */
>>> +  *no_add_attrs = false;
>>> +
>>> +  return NULL_TREE;
>>> +}
>>> +
>>>  /* Handle a "nocommon" attribute; arguments as in
>>>    struct attribute_spec.handler.  */
>>>
>>> Index: target.def
>>> ===================================================================
>>> --- target.def  (revision 184971)
>>> +++ target.def  (working copy)
>>> @@ -1249,6 +1249,15 @@ DEFHOOK
>>>  tree, (tree fndecl, int n_args, tree *argp, bool ignore),
>>>  hook_tree_tree_int_treep_bool_null)
>>>
>>> +/* Target hook to generate the dispatching code for calls to multi-versioned
>>> +   functions.  DISPATCH_DECL is the function that will have the dispatching
>>> +   logic.  FNDECLS are the list of choices for dispatch and EMPTY_BB is the
>>> +   basic block in DISPATCH_DECL which will contain the code.  */
>>> +DEFHOOK
>>> +(dispatch_version,
>>> + "",
>>> + int, (tree dispatch_decl, void *fndecls, basic_block *empty_bb), NULL)
>>> +
>>>  /* Returns a code for a target-specific builtin that implements
>>>    reciprocal of the function, or NULL_TREE if not available.  */
>>>  DEFHOOK
>>> Index: tree.h
>>> ===================================================================
>>> --- tree.h      (revision 184971)
>>> +++ tree.h      (working copy)
>>> @@ -3532,6 +3532,12 @@ extern VEC(tree, gc) **decl_debug_args_insert (tre
>>>  #define DECL_FUNCTION_SPECIFIC_OPTIMIZATION(NODE) \
>>>    (FUNCTION_DECL_CHECK (NODE)->function_decl.function_specific_optimization)
>>>
>>> +/* In FUNCTION_DECL, this is set if this function has other versions generated
>>> +   using "targetv" attributes.  The default version is the one which does not
>>> +   have any "targetv" attribute set. */
>>> +#define DECL_FUNCTION_VERSIONED(NODE)\
>>> +   (FUNCTION_DECL_CHECK (NODE)->function_decl.versioned_function)
>>> +
>>>  /* FUNCTION_DECL inherits from DECL_NON_COMMON because of the use of the
>>>    arguments/result/saved_tree fields by front ends.   It was either inherit
>>>    FUNCTION_DECL from non_common, or inherit non_common from FUNCTION_DECL,
>>> @@ -3576,8 +3582,8 @@ struct GTY(()) tree_function_decl {
>>>   unsigned looping_const_or_pure_flag : 1;
>>>   unsigned has_debug_args_flag : 1;
>>>   unsigned tm_clone_flag : 1;
>>> -
>>> -  /* 1 bit left */
>>> +  unsigned versioned_function : 1;
>>> +  /* No bits left.  */
>>>  };
>>>
>>>  /* The source language of the translation-unit.  */
>>> Index: tree-pass.h
>>> ===================================================================
>>> --- tree-pass.h (revision 184971)
>>> +++ tree-pass.h (working copy)
>>> @@ -455,6 +455,7 @@ extern struct gimple_opt_pass pass_tm_memopt;
>>>  extern struct gimple_opt_pass pass_tm_edges;
>>>  extern struct gimple_opt_pass pass_split_functions;
>>>  extern struct gimple_opt_pass pass_feedback_split_functions;
>>> +extern struct gimple_opt_pass pass_dispatch_versions;
>>>
>>>  /* IPA Passes */
>>>  extern struct simple_ipa_opt_pass pass_ipa_lower_emutls;
>>> Index: multiversion.c
>>> ===================================================================
>>> --- multiversion.c      (revision 0)
>>> +++ multiversion.c      (revision 0)
>>> @@ -0,0 +1,798 @@
>>> +/* Function Multiversioning.
>>> +   Copyright (C) 2012 Free Software Foundation, Inc.
>>> +   Contributed by Sriraman Tallam (tmsriram@google.com)
>>> +
>>> +This file is part of GCC.
>>> +
>>> +GCC is free software; you can redistribute it and/or modify it under
>>> +the terms of the GNU General Public License as published by the Free
>>> +Software Foundation; either version 3, or (at your option) any later
>>> +version.
>>> +
>>> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
>>> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
>>> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
>>> +for more details.
>>> +
>>> +You should have received a copy of the GNU General Public License
>>> +along with GCC; see the file COPYING3.  If not see
>>> +<http://www.gnu.org/licenses/>. */
>>> +
>>> +/* This file holds the state for multi-versioned functions.  The front-end
>>> +   updates the state as and when function versions are encountered.
>>> +   This is then used to generate the dispatch code.  Also, the
>>> +   optimization passes to clone hot paths involving versioned functions
>>> +   will be done here.
>>> +
>>> +   Function versions are created by using the same function signature but
>>> +   also tagging attribute "targetv" to specify the platform type for which
>>> +   the version must be executed.  Here is an example:
>>> +
>>> +   int foo ()
>>> +   {
>>> +     printf ("Execute as default");
>>> +     return 0;
>>> +   }
>>> +
>>> +   int  __attribute__ ((targetv ("arch=corei7")))
>>> +   foo ()
>>> +   {
>>> +     printf ("Execute for corei7");
>>> +     return 0;
>>> +   }
>>> +
>>> +   int main ()
>>> +   {
>>> +     return foo ();
>>> +   }
>>> +
>>> +   The call to foo in main is replaced with a call to an IFUNC function that
>>> +   contains the dispatch code to call the correct function version at
>>> +   run-time.  */
>>> +
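
To make the dispatch idea in the comment above concrete, here is a minimal portable sketch that emulates it in plain C. It is not IFUNC itself and not the code this patch generates: a resolver picks one version and every later call goes through the cached pointer. The names cpu_is, detected_arch, foo_resolver and foo_resolved are hypothetical, and the string compare stands in for a real CPU query.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a CPU query; a real resolver would ask the
   hardware instead of comparing strings.  */
static const char *detected_arch = "corei7";

static int
cpu_is (const char *arch)
{
  return strcmp (detected_arch, arch) == 0;
}

/* Three versions of the same function; distinct return values make the
   selected version observable.  */
static int foo_default (void) { return 0; }
static int foo_corei7 (void)  { return 1; }
static int foo_core2 (void)   { return 2; }

typedef int (*foo_fn) (void);

/* Plays the role of the resolver: pick exactly one version, falling
   back to the default when nothing more specific matches.  */
static foo_fn
foo_resolver (void)
{
  if (cpu_is ("corei7"))
    return foo_corei7;
  if (cpu_is ("core2"))
    return foo_core2;
  return foo_default;
}

/* Plays the role of the ifunc symbol: resolved once, then every call
   goes through the cached pointer.  */
static foo_fn foo_resolved = NULL;

int
foo (void)
{
  if (foo_resolved == NULL)
    foo_resolved = foo_resolver ();
  assert (foo_resolved != NULL);
  return foo_resolved ();
}
```

With a real IFUNC the dynamic loader runs the resolver and patches the symbol, so callers pay no per-call check; the NULL test in this sketch is only there to keep the emulation self-contained.
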
>>> +
>>> +#include "config.h"
>>> +#include "system.h"
>>> +#include "coretypes.h"
>>> +#include "tm.h"
>>> +#include "tree.h"
>>> +#include "tree-inline.h"
>>> +#include "langhooks.h"
>>> +#include "flags.h"
>>> +#include "cgraph.h"
>>> +#include "diagnostic.h"
>>> +#include "toplev.h"
>>> +#include "timevar.h"
>>> +#include "params.h"
>>> +#include "fibheap.h"
>>> +#include "intl.h"
>>> +#include "tree-pass.h"
>>> +#include "hashtab.h"
>>> +#include "coverage.h"
>>> +#include "ggc.h"
>>> +#include "tree-flow.h"
>>> +#include "rtl.h"
>>> +#include "ipa-prop.h"
>>> +#include "basic-block.h"
>>> +#include "toplev.h"
>>> +#include "dbgcnt.h"
>>> +#include "tree-dump.h"
>>> +#include "output.h"
>>> +#include "vecprim.h"
>>> +#include "gimple-pretty-print.h"
>>> +#include "ipa-inline.h"
>>> +#include "target.h"
>>> +#include "multiversion.h"
>>> +
>>> +typedef void * void_p;
>>> +
>>> +DEF_VEC_P (void_p);
>>> +DEF_VEC_ALLOC_P (void_p, heap);
>>> +
>>> +/* Each function decl that is a function version gets an instance of this
>>> +   structure.   Since this is called by the front-end, decl merging can
>>> +   happen, where a decl created for a new declaration is merged with
>>> +   the old. In this case, the new decl is deleted and the IS_DELETED
>>> +   field is set for the struct instance corresponding to the new decl.
>>> +   IFUNC_DECL is the decl of the ifunc function for default decls.
>>> +   IFUNC_RESOLVER_DECL is the decl of the dispatch function.  VERSIONS
>>> +   is a vector containing the list of function versions that are
>>> +   the candidates for dispatch.  */
>>> +
>>> +typedef struct version_function_d {
>>> +  tree decl;
>>> +  tree ifunc_decl;
>>> +  tree ifunc_resolver_decl;
>>> +  VEC (void_p, heap) *versions;
>>> +  bool is_deleted;
>>> +} version_function;
>>> +
>>> +/* Hashmap has an entry for every function decl that has other function
>>> +   versions.  For function decls that are the default, it also stores the
>>> +   list of all the other function versions.  Each entry is a structure
>>> +   of type version_function_d.  */
>>> +static htab_t decl_version_htab = NULL;
>>> +
>>> +/* Hashtable helpers for decl_version_htab. */
>>> +
>>> +static hashval_t
>>> +decl_version_htab_hash_descriptor (const void *p)
>>> +{
>>> +  const version_function *t = (const version_function *) p;
>>> +  return htab_hash_pointer (t->decl);
>>> +}
>>> +
>>> +/* Hashtable helper for decl_version_htab. */
>>> +
>>> +static int
>>> +decl_version_htab_eq_descriptor (const void *p1, const void *p2)
>>> +{
>>> +  const version_function *t1 = (const version_function *) p1;
>>> +  return htab_eq_pointer ((const void_p) t1->decl, p2);
>>> +}
>>> +
>>> +/* Create the decl_version_htab.  */
>>> +static void
>>> +create_decl_version_htab (void)
>>> +{
>>> +  if (decl_version_htab == NULL)
>>> +    decl_version_htab = htab_create (10, decl_version_htab_hash_descriptor,
>>> +                                    decl_version_htab_eq_descriptor, NULL);
>>> +}
>>> +
>>> +/* Creates an instance of version_function for decl DECL.  */
>>> +
>>> +static version_function*
>>> +new_version_function (const tree decl)
>>> +{
>>> +  version_function *v;
>>> +  v = (version_function *) xmalloc (sizeof (version_function));
>>> +  v->decl = decl;
>>> +  v->ifunc_decl = NULL;
>>> +  v->ifunc_resolver_decl = NULL;
>>> +  v->versions = NULL;
>>> +  v->is_deleted = false;
>>> +  return v;
>>> +}
>>> +
>>> +/* Comparator function to be used in qsort routine to sort attribute
>>> +   specification strings to "targetv".  */
>>> +
>>> +static int
>>> +attr_strcmp (const void *v1, const void *v2)
>>> +{
>>> +  const char *c1 = *(char *const*)v1;
>>> +  const char *c2 = *(char *const*)v2;
>>> +  return strcmp (c1, c2);
>>> +}
>>> +
>>> +/* STR is the argument to targetv attribute.  This function tokenizes
>>> +   the comma separated arguments, sorts them and returns a string which
>>> +   is a unique identifier for the comma separated arguments.  */
>>> +
>>> +static char *
>>> +sorted_attr_string (const char *str)
>>> +{
>>> +  char **args = NULL;
>>> +  char *attr_str, *ret_str;
>>> +  char *attr = NULL;
>>> +  unsigned int argnum = 1;
>>> +  unsigned int i;
>>> +
>>> +  for (i = 0; i < strlen (str); i++)
>>> +    if (str[i] == ',')
>>> +      argnum++;
>>> +
>>> +  attr_str = (char *)xmalloc (strlen (str) + 1);
>>> +  strcpy (attr_str, str);
>>> +
>>> +  for (i = 0; i < strlen (attr_str); i++)
>>> +    if (attr_str[i] == '=')
>>> +      attr_str[i] = '_';
>>> +
>>> +  if (argnum == 1)
>>> +    return attr_str;
>>> +
>>> +  args = (char **)xmalloc (argnum * sizeof (char *));
>>> +
>>> +  i = 0;
>>> +  attr = strtok (attr_str, ",");
>>> +  while (attr != NULL)
>>> +    {
>>> +      args[i] = attr;
>>> +      i++;
>>> +      attr = strtok (NULL, ",");
>>> +    }
>>> +
>>> +  qsort (args, argnum, sizeof (char*), attr_strcmp);
>>> +
>>> +  ret_str = (char *)xmalloc (strlen (str) + 1);
>>> +  strcpy (ret_str, args[0]);
>>> +  for (i = 1; i < argnum; i++)
>>> +    {
>>> +      strcat (ret_str, "_");
>>> +      strcat (ret_str, args[i]);
>>> +    }
>>> +
>>> +  free (args);
>>> +  free (attr_str);
>>> +  return ret_str;
>>> +}
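
A standalone sketch of the canonicalisation described above, for anyone who wants to try it outside the compiler. It uses plain malloc instead of xmalloc, and it counts the tokens that strtok actually returns rather than counting commas up front. canonical_attr and versioned_name are hypothetical names, and the attribute strings used with them are only illustrative inputs.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator over an array of C strings.  */
static int
cmp_str (const void *a, const void *b)
{
  return strcmp (*(char *const *) a, *(char *const *) b);
}

/* Return a malloc'd canonical form of the comma-separated list STR:
   '=' becomes '_', the tokens are sorted, then joined with '_'.  */
char *
canonical_attr (const char *str)
{
  size_t len = strlen (str);
  char *copy = malloc (len + 1);
  char *out = malloc (len + 1);
  /* There can never be more than LEN + 1 tokens.  */
  char **tok = malloc ((len + 1) * sizeof *tok);
  size_t n = 0, i;
  char *p;

  assert (copy != NULL && out != NULL && tok != NULL);
  strcpy (copy, str);
  for (p = copy; *p != '\0'; p++)
    if (*p == '=')
      *p = '_';

  /* Count the tokens strtok really produces; it skips empty ones.  */
  for (p = strtok (copy, ","); p != NULL; p = strtok (NULL, ","))
    tok[n++] = p;

  qsort (tok, n, sizeof *tok, cmp_str);

  out[0] = '\0';
  for (i = 0; i < n; i++)
    {
      if (i > 0)
        strcat (out, "_");
      strcat (out, tok[i]);
    }

  free (tok);
  free (copy);
  return out;
}

/* Build "NAME.SUFFIX", where SUFFIX is the canonical form of ATTR.  */
char *
versioned_name (const char *name, const char *attr)
{
  char *suffix = canonical_attr (attr);
  char *res = malloc (strlen (name) + strlen (suffix) + 2);

  assert (res != NULL);
  sprintf (res, "%s.%s", name, suffix);
  free (suffix);
  return res;
}
```

Two spellings of the same option list, such as "sse4.2,arch=corei7" and "arch=corei7,sse4.2", canonicalise to the same string here, which is the property the version comparison and the assembler-name suffix both rely on.
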
>>> +
>>> +/* Returns true when only one of DECL1 and DECL2 is marked with "targetv"
>>> +   or if the "targetv" attribute strings of DECL1 and DECL2 don't match.  */
>>> +
>>> +bool
>>> +has_different_version_attributes (const tree decl1, const tree decl2)
>>> +{
>>> +  tree attr1, attr2;
>>> +  char *c1, *c2;
>>> +  bool ret = false;
>>> +
>>> +  if (TREE_CODE (decl1) != FUNCTION_DECL
>>> +      || TREE_CODE (decl2) != FUNCTION_DECL)
>>> +    return false;
>>> +
>>> +  attr1 = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl1));
>>> +  attr2 = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl2));
>>> +
>>> +  if (attr1 == NULL_TREE && attr2 == NULL_TREE)
>>> +    return false;
>>> +
>>> +  if ((attr1 == NULL_TREE && attr2 != NULL_TREE)
>>> +      || (attr1 != NULL_TREE && attr2 == NULL_TREE))
>>> +    return true;
>>> +
>>> +  c1 = sorted_attr_string (
>>> +       TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr1))));
>>> +  c2 = sorted_attr_string (
>>> +       TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr2))));
>>> +
>>> +  if (strcmp (c1, c2) != 0)
>>> +     ret = true;
>>> +
>>> +  free (c1);
>>> +  free (c2);
>>> +
>>> +  return ret;
>>> +}
>>> +
>>> +/* If this decl corresponds to a function and has "targetv" attribute,
>>> +   append the attribute string to its assembler name.  */
>>> +
>>> +void
>>> +version_assembler_name (const tree decl)
>>> +{
>>> +  tree version_attr;
>>> +  const char *orig_name, *version_string, *attr_str;
>>> +  char *assembler_name;
>>> +  tree assembler_name_tree;
>>> +
>>> +  if (TREE_CODE (decl) != FUNCTION_DECL
>>> +      || DECL_ASSEMBLER_NAME_SET_P (decl)
>>> +      || !DECL_FUNCTION_VERSIONED (decl))
>>> +    return;
>>> +
>>> +  if (DECL_DECLARED_INLINE_P (decl)
>>> +      && lookup_attribute ("gnu_inline",
>>> +                         DECL_ATTRIBUTES (decl)))
>>> +    error_at (DECL_SOURCE_LOCATION (decl),
>>> +             "Function versions cannot be marked as gnu_inline,"
>>> +             " bodies have to be generated\n");
>>> +
>>> +  if (DECL_VIRTUAL_P (decl)
>>> +      || DECL_VINDEX (decl))
>>> +    error_at (DECL_SOURCE_LOCATION (decl),
>>> +             "Virtual function versioning not supported\n");
>>> +
>>> +  version_attr = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl));
>>> +  /* targetv attribute string is NULL for default functions.  */
>>> +  if (version_attr == NULL_TREE)
>>> +    return;
>>> +
>>> +  orig_name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
>>> +  version_string
>>> +    = TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (version_attr)));
>>> +
>>> +  attr_str = sorted_attr_string (version_string);
>>> +  assembler_name = (char *) xmalloc (strlen (orig_name)
>>> +                                    + strlen (attr_str) + 2);
>>> +
>>> +  sprintf (assembler_name, "%s.%s", orig_name, attr_str);
>>> +  if (dump_file)
>>> +    fprintf (dump_file, "Assembler name set to %s for function version %s\n",
>>> +            assembler_name, IDENTIFIER_POINTER (DECL_NAME (decl)));
>>> +  assembler_name_tree = get_identifier (assembler_name);
>>> +  SET_DECL_ASSEMBLER_NAME (decl, assembler_name_tree);
>>> +}
>>> +
>>> +/* Returns true if DECL is multi-versioned and is the default function,
>>> +   that is, it is not tagged with the "targetv" attribute.  */
>>> +
>>> +bool
>>> +is_default_function (const tree decl)
>>> +{
>>> +  return (TREE_CODE (decl) == FUNCTION_DECL
>>> +         && DECL_FUNCTION_VERSIONED (decl)
>>> +         && (lookup_attribute ("targetv", DECL_ATTRIBUTES (decl))
>>> +             == NULL_TREE));
>>> +}
>>> +
>>> +/* For function decl DECL, find the version_function struct in the
>>> +   decl_version_htab.  */
>>> +
>>> +static version_function *
>>> +find_function_version (const tree decl)
>>> +{
>>> +  void *slot;
>>> +
>>> +  if (!DECL_FUNCTION_VERSIONED (decl))
>>> +    return NULL;
>>> +
>>> +  if (!decl_version_htab)
>>> +    return NULL;
>>> +
>>> +  slot = htab_find_with_hash (decl_version_htab, decl,
>>> +                              htab_hash_pointer (decl));
>>> +
>>> +  if (slot != NULL)
>>> +    return (version_function *)slot;
>>> +
>>> +  return NULL;
>>> +}
>>> +
>>> +/* Record DECL as a function version by creating a version_function struct
>>> +   for it and storing it in the hashtable.  */
>>> +
>>> +static version_function *
>>> +add_function_version (const tree decl)
>>> +{
>>> +  void **slot;
>>> +  version_function *v;
>>> +
>>> +  if (!DECL_FUNCTION_VERSIONED (decl))
>>> +    return NULL;
>>> +
>>> +  create_decl_version_htab ();
>>> +
>>> +  slot = htab_find_slot_with_hash (decl_version_htab, (const void_p)decl,
>>> +                                   htab_hash_pointer ((const void_p)decl),
>>> +                                  INSERT);
>>> +
>>> +  if (*slot != NULL)
>>> +    return (version_function *)*slot;
>>> +
>>> +  v = new_version_function (decl);
>>> +  *slot = v;
>>> +
>>> +  return v;
>>> +}
>>> +
>>> +/* Push V into VEC only if it is not already present.  */
>>> +
>>> +static void
>>> +push_function_version (version_function *v, VEC (void_p, heap) *vec)
>>> +{
>>> +  int ix;
>>> +  void_p ele;
>>> +  for (ix = 0; VEC_iterate (void_p, vec, ix, ele); ++ix)
>>> +    {
>>> +      if (ele == (void_p)v)
>>> +        return;
>>> +    }
>>> +
>>> +  VEC_safe_push (void_p, heap, vec, (void*)v);
>>> +}
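
For illustration, the same push-if-absent pattern over a small hand-rolled growable array. struct ptr_vec and push_unique are hypothetical and are not GCC's VEC API. The one thing the sketch makes explicit is that the vector is passed by address, so any reallocation done while growing is visible to the caller. Whether the quoted helper needs the same treatment depends on how VEC_safe_push grows a heap vector, which I am assuming rather than asserting here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable array of pointers.  */
struct ptr_vec
{
  size_t len;
  size_t cap;
  void **data;
};

/* Append ELEM to VEC unless an identical pointer is already stored.
   VEC is passed by address so that growth updates the caller's copy.  */
void
push_unique (struct ptr_vec *vec, void *elem)
{
  size_t i;

  for (i = 0; i < vec->len; i++)
    if (vec->data[i] == elem)
      return;

  if (vec->len == vec->cap)
    {
      size_t cap = vec->cap ? vec->cap * 2 : 1;
      void **d = realloc (vec->data, cap * sizeof *d);

      assert (d != NULL);
      vec->data = d;
      vec->cap = cap;
    }

  vec->data[vec->len++] = elem;
}
```

Pushing the same pointer twice leaves the length unchanged, and a push that grows the storage still leaves the caller holding the current data pointer.
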
>>> +
>>> +/* Mark DECL as deleted.  This is called by the front-end when a duplicate
>>> +   decl is merged with the original decl and the duplicate decl is deleted.
>>> +   This function marks the duplicate_decl as invalid.  Called by
>>> +   duplicate_decls in cp/decl.c.  */
>>> +
>>> +void
>>> +mark_delete_decl_version (const tree decl)
>>> +{
>>> +  version_function *decl_v;
>>> +
>>> +  decl_v = find_function_version (decl);
>>> +
>>> +  if (decl_v == NULL)
>>> +    return;
>>> +
>>> +  decl_v->is_deleted = true;
>>> +
>>> +  if (is_default_function (decl)
>>> +      && decl_v->versions != NULL)
>>> +    {
>>> +      VEC_truncate (void_p, decl_v->versions, 0);
>>> +      VEC_free (void_p, heap, decl_v->versions);
>>> +    }
>>> +}
>>> +
>>> +/* Mark DECL1 and DECL2 to be function versions in the same group.  One
>>> +   of DECL1 and DECL2 must be the default, otherwise this function does
>>> +   nothing.  This function aggregates the versions.  */
>>> +
>>> +int
>>> +group_function_versions (const tree decl1, const tree decl2)
>>> +{
>>> +  tree default_decl, version_decl;
>>> +  version_function *default_v, *version_v;
>>> +
>>> +  gcc_assert (DECL_FUNCTION_VERSIONED (decl1)
>>> +             && DECL_FUNCTION_VERSIONED (decl2));
>>> +
>>> +  /* The version decls are added only to the default decl.  */
>>> +  if (!is_default_function (decl1)
>>> +      && !is_default_function (decl2))
>>> +    return 0;
>>> +
>>> +  /* This can happen with duplicate declarations.  Just ignore.  */
>>> +  if (is_default_function (decl1)
>>> +      && is_default_function (decl2))
>>> +    return 0;
>>> +
>>> +  default_decl = (is_default_function (decl1)) ? decl1 : decl2;
>>> +  version_decl = (default_decl == decl1) ? decl2 : decl1;
>>> +
>>> +  gcc_assert (default_decl != version_decl);
>>> +  create_decl_version_htab ();
>>> +
>>> +  /* If the version function is found, it has been added.  */
>>> +  if (find_function_version (version_decl))
>>> +    return 0;
>>> +
>>> +  default_v = add_function_version (default_decl);
>>> +  version_v = add_function_version (version_decl);
>>> +
>>> +  if (default_v->versions == NULL)
>>> +    default_v->versions = VEC_alloc (void_p, heap, 1);
>>> +
>>> +  push_function_version (version_v, default_v->versions);
>>> +  return 0;
>>> +}
>>> +
>>> +/* Makes a function attribute of the form NAME(ARG_NAME) and chains
>>> +   it to CHAIN.  */
>>> +
>>> +static tree
>>> +make_attribute (const char *name, const char *arg_name, tree chain)
>>> +{
>>> +  tree attr_name;
>>> +  tree attr_arg_name;
>>> +  tree attr_args;
>>> +  tree attr;
>>> +
>>> +  attr_name = get_identifier (name);
>>> +  attr_arg_name = build_string (strlen (arg_name), arg_name);
>>> +  attr_args = tree_cons (NULL_TREE, attr_arg_name, NULL_TREE);
>>> +  attr = tree_cons (attr_name, attr_args, chain);
>>> +  return attr;
>>> +}
>>> +
>>> +/* Return a new name by appending SUFFIX to the DECL name.  If
>>> +   make_unique is true, append the full path name.  */
>>> +
>>> +static char *
>>> +make_name (tree decl, const char *suffix, bool make_unique)
>>> +{
>>> +  char *global_var_name;
>>> +  int name_len;
>>> +  const char *name;
>>> +  const char *unique_name = NULL;
>>> +
>>> +  name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
>>> +
>>> +  /* Get a unique name that can be used globally without any chances
>>> +     of collision at link time.  */
>>> +  if (make_unique)
>>> +    unique_name = IDENTIFIER_POINTER (get_file_function_name ("\0"));
>>> +
>>> +  name_len = strlen (name) + strlen (suffix) + 2;
>>> +
>>> +  if (make_unique)
>>> +    name_len += strlen (unique_name) + 1;
>>> +  global_var_name = (char *) xmalloc (name_len);
>>> +
>>> +  /* Use '.' to concatenate names as it is demangler friendly.  */
>>> +  if (make_unique)
>>> +      snprintf (global_var_name, name_len, "%s.%s.%s", name,
>>> +               unique_name, suffix);
>>> +  else
>>> +      snprintf (global_var_name, name_len, "%s.%s", name, suffix);
>>> +
>>> +  return global_var_name;
>>> +}
>>> +
>>> +/* Make the resolver function decl for ifunc (IFUNC_DECL) to dispatch
>>> +   the versions of multi-versioned function DEFAULT_DECL.  Create an
>>> +   empty basic block in the resolver and store the pointer in
>>> +   EMPTY_BB.  Return the decl of the resolver function.  */
>>> +
>>> +static tree
>>> +make_ifunc_resolver_func (const tree default_decl,
>>> +                         const tree ifunc_decl,
>>> +                         basic_block *empty_bb)
>>> +{
>>> +  char *resolver_name;
>>> +  tree decl, type, decl_name, t;
>>> +  basic_block new_bb;
>>> +  tree old_current_function_decl;
>>> +  bool make_unique = false;
>>> +
>>> +  /* IFUNCs have to be globally visible.  So, if the default_decl is
>>> +     not, then the name of the IFUNC should be made unique.  */
>>> +  if (TREE_PUBLIC (default_decl) == 0)
>>> +    make_unique = true;
>>> +
>>> +  /* Append the filename to the resolver function if the versions are
>>> +     not externally visible.  This is because the resolver function has
>>> +     to be externally visible for the loader to find it.  So, appending
>>> +     the filename will prevent conflicts with a resolver function from
>>> +     another module which is based on the same version name.  */
>>> +  resolver_name = make_name (default_decl, "resolver", make_unique);
>>> +
>>> +  /* The resolver function should return a (void *). */
>>> +  type = build_function_type_list (ptr_type_node, NULL_TREE);
>>> +
>>> +  decl = build_fn_decl (resolver_name, type);
>>> +  decl_name = get_identifier (resolver_name);
>>> +  SET_DECL_ASSEMBLER_NAME (decl, decl_name);
>>> +
>>> +  DECL_NAME (decl) = decl_name;
>>> +  TREE_USED (decl) = TREE_USED (default_decl);
>>> +  DECL_ARTIFICIAL (decl) = 1;
>>> +  DECL_IGNORED_P (decl) = 0;
>>> +  /* IFUNC resolvers have to be externally visible.  */
>>> +  TREE_PUBLIC (decl) = 1;
>>> +  DECL_UNINLINABLE (decl) = 1;
>>> +
>>> +  DECL_EXTERNAL (decl) = DECL_EXTERNAL (default_decl);
>>> +  DECL_EXTERNAL (ifunc_decl) = 0;
>>> +
>>> +  DECL_CONTEXT (decl) = NULL_TREE;
>>> +  DECL_INITIAL (decl) = make_node (BLOCK);
>>> +  DECL_STATIC_CONSTRUCTOR (decl) = 0;
>>> +  TREE_READONLY (decl) = 0;
>>> +  DECL_PURE_P (decl) = 0;
>>> +  DECL_COMDAT (decl) = DECL_COMDAT (default_decl);
>>> +  if (DECL_COMDAT_GROUP (default_decl))
>>> +    {
>>> +      make_decl_one_only (decl, DECL_COMDAT_GROUP (default_decl));
>>> +    }
>>> +  /* Build result decl and add to function_decl. */
>>> +  t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, ptr_type_node);
>>> +  DECL_ARTIFICIAL (t) = 1;
>>> +  DECL_IGNORED_P (t) = 1;
>>> +  DECL_RESULT (decl) = t;
>>> +
>>> +  gimplify_function_tree (decl);
>>> +  old_current_function_decl = current_function_decl;
>>> +  push_cfun (DECL_STRUCT_FUNCTION (decl));
>>> +  current_function_decl = decl;
>>> +  init_empty_tree_cfg_for_function (DECL_STRUCT_FUNCTION (decl));
>>> +  cfun->curr_properties |=
>>> +    (PROP_gimple_lcf | PROP_gimple_leh | PROP_cfg | PROP_referenced_vars |
>>> +     PROP_ssa);
>>> +  new_bb = create_empty_bb (ENTRY_BLOCK_PTR);
>>> +  make_edge (ENTRY_BLOCK_PTR, new_bb, EDGE_FALLTHRU);
>>> +  make_edge (new_bb, EXIT_BLOCK_PTR, 0);
>>> +  *empty_bb = new_bb;
>>> +
>>> +  cgraph_add_new_function (decl, true);
>>> +  cgraph_call_function_insertion_hooks (cgraph_get_create_node (decl));
>>> +  cgraph_analyze_function (cgraph_get_create_node (decl));
>>> +  cgraph_mark_needed_node (cgraph_get_create_node (decl));
>>> +
>>> +  if (DECL_COMDAT_GROUP (default_decl))
>>> +    {
>>> +      gcc_assert (cgraph_get_node (default_decl));
>>> +      cgraph_add_to_same_comdat_group (cgraph_get_node (decl),
>>> +                                      cgraph_get_node (default_decl));
>>> +    }
>>> +
>>> +  pop_cfun ();
>>> +  current_function_decl = old_current_function_decl;
>>> +
>>> +  gcc_assert (ifunc_decl != NULL);
>>> +  DECL_ATTRIBUTES (ifunc_decl)
>>> +    = make_attribute ("ifunc", resolver_name, DECL_ATTRIBUTES (ifunc_decl));
>>> +  assemble_alias (ifunc_decl, get_identifier (resolver_name));
>>> +  return decl;
>>> +}
>>> +
>>> +/* Make an ifunc declaration for the multi-versioned function DECL.  Calls to
>>> +   DECL function will be replaced with calls to the ifunc.  Return the decl
>>> +   of the ifunc created.  */
>>> +
>>> +static tree
>>> +make_ifunc_func (const tree decl)
>>> +{
>>> +  tree ifunc_decl;
>>> +  char *ifunc_name, *resolver_name;
>>> +  tree fn_type, ifunc_type;
>>> +  bool make_unique = false;
>>> +
>>> +  if (TREE_PUBLIC (decl) == 0)
>>> +    make_unique = true;
>>> +
>>> +  ifunc_name = make_name (decl, "ifunc", make_unique);
>>> +  resolver_name = make_name (decl, "resolver", make_unique);
>>> +  gcc_assert (resolver_name);
>>> +
>>> +  fn_type = TREE_TYPE (decl);
>>> +  ifunc_type = build_function_type (TREE_TYPE (fn_type),
>>> +                                   TYPE_ARG_TYPES (fn_type));
>>> +
>>> +  ifunc_decl = build_fn_decl (ifunc_name, ifunc_type);
>>> +  TREE_USED (ifunc_decl) = 1;
>>> +  DECL_CONTEXT (ifunc_decl) = NULL_TREE;
>>> +  DECL_INITIAL (ifunc_decl) = error_mark_node;
>>> +  DECL_ARTIFICIAL (ifunc_decl) = 1;
>>> +  /* Mark this ifunc as external, the resolver will flip it again if
>>> +     it gets generated.  */
>>> +  DECL_EXTERNAL (ifunc_decl) = 1;
>>> +  /* IFUNCs have to be externally visible.  */
>>> +  TREE_PUBLIC (ifunc_decl) = 1;
>>> +
>>> +  return ifunc_decl;
>>> +}
>>> +
>>> +/* For multi-versioned function decl DECL, which must also be the default,
>>> +   return the decl of the ifunc, creating it if it does not
>>> +   exist.  */
>>> +
>>> +tree
>>> +get_ifunc_for_version (const tree decl)
>>> +{
>>> +  version_function *decl_v;
>>> +  int ix;
>>> +  void_p ele;
>>> +
>>> +  /* DECL has to be the default version, otherwise it is missing and
>>> +     that is not allowed.  */
>>> +  if (!is_default_function (decl))
>>> +    {
>>> +      error_at (DECL_SOURCE_LOCATION (decl), "Default version not found");
>>> +      return decl;
>>> +    }
>>> +
>>> +  decl_v = find_function_version (decl);
>>> +  gcc_assert (decl_v != NULL);
>>> +  if (decl_v->ifunc_decl == NULL)
>>> +    {
>>> +      tree ifunc_decl;
>>> +      ifunc_decl = make_ifunc_func (decl);
>>> +      decl_v->ifunc_decl = ifunc_decl;
>>> +    }
>>> +
>>> +  if (cgraph_get_node (decl))
>>> +    cgraph_mark_needed_node (cgraph_get_node (decl));
>>> +
>>> +  for (ix = 0; VEC_iterate (void_p, decl_v->versions, ix, ele); ++ix)
>>> +    {
>>> +      version_function *v = (version_function *) ele;
>>> +      gcc_assert (v->decl != NULL);
>>> +      if (cgraph_get_node (v->decl))
>>> +       cgraph_mark_needed_node (cgraph_get_node (v->decl));
>>> +    }
>>> +
>>> +  return decl_v->ifunc_decl;
>>> +}
>>> +
>>> +/* Generate the dispatching code to dispatch multi-versioned function
>>> +   DECL.  Make a new function decl for dispatching and call the target
>>> +   hook to process the "targetv" attributes and provide the code to
>>> +   dispatch the right function at run-time.  */
>>> +
>>> +static tree
>>> +make_ifunc_resolver_for_version (const tree decl)
>>> +{
>>> +  version_function *decl_v;
>>> +  tree ifunc_resolver_decl, ifunc_decl;
>>> +  basic_block empty_bb;
>>> +  int ix;
>>> +  void_p ele;
>>> +  VEC (tree, heap) *fn_ver_vec = NULL;
>>> +
>>> +  gcc_assert (is_default_function (decl));
>>> +
>>> +  decl_v = find_function_version (decl);
>>> +  gcc_assert (decl_v != NULL);
>>> +
>>> +  if (decl_v->ifunc_resolver_decl != NULL)
>>> +    return decl_v->ifunc_resolver_decl;
>>> +
>>> +  ifunc_decl = decl_v->ifunc_decl;
>>> +
>>> +  if (ifunc_decl == NULL)
>>> +    ifunc_decl = decl_v->ifunc_decl = make_ifunc_func (decl);
>>> +
>>> +  ifunc_resolver_decl = make_ifunc_resolver_func (decl, ifunc_decl,
>>> +                                                 &empty_bb);
>>> +
>>> +  fn_ver_vec = VEC_alloc (tree, heap, 2);
>>> +  VEC_safe_push (tree, heap, fn_ver_vec, decl);
>>> +
>>> +  for (ix = 0; VEC_iterate (void_p, decl_v->versions, ix, ele); ++ix)
>>> +    {
>>> +      version_function *v = (version_function *) ele;
>>> +      gcc_assert (v->decl != NULL);
>>> +      /* Check for virtual functions here again, as by this time it should
>>> +        have been determined if this function needs a vtable index or
>>> +        not.  This happens for methods in derived classes that override
>>> +        virtual methods in base classes but are not explicitly marked as
>>> +        virtual.  */
>>> +      if (DECL_VINDEX (v->decl))
>>> +        error_at (DECL_SOURCE_LOCATION (v->decl),
>>> +                 "Virtual function versioning not supported\n");
>>> +      if (!v->is_deleted)
>>> +       VEC_safe_push (tree, heap, fn_ver_vec, v->decl);
>>> +    }
>>> +
>>> +  gcc_assert (targetm.dispatch_version);
>>> +  targetm.dispatch_version (ifunc_resolver_decl, fn_ver_vec, &empty_bb);
>>> +  decl_v->ifunc_resolver_decl = ifunc_resolver_decl;
>>> +
>>> +  return ifunc_resolver_decl;
>>> +}
>>> +
>>> +/* Main entry point to pass_dispatch_versions. For multi-versioned functions,
>>> +   generate the dispatching code.  */
>>> +
>>> +static unsigned int
>>> +do_dispatch_versions (void)
>>> +{
>>> +  /* A new pass for generating dispatch code for multi-versioned functions.
>>> +     Other forms of dispatch can be added when ifunc support is not available
>>> +     like just calling the function directly after checking for target type.
>>> +     Currently, dispatching is done through IFUNC.  This pass will become
>>> +     more meaningful when other dispatch mechanisms are added.  */
>>> +
>>> +  /* Cloning a function to produce more versions will happen here when the
>>> +     user requests that via the targetv attribute. For example,
>>> +     int foo () __attribute__ ((targetv(("arch=core2"), ("arch=corei7"))));
>>> +     means that the user wants the same body of foo to be versioned for core2
>>> +     and corei7.  In that case, this function will be cloned during this
>>> +     pass.  */
>>> +
>>> +  if (DECL_FUNCTION_VERSIONED (current_function_decl)
>>> +      && is_default_function (current_function_decl))
>>> +    {
>>> +      tree decl = make_ifunc_resolver_for_version (current_function_decl);
>>> +      if (dump_file && decl)
>>> +       dump_function_to_file (decl, dump_file, TDF_BLOCKS);
>>> +    }
>>> +  return 0;
>>> +}
>>> +
>>> +static bool
>>> +gate_dispatch_versions (void)
>>> +{
>>> +  return true;
>>> +}
>>> +
>>> +/* A pass to generate the dispatch code to execute the appropriate version
>>> +   of a multi-versioned function at run-time.  */
>>> +
>>> +struct gimple_opt_pass pass_dispatch_versions =
>>> +{
>>> + {
>>> +  GIMPLE_PASS,
>>> +  "dispatch_multiversion_functions",    /* name */
>>> +  gate_dispatch_versions,              /* gate */
>>> +  do_dispatch_versions,                        /* execute */
>>> +  NULL,                                        /* sub */
>>> +  NULL,                                        /* next */
>>> +  0,                                   /* static_pass_number */
>>> +  TV_MULTIVERSION_DISPATCH,            /* tv_id */
>>> +  PROP_cfg,                            /* properties_required */
>>> +  PROP_cfg,                            /* properties_provided */
>>> +  0,                                   /* properties_destroyed */
>>> +  0,                                   /* todo_flags_start */
>>> +  TODO_dump_func |                     /* todo_flags_finish */
>>> +  TODO_cleanup_cfg | TODO_dump_cgraph
>>> + }
>>> +};
>>> Index: cgraphunit.c
>>> ===================================================================
>>> --- cgraphunit.c        (revision 184971)
>>> +++ cgraphunit.c        (working copy)
>>> @@ -141,6 +141,7 @@ along with GCC; see the file COPYING3.  If not see
>>>  #include "ipa-inline.h"
>>>  #include "ipa-utils.h"
>>>  #include "lto-streamer.h"
>>> +#include "multiversion.h"
>>>
>>>  static void cgraph_expand_all_functions (void);
>>>  static void cgraph_mark_functions_to_output (void);
>>> @@ -343,6 +344,13 @@ cgraph_finalize_function (tree decl, bool nested)
>>>       node->local.redefined_extern_inline = true;
>>>     }
>>>
>>> +  /* If this is a function version and not the default, change the
>>> +     assembler name of this function.  The DECL names of function
>>> +     versions are the same, only the assembler names are made unique.
>>> +     The assembler name is changed by appending the string from
>>> +     the "targetv" attribute.  */
>>> +  version_assembler_name (decl);
>>> +
>>>   notice_global_symbol (decl);
>>>   node->local.finalized = true;
>>>   node->lowered = DECL_STRUCT_FUNCTION (decl)->cfg != NULL;
>>> Index: multiversion.h
>>> ===================================================================
>>> --- multiversion.h      (revision 0)
>>> +++ multiversion.h      (revision 0)
>>> @@ -0,0 +1,52 @@
>>> +/* Function Multiversioning.
>>> +   Copyright (C) 2012 Free Software Foundation, Inc.
>>> +   Contributed by Sriraman Tallam (tmsriram@google.com)
>>> +
>>> +This file is part of GCC.
>>> +
>>> +GCC is free software; you can redistribute it and/or modify it under
>>> +the terms of the GNU General Public License as published by the Free
>>> +Software Foundation; either version 3, or (at your option) any later
>>> +version.
>>> +
>>> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
>>> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
>>> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
>>> +for more details.
>>> +
>>> +You should have received a copy of the GNU General Public License
>>> +along with GCC; see the file COPYING3.  If not see
>>> +<http://www.gnu.org/licenses/>. */
>>> +
>>> +/* This is the header file which provides the functions to keep track
>>> +   of functions that are multi-versioned and to generate the dispatch
>>> +   code to call the right version at run-time.  */
>>> +
>>> +#ifndef GCC_MULTIVERSION_H
>>> +#define GCC_MULTIVERSION_H
>>> +
>>> +#include "tree.h"
>>> +
>>> +/* Mark DECL1 and DECL2 as function versions.  */
>>> +int group_function_versions (const tree decl1, const tree decl2);
>>> +
>>> +/* Mark DECL as deleted and no longer a version.  */
>>> +void mark_delete_decl_version (const tree decl);
>>> +
>>> +/* Returns true if DECL is the default version to be executed if all
>>> +   other versions are inappropriate at run-time.  */
>>> +bool is_default_function (const tree decl);
>>> +
>>> +/* Gets the IFUNC dispatcher for this multi-versioned function DECL. DECL
>>> +   must be the default function in the multi-versioned group.  */
>>> +tree get_ifunc_for_version (const tree decl);
>>> +
>>> +/* Returns true when only one of DECL1 and DECL2 is marked with "targetv"
>>> +   or if the "targetv" attribute strings of DECL1 and DECL2 don't match.  */
>>> +bool has_different_version_attributes (const tree decl1, const tree decl2);
>>> +
>>> +/* If DECL is a function version and not the default version, the assembler
>>> +   name of DECL is changed to include the attribute string to keep the
>>> +   name unambiguous.  */
>>> +void version_assembler_name (const tree decl);
>>> +#endif
>>> Index: cp/class.c
>>> ===================================================================
>>> --- cp/class.c  (revision 184971)
>>> +++ cp/class.c  (working copy)
>>> @@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.  If not see
>>>  #include "tree-dump.h"
>>>  #include "splay-tree.h"
>>>  #include "pointer-set.h"
>>> +#include "multiversion.h"
>>>
>>>  /* The number of nested classes being processed.  If we are not in the
>>>    scope of any class, this is zero.  */
>>> @@ -1092,7 +1093,20 @@ add_method (tree type, tree method, tree using_dec
>>>              || same_type_p (TREE_TYPE (fn_type),
>>>                              TREE_TYPE (method_type))))
>>>        {
>>> -         if (using_decl)
>>> +         /* For function versions, their parms and types match
>>> +            but they are not duplicates.  Record function versions
>>> +            as and when they are found.  */
>>> +         if (TREE_CODE (fn) == FUNCTION_DECL
>>> +             && TREE_CODE (method) == FUNCTION_DECL
>>> +             && (DECL_FUNCTION_VERSIONED (fn)
>>> +                 || DECL_FUNCTION_VERSIONED (method)))
>>> +           {
>>> +             DECL_FUNCTION_VERSIONED (fn) = 1;
>>> +             DECL_FUNCTION_VERSIONED (method) = 1;
>>> +             group_function_versions (fn, method);
>>> +             continue;
>>> +           }
>>> +         else if (using_decl)
>>>            {
>>>              if (DECL_CONTEXT (fn) == type)
>>>                /* Defer to the local function.  */
>>> @@ -1150,6 +1164,13 @@ add_method (tree type, tree method, tree using_dec
>>>   else
>>>     /* Replace the current slot.  */
>>>     VEC_replace (tree, method_vec, slot, overload);
>>> +
>>> +  /* Change the assembler name of method here if it has "targetv"
>>> +     attributes.  Since all versions have the same mangled name,
>>> +     their assembler name is changed by appending the string from
>>> +     the "targetv" attribute. */
>>> +  version_assembler_name (method);
>>> +
>>>   return true;
>>>  }
>>>
>>> @@ -6890,8 +6911,11 @@ resolve_address_of_overloaded_function (tree targe
>>>          if (DECL_ANTICIPATED (fn))
>>>            continue;
>>>
>>> -         /* See if there's a match.  */
>>> -         if (same_type_p (target_fn_type, static_fn_type (fn)))
>>> +         /* See if there's a match.   For functions that are multi-versioned
>>> +            match it to the default function.  */
>>> +         if (same_type_p (target_fn_type, static_fn_type (fn))
>>> +             && (!DECL_FUNCTION_VERSIONED (fn)
>>> +                 || is_default_function (fn)))
>>>            matches = tree_cons (fn, NULL_TREE, matches);
>>>        }
>>>     }
>>> @@ -7053,6 +7077,21 @@ resolve_address_of_overloaded_function (tree targe
>>>       perform_or_defer_access_check (access_path, fn, fn);
>>>     }
>>>
>>> +  /* If a pointer to a function that is multi-versioned is requested, the
>>> +     pointer to the dispatcher function is returned instead.  This works
>>> +     well because indirectly calling the function will dispatch the right
>>> +     function version at run-time. Also, the function address is kept
>>> +     unique.  */
>>> +  if (DECL_FUNCTION_VERSIONED (fn)
>>> +      && is_default_function (fn))
>>> +    {
>>> +      tree ifunc_decl;
>>> +      ifunc_decl = get_ifunc_for_version (fn);
>>> +      gcc_assert (ifunc_decl != NULL);
>>> +      mark_used (fn);
>>> +      return build_fold_addr_expr (ifunc_decl);
>>> +    }
>>> +
>>>   if (TYPE_PTRFN_P (target_type) || TYPE_PTRMEMFUNC_P (target_type))
>>>     return cp_build_addr_expr (fn, flags);
>>>   else
>>> Index: cp/decl.c
>>> ===================================================================
>>> --- cp/decl.c   (revision 184971)
>>> +++ cp/decl.c   (working copy)
>>> @@ -54,6 +54,7 @@ along with GCC; see the file COPYING3.  If not see
>>>  #include "pointer-set.h"
>>>  #include "splay-tree.h"
>>>  #include "plugin.h"
>>> +#include "multiversion.h"
>>>
>>>  /* Possible cases of bad specifiers type used by bad_specifiers. */
>>>  enum bad_spec_place {
>>> @@ -972,6 +973,23 @@ decls_match (tree newdecl, tree olddecl)
>>>       if (t1 != t2)
>>>        return 0;
>>>
>>> +      /* The decls dont match if they correspond to two different versions
>>> +        of the same function.  */
>>> +      if (compparms (p1, p2)
>>> +         && same_type_p (TREE_TYPE (f1), TREE_TYPE (f2))
>>> +         && (DECL_FUNCTION_VERSIONED (newdecl)
>>> +             || DECL_FUNCTION_VERSIONED (olddecl))
>>> +         && has_different_version_attributes (newdecl, olddecl))
>>> +       {
>>> +         /* One of the decls could be the default without the "targetv"
>>> +            attribute. Set it to be a versioned function here.  */
>>> +         DECL_FUNCTION_VERSIONED (newdecl) = 1;
>>> +         DECL_FUNCTION_VERSIONED (olddecl) = 1;
>>> +         /* Accumulate all the versions of a function.  */
>>> +         group_function_versions (olddecl, newdecl);
>>> +         return 0;
>>> +       }
>>> +
>>>       if (CP_DECL_CONTEXT (newdecl) != CP_DECL_CONTEXT (olddecl)
>>>          && ! (DECL_EXTERN_C_P (newdecl)
>>>                && DECL_EXTERN_C_P (olddecl)))
>>> @@ -1482,7 +1500,11 @@ duplicate_decls (tree newdecl, tree olddecl, bool
>>>              error ("previous declaration %q+#D here", olddecl);
>>>              return NULL_TREE;
>>>            }
>>> -         else if (compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)),
>>> +         /* For function versions, params and types match, but they
>>> +            are not ambiguous.  */
>>> +         else if ((!DECL_FUNCTION_VERSIONED (newdecl)
>>> +                   && !DECL_FUNCTION_VERSIONED (olddecl))
>>> +                  && compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)),
>>>                              TYPE_ARG_TYPES (TREE_TYPE (olddecl))))
>>>            {
>>>              error ("new declaration %q#D", newdecl);
>>> @@ -2250,6 +2272,16 @@ duplicate_decls (tree newdecl, tree olddecl, bool
>>>   else if (DECL_PRESERVE_P (newdecl))
>>>     DECL_PRESERVE_P (olddecl) = 1;
>>>
>>> +  /* If the olddecl is a version, so is the newdecl.  */
>>> +  if (TREE_CODE (newdecl) == FUNCTION_DECL
>>> +      && DECL_FUNCTION_VERSIONED (olddecl))
>>> +    {
>>> +      DECL_FUNCTION_VERSIONED (newdecl) = 1;
>>> +      /* Record that newdecl is not a valid version and has
>>> +        been deleted.  */
>>> +      mark_delete_decl_version (newdecl);
>>> +    }
>>> +
>>>   if (TREE_CODE (newdecl) == FUNCTION_DECL)
>>>     {
>>>       int function_size;
>>> @@ -4512,6 +4544,10 @@ start_decl (const cp_declarator *declarator,
>>>   /* Enter this declaration into the symbol table.  */
>>>   decl = maybe_push_decl (decl);
>>>
>>> +  /* If this decl is a function version and not the default, its assembler
>>> +     name has to be changed.  */
>>> +  version_assembler_name (decl);
>>> +
>>>   if (processing_template_decl)
>>>     decl = push_template_decl (decl);
>>>   if (decl == error_mark_node)
>>> @@ -13019,6 +13055,10 @@ start_function (cp_decl_specifier_seq *declspecs,
>>>     gcc_assert (same_type_p (TREE_TYPE (TREE_TYPE (decl1)),
>>>                             integer_type_node));
>>>
>>> +  /* If this decl is a function version and not the default, its assembler
>>> +     name has to be changed.  */
>>> +  version_assembler_name (decl1);
>>> +
>>>   start_preparsed_function (decl1, attrs, /*flags=*/SF_DEFAULT);
>>>
>>>   return 1;
>>> @@ -13960,6 +14000,11 @@ cxx_comdat_group (tree decl)
>>>            break;
>>>        }
>>>       name = DECL_ASSEMBLER_NAME (decl);
>>> +      if (TREE_CODE (decl) == FUNCTION_DECL
>>> +         && DECL_FUNCTION_VERSIONED (decl))
>>> +       name = DECL_NAME (decl);
>>> +      else
>>> +        name = DECL_ASSEMBLER_NAME (decl);
>>>     }
>>>
>>>   return name;
>>> Index: cp/semantics.c
>>> ===================================================================
>>> --- cp/semantics.c      (revision 184971)
>>> +++ cp/semantics.c      (working copy)
>>> @@ -3783,8 +3783,11 @@ expand_or_defer_fn_1 (tree fn)
>>>       /* If the user wants us to keep all inline functions, then mark
>>>         this function as needed so that finish_file will make sure to
>>>         output it later.  Similarly, all dllexport'd functions must
>>> -        be emitted; there may be callers in other DLLs.  */
>>> -      if ((flag_keep_inline_functions
>>> +        be emitted; there may be callers in other DLLs.
>>> +        Also, mark this function as needed if it is marked inline but
>>> +        is a multi-versioned function.  */
>>> +      if (((flag_keep_inline_functions
>>> +           || DECL_FUNCTION_VERSIONED (fn))
>>>           && DECL_DECLARED_INLINE_P (fn)
>>>           && !DECL_REALLY_EXTERN (fn))
>>>          || (flag_keep_inline_dllexport
>>> Index: cp/decl2.c
>>> ===================================================================
>>> --- cp/decl2.c  (revision 184971)
>>> +++ cp/decl2.c  (working copy)
>>> @@ -53,6 +53,7 @@ along with GCC; see the file COPYING3.  If not see
>>>  #include "splay-tree.h"
>>>  #include "langhooks.h"
>>>  #include "c-family/c-ada-spec.h"
>>> +#include "multiversion.h"
>>>
>>>  extern cpp_reader *parse_in;
>>>
>>> @@ -674,9 +675,13 @@ check_classfn (tree ctype, tree function, tree tem
>>>          if (is_template != (TREE_CODE (fndecl) == TEMPLATE_DECL))
>>>            continue;
>>>
>>> +         /* While finding a match, same types and params are not enough
>>> +            if the function is versioned.  Also check version ("targetv")
>>> +            attributes.  */
>>>          if (same_type_p (TREE_TYPE (TREE_TYPE (function)),
>>>                           TREE_TYPE (TREE_TYPE (fndecl)))
>>>              && compparms (p1, p2)
>>> +             && !has_different_version_attributes (function, fndecl)
>>>              && (!is_template
>>>                  || comp_template_parms (template_parms,
>>>                                          DECL_TEMPLATE_PARMS (fndecl)))
>>> Index: cp/call.c
>>> ===================================================================
>>> --- cp/call.c   (revision 184971)
>>> +++ cp/call.c   (working copy)
>>> @@ -41,6 +41,7 @@ along with GCC; see the file COPYING3.  If not see
>>>  #include "langhooks.h"
>>>  #include "c-family/c-objc.h"
>>>  #include "timevar.h"
>>> +#include "multiversion.h"
>>>
>>>  /* The various kinds of conversion.  */
>>>
>>> @@ -6730,6 +6731,17 @@ build_over_call (struct z_candidate *cand, int fla
>>>   if (!already_used)
>>>     mark_used (fn);
>>>
>>> +  /* For a call to a multi-versioned function, the call should actually be to
>>> +     the dispatcher.  */
>>> +  if (DECL_FUNCTION_VERSIONED (fn))
>>> +    {
>>> +      tree ifunc_decl;
>>> +      ifunc_decl = get_ifunc_for_version (fn);
>>> +      gcc_assert (ifunc_decl != NULL);
>>> +      return build_call_expr_loc_array (UNKNOWN_LOCATION, ifunc_decl,
>>> +                                       nargs, argarray);
>>> +    }
>>> +
>>>   if (DECL_VINDEX (fn) && (flags & LOOKUP_NONVIRTUAL) == 0)
>>>     {
>>>       tree t;
>>> @@ -7980,6 +7992,30 @@ joust (struct z_candidate *cand1, struct z_candida
>>>   size_t i;
>>>   size_t len;
>>>
>>> +  /* For Candidates of a multi-versioned function, the one marked default
>>> +     wins.  This is because the default decl is used as key to aggregate
>>> +     all the other versions provided for it in multiversion.c.  When
>>> +     generating the actual call, the appropriate dispatcher is created
>>> +     to call the right function version at run-time.  */
>>> +
>>> +  if ((TREE_CODE (cand1->fn) == FUNCTION_DECL
>>> +       && DECL_FUNCTION_VERSIONED (cand1->fn))
>>> +      ||(TREE_CODE (cand2->fn) == FUNCTION_DECL
>>> +        && DECL_FUNCTION_VERSIONED (cand2->fn)))
>>> +    {
>>> +      if (is_default_function (cand1->fn))
>>> +       {
>>> +          mark_used (cand2->fn);
>>> +         return 1;
>>> +       }
>>> +      if (is_default_function (cand2->fn))
>>> +       {
>>> +          mark_used (cand1->fn);
>>> +         return -1;
>>> +       }
>>> +      return 0;
>>> +    }
>>> +
>>>   /* Candidates that involve bad conversions are always worse than those
>>>      that don't.  */
>>>   if (cand1->viable > cand2->viable)
>>> Index: timevar.def
>>> ===================================================================
>>> --- timevar.def (revision 184971)
>>> +++ timevar.def (working copy)
>>> @@ -253,6 +253,7 @@ DEFTIMEVAR (TV_TREE_IFCOMBINE        , "tree if-co
>>>  DEFTIMEVAR (TV_TREE_UNINIT           , "uninit var analysis")
>>>  DEFTIMEVAR (TV_PLUGIN_INIT           , "plugin initialization")
>>>  DEFTIMEVAR (TV_PLUGIN_RUN            , "plugin execution")
>>> +DEFTIMEVAR (TV_MULTIVERSION_DISPATCH , "multiversion dispatch")
>>>
>>>  /* Everything else in rest_of_compilation not included above.  */
>>>  DEFTIMEVAR (TV_EARLY_LOCAL          , "early local passes")
>>> Index: varasm.c
>>> ===================================================================
>>> --- varasm.c    (revision 184971)
>>> +++ varasm.c    (working copy)
>>> @@ -5755,6 +5755,8 @@ finish_aliases_1 (void)
>>>        }
>>>       else if (! (p->emitted_diags & ALIAS_DIAG_TO_EXTERN)
>>>               && DECL_EXTERNAL (target_decl)
>>> +              && (!TREE_CODE (target_decl) == FUNCTION_DECL
>>> +                  || !DECL_STRUCT_FUNCTION (target_decl))
>>>               /* We use local aliases for C++ thunks to force the tailcall
>>>                  to bind locally.  This is a hack - to keep it working do
>>>                  the following (which is not strictly correct).  */
>>> Index: Makefile.in
>>> ===================================================================
>>> --- Makefile.in (revision 184971)
>>> +++ Makefile.in (working copy)
>>> @@ -1298,6 +1298,7 @@ OBJS = \
>>>        mcf.o \
>>>        mode-switching.o \
>>>        modulo-sched.o \
>>> +       multiversion.o \
>>>        omega.o \
>>>        omp-low.o \
>>>        optabs.o \
>>> @@ -3030,6 +3031,11 @@ ree.o : ree.c $(CONFIG_H) $(SYSTEM_H) coretypes.h
>>>    $(DF_H) $(TIMEVAR_H) tree-pass.h $(RECOG_H) $(EXPR_H) \
>>>    $(REGS_H) $(TREE_H) $(TM_P_H) insn-config.h $(INSN_ATTR_H) $(DIAGNOSTIC_CORE_H) \
>>>    $(TARGET_H) $(OPTABS_H) insn-codes.h rtlhooks-def.h $(PARAMS_H) $(CGRAPH_H)
>>> +multiversion.o : multiversion.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
>>> +   $(TREE_H) langhooks.h $(TREE_INLINE_H) $(FLAGS_H) $(CGRAPH_H) intl.h \
>>> +   $(DIAGNOSTIC_H) $(FIBHEAP_H) $(PARAMS_H) $(TIMEVAR_H) tree-pass.h \
>>> +   $(HASHTAB_H) $(COVERAGE_H) $(GGC_H) $(TREE_FLOW_H) $(RTL_H) $(IPA_PROP_H) \
>>> +   $(BASIC_BLOCK_H) $(TOPLEV_H) $(TREE_DUMP_H) ipa-inline.h
>>>  cprop.o : cprop.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
>>>    $(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h $(GGC_H) \
>>>    $(RECOG_H) $(EXPR_H) $(BASIC_BLOCK_H) $(FUNCTION_H) output.h toplev.h $(DIAGNOSTIC_CORE_H) \
>>> Index: passes.c
>>> ===================================================================
>>> --- passes.c    (revision 184971)
>>> +++ passes.c    (working copy)
>>> @@ -1190,6 +1190,7 @@ init_optimization_passes (void)
>>>   NEXT_PASS (pass_build_cfg);
>>>   NEXT_PASS (pass_warn_function_return);
>>>   NEXT_PASS (pass_build_cgraph_edges);
>>> +  NEXT_PASS (pass_dispatch_versions);
>>>   *p = NULL;
>>>
>>>   /* Interprocedural optimization passes.  */
>>> Index: config/i386/i386.c
>>> ===================================================================
>>> --- config/i386/i386.c  (revision 184971)
>>> +++ config/i386/i386.c  (working copy)
>>> @@ -27446,6 +27473,593 @@ ix86_init_mmx_sse_builtins (void)
>>>     }
>>>  }
>>>
>>> +/* This adds a condition to the basic_block NEW_BB in function FUNCTION_DECL
>>> +   to return a pointer to VERSION_DECL if the outcome of the function
>>> +   PREDICATE_DECL is true.  This function will be called during version
>>> +   dispatch to decide which function version to execute.  It returns the
>>> +   basic block at the end to which more conditions can be added.  */
>>> +
>>> +static basic_block
>>> +add_condition_to_bb (tree function_decl, tree version_decl,
>>> +                    basic_block new_bb, tree predicate_decl)
>>> +{
>>> +  gimple return_stmt;
>>> +  tree convert_expr, result_var;
>>> +  gimple convert_stmt;
>>> +  gimple call_cond_stmt;
>>> +  gimple if_else_stmt;
>>> +
>>> +  basic_block bb1, bb2, bb3;
>>> +  edge e12, e23;
>>> +
>>> +  tree cond_var;
>>> +  gimple_seq gseq;
>>> +
>>> +  tree old_current_function_decl;
>>> +
>>> +  old_current_function_decl = current_function_decl;
>>> +  push_cfun (DECL_STRUCT_FUNCTION (function_decl));
>>> +  current_function_decl = function_decl;
>>> +
>>> +  gcc_assert (new_bb != NULL);
>>> +  gseq = bb_seq (new_bb);
>>> +
>>> +
>>> +  convert_expr = build1 (CONVERT_EXPR, ptr_type_node,
>>> +                        build_fold_addr_expr (version_decl));
>>> +  result_var = create_tmp_var (ptr_type_node, NULL);
>>> +  convert_stmt = gimple_build_assign (result_var, convert_expr);
>>> +  return_stmt = gimple_build_return (result_var);
>>> +
>>> +  if (predicate_decl == NULL_TREE)
>>> +    {
>>> +      gimple_seq_add_stmt (&gseq, convert_stmt);
>>> +      gimple_seq_add_stmt (&gseq, return_stmt);
>>> +      set_bb_seq (new_bb, gseq);
>>> +      gimple_set_bb (convert_stmt, new_bb);
>>> +      gimple_set_bb (return_stmt, new_bb);
>>> +      pop_cfun ();
>>> +      current_function_decl = old_current_function_decl;
>>> +      return new_bb;
>>> +    }
>>> +
>>> +  cond_var = create_tmp_var (integer_type_node, NULL);
>>> +  call_cond_stmt = gimple_build_call (predicate_decl, 0);
>>> +  gimple_call_set_lhs (call_cond_stmt, cond_var);
>>> +
>>> +  gimple_set_block (call_cond_stmt, DECL_INITIAL (function_decl));
>>> +  gimple_set_bb (call_cond_stmt, new_bb);
>>> +  gimple_seq_add_stmt (&gseq, call_cond_stmt);
>>> +
>>> +  if_else_stmt = gimple_build_cond (GT_EXPR, cond_var,
>>> +                                   integer_zero_node,
>>> +                                   NULL_TREE, NULL_TREE);
>>> +  gimple_set_block (if_else_stmt, DECL_INITIAL (function_decl));
>>> +  gimple_set_bb (if_else_stmt, new_bb);
>>> +  gimple_seq_add_stmt (&gseq, if_else_stmt);
>>> +
>>> +  gimple_seq_add_stmt (&gseq, convert_stmt);
>>> +  gimple_seq_add_stmt (&gseq, return_stmt);
>>> +  set_bb_seq (new_bb, gseq);
>>> +
>>> +  bb1 = new_bb;
>>> +  e12 = split_block (bb1, if_else_stmt);
>>> +  bb2 = e12->dest;
>>> +  e12->flags &= ~EDGE_FALLTHRU;
>>> +  e12->flags |= EDGE_TRUE_VALUE;
>>> +
>>> +  e23 = split_block (bb2, return_stmt);
>>> +
>>> +  gimple_set_bb (convert_stmt, bb2);
>>> +  gimple_set_bb (return_stmt, bb2);
>>> +
>>> +  bb3 = e23->dest;
>>> +  make_edge (bb1, bb3, EDGE_FALSE_VALUE);
>>> +
>>> +  remove_edge (e23);
>>> +  make_edge (bb2, EXIT_BLOCK_PTR, 0);
>>> +
>>> +  rebuild_cgraph_edges ();
>>> +
>>> +  pop_cfun ();
>>> +  current_function_decl = old_current_function_decl;
>>> +
>>> +  return bb3;
>>> +}
>>> +
>>> +/* This parses the attribute arguments to targetv in DECL and determines
>>> +   the right builtin to use to match the platform specification.
>>> +   For now, only one target argument ("arch=") is allowed.  */
>>> +
>>> +static enum ix86_builtins
>>> +get_builtin_code_for_version (tree decl)
>>> +{
>>> +  tree attrs;
>>> +  struct cl_target_option cur_target;
>>> +  tree target_node;
>>> +  struct cl_target_option *new_target;
>>> +  enum ix86_builtins builtin_code = IX86_BUILTIN_MAX;
>>> +
>>> +  attrs = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl));
>>> +  gcc_assert (attrs != NULL);
>>> +
>>> +  cl_target_option_save (&cur_target, &global_options);
>>> +
>>> +  target_node = ix86_valid_target_attribute_tree
>>> +                 (TREE_VALUE (TREE_VALUE (attrs)));
>>> +
>>> +  gcc_assert (target_node);
>>> +  new_target = TREE_TARGET_OPTION (target_node);
>>> +  gcc_assert (new_target);
>>> +
>>> +  if (new_target->arch_specified && new_target->arch > 0)
>>> +    {
>>> +      switch (new_target->arch)
>>> +        {
>>> +       case 1:
>>> +       case 2:
>>> +       case 3:
>>> +       case 4:
>>> +       case 5:
>>> +       case 6:
>>> +       case 7:
>>> +       case 8:
>>> +       case 9:
>>> +       case 10:
>>> +       case 11:
>>> +         builtin_code = IX86_BUILTIN_CPU_IS_INTEL;
>>> +         break;
>>> +       case 12:
>>> +         builtin_code = IX86_BUILTIN_CPU_IS_INTEL_CORE2;
>>> +         break;
>>> +       case 13:
>>> +         builtin_code = IX86_BUILTIN_CPU_IS_INTEL_COREI7;
>>> +         break;
>>> +       case 14:
>>> +         builtin_code = IX86_BUILTIN_CPU_IS_INTEL_ATOM;
>>> +         break;
>>> +       case 15:
>>> +       case 16:
>>> +       case 17:
>>> +       case 18:
>>> +       case 19:
>>> +       case 20:
>>> +       case 21:
>>> +         builtin_code = IX86_BUILTIN_CPU_IS_AMD;
>>> +         break;
>>> +       case 22:
>>> +         builtin_code = IX86_BUILTIN_CPU_IS_AMDFAM10H;
>>> +         break;
>>> +       case 23:
>>> +         builtin_code = IX86_BUILTIN_CPU_IS_AMDFAM15H_BDVER1;
>>> +         break;
>>> +       case 24:
>>> +         builtin_code = IX86_BUILTIN_CPU_IS_AMDFAM15H_BDVER2;
>>> +         break;
>>> +       case 25: /* What is btver1 ? */
>>> +         builtin_code = IX86_BUILTIN_CPU_IS_AMD;
>>> +         break;
>>> +       }
>>> +    }
>>> +
>>> +  cl_target_option_restore (&global_options, &cur_target);
>>> +  if (builtin_code == IX86_BUILTIN_MAX)
>>> +      error_at (DECL_SOURCE_LOCATION (decl),
>>> +               "No dispatcher found for the versioning attributes");
>>> +
>>> +  return builtin_code;
>>> +}
>>> +
>>> +/* This is the target hook to generate the dispatch function for
>>> +   multi-versioned functions.  DISPATCH_DECL is the function which will
>>> +   contain the dispatch logic.  FNDECLS are the function choices for
>>> +   dispatch, and is a tree chain.  EMPTY_BB is the basic block pointer
>>> +   in DISPATCH_DECL in which the dispatch code is generated.  */
>>> +
>>> +static int
>>> +ix86_dispatch_version (tree dispatch_decl,
>>> +                      void *fndecls_p,
>>> +                      basic_block *empty_bb)
>>> +{
>>> +  tree default_decl;
>>> +  gimple ifunc_cpu_init_stmt;
>>> +  gimple_seq gseq;
>>> +  tree old_current_function_decl;
>>> +  int ix;
>>> +  tree ele;
>>> +  VEC (tree, heap) *fndecls;
>>> +
>>> +  gcc_assert (dispatch_decl != NULL
>>> +             && fndecls_p != NULL
>>> +             && empty_bb != NULL);
>>> +
>>> +  /*fndecls_p is actually a vector.  */
>>> +  fndecls = (VEC (tree, heap) *)fndecls_p;
>>> +
>>> +  /* Atleast one more version other than the default.  */
>>> +  gcc_assert (VEC_length (tree, fndecls) >= 2);
>>> +
>>> +  /* The first version in the vector is the default decl.  */
>>> +  default_decl = VEC_index (tree, fndecls, 0);
>>> +
>>> +  old_current_function_decl = current_function_decl;
>>> +  push_cfun (DECL_STRUCT_FUNCTION (dispatch_decl));
>>> +  current_function_decl = dispatch_decl;
>>> +
>>> +  gseq = bb_seq (*empty_bb);
>>> +  ifunc_cpu_init_stmt = gimple_build_call_vec (
>>> +                     ix86_builtins [(int) IX86_BUILTIN_CPU_INIT], NULL);
>>> +  gimple_seq_add_stmt (&gseq, ifunc_cpu_init_stmt);
>>> +  gimple_set_bb (ifunc_cpu_init_stmt, *empty_bb);
>>> +  set_bb_seq (*empty_bb, gseq);
>>> +
>>> +  pop_cfun ();
>>> +  current_function_decl = old_current_function_decl;
>>> +
>>> +
>>> +  for (ix = 1; VEC_iterate (tree, fndecls, ix, ele); ++ix)
>>> +    {
>>> +      tree version_decl = ele;
>>> +      /* Get attribute string, parse it and find the right predicate decl.
>>> +         The predicate function could be a lengthy combination of many
>>> +        features, like arch-type and various isa-variants.  For now, only
>>> +        check the arch-type.  */
>>> +      tree predicate_decl = ix86_builtins [
>>> +                       get_builtin_code_for_version (version_decl)];
>>> +      *empty_bb = add_condition_to_bb (dispatch_decl, version_decl, *empty_bb,
>>> +                                      predicate_decl);
>>> +
>>> +    }
>>> +  /* dispatch default version at the end.  */
>>> +  *empty_bb = add_condition_to_bb (dispatch_decl, default_decl, *empty_bb,
>>> +                                  NULL);
>>> +  return 0;
>>> +}
>>>
>>> @@ -38610,6 +39269,12 @@ ix86_autovectorize_vector_sizes (void)
>>>  #undef TARGET_BUILD_BUILTIN_VA_LIST
>>>  #define TARGET_BUILD_BUILTIN_VA_LIST ix86_build_builtin_va_list
>>>
>>> +#undef TARGET_DISPATCH_VERSION
>>> +#define TARGET_DISPATCH_VERSION ix86_dispatch_version
>>> +
>>>  #undef TARGET_ENUM_VA_LIST_P
>>>  #define TARGET_ENUM_VA_LIST_P ix86_enum_va_list
>>>
>>> Index: testsuite/g++.dg/mv1.C
>>> ===================================================================
>>> --- testsuite/g++.dg/mv1.C      (revision 0)
>>> +++ testsuite/g++.dg/mv1.C      (revision 0)
>>> @@ -0,0 +1,23 @@
>>> +/* Simple test case to check if Multiversioning works.  */
>>> +/* { dg-do run } */
>>> +/* { dg-options "-O2" } */
>>> +
>>> +int foo ();
>>> +int foo () __attribute__ ((targetv("arch=corei7")));
>>> +
>>> +int main ()
>>> +{
>>> +  int (*p)() = &foo;
>>> +  return foo () + (*p)();
>>> +}
>>> +
>>> +int foo ()
>>> +{
>>> +  return 0;
>>> +}
>>> +
>>> +int __attribute__ ((targetv("arch=corei7")))
>>> +foo ()
>>> +{
>>> +  return 0;
>>> +}
>>>
>>>
>>> --
>>> This patch is available for review at http://codereview.appspot.com/5752064


* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-03-07 14:05 ` Richard Guenther
  2012-03-07 19:08   ` Sriraman Tallam
  2012-03-08 21:00   ` Xinliang David Li
@ 2012-03-09 20:04   ` Sriraman Tallam
  2012-04-27  5:09     ` Sriraman Tallam
  2 siblings, 1 reply; 93+ messages in thread
From: Sriraman Tallam @ 2012-03-09 20:04 UTC (permalink / raw)
  To: Richard Guenther; +Cc: reply, gcc-patches, David Li

Hi Richard,

  Here is a more detailed overview of the front-end implementation:

* Tracking decls that correspond to function versions of function
name, say "foo":

When the front-end sees a decl for "foo" with "targetv" attributes, it
tags it as a function version. To prevent duplicate definition errors
with other versions of "foo", I changed the "decls_match" function in
cp/decl.c to return false when 2 decls have the same signature but
different targetv attributes. This causes all function versions of
"foo" to be added to the overload list of "foo".

To expand further, whether two targetv attributes differ is checked by
sorting the arguments to targetv and comparing them.

* Change the assembler names of the function versions.

The front-end changes the assembler names of the function versions by
appending the sorted list of args to "targetv" to the mangled name of
"foo". For example, the assembler name of "void foo () __attribute__
((targetv ("sse4")))" will become _Z3foov.sse4.
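A minimal sketch of the naming scheme described above, assuming plain
strings rather than the tree nodes the patch works on. The helper names,
the use of '_' to join the sorted arguments, and the replacement of '='
by '_' (suggested by the _Z3foov.arch_corei7 example in the original
posting) are assumptions made for illustration, not the code in
multiversion.c.

```cpp
#include <algorithm>
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split a "targetv" string such as "sse4,arch=corei7"
// into its comma-separated arguments and return them sorted.
static std::vector<std::string> sorted_targetv_args(const std::string &attr)
{
  std::vector<std::string> args;
  std::istringstream in(attr);
  std::string arg;
  while (std::getline(in, arg, ','))
    if (!arg.empty())
      args.push_back(arg);
  std::sort(args.begin(), args.end());
  return args;
}

// Hypothetical helper: append the sorted arguments to the mangled name.
// An empty attribute string stands for the default version, whose name
// is left unchanged.
std::string versioned_assembler_name(const std::string &mangled,
                                     const std::string &attr)
{
  assert(!mangled.empty());
  std::string suffix;
  for (const std::string &arg : sorted_targetv_args(attr))
    {
      if (!suffix.empty())
        suffix += '_';  // assumed separator between sorted arguments
      suffix += arg;
    }
  // Assumed: '=' is not kept in the assembler name.
  std::replace(suffix.begin(), suffix.end(), '=', '_');
  return suffix.empty() ? mangled : mangled + "." + suffix;
}
```

Because the arguments are sorted first, "sse4,arch=corei7" and
"arch=corei7,sse4" yield the same assembler name in this sketch.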

* Separately group all function versions of "foo" together, in multiversion.c:

File multiversion.c maintains a hashtab, decl_version_htab, that maps
the default function decl of "foo" to the list of all other versions
of this function "foo". This is meant to be used when creating the
dispatcher for this function.
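The grouping can be pictured with the following sketch, which uses a
standard hash map in place of GCC's hashtab. The Decl struct, the
VersionTable class and its member names are hypothetical stand-ins; only
the idea of keying the table on the default decl comes from the
description above.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a FUNCTION_DECL.
struct Decl
{
  std::string name;
  std::string targetv;  // empty for the default version
};

// Maps the default decl of a function to the list of its other versions.
class VersionTable
{
public:
  // Record VERSION as a non-default version of DEFAULT_DECL.
  void add_version(const Decl *default_decl, const Decl *version)
  {
    assert(default_decl->targetv.empty() && !version->targetv.empty());
    std::vector<const Decl *> &list = table_[default_decl];
    for (const Decl *v : list)
      if (v == version)
        return;  // already recorded
    list.push_back(version);
  }

  // Return the versions recorded for DEFAULT_DECL, in insertion order.
  const std::vector<const Decl *> &versions_of(const Decl *default_decl) const
  {
    static const std::vector<const Decl *> none;
    auto it = table_.find(default_decl);
    return it == table_.end() ? none : it->second;
  }

private:
  std::unordered_map<const Decl *, std::vector<const Decl *>> table_;
};
```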

* Overload resolution:

Function "build_over_call" in cp/call.c sees a call to function
"foo", which is multi-versioned. The overload resolution happens in
function "joust" in cp/call.c. Here, the call to "foo" has all
possible versions of "foo" as candidates. Currently, "joust" returns
the default version of "foo" as the winning candidate. But
"build_over_call" realizes that this is a versioned function and
replaces the call-site of foo with an "ifunc" call for foo, by querying
a function in multiversion.c which builds the ifunc decl. After
this, all call-sites of "foo" contain the call to the ifunc.
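The rule applied in "joust" can be sketched as below. This mirrors the
shape of the hunk in cp/call.c from the patch, but the Candidate struct
and the function name joust_versions are hypothetical, and the real code
falls through to the normal overload rules when neither candidate is
versioned instead of returning 0.

```cpp
#include <cassert>

// Hypothetical stand-in for a z_candidate whose fn is a FUNCTION_DECL.
struct Candidate
{
  bool versioned;   // DECL_FUNCTION_VERSIONED (fn)
  bool is_default;  // is_default_function (fn)
};

// Returns 1 if C1 wins, -1 if C2 wins, and 0 if this rule does not
// decide between them.
int joust_versions(const Candidate &c1, const Candidate &c2)
{
  // A versioned function has exactly one default version.
  assert(!(c1.is_default && c2.is_default));

  if (!c1.versioned && !c2.versioned)
    return 0;  // not a versioned call; the usual rules apply
  if (c1.is_default)
    return 1;
  if (c2.is_default)
    return -1;
  return 0;    // two non-default versions: neither is preferred
}
```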

Notice that, for calls from an sse function to a versioned function
with an sse variant, I can modify "joust" to return the "sse" function
version rather than the default and not replace this call with an
ifunc. To do this, I must pass the target attributes of the callee to
"joust" and check if the target attributes also match any version.

* Creating the dispatcher:

The dispatcher is independently created in a new pass, called
"pass_dispatch_versions", that runs immediately after the cfg and cgraph
are created. The dispatcher looks at all possible versions and queries the
target to give it the CPU detection predicates it must use to dispatch
each version. Then, the dispatcher body is created and the ifunc is
mapped to use this dispatcher.
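In source terms, the logic of the generated dispatcher body is roughly
the following: test each version's predicate in order, return the first
version that matches, and fall back to the default. This is only a
sketch. In the patch the predicates are target builtins (for example
IX86_BUILTIN_CPU_IS_INTEL_COREI7) and the body is built directly as
GIMPLE; the type names and resolve_version below are hypothetical, with
ordinary function pointers standing in for both.

```cpp
#include <cassert>
#include <vector>

typedef int (*version_fn)();      // signature shared by all versions of foo
typedef bool (*cpu_predicate)();  // stand-in for a CPU detection builtin

// One non-default version together with the predicate guarding it.
struct Version
{
  cpu_predicate matches;
  version_fn fn;
};

// Return the first version whose predicate holds, else the default.
version_fn resolve_version(const std::vector<Version> &versions,
                           version_fn default_fn)
{
  assert(default_fn != nullptr);
  for (const Version &v : versions)
    if (v.matches != nullptr && v.matches())
      return v.fn;
  return default_fn;  // the default version is dispatched last
}
```

With IFUNC, a resolver of this shape runs once when the symbol is bound,
so later calls do not repeat the CPU checks.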

Notice that only the dispatcher creation is done after the front-end.
Everything else occurs in the front-end itself. I could have created
the dispatcher in the front-end as well. I did not do so because I
thought keeping it as a separate pass would make it easier to add more
dispatch mechanisms. For example, when IFUNC is not available, it could
be replaced with control-flow that makes direct calls to the function
versions. Also, creating the dispatcher after the "cfg" is built was easy.

Thanks,
-Sri.


On Wed, Mar 7, 2012 at 6:05 AM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Wed, Mar 7, 2012 at 1:46 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>> User directed Function Multiversioning (MV) via Function Overloading
>> ====================================================================
>>
>> This patch adds support for user directed function MV via function overloading.
>> For more detailed description:
>> http://gcc.gnu.org/ml/gcc/2012-03/msg00074.html
>>
>>
>> Here is an example program with function versions:
>>
>> int foo ();  /* Default version */
>> int foo () __attribute__ ((targetv("arch=corei7")));/*Specialized for corei7 */
>> int foo () __attribute__ ((targetv("arch=core2")));/*Specialized for core2 */
>>
>> int main ()
>> {
>>  int (*p)() = &foo;
>>  return foo () + (*p)();
>> }
>>
>> int foo ()
>> {
>>  return 0;
>> }
>>
>> int __attribute__ ((targetv("arch=corei7")))
>> foo ()
>> {
>>  return 0;
>> }
>>
>> int __attribute__ ((targetv("arch=core2")))
>> foo ()
>> {
>>  return 0;
>> }
>>
>> The above example has foo defined 3 times, but all 3 definitions of foo are
>> different versions of the same function. The call to foo in main, directly and
>> via a pointer, are calls to the multi-versioned function foo which is dispatched
>> to the right foo at run-time.
>>
>> Function versions must have the same signature but must differ in the specifier
>> string provided to a new attribute called "targetv", which is nothing but the
>> target attribute with an extra specification to indicate a version. Any number
>> of versions can be created using the targetv attribute but it is mandatory to
>> have one function without the attribute, which is treated as the default
>> version.
>>
>> The dispatching is done using the IFUNC mechanism to keep the dispatch overhead
>> low. The compiler creates a dispatcher function which checks the CPU type and
>> calls the right version of foo. The dispatching code checks for the platform
>> type and calls the first version that matches. The default function is called if
>> no specialized version is appropriate for execution.
>>
>> The pointer to foo is made to be the address of the dispatcher function, so that
>> it is unique and calls made via the pointer also work correctly. The assembler
>> names of the various versions of foo is made different, by tagging
>> the specifier strings, to keep them unique.  A specific version can be called
>> directly by creating an alias to its assembler name. For instance, to call the
>> corei7 version directly, make an alias :
>> int foo_corei7 () __attribute__((alias ("_Z3foov.arch_corei7")));
>> and then call foo_corei7.
>>
>> Note that using IFUNC  blocks inlining of versioned functions. I had implemented
>> an optimization earlier to do hot path cloning to allow versioned functions to
>> be inlined. Please see : http://gcc.gnu.org/ml/gcc-patches/2011-04/msg02285.html
>> In the next iteration, I plan to merge these two. With that, hot code paths with
>> versioned functions will be cloned so that versioned functions can be inlined.
>
> Note that inlining of functions with the target attribute is limited as well,
> but your issue is that of the indirect dispatch as ...
>
> You don't give an overview of the frontend implementation.  Thus I have
> extracted the following
>
>  - the FE does not really know about the "overloading", nor can it directly
>   resolve calls from a "sse" function to another "sse" function without going
>   through the 2nd IFUNC
>
>  - cgraph also does not know about the "overloading", so it cannot do such
>   "devirtualization" either
>
> you seem to have implemented something inbetween a pure frontend
> solution and a proper middle-end solution.  For optimization and eventually
> automatically selecting functions for cloning (like, callees of a manual "sse"
> versioned function should be cloned?) it would be nice if the cgraph would
> know about the different versions and their relationships (and the dispatcher).
> Especially the cgraph code should know the functions are semantically
> equivalent (I suppose we should require that).  The IFUNC should be
> generated by cgraph / target code, similar to how we generate C++ thunks.
>
> Honza, any suggestions on what the FE side of such cgraph infrastructure
> should look like and how we should encode the target bits?
>
> Thanks,
> Richard.
>
>>        * doc/tm.texi.in: Add description for TARGET_DISPATCH_VERSION.
>>        * doc/tm.texi: Regenerate.
>>        * c-family/c-common.c (handle_targetv_attribute): New function.
>>        * target.def (dispatch_version): New target hook.
>>        * tree.h (DECL_FUNCTION_VERSIONED): New macro.
>>        (tree_function_decl): New bit-field versioned_function.
>>        * tree-pass.h (pass_dispatch_versions): New pass.
>>        * multiversion.c: New file.
>>        * multiversion.h: New file.
>>        * cgraphunit.c: Include multiversion.h
>>        (cgraph_finalize_function): Change assembler names of versioned
>>        functions.
>>        * cp/class.c: Include multiversion.h
>>        (add_method): Aggregate function versions. Change assembler names of
>>        versioned functions.
>>        (resolve_address_of_overloaded_function): Match address of function
>>        version with default function.  Return address of ifunc dispatcher
>>        for address of versioned functions.
>>        * cp/decl.c (decls_match): Make decls unmatched for versioned
>>        functions.
>>        (duplicate_decls): Remove ambiguity for versioned functions. Notify
>>        of deleted function version decls.
>>        (start_decl): Change assembler name of versioned functions.
>>        (start_function): Change assembler name of versioned functions.
>>        (cxx_comdat_group): Make comdat group of versioned functions be the
>>        same.
>>        * cp/semantics.c (expand_or_defer_fn_1): Mark as needed versioned
>>        functions that are also marked inline.
>>        * cp/decl2.c: Include multiversion.h
>>        (check_classfn): Check attributes of versioned functions for match.
>>        * cp/call.c: Include multiversion.h
>>        (build_over_call): Make calls to multiversioned functions to call the
>>        dispatcher.
>>        (joust): For calls to multi-versioned functions, make the default
>>        function win.
>>        * timevar.def (TV_MULTIVERSION_DISPATCH): New time var.
>>        * varasm.c (finish_aliases_1): Check if the alias points to a function
>>        with a body before giving an error.
>>        * Makefile.in: Add multiversion.o
>>        * passes.c: Add pass_dispatch_versions to the pass list.
>>        * config/i386/i386.c (add_condition_to_bb): New function.
>>        (get_builtin_code_for_version): New function.
>>        (ix86_dispatch_version): New function.
>>        (TARGET_DISPATCH_VERSION): New macro.
>>        * testsuite/g++.dg/mv1.C: New test.
>>
>> Index: doc/tm.texi
>> ===================================================================
>> --- doc/tm.texi (revision 184971)
>> +++ doc/tm.texi (working copy)
>> @@ -10995,6 +10995,14 @@ The result is another tree containing a simplified
>>  call's result.  If @var{ignore} is true the value will be ignored.
>>  @end deftypefn
>>
>> +@deftypefn {Target Hook} int TARGET_DISPATCH_VERSION (tree @var{dispatch_decl}, void *@var{fndecls}, basic_block *@var{empty_bb})
>> +For a multi-versioned function, this hook sets up the dispatcher.
>> +@var{dispatch_decl} is the function that will be used to dispatch the
>> +version. @var{fndecls} are the function choices for dispatch.
>> +@var{empty_bb} is a basic block in @var{dispatch_decl} where the
>> +code to do the dispatch will be added.
>> +@end deftypefn
>> +
>>  @deftypefn {Target Hook} {const char *} TARGET_INVALID_WITHIN_DOLOOP (const_rtx @var{insn})
>>
>>  Take an instruction in @var{insn} and return NULL if it is valid within a
>> Index: doc/tm.texi.in
>> ===================================================================
>> --- doc/tm.texi.in      (revision 184971)
>> +++ doc/tm.texi.in      (working copy)
>> @@ -10873,6 +10873,14 @@ The result is another tree containing a simplified
>>  call's result.  If @var{ignore} is true the value will be ignored.
>>  @end deftypefn
>>
>> +@hook TARGET_DISPATCH_VERSION
>> +For a multi-versioned function, this hook sets up the dispatcher.
>> +@var{dispatch_decl} is the function that will be used to dispatch the
>> +version. @var{fndecls} are the function choices for dispatch.
>> +@var{empty_bb} is a basic block in @var{dispatch_decl} where the
>> +code to do the dispatch will be added.
>> +@end deftypefn
>> +
>>  @hook TARGET_INVALID_WITHIN_DOLOOP
>>
>>  Take an instruction in @var{insn} and return NULL if it is valid within a
>> Index: c-family/c-common.c
>> ===================================================================
>> --- c-family/c-common.c (revision 184971)
>> +++ c-family/c-common.c (working copy)
>> @@ -315,6 +315,7 @@ static tree check_case_value (tree);
>>  static bool check_case_bounds (tree, tree, tree *, tree *);
>>
>>  static tree handle_packed_attribute (tree *, tree, tree, int, bool *);
>> +static tree handle_targetv_attribute (tree *, tree, tree, int, bool *);
>>  static tree handle_nocommon_attribute (tree *, tree, tree, int, bool *);
>>  static tree handle_common_attribute (tree *, tree, tree, int, bool *);
>>  static tree handle_noreturn_attribute (tree *, tree, tree, int, bool *);
>> @@ -604,6 +605,8 @@ const struct attribute_spec c_common_attribute_tab
>>  {
>>   /* { name, min_len, max_len, decl_req, type_req, fn_type_req, handler,
>>        affects_type_identity } */
>> +  { "targetv",               1, -1, true, false, false,
>> +                             handle_targetv_attribute, false },
>>   { "packed",                 0, 0, false, false, false,
>>                              handle_packed_attribute , false},
>>   { "nocommon",               0, 0, true,  false, false,
>> @@ -5869,6 +5872,54 @@ handle_packed_attribute (tree *node, tree name, tr
>>   return NULL_TREE;
>>  }
>>
>> +/* The targetv attribute is used to specify a function version
>> +   targeted to specific platform types.  The "targetv" attributes
>> +   have to be valid "target" attributes.  NODE should always point
>> +   to a FUNCTION_DECL.  ARGS contain the arguments to "targetv"
>> +   which should be valid arguments to attribute "target" too.
>> +   Check handle_target_attribute for FLAGS and NO_ADD_ATTRS.  */
>> +
>> +static tree
>> +handle_targetv_attribute (tree *node, tree name,
>> +                         tree args,
>> +                         int flags,
>> +                         bool *no_add_attrs)
>> +{
>> +  const char *attr_str = NULL;
>> +  gcc_assert (TREE_CODE (*node) == FUNCTION_DECL);
>> +  gcc_assert (args != NULL);
>> +
>> +  /* This is a function version.  */
>> +  DECL_FUNCTION_VERSIONED (*node) = 1;
>> +
>> +  attr_str = TREE_STRING_POINTER (TREE_VALUE (args));
>> +
>> +  /* Check if multiple sets of target attributes are there.  This
>> +     is not supported now.  In the future, this will be supported by
>> +     cloning this function for each set.  */
>> +  if (TREE_CHAIN (args) != NULL)
>> +    warning (OPT_Wattributes, "%qE attribute has multiple sets which "
>> +            "is not supported", name);
>> +
>> +  if (attr_str == NULL
>> +      || strstr (attr_str, "arch=") == NULL)
>> +    error_at (DECL_SOURCE_LOCATION (*node),
>> +             "Versioning supported only on \"arch=\" for now");
>> +
>> +  /* targetv attributes must translate into target attributes.  */
>> +  handle_target_attribute (node, get_identifier ("target"), args, flags,
>> +                          no_add_attrs);
>> +
>> +  if (*no_add_attrs)
>> +    warning (OPT_Wattributes, "%qE attribute has no effect", name);
>> +
>> +  /* This is necessary to keep the attribute tagged to the decl
>> +     all the time.  */
>> +  *no_add_attrs = false;
>> +
>> +  return NULL_TREE;
>> +}
>> +
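A side note on the strstr check in the handler above: it accepts any string
that contains "arch=" anywhere, including inside a longer option name. If a
stricter test were wanted, one possibility is to require that some
comma-separated token begins with "arch=". The sketch below is plain C and not
part of the patch; has_arch_token is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not from the patch.  Returns true if some
   comma-separated token of SPEC starts with "arch=" and has a non-empty
   value after the '='.  */
static bool
has_arch_token (const char *spec)
{
  static const char prefix[] = "arch=";
  const size_t prefix_len = sizeof prefix - 1;

  if (spec == NULL)
    return false;

  while (*spec != '\0')
    {
      /* Length of the current token, up to the next ',' or the end.  */
      size_t tok_len = strcspn (spec, ",");

      if (tok_len > prefix_len && strncmp (spec, prefix, prefix_len) == 0)
        return true;

      spec += tok_len;
      if (*spec == ',')
        spec++;
    }

  return false;
}
```

With this, "sse4.2,arch=core2" is accepted, while "march=core2" and a bare
"arch=" are rejected.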
>>  /* Handle a "nocommon" attribute; arguments as in
>>    struct attribute_spec.handler.  */
>>
>> Index: target.def
>> ===================================================================
>> --- target.def  (revision 184971)
>> +++ target.def  (working copy)
>> @@ -1249,6 +1249,15 @@ DEFHOOK
>>  tree, (tree fndecl, int n_args, tree *argp, bool ignore),
>>  hook_tree_tree_int_treep_bool_null)
>>
>> +/* Target hook to generate the dispatching code for calls to multi-versioned
>> +   functions.  DISPATCH_DECL is the function that will have the dispatching
>> +   logic.  FNDECLS is the list of choices for dispatch and EMPTY_BB is the
>> +   basic block in DISPATCH_DECL which will contain the code.  */
>> +DEFHOOK
>> +(dispatch_version,
>> + "",
>> + int, (tree dispatch_decl, void *fndecls, basic_block *empty_bb), NULL)
>> +
>>  /* Returns a code for a target-specific builtin that implements
>>    reciprocal of the function, or NULL_TREE if not available.  */
>>  DEFHOOK
>> Index: tree.h
>> ===================================================================
>> --- tree.h      (revision 184971)
>> +++ tree.h      (working copy)
>> @@ -3532,6 +3532,12 @@ extern VEC(tree, gc) **decl_debug_args_insert (tre
>>  #define DECL_FUNCTION_SPECIFIC_OPTIMIZATION(NODE) \
>>    (FUNCTION_DECL_CHECK (NODE)->function_decl.function_specific_optimization)
>>
>> +/* In FUNCTION_DECL, this is set if this function has other versions generated
>> +   using "targetv" attributes.  The default version is the one which does not
>> +   have any "targetv" attribute set. */
>> +#define DECL_FUNCTION_VERSIONED(NODE)\
>> +   (FUNCTION_DECL_CHECK (NODE)->function_decl.versioned_function)
>> +
>>  /* FUNCTION_DECL inherits from DECL_NON_COMMON because of the use of the
>>    arguments/result/saved_tree fields by front ends.   It was either inherit
>>    FUNCTION_DECL from non_common, or inherit non_common from FUNCTION_DECL,
>> @@ -3576,8 +3582,8 @@ struct GTY(()) tree_function_decl {
>>   unsigned looping_const_or_pure_flag : 1;
>>   unsigned has_debug_args_flag : 1;
>>   unsigned tm_clone_flag : 1;
>> -
>> -  /* 1 bit left */
>> +  unsigned versioned_function : 1;
>> +  /* No bits left.  */
>>  };
>>
>>  /* The source language of the translation-unit.  */
>> Index: tree-pass.h
>> ===================================================================
>> --- tree-pass.h (revision 184971)
>> +++ tree-pass.h (working copy)
>> @@ -455,6 +455,7 @@ extern struct gimple_opt_pass pass_tm_memopt;
>>  extern struct gimple_opt_pass pass_tm_edges;
>>  extern struct gimple_opt_pass pass_split_functions;
>>  extern struct gimple_opt_pass pass_feedback_split_functions;
>> +extern struct gimple_opt_pass pass_dispatch_versions;
>>
>>  /* IPA Passes */
>>  extern struct simple_ipa_opt_pass pass_ipa_lower_emutls;
>> Index: multiversion.c
>> ===================================================================
>> --- multiversion.c      (revision 0)
>> +++ multiversion.c      (revision 0)
>> @@ -0,0 +1,798 @@
>> +/* Function Multiversioning.
>> +   Copyright (C) 2012 Free Software Foundation, Inc.
>> +   Contributed by Sriraman Tallam (tmsriram@google.com)
>> +
>> +This file is part of GCC.
>> +
>> +GCC is free software; you can redistribute it and/or modify it under
>> +the terms of the GNU General Public License as published by the Free
>> +Software Foundation; either version 3, or (at your option) any later
>> +version.
>> +
>> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
>> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
>> +for more details.
>> +
>> +You should have received a copy of the GNU General Public License
>> +along with GCC; see the file COPYING3.  If not see
>> +<http://www.gnu.org/licenses/>. */
>> +
>> +/* Holds the state for multi-versioned functions here. The front-end
>> +   updates the state as and when function versions are encountered.
>> +   This is then used to generate the dispatch code.  Also, the
>> +   optimization passes to clone hot paths involving versioned functions
>> +   will be done here.
>> +
>> +   Function versions are created by using the same function signature but
>> +   also tagging attribute "targetv" to specify the platform type for which
>> +   the version must be executed.  Here is an example:
>> +
>> +   int foo ()
>> +   {
>> +     printf ("Execute as default");
>> +     return 0;
>> +   }
>> +
>> +   int  __attribute__ ((targetv ("arch=corei7")))
>> +   foo ()
>> +   {
>> +     printf ("Execute for corei7");
>> +     return 0;
>> +   }
>> +
>> +   int main ()
>> +   {
>> +     return foo ();
>> +   }
>> +
>> +   The call to foo in main is replaced with a call to an IFUNC function that
>> +   contains the dispatch code to call the correct function version at
>> +   run-time.  */
>> +
>> +
>> +#include "config.h"
>> +#include "system.h"
>> +#include "coretypes.h"
>> +#include "tm.h"
>> +#include "tree.h"
>> +#include "tree-inline.h"
>> +#include "langhooks.h"
>> +#include "flags.h"
>> +#include "cgraph.h"
>> +#include "diagnostic.h"
>> +#include "toplev.h"
>> +#include "timevar.h"
>> +#include "params.h"
>> +#include "fibheap.h"
>> +#include "intl.h"
>> +#include "tree-pass.h"
>> +#include "hashtab.h"
>> +#include "coverage.h"
>> +#include "ggc.h"
>> +#include "tree-flow.h"
>> +#include "rtl.h"
>> +#include "ipa-prop.h"
>> +#include "basic-block.h"
>> +#include "toplev.h"
>> +#include "dbgcnt.h"
>> +#include "tree-dump.h"
>> +#include "output.h"
>> +#include "vecprim.h"
>> +#include "gimple-pretty-print.h"
>> +#include "ipa-inline.h"
>> +#include "target.h"
>> +#include "multiversion.h"
>> +
>> +typedef void * void_p;
>> +
>> +DEF_VEC_P (void_p);
>> +DEF_VEC_ALLOC_P (void_p, heap);
>> +
>> +/* Each function decl that is a function version gets an instance of this
>> +   structure.  Since this is called by the front-end, decl merging can
>> +   happen, where a decl created for a new declaration is merged with
>> +   the old.  In this case, the new decl is deleted and the IS_DELETED
>> +   field is set for the struct instance corresponding to the new decl.
>> +   IFUNC_DECL is the decl of the ifunc function for default decls.
>> +   IFUNC_RESOLVER_DECL is the decl of the dispatch function.  VERSIONS
>> +   is a vector containing the list of function versions that are
>> +   the candidates for dispatch.  */
>> +
>> +typedef struct version_function_d {
>> +  tree decl;
>> +  tree ifunc_decl;
>> +  tree ifunc_resolver_decl;
>> +  VEC (void_p, heap) *versions;
>> +  bool is_deleted;
>> +} version_function;
>> +
>> +/* Hashmap has an entry for every function decl that has other function
>> +   versions.  For function decls that are the default, it also stores the
>> +   list of all the other function versions.  Each entry is a structure
>> +   of type version_function_d.  */
>> +static htab_t decl_version_htab = NULL;
>> +
>> +/* Hashtable helpers for decl_version_htab. */
>> +
>> +static hashval_t
>> +decl_version_htab_hash_descriptor (const void *p)
>> +{
>> +  const version_function *t = (const version_function *) p;
>> +  return htab_hash_pointer (t->decl);
>> +}
>> +
>> +/* Hashtable helper for decl_version_htab. */
>> +
>> +static int
>> +decl_version_htab_eq_descriptor (const void *p1, const void *p2)
>> +{
>> +  const version_function *t1 = (const version_function *) p1;
>> +  return htab_eq_pointer ((const void_p) t1->decl, p2);
>> +}
>> +
>> +/* Create the decl_version_htab.  */
>> +static void
>> +create_decl_version_htab (void)
>> +{
>> +  if (decl_version_htab == NULL)
>> +    decl_version_htab = htab_create (10, decl_version_htab_hash_descriptor,
>> +                                    decl_version_htab_eq_descriptor, NULL);
>> +}
>> +
>> +/* Creates an instance of version_function for decl DECL.  */
>> +
>> +static version_function*
>> +new_version_function (const tree decl)
>> +{
>> +  version_function *v;
>> +  v = (version_function *) xmalloc (sizeof (version_function));
>> +  v->decl = decl;
>> +  v->ifunc_decl = NULL;
>> +  v->ifunc_resolver_decl = NULL;
>> +  v->versions = NULL;
>> +  v->is_deleted = false;
>> +  return v;
>> +}
>> +
>> +/* Comparator function to be used in qsort routine to sort attribute
>> +   specification strings to "targetv".  */
>> +
>> +static int
>> +attr_strcmp (const void *v1, const void *v2)
>> +{
>> +  const char *c1 = *(char *const*)v1;
>> +  const char *c2 = *(char *const*)v2;
>> +  return strcmp (c1, c2);
>> +}
>> +
>> +/* STR is the argument to targetv attribute.  This function tokenizes
>> +   the comma separated arguments, sorts them and returns a string which
>> +   is a unique identifier for the comma separated arguments.  */
>> +
>> +static char *
>> +sorted_attr_string (const char *str)
>> +{
>> +  char **args = NULL;
>> +  char *attr_str, *ret_str;
>> +  char *attr = NULL;
>> +  unsigned int argnum = 1;
>> +  unsigned int i;
>> +
>> +  for (i = 0; i < strlen (str); i++)
>> +    if (str[i] == ',')
>> +      argnum++;
>> +
>> +  attr_str = (char *)xmalloc (strlen (str) + 1);
>> +  strcpy (attr_str, str);
>> +
>> +  for (i = 0; i < strlen (attr_str); i++)
>> +    if (attr_str[i] == '=')
>> +      attr_str[i] = '_';
>> +
>> +  if (argnum == 1)
>> +    return attr_str;
>> +
>> +  args = (char **)xmalloc (argnum * sizeof (char *));
>> +
>> +  i = 0;
>> +  attr = strtok (attr_str, ",");
>> +  while (attr != NULL)
>> +    {
>> +      args[i] = attr;
>> +      i++;
>> +      attr = strtok (NULL, ",");
>> +    }
>> +
>> +  qsort (args, argnum, sizeof (char*), attr_strcmp);
>> +
>> +  ret_str = (char *)xmalloc (strlen (str) + 1);
>> +  strcpy (ret_str, args[0]);
>> +  for (i = 1; i < argnum; i++)
>> +    {
>> +      strcat (ret_str, "_");
>> +      strcat (ret_str, args[i]);
>> +    }
>> +
>> +  free (args);
>> +  free (attr_str);
>> +  return ret_str;
>> +}
>> +
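To make the canonicalisation done by sorted_attr_string easier to follow
outside of GCC, here is a standalone sketch of the same steps: replace '=' with
'_', split on commas, sort, and join with '_'. It uses plain malloc rather than
xmalloc, and canonical_spec and cmp_str are hypothetical names. One deliberate
difference from the code above: it counts the tokens strtok actually returns
instead of counting commas, because strtok skips empty tokens (as in "a,,b").

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator over an array of C strings.  */
static int
cmp_str (const void *a, const void *b)
{
  return strcmp (*(char *const *) a, *(char *const *) b);
}

/* Hypothetical helper, not from the patch.  Returns a malloc'd canonical
   form of SPEC, or NULL if allocation fails.  The caller frees it.  */
static char *
canonical_spec (const char *spec)
{
  size_t len = strlen (spec);
  char *copy = malloc (len + 1);
  char *out = malloc (len + 1);
  /* A string of LEN characters holds at most LEN / 2 + 1 non-empty
     comma-separated tokens.  */
  char **tok = malloc ((len / 2 + 1) * sizeof *tok);
  size_t n = 0, i;
  char *p;

  if (copy == NULL || out == NULL || tok == NULL)
    {
      free (copy);
      free (out);
      free (tok);
      return NULL;
    }

  memcpy (copy, spec, len + 1);
  for (p = copy; *p != '\0'; p++)
    if (*p == '=')
      *p = '_';

  /* Record only the tokens strtok really produces.  */
  for (p = strtok (copy, ","); p != NULL; p = strtok (NULL, ","))
    tok[n++] = p;

  qsort (tok, n, sizeof *tok, cmp_str);

  /* The joined result is never longer than SPEC itself.  */
  out[0] = '\0';
  for (i = 0; i < n; i++)
    {
      if (i > 0)
        strcat (out, "_");
      strcat (out, tok[i]);
    }

  free (tok);
  free (copy);
  return out;
}
```

Both "sse4.2,arch=core2" and "arch=core2,sse4.2" map to the same string, which
is the property the version comparison and the assembler-name suffix rely on.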
>> +/* Returns true when only one of DECL1 and DECL2 is marked with "targetv"
>> +   or if the "targetv" attribute strings of DECL1 and DECL2 do not match.  */
>> +
>> +bool
>> +has_different_version_attributes (const tree decl1, const tree decl2)
>> +{
>> +  tree attr1, attr2;
>> +  char *c1, *c2;
>> +  bool ret = false;
>> +
>> +  if (TREE_CODE (decl1) != FUNCTION_DECL
>> +      || TREE_CODE (decl2) != FUNCTION_DECL)
>> +    return false;
>> +
>> +  attr1 = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl1));
>> +  attr2 = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl2));
>> +
>> +  if (attr1 == NULL_TREE && attr2 == NULL_TREE)
>> +    return false;
>> +
>> +  if ((attr1 == NULL_TREE && attr2 != NULL_TREE)
>> +      || (attr1 != NULL_TREE && attr2 == NULL_TREE))
>> +    return true;
>> +
>> +  c1 = sorted_attr_string (
>> +       TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr1))));
>> +  c2 = sorted_attr_string (
>> +       TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr2))));
>> +
>> +  if (strcmp (c1, c2) != 0)
>> +     ret = true;
>> +
>> +  free (c1);
>> +  free (c2);
>> +
>> +  return ret;
>> +}
>> +
>> +/* If this decl corresponds to a function and has "targetv" attribute,
>> +   append the attribute string to its assembler name.  */
>> +
>> +void
>> +version_assembler_name (const tree decl)
>> +{
>> +  tree version_attr;
>> +  const char *orig_name, *version_string, *attr_str;
>> +  char *assembler_name;
>> +  tree assembler_name_tree;
>> +
>> +  if (TREE_CODE (decl) != FUNCTION_DECL
>> +      || DECL_ASSEMBLER_NAME_SET_P (decl)
>> +      || !DECL_FUNCTION_VERSIONED (decl))
>> +    return;
>> +
>> +  if (DECL_DECLARED_INLINE_P (decl)
>> +      && lookup_attribute ("gnu_inline",
>> +                          DECL_ATTRIBUTES (decl)))
>> +    error_at (DECL_SOURCE_LOCATION (decl),
>> +             "Function versions cannot be marked as gnu_inline,"
>> +             " bodies have to be generated\n");
>> +
>> +  if (DECL_VIRTUAL_P (decl)
>> +      || DECL_VINDEX (decl))
>> +    error_at (DECL_SOURCE_LOCATION (decl),
>> +             "Virtual function versioning not supported\n");
>> +
>> +  version_attr = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl));
>> +  /* targetv attribute string is NULL for default functions.  */
>> +  if (version_attr == NULL_TREE)
>> +    return;
>> +
>> +  orig_name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
>> +  version_string
>> +    = TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (version_attr)));
>> +
>> +  attr_str = sorted_attr_string (version_string);
>> +  assembler_name = (char *) xmalloc (strlen (orig_name)
>> +                                    + strlen (attr_str) + 2);
>> +
>> +  sprintf (assembler_name, "%s.%s", orig_name, attr_str);
>> +  if (dump_file)
>> +    fprintf (dump_file, "Assembler name set to %s for function version %s\n",
>> +            assembler_name, IDENTIFIER_POINTER (DECL_NAME (decl)));
>> +  assembler_name_tree = get_identifier (assembler_name);
>> +  SET_DECL_ASSEMBLER_NAME (decl, assembler_name_tree);
>> +}
>> +
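The naming scheme used by version_assembler_name can be tried in isolation with
a few lines of C. This is only a sketch of the string construction, with
versioned_name as a hypothetical helper name. It takes the already
canonicalised suffix as input and does not touch any GCC data structures.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not from the patch.  Returns a malloc'd string
   "BASE.SUFFIX", or NULL if allocation fails.  The caller frees it.  */
static char *
versioned_name (const char *base, const char *suffix)
{
  /* One byte for the '.' separator and one for the terminating NUL.  */
  size_t len = strlen (base) + 1 + strlen (suffix) + 1;
  char *name = malloc (len);

  if (name != NULL)
    snprintf (name, len, "%s.%s", base, suffix);

  return name;
}
```

For the example in the cover letter, versioned_name ("_Z3foov", "arch_corei7")
yields "_Z3foov.arch_corei7", the symbol the alias refers to.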
>> +/* Returns true if DECL is multi-versioned and is the default function,
>> +   that is, it is not tagged with the "targetv" attribute.  */
>> +
>> +bool
>> +is_default_function (const tree decl)
>> +{
>> +  return (TREE_CODE (decl) == FUNCTION_DECL
>> +         && DECL_FUNCTION_VERSIONED (decl)
>> +         && (lookup_attribute ("targetv", DECL_ATTRIBUTES (decl))
>> +             == NULL_TREE));
>> +}
>> +
>> +/* For function decl DECL, find the version_function struct in the
>> +   decl_version_htab.  */
>> +
>> +static version_function *
>> +find_function_version (const tree decl)
>> +{
>> +  void *slot;
>> +
>> +  if (!DECL_FUNCTION_VERSIONED (decl))
>> +    return NULL;
>> +
>> +  if (!decl_version_htab)
>> +    return NULL;
>> +
>> +  slot = htab_find_with_hash (decl_version_htab, decl,
>> +                              htab_hash_pointer (decl));
>> +
>> +  if (slot != NULL)
>> +    return (version_function *)slot;
>> +
>> +  return NULL;
>> +}
>> +
>> +/* Record DECL as a function version by creating a version_function struct
>> +   for it and storing it in the hashtable.  */
>> +
>> +static version_function *
>> +add_function_version (const tree decl)
>> +{
>> +  void **slot;
>> +  version_function *v;
>> +
>> +  if (!DECL_FUNCTION_VERSIONED (decl))
>> +    return NULL;
>> +
>> +  create_decl_version_htab ();
>> +
>> +  slot = htab_find_slot_with_hash (decl_version_htab, (const void_p)decl,
>> +                                   htab_hash_pointer ((const void_p)decl),
>> +                                  INSERT);
>> +
>> +  if (*slot != NULL)
>> +    return (version_function *)*slot;
>> +
>> +  v = new_version_function (decl);
>> +  *slot = v;
>> +
>> +  return v;
>> +}
>> +
>> +/* Push V into VEC only if it is not already present.  */
>> +
>> +static void
>> +push_function_version (version_function *v, VEC (void_p, heap) *vec)
>> +{
>> +  int ix;
>> +  void_p ele;
>> +  for (ix = 0; VEC_iterate (void_p, vec, ix, ele); ++ix)
>> +    {
>> +      if (ele == (void_p)v)
>> +        return;
>> +    }
>> +
>> +  VEC_safe_push (void_p, heap, vec, (void*)v);
>> +}
>> +
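One thing that may be worth double-checking in push_function_version: vec is
received by value, so if the push has to grow the vector, the caller's pointer
(default_v->versions) would presumably not see the new storage. I have not
verified this against vec.h, so treat it as a question rather than a finding.
The sketch below shows the "push only if absent" idea in plain C with the
container passed by address. struct ptr_vec and ptr_vec_push_unique are
hypothetical names and have nothing to do with GCC's VEC macros.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical growable array of pointers, not GCC's VEC.  */
struct ptr_vec
{
  void **elems;
  size_t len;
  size_t cap;
};

/* Append ELEM to V unless it is already present.  V is passed by address
   so that any reallocation is visible to the caller.  Returns false only
   if allocation fails.  */
static bool
ptr_vec_push_unique (struct ptr_vec *v, void *elem)
{
  size_t i;

  for (i = 0; i < v->len; i++)
    if (v->elems[i] == elem)
      return true;

  if (v->len == v->cap)
    {
      size_t new_cap = v->cap ? v->cap * 2 : 1;
      void **grown = realloc (v->elems, new_cap * sizeof *grown);

      if (grown == NULL)
        return false;

      v->elems = grown;
      v->cap = new_cap;
    }

  v->elems[v->len++] = elem;
  return true;
}
```

Starting from a zero-initialised struct ptr_vec, pushing the same pointer twice
leaves a single entry, and later pushes keep working after the array has grown.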
>> +/* Mark DECL as deleted.  This is called by the front-end when a duplicate
>> +   decl is merged with the original decl and the duplicate decl is deleted.
>> +   This function marks the duplicate_decl as invalid.  Called by
>> +   duplicate_decls in cp/decl.c.  */
>> +
>> +void
>> +mark_delete_decl_version (const tree decl)
>> +{
>> +  version_function *decl_v;
>> +
>> +  decl_v = find_function_version (decl);
>> +
>> +  if (decl_v == NULL)
>> +    return;
>> +
>> +  decl_v->is_deleted = true;
>> +
>> +  if (is_default_function (decl)
>> +      && decl_v->versions != NULL)
>> +    {
>> +      VEC_truncate (void_p, decl_v->versions, 0);
>> +      VEC_free (void_p, heap, decl_v->versions);
>> +    }
>> +}
>> +
>> +/* Mark DECL1 and DECL2 to be function versions in the same group.  One
>> +   of DECL1 and DECL2 must be the default, otherwise this function does
>> +   nothing.  This function aggregates the versions.  */
>> +
>> +int
>> +group_function_versions (const tree decl1, const tree decl2)
>> +{
>> +  tree default_decl, version_decl;
>> +  version_function *default_v, *version_v;
>> +
>> +  gcc_assert (DECL_FUNCTION_VERSIONED (decl1)
>> +             && DECL_FUNCTION_VERSIONED (decl2));
>> +
>> +  /* The version decls are added only to the default decl.  */
>> +  if (!is_default_function (decl1)
>> +      && !is_default_function (decl2))
>> +    return 0;
>> +
>> +  /* This can happen with duplicate declarations.  Just ignore.  */
>> +  if (is_default_function (decl1)
>> +      && is_default_function (decl2))
>> +    return 0;
>> +
>> +  default_decl = (is_default_function (decl1)) ? decl1 : decl2;
>> +  version_decl = (default_decl == decl1) ? decl2 : decl1;
>> +
>> +  gcc_assert (default_decl != version_decl);
>> +  create_decl_version_htab ();
>> +
>> +  /* If the version function is found, it has been added.  */
>> +  if (find_function_version (version_decl))
>> +    return 0;
>> +
>> +  default_v = add_function_version (default_decl);
>> +  version_v = add_function_version (version_decl);
>> +
>> +  if (default_v->versions == NULL)
>> +    default_v->versions = VEC_alloc (void_p, heap, 1);
>> +
>> +  push_function_version (version_v, default_v->versions);
>> +  return 0;
>> +}
>> +
>> +/* Makes a function attribute of the form NAME(ARG_NAME) and chains
>> +   it to CHAIN.  */
>> +
>> +static tree
>> +make_attribute (const char *name, const char *arg_name, tree chain)
>> +{
>> +  tree attr_name;
>> +  tree attr_arg_name;
>> +  tree attr_args;
>> +  tree attr;
>> +
>> +  attr_name = get_identifier (name);
>> +  attr_arg_name = build_string (strlen (arg_name), arg_name);
>> +  attr_args = tree_cons (NULL_TREE, attr_arg_name, NULL_TREE);
>> +  attr = tree_cons (attr_name, attr_args, chain);
>> +  return attr;
>> +}
>> +
>> +/* Return a new name by appending SUFFIX to the DECL name.  If
>> +   make_unique is true, append the full path name.  */
>> +
>> +static char *
>> +make_name (tree decl, const char *suffix, bool make_unique)
>> +{
>> +  char *global_var_name;
>> +  int name_len;
>> +  const char *name;
>> +  const char *unique_name = NULL;
>> +
>> +  name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
>> +
>> +  /* Get a unique name that can be used globally without any chances
>> +     of collision at link time.  */
>> +  if (make_unique)
>> +    unique_name = IDENTIFIER_POINTER (get_file_function_name ("\0"));
>> +
>> +  name_len = strlen (name) + strlen (suffix) + 2;
>> +
>> +  if (make_unique)
>> +    name_len += strlen (unique_name) + 1;
>> +  global_var_name = (char *) xmalloc (name_len);
>> +
>> +  /* Use '.' to concatenate names as it is demangler friendly.  */
>> +  if (make_unique)
>> +      snprintf (global_var_name, name_len, "%s.%s.%s", name,
>> +               unique_name, suffix);
>> +  else
>> +      snprintf (global_var_name, name_len, "%s.%s", name, suffix);
>> +
>> +  return global_var_name;
>> +}
>> +
>> +/* Make the resolver function decl for ifunc (IFUNC_DECL) to dispatch
>> +   the versions of multi-versioned function DEFAULT_DECL.  Create an
>> +   empty basic block in the resolver and store the pointer in
>> +   EMPTY_BB.  Return the decl of the resolver function.  */
>> +
>> +static tree
>> +make_ifunc_resolver_func (const tree default_decl,
>> +                         const tree ifunc_decl,
>> +                         basic_block *empty_bb)
>> +{
>> +  char *resolver_name;
>> +  tree decl, type, decl_name, t;
>> +  basic_block new_bb;
>> +  tree old_current_function_decl;
>> +  bool make_unique = false;
>> +
>> +  /* IFUNCs have to be globally visible.  So, if the default_decl is
>> +     not, then the name of the IFUNC should be made unique.  */
>> +  if (TREE_PUBLIC (default_decl) == 0)
>> +    make_unique = true;
>> +
>> +  /* Append the filename to the resolver function if the versions are
>> +     not externally visible.  This is because the resolver function has
>> +     to be externally visible for the loader to find it.  So, appending
>> +     the filename will prevent conflicts with a resolver function from
>> +     another module which is based on the same version name.  */
>> +  resolver_name = make_name (default_decl, "resolver", make_unique);
>> +
>> +  /* The resolver function should return a (void *). */
>> +  type = build_function_type_list (ptr_type_node, NULL_TREE);
>> +
>> +  decl = build_fn_decl (resolver_name, type);
>> +  decl_name = get_identifier (resolver_name);
>> +  SET_DECL_ASSEMBLER_NAME (decl, decl_name);
>> +
>> +  DECL_NAME (decl) = decl_name;
>> +  TREE_USED (decl) = TREE_USED (default_decl);
>> +  DECL_ARTIFICIAL (decl) = 1;
>> +  DECL_IGNORED_P (decl) = 0;
>> +  /* IFUNC resolvers have to be externally visible.  */
>> +  TREE_PUBLIC (decl) = 1;
>> +  DECL_UNINLINABLE (decl) = 1;
>> +
>> +  DECL_EXTERNAL (decl) = DECL_EXTERNAL (default_decl);
>> +  DECL_EXTERNAL (ifunc_decl) = 0;
>> +
>> +  DECL_CONTEXT (decl) = NULL_TREE;
>> +  DECL_INITIAL (decl) = make_node (BLOCK);
>> +  DECL_STATIC_CONSTRUCTOR (decl) = 0;
>> +  TREE_READONLY (decl) = 0;
>> +  DECL_PURE_P (decl) = 0;
>> +  DECL_COMDAT (decl) = DECL_COMDAT (default_decl);
>> +  if (DECL_COMDAT_GROUP (default_decl))
>> +    {
>> +      make_decl_one_only (decl, DECL_COMDAT_GROUP (default_decl));
>> +    }
>> +  /* Build result decl and add to function_decl. */
>> +  t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, ptr_type_node);
>> +  DECL_ARTIFICIAL (t) = 1;
>> +  DECL_IGNORED_P (t) = 1;
>> +  DECL_RESULT (decl) = t;
>> +
>> +  gimplify_function_tree (decl);
>> +  old_current_function_decl = current_function_decl;
>> +  push_cfun (DECL_STRUCT_FUNCTION (decl));
>> +  current_function_decl = decl;
>> +  init_empty_tree_cfg_for_function (DECL_STRUCT_FUNCTION (decl));
>> +  cfun->curr_properties |=
>> +    (PROP_gimple_lcf | PROP_gimple_leh | PROP_cfg | PROP_referenced_vars |
>> +     PROP_ssa);
>> +  new_bb = create_empty_bb (ENTRY_BLOCK_PTR);
>> +  make_edge (ENTRY_BLOCK_PTR, new_bb, EDGE_FALLTHRU);
>> +  make_edge (new_bb, EXIT_BLOCK_PTR, 0);
>> +  *empty_bb = new_bb;
>> +
>> +  cgraph_add_new_function (decl, true);
>> +  cgraph_call_function_insertion_hooks (cgraph_get_create_node (decl));
>> +  cgraph_analyze_function (cgraph_get_create_node (decl));
>> +  cgraph_mark_needed_node (cgraph_get_create_node (decl));
>> +
>> +  if (DECL_COMDAT_GROUP (default_decl))
>> +    {
>> +      gcc_assert (cgraph_get_node (default_decl));
>> +      cgraph_add_to_same_comdat_group (cgraph_get_node (decl),
>> +                                      cgraph_get_node (default_decl));
>> +    }
>> +
>> +  pop_cfun ();
>> +  current_function_decl = old_current_function_decl;
>> +
>> +  gcc_assert (ifunc_decl != NULL);
>> +  DECL_ATTRIBUTES (ifunc_decl)
>> +    = make_attribute ("ifunc", resolver_name, DECL_ATTRIBUTES (ifunc_decl));
>> +  assemble_alias (ifunc_decl, get_identifier (resolver_name));
>> +  return decl;
>> +}
>> +
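In source terms, my understanding is that the resolver built above amounts to a
function that returns a pointer to the chosen version, evaluated once. The
following portable C sketch imitates that with an ordinary cached function
pointer instead of a real IFUNC symbol, so it runs without any loader support.
All names in it are hypothetical, and cpu_is_corei7_p stands in for whatever
CPU check the target hook would actually emit.

```c
#include <stddef.h>

typedef int (*foo_fn) (void);

/* Two stand-in versions with distinguishable results.  */
static int foo_default (void) { return 0; }
static int foo_corei7 (void) { return 1; }

/* Hypothetical stand-in for the run-time CPU check.  */
static int cpu_is_corei7_p = 0;

/* Plays the role of the resolver: pick a version and return its address.  */
static foo_fn
foo_resolver (void)
{
  if (cpu_is_corei7_p)
    return foo_corei7;
  return foo_default;
}

/* Plays the role of the ifunc symbol: resolve on first use, then always
   call through the cached pointer.  */
static foo_fn foo_cached = NULL;

int
foo (void)
{
  if (foo_cached == NULL)
    foo_cached = foo_resolver ();
  return foo_cached ();
}
```

Once foo has been called, changing cpu_is_corei7_p no longer affects it, which
mirrors the resolve-once behaviour that keeps the dispatch overhead low.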
>> +/* Make an ifunc declaration for the multi-versioned function DECL.  Calls to
>> +   DECL function will be replaced with calls to the ifunc.  Return the decl
>> +   of the ifunc created.  */
>> +
>> +static tree
>> +make_ifunc_func (const tree decl)
>> +{
>> +  tree ifunc_decl;
>> +  char *ifunc_name, *resolver_name;
>> +  tree fn_type, ifunc_type;
>> +  bool make_unique = false;
>> +
>> +  if (TREE_PUBLIC (decl) == 0)
>> +    make_unique = true;
>> +
>> +  ifunc_name = make_name (decl, "ifunc", make_unique);
>> +  resolver_name = make_name (decl, "resolver", make_unique);
>> +  gcc_assert (resolver_name);
>> +
>> +  fn_type = TREE_TYPE (decl);
>> +  ifunc_type = build_function_type (TREE_TYPE (fn_type),
>> +                                   TYPE_ARG_TYPES (fn_type));
>> +
>> +  ifunc_decl = build_fn_decl (ifunc_name, ifunc_type);
>> +  TREE_USED (ifunc_decl) = 1;
>> +  DECL_CONTEXT (ifunc_decl) = NULL_TREE;
>> +  DECL_INITIAL (ifunc_decl) = error_mark_node;
>> +  DECL_ARTIFICIAL (ifunc_decl) = 1;
>> +  /* Mark this ifunc as external, the resolver will flip it again if
>> +     it gets generated.  */
>> +  DECL_EXTERNAL (ifunc_decl) = 1;
>> +  /* IFUNCs have to be externally visible.  */
>> +  TREE_PUBLIC (ifunc_decl) = 1;
>> +
>> +  return ifunc_decl;
>> +}
>> +
>> +/* For the multi-versioned function DECL, which must also be the default
>> +   version, return the decl of its ifunc dispatcher, creating it if it
>> +   does not exist.  */
>> +
>> +tree
>> +get_ifunc_for_version (const tree decl)
>> +{
>> +  version_function *decl_v;
>> +  int ix;
>> +  void_p ele;
>> +
>> +  /* DECL has to be the default version, otherwise it is missing and
>> +     that is not allowed.  */
>> +  if (!is_default_function (decl))
>> +    {
>> +      error_at (DECL_SOURCE_LOCATION (decl), "Default version not found");
>> +      return decl;
>> +    }
>> +
>> +  decl_v = find_function_version (decl);
>> +  gcc_assert (decl_v != NULL);
>> +  if (decl_v->ifunc_decl == NULL)
>> +    {
>> +      tree ifunc_decl;
>> +      ifunc_decl = make_ifunc_func (decl);
>> +      decl_v->ifunc_decl = ifunc_decl;
>> +    }
>> +
>> +  if (cgraph_get_node (decl))
>> +    cgraph_mark_needed_node (cgraph_get_node (decl));
>> +
>> +  for (ix = 0; VEC_iterate (void_p, decl_v->versions, ix, ele); ++ix)
>> +    {
>> +      version_function *v = (version_function *) ele;
>> +      gcc_assert (v->decl != NULL);
>> +      if (cgraph_get_node (v->decl))
>> +       cgraph_mark_needed_node (cgraph_get_node (v->decl));
>> +    }
>> +
>> +  return decl_v->ifunc_decl;
>> +}
>> +
>> +/* Generate the dispatching code to dispatch multi-versioned function
>> +   DECL.  Make a new function decl for dispatching and call the target
>> +   hook to process the "targetv" attributes and provide the code to
>> +   dispatch the right function at run-time.  */
>> +
>> +static tree
>> +make_ifunc_resolver_for_version (const tree decl)
>> +{
>> +  version_function *decl_v;
>> +  tree ifunc_resolver_decl, ifunc_decl;
>> +  basic_block empty_bb;
>> +  int ix;
>> +  void_p ele;
>> +  VEC (tree, heap) *fn_ver_vec = NULL;
>> +
>> +  gcc_assert (is_default_function (decl));
>> +
>> +  decl_v = find_function_version (decl);
>> +  gcc_assert (decl_v != NULL);
>> +
>> +  if (decl_v->ifunc_resolver_decl != NULL)
>> +    return decl_v->ifunc_resolver_decl;
>> +
>> +  ifunc_decl = decl_v->ifunc_decl;
>> +
>> +  if (ifunc_decl == NULL)
>> +    ifunc_decl = decl_v->ifunc_decl = make_ifunc_func (decl);
>> +
>> +  ifunc_resolver_decl = make_ifunc_resolver_func (decl, ifunc_decl,
>> +                                                 &empty_bb);
>> +
>> +  fn_ver_vec = VEC_alloc (tree, heap, 2);
>> +  VEC_safe_push (tree, heap, fn_ver_vec, decl);
>> +
>> +  for (ix = 0; VEC_iterate (void_p, decl_v->versions, ix, ele); ++ix)
>> +    {
>> +      version_function *v = (version_function *) ele;
>> +      gcc_assert (v->decl != NULL);
>> +      /* Check for virtual functions here again, as by this time it should
>> +        have been determined if this function needs a vtable index or
>> +        not.  This happens for methods in derived classes that override
>> +        virtual methods in base classes but are not explicitly marked as
>> +        virtual.  */
>> +      if (DECL_VINDEX (v->decl))
>> +        error_at (DECL_SOURCE_LOCATION (v->decl),
>> +                 "Virtual function versioning not supported");
>> +      if (!v->is_deleted)
>> +       VEC_safe_push (tree, heap, fn_ver_vec, v->decl);
>> +    }
>> +
>> +  gcc_assert (targetm.dispatch_version);
>> +  targetm.dispatch_version (ifunc_resolver_decl, fn_ver_vec, &empty_bb);
>> +  decl_v->ifunc_resolver_decl = ifunc_resolver_decl;
>> +
>> +  return ifunc_resolver_decl;
>> +}
>> +
>> +/* Main entry point to pass_dispatch_versions. For multi-versioned functions,
>> +   generate the dispatching code.  */
>> +
>> +static unsigned int
>> +do_dispatch_versions (void)
>> +{
>> +  /* A new pass for generating dispatch code for multi-versioned functions.
>> +     Other forms of dispatch can be added when ifunc support is not available
>> +     like just calling the function directly after checking for target type.
>> +     Currently, dispatching is done through IFUNC.  This pass will become
>> +     more meaningful when other dispatch mechanisms are added.  */
>> +
>> +  /* Cloning a function to produce more versions will happen here when the
>> +     user requests that via the targetv attribute. For example,
>> +     int foo () __attribute__ ((targetv(("arch=core2"), ("arch=corei7"))));
>> +     means that the user wants the same body of foo to be versioned for core2
>> +     and corei7.  In that case, this function will be cloned during this
>> +     pass.  */
>> +
>> +  if (DECL_FUNCTION_VERSIONED (current_function_decl)
>> +      && is_default_function (current_function_decl))
>> +    {
>> +      tree decl = make_ifunc_resolver_for_version (current_function_decl);
>> +      if (dump_file && decl)
>> +       dump_function_to_file (decl, dump_file, TDF_BLOCKS);
>> +    }
>> +  return 0;
>> +}
>> +
>> +static bool
>> +gate_dispatch_versions (void)
>> +{
>> +  return true;
>> +}
>> +
>> +/* A pass to generate the dispatch code to execute the appropriate version
>> +   of a multi-versioned function at run-time.  */
>> +
>> +struct gimple_opt_pass pass_dispatch_versions =
>> +{
>> + {
>> +  GIMPLE_PASS,
>> +  "dispatch_multiversion_functions",    /* name */
>> +  gate_dispatch_versions,              /* gate */
>> +  do_dispatch_versions,                        /* execute */
>> +  NULL,                                        /* sub */
>> +  NULL,                                        /* next */
>> +  0,                                   /* static_pass_number */
>> +  TV_MULTIVERSION_DISPATCH,            /* tv_id */
>> +  PROP_cfg,                            /* properties_required */
>> +  PROP_cfg,                            /* properties_provided */
>> +  0,                                   /* properties_destroyed */
>> +  0,                                   /* todo_flags_start */
>> +  TODO_dump_func |                     /* todo_flags_finish */
>> +  TODO_cleanup_cfg | TODO_dump_cgraph
>> + }
>> +};
>> Index: cgraphunit.c
>> ===================================================================
>> --- cgraphunit.c        (revision 184971)
>> +++ cgraphunit.c        (working copy)
>> @@ -141,6 +141,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "ipa-inline.h"
>>  #include "ipa-utils.h"
>>  #include "lto-streamer.h"
>> +#include "multiversion.h"
>>
>>  static void cgraph_expand_all_functions (void);
>>  static void cgraph_mark_functions_to_output (void);
>> @@ -343,6 +344,13 @@ cgraph_finalize_function (tree decl, bool nested)
>>       node->local.redefined_extern_inline = true;
>>     }
>>
>> +  /* If this is a function version and not the default, change the
>> +     assembler name of this function.  The DECL names of function
>> +     versions are the same, only the assembler names are made unique.
>> +     The assembler name is changed by appending the string from
>> +     the "targetv" attribute.  */
>> +  version_assembler_name (decl);
>> +
>>   notice_global_symbol (decl);
>>   node->local.finalized = true;
>>   node->lowered = DECL_STRUCT_FUNCTION (decl)->cfg != NULL;
>> Index: multiversion.h
>> ===================================================================
>> --- multiversion.h      (revision 0)
>> +++ multiversion.h      (revision 0)
>> @@ -0,0 +1,52 @@
>> +/* Function Multiversioning.
>> +   Copyright (C) 2012 Free Software Foundation, Inc.
>> +   Contributed by Sriraman Tallam (tmsriram@google.com)
>> +
>> +This file is part of GCC.
>> +
>> +GCC is free software; you can redistribute it and/or modify it under
>> +the terms of the GNU General Public License as published by the Free
>> +Software Foundation; either version 3, or (at your option) any later
>> +version.
>> +
>> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
>> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
>> +for more details.
>> +
>> +You should have received a copy of the GNU General Public License
>> +along with GCC; see the file COPYING3.  If not see
>> +<http://www.gnu.org/licenses/>. */
>> +
>> +/* This is the header file which provides the functions to keep track
>> +   of functions that are multi-versioned and to generate the dispatch
>> +   code to call the right version at run-time.  */
>> +
>> +#ifndef GCC_MULTIVERSION_H
>> +#define GCC_MULTIVERSION_H
>> +
>> +#include "tree.h"
>> +
>> +/* Mark DECL1 and DECL2 as function versions.  */
>> +int group_function_versions (const tree decl1, const tree decl2);
>> +
>> +/* Mark DECL as deleted and no longer a version.  */
>> +void mark_delete_decl_version (const tree decl);
>> +
>> +/* Returns true if DECL is the default version to be executed if all
>> +   other versions are inappropriate at run-time.  */
>> +bool is_default_function (const tree decl);
>> +
>> +/* Gets the IFUNC dispatcher for this multi-versioned function DECL. DECL
>> +   must be the default function in the multi-versioned group.  */
>> +tree get_ifunc_for_version (const tree decl);
>> +
>> +/* Returns true when only one of DECL1 and DECL2 is marked with "targetv"
>> +   or if the "targetv" attribute strings of DECL1 and DECL2 do not match.  */
>> +bool has_different_version_attributes (const tree decl1, const tree decl2);
>> +
>> +/* If DECL is a function version and not the default version, the assembler
>> +   name of DECL is changed to include the attribute string to keep the
>> +   name unambiguous.  */
>> +void version_assembler_name (const tree decl);
>> +#endif
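
The comment on version_assembler_name says the assembler name is changed to
include the attribute string. A minimal sketch of that idea follows. It is not
the patch's make_name implementation, which is not shown here: the helper name
decorate_version_name, the '.' separator, and the rule of mapping
non-alphanumeric characters to '_' are all assumptions made for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the patch): write BASE, a '.' separator
   and ATTR into OUT, replacing every character of ATTR that is not
   alphanumeric with '_'.  Return 0 on success, or -1 if OUT_SIZE is too
   small to hold the result and its terminating NUL.  */
int
decorate_version_name (const char *base, const char *attr,
                       char *out, size_t out_size)
{
  size_t base_len, attr_len, i;

  assert (base != NULL && attr != NULL && out != NULL);
  base_len = strlen (base);
  attr_len = strlen (attr);

  /* Room for base, separator, attr and the terminating NUL.  */
  if (base_len + 1 + attr_len + 1 > out_size)
    return -1;

  memcpy (out, base, base_len);
  out[base_len] = '.';
  for (i = 0; i < attr_len; i++)
    {
      unsigned char c = (unsigned char) attr[i];
      out[base_len + 1 + i] = isalnum (c) ? (char) c : '_';
    }
  out[base_len + 1 + attr_len] = '\0';
  return 0;
}
```

Under these assumptions, decorating "_Z3foov" with "arch=corei7" gives
"_Z3foov.arch_corei7", so each non-default version gets a distinct symbol while
the default keeps the plain mangled name.
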
>> Index: cp/class.c
>> ===================================================================
>> --- cp/class.c  (revision 184971)
>> +++ cp/class.c  (working copy)
>> @@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "tree-dump.h"
>>  #include "splay-tree.h"
>>  #include "pointer-set.h"
>> +#include "multiversion.h"
>>
>>  /* The number of nested classes being processed.  If we are not in the
>>    scope of any class, this is zero.  */
>> @@ -1092,7 +1093,20 @@ add_method (tree type, tree method, tree using_dec
>>              || same_type_p (TREE_TYPE (fn_type),
>>                              TREE_TYPE (method_type))))
>>        {
>> -         if (using_decl)
>> +         /* For function versions, their parms and types match
>> +            but they are not duplicates.  Record function versions
>> +            as and when they are found.  */
>> +         if (TREE_CODE (fn) == FUNCTION_DECL
>> +             && TREE_CODE (method) == FUNCTION_DECL
>> +             && (DECL_FUNCTION_VERSIONED (fn)
>> +                 || DECL_FUNCTION_VERSIONED (method)))
>> +           {
>> +             DECL_FUNCTION_VERSIONED (fn) = 1;
>> +             DECL_FUNCTION_VERSIONED (method) = 1;
>> +             group_function_versions (fn, method);
>> +             continue;
>> +           }
>> +         else if (using_decl)
>>            {
>>              if (DECL_CONTEXT (fn) == type)
>>                /* Defer to the local function.  */
>> @@ -1150,6 +1164,13 @@ add_method (tree type, tree method, tree using_dec
>>   else
>>     /* Replace the current slot.  */
>>     VEC_replace (tree, method_vec, slot, overload);
>> +
>> +  /* Change the assembler name of method here if it has "targetv"
>> +     attributes.  Since all versions have the same mangled name,
>> +     their assembler name is changed by appending the string from
>> +     the "targetv" attribute. */
>> +  version_assembler_name (method);
>> +
>>   return true;
>>  }
>>
>> @@ -6890,8 +6911,11 @@ resolve_address_of_overloaded_function (tree targe
>>          if (DECL_ANTICIPATED (fn))
>>            continue;
>>
>> -         /* See if there's a match.  */
>> -         if (same_type_p (target_fn_type, static_fn_type (fn)))
>> +         /* See if there's a match.   For functions that are multi-versioned
>> +            match it to the default function.  */
>> +         if (same_type_p (target_fn_type, static_fn_type (fn))
>> +             && (!DECL_FUNCTION_VERSIONED (fn)
>> +                 || is_default_function (fn)))
>>            matches = tree_cons (fn, NULL_TREE, matches);
>>        }
>>     }
>> @@ -7053,6 +7077,21 @@ resolve_address_of_overloaded_function (tree targe
>>       perform_or_defer_access_check (access_path, fn, fn);
>>     }
>>
>> +  /* If a pointer to a function that is multi-versioned is requested, the
>> +     pointer to the dispatcher function is returned instead.  This works
>> +     well because indirectly calling the function will dispatch the right
>> +     function version at run-time. Also, the function address is kept
>> +     unique.  */
>> +  if (DECL_FUNCTION_VERSIONED (fn)
>> +      && is_default_function (fn))
>> +    {
>> +      tree ifunc_decl;
>> +      ifunc_decl = get_ifunc_for_version (fn);
>> +      gcc_assert (ifunc_decl != NULL);
>> +      mark_used (fn);
>> +      return build_fold_addr_expr (ifunc_decl);
>> +    }
>> +
>>   if (TYPE_PTRFN_P (target_type) || TYPE_PTRMEMFUNC_P (target_type))
>>     return cp_build_addr_expr (fn, flags);
>>   else
>> Index: cp/decl.c
>> ===================================================================
>> --- cp/decl.c   (revision 184971)
>> +++ cp/decl.c   (working copy)
>> @@ -54,6 +54,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "pointer-set.h"
>>  #include "splay-tree.h"
>>  #include "plugin.h"
>> +#include "multiversion.h"
>>
>>  /* Possible cases of bad specifiers type used by bad_specifiers. */
>>  enum bad_spec_place {
>> @@ -972,6 +973,23 @@ decls_match (tree newdecl, tree olddecl)
>>       if (t1 != t2)
>>        return 0;
>>
>> +      /* The decls do not match if they correspond to two different versions
>> +        of the same function.  */
>> +      if (compparms (p1, p2)
>> +         && same_type_p (TREE_TYPE (f1), TREE_TYPE (f2))
>> +         && (DECL_FUNCTION_VERSIONED (newdecl)
>> +             || DECL_FUNCTION_VERSIONED (olddecl))
>> +         && has_different_version_attributes (newdecl, olddecl))
>> +       {
>> +         /* One of the decls could be the default without the "targetv"
>> +            attribute. Set it to be a versioned function here.  */
>> +         DECL_FUNCTION_VERSIONED (newdecl) = 1;
>> +         DECL_FUNCTION_VERSIONED (olddecl) = 1;
>> +         /* Accumulate all the versions of a function.  */
>> +         group_function_versions (olddecl, newdecl);
>> +         return 0;
>> +       }
>> +
>>       if (CP_DECL_CONTEXT (newdecl) != CP_DECL_CONTEXT (olddecl)
>>          && ! (DECL_EXTERN_C_P (newdecl)
>>                && DECL_EXTERN_C_P (olddecl)))
>> @@ -1482,7 +1500,11 @@ duplicate_decls (tree newdecl, tree olddecl, bool
>>              error ("previous declaration %q+#D here", olddecl);
>>              return NULL_TREE;
>>            }
>> -         else if (compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)),
>> +         /* For function versions, params and types match, but they
>> +            are not ambiguous.  */
>> +         else if ((!DECL_FUNCTION_VERSIONED (newdecl)
>> +                   && !DECL_FUNCTION_VERSIONED (olddecl))
>> +                  && compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)),
>>                              TYPE_ARG_TYPES (TREE_TYPE (olddecl))))
>>            {
>>              error ("new declaration %q#D", newdecl);
>> @@ -2250,6 +2272,16 @@ duplicate_decls (tree newdecl, tree olddecl, bool
>>   else if (DECL_PRESERVE_P (newdecl))
>>     DECL_PRESERVE_P (olddecl) = 1;
>>
>> +  /* If the olddecl is a version, so is the newdecl.  */
>> +  if (TREE_CODE (newdecl) == FUNCTION_DECL
>> +      && DECL_FUNCTION_VERSIONED (olddecl))
>> +    {
>> +      DECL_FUNCTION_VERSIONED (newdecl) = 1;
>> +      /* Record that newdecl is not a valid version and has
>> +        been deleted.  */
>> +      mark_delete_decl_version (newdecl);
>> +    }
>> +
>>   if (TREE_CODE (newdecl) == FUNCTION_DECL)
>>     {
>>       int function_size;
>> @@ -4512,6 +4544,10 @@ start_decl (const cp_declarator *declarator,
>>   /* Enter this declaration into the symbol table.  */
>>   decl = maybe_push_decl (decl);
>>
>> +  /* If this decl is a function version and not the default, its assembler
>> +     name has to be changed.  */
>> +  version_assembler_name (decl);
>> +
>>   if (processing_template_decl)
>>     decl = push_template_decl (decl);
>>   if (decl == error_mark_node)
>> @@ -13019,6 +13055,10 @@ start_function (cp_decl_specifier_seq *declspecs,
>>     gcc_assert (same_type_p (TREE_TYPE (TREE_TYPE (decl1)),
>>                             integer_type_node));
>>
>> +  /* If this decl is a function version and not the default, its assembler
>> +     name has to be changed.  */
>> +  version_assembler_name (decl1);
>> +
>>   start_preparsed_function (decl1, attrs, /*flags=*/SF_DEFAULT);
>>
>>   return 1;
>> @@ -13960,6 +14000,11 @@ cxx_comdat_group (tree decl)
>>            break;
>>        }
>>       name = DECL_ASSEMBLER_NAME (decl);
>> +      if (TREE_CODE (decl) == FUNCTION_DECL
>> +         && DECL_FUNCTION_VERSIONED (decl))
>> +       name = DECL_NAME (decl);
>> +      else
>> +        name = DECL_ASSEMBLER_NAME (decl);
>>     }
>>
>>   return name;
>> Index: cp/semantics.c
>> ===================================================================
>> --- cp/semantics.c      (revision 184971)
>> +++ cp/semantics.c      (working copy)
>> @@ -3783,8 +3783,11 @@ expand_or_defer_fn_1 (tree fn)
>>       /* If the user wants us to keep all inline functions, then mark
>>         this function as needed so that finish_file will make sure to
>>         output it later.  Similarly, all dllexport'd functions must
>> -        be emitted; there may be callers in other DLLs.  */
>> -      if ((flag_keep_inline_functions
>> +        be emitted; there may be callers in other DLLs.
>> +        Also, mark this function as needed if it is marked inline but
>> +        is a multi-versioned function.  */
>> +      if (((flag_keep_inline_functions
>> +           || DECL_FUNCTION_VERSIONED (fn))
>>           && DECL_DECLARED_INLINE_P (fn)
>>           && !DECL_REALLY_EXTERN (fn))
>>          || (flag_keep_inline_dllexport
>> Index: cp/decl2.c
>> ===================================================================
>> --- cp/decl2.c  (revision 184971)
>> +++ cp/decl2.c  (working copy)
>> @@ -53,6 +53,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "splay-tree.h"
>>  #include "langhooks.h"
>>  #include "c-family/c-ada-spec.h"
>> +#include "multiversion.h"
>>
>>  extern cpp_reader *parse_in;
>>
>> @@ -674,9 +675,13 @@ check_classfn (tree ctype, tree function, tree tem
>>          if (is_template != (TREE_CODE (fndecl) == TEMPLATE_DECL))
>>            continue;
>>
>> +         /* While finding a match, same types and params are not enough
>> +            if the function is versioned.  Also check version ("targetv")
>> +            attributes.  */
>>          if (same_type_p (TREE_TYPE (TREE_TYPE (function)),
>>                           TREE_TYPE (TREE_TYPE (fndecl)))
>>              && compparms (p1, p2)
>> +             && !has_different_version_attributes (function, fndecl)
>>              && (!is_template
>>                  || comp_template_parms (template_parms,
>>                                          DECL_TEMPLATE_PARMS (fndecl)))
>> Index: cp/call.c
>> ===================================================================
>> --- cp/call.c   (revision 184971)
>> +++ cp/call.c   (working copy)
>> @@ -41,6 +41,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "langhooks.h"
>>  #include "c-family/c-objc.h"
>>  #include "timevar.h"
>> +#include "multiversion.h"
>>
>>  /* The various kinds of conversion.  */
>>
>> @@ -6730,6 +6731,17 @@ build_over_call (struct z_candidate *cand, int fla
>>   if (!already_used)
>>     mark_used (fn);
>>
>> +  /* For a call to a multi-versioned function, the call should actually be to
>> +     the dispatcher.  */
>> +  if (DECL_FUNCTION_VERSIONED (fn))
>> +    {
>> +      tree ifunc_decl;
>> +      ifunc_decl = get_ifunc_for_version (fn);
>> +      gcc_assert (ifunc_decl != NULL);
>> +      return build_call_expr_loc_array (UNKNOWN_LOCATION, ifunc_decl,
>> +                                       nargs, argarray);
>> +    }
>> +
>>   if (DECL_VINDEX (fn) && (flags & LOOKUP_NONVIRTUAL) == 0)
>>     {
>>       tree t;
>> @@ -7980,6 +7992,30 @@ joust (struct z_candidate *cand1, struct z_candida
>>   size_t i;
>>   size_t len;
>>
>> +  /* For candidates of a multi-versioned function, the one marked default
>> +     wins.  This is because the default decl is used as key to aggregate
>> +     all the other versions provided for it in multiversion.c.  When
>> +     generating the actual call, the appropriate dispatcher is created
>> +     to call the right function version at run-time.  */
>> +
>> +  if ((TREE_CODE (cand1->fn) == FUNCTION_DECL
>> +       && DECL_FUNCTION_VERSIONED (cand1->fn))
>> +      ||(TREE_CODE (cand2->fn) == FUNCTION_DECL
>> +        && DECL_FUNCTION_VERSIONED (cand2->fn)))
>> +    {
>> +      if (is_default_function (cand1->fn))
>> +       {
>> +          mark_used (cand2->fn);
>> +         return 1;
>> +       }
>> +      if (is_default_function (cand2->fn))
>> +       {
>> +          mark_used (cand1->fn);
>> +         return -1;
>> +       }
>> +      return 0;
>> +    }
>> +
>>   /* Candidates that involve bad conversions are always worse than those
>>      that don't.  */
>>   if (cand1->viable > cand2->viable)
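
The tie-break added to joust above can be summarised in a few lines. The sketch
below is only an illustration of the rule: struct candidate, its two flags, and
RULE_NOT_APPLICABLE are hypothetical stand-ins for the real z_candidate data,
and the mark_used side effects are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of an overload candidate.  */
struct candidate
{
  bool versioned;    /* Part of a multi-versioned group.  */
  bool is_default;   /* The version without the "targetv" attribute.  */
};

/* Returned when neither candidate is versioned, meaning the ordinary
   overload ranking rules should decide instead.  */
enum { RULE_NOT_APPLICABLE = 2 };

/* Return 1 if A wins, -1 if B wins, 0 for a tie, following the same
   convention as joust.  The default version always wins.  */
int
compare_versions (const struct candidate *a, const struct candidate *b)
{
  assert (a != NULL && b != NULL);
  if (!a->versioned && !b->versioned)
    return RULE_NOT_APPLICABLE;
  if (a->is_default)
    return 1;
  if (b->is_default)
    return -1;
  return 0;
}
```

Two specialized versions compare equal here, which matches the patch returning
0 when neither candidate is the default.
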
>> Index: timevar.def
>> ===================================================================
>> --- timevar.def (revision 184971)
>> +++ timevar.def (working copy)
>> @@ -253,6 +253,7 @@ DEFTIMEVAR (TV_TREE_IFCOMBINE        , "tree if-co
>>  DEFTIMEVAR (TV_TREE_UNINIT           , "uninit var analysis")
>>  DEFTIMEVAR (TV_PLUGIN_INIT           , "plugin initialization")
>>  DEFTIMEVAR (TV_PLUGIN_RUN            , "plugin execution")
>> +DEFTIMEVAR (TV_MULTIVERSION_DISPATCH , "multiversion dispatch")
>>
>>  /* Everything else in rest_of_compilation not included above.  */
>>  DEFTIMEVAR (TV_EARLY_LOCAL          , "early local passes")
>> Index: varasm.c
>> ===================================================================
>> --- varasm.c    (revision 184971)
>> +++ varasm.c    (working copy)
>> @@ -5755,6 +5755,8 @@ finish_aliases_1 (void)
>>        }
>>       else if (! (p->emitted_diags & ALIAS_DIAG_TO_EXTERN)
>>               && DECL_EXTERNAL (target_decl)
>> +              && (TREE_CODE (target_decl) != FUNCTION_DECL
>> +                  || !DECL_STRUCT_FUNCTION (target_decl))
>>               /* We use local aliases for C++ thunks to force the tailcall
>>                  to bind locally.  This is a hack - to keep it working do
>>                  the following (which is not strictly correct).  */
>> Index: Makefile.in
>> ===================================================================
>> --- Makefile.in (revision 184971)
>> +++ Makefile.in (working copy)
>> @@ -1298,6 +1298,7 @@ OBJS = \
>>        mcf.o \
>>        mode-switching.o \
>>        modulo-sched.o \
>> +       multiversion.o \
>>        omega.o \
>>        omp-low.o \
>>        optabs.o \
>> @@ -3030,6 +3031,11 @@ ree.o : ree.c $(CONFIG_H) $(SYSTEM_H) coretypes.h
>>    $(DF_H) $(TIMEVAR_H) tree-pass.h $(RECOG_H) $(EXPR_H) \
>>    $(REGS_H) $(TREE_H) $(TM_P_H) insn-config.h $(INSN_ATTR_H) $(DIAGNOSTIC_CORE_H) \
>>    $(TARGET_H) $(OPTABS_H) insn-codes.h rtlhooks-def.h $(PARAMS_H) $(CGRAPH_H)
>> +multiversion.o : multiversion.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
>> +   $(TREE_H) langhooks.h $(TREE_INLINE_H) $(FLAGS_H) $(CGRAPH_H) intl.h \
>> +   $(DIAGNOSTIC_H) $(FIBHEAP_H) $(PARAMS_H) $(TIMEVAR_H) tree-pass.h \
>> +   $(HASHTAB_H) $(COVERAGE_H) $(GGC_H) $(TREE_FLOW_H) $(RTL_H) $(IPA_PROP_H) \
>> +   $(BASIC_BLOCK_H) $(TOPLEV_H) $(TREE_DUMP_H) ipa-inline.h
>>  cprop.o : cprop.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
>>    $(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h $(GGC_H) \
>>    $(RECOG_H) $(EXPR_H) $(BASIC_BLOCK_H) $(FUNCTION_H) output.h toplev.h $(DIAGNOSTIC_CORE_H) \
>> Index: passes.c
>> ===================================================================
>> --- passes.c    (revision 184971)
>> +++ passes.c    (working copy)
>> @@ -1190,6 +1190,7 @@ init_optimization_passes (void)
>>   NEXT_PASS (pass_build_cfg);
>>   NEXT_PASS (pass_warn_function_return);
>>   NEXT_PASS (pass_build_cgraph_edges);
>> +  NEXT_PASS (pass_dispatch_versions);
>>   *p = NULL;
>>
>>   /* Interprocedural optimization passes.  */
>> Index: config/i386/i386.c
>> ===================================================================
>> --- config/i386/i386.c  (revision 184971)
>> +++ config/i386/i386.c  (working copy)
>> @@ -27446,6 +27473,593 @@ ix86_init_mmx_sse_builtins (void)
>>     }
>>  }
>>
>> +/* This adds a condition to the basic_block NEW_BB in function FUNCTION_DECL
>> +   to return a pointer to VERSION_DECL if the outcome of the function
>> +   PREDICATE_DECL is true.  This function will be called during version
>> +   dispatch to decide which function version to execute.  It returns the
>> +   basic block at the end to which more conditions can be added.  */
>> +
>> +static basic_block
>> +add_condition_to_bb (tree function_decl, tree version_decl,
>> +                    basic_block new_bb, tree predicate_decl)
>> +{
>> +  gimple return_stmt;
>> +  tree convert_expr, result_var;
>> +  gimple convert_stmt;
>> +  gimple call_cond_stmt;
>> +  gimple if_else_stmt;
>> +
>> +  basic_block bb1, bb2, bb3;
>> +  edge e12, e23;
>> +
>> +  tree cond_var;
>> +  gimple_seq gseq;
>> +
>> +  tree old_current_function_decl;
>> +
>> +  old_current_function_decl = current_function_decl;
>> +  push_cfun (DECL_STRUCT_FUNCTION (function_decl));
>> +  current_function_decl = function_decl;
>> +
>> +  gcc_assert (new_bb != NULL);
>> +  gseq = bb_seq (new_bb);
>> +
>> +
>> +  convert_expr = build1 (CONVERT_EXPR, ptr_type_node,
>> +                        build_fold_addr_expr (version_decl));
>> +  result_var = create_tmp_var (ptr_type_node, NULL);
>> +  convert_stmt = gimple_build_assign (result_var, convert_expr);
>> +  return_stmt = gimple_build_return (result_var);
>> +
>> +  if (predicate_decl == NULL_TREE)
>> +    {
>> +      gimple_seq_add_stmt (&gseq, convert_stmt);
>> +      gimple_seq_add_stmt (&gseq, return_stmt);
>> +      set_bb_seq (new_bb, gseq);
>> +      gimple_set_bb (convert_stmt, new_bb);
>> +      gimple_set_bb (return_stmt, new_bb);
>> +      pop_cfun ();
>> +      current_function_decl = old_current_function_decl;
>> +      return new_bb;
>> +    }
>> +
>> +  cond_var = create_tmp_var (integer_type_node, NULL);
>> +  call_cond_stmt = gimple_build_call (predicate_decl, 0);
>> +  gimple_call_set_lhs (call_cond_stmt, cond_var);
>> +
>> +  gimple_set_block (call_cond_stmt, DECL_INITIAL (function_decl));
>> +  gimple_set_bb (call_cond_stmt, new_bb);
>> +  gimple_seq_add_stmt (&gseq, call_cond_stmt);
>> +
>> +  if_else_stmt = gimple_build_cond (GT_EXPR, cond_var,
>> +                                   integer_zero_node,
>> +                                   NULL_TREE, NULL_TREE);
>> +  gimple_set_block (if_else_stmt, DECL_INITIAL (function_decl));
>> +  gimple_set_bb (if_else_stmt, new_bb);
>> +  gimple_seq_add_stmt (&gseq, if_else_stmt);
>> +
>> +  gimple_seq_add_stmt (&gseq, convert_stmt);
>> +  gimple_seq_add_stmt (&gseq, return_stmt);
>> +  set_bb_seq (new_bb, gseq);
>> +
>> +  bb1 = new_bb;
>> +  e12 = split_block (bb1, if_else_stmt);
>> +  bb2 = e12->dest;
>> +  e12->flags &= ~EDGE_FALLTHRU;
>> +  e12->flags |= EDGE_TRUE_VALUE;
>> +
>> +  e23 = split_block (bb2, return_stmt);
>> +
>> +  gimple_set_bb (convert_stmt, bb2);
>> +  gimple_set_bb (return_stmt, bb2);
>> +
>> +  bb3 = e23->dest;
>> +  make_edge (bb1, bb3, EDGE_FALSE_VALUE);
>> +
>> +  remove_edge (e23);
>> +  make_edge (bb2, EXIT_BLOCK_PTR, 0);
>> +
>> +  rebuild_cgraph_edges ();
>> +
>> +  pop_cfun ();
>> +  current_function_decl = old_current_function_decl;
>> +
>> +  return bb3;
>> +}
>> +
>> +/* This parses the attribute arguments to targetv in DECL and determines
>> +   the right builtin to use to match the platform specification.
>> +   For now, only one target argument ("arch=") is allowed.  */
>> +
>> +static enum ix86_builtins
>> +get_builtin_code_for_version (tree decl)
>> +{
>> +  tree attrs;
>> +  struct cl_target_option cur_target;
>> +  tree target_node;
>> +  struct cl_target_option *new_target;
>> +  enum ix86_builtins builtin_code = IX86_BUILTIN_MAX;
>> +
>> +  attrs = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl));
>> +  gcc_assert (attrs != NULL);
>> +
>> +  cl_target_option_save (&cur_target, &global_options);
>> +
>> +  target_node = ix86_valid_target_attribute_tree
>> +                 (TREE_VALUE (TREE_VALUE (attrs)));
>> +
>> +  gcc_assert (target_node);
>> +  new_target = TREE_TARGET_OPTION (target_node);
>> +  gcc_assert (new_target);
>> +
>> +  if (new_target->arch_specified && new_target->arch > 0)
>> +    {
>> +      switch (new_target->arch)
>> +        {
>> +       case 1:
>> +       case 2:
>> +       case 3:
>> +       case 4:
>> +       case 5:
>> +       case 6:
>> +       case 7:
>> +       case 8:
>> +       case 9:
>> +       case 10:
>> +       case 11:
>> +         builtin_code = IX86_BUILTIN_CPU_IS_INTEL;
>> +         break;
>> +       case 12:
>> +         builtin_code = IX86_BUILTIN_CPU_IS_INTEL_CORE2;
>> +         break;
>> +       case 13:
>> +         builtin_code = IX86_BUILTIN_CPU_IS_INTEL_COREI7;
>> +         break;
>> +       case 14:
>> +         builtin_code = IX86_BUILTIN_CPU_IS_INTEL_ATOM;
>> +         break;
>> +       case 15:
>> +       case 16:
>> +       case 17:
>> +       case 18:
>> +       case 19:
>> +       case 20:
>> +       case 21:
>> +         builtin_code = IX86_BUILTIN_CPU_IS_AMD;
>> +         break;
>> +       case 22:
>> +         builtin_code = IX86_BUILTIN_CPU_IS_AMDFAM10H;
>> +         break;
>> +       case 23:
>> +         builtin_code = IX86_BUILTIN_CPU_IS_AMDFAM15H_BDVER1;
>> +         break;
>> +       case 24:
>> +         builtin_code = IX86_BUILTIN_CPU_IS_AMDFAM15H_BDVER2;
>> +         break;
>> +       case 25: /* What is btver1 ? */
>> +         builtin_code = IX86_BUILTIN_CPU_IS_AMD;
>> +         break;
>> +       }
>> +    }
>> +
>> +  cl_target_option_restore (&global_options, &cur_target);
>> +  if (builtin_code == IX86_BUILTIN_MAX)
>> +      error_at (DECL_SOURCE_LOCATION (decl),
>> +               "No dispatcher found for the versioning attributes");
>> +
>> +  return builtin_code;
>> +}
>> +
>> +/* This is the target hook to generate the dispatch function for
>> +   multi-versioned functions.  DISPATCH_DECL is the function which will
>> +   contain the dispatch logic.  FNDECLS_P points to a vector of the function
>> +   choices for dispatch, with the default version first.  EMPTY_BB is the
>> +   basic block pointer in DISPATCH_DECL in which the dispatch code is
>> +   generated.  */
>> +
>> +static int
>> +ix86_dispatch_version (tree dispatch_decl,
>> +                      void *fndecls_p,
>> +                      basic_block *empty_bb)
>> +{
>> +  tree default_decl;
>> +  gimple ifunc_cpu_init_stmt;
>> +  gimple_seq gseq;
>> +  tree old_current_function_decl;
>> +  int ix;
>> +  tree ele;
>> +  VEC (tree, heap) *fndecls;
>> +
>> +  gcc_assert (dispatch_decl != NULL
>> +             && fndecls_p != NULL
>> +             && empty_bb != NULL);
>> +
>> +  /* fndecls_p is actually a vector.  */
>> +  fndecls = (VEC (tree, heap) *)fndecls_p;
>> +
>> +  /* At least one more version other than the default.  */
>> +  gcc_assert (VEC_length (tree, fndecls) >= 2);
>> +
>> +  /* The first version in the vector is the default decl.  */
>> +  default_decl = VEC_index (tree, fndecls, 0);
>> +
>> +  old_current_function_decl = current_function_decl;
>> +  push_cfun (DECL_STRUCT_FUNCTION (dispatch_decl));
>> +  current_function_decl = dispatch_decl;
>> +
>> +  gseq = bb_seq (*empty_bb);
>> +  ifunc_cpu_init_stmt = gimple_build_call_vec (
>> +                     ix86_builtins [(int) IX86_BUILTIN_CPU_INIT], NULL);
>> +  gimple_seq_add_stmt (&gseq, ifunc_cpu_init_stmt);
>> +  gimple_set_bb (ifunc_cpu_init_stmt, *empty_bb);
>> +  set_bb_seq (*empty_bb, gseq);
>> +
>> +  pop_cfun ();
>> +  current_function_decl = old_current_function_decl;
>> +
>> +
>> +  for (ix = 1; VEC_iterate (tree, fndecls, ix, ele); ++ix)
>> +    {
>> +      tree version_decl = ele;
>> +      /* Get attribute string, parse it and find the right predicate decl.
>> +         The predicate function could be a lengthy combination of many
>> +        features, like arch-type and various isa-variants.  For now, only
>> +        check the arch-type.  */
>> +      tree predicate_decl = ix86_builtins [
>> +                       get_builtin_code_for_version (version_decl)];
>> +      *empty_bb = add_condition_to_bb (dispatch_decl, version_decl, *empty_bb,
>> +                                      predicate_decl);
>> +
>> +    }
>> +  /* Dispatch the default version at the end.  */
>> +  *empty_bb = add_condition_to_bb (dispatch_decl, default_decl, *empty_bb,
>> +                                  NULL);
>> +  return 0;
>> +}
>>
>> @@ -38610,6 +39269,12 @@ ix86_autovectorize_vector_sizes (void)
>>  #undef TARGET_BUILD_BUILTIN_VA_LIST
>>  #define TARGET_BUILD_BUILTIN_VA_LIST ix86_build_builtin_va_list
>>
>> +#undef TARGET_DISPATCH_VERSION
>> +#define TARGET_DISPATCH_VERSION ix86_dispatch_version
>> +
>>  #undef TARGET_ENUM_VA_LIST_P
>>  #define TARGET_ENUM_VA_LIST_P ix86_enum_va_list
>>
>> Index: testsuite/g++.dg/mv1.C
>> ===================================================================
>> --- testsuite/g++.dg/mv1.C      (revision 0)
>> +++ testsuite/g++.dg/mv1.C      (revision 0)
>> @@ -0,0 +1,23 @@
>> +/* Simple test case to check if Multiversioning works.  */
>> +/* { dg-do run } */
>> +/* { dg-options "-O2" } */
>> +
>> +int foo ();
>> +int foo () __attribute__ ((targetv("arch=corei7")));
>> +
>> +int main ()
>> +{
>> +  int (*p)() = &foo;
>> +  return foo () + (*p)();
>> +}
>> +
>> +int foo ()
>> +{
>> +  return 0;
>> +}
>> +
>> +int __attribute__ ((targetv("arch=corei7")))
>> +foo ()
>> +{
>> +  return 0;
>> +}
>>
>>
>> --
>> This patch is available for review at http://codereview.appspot.com/5752064


* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-03-09 20:04   ` Sriraman Tallam
@ 2012-04-27  5:09     ` Sriraman Tallam
  2012-04-27 13:39       ` H.J. Lu
  0 siblings, 1 reply; 93+ messages in thread
From: Sriraman Tallam @ 2012-04-27  5:09 UTC (permalink / raw)
  To: Richard Guenther, Jan Hubicka, Uros Bizjak; +Cc: reply, gcc-patches, David Li

[-- Attachment #1: Type: text/plain, Size: 78141 bytes --]

Hi,

   I have made the following changes in the attached patch:

* Use the target attribute itself to create function versions.
* Handle any number of ISA names and arch= arguments to the target
attribute, generating the right dispatchers.
* Integrate with the CPU runtime detection checked in this week.
* Overload resolution: if the caller's target matches the target of any
function version, a direct call to that version is generated and the
dispatcher is bypassed.
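
The overload-resolution rule in the last item can be sketched in plain C
as follows. This is only an illustration, not code from the patch: the
struct, the function name resolve_call, and the symbol names are all
hypothetical, and "matches" is assumed to mean exact string equality of
the target strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One function version: the target string it was declared with and the
   symbol that implements it.  Both fields are illustrative only. */
struct version {
    const char *target;
    const char *symbol;
};

/* Pick the symbol a call site should reference.  CALLER_TARGET is the
   caller's own target string, or NULL when the caller has none.
   Assumption: a "match" is exact string equality. */
static const char *
resolve_call(const char *caller_target, const struct version *versions,
             size_t n, const char *dispatcher)
{
    size_t i;

    assert(versions != NULL || n == 0);
    assert(dispatcher != NULL);

    if (caller_target != NULL)
        for (i = 0; i < n; i++)
            if (strcmp(caller_target, versions[i].target) == 0)
                return versions[i].symbol;  /* direct call, no dispatch */

    return dispatcher;  /* otherwise fall back to run-time dispatch */
}
```

With versions for "sse4.2" and "arch=corei7", a caller compiled for
"sse4.2" would bind directly to the sse4.2 version, while a caller with
no target attribute would still go through the dispatcher.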

Patch also available for review here:
http://codereview.appspot.com/5752064

Thanks,
-Sri.


On Fri, Mar 9, 2012 at 12:04 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> Hi Richard,
>
>  Here is a more detailed overview of the front-end implementation:
>
> * Tracking decls that correspond to function versions of function
> name, say "foo":
>
> When the front-end sees a decl for "foo" with "targetv" attributes, it
> tags it as a function version. To prevent duplicate definition errors
> with other versions of "foo", I change the "decls_match" function in
> cp/decl.c to return false when two decls have the same signature but
> different targetv attributes. This causes all function versions of
> "foo" to be added to the overload list of "foo".
>
> To expand further, whether two targetv attributes differ is checked by
> sorting the arguments to targetv and comparing the resulting strings.
>
> * Change the assembler names of the function versions.
>
> The front-end changes the assembler names of the function versions by
> appending the sorted list of args to "targetv" to the assembler name of
> "foo". For example, the assembler name of "void foo () __attribute__
> ((targetv ("sse4")))" will become _Z3foov.sse4.
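
A standalone sketch of this naming scheme, for illustration only. The
helper name versioned_name is hypothetical; the rules it applies ('='
becomes '_', comma-separated args are sorted and joined with '_', and the
result is appended after a '.') are the ones sorted_attr_string and
version_assembler_name in the quoted patch appear to follow.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C strings. */
static int
cmp_str(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Hypothetical helper: build "<base>.<sorted args>" from a comma-separated
   attribute string.  Returns a malloc'd string the caller must free, or
   NULL if allocation fails. */
static char *
versioned_name(const char *base, const char *attr)
{
    size_t n = 1, count = 0, i;
    char *copy, *out, *tok, **toks;

    assert(base != NULL && attr != NULL);

    for (i = 0; attr[i] != '\0'; i++)
        if (attr[i] == ',')
            n++;                        /* upper bound on token count */

    copy = malloc(strlen(attr) + 1);
    toks = malloc(n * sizeof *toks);
    out = malloc(strlen(base) + strlen(attr) + 2);  /* '.' and NUL */
    if (copy == NULL || toks == NULL || out == NULL) {
        free(copy);
        free(toks);
        free(out);
        return NULL;
    }

    strcpy(copy, attr);
    for (i = 0; copy[i] != '\0'; i++)
        if (copy[i] == '=')
            copy[i] = '_';              /* "arch=corei7" -> "arch_corei7" */

    for (tok = strtok(copy, ","); tok != NULL; tok = strtok(NULL, ","))
        toks[count++] = tok;
    qsort(toks, count, sizeof *toks, cmp_str);

    strcpy(out, base);
    for (i = 0; i < count; i++) {
        strcat(out, i == 0 ? "." : "_");
        strcat(out, toks[i]);
    }

    free(toks);
    free(copy);
    return out;
}
```

Because the args are sorted first, "sse4,arch=corei7" and
"arch=corei7,sse4" yield the same name, so two declarations that differ
only in argument order are treated as the same version.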
>
> * Separately group all function versions of "foo" together, in multiversion.c:
>
> File multiversion.c maintains a hashtab, decl_version_htab, that maps
> the default function decl of "foo" to the list of all other versions
> of this function "foo". This is meant to be used when creating the
> dispatcher for this function.
>
> * Overload resolution:
>
>  Function "build_over_call" in cp/call.c sees a call to function
> "foo", which is multi-versioned. The overload resolution happens in
> function "joust" in "cp/call.c". Here, the call to "foo" has all
> possible versions of "foo" as candidates. Currently, "joust" returns
> the default version of "foo" as the winning candidate. But
> "build_over_call" realizes that this is a versioned function and
> replaces the call-site of foo with an "ifunc" call for foo, by querying
> a function in "multiversion.c" which builds the ifunc decl. After
> this, all call-sites of "foo" contain the call to the ifunc.
>
> Notice that, for calls from an sse function to a versioned function
> with an sse variant, I can modify "joust" to return the "sse" function
> version rather than the default and not replace this call with an
> ifunc. To do this, I must pass the target attributes of the caller to
> "joust" and check if the target attributes also match any version.
>
> * Creating the dispatcher:
>
> The dispatcher is independently created in a new pass, called
> "pass_dispatch_version", that runs immediately after the cfg and cgraph
> are created. The dispatcher looks at all possible versions and queries the
> target to give it the CPU detection predicates it must use to dispatch
> each version. Then, the dispatcher body is created and the ifunc is
> mapped to use this dispatcher.
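
The control flow the dispatcher ends up with can be sketched in plain C
as below. This is an illustration under assumptions, not the generated
GIMPLE: the predicate and version functions are stand-ins, and
select_version is a hypothetical name. It follows the order used by
ix86_dispatch_version in the quoted patch, which tests each specialized
version's predicate in turn and adds the default version last.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*version_fn)(void);
typedef int (*predicate_fn)(void);

/* A specialized version paired with the CPU check that guards it. */
struct candidate {
    predicate_fn matches;
    version_fn fn;
};

/* Return the first candidate whose predicate holds, else the default. */
static version_fn
select_version(const struct candidate *cands, size_t n, version_fn default_fn)
{
    size_t i;

    assert(default_fn != NULL);

    for (i = 0; i < n; i++)
        if (cands[i].matches())
            return cands[i].fn;

    return default_fn;
}

/* Stand-ins for illustration: pretend the CPU is a corei7. */
static int cpu_is_corei7(void) { return 1; }
static int cpu_is_core2(void) { return 0; }
static int foo_default(void) { return 0; }
static int foo_corei7(void) { return 1; }
static int foo_core2(void) { return 2; }
```

An IFUNC resolver would run a selection like this once, at symbol
resolution time, and hand the chosen function's address back to the
dynamic linker, so later calls would not pay for the checks again.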
>
> Notice that only the dispatcher creation is done after the front-end.
> Everything else occurs in the front-end itself. I could have created
> the dispatcher in the front-end as well. I did not do so because
> keeping it as a separate pass makes it easy to add more dispatch
> mechanisms. For example, when IFUNC is not available, it could be
> replaced with control-flow that makes direct calls to the function
> versions. Also, creating the dispatcher after the "cfg" is built was easy.
>
> Thanks,
> -Sri.
>
>
> On Wed, Mar 7, 2012 at 6:05 AM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> On Wed, Mar 7, 2012 at 1:46 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>>> User directed Function Multiversioning (MV) via Function Overloading
>>> ====================================================================
>>>
>>> This patch adds support for user directed function MV via function overloading.
>>> For more detailed description:
>>> http://gcc.gnu.org/ml/gcc/2012-03/msg00074.html
>>>
>>>
>>> Here is an example program with function versions:
>>>
>>> int foo ();  /* Default version */
>>> int foo () __attribute__ ((targetv("arch=corei7")));/*Specialized for corei7 */
>>> int foo () __attribute__ ((targetv("arch=core2")));/*Specialized for core2 */
>>>
>>> int main ()
>>> {
>>>  int (*p)() = &foo;
>>>  return foo () + (*p)();
>>> }
>>>
>>> int foo ()
>>> {
>>>  return 0;
>>> }
>>>
>>> int __attribute__ ((targetv("arch=corei7")))
>>> foo ()
>>> {
>>>  return 0;
>>> }
>>>
>>> int __attribute__ ((targetv("arch=core2")))
>>> foo ()
>>> {
>>>  return 0;
>>> }
>>>
>>> The above example has foo defined 3 times, but all 3 definitions of foo are
>>> different versions of the same function. The call to foo in main, directly and
>>> via a pointer, are calls to the multi-versioned function foo which is dispatched
>>> to the right foo at run-time.
>>>
>>> Function versions must have the same signature but must differ in the specifier
>>> string provided to a new attribute called "targetv", which is nothing but the
>>> target attribute with an extra specification to indicate a version. Any number
>>> of versions can be created using the targetv attribute but it is mandatory to
>>> have one function without the attribute, which is treated as the default
>>> version.
>>>
>>> The dispatching is done using the IFUNC mechanism to keep the dispatch overhead
>>> low. The compiler creates a dispatcher function which checks the CPU type and
>>> calls the right version of foo. The dispatching code checks for the platform
>>> type and calls the first version that matches. The default function is called if
>>> no specialized version is appropriate for execution.
>>>
>>> The pointer to foo is made to be the address of the dispatcher function, so that
>>> it is unique and calls made via the pointer also work correctly. The assembler
>>> names of the various versions of foo are made different, by appending
>>> the specifier strings, to keep them unique.  A specific version can be called
>>> directly by creating an alias to its assembler name. For instance, to call the
>>> corei7 version directly, make an alias:
>>> int foo_corei7 () __attribute__((alias ("_Z3foov.arch_corei7")));
>>> and then call foo_corei7.
>>>
>>> Note that using IFUNC blocks inlining of versioned functions. I had implemented
>>> an optimization earlier to do hot path cloning to allow versioned functions to
>>> be inlined. Please see: http://gcc.gnu.org/ml/gcc-patches/2011-04/msg02285.html
>>> In the next iteration, I plan to merge these two. With that, hot code paths with
>>> versioned functions will be cloned so that versioned functions can be inlined.
>>
>> Note that inlining of functions with the target attribute is limited as well,
>> but your issue is that of the indirect dispatch as ...
>>
>> You don't give an overview of the frontend implementation.  Thus I have
>> extracted the following
>>
>>  - the FE does not really know about the "overloading", nor can it directly
>>   resolve calls from a "sse" function to another "sse" function without going
>>   through the 2nd IFUNC
>>
>>  - cgraph also does not know about the "overloading", so it cannot do such
>>   "devirtualization" either
>>
>> you seem to have implemented something in between a pure frontend
>> solution and a proper middle-end solution.  For optimization and eventually
>> automatically selecting functions for cloning (like, callees of a manual "sse"
>> versioned function should be cloned?) it would be nice if the cgraph would
>> know about the different versions and their relationships (and the dispatcher).
>> Especially the cgraph code should know the functions are semantically
>> equivalent (I suppose we should require that).  The IFUNC should be
>> generated by cgraph / target code, similar to how we generate C++ thunks.
>>
>> Honza, any suggestions on what the FE side of such cgraph infrastructure
>> should look like and how we should encode the target bits?
>>
>> Thanks,
>> Richard.
>>
>>>        * doc/tm.texi.in: Add description for TARGET_DISPATCH_VERSION.
>>>        * doc/tm.texi: Regenerate.
>>>        * c-family/c-common.c (handle_targetv_attribute): New function.
>>>        * target.def (dispatch_version): New target hook.
>>>        * tree.h (DECL_FUNCTION_VERSIONED): New macro.
>>>        (tree_function_decl): New bit-field versioned_function.
>>>        * tree-pass.h (pass_dispatch_versions): New pass.
>>>        * multiversion.c: New file.
>>>        * multiversion.h: New file.
>>>        * cgraphunit.c: Include multiversion.h
>>>        (cgraph_finalize_function): Change assembler names of versioned
>>>        functions.
>>>        * cp/class.c: Include multiversion.h
>>>        (add_method): aggregate function versions. Change assembler names of
>>>        versioned functions.
>>>        (resolve_address_of_overloaded_function): Match address of function
>>>        version with default function.  Return address of ifunc dispatcher
>>>        for address of versioned functions.
>>>        * cp/decl.c (decls_match): Make decls unmatched for versioned
>>>        functions.
>>>        (duplicate_decls): Remove ambiguity for versioned functions. Notify
>>>        of deleted function version decls.
>>>        (start_decl): Change assembler name of versioned functions.
>>>        (start_function): Change assembler name of versioned functions.
>>>        (cxx_comdat_group): Make comdat group of versioned functions be the
>>>        same.
>>>        * cp/semantics.c (expand_or_defer_fn_1): Mark as needed versioned
>>>        functions that are also marked inline.
>>>        * cp/decl2.c: Include multiversion.h
>>>        (check_classfn): Check attributes of versioned functions for match.
>>>        * cp/call.c: Include multiversion.h
>>>        (build_over_call): Make calls to multiversioned functions to call the
>>>        dispatcher.
>>>        (joust): For calls to multi-versioned functions, make the default
>>>        function win.
>>>        * timevar.def (TV_MULTIVERSION_DISPATCH): New time var.
>>>        * varasm.c (finish_aliases_1): Check if the alias points to a function
>>>        with a body before giving an error.
>>>        * Makefile.in: Add multiversion.o
>>>        * passes.c: Add pass_dispatch_versions to the pass list.
>>>        * config/i386/i386.c (add_condition_to_bb): New function.
>>>        (get_builtin_code_for_version): New function.
>>>        (ix86_dispatch_version): New function.
>>>        (TARGET_DISPATCH_VERSION): New macro.
>>>        * testsuite/g++.dg/mv1.C: New test.
>>>
>>> Index: doc/tm.texi
>>> ===================================================================
>>> --- doc/tm.texi (revision 184971)
>>> +++ doc/tm.texi (working copy)
>>> @@ -10995,6 +10995,14 @@ The result is another tree containing a simplified
>>>  call's result.  If @var{ignore} is true the value will be ignored.
>>>  @end deftypefn
>>>
>>> +@deftypefn {Target Hook} int TARGET_DISPATCH_VERSION (tree @var{dispatch_decl}, void *@var{fndecls}, basic_block *@var{empty_bb})
>>> +For a multi-versioned function, this hook sets up the dispatcher.
>>> +@var{dispatch_decl} is the function that will be used to dispatch the
>>> +version. @var{fndecls} are the function choices for dispatch.
>>> +@var{empty_bb} is a basic block in @var{dispatch_decl} where the
>>> +code to do the dispatch will be added.
>>> +@end deftypefn
>>> +
>>>  @deftypefn {Target Hook} {const char *} TARGET_INVALID_WITHIN_DOLOOP (const_rtx @var{insn})
>>>
>>>  Take an instruction in @var{insn} and return NULL if it is valid within a
>>> Index: doc/tm.texi.in
>>> ===================================================================
>>> --- doc/tm.texi.in      (revision 184971)
>>> +++ doc/tm.texi.in      (working copy)
>>> @@ -10873,6 +10873,14 @@ The result is another tree containing a simplified
>>>  call's result.  If @var{ignore} is true the value will be ignored.
>>>  @end deftypefn
>>>
>>> +@hook TARGET_DISPATCH_VERSION
>>> +For a multi-versioned function, this hook sets up the dispatcher.
>>> +@var{dispatch_decl} is the function that will be used to dispatch the
>>> +version. @var{fndecls} are the function choices for dispatch.
>>> +@var{empty_bb} is a basic block in @var{dispatch_decl} where the
>>> +code to do the dispatch will be added.
>>> +@end deftypefn
>>> +
>>>  @hook TARGET_INVALID_WITHIN_DOLOOP
>>>
>>>  Take an instruction in @var{insn} and return NULL if it is valid within a
>>> Index: c-family/c-common.c
>>> ===================================================================
>>> --- c-family/c-common.c (revision 184971)
>>> +++ c-family/c-common.c (working copy)
>>> @@ -315,6 +315,7 @@ static tree check_case_value (tree);
>>>  static bool check_case_bounds (tree, tree, tree *, tree *);
>>>
>>>  static tree handle_packed_attribute (tree *, tree, tree, int, bool *);
>>> +static tree handle_targetv_attribute (tree *, tree, tree, int, bool *);
>>>  static tree handle_nocommon_attribute (tree *, tree, tree, int, bool *);
>>>  static tree handle_common_attribute (tree *, tree, tree, int, bool *);
>>>  static tree handle_noreturn_attribute (tree *, tree, tree, int, bool *);
>>> @@ -604,6 +605,8 @@ const struct attribute_spec c_common_attribute_tab
>>>  {
>>>   /* { name, min_len, max_len, decl_req, type_req, fn_type_req, handler,
>>>        affects_type_identity } */
>>> +  { "targetv",               1, -1, true, false, false,
>>> +                             handle_targetv_attribute, false },
>>>   { "packed",                 0, 0, false, false, false,
>>>                              handle_packed_attribute , false},
>>>   { "nocommon",               0, 0, true,  false, false,
>>> @@ -5869,6 +5872,54 @@ handle_packed_attribute (tree *node, tree name, tr
>>>   return NULL_TREE;
>>>  }
>>>
>>> +/* The targetv attribute is used to specify a function version
>>> +   targeted to specific platform types.  The "targetv" attributes
>>> +   have to be valid "target" attributes.  NODE should always point
>>> +   to a FUNCTION_DECL.  ARGS contain the arguments to "targetv"
>>> +   which should be valid arguments to attribute "target" too.
>>> +   Check handle_target_attribute for FLAGS and NO_ADD_ATTRS.  */
>>> +
>>> +static tree
>>> +handle_targetv_attribute (tree *node, tree name,
>>> +                         tree args,
>>> +                         int flags,
>>> +                         bool *no_add_attrs)
>>> +{
>>> +  const char *attr_str = NULL;
>>> +  gcc_assert (TREE_CODE (*node) == FUNCTION_DECL);
>>> +  gcc_assert (args != NULL);
>>> +
>>> +  /* This is a function version.  */
>>> +  DECL_FUNCTION_VERSIONED (*node) = 1;
>>> +
>>> +  attr_str = TREE_STRING_POINTER (TREE_VALUE (args));
>>> +
>>> +  /* Check if multiple sets of target attributes are there.  This
>>> +     is not supported now.   In future, this will be supported by
>>> +     cloning this function for each set.  */
>>> +  if (TREE_CHAIN (args) != NULL)
>>> +    warning (OPT_Wattributes, "%qE attribute has multiple sets which "
>>> +            "is not supported", name);
>>> +
>>> +  if (attr_str == NULL
>>> +      || strstr (attr_str, "arch=") == NULL)
>>> +    error_at (DECL_SOURCE_LOCATION (*node),
>>> +             "Versioning supported only on \"arch=\" for now");
>>> +
>>> +  /* targetv attributes must translate into target attributes.  */
>>> +  handle_target_attribute (node, get_identifier ("target"), args, flags,
>>> +                          no_add_attrs);
>>> +
>>> +  if (*no_add_attrs)
>>> +    warning (OPT_Wattributes, "%qE attribute has no effect", name);
>>> +
>>> +  /* This is necessary to keep the attribute tagged to the decl
>>> +     all the time.  */
>>> +  *no_add_attrs = false;
>>> +
>>> +  return NULL_TREE;
>>> +}
>>> +
>>>  /* Handle a "nocommon" attribute; arguments as in
>>>    struct attribute_spec.handler.  */
>>>
>>> Index: target.def
>>> ===================================================================
>>> --- target.def  (revision 184971)
>>> +++ target.def  (working copy)
>>> @@ -1249,6 +1249,15 @@ DEFHOOK
>>>  tree, (tree fndecl, int n_args, tree *argp, bool ignore),
>>>  hook_tree_tree_int_treep_bool_null)
>>>
>>> +/* Target hook to generate the dispatching code for calls to multi-versioned
>>> +   functions.  DISPATCH_DECL is the function that will have the dispatching
>>> +   logic.  FNDECLS are the list of choices for dispatch and EMPTY_BB is the
>>> +   basic block in DISPATCH_DECL which will contain the code.  */
>>> +DEFHOOK
>>> +(dispatch_version,
>>> + "",
>>> + int, (tree dispatch_decl, void *fndecls, basic_block *empty_bb), NULL)
>>> +
>>>  /* Returns a code for a target-specific builtin that implements
>>>    reciprocal of the function, or NULL_TREE if not available.  */
>>>  DEFHOOK
>>> Index: tree.h
>>> ===================================================================
>>> --- tree.h      (revision 184971)
>>> +++ tree.h      (working copy)
>>> @@ -3532,6 +3532,12 @@ extern VEC(tree, gc) **decl_debug_args_insert (tre
>>>  #define DECL_FUNCTION_SPECIFIC_OPTIMIZATION(NODE) \
>>>    (FUNCTION_DECL_CHECK (NODE)->function_decl.function_specific_optimization)
>>>
>>> +/* In FUNCTION_DECL, this is set if this function has other versions generated
>>> +   using "targetv" attributes.  The default version is the one which does not
>>> +   have any "targetv" attribute set. */
>>> +#define DECL_FUNCTION_VERSIONED(NODE)\
>>> +   (FUNCTION_DECL_CHECK (NODE)->function_decl.versioned_function)
>>> +
>>>  /* FUNCTION_DECL inherits from DECL_NON_COMMON because of the use of the
>>>    arguments/result/saved_tree fields by front ends.   It was either inherit
>>>    FUNCTION_DECL from non_common, or inherit non_common from FUNCTION_DECL,
>>> @@ -3576,8 +3582,8 @@ struct GTY(()) tree_function_decl {
>>>   unsigned looping_const_or_pure_flag : 1;
>>>   unsigned has_debug_args_flag : 1;
>>>   unsigned tm_clone_flag : 1;
>>> -
>>> -  /* 1 bit left */
>>> +  unsigned versioned_function : 1;
>>> +  /* No bits left.  */
>>>  };
>>>
>>>  /* The source language of the translation-unit.  */
>>> Index: tree-pass.h
>>> ===================================================================
>>> --- tree-pass.h (revision 184971)
>>> +++ tree-pass.h (working copy)
>>> @@ -455,6 +455,7 @@ extern struct gimple_opt_pass pass_tm_memopt;
>>>  extern struct gimple_opt_pass pass_tm_edges;
>>>  extern struct gimple_opt_pass pass_split_functions;
>>>  extern struct gimple_opt_pass pass_feedback_split_functions;
>>> +extern struct gimple_opt_pass pass_dispatch_versions;
>>>
>>>  /* IPA Passes */
>>>  extern struct simple_ipa_opt_pass pass_ipa_lower_emutls;
>>> Index: multiversion.c
>>> ===================================================================
>>> --- multiversion.c      (revision 0)
>>> +++ multiversion.c      (revision 0)
>>> @@ -0,0 +1,798 @@
>>> +/* Function Multiversioning.
>>> +   Copyright (C) 2012 Free Software Foundation, Inc.
>>> +   Contributed by Sriraman Tallam (tmsriram@google.com)
>>> +
>>> +This file is part of GCC.
>>> +
>>> +GCC is free software; you can redistribute it and/or modify it under
>>> +the terms of the GNU General Public License as published by the Free
>>> +Software Foundation; either version 3, or (at your option) any later
>>> +version.
>>> +
>>> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
>>> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
>>> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
>>> +for more details.
>>> +
>>> +You should have received a copy of the GNU General Public License
>>> +along with GCC; see the file COPYING3.  If not see
>>> +<http://www.gnu.org/licenses/>. */
>>> +
>>> +/* Holds the state for multi-versioned functions here. The front-end
>>> +   updates the state as and when function versions are encountered.
>>> +   This is then used to generate the dispatch code.  Also, the
>>> +   optimization passes to clone hot paths involving versioned functions
>>> +   will be done here.
>>> +
>>> +   Function versions are created by using the same function signature but
>>> +   also tagging attribute "targetv" to specify the platform type for which
>>> +   the version must be executed.  Here is an example:
>>> +
>>> +   int foo ()
>>> +   {
>>> +     printf ("Execute as default");
>>> +     return 0;
>>> +   }
>>> +
>>> +   int  __attribute__ ((targetv ("arch=corei7")))
>>> +   foo ()
>>> +   {
>>> +     printf ("Execute for corei7");
>>> +     return 0;
>>> +   }
>>> +
>>> +   int main ()
>>> +   {
>>> +     return foo ();
>>> +   }
>>> +
>>> +   The call to foo in main is replaced with a call to an IFUNC function that
>>> +   contains the dispatch code to call the correct function version at
>>> +   run-time.  */
>>> +
>>> +
>>> +#include "config.h"
>>> +#include "system.h"
>>> +#include "coretypes.h"
>>> +#include "tm.h"
>>> +#include "tree.h"
>>> +#include "tree-inline.h"
>>> +#include "langhooks.h"
>>> +#include "flags.h"
>>> +#include "cgraph.h"
>>> +#include "diagnostic.h"
>>> +#include "toplev.h"
>>> +#include "timevar.h"
>>> +#include "params.h"
>>> +#include "fibheap.h"
>>> +#include "intl.h"
>>> +#include "tree-pass.h"
>>> +#include "hashtab.h"
>>> +#include "coverage.h"
>>> +#include "ggc.h"
>>> +#include "tree-flow.h"
>>> +#include "rtl.h"
>>> +#include "ipa-prop.h"
>>> +#include "basic-block.h"
>>> +#include "toplev.h"
>>> +#include "dbgcnt.h"
>>> +#include "tree-dump.h"
>>> +#include "output.h"
>>> +#include "vecprim.h"
>>> +#include "gimple-pretty-print.h"
>>> +#include "ipa-inline.h"
>>> +#include "target.h"
>>> +#include "multiversion.h"
>>> +
>>> +typedef void * void_p;
>>> +
>>> +DEF_VEC_P (void_p);
>>> +DEF_VEC_ALLOC_P (void_p, heap);
>>> +
>>> +/* Each function decl that is a function version gets an instance of this
>>> +   structure.   Since this is called by the front-end, decl merging can
>>> +   happen, where a decl created for a new declaration is merged with
>>> +   the old. In this case, the new decl is deleted and the IS_DELETED
>>> +   field is set for the struct instance corresponding to the new decl.
>>> +   IFUNC_DECL is the decl of the ifunc function for default decls.
>>> +   IFUNC_RESOLVER_DECL is the decl of the dispatch function.  VERSIONS
>>> +   is a vector containing the list of function versions  that are
>>> +   the candidates for dispatch.  */
>>> +
>>> +typedef struct version_function_d {
>>> +  tree decl;
>>> +  tree ifunc_decl;
>>> +  tree ifunc_resolver_decl;
>>> +  VEC (void_p, heap) *versions;
>>> +  bool is_deleted;
>>> +} version_function;
>>> +
>>> +/* Hashmap has an entry for every function decl that has other function
>>> +   versions.  For function decls that are the default, it also stores the
>>> +   list of all the other function versions.  Each entry is a structure
>>> +   of type version_function_d.  */
>>> +static htab_t decl_version_htab = NULL;
>>> +
>>> +/* Hashtable helpers for decl_version_htab. */
>>> +
>>> +static hashval_t
>>> +decl_version_htab_hash_descriptor (const void *p)
>>> +{
>>> +  const version_function *t = (const version_function *) p;
>>> +  return htab_hash_pointer (t->decl);
>>> +}
>>> +
>>> +/* Hashtable helper for decl_version_htab. */
>>> +
>>> +static int
>>> +decl_version_htab_eq_descriptor (const void *p1, const void *p2)
>>> +{
>>> +  const version_function *t1 = (const version_function *) p1;
>>> +  return htab_eq_pointer ((const void_p) t1->decl, p2);
>>> +}
>>> +
>>> +/* Create the decl_version_htab.  */
>>> +static void
>>> +create_decl_version_htab (void)
>>> +{
>>> +  if (decl_version_htab == NULL)
>>> +    decl_version_htab = htab_create (10, decl_version_htab_hash_descriptor,
>>> +                                    decl_version_htab_eq_descriptor, NULL);
>>> +}
>>> +
>>> +/* Creates an instance of version_function for decl DECL.  */
>>> +
>>> +static version_function*
>>> +new_version_function (const tree decl)
>>> +{
>>> +  version_function *v;
>>> +  v = (version_function *)xmalloc(sizeof (version_function));
>>> +  v->decl = decl;
>>> +  v->ifunc_decl = NULL;
>>> +  v->ifunc_resolver_decl = NULL;
>>> +  v->versions = NULL;
>>> +  v->is_deleted = false;
>>> +  return v;
>>> +}
>>> +
>>> +/* Comparator function to be used in qsort routine to sort attribute
>>> +   specification strings to "targetv".  */
>>> +
>>> +static int
>>> +attr_strcmp (const void *v1, const void *v2)
>>> +{
>>> +  const char *c1 = *(char *const*)v1;
>>> +  const char *c2 = *(char *const*)v2;
>>> +  return strcmp (c1, c2);
>>> +}
>>> +
>>> +/* STR is the argument to targetv attribute.  This function tokenizes
>>> +   the comma separated arguments, sorts them and returns a string which
>>> +   is a unique identifier for the comma separated arguments.  */
>>> +
>>> +static char *
>>> +sorted_attr_string (const char *str)
>>> +{
>>> +  char **args = NULL;
>>> +  char *attr_str, *ret_str;
>>> +  char *attr = NULL;
>>> +  unsigned int argnum = 1;
>>> +  unsigned int i;
>>> +
>>> +  for (i = 0; i < strlen (str); i++)
>>> +    if (str[i] == ',')
>>> +      argnum++;
>>> +
>>> +  attr_str = (char *)xmalloc (strlen (str) + 1);
>>> +  strcpy (attr_str, str);
>>> +
>>> +  for (i = 0; i < strlen (attr_str); i++)
>>> +    if (attr_str[i] == '=')
>>> +      attr_str[i] = '_';
>>> +
>>> +  if (argnum == 1)
>>> +    return attr_str;
>>> +
>>> +  args = (char **)xmalloc (argnum * sizeof (char *));
>>> +
>>> +  i = 0;
>>> +  attr = strtok (attr_str, ",");
>>> +  while (attr != NULL)
>>> +    {
>>> +      args[i] = attr;
>>> +      i++;
>>> +      attr = strtok (NULL, ",");
>>> +    }
>>> +
>>> +  qsort (args, argnum, sizeof (char*), attr_strcmp);
>>> +
>>> +  ret_str = (char *)xmalloc (strlen (str) + 1);
>>> +  strcpy (ret_str, args[0]);
>>> +  for (i = 1; i < argnum; i++)
>>> +    {
>>> +      strcat (ret_str, "_");
>>> +      strcat (ret_str, args[i]);
>>> +    }
>>> +
>>> +  free (args);
>>> +  free (attr_str);
>>> +  return ret_str;
>>> +}
>>> +
>>> +/* Returns true when only one of DECL1 and DECL2 is marked with "targetv"
>>> +   or if the "targetv" attribute strings of DECL1 and DECL2 do not match.  */
>>> +
>>> +bool
>>> +has_different_version_attributes (const tree decl1, const tree decl2)
>>> +{
>>> +  tree attr1, attr2;
>>> +  char *c1, *c2;
>>> +  bool ret = false;
>>> +
>>> +  if (TREE_CODE (decl1) != FUNCTION_DECL
>>> +      || TREE_CODE (decl2) != FUNCTION_DECL)
>>> +    return false;
>>> +
>>> +  attr1 = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl1));
>>> +  attr2 = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl2));
>>> +
>>> +  if (attr1 == NULL_TREE && attr2 == NULL_TREE)
>>> +    return false;
>>> +
>>> +  if ((attr1 == NULL_TREE && attr2 != NULL_TREE)
>>> +      || (attr1 != NULL_TREE && attr2 == NULL_TREE))
>>> +    return true;
>>> +
>>> +  c1 = sorted_attr_string (
>>> +       TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr1))));
>>> +  c2 = sorted_attr_string (
>>> +       TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr2))));
>>> +
>>> +  if (strcmp (c1, c2) != 0)
>>> +     ret = true;
>>> +
>>> +  free (c1);
>>> +  free (c2);
>>> +
>>> +  return ret;
>>> +}
>>> +
>>> +/* If this decl corresponds to a function and has "targetv" attribute,
>>> +   append the attribute string to its assembler name.  */
>>> +
>>> +void
>>> +version_assembler_name (const tree decl)
>>> +{
>>> +  tree version_attr;
>>> +  const char *orig_name, *version_string, *attr_str;
>>> +  char *assembler_name;
>>> +  tree assembler_name_tree;
>>> +
>>> +  if (TREE_CODE (decl) != FUNCTION_DECL
>>> +      || DECL_ASSEMBLER_NAME_SET_P (decl)
>>> +      || !DECL_FUNCTION_VERSIONED (decl))
>>> +    return;
>>> +
>>> +  if (DECL_DECLARED_INLINE_P (decl)
>>> +      &&lookup_attribute ("gnu_inline",
>>> +                         DECL_ATTRIBUTES (decl)))
>>> +    error_at (DECL_SOURCE_LOCATION (decl),
>>> +             "Function versions cannot be marked as gnu_inline,"
>>> +             " bodies have to be generated\n");
>>> +
>>> +  if (DECL_VIRTUAL_P (decl)
>>> +      || DECL_VINDEX (decl))
>>> +    error_at (DECL_SOURCE_LOCATION (decl),
>>> +             "Virtual function versioning not supported\n");
>>> +
>>> +  version_attr = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl));
>>> +  /* targetv attribute string is NULL for default functions.  */
>>> +  if (version_attr == NULL_TREE)
>>> +    return;
>>> +
>>> +  orig_name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
>>> +  version_string
>>> +    = TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (version_attr)));
>>> +
>>> +  attr_str = sorted_attr_string (version_string);
>>> +  assembler_name = (char *) xmalloc (strlen (orig_name)
>>> +                                    + strlen (attr_str) + 2);
>>> +
>>> +  sprintf (assembler_name, "%s.%s", orig_name, attr_str);
>>> +  if (dump_file)
>>> +    fprintf (dump_file, "Assembler name set to %s for function version %s\n",
>>> +            assembler_name, IDENTIFIER_POINTER (DECL_NAME (decl)));
>>> +  assembler_name_tree = get_identifier (assembler_name);
>>> +  SET_DECL_ASSEMBLER_NAME (decl, assembler_name_tree);
>>> +}
>>> +
>>> +/* Returns true if DECL is multi-versioned and is the default function,
>>> +   that is, it is not tagged with the "targetv" attribute.  */
>>> +
>>> +bool
>>> +is_default_function (const tree decl)
>>> +{
>>> +  return (TREE_CODE (decl) == FUNCTION_DECL
>>> +         && DECL_FUNCTION_VERSIONED (decl)
>>> +         && (lookup_attribute ("targetv", DECL_ATTRIBUTES (decl))
>>> +             == NULL_TREE));
>>> +}
>>> +
>>> +/* For function decl DECL, find the version_function struct in the
>>> +   decl_version_htab.  */
>>> +
>>> +static version_function *
>>> +find_function_version (const tree decl)
>>> +{
>>> +  void *slot;
>>> +
>>> +  if (!DECL_FUNCTION_VERSIONED (decl))
>>> +    return NULL;
>>> +
>>> +  if (!decl_version_htab)
>>> +    return NULL;
>>> +
>>> +  slot = htab_find_with_hash (decl_version_htab, decl,
>>> +                              htab_hash_pointer (decl));
>>> +
>>> +  if (slot != NULL)
>>> +    return (version_function *)slot;
>>> +
>>> +  return NULL;
>>> +}
>>> +
>>> +/* Record DECL as a function version by creating a version_function struct
>>> +   for it and storing it in the hashtable.  */
>>> +
>>> +static version_function *
>>> +add_function_version (const tree decl)
>>> +{
>>> +  void **slot;
>>> +  version_function *v;
>>> +
>>> +  if (!DECL_FUNCTION_VERSIONED (decl))
>>> +    return NULL;
>>> +
>>> +  create_decl_version_htab ();
>>> +
>>> +  slot = htab_find_slot_with_hash (decl_version_htab, (const void_p)decl,
>>> +                                   htab_hash_pointer ((const void_p)decl),
>>> +                                  INSERT);
>>> +
>>> +  if (*slot != NULL)
>>> +    return (version_function *)*slot;
>>> +
>>> +  v = new_version_function (decl);
>>> +  *slot = v;
>>> +
>>> +  return v;
>>> +}
>>> +
>>> +/* Push V into VEC only if it is not already present.  */
>>> +
>>> +static void
>>> +push_function_version (version_function *v, VEC (void_p, heap) *vec)
>>> +{
>>> +  int ix;
>>> +  void_p ele;
>>> +  for (ix = 0; VEC_iterate (void_p, vec, ix, ele); ++ix)
>>> +    {
>>> +      if (ele == (void_p)v)
>>> +        return;
>>> +    }
>>> +
>>> +  VEC_safe_push (void_p, heap, vec, (void*)v);
>>> +}
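
The push-if-absent step above can be sketched outside GCC as follows. This
is a minimal illustration only: `struct ptr_vec` and `push_unique` are
hypothetical names, not GCC's VEC API. The sketch takes the vector by
address so that any growth is visible to the caller; whether the by-value
`vec` parameter in the patch has the same property after VEC_safe_push
reallocates may be worth checking.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable pointer vector; a stand-in, not GCC's VEC.  */
struct ptr_vec
{
  void **elts;
  size_t len;
  size_t cap;
};

/* Append ELT to *V unless it is already present.  Returns 1 if ELT was
   appended, 0 if it was already there, -1 on allocation failure.  */
static int
push_unique (struct ptr_vec *v, void *elt)
{
  size_t i;

  assert (v != NULL);

  /* Linear scan: version lists are expected to be short.  */
  for (i = 0; i < v->len; i++)
    if (v->elts[i] == elt)
      return 0;

  /* Grow the storage in place; the caller sees the new pointer
     because V is passed by address.  */
  if (v->len == v->cap)
    {
      size_t new_cap = v->cap ? v->cap * 2 : 1;
      void **p = (void **) realloc (v->elts, new_cap * sizeof *p);
      if (p == NULL)
        return -1;
      v->elts = p;
      v->cap = new_cap;
    }

  v->elts[v->len++] = elt;
  return 1;
}
```

Starting from an empty `struct ptr_vec`, pushing the same pointer twice
leaves a single entry, which mirrors the duplicate check in
push_function_version.
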
>>> +
>>> +/* Mark DECL as deleted.  This is called by the front-end when a duplicate
>>> +   decl is merged with the original decl and the duplicate decl is deleted.
>>> +   This function marks the duplicate_decl as invalid.  Called by
>>> +   duplicate_decls in cp/decl.c.  */
>>> +
>>> +void
>>> +mark_delete_decl_version (const tree decl)
>>> +{
>>> +  version_function *decl_v;
>>> +
>>> +  decl_v = find_function_version (decl);
>>> +
>>> +  if (decl_v == NULL)
>>> +    return;
>>> +
>>> +  decl_v->is_deleted = true;
>>> +
>>> +  if (is_default_function (decl)
>>> +      && decl_v->versions != NULL)
>>> +    {
>>> +      VEC_truncate (void_p, decl_v->versions, 0);
>>> +      VEC_free (void_p, heap, decl_v->versions);
>>> +    }
>>> +}
>>> +
>>> +/* Mark DECL1 and DECL2 to be function versions in the same group.  One
>>> +   of DECL1 and DECL2 must be the default, otherwise this function does
>>> +   nothing.  This function aggregates the versions.  */
>>> +
>>> +int
>>> +group_function_versions (const tree decl1, const tree decl2)
>>> +{
>>> +  tree default_decl, version_decl;
>>> +  version_function *default_v, *version_v;
>>> +
>>> +  gcc_assert (DECL_FUNCTION_VERSIONED (decl1)
>>> +             && DECL_FUNCTION_VERSIONED (decl2));
>>> +
>>> +  /* The version decls are added only to the default decl.  */
>>> +  if (!is_default_function (decl1)
>>> +      && !is_default_function (decl2))
>>> +    return 0;
>>> +
>>> +  /* This can happen with duplicate declarations.  Just ignore.  */
>>> +  if (is_default_function (decl1)
>>> +      && is_default_function (decl2))
>>> +    return 0;
>>> +
>>> +  default_decl = (is_default_function (decl1)) ? decl1 : decl2;
>>> +  version_decl = (default_decl == decl1) ? decl2 : decl1;
>>> +
>>> +  gcc_assert (default_decl != version_decl);
>>> +  create_decl_version_htab ();
>>> +
>>> +  /* If the version function is found, it has been added.  */
>>> +  if (find_function_version (version_decl))
>>> +    return 0;
>>> +
>>> +  default_v = add_function_version (default_decl);
>>> +  version_v = add_function_version (version_decl);
>>> +
>>> +  if (default_v->versions == NULL)
>>> +    default_v->versions = VEC_alloc (void_p, heap, 1);
>>> +
>>> +  push_function_version (version_v, default_v->versions);
>>> +  return 0;
>>> +}
>>> +
>>> +/* Makes a function attribute of the form NAME(ARG_NAME) and chains
>>> +   it to CHAIN.  */
>>> +
>>> +static tree
>>> +make_attribute (const char *name, const char *arg_name, tree chain)
>>> +{
>>> +  tree attr_name;
>>> +  tree attr_arg_name;
>>> +  tree attr_args;
>>> +  tree attr;
>>> +
>>> +  attr_name = get_identifier (name);
>>> +  attr_arg_name = build_string (strlen (arg_name), arg_name);
>>> +  attr_args = tree_cons (NULL_TREE, attr_arg_name, NULL_TREE);
>>> +  attr = tree_cons (attr_name, attr_args, chain);
>>> +  return attr;
>>> +}
>>> +
>>> +/* Return a new name by appending SUFFIX to the DECL name.  If
>>> +   make_unique is true, append the full path name.  */
>>> +
>>> +static char *
>>> +make_name (tree decl, const char *suffix, bool make_unique)
>>> +{
>>> +  char *global_var_name;
>>> +  int name_len;
>>> +  const char *name;
>>> +  const char *unique_name = NULL;
>>> +
>>> +  name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
>>> +
>>> +  /* Get a unique name that can be used globally without any chances
>>> +     of collision at link time.  */
>>> +  if (make_unique)
>>> +    unique_name = IDENTIFIER_POINTER (get_file_function_name ("\0"));
>>> +
>>> +  name_len = strlen (name) + strlen (suffix) + 2;
>>> +
>>> +  if (make_unique)
>>> +    name_len += strlen (unique_name) + 1;
>>> +  global_var_name = (char *) xmalloc (name_len);
>>> +
>>> +  /* Use '.' to concatenate names as it is demangler friendly.  */
>>> +  if (make_unique)
>>> +      snprintf (global_var_name, name_len, "%s.%s.%s", name,
>>> +               unique_name, suffix);
>>> +  else
>>> +      snprintf (global_var_name, name_len, "%s.%s", name, suffix);
>>> +
>>> +  return global_var_name;
>>> +}
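
The naming scheme used by make_name can be sketched in isolation as
follows. This is an illustration under assumptions: `sketch_make_name` is a
hypothetical helper, the unique component is passed in directly instead of
coming from get_file_function_name, and the sample names in the note below
are made up.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for make_name: join NAME, an optional UNIQUE
   component, and SUFFIX with '.'.  UNIQUE is NULL when no unique
   component is wanted.  The caller frees the result.  */
static char *
sketch_make_name (const char *name, const char *unique, const char *suffix)
{
  size_t len;
  char *buf;

  assert (name != NULL && suffix != NULL);

  /* One byte for the '.' separator and one for the terminating NUL.  */
  len = strlen (name) + strlen (suffix) + 2;
  if (unique != NULL)
    len += strlen (unique) + 1;	/* The second '.' separator.  */

  buf = (char *) malloc (len);
  if (buf == NULL)
    return NULL;

  if (unique != NULL)
    snprintf (buf, len, "%s.%s.%s", name, unique, suffix);
  else
    snprintf (buf, len, "%s.%s", name, suffix);

  return buf;
}
```

With these inputs, `sketch_make_name ("_Z3foov", NULL, "resolver")` yields
`_Z3foov.resolver`, and passing a non-NULL unique component places it
between the two parts.
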
>>> +
>>> +/* Make the resolver function decl for ifunc (IFUNC_DECL) to dispatch
>>> +   the versions of multi-versioned function DEFAULT_DECL.  Create an
>>> +   empty basic block in the resolver and store the pointer in
>>> +   EMPTY_BB.  Return the decl of the resolver function.  */
>>> +
>>> +static tree
>>> +make_ifunc_resolver_func (const tree default_decl,
>>> +                         const tree ifunc_decl,
>>> +                         basic_block *empty_bb)
>>> +{
>>> +  char *resolver_name;
>>> +  tree decl, type, decl_name, t;
>>> +  basic_block new_bb;
>>> +  tree old_current_function_decl;
>>> +  bool make_unique = false;
>>> +
>>> +  /* IFUNC's have to be globally visible.  So, if the default_decl is
>>> +     not, then the name of the IFUNC should be made unique.  */
>>> +  if (TREE_PUBLIC (default_decl) == 0)
>>> +    make_unique = true;
>>> +
>>> +  /* Append the filename to the resolver function if the versions are
>>> +     not externally visible.  This is because the resolver function has
>>> +     to be externally visible for the loader to find it.  So, appending
>>> +     the filename will prevent conflicts with a resolver function from
>>> +     another module which is based on the same version name.  */
>>> +  resolver_name = make_name (default_decl, "resolver", make_unique);
>>> +
>>> +  /* The resolver function should return a (void *). */
>>> +  type = build_function_type_list (ptr_type_node, NULL_TREE);
>>> +
>>> +  decl = build_fn_decl (resolver_name, type);
>>> +  decl_name = get_identifier (resolver_name);
>>> +  SET_DECL_ASSEMBLER_NAME (decl, decl_name);
>>> +
>>> +  DECL_NAME (decl) = decl_name;
>>> +  TREE_USED (decl) = TREE_USED (default_decl);
>>> +  DECL_ARTIFICIAL (decl) = 1;
>>> +  DECL_IGNORED_P (decl) = 0;
>>> +  /* IFUNC resolvers have to be externally visible.  */
>>> +  TREE_PUBLIC (decl) = 1;
>>> +  DECL_UNINLINABLE (decl) = 1;
>>> +
>>> +  DECL_EXTERNAL (decl) = DECL_EXTERNAL (default_decl);
>>> +  DECL_EXTERNAL (ifunc_decl) = 0;
>>> +
>>> +  DECL_CONTEXT (decl) = NULL_TREE;
>>> +  DECL_INITIAL (decl) = make_node (BLOCK);
>>> +  DECL_STATIC_CONSTRUCTOR (decl) = 0;
>>> +  TREE_READONLY (decl) = 0;
>>> +  DECL_PURE_P (decl) = 0;
>>> +  DECL_COMDAT (decl) = DECL_COMDAT (default_decl);
>>> +  if (DECL_COMDAT_GROUP (default_decl))
>>> +    {
>>> +      make_decl_one_only (decl, DECL_COMDAT_GROUP (default_decl));
>>> +    }
>>> +  /* Build result decl and add to function_decl. */
>>> +  t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, ptr_type_node);
>>> +  DECL_ARTIFICIAL (t) = 1;
>>> +  DECL_IGNORED_P (t) = 1;
>>> +  DECL_RESULT (decl) = t;
>>> +
>>> +  gimplify_function_tree (decl);
>>> +  old_current_function_decl = current_function_decl;
>>> +  push_cfun (DECL_STRUCT_FUNCTION (decl));
>>> +  current_function_decl = decl;
>>> +  init_empty_tree_cfg_for_function (DECL_STRUCT_FUNCTION (decl));
>>> +  cfun->curr_properties |=
>>> +    (PROP_gimple_lcf | PROP_gimple_leh | PROP_cfg | PROP_referenced_vars |
>>> +     PROP_ssa);
>>> +  new_bb = create_empty_bb (ENTRY_BLOCK_PTR);
>>> +  make_edge (ENTRY_BLOCK_PTR, new_bb, EDGE_FALLTHRU);
>>> +  make_edge (new_bb, EXIT_BLOCK_PTR, 0);
>>> +  *empty_bb = new_bb;
>>> +
>>> +  cgraph_add_new_function (decl, true);
>>> +  cgraph_call_function_insertion_hooks (cgraph_get_create_node (decl));
>>> +  cgraph_analyze_function (cgraph_get_create_node (decl));
>>> +  cgraph_mark_needed_node (cgraph_get_create_node (decl));
>>> +
>>> +  if (DECL_COMDAT_GROUP (default_decl))
>>> +    {
>>> +      gcc_assert (cgraph_get_node (default_decl));
>>> +      cgraph_add_to_same_comdat_group (cgraph_get_node (decl),
>>> +                                      cgraph_get_node (default_decl));
>>> +    }
>>> +
>>> +  pop_cfun ();
>>> +  current_function_decl = old_current_function_decl;
>>> +
>>> +  gcc_assert (ifunc_decl != NULL);
>>> +  DECL_ATTRIBUTES (ifunc_decl)
>>> +    = make_attribute ("ifunc", resolver_name, DECL_ATTRIBUTES (ifunc_decl));
>>> +  assemble_alias (ifunc_decl, get_identifier (resolver_name));
>>> +  return decl;
>>> +}
>>> +
>>> +/* Make an ifunc declaration for the multi-versioned function DECL.  Calls to
>>> +   DECL function will be replaced with calls to the ifunc.   Return the decl
>>> +   of the ifunc created.  */
>>> +
>>> +static tree
>>> +make_ifunc_func (const tree decl)
>>> +{
>>> +  tree ifunc_decl;
>>> +  char *ifunc_name, *resolver_name;
>>> +  tree fn_type, ifunc_type;
>>> +  bool make_unique = false;
>>> +
>>> +  if (TREE_PUBLIC (decl) == 0)
>>> +    make_unique = true;
>>> +
>>> +  ifunc_name = make_name (decl, "ifunc", make_unique);
>>> +  resolver_name = make_name (decl, "resolver", make_unique);
>>> +  gcc_assert (resolver_name);
>>> +
>>> +  fn_type = TREE_TYPE (decl);
>>> +  ifunc_type = build_function_type (TREE_TYPE (fn_type),
>>> +                                   TYPE_ARG_TYPES (fn_type));
>>> +
>>> +  ifunc_decl = build_fn_decl (ifunc_name, ifunc_type);
>>> +  TREE_USED (ifunc_decl) = 1;
>>> +  DECL_CONTEXT (ifunc_decl) = NULL_TREE;
>>> +  DECL_INITIAL (ifunc_decl) = error_mark_node;
>>> +  DECL_ARTIFICIAL (ifunc_decl) = 1;
>>> +  /* Mark this ifunc as external, the resolver will flip it again if
>>> +     it gets generated.  */
>>> +  DECL_EXTERNAL (ifunc_decl) = 1;
>>> +  /* IFUNCs have to be externally visible.  */
>>> +  TREE_PUBLIC (ifunc_decl) = 1;
>>> +
>>> +  return ifunc_decl;
>>> +}
>>> +
>>> +/* For multi-versioned function decl, which should also be the default,
>>> +   return the decl of the ifunc, creating it if it does not
>>> +   exist.  */
>>> +
>>> +tree
>>> +get_ifunc_for_version (const tree decl)
>>> +{
>>> +  version_function *decl_v;
>>> +  int ix;
>>> +  void_p ele;
>>> +
>>> +  /* DECL has to be the default version, otherwise it is missing and
>>> +     that is not allowed.  */
>>> +  if (!is_default_function (decl))
>>> +    {
>>> +      error_at (DECL_SOURCE_LOCATION (decl), "default version not found");
>>> +      return decl;
>>> +    }
>>> +
>>> +  decl_v = find_function_version (decl);
>>> +  gcc_assert (decl_v != NULL);
>>> +  if (decl_v->ifunc_decl == NULL)
>>> +    {
>>> +      tree ifunc_decl;
>>> +      ifunc_decl = make_ifunc_func (decl);
>>> +      decl_v->ifunc_decl = ifunc_decl;
>>> +    }
>>> +
>>> +  if (cgraph_get_node (decl))
>>> +    cgraph_mark_needed_node (cgraph_get_node (decl));
>>> +
>>> +  for (ix = 0; VEC_iterate (void_p, decl_v->versions, ix, ele); ++ix)
>>> +    {
>>> +      version_function *v = (version_function *) ele;
>>> +      gcc_assert (v->decl != NULL);
>>> +      if (cgraph_get_node (v->decl))
>>> +       cgraph_mark_needed_node (cgraph_get_node (v->decl));
>>> +    }
>>> +
>>> +  return decl_v->ifunc_decl;
>>> +}
>>> +
>>> +/* Generate the dispatching code to dispatch multi-versioned function
>>> +   DECL.  Make a new function decl for dispatching and call the target
>>> +   hook to process the "targetv" attributes and provide the code to
>>> +   dispatch the right function at run-time.  */
>>> +
>>> +static tree
>>> +make_ifunc_resolver_for_version (const tree decl)
>>> +{
>>> +  version_function *decl_v;
>>> +  tree ifunc_resolver_decl, ifunc_decl;
>>> +  basic_block empty_bb;
>>> +  int ix;
>>> +  void_p ele;
>>> +  VEC (tree, heap) *fn_ver_vec = NULL;
>>> +
>>> +  gcc_assert (is_default_function (decl));
>>> +
>>> +  decl_v = find_function_version (decl);
>>> +  gcc_assert (decl_v != NULL);
>>> +
>>> +  if (decl_v->ifunc_resolver_decl != NULL)
>>> +    return decl_v->ifunc_resolver_decl;
>>> +
>>> +  ifunc_decl = decl_v->ifunc_decl;
>>> +
>>> +  if (ifunc_decl == NULL)
>>> +    ifunc_decl = decl_v->ifunc_decl = make_ifunc_func (decl);
>>> +
>>> +  ifunc_resolver_decl = make_ifunc_resolver_func (decl, ifunc_decl,
>>> +                                                 &empty_bb);
>>> +
>>> +  fn_ver_vec = VEC_alloc (tree, heap, 2);
>>> +  VEC_safe_push (tree, heap, fn_ver_vec, decl);
>>> +
>>> +  for (ix = 0; VEC_iterate (void_p, decl_v->versions, ix, ele); ++ix)
>>> +    {
>>> +      version_function *v = (version_function *) ele;
>>> +      gcc_assert (v->decl != NULL);
>>> +      /* Check for virtual functions here again, as by this time it should
>>> +        have been determined if this function needs a vtable index or
>>> +        not.  This happens for methods in derived classes that override
>>> +        virtual methods in base classes but are not explicitly marked as
>>> +        virtual.  */
>>> +      if (DECL_VINDEX (v->decl))
>>> +        error_at (DECL_SOURCE_LOCATION (v->decl),
>>> +                 "virtual function versioning not supported");
>>> +      if (!v->is_deleted)
>>> +       VEC_safe_push (tree, heap, fn_ver_vec, v->decl);
>>> +    }
>>> +
>>> +  gcc_assert (targetm.dispatch_version);
>>> +  targetm.dispatch_version (ifunc_resolver_decl, fn_ver_vec, &empty_bb);
>>> +  decl_v->ifunc_resolver_decl = ifunc_resolver_decl;
>>> +
>>> +  return ifunc_resolver_decl;
>>> +}
>>> +
>>> +/* Main entry point to pass_dispatch_versions. For multi-versioned functions,
>>> +   generate the dispatching code.  */
>>> +
>>> +static unsigned int
>>> +do_dispatch_versions (void)
>>> +{
>>> +  /* A new pass for generating dispatch code for multi-versioned functions.
>>> +     Other forms of dispatch can be added when ifunc support is not available
>>> +     like just calling the function directly after checking for target type.
>>> +     Currently, dispatching is done through IFUNC.  This pass will become
>>> +     more meaningful when other dispatch mechanisms are added.  */
>>> +
>>> +  /* Cloning a function to produce more versions will happen here when the
>>> +     user requests that via the targetv attribute. For example,
>>> +     int foo () __attribute__ ((targetv(("arch=core2"), ("arch=corei7"))));
>>> +     means that the user wants the same body of foo to be versioned for core2
>>> +     and corei7.  In that case, this function will be cloned during this
>>> +     pass.  */
>>> +
>>> +  if (DECL_FUNCTION_VERSIONED (current_function_decl)
>>> +      && is_default_function (current_function_decl))
>>> +    {
>>> +      tree decl = make_ifunc_resolver_for_version (current_function_decl);
>>> +      if (dump_file && decl)
>>> +       dump_function_to_file (decl, dump_file, TDF_BLOCKS);
>>> +    }
>>> +  return 0;
>>> +}
>>> +
>>> +static bool
>>> +gate_dispatch_versions (void)
>>> +{
>>> +  return true;
>>> +}
>>> +
>>> +/* A pass to generate the dispatch code to execute the appropriate version
>>> +   of a multi-versioned function at run-time.  */
>>> +
>>> +struct gimple_opt_pass pass_dispatch_versions =
>>> +{
>>> + {
>>> +  GIMPLE_PASS,
>>> +  "dispatch_multiversion_functions",    /* name */
>>> +  gate_dispatch_versions,              /* gate */
>>> +  do_dispatch_versions,                        /* execute */
>>> +  NULL,                                        /* sub */
>>> +  NULL,                                        /* next */
>>> +  0,                                   /* static_pass_number */
>>> +  TV_MULTIVERSION_DISPATCH,            /* tv_id */
>>> +  PROP_cfg,                            /* properties_required */
>>> +  PROP_cfg,                            /* properties_provided */
>>> +  0,                                   /* properties_destroyed */
>>> +  0,                                   /* todo_flags_start */
>>> +  TODO_dump_func |                     /* todo_flags_finish */
>>> +  TODO_cleanup_cfg | TODO_dump_cgraph
>>> + }
>>> +};
>>> Index: cgraphunit.c
>>> ===================================================================
>>> --- cgraphunit.c        (revision 184971)
>>> +++ cgraphunit.c        (working copy)
>>> @@ -141,6 +141,7 @@ along with GCC; see the file COPYING3.  If not see
>>>  #include "ipa-inline.h"
>>>  #include "ipa-utils.h"
>>>  #include "lto-streamer.h"
>>> +#include "multiversion.h"
>>>
>>>  static void cgraph_expand_all_functions (void);
>>>  static void cgraph_mark_functions_to_output (void);
>>> @@ -343,6 +344,13 @@ cgraph_finalize_function (tree decl, bool nested)
>>>       node->local.redefined_extern_inline = true;
>>>     }
>>>
>>> +  /* If this is a function version and not the default, change the
>>> +     assembler name of this function.  The DECL names of function
>>> +     versions are the same, only the assembler names are made unique.
>>> +     The assembler name is changed by appending the string from
>>> +     the "targetv" attribute.  */
>>> +  version_assembler_name (decl);
>>> +
>>>   notice_global_symbol (decl);
>>>   node->local.finalized = true;
>>>   node->lowered = DECL_STRUCT_FUNCTION (decl)->cfg != NULL;
>>> Index: multiversion.h
>>> ===================================================================
>>> --- multiversion.h      (revision 0)
>>> +++ multiversion.h      (revision 0)
>>> @@ -0,0 +1,52 @@
>>> +/* Function Multiversioning.
>>> +   Copyright (C) 2012 Free Software Foundation, Inc.
>>> +   Contributed by Sriraman Tallam (tmsriram@google.com)
>>> +
>>> +This file is part of GCC.
>>> +
>>> +GCC is free software; you can redistribute it and/or modify it under
>>> +the terms of the GNU General Public License as published by the Free
>>> +Software Foundation; either version 3, or (at your option) any later
>>> +version.
>>> +
>>> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
>>> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
>>> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
>>> +for more details.
>>> +
>>> +You should have received a copy of the GNU General Public License
>>> +along with GCC; see the file COPYING3.  If not see
>>> +<http://www.gnu.org/licenses/>. */
>>> +
>>> +/* This is the header file which provides the functions to keep track
>>> +   of functions that are multi-versioned and to generate the dispatch
>>> +   code to call the right version at run-time.  */
>>> +
>>> +#ifndef GCC_MULTIVERSION_H
>>> +#define GCC_MULTIVERSION_H
>>> +
>>> +#include "tree.h"
>>> +
>>> +/* Mark DECL1 and DECL2 as function versions.  */
>>> +int group_function_versions (const tree decl1, const tree decl2);
>>> +
>>> +/* Mark DECL as deleted and no longer a version.  */
>>> +void mark_delete_decl_version (const tree decl);
>>> +
>>> +/* Returns true if DECL is the default version to be executed if all
>>> +   other versions are inappropriate at run-time.  */
>>> +bool is_default_function (const tree decl);
>>> +
>>> +/* Gets the IFUNC dispatcher for this multi-versioned function DECL. DECL
>>> +   must be the default function in the multi-versioned group.  */
>>> +tree get_ifunc_for_version (const tree decl);
>>> +
>>> +/* Returns true when only one of DECL1 and DECL2 is marked with "targetv"
>>> +   or if the "targetv" attribute strings of DECL1 and DECL2 do not match.  */
>>> +bool has_different_version_attributes (const tree decl1, const tree decl2);
>>> +
>>> +/* If DECL is a function version and not the default version, the assembler
>>> +   name of DECL is changed to include the attribute string to keep the
>>> +   name unambiguous.  */
>>> +void version_assembler_name (const tree decl);
>>> +#endif
>>> Index: cp/class.c
>>> ===================================================================
>>> --- cp/class.c  (revision 184971)
>>> +++ cp/class.c  (working copy)
>>> @@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.  If not see
>>>  #include "tree-dump.h"
>>>  #include "splay-tree.h"
>>>  #include "pointer-set.h"
>>> +#include "multiversion.h"
>>>
>>>  /* The number of nested classes being processed.  If we are not in the
>>>    scope of any class, this is zero.  */
>>> @@ -1092,7 +1093,20 @@ add_method (tree type, tree method, tree using_dec
>>>              || same_type_p (TREE_TYPE (fn_type),
>>>                              TREE_TYPE (method_type))))
>>>        {
>>> -         if (using_decl)
>>> +         /* For function versions, their parms and types match
>>> +            but they are not duplicates.  Record function versions
>>> +            as and when they are found.  */
>>> +         if (TREE_CODE (fn) == FUNCTION_DECL
>>> +             && TREE_CODE (method) == FUNCTION_DECL
>>> +             && (DECL_FUNCTION_VERSIONED (fn)
>>> +                 || DECL_FUNCTION_VERSIONED (method)))
>>> +           {
>>> +             DECL_FUNCTION_VERSIONED (fn) = 1;
>>> +             DECL_FUNCTION_VERSIONED (method) = 1;
>>> +             group_function_versions (fn, method);
>>> +             continue;
>>> +           }
>>> +         else if (using_decl)
>>>            {
>>>              if (DECL_CONTEXT (fn) == type)
>>>                /* Defer to the local function.  */
>>> @@ -1150,6 +1164,13 @@ add_method (tree type, tree method, tree using_dec
>>>   else
>>>     /* Replace the current slot.  */
>>>     VEC_replace (tree, method_vec, slot, overload);
>>> +
>>> +  /* Change the assembler name of method here if it has "targetv"
>>> +     attributes.  Since all versions have the same mangled name,
>>> +     their assembler name is changed by appending the string from
>>> +     the "targetv" attribute. */
>>> +  version_assembler_name (method);
>>> +
>>>   return true;
>>>  }
>>>
>>> @@ -6890,8 +6911,11 @@ resolve_address_of_overloaded_function (tree targe
>>>          if (DECL_ANTICIPATED (fn))
>>>            continue;
>>>
>>> -         /* See if there's a match.  */
>>> -         if (same_type_p (target_fn_type, static_fn_type (fn)))
>>> +         /* See if there's a match.  For multi-versioned functions,
>>> +            match only the default version.  */
>>> +         if (same_type_p (target_fn_type, static_fn_type (fn))
>>> +             && (!DECL_FUNCTION_VERSIONED (fn)
>>> +                 || is_default_function (fn)))
>>>            matches = tree_cons (fn, NULL_TREE, matches);
>>>        }
>>>     }
>>> @@ -7053,6 +7077,21 @@ resolve_address_of_overloaded_function (tree targe
>>>       perform_or_defer_access_check (access_path, fn, fn);
>>>     }
>>>
>>> +  /* If a pointer to a function that is multi-versioned is requested, the
>>> +     pointer to the dispatcher function is returned instead.  This works
>>> +     well because indirectly calling the function will dispatch the right
>>> +     function version at run-time. Also, the function address is kept
>>> +     unique.  */
>>> +  if (DECL_FUNCTION_VERSIONED (fn)
>>> +      && is_default_function (fn))
>>> +    {
>>> +      tree ifunc_decl;
>>> +      ifunc_decl = get_ifunc_for_version (fn);
>>> +      gcc_assert (ifunc_decl != NULL);
>>> +      mark_used (fn);
>>> +      return build_fold_addr_expr (ifunc_decl);
>>> +    }
>>> +
>>>   if (TYPE_PTRFN_P (target_type) || TYPE_PTRMEMFUNC_P (target_type))
>>>     return cp_build_addr_expr (fn, flags);
>>>   else
>>> Index: cp/decl.c
>>> ===================================================================
>>> --- cp/decl.c   (revision 184971)
>>> +++ cp/decl.c   (working copy)
>>> @@ -54,6 +54,7 @@ along with GCC; see the file COPYING3.  If not see
>>>  #include "pointer-set.h"
>>>  #include "splay-tree.h"
>>>  #include "plugin.h"
>>> +#include "multiversion.h"
>>>
>>>  /* Possible cases of bad specifiers type used by bad_specifiers. */
>>>  enum bad_spec_place {
>>> @@ -972,6 +973,23 @@ decls_match (tree newdecl, tree olddecl)
>>>       if (t1 != t2)
>>>        return 0;
>>>
>>> +      /* The decls do not match if they correspond to two different versions
>>> +        of the same function.  */
>>> +      if (compparms (p1, p2)
>>> +         && same_type_p (TREE_TYPE (f1), TREE_TYPE (f2))
>>> +         && (DECL_FUNCTION_VERSIONED (newdecl)
>>> +             || DECL_FUNCTION_VERSIONED (olddecl))
>>> +         && has_different_version_attributes (newdecl, olddecl))
>>> +       {
>>> +         /* One of the decls could be the default without the "targetv"
>>> +            attribute. Set it to be a versioned function here.  */
>>> +         DECL_FUNCTION_VERSIONED (newdecl) = 1;
>>> +         DECL_FUNCTION_VERSIONED (olddecl) = 1;
>>> +         /* Accumulate all the versions of a function.  */
>>> +         group_function_versions (olddecl, newdecl);
>>> +         return 0;
>>> +       }
>>> +
>>>       if (CP_DECL_CONTEXT (newdecl) != CP_DECL_CONTEXT (olddecl)
>>>          && ! (DECL_EXTERN_C_P (newdecl)
>>>                && DECL_EXTERN_C_P (olddecl)))
>>> @@ -1482,7 +1500,11 @@ duplicate_decls (tree newdecl, tree olddecl, bool
>>>              error ("previous declaration %q+#D here", olddecl);
>>>              return NULL_TREE;
>>>            }
>>> -         else if (compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)),
>>> +         /* For function versions, params and types match, but they
>>> +            are not ambiguous.  */
>>> +         else if ((!DECL_FUNCTION_VERSIONED (newdecl)
>>> +                   && !DECL_FUNCTION_VERSIONED (olddecl))
>>> +                  && compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)),
>>>                              TYPE_ARG_TYPES (TREE_TYPE (olddecl))))
>>>            {
>>>              error ("new declaration %q#D", newdecl);
>>> @@ -2250,6 +2272,16 @@ duplicate_decls (tree newdecl, tree olddecl, bool
>>>   else if (DECL_PRESERVE_P (newdecl))
>>>     DECL_PRESERVE_P (olddecl) = 1;
>>>
>>> +  /* If the olddecl is a version, so is the newdecl.  */
>>> +  if (TREE_CODE (newdecl) == FUNCTION_DECL
>>> +      && DECL_FUNCTION_VERSIONED (olddecl))
>>> +    {
>>> +      DECL_FUNCTION_VERSIONED (newdecl) = 1;
>>> +      /* Record that newdecl is not a valid version and has
>>> +        been deleted.  */
>>> +      mark_delete_decl_version (newdecl);
>>> +    }
>>> +
>>>   if (TREE_CODE (newdecl) == FUNCTION_DECL)
>>>     {
>>>       int function_size;
>>> @@ -4512,6 +4544,10 @@ start_decl (const cp_declarator *declarator,
>>>   /* Enter this declaration into the symbol table.  */
>>>   decl = maybe_push_decl (decl);
>>>
>>> +  /* If this decl is a function version and not the default, its assembler
>>> +     name has to be changed.  */
>>> +  version_assembler_name (decl);
>>> +
>>>   if (processing_template_decl)
>>>     decl = push_template_decl (decl);
>>>   if (decl == error_mark_node)
>>> @@ -13019,6 +13055,10 @@ start_function (cp_decl_specifier_seq *declspecs,
>>>     gcc_assert (same_type_p (TREE_TYPE (TREE_TYPE (decl1)),
>>>                             integer_type_node));
>>>
>>> +  /* If this decl is a function version and not the default, its assembler
>>> +     name has to be changed.  */
>>> +  version_assembler_name (decl1);
>>> +
>>>   start_preparsed_function (decl1, attrs, /*flags=*/SF_DEFAULT);
>>>
>>>   return 1;
>>> @@ -13960,6 +14000,11 @@ cxx_comdat_group (tree decl)
>>>            break;
>>>        }
>>>       name = DECL_ASSEMBLER_NAME (decl);
>>> +      if (TREE_CODE (decl) == FUNCTION_DECL
>>> +         && DECL_FUNCTION_VERSIONED (decl))
>>> +       name = DECL_NAME (decl);
>>> +      else
>>> +        name = DECL_ASSEMBLER_NAME (decl);
>>>     }
>>>
>>>   return name;
>>> Index: cp/semantics.c
>>> ===================================================================
>>> --- cp/semantics.c      (revision 184971)
>>> +++ cp/semantics.c      (working copy)
>>> @@ -3783,8 +3783,11 @@ expand_or_defer_fn_1 (tree fn)
>>>       /* If the user wants us to keep all inline functions, then mark
>>>         this function as needed so that finish_file will make sure to
>>>         output it later.  Similarly, all dllexport'd functions must
>>> -        be emitted; there may be callers in other DLLs.  */
>>> -      if ((flag_keep_inline_functions
>>> +        be emitted; there may be callers in other DLLs.
>>> +        Also, mark this function as needed if it is marked inline but
>>> +        is a multi-versioned function.  */
>>> +      if (((flag_keep_inline_functions
>>> +           || DECL_FUNCTION_VERSIONED (fn))
>>>           && DECL_DECLARED_INLINE_P (fn)
>>>           && !DECL_REALLY_EXTERN (fn))
>>>          || (flag_keep_inline_dllexport
>>> Index: cp/decl2.c
>>> ===================================================================
>>> --- cp/decl2.c  (revision 184971)
>>> +++ cp/decl2.c  (working copy)
>>> @@ -53,6 +53,7 @@ along with GCC; see the file COPYING3.  If not see
>>>  #include "splay-tree.h"
>>>  #include "langhooks.h"
>>>  #include "c-family/c-ada-spec.h"
>>> +#include "multiversion.h"
>>>
>>>  extern cpp_reader *parse_in;
>>>
>>> @@ -674,9 +675,13 @@ check_classfn (tree ctype, tree function, tree tem
>>>          if (is_template != (TREE_CODE (fndecl) == TEMPLATE_DECL))
>>>            continue;
>>>
>>> +         /* While finding a match, same types and params are not enough
>>> +            if the function is versioned.  Also check version ("targetv")
>>> +            attributes.  */
>>>          if (same_type_p (TREE_TYPE (TREE_TYPE (function)),
>>>                           TREE_TYPE (TREE_TYPE (fndecl)))
>>>              && compparms (p1, p2)
>>> +             && !has_different_version_attributes (function, fndecl)
>>>              && (!is_template
>>>                  || comp_template_parms (template_parms,
>>>                                          DECL_TEMPLATE_PARMS (fndecl)))
>>> Index: cp/call.c
>>> ===================================================================
>>> --- cp/call.c   (revision 184971)
>>> +++ cp/call.c   (working copy)
>>> @@ -41,6 +41,7 @@ along with GCC; see the file COPYING3.  If not see
>>>  #include "langhooks.h"
>>>  #include "c-family/c-objc.h"
>>>  #include "timevar.h"
>>> +#include "multiversion.h"
>>>
>>>  /* The various kinds of conversion.  */
>>>
>>> @@ -6730,6 +6731,17 @@ build_over_call (struct z_candidate *cand, int fla
>>>   if (!already_used)
>>>     mark_used (fn);
>>>
>>> +  /* For a call to a multi-versioned function, the call should actually be to
>>> +     the dispatcher.  */
>>> +  if (DECL_FUNCTION_VERSIONED (fn))
>>> +    {
>>> +      tree ifunc_decl;
>>> +      ifunc_decl = get_ifunc_for_version (fn);
>>> +      gcc_assert (ifunc_decl != NULL);
>>> +      return build_call_expr_loc_array (UNKNOWN_LOCATION, ifunc_decl,
>>> +                                       nargs, argarray);
>>> +    }
>>> +
>>>   if (DECL_VINDEX (fn) && (flags & LOOKUP_NONVIRTUAL) == 0)
>>>     {
>>>       tree t;
>>> @@ -7980,6 +7992,30 @@ joust (struct z_candidate *cand1, struct z_candida
>>>   size_t i;
>>>   size_t len;
>>>
>>> +  /* For Candidates of a multi-versioned function, the one marked default
>>> +     wins.  This is because the default decl is used as key to aggregate
>>> +     all the other versions provided for it in multiversion.c.  When
>>> +     generating the actual call, the appropriate dispatcher is created
>>> +     to call the right function version at run-time.  */
>>> +
>>> +  if ((TREE_CODE (cand1->fn) == FUNCTION_DECL
>>> +       && DECL_FUNCTION_VERSIONED (cand1->fn))
>>> +      ||(TREE_CODE (cand2->fn) == FUNCTION_DECL
>>> +        && DECL_FUNCTION_VERSIONED (cand2->fn)))
>>> +    {
>>> +      if (is_default_function (cand1->fn))
>>> +       {
>>> +          mark_used (cand2->fn);
>>> +         return 1;
>>> +       }
>>> +      if (is_default_function (cand2->fn))
>>> +       {
>>> +          mark_used (cand1->fn);
>>> +         return -1;
>>> +       }
>>> +      return 0;
>>> +    }
>>> +
>>>   /* Candidates that involve bad conversions are always worse than those
>>>      that don't.  */
>>>   if (cand1->viable > cand2->viable)
>>> Index: timevar.def
>>> ===================================================================
>>> --- timevar.def (revision 184971)
>>> +++ timevar.def (working copy)
>>> @@ -253,6 +253,7 @@ DEFTIMEVAR (TV_TREE_IFCOMBINE        , "tree if-co
>>>  DEFTIMEVAR (TV_TREE_UNINIT           , "uninit var analysis")
>>>  DEFTIMEVAR (TV_PLUGIN_INIT           , "plugin initialization")
>>>  DEFTIMEVAR (TV_PLUGIN_RUN            , "plugin execution")
>>> +DEFTIMEVAR (TV_MULTIVERSION_DISPATCH , "multiversion dispatch")
>>>
>>>  /* Everything else in rest_of_compilation not included above.  */
>>>  DEFTIMEVAR (TV_EARLY_LOCAL          , "early local passes")
>>> Index: varasm.c
>>> ===================================================================
>>> --- varasm.c    (revision 184971)
>>> +++ varasm.c    (working copy)
>>> @@ -5755,6 +5755,8 @@ finish_aliases_1 (void)
>>>        }
>>>       else if (! (p->emitted_diags & ALIAS_DIAG_TO_EXTERN)
>>>               && DECL_EXTERNAL (target_decl)
>>> +              && (TREE_CODE (target_decl) != FUNCTION_DECL
>>> +                  || !DECL_STRUCT_FUNCTION (target_decl))
>>>               /* We use local aliases for C++ thunks to force the tailcall
>>>                  to bind locally.  This is a hack - to keep it working do
>>>                  the following (which is not strictly correct).  */
>>> Index: Makefile.in
>>> ===================================================================
>>> --- Makefile.in (revision 184971)
>>> +++ Makefile.in (working copy)
>>> @@ -1298,6 +1298,7 @@ OBJS = \
>>>        mcf.o \
>>>        mode-switching.o \
>>>        modulo-sched.o \
>>> +       multiversion.o \
>>>        omega.o \
>>>        omp-low.o \
>>>        optabs.o \
>>> @@ -3030,6 +3031,11 @@ ree.o : ree.c $(CONFIG_H) $(SYSTEM_H) coretypes.h
>>>    $(DF_H) $(TIMEVAR_H) tree-pass.h $(RECOG_H) $(EXPR_H) \
>>>    $(REGS_H) $(TREE_H) $(TM_P_H) insn-config.h $(INSN_ATTR_H) $(DIAGNOSTIC_CORE_H) \
>>>    $(TARGET_H) $(OPTABS_H) insn-codes.h rtlhooks-def.h $(PARAMS_H) $(CGRAPH_H)
>>> +multiversion.o : multiversion.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
>>> +   $(TREE_H) langhooks.h $(TREE_INLINE_H) $(FLAGS_H) $(CGRAPH_H) intl.h \
>>> +   $(DIAGNOSTIC_H) $(FIBHEAP_H) $(PARAMS_H) $(TIMEVAR_H) tree-pass.h \
>>> +   $(HASHTAB_H) $(COVERAGE_H) $(GGC_H) $(TREE_FLOW_H) $(RTL_H) $(IPA_PROP_H) \
>>> +   $(BASIC_BLOCK_H) $(TOPLEV_H) $(TREE_DUMP_H) ipa-inline.h
>>>  cprop.o : cprop.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
>>>    $(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h $(GGC_H) \
>>>    $(RECOG_H) $(EXPR_H) $(BASIC_BLOCK_H) $(FUNCTION_H) output.h toplev.h $(DIAGNOSTIC_CORE_H) \
>>> Index: passes.c
>>> ===================================================================
>>> --- passes.c    (revision 184971)
>>> +++ passes.c    (working copy)
>>> @@ -1190,6 +1190,7 @@ init_optimization_passes (void)
>>>   NEXT_PASS (pass_build_cfg);
>>>   NEXT_PASS (pass_warn_function_return);
>>>   NEXT_PASS (pass_build_cgraph_edges);
>>> +  NEXT_PASS (pass_dispatch_versions);
>>>   *p = NULL;
>>>
>>>   /* Interprocedural optimization passes.  */
>>> Index: config/i386/i386.c
>>> ===================================================================
>>> --- config/i386/i386.c  (revision 184971)
>>> +++ config/i386/i386.c  (working copy)
>>> @@ -27446,6 +27473,593 @@ ix86_init_mmx_sse_builtins (void)
>>>     }
>>>  }
>>>
>>> +/* This adds a condition to the basic_block NEW_BB in function FUNCTION_DECL
>>> +   to return a pointer to VERSION_DECL if the outcome of the function
>>> +   PREDICATE_DECL is true.  This function will be called during version
>>> +   dispatch to decide which function version to execute.  It returns the
>>> +   basic block at the end to which more conditions can be added.  */
>>> +
>>> +static basic_block
>>> +add_condition_to_bb (tree function_decl, tree version_decl,
>>> +                    basic_block new_bb, tree predicate_decl)
>>> +{
>>> +  gimple return_stmt;
>>> +  tree convert_expr, result_var;
>>> +  gimple convert_stmt;
>>> +  gimple call_cond_stmt;
>>> +  gimple if_else_stmt;
>>> +
>>> +  basic_block bb1, bb2, bb3;
>>> +  edge e12, e23;
>>> +
>>> +  tree cond_var;
>>> +  gimple_seq gseq;
>>> +
>>> +  tree old_current_function_decl;
>>> +
>>> +  old_current_function_decl = current_function_decl;
>>> +  push_cfun (DECL_STRUCT_FUNCTION (function_decl));
>>> +  current_function_decl = function_decl;
>>> +
>>> +  gcc_assert (new_bb != NULL);
>>> +  gseq = bb_seq (new_bb);
>>> +
>>> +
>>> +  convert_expr = build1 (CONVERT_EXPR, ptr_type_node,
>>> +                        build_fold_addr_expr (version_decl));
>>> +  result_var = create_tmp_var (ptr_type_node, NULL);
>>> +  convert_stmt = gimple_build_assign (result_var, convert_expr);
>>> +  return_stmt = gimple_build_return (result_var);
>>> +
>>> +  if (predicate_decl == NULL_TREE)
>>> +    {
>>> +      gimple_seq_add_stmt (&gseq, convert_stmt);
>>> +      gimple_seq_add_stmt (&gseq, return_stmt);
>>> +      set_bb_seq (new_bb, gseq);
>>> +      gimple_set_bb (convert_stmt, new_bb);
>>> +      gimple_set_bb (return_stmt, new_bb);
>>> +      pop_cfun ();
>>> +      current_function_decl = old_current_function_decl;
>>> +      return new_bb;
>>> +    }
>>> +
>>> +  cond_var = create_tmp_var (integer_type_node, NULL);
>>> +  call_cond_stmt = gimple_build_call (predicate_decl, 0);
>>> +  gimple_call_set_lhs (call_cond_stmt, cond_var);
>>> +
>>> +  gimple_set_block (call_cond_stmt, DECL_INITIAL (function_decl));
>>> +  gimple_set_bb (call_cond_stmt, new_bb);
>>> +  gimple_seq_add_stmt (&gseq, call_cond_stmt);
>>> +
>>> +  if_else_stmt = gimple_build_cond (GT_EXPR, cond_var,
>>> +                                   integer_zero_node,
>>> +                                   NULL_TREE, NULL_TREE);
>>> +  gimple_set_block (if_else_stmt, DECL_INITIAL (function_decl));
>>> +  gimple_set_bb (if_else_stmt, new_bb);
>>> +  gimple_seq_add_stmt (&gseq, if_else_stmt);
>>> +
>>> +  gimple_seq_add_stmt (&gseq, convert_stmt);
>>> +  gimple_seq_add_stmt (&gseq, return_stmt);
>>> +  set_bb_seq (new_bb, gseq);
>>> +
>>> +  bb1 = new_bb;
>>> +  e12 = split_block (bb1, if_else_stmt);
>>> +  bb2 = e12->dest;
>>> +  e12->flags &= ~EDGE_FALLTHRU;
>>> +  e12->flags |= EDGE_TRUE_VALUE;
>>> +
>>> +  e23 = split_block (bb2, return_stmt);
>>> +
>>> +  gimple_set_bb (convert_stmt, bb2);
>>> +  gimple_set_bb (return_stmt, bb2);
>>> +
>>> +  bb3 = e23->dest;
>>> +  make_edge (bb1, bb3, EDGE_FALSE_VALUE);
>>> +
>>> +  remove_edge (e23);
>>> +  make_edge (bb2, EXIT_BLOCK_PTR, 0);
>>> +
>>> +  rebuild_cgraph_edges ();
>>> +
>>> +  pop_cfun ();
>>> +  current_function_decl = old_current_function_decl;
>>> +
>>> +  return bb3;
>>> +}
>>> +
>>> +/* This parses the attribute arguments to targetv in DECL and determines
>>> +   the right builtin to use to match the platform specification.
>>> +   For now, only one target argument ("arch=") is allowed.  */
>>> +
>>> +static enum ix86_builtins
>>> +get_builtin_code_for_version (tree decl)
>>> +{
>>> +  tree attrs;
>>> +  struct cl_target_option cur_target;
>>> +  tree target_node;
>>> +  struct cl_target_option *new_target;
>>> +  enum ix86_builtins builtin_code = IX86_BUILTIN_MAX;
>>> +
>>> +  attrs = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl));
>>> +  gcc_assert (attrs != NULL);
>>> +
>>> +  cl_target_option_save (&cur_target, &global_options);
>>> +
>>> +  target_node = ix86_valid_target_attribute_tree
>>> +                 (TREE_VALUE (TREE_VALUE (attrs)));
>>> +
>>> +  gcc_assert (target_node);
>>> +  new_target = TREE_TARGET_OPTION (target_node);
>>> +  gcc_assert (new_target);
>>> +
>>> +  if (new_target->arch_specified && new_target->arch > 0)
>>> +    {
>>> +      switch (new_target->arch)
>>> +        {
>>> +       case 1:
>>> +       case 2:
>>> +       case 3:
>>> +       case 4:
>>> +       case 5:
>>> +       case 6:
>>> +       case 7:
>>> +       case 8:
>>> +       case 9:
>>> +       case 10:
>>> +       case 11:
>>> +         builtin_code = IX86_BUILTIN_CPU_IS_INTEL;
>>> +         break;
>>> +       case 12:
>>> +         builtin_code = IX86_BUILTIN_CPU_IS_INTEL_CORE2;
>>> +         break;
>>> +       case 13:
>>> +         builtin_code = IX86_BUILTIN_CPU_IS_INTEL_COREI7;
>>> +         break;
>>> +       case 14:
>>> +         builtin_code = IX86_BUILTIN_CPU_IS_INTEL_ATOM;
>>> +         break;
>>> +       case 15:
>>> +       case 16:
>>> +       case 17:
>>> +       case 18:
>>> +       case 19:
>>> +       case 20:
>>> +       case 21:
>>> +         builtin_code = IX86_BUILTIN_CPU_IS_AMD;
>>> +         break;
>>> +       case 22:
>>> +         builtin_code = IX86_BUILTIN_CPU_IS_AMDFAM10H;
>>> +         break;
>>> +       case 23:
>>> +         builtin_code = IX86_BUILTIN_CPU_IS_AMDFAM15H_BDVER1;
>>> +         break;
>>> +       case 24:
>>> +         builtin_code = IX86_BUILTIN_CPU_IS_AMDFAM15H_BDVER2;
>>> +         break;
>>> +       case 25: /* What is btver1 ? */
>>> +         builtin_code = IX86_BUILTIN_CPU_IS_AMD;
>>> +         break;
>>> +       }
>>> +    }
>>> +
>>> +  cl_target_option_restore (&global_options, &cur_target);
>>> +  if (builtin_code == IX86_BUILTIN_MAX)
>>> +      error_at (DECL_SOURCE_LOCATION (decl),
>>> +               "No dispatcher found for the versioning attributes");
>>> +
>>> +  return builtin_code;
>>> +}
>>> +
>>> +/* This is the target hook to generate the dispatch function for
>>> +   multi-versioned functions.  DISPATCH_DECL is the function which will
>>> +   contain the dispatch logic.  FNDECLS are the function choices for
>>> +   dispatch, and is a tree chain.  EMPTY_BB is the basic block pointer
>>> +   in DISPATCH_DECL in which the dispatch code is generated.  */
>>> +
>>> +static int
>>> +ix86_dispatch_version (tree dispatch_decl,
>>> +                      void *fndecls_p,
>>> +                      basic_block *empty_bb)
>>> +{
>>> +  tree default_decl;
>>> +  gimple ifunc_cpu_init_stmt;
>>> +  gimple_seq gseq;
>>> +  tree old_current_function_decl;
>>> +  int ix;
>>> +  tree ele;
>>> +  VEC (tree, heap) *fndecls;
>>> +
>>> +  gcc_assert (dispatch_decl != NULL
>>> +             && fndecls_p != NULL
>>> +             && empty_bb != NULL);
>>> +
>>> +  /*fndecls_p is actually a vector.  */
>>> +  fndecls = (VEC (tree, heap) *)fndecls_p;
>>> +
>>> +  /* At least one more version other than the default.  */
>>> +  gcc_assert (VEC_length (tree, fndecls) >= 2);
>>> +
>>> +  /* The first version in the vector is the default decl.  */
>>> +  default_decl = VEC_index (tree, fndecls, 0);
>>> +
>>> +  old_current_function_decl = current_function_decl;
>>> +  push_cfun (DECL_STRUCT_FUNCTION (dispatch_decl));
>>> +  current_function_decl = dispatch_decl;
>>> +
>>> +  gseq = bb_seq (*empty_bb);
>>> +  ifunc_cpu_init_stmt = gimple_build_call_vec (
>>> +                     ix86_builtins [(int) IX86_BUILTIN_CPU_INIT], NULL);
>>> +  gimple_seq_add_stmt (&gseq, ifunc_cpu_init_stmt);
>>> +  gimple_set_bb (ifunc_cpu_init_stmt, *empty_bb);
>>> +  set_bb_seq (*empty_bb, gseq);
>>> +
>>> +  pop_cfun ();
>>> +  current_function_decl = old_current_function_decl;
>>> +
>>> +
>>> +  for (ix = 1; VEC_iterate (tree, fndecls, ix, ele); ++ix)
>>> +    {
>>> +      tree version_decl = ele;
>>> +      /* Get attribute string, parse it and find the right predicate decl.
>>> +         The predicate function could be a lengthy combination of many
>>> +        features, like arch-type and various isa-variants.  For now, only
>>> +        check the arch-type.  */
>>> +      tree predicate_decl = ix86_builtins [
>>> +                       get_builtin_code_for_version (version_decl)];
>>> +      *empty_bb = add_condition_to_bb (dispatch_decl, version_decl, *empty_bb,
>>> +                                      predicate_decl);
>>> +
>>> +    }
>>> +  /* dispatch default version at the end.  */
>>> +  *empty_bb = add_condition_to_bb (dispatch_decl, default_decl, *empty_bb,
>>> +                                  NULL);
>>> +  return 0;
>>> +}
>>>
>>> @@ -38610,6 +39269,12 @@ ix86_autovectorize_vector_sizes (void)
>>>  #undef TARGET_BUILD_BUILTIN_VA_LIST
>>>  #define TARGET_BUILD_BUILTIN_VA_LIST ix86_build_builtin_va_list
>>>
>>> +#undef TARGET_DISPATCH_VERSION
>>> +#define TARGET_DISPATCH_VERSION ix86_dispatch_version
>>> +
>>>  #undef TARGET_ENUM_VA_LIST_P
>>>  #define TARGET_ENUM_VA_LIST_P ix86_enum_va_list
>>>
>>> Index: testsuite/g++.dg/mv1.C
>>> ===================================================================
>>> --- testsuite/g++.dg/mv1.C      (revision 0)
>>> +++ testsuite/g++.dg/mv1.C      (revision 0)
>>> @@ -0,0 +1,23 @@
>>> +/* Simple test case to check if Multiversioning works.  */
>>> +/* { dg-do run } */
>>> +/* { dg-options "-O2" } */
>>> +
>>> +int foo ();
>>> +int foo () __attribute__ ((targetv("arch=corei7")));
>>> +
>>> +int main ()
>>> +{
>>> +  int (*p)() = &foo;
>>> +  return foo () + (*p)();
>>> +}
>>> +
>>> +int foo ()
>>> +{
>>> +  return 0;
>>> +}
>>> +
>>> +int __attribute__ ((targetv("arch=corei7")))
>>> +foo ()
>>> +{
>>> +  return 0;
>>> +}
>>>
>>>
>>> --
>>> This patch is available for review at http://codereview.appspot.com/5752064

[-- Attachment #2: mv_fe_patch.txt --]
[-- Type: text/plain, Size: 63424 bytes --]

Overview of the patch which adds front-end support to specify function versions.

Example:

int foo ();  /* Default version */
int foo () __attribute__ ((target("avx,popcnt")));/*Specialized for avx and popcnt */
int foo () __attribute__ ((target("arch=core2,ssse3")));/*Specialized for core2 and ssse3*/

int main ()
{
 int (*p)() = &foo;
 return foo () + (*p)();
}

int foo ()
{
 return 0;
}

int __attribute__ ((target("avx,popcnt")))
foo ()
{
 return 0;
}

int __attribute__ ((target("arch=core2,ssse3")))
foo ()
{
 return 0;
}

The above example has foo defined 3 times, but all 3 definitions of foo are
different versions of the same function. The call to foo in main, directly and
via a pointer, are calls to the multi-versioned function foo which is dispatched
to the right foo at run-time.

What does the patch do?

* Tracking decls that correspond to function versions of function
name, say "foo":

When the front-end sees more than one decl for "foo", with at least one decl
tagged with "target" attributes, it marks them as function versions. To
prevent duplicate definition errors with other versions of "foo", I change
the "decls_match" function in cp/decl.c to return false when 2 decls have the
same signature but different target attributes. This causes all function
versions of "foo" to be added to the overload list of "foo".

* Change the assembler names of the function versions.

The front-end changes the assembler names of the function versions by appending
the sorted list of args to "target" to the assembler name of "foo". For example,
the assembler name of "void foo () __attribute__ ((target ("sse4")))" will
become _Z3foov.sse4.
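
The naming rule can be modelled outside the compiler. The sketch below is a
standalone illustration, not the code in the patch: the helper name
versioned_asm_name is hypothetical, and it assumes the behaviour of
sorted_attr_string in multiversion.c ('=' replaced by '_', arguments sorted
and joined with '_').

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator over an array of C strings.  */
static int
cmp_str (const void *a, const void *b)
{
  return strcmp (*(char *const *) a, *(char *const *) b);
}

/* Hypothetical helper, not part of the patch: returns a malloc'd string
   "<mangled>.<args>" where <args> are the comma-separated arguments of
   TARGET with '=' replaced by '_', sorted, and joined with '_'.
   The caller frees the result.  */
static char *
versioned_asm_name (const char *mangled, const char *target)
{
  size_t len, n = 0, i;
  char *copy, *out, *p, *tok;
  char **args;

  assert (mangled != NULL && target != NULL);
  len = strlen (target);
  copy = malloc (len + 1);
  args = malloc ((len + 1) * sizeof *args);
  out = malloc (strlen (mangled) + len + 2);
  if (copy == NULL || args == NULL || out == NULL)
    abort ();

  strcpy (copy, target);
  for (p = copy; *p != '\0'; p++)
    if (*p == '=')
      *p = '_';

  /* strtok skips empty tokens, so N counts only real arguments.  */
  for (tok = strtok (copy, ","); tok != NULL; tok = strtok (NULL, ","))
    args[n++] = tok;
  qsort (args, n, sizeof *args, cmp_str);

  /* Append ".first_second_..." to the mangled name.  */
  strcpy (out, mangled);
  for (i = 0; i < n; i++)
    {
      strcat (out, i == 0 ? "." : "_");
      strcat (out, args[i]);
    }

  free (args);
  free (copy);
  return out;
}
```

Because the arguments are sorted first, "ssse3,arch=core2" and
"arch=core2,ssse3" yield the same suffix, so the same version always gets the
same assembler name regardless of how the attribute string was written.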

* Separately group all function versions of "foo" together, in multiversion.c:

File multiversion.c maintains a hashtab, decl_version_htab, that maps
the default function decl of "foo" to the list of all other versions
of this function "foo". This is used when creating the dispatcher for
this function.

* Overload resolution:

Function "build_over_call" in cp/call.c sees a call to function
"foo", which is multi-versioned. The overload resolution happens in
function "joust" in cp/call.c. Here, the call to "foo" has all
possible versions of "foo" as candidates. If the caller has target
attributes and they match the target attributes of one of the function
versions, then a direct call is made to that function version.

For example:

int __attribute__ ((target ("avx,popcnt")))
baz ()
{
  return foo ();
}

If baz calls foo, which is multi-versioned, then the call to foo here
will become a direct call to the version of foo targeted to avx,popcnt.

When a direct call to a version cannot be made, the default
version of "foo" is the winning candidate. But "build_over_call" realizes
that this is a versioned function and replaces the call-site of foo with an
"ifunc" call for foo, by querying a function in multiversion.c which
builds the ifunc decl. After this, every such call-site of "foo" contains the
call to the ifunc.
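
The decision described above can be summarised with a small model. This is
only a sketch under simplifying assumptions, not the logic in cp/call.c: the
function pick_direct_callee is hypothetical, target strings are assumed to be
already normalised (sorted, as in the assembler-name suffix), and a NULL
target string stands for the default version.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: VERSION_TARGETS holds the normalised target string
   of each version of a function, with NULL for the default version.
   Returns the index of the version that CALLER_TARGET can call directly,
   or -1 when the call has to go through the ifunc dispatcher.  */
static int
pick_direct_callee (const char *caller_target,
                    const char *const *version_targets, size_t n)
{
  size_t i;

  assert (version_targets != NULL || n == 0);

  /* A caller without target attributes never binds to a specific version.  */
  if (caller_target == NULL)
    return -1;

  for (i = 0; i < n; i++)
    if (version_targets[i] != NULL
        && strcmp (caller_target, version_targets[i]) == 0)
      return (int) i;

  /* No matching version: the default wins and the dispatcher is called.  */
  return -1;
}
```

In this model, baz from the example above (target "avx,popcnt") would get the
index of the avx,popcnt version of foo, while main, which has no target
attribute, would get -1 and therefore call the ifunc.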

* Creating the dispatcher:

The dispatcher is independently created in a new pass, called
"pass_dispatch_versions", that runs immediately after cfg and cgraph are
created. The pass looks at all possible versions and queries the
target to give it the CPU detection predicates it must use to dispatch
each version. Then, the dispatcher body is created and the ifunc is
mapped to use this dispatcher.
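
In source terms, the generated dispatcher body behaves roughly like a chain of
"if the predicate holds, return this version" tests that ends by returning the
default. The sketch below is a portable model of that chain, not the GIMPLE
the pass emits: all names are hypothetical, and the stub predicates
(always_true, always_false) stand in for the target's CPU detection builtins.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*version_fn) (void);
typedef int (*predicate_fn) (void);

/* One dispatch choice: IMPL is selected when MATCHES returns > 0.  */
struct version_entry
{
  predicate_fn matches;
  version_fn impl;
};

/* Model of the dispatcher body: test each version's predicate in order
   and fall back to DEFAULT_IMPL when none of them holds.  */
static version_fn
resolve_version (const struct version_entry *versions, size_t n,
                 version_fn default_impl)
{
  size_t i;

  assert (default_impl != NULL);
  for (i = 0; i < n; i++)
    if (versions[i].matches () > 0)
      return versions[i].impl;
  return default_impl;
}

/* Stub predicates standing in for CPU detection (assumptions).  */
static int always_true (void) { return 1; }
static int always_false (void) { return 0; }

/* Stub versions of foo; the return values only identify the version.  */
static int foo_default (void) { return 0; }
static int foo_corei7 (void) { return 1; }
static int foo_core2 (void) { return 2; }
```

With IFUNC, a resolver of this shape runs once when the symbol is bound, so
later calls to foo do not pay for the predicate checks again.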

Notice that only the dispatcher creation is done after the front-end.
Everything else occurs in the front-end itself. I could have created
the dispatcher also in the front-end. I did not do so because I
thought keeping it as a separate pass made it easier to add more
dispatch mechanisms. For example, when IFUNC is not available, it could be
replaced with control-flow that makes direct calls to the function versions.
Also, making the dispatcher after cfg is created was easy.


	* doc/tm.texi.in: Add description for TARGET_DISPATCH_VERSION.
	* doc/tm.texi: Regenerate.
	* target.def (dispatch_version): New target hook.
	* tree.h (DECL_FUNCTION_VERSIONED): New macro.
	(tree_function_decl): New bit-field versioned_function.
	* tree-pass.h (pass_dispatch_versions): New pass.
	* multiversion.c: New file.
	* multiversion.h: New file.
	* cgraphunit.c (cgraph_finalize_function): Force output of versioned
	inline functions.
	* cp/class.c: Include multiversion.h.
	(add_method): Aggregate function versions.  Change assembler names of
	versioned functions.
	(resolve_address_of_overloaded_function): Match address of function
	version with default function.  Return address of ifunc dispatcher
	for address of versioned functions.
	(cxx_comdat_group): Use decl names for comdat groups of versioned
	functions.
	* cp/decl.c (decls_match): Make decls unmatched for versioned
	functions.
	(duplicate_decls): Remove ambiguity for versioned functions. Notify
	of deleted function version decls.
	(start_decl): Change assembler name of versioned functions.
	(start_function): Change assembler name of versioned functions.
	(cxx_comdat_group): Make comdat group of versioned functions be the
	same.
	* cp/semantics.c (expand_or_defer_fn_1): Mark as needed versioned
	functions that are also marked inline.
	* cp/decl2.c: Include multiversion.h.
	(check_classfn): Check attributes of versioned functions for match.
	* cp/call.c: Include multiversion.h.
	(build_over_call): Make calls to multi-versioned functions call the
	dispatcher.
	(joust): For calls to multi-versioned functions, make the default
	function win.
	* timevar.def (TV_MULTIVERSION_DISPATCH): New time var.
	* varasm.c (finish_aliases_1): Check if the alias points to a function
	with a body before giving an error.
	* Makefile.in: Add multiversion.o
	* passes.c: Add pass_dispatch_versions to the pass list.
	* config/i386/i386.c (add_condition_to_bb): New function.
	(get_builtin_code_for_version): New function.
	(ix86_dispatch_version): New function.
	(TARGET_DISPATCH_VERSION): New macro.
	* testsuite/g++.dg/mv1.C: New test.

Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi	(revision 186883)
+++ gcc/doc/tm.texi	(working copy)
@@ -10997,6 +10997,14 @@ The result is another tree containing a simplified
 call's result.  If @var{ignore} is true the value will be ignored.
 @end deftypefn
 
+@deftypefn {Target Hook} int TARGET_DISPATCH_VERSION (tree @var{dispatch_decl}, void *@var{fndecls}, basic_block *@var{empty_bb})
+For a multi-versioned function, this hook sets up the dispatcher.
+@var{dispatch_decl} is the function that will be used to dispatch the
+version. @var{fndecls} are the function choices for dispatch.
+@var{empty_bb} is a basic block in @var{dispatch_decl} where the
+code to do the dispatch will be added.
+@end deftypefn
+
 @deftypefn {Target Hook} {const char *} TARGET_INVALID_WITHIN_DOLOOP (const_rtx @var{insn})
 
 Take an instruction in @var{insn} and return NULL if it is valid within a
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in	(revision 186883)
+++ gcc/doc/tm.texi.in	(working copy)
@@ -10877,6 +10877,14 @@ The result is another tree containing a simplified
 call's result.  If @var{ignore} is true the value will be ignored.
 @end deftypefn
 
+@hook TARGET_DISPATCH_VERSION
+For a multi-versioned function, this hook sets up the dispatcher.
+@var{dispatch_decl} is the function that will be used to dispatch the
+version. @var{fndecls} are the function choices for dispatch.
+@var{empty_bb} is a basic block in @var{dispatch_decl} where the
+code to do the dispatch will be added.
+@end deftypefn
+
 @hook TARGET_INVALID_WITHIN_DOLOOP
 
 Take an instruction in @var{insn} and return NULL if it is valid within a
Index: gcc/target.def
===================================================================
--- gcc/target.def	(revision 186883)
+++ gcc/target.def	(working copy)
@@ -1249,6 +1249,15 @@ DEFHOOK
  tree, (tree fndecl, int n_args, tree *argp, bool ignore),
  hook_tree_tree_int_treep_bool_null)
 
+/* Target hook to generate the dispatching code for calls to multi-versioned
+   functions.  DISPATCH_DECL is the function that will have the dispatching
+   logic.  FNDECLS are the list of choices for dispatch and EMPTY_BB is the
+   basic block in DISPATCH_DECL which will contain the code.  */
+DEFHOOK
+(dispatch_version,
+ "",
+ int, (tree dispatch_decl, void *fndecls, basic_block *empty_bb), NULL)
+
 /* Returns a code for a target-specific builtin that implements
    reciprocal of the function, or NULL_TREE if not available.  */
 DEFHOOK
Index: gcc/tree.h
===================================================================
--- gcc/tree.h	(revision 186883)
+++ gcc/tree.h	(working copy)
@@ -3539,6 +3539,12 @@ extern VEC(tree, gc) **decl_debug_args_insert (tre
 #define DECL_FUNCTION_SPECIFIC_OPTIMIZATION(NODE) \
    (FUNCTION_DECL_CHECK (NODE)->function_decl.function_specific_optimization)
 
+/* In FUNCTION_DECL, this is set if this function has other versions generated
+   using "target" attributes.  The default version is the one which does not
+   have any "target" attribute set. */
+#define DECL_FUNCTION_VERSIONED(NODE)\
+   (FUNCTION_DECL_CHECK (NODE)->function_decl.versioned_function)
+
 /* FUNCTION_DECL inherits from DECL_NON_COMMON because of the use of the
    arguments/result/saved_tree fields by front ends.   It was either inherit
    FUNCTION_DECL from non_common, or inherit non_common from FUNCTION_DECL,
@@ -3583,8 +3589,8 @@ struct GTY(()) tree_function_decl {
   unsigned looping_const_or_pure_flag : 1;
   unsigned has_debug_args_flag : 1;
   unsigned tm_clone_flag : 1;
-
-  /* 1 bit left */
+  unsigned versioned_function : 1;
+  /* No bits left.  */
 };
 
 /* The source language of the translation-unit.  */
Index: gcc/tree-pass.h
===================================================================
--- gcc/tree-pass.h	(revision 186883)
+++ gcc/tree-pass.h	(working copy)
@@ -453,6 +453,7 @@ extern struct gimple_opt_pass pass_tm_memopt;
 extern struct gimple_opt_pass pass_tm_edges;
 extern struct gimple_opt_pass pass_split_functions;
 extern struct gimple_opt_pass pass_feedback_split_functions;
+extern struct gimple_opt_pass pass_dispatch_versions;
 
 /* IPA Passes */
 extern struct simple_ipa_opt_pass pass_ipa_lower_emutls;
Index: gcc/multiversion.c
===================================================================
--- gcc/multiversion.c	(revision 0)
+++ gcc/multiversion.c	(revision 0)
@@ -0,0 +1,832 @@
+/* Function Multiversioning.
+   Copyright (C) 2012 Free Software Foundation, Inc.
+   Contributed by Sriraman Tallam (tmsriram@google.com)
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>. */
+
+/* Holds the state for multi-versioned functions here. The front-end
+   updates the state as and when function versions are encountered.
+   This is then used to generate the dispatch code.  Also, the
+   optimization passes to clone hot paths involving versioned functions
+   will be done here.
+
+   Function versions are created by using the same function signature but
+   also tagging attribute "target" to specify the platform type for which
+   the version must be executed.  Here is an example:
+
+   int foo ()
+   {
+     printf ("Execute as default");
+     return 0;
+   }
+
+   int  __attribute__ ((target ("arch=corei7")))
+   foo ()
+   {
+     printf ("Execute for corei7");
+     return 0;
+   }
+   
+   int main ()
+   {
+     return foo ();
+   } 
+
+   The call to foo in main is replaced with a call to an IFUNC function that
+   contains the dispatch code to call the correct function version at
+   run-time.  */
+
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "tree.h"
+#include "tree-inline.h"
+#include "langhooks.h"
+#include "flags.h"
+#include "cgraph.h"
+#include "diagnostic.h"
+#include "toplev.h"
+#include "timevar.h"
+#include "params.h"
+#include "fibheap.h"
+#include "intl.h"
+#include "tree-pass.h"
+#include "hashtab.h"
+#include "coverage.h"
+#include "ggc.h"
+#include "tree-flow.h"
+#include "rtl.h"
+#include "ipa-prop.h"
+#include "basic-block.h"
+#include "toplev.h"
+#include "dbgcnt.h"
+#include "tree-dump.h"
+#include "output.h"
+#include "vecprim.h"
+#include "gimple-pretty-print.h"
+#include "ipa-inline.h"
+#include "target.h"
+#include "multiversion.h"
+
+typedef void * void_p;
+
+DEF_VEC_P (void_p);
+DEF_VEC_ALLOC_P (void_p, heap);
+
+/* Each function decl that is a function version gets an instance of this
+   structure.   Since this is called by the front-end, decl merging can
+   happen, where a decl created for a new declaration is merged with 
+   the old. In this case, the new decl is deleted and the IS_DELETED
+   field is set for the struct instance corresponding to the new decl.
+   IFUNC_DECL is the decl of the ifunc function for default decls.
+   IFUNC_RESOLVER_DECL is the decl of the dispatch function.  VERSIONS
+   is a vector containing the list of function versions  that are
+   the candidates for dispatch.  */
+
+typedef struct version_function_d {
+  tree decl;
+  tree ifunc_decl;
+  tree ifunc_resolver_decl;
+  VEC (void_p, heap) *versions;
+  bool is_deleted;
+} version_function;
+
+/* Hashmap has an entry for every function decl that has other function
+   versions.  For function decls that are the default, it also stores the
+   list of all the other function versions.  Each entry is a structure
+   of type version_function_d.  */
+static htab_t decl_version_htab = NULL;
+
+/* Hashtable helpers for decl_version_htab. */
+
+static hashval_t
+decl_version_htab_hash_descriptor (const void *p)
+{
+  const version_function *t = (const version_function *) p;
+  return htab_hash_pointer (t->decl);
+}
+
+/* Hashtable helper for decl_version_htab. */
+
+static int
+decl_version_htab_eq_descriptor (const void *p1, const void *p2)
+{
+  const version_function *t1 = (const version_function *) p1;
+  return htab_eq_pointer ((const void_p) t1->decl, p2);
+}
+
+/* Create the decl_version_htab.  */
+static void
+create_decl_version_htab (void)
+{
+  if (decl_version_htab == NULL)
+    decl_version_htab = htab_create (10, decl_version_htab_hash_descriptor,
+				     decl_version_htab_eq_descriptor, NULL);
+}
+
+/* Creates an instance of version_function for decl DECL.  */
+
+static version_function*
+new_version_function (const tree decl)
+{
+  version_function *v;
+  v = (version_function *)xmalloc(sizeof (version_function));
+  v->decl = decl;
+  v->ifunc_decl = NULL;
+  v->ifunc_resolver_decl = NULL;
+  v->versions = NULL;
+  v->is_deleted = false;
+  return v;
+}
+
+/* Comparator function to be used in qsort routine to sort attribute
+   specification strings to "target".  */
+
+static int
+attr_strcmp (const void *v1, const void *v2)
+{
+  const char *c1 = *(char *const*)v1;
+  const char *c2 = *(char *const*)v2;
+  return strcmp (c1, c2);
+}
+
+/* STR is the argument to target attribute.  This function tokenizes
+   the comma separated arguments, sorts them and returns a string which
+   is a unique identifier for the comma separated arguments.  */
+
+static char *
+sorted_attr_string (const char *str)
+{
+  char **args = NULL;
+  char *attr_str, *ret_str;
+  char *attr = NULL;
+  unsigned int argnum = 1;
+  unsigned int i;
+
+  for (i = 0; i < strlen (str); i++)
+    if (str[i] == ',')
+      argnum++;
+
+  attr_str = (char *)xmalloc (strlen (str) + 1);
+  strcpy (attr_str, str);
+
+  for (i = 0; i < strlen (attr_str); i++)
+    if (attr_str[i] == '=')
+      attr_str[i] = '_';
+
+  if (argnum == 1)
+    return attr_str;
+
+  args = (char **)xmalloc (argnum * sizeof (char *));
+
+  i = 0;
+  attr = strtok (attr_str, ",");
+  while (attr != NULL)
+    {
+      args[i] = attr;
+      i++;
+      attr = strtok (NULL, ",");
+    }
+
+  qsort (args, argnum, sizeof (char*), attr_strcmp);
+
+  ret_str = (char *)xmalloc (strlen (str) + 1);
+  strcpy (ret_str, args[0]);
+  for (i = 1; i < argnum; i++)
+    {
+      strcat (ret_str, "_");
+      strcat (ret_str, args[i]);
+    }
+
+  free (args);
+  free (attr_str);
+  return ret_str;
+}
+
+/* Returns true when only one of DECL1 and DECL2 is marked with "target"
+   or if the "target" attribute strings of DECL1 and DECL2 do not match.  */
+
+bool
+has_different_version_attributes (const tree decl1, const tree decl2)
+{
+  tree attr1, attr2;
+  char *c1, *c2;
+  bool ret = false;
+
+  if (TREE_CODE (decl1) != FUNCTION_DECL
+      || TREE_CODE (decl2) != FUNCTION_DECL)
+    return false;
+
+  attr1 = lookup_attribute ("target", DECL_ATTRIBUTES (decl1));
+  attr2 = lookup_attribute ("target", DECL_ATTRIBUTES (decl2));
+
+  if (attr1 == NULL_TREE && attr2 == NULL_TREE)
+    return false;
+
+  if ((attr1 == NULL_TREE && attr2 != NULL_TREE)
+      || (attr1 != NULL_TREE && attr2 == NULL_TREE))
+    return true;
+
+  c1 = sorted_attr_string (
+	TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr1))));
+  c2 = sorted_attr_string (
+	TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr2))));
+
+  if (strcmp (c1, c2) != 0)
+     ret = true;
+
+  free (c1);
+  free (c2);
+
+  return ret;
+}
+
+/* If this decl corresponds to a function and has "target" attribute,
+   append the attribute string to its assembler name.  */
+
+static void
+version_assembler_name (const tree decl)
+{
+  tree version_attr;
+  const char *orig_name, *version_string, *attr_str;
+  char *assembler_name;
+  tree assembler_name_tree;
+  
+  if (TREE_CODE (decl) != FUNCTION_DECL)
+    return;
+
+  if (DECL_DECLARED_INLINE_P (decl)
+      && lookup_attribute ("gnu_inline",
+			   DECL_ATTRIBUTES (decl)))
+    error_at (DECL_SOURCE_LOCATION (decl),
+	      "function versions cannot be marked as gnu_inline,"
+	      " bodies have to be generated");
+
+  if (DECL_VIRTUAL_P (decl)
+      || DECL_VINDEX (decl))
+    error_at (DECL_SOURCE_LOCATION (decl),
+	      "virtual function versioning not supported");
+
+  version_attr = lookup_attribute ("target", DECL_ATTRIBUTES (decl));
+  /* target attribute string is NULL for default functions.  */
+  if (version_attr == NULL_TREE)
+    return;
+
+  orig_name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
+  version_string
+    = TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (version_attr)));
+
+  attr_str = sorted_attr_string (version_string);
+  assembler_name = (char *) xmalloc (strlen (orig_name)
+				     + strlen (attr_str) + 2);
+
+  sprintf (assembler_name, "%s.%s", orig_name, attr_str);
+  if (dump_file)
+    fprintf (dump_file, "Assembler name set to %s for function version %s\n",
+	     assembler_name, IDENTIFIER_POINTER (DECL_NAME (decl)));
+
+  assembler_name_tree = get_identifier (assembler_name);
+
+  SET_DECL_ASSEMBLER_NAME (decl, assembler_name_tree);
+  SET_DECL_RTL (decl, NULL);
+}
+
+void
+mark_function_as_version (const tree decl)
+{
+  if (DECL_FUNCTION_VERSIONED (decl))
+    return;
+  DECL_FUNCTION_VERSIONED (decl) = 1;
+  version_assembler_name (decl);
+}
+
+/* Returns true if function DECL has target attribute set.  This could be
+   a version.  */
+
+bool
+is_target_attribute_set (const tree decl)
+{
+  return (TREE_CODE (decl) == FUNCTION_DECL
+	  && (lookup_attribute ("target", DECL_ATTRIBUTES (decl))
+	      != NULL_TREE));
+}
+
+/* Returns true if decl is multi-versioned and DECL is the default function,
+   that is it is not tagged with "target" attribute.  */
+
+bool
+is_default_function (const tree decl)
+{
+  return (TREE_CODE (decl) == FUNCTION_DECL
+	  && DECL_FUNCTION_VERSIONED (decl)
+	  && (lookup_attribute ("target", DECL_ATTRIBUTES (decl))
+	      == NULL_TREE));	
+}
+
+/* For function decl DECL, find the version_function struct in the
+   decl_version_htab.  */
+
+static version_function *
+find_function_version (const tree decl)
+{
+  void *slot;
+
+  if (!DECL_FUNCTION_VERSIONED (decl))
+    return NULL;
+
+  if (!decl_version_htab)
+    return NULL;
+
+  slot = htab_find_with_hash (decl_version_htab, decl,
+                              htab_hash_pointer (decl));
+
+  if (slot != NULL)
+    return (version_function *)slot;
+
+  return NULL;
+}
+
+/* Record DECL as a function version by creating a version_function struct
+   for it and storing it in the hashtable.  */
+
+static version_function *
+add_function_version (const tree decl)
+{
+  void **slot;
+  version_function *v;
+
+  if (!DECL_FUNCTION_VERSIONED (decl))
+    return NULL;
+
+  create_decl_version_htab ();
+
+  slot = htab_find_slot_with_hash (decl_version_htab, (const void_p)decl,
+                                   htab_hash_pointer ((const void_p)decl),
+				   INSERT);
+
+  if (*slot != NULL)
+    return (version_function *)*slot;
+
+  v = new_version_function (decl);
+  *slot = v;
+
+  return v;
+}
+
+/* Push V into VEC only if it is not already present.  If already present
+   returns false.  */
+
+static bool
+push_function_version (version_function *v, VEC (void_p, heap) **vec)
+{
+  int ix;
+  void_p ele; 
+  for (ix = 0; VEC_iterate (void_p, *vec, ix, ele); ++ix)
+    {
+      if (ele == (void_p)v)
+        return false;
+    }
+
+  VEC_safe_push (void_p, heap, *vec, (void*)v);
+  return true;
+}
+ 
+/* Mark DECL as deleted.  This is called by the front-end when a duplicate
+   decl is merged with the original decl and the duplicate decl is deleted.
+   This function marks the duplicate_decl as invalid.  Called by
+   duplicate_decls in cp/decl.c.  */
+
+void
+mark_delete_decl_version (const tree decl)
+{
+  version_function *decl_v;
+
+  decl_v = find_function_version (decl);
+
+  if (decl_v == NULL)
+    return;
+
+  decl_v->is_deleted = true;
+
+  if (is_default_function (decl)
+      && decl_v->versions != NULL)
+    {
+      VEC_truncate (void_p, decl_v->versions, 0);
+      VEC_free (void_p, heap, decl_v->versions);
+      decl_v->versions = NULL;
+    }
+}
+
+/* Mark DECL1 and DECL2 to be function versions in the same group.  One
+   of DECL1 and DECL2 must be the default, otherwise this function does
+   nothing.  This function aggregates the versions.  */
+
+int
+group_function_versions (const tree decl1, const tree decl2)
+{
+  tree default_decl, version_decl;
+  version_function *default_v, *version_v;
+
+  gcc_assert (DECL_FUNCTION_VERSIONED (decl1)
+	      && DECL_FUNCTION_VERSIONED (decl2));
+
+  /* The version decls are added only to the default decl.  */
+  if (!is_default_function (decl1)
+      && !is_default_function (decl2))
+    return 0;
+
+  /* This can happen with duplicate declarations.  Just ignore.  */
+  if (is_default_function (decl1)
+      && is_default_function (decl2))
+    return 0;
+
+  default_decl = (is_default_function (decl1)) ? decl1 : decl2;
+  version_decl = (default_decl == decl1) ? decl2 : decl1;
+
+  gcc_assert (default_decl != version_decl);
+  create_decl_version_htab ();
+
+  /* If the version function is found, it has been added.  */
+  if (find_function_version (version_decl))
+    return 0;
+
+  default_v = add_function_version (default_decl);
+  version_v = add_function_version (version_decl);
+
+  if (default_v->versions == NULL)
+    default_v->versions = VEC_alloc (void_p, heap, 1);
+
+  push_function_version (version_v, &default_v->versions);
+  return 0;
+}
+
+/* Makes a function attribute of the form NAME(ARG_NAME) and chains
+   it to CHAIN.  */
+
+static tree
+make_attribute (const char *name, const char *arg_name, tree chain)
+{
+  tree attr_name;
+  tree attr_arg_name;
+  tree attr_args;
+  tree attr;
+
+  attr_name = get_identifier (name);
+  attr_arg_name = build_string (strlen (arg_name), arg_name);
+  attr_args = tree_cons (NULL_TREE, attr_arg_name, NULL_TREE);
+  attr = tree_cons (attr_name, attr_args, chain);
+  return attr;
+}
+
+/* Return a new name by appending SUFFIX to the DECL name.  If
+   make_unique is true, append the full path name.  */
+
+static char *
+make_name (tree decl, const char *suffix, bool make_unique)
+{
+  char *global_var_name;
+  int name_len;
+  const char *name;
+  const char *unique_name = NULL;
+
+  name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
+
+  /* Get a unique name that can be used globally without any chances
+     of collision at link time.  */
+  if (make_unique)
+    unique_name = IDENTIFIER_POINTER (get_file_function_name ("\0"));
+
+  name_len = strlen (name) + strlen (suffix) + 2;
+
+  if (make_unique)
+    name_len += strlen (unique_name) + 1;
+  global_var_name = (char *) xmalloc (name_len);
+
+  /* Use '.' to concatenate names as it is demangler friendly.  */
+  if (make_unique)
+      snprintf (global_var_name, name_len, "%s.%s.%s", name,
+		unique_name, suffix);
+  else
+      snprintf (global_var_name, name_len, "%s.%s", name, suffix);
+
+  return global_var_name;
+}
+
+/* Make the resolver function decl for ifunc (IFUNC_DECL) to dispatch
+   the versions of multi-versioned function DEFAULT_DECL.  Create an
+   empty basic block in the resolver and store the pointer in
+   EMPTY_BB.  Return the decl of the resolver function.  */
+
+static tree
+make_ifunc_resolver_func (const tree default_decl,
+			  const tree ifunc_decl,
+			  basic_block *empty_bb)
+{
+  char *resolver_name;
+  tree decl, type, decl_name, t;
+  basic_block new_bb;
+  tree old_current_function_decl;
+  bool make_unique = false;
+
+  /* IFUNC's have to be globally visible.  So, if the default_decl is
+     not, then the name of the IFUNC should be made unique.  */
+  if (TREE_PUBLIC (default_decl) == 0)
+    make_unique = true;
+
+  /* Append the filename to the resolver function if the versions are
+     not externally visible.  This is because the resolver function has
+     to be externally visible for the loader to find it.  So, appending
+     the filename will prevent conflicts with a resolver function from
+     another module which is based on the same version name.  */
+  resolver_name = make_name (default_decl, "resolver", make_unique);
+
+  /* The resolver function should return a (void *). */
+  type = build_function_type_list (ptr_type_node, NULL_TREE);
+
+  decl = build_fn_decl (resolver_name, type);
+  decl_name = get_identifier (resolver_name);
+  SET_DECL_ASSEMBLER_NAME (decl, decl_name);
+
+  DECL_NAME (decl) = decl_name;
+  TREE_USED (decl) = TREE_USED (default_decl);
+  DECL_ARTIFICIAL (decl) = 1;
+  DECL_IGNORED_P (decl) = 0;
+  /* IFUNC resolvers have to be externally visible.  */
+  TREE_PUBLIC (decl) = 1;
+  DECL_UNINLINABLE (decl) = 1;
+
+  DECL_EXTERNAL (decl) = DECL_EXTERNAL (default_decl);
+  DECL_EXTERNAL (ifunc_decl) = 0;
+
+  DECL_CONTEXT (decl) = NULL_TREE;
+  DECL_INITIAL (decl) = make_node (BLOCK);
+  DECL_STATIC_CONSTRUCTOR (decl) = 0;
+  TREE_READONLY (decl) = 0;
+  DECL_PURE_P (decl) = 0;
+  DECL_COMDAT (decl) = DECL_COMDAT (default_decl);
+  if (DECL_COMDAT_GROUP (default_decl))
+    {
+      make_decl_one_only (decl, DECL_COMDAT_GROUP (default_decl));
+    }
+  /* Build result decl and add to function_decl. */
+  t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, ptr_type_node);
+  DECL_ARTIFICIAL (t) = 1;
+  DECL_IGNORED_P (t) = 1;
+  DECL_RESULT (decl) = t;
+
+  gimplify_function_tree (decl);
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (decl));
+  current_function_decl = decl;
+  init_empty_tree_cfg_for_function (DECL_STRUCT_FUNCTION (decl));
+  cfun->curr_properties |=
+    (PROP_gimple_lcf | PROP_gimple_leh | PROP_cfg | PROP_referenced_vars |
+     PROP_ssa);
+  new_bb = create_empty_bb (ENTRY_BLOCK_PTR);
+  make_edge (ENTRY_BLOCK_PTR, new_bb, EDGE_FALLTHRU);
+  make_edge (new_bb, EXIT_BLOCK_PTR, 0);
+  *empty_bb = new_bb;
+
+  cgraph_add_new_function (decl, true);
+  cgraph_call_function_insertion_hooks (cgraph_get_create_node (decl));
+  cgraph_mark_force_output_node (cgraph_get_create_node (decl));
+
+  if (DECL_COMDAT_GROUP (default_decl))
+    {
+      gcc_assert (cgraph_get_node (default_decl));
+      cgraph_add_to_same_comdat_group (cgraph_get_node (decl),
+				       cgraph_get_node (default_decl));
+    }
+
+  pop_cfun ();
+  current_function_decl = old_current_function_decl;
+
+  gcc_assert (ifunc_decl != NULL);
+  /* Mark ifunc_decl as "ifunc" with resolver as resolver_name.  */
+  DECL_ATTRIBUTES (ifunc_decl) 
+    = make_attribute ("ifunc", resolver_name, DECL_ATTRIBUTES (ifunc_decl));
+
+  /* Create the alias here.  */
+  cgraph_create_function_alias (ifunc_decl, decl);
+  return decl;
+}
+
+/* Make an ifunc declaration for the multi-versioned function DECL.  Calls to
+   DECL function will be replaced with calls to the ifunc.  Return the decl
+   of the ifunc created.  */
+
+static tree
+make_ifunc_func (const tree decl)
+{
+  tree ifunc_decl;
+  char *ifunc_name, *resolver_name;
+  tree fn_type, ifunc_type;
+  bool make_unique = false;
+
+  if (TREE_PUBLIC (decl) == 0)
+    make_unique = true;
+
+  ifunc_name = make_name (decl, "ifunc", make_unique);
+  resolver_name = make_name (decl, "resolver", make_unique);
+  gcc_assert (resolver_name);
+
+  fn_type = TREE_TYPE (decl);
+  ifunc_type = build_function_type (TREE_TYPE (fn_type),
+				    TYPE_ARG_TYPES (fn_type));
+  
+  ifunc_decl = build_fn_decl (ifunc_name, ifunc_type);
+  TREE_USED (ifunc_decl) = 1;
+  DECL_CONTEXT (ifunc_decl) = NULL_TREE;
+  DECL_INITIAL (ifunc_decl) = error_mark_node;
+  DECL_ARTIFICIAL (ifunc_decl) = 1;
+  /* Mark this ifunc as external; the resolver will flip it again if
+     it gets generated.  */
+  DECL_EXTERNAL (ifunc_decl) = 1;
+  /* IFUNCs have to be externally visible.  */
+  TREE_PUBLIC (ifunc_decl) = 1;
+
+  return ifunc_decl;  
+}
+
+/* For multi-versioned function decl, which should also be the default,
+   return the decl of the ifunc, creating it if it does not
+   exist.  */
+
+tree
+get_ifunc_for_version (const tree decl)
+{
+  version_function *decl_v;
+  int ix;
+  void_p ele;
+
+  /* DECL has to be the default version, otherwise it is missing and
+     that is not allowed.  */
+  if (!is_default_function (decl))
+    {
+      error_at (DECL_SOURCE_LOCATION (decl), "default version not found");
+      return decl;
+    }
+
+  decl_v = find_function_version (decl);
+  gcc_assert (decl_v != NULL);
+  if (decl_v->ifunc_decl == NULL)
+    {
+      tree ifunc_decl;
+      ifunc_decl = make_ifunc_func (decl);
+      decl_v->ifunc_decl = ifunc_decl;
+    }
+
+  if (cgraph_get_node (decl))
+    cgraph_mark_force_output_node (cgraph_get_node (decl));
+
+  for (ix = 0; VEC_iterate (void_p, decl_v->versions, ix, ele); ++ix)
+    {
+      version_function *v = (version_function *) ele;
+      /* This could be a deleted version.  Happens with
+	 duplicate declarations. */
+      if (v->is_deleted)
+	continue;
+      gcc_assert (v->decl != NULL);
+      if (cgraph_get_node (v->decl))
+	cgraph_mark_force_output_node (cgraph_get_node (v->decl));
+    }
+
+  return decl_v->ifunc_decl;
+}
+
+/* Generate the dispatching code to dispatch multi-versioned function
+   DECL.  Make a new function decl for dispatching and call the target
+   hook to process the "target" attributes and provide the code to
+   dispatch the right function at run-time.  */
+
+static tree
+make_ifunc_resolver_for_version (const tree decl)
+{
+  version_function *decl_v;
+  tree ifunc_resolver_decl, ifunc_decl;
+  basic_block empty_bb;
+  int ix;
+  void_p ele;
+  VEC (tree, heap) *fn_ver_vec = NULL;
+  tree old_current_function_decl;
+
+  gcc_assert (is_default_function (decl));
+
+  decl_v = find_function_version (decl);
+  gcc_assert (decl_v != NULL);
+
+  if (decl_v->ifunc_resolver_decl != NULL)
+    return decl_v->ifunc_resolver_decl;
+
+  ifunc_decl = decl_v->ifunc_decl;
+
+  if (ifunc_decl == NULL)
+    ifunc_decl = decl_v->ifunc_decl = make_ifunc_func (decl);
+
+  ifunc_resolver_decl = make_ifunc_resolver_func (decl, ifunc_decl,
+						  &empty_bb);
+
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (ifunc_resolver_decl));
+  current_function_decl = ifunc_resolver_decl;
+
+  fn_ver_vec = VEC_alloc (tree, heap, 2);
+  VEC_safe_push (tree, heap, fn_ver_vec, decl);
+
+  for (ix = 0; VEC_iterate (void_p, decl_v->versions, ix, ele); ++ix)
+    {
+      version_function *v = (version_function *) ele;
+      gcc_assert (v->decl != NULL);
+      /* Check for virtual functions here again, as by this time it should
+	 have been determined if this function needs a vtable index or
+	 not.  This happens for methods in derived classes that override
+	 virtual methods in base classes but are not explicitly marked as
+	 virtual.  */
+      if (DECL_VINDEX (v->decl))
+        error_at (DECL_SOURCE_LOCATION (v->decl),
+		  "virtual function versioning not supported");
+      if (!v->is_deleted)
+	VEC_safe_push (tree, heap, fn_ver_vec, v->decl);
+    }
+
+  gcc_assert (targetm.dispatch_version);
+  targetm.dispatch_version (ifunc_resolver_decl, fn_ver_vec, &empty_bb);
+  decl_v->ifunc_resolver_decl = ifunc_resolver_decl;
+
+  pop_cfun ();
+  current_function_decl = old_current_function_decl;
+  return ifunc_resolver_decl;
+}
+
+/* Main entry point to pass_dispatch_versions. For multi-versioned functions,
+   generate the dispatching code.  */
+
+static unsigned int
+do_dispatch_versions (void)
+{
+  /* A new pass for generating dispatch code for multi-versioned functions.
+     Other forms of dispatch can be added when ifunc support is not available
+     like just calling the function directly after checking for target type.
+     Currently, dispatching is done through IFUNC.  This pass will become
+     more meaningful when other dispatch mechanisms are added.  */
+
+  /* Cloning a function to produce more versions will happen here when the
+     user requests that via the target attribute. For example,
+     int foo () __attribute__ ((target(("arch=core2"), ("arch=corei7"))));
+     means that the user wants the same body of foo to be versioned for core2
+     and corei7.  In that case, this function will be cloned during this
+     pass.  */
+  
+  if (DECL_FUNCTION_VERSIONED (current_function_decl)
+      && is_default_function (current_function_decl))
+    {
+      tree decl = make_ifunc_resolver_for_version (current_function_decl);
+      if (dump_file && decl)
+	dump_function_to_file (decl, dump_file, TDF_BLOCKS);
+    }
+  return 0;
+}
+
+static bool
+gate_dispatch_versions (void)
+{
+  return true;
+}
+
+/* A pass to generate the dispatch code to execute the appropriate version
+   of a multi-versioned function at run-time.  */
+
+struct gimple_opt_pass pass_dispatch_versions =
+{
+ {
+  GIMPLE_PASS,
+  "dispatch_multiversion_functions",    /* name */
+  gate_dispatch_versions,		/* gate */
+  do_dispatch_versions,			/* execute */
+  NULL,					/* sub */
+  NULL,					/* next */
+  0,					/* static_pass_number */
+  TV_MULTIVERSION_DISPATCH,		/* tv_id */
+  PROP_cfg,				/* properties_required */
+  PROP_cfg,				/* properties_provided */
+  0,					/* properties_destroyed */
+  0,					/* todo_flags_start */
+  0					/* todo_flags_finish */
+ }
+};
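
A note for reviewers: the name normalization done by sorted_attr_string
can be tried outside the compiler.  The following is a minimal standalone
sketch, not part of the patch.  It assumes plain malloc in place of xmalloc,
and the names normalize_target_string and cmp_str are hypothetical.  Unlike
the patch, it sorts only the tokens strtok actually returned, which matters
for inputs with empty fields such as "a,,b".

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator over an array of C strings.  */
static int
cmp_str (const void *a, const void *b)
{
  return strcmp (*(char *const *) a, *(char *const *) b);
}

/* Hypothetical stand-in for sorted_attr_string: replace '=' by '_',
   split STR at commas, sort the pieces and join them with '_'.
   Returns a malloc'ed string, or NULL on allocation failure.  */
char *
normalize_target_string (const char *str)
{
  size_t len, argnum = 1, n = 0, i;
  char *copy, *ret, *tok;
  char **args;

  assert (str != NULL);
  len = strlen (str);

  /* Upper bound on the number of tokens.  */
  for (i = 0; i < len; i++)
    if (str[i] == ',')
      argnum++;

  copy = malloc (len + 1);
  ret = malloc (len + 1);
  args = malloc (argnum * sizeof (char *));
  if (!copy || !ret || !args)
    {
      free (copy);
      free (ret);
      free (args);
      return NULL;
    }

  strcpy (copy, str);
  for (i = 0; i < len; i++)
    if (copy[i] == '=')
      copy[i] = '_';

  /* Collect the tokens; N counts those actually found.  */
  for (tok = strtok (copy, ","); tok != NULL; tok = strtok (NULL, ","))
    args[n++] = tok;

  qsort (args, n, sizeof (char *), cmp_str);

  /* The joined result is never longer than the input.  */
  ret[0] = '\0';
  for (i = 0; i < n; i++)
    {
      if (i > 0)
	strcat (ret, "_");
      strcat (ret, args[i]);
    }

  free (args);
  free (copy);
  return ret;
}
```

With this sketch, normalize_target_string ("sse4.2,arch=corei7") and
normalize_target_string ("arch=corei7,sse4.2") both yield
"arch_corei7_sse4.2", so the suffix appended to the assembler name does not
depend on the order in which the user wrote the attribute arguments.
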
Index: gcc/multiversion.h
===================================================================
--- gcc/multiversion.h	(revision 0)
+++ gcc/multiversion.h	(revision 0)
@@ -0,0 +1,55 @@
+/* Function Multiversioning.
+   Copyright (C) 2012 Free Software Foundation, Inc.
+   Contributed by Sriraman Tallam (tmsriram@google.com)
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>. */
+
+/* This is the header file which provides the functions to keep track
+   of functions that are multi-versioned and to generate the dispatch
+   code to call the right version at run-time.  */
+
+#ifndef GCC_MULTIVERSION_H
+#define GCC_MULTIVERSION_H
+
+#include "tree.h"
+
+/* Mark DECL1 and DECL2 as function versions.  */
+int group_function_versions (const tree decl1, const tree decl2);
+
+/* Mark DECL as deleted and no longer a version.  */
+void mark_delete_decl_version (const tree decl);
+
+/* Returns true if DECL is the default version to be executed if all
+   other versions are inappropriate at run-time.  */
+bool is_default_function (const tree decl);
+
+/* Gets the IFUNC dispatcher for this multi-versioned function DECL. DECL
+   must be the default function in the multi-versioned group.  */
+tree get_ifunc_for_version (const tree decl);
+
+/* Returns true when only one of DECL1 and DECL2 is marked with "target"
+   or if the "target" attribute strings of DECL1 and DECL2 do not match.  */
+bool has_different_version_attributes (const tree decl1, const tree decl2);
+
+/* Function DECL is marked to be a multi-versioned function.  If DECL is
+   not the default version, the assembler name of DECL is changed to include
+   the attribute string to keep the name unambiguous.  */
+void mark_function_as_version (const tree decl);
+
+/* Check if decl is FUNCTION_DECL with target attribute set.  */
+bool is_target_attribute_set (const tree decl);
+#endif
Index: gcc/cgraphunit.c
===================================================================
--- gcc/cgraphunit.c	(revision 186883)
+++ gcc/cgraphunit.c	(working copy)
@@ -411,6 +411,13 @@ cgraph_finalize_function (tree decl, bool nested)
       && !DECL_DISREGARD_INLINE_LIMITS (decl))
     node->symbol.force_output = 1;
 
+  /* With function versions, keep inline functions and do not worry about
+     inline limits.  */
+  if (DECL_FUNCTION_VERSIONED (decl)
+      && DECL_DECLARED_INLINE_P (decl)
+      && !DECL_EXTERNAL (decl))
+    node->symbol.force_output = 1;
+
   /* When not optimizing, also output the static functions. (see
      PR24561), but don't do so for always_inline functions, functions
      declared inline and nested functions.  These were optimized out
Index: gcc/testsuite/g++.dg/mv1.C
===================================================================
--- gcc/testsuite/g++.dg/mv1.C	(revision 0)
+++ gcc/testsuite/g++.dg/mv1.C	(revision 0)
@@ -0,0 +1,24 @@
+/* Simple test case to check if Multiversioning works.  */
+/* { dg-do run } */
+/* { dg-options "-O2 -fdump-tree-dispatch_multiversion_functions" } */
+
+int foo ();
+int foo () __attribute__ ((target("arch=corei7")));
+
+int main ()
+{
+  int (*p)() = &foo;
+  return foo () + (*p)();
+}
+
+int foo ()
+{
+  return 0;
+}
+
+int __attribute__ ((target("arch=corei7")))
+foo ()
+{
+  return 0;
+}
+
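
What the dispatch amounts to for a test like mv1.C can be pictured with
ordinary function pointers.  This is only an illustrative sketch, not what
the patch emits: the real dispatch goes through an IFUNC symbol, and the CPU
check is produced by the targetm.dispatch_version hook.  The names
foo_default, foo_corei7 and foo_resolver, and passing the CPU name as a
string, are assumptions made for the example.  The two versions return
different values only to make the choice observable.

```c
#include <assert.h>
#include <string.h>

typedef int (*foo_fn) (void);

/* Default version, used when no specialized version applies.  */
int
foo_default (void)
{
  return 0;
}

/* Version specialized for corei7.  */
int
foo_corei7 (void)
{
  return 1;
}

/* Hypothetical resolver: CPU_NAME stands in for the run-time CPU check.
   Returns the address of the version that should be called.  */
foo_fn
foo_resolver (const char *cpu_name)
{
  assert (cpu_name != NULL);
  if (strcmp (cpu_name, "corei7") == 0)
    return foo_corei7;
  return foo_default;
}
```

A caller would invoke foo_resolver once and then call through the returned
pointer; any CPU name other than "corei7" falls back to foo_default, which
mirrors the requirement that a default version always exists.
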
Index: gcc/cp/class.c
===================================================================
--- gcc/cp/class.c	(revision 186883)
+++ gcc/cp/class.c	(working copy)
@@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-dump.h"
 #include "splay-tree.h"
 #include "pointer-set.h"
+#include "multiversion.h"
 
 /* The number of nested classes being processed.  If we are not in the
    scope of any class, this is zero.  */
@@ -1092,7 +1093,21 @@ add_method (tree type, tree method, tree using_dec
 	      || same_type_p (TREE_TYPE (fn_type),
 			      TREE_TYPE (method_type))))
 	{
-	  if (using_decl)
+	  /* For function versions, their parms and types match
+	     but they are not duplicates.  Record function versions
+	     as and when they are found.  */
+	  if (TREE_CODE (fn) == FUNCTION_DECL
+	      && TREE_CODE (method) == FUNCTION_DECL
+	      && (is_target_attribute_set (fn)
+		  || is_target_attribute_set (method))
+	      && has_different_version_attributes (fn, method))
+ 	    {
+	      mark_function_as_version (fn);
+	      mark_function_as_version (method);
+	      group_function_versions (fn, method);
+	      continue;
+	    }
+	  else if (using_decl)
 	    {
 	      if (DECL_CONTEXT (fn) == type)
 		/* Defer to the local function.  */
@@ -1150,6 +1165,7 @@ add_method (tree type, tree method, tree using_dec
   else
     /* Replace the current slot.  */
     VEC_replace (tree, method_vec, slot, overload);
+
   return true;
 }
 
@@ -6930,8 +6946,11 @@ resolve_address_of_overloaded_function (tree targe
 	  if (DECL_ANTICIPATED (fn))
 	    continue;
 
-	  /* See if there's a match.  */
-	  if (same_type_p (target_fn_type, static_fn_type (fn)))
+	  /* See if there's a match.  For multi-versioned functions,
+	     match only the default function.  */
+	  if (same_type_p (target_fn_type, static_fn_type (fn))
+	      && (!DECL_FUNCTION_VERSIONED (fn)
+		  || is_default_function (fn)))
 	    matches = tree_cons (fn, NULL_TREE, matches);
 	}
     }
@@ -7093,6 +7112,21 @@ resolve_address_of_overloaded_function (tree targe
       perform_or_defer_access_check (access_path, fn, fn);
     }
 
+  /* If a pointer to a function that is multi-versioned is requested, the
+     pointer to the dispatcher function is returned instead.  This works
+     well because indirectly calling the function will dispatch the right
+     function version at run-time. Also, the function address is kept
+     unique.  */
+  if (DECL_FUNCTION_VERSIONED (fn)
+      && is_default_function (fn))
+    {
+      tree ifunc_decl;
+      ifunc_decl = get_ifunc_for_version (fn);
+      gcc_assert (ifunc_decl != NULL);
+      mark_used (fn);
+      return build_fold_addr_expr (ifunc_decl);
+    }
+
   if (TYPE_PTRFN_P (target_type) || TYPE_PTRMEMFUNC_P (target_type))
     return cp_build_addr_expr (fn, flags);
   else
Index: gcc/cp/decl.c
===================================================================
--- gcc/cp/decl.c	(revision 186883)
+++ gcc/cp/decl.c	(working copy)
@@ -54,6 +54,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "pointer-set.h"
 #include "splay-tree.h"
 #include "plugin.h"
+#include "multiversion.h"
 
 /* Possible cases of bad specifiers type used by bad_specifiers. */
 enum bad_spec_place {
@@ -973,6 +974,21 @@ decls_match (tree newdecl, tree olddecl)
       if (t1 != t2)
 	return 0;
 
+      /* The decls do not match if they correspond to two different versions
+	 of the same function.  */
+      if (compparms (p1, p2)
+	  && same_type_p (TREE_TYPE (f1), TREE_TYPE (f2)) 
+	  && has_different_version_attributes (newdecl, olddecl))
+	{
+	  /* One of the decls could be the default without the "target"
+	     attribute. Set it to be a versioned function here.  */
+	  mark_function_as_version (newdecl);
+	  mark_function_as_version (olddecl);
+	  /* Accumulate all the versions of a function.  */
+	  group_function_versions (olddecl, newdecl);
+	  return 0;
+	}
+
       if (CP_DECL_CONTEXT (newdecl) != CP_DECL_CONTEXT (olddecl)
 	  && ! (DECL_EXTERN_C_P (newdecl)
 		&& DECL_EXTERN_C_P (olddecl)))
@@ -1490,7 +1506,11 @@ duplicate_decls (tree newdecl, tree olddecl, bool
 	      error ("previous declaration %q+#D here", olddecl);
 	      return NULL_TREE;
 	    }
-	  else if (compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)),
+	  /* For function versions, params and types match, but they
+	     are not ambiguous.  */
+	  else if ((!DECL_FUNCTION_VERSIONED (newdecl)
+		    && !DECL_FUNCTION_VERSIONED (olddecl))
+		   && compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)),
 			      TYPE_ARG_TYPES (TREE_TYPE (olddecl))))
 	    {
 	      error ("new declaration %q#D", newdecl);
@@ -2262,6 +2282,16 @@ duplicate_decls (tree newdecl, tree olddecl, bool
   else if (DECL_PRESERVE_P (newdecl))
     DECL_PRESERVE_P (olddecl) = 1;
 
+  /* If the olddecl is a version, so is the newdecl.  */
+  if (TREE_CODE (newdecl) == FUNCTION_DECL
+      && DECL_FUNCTION_VERSIONED (olddecl))
+    {
+      DECL_FUNCTION_VERSIONED (newdecl) = 1;
+      /* Record that newdecl is not a valid version and has
+	 been deleted.  */
+      mark_delete_decl_version (newdecl);
+    }
+
   if (TREE_CODE (newdecl) == FUNCTION_DECL)
     {
       int function_size;
@@ -14035,7 +14065,11 @@ cxx_comdat_group (tree decl)
 	  else
 	    break;
 	}
-      name = DECL_ASSEMBLER_NAME (decl);
+      if (TREE_CODE (decl) == FUNCTION_DECL
+	  && DECL_FUNCTION_VERSIONED (decl))
+	name = DECL_NAME (decl);
+      else
+        name = DECL_ASSEMBLER_NAME (decl);
     }
 
   return name;
Index: gcc/cp/semantics.c
===================================================================
--- gcc/cp/semantics.c	(revision 186883)
+++ gcc/cp/semantics.c	(working copy)
@@ -3783,8 +3783,11 @@ expand_or_defer_fn_1 (tree fn)
       /* If the user wants us to keep all inline functions, then mark
 	 this function as needed so that finish_file will make sure to
 	 output it later.  Similarly, all dllexport'd functions must
-	 be emitted; there may be callers in other DLLs.  */
-      if ((flag_keep_inline_functions
+	 be emitted; there may be callers in other DLLs.
+	 Also, mark this function as needed if it is marked inline but
+	 is a multi-versioned function.  */
+      if (((flag_keep_inline_functions
+	    || DECL_FUNCTION_VERSIONED (fn))
 	   && DECL_DECLARED_INLINE_P (fn)
 	   && !DECL_REALLY_EXTERN (fn))
 	  || (flag_keep_inline_dllexport
Index: gcc/cp/decl2.c
===================================================================
--- gcc/cp/decl2.c	(revision 186883)
+++ gcc/cp/decl2.c	(working copy)
@@ -53,6 +53,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "splay-tree.h"
 #include "langhooks.h"
 #include "c-family/c-ada-spec.h"
+#include "multiversion.h"
 
 extern cpp_reader *parse_in;
 
@@ -677,9 +678,13 @@ check_classfn (tree ctype, tree function, tree tem
 	  if (is_template != (TREE_CODE (fndecl) == TEMPLATE_DECL))
 	    continue;
 
+	  /* While finding a match, same types and params are not enough
+	     if the function is versioned.  Also check version ("target")
+	     attributes.  */
 	  if (same_type_p (TREE_TYPE (TREE_TYPE (function)),
 			   TREE_TYPE (TREE_TYPE (fndecl)))
 	      && compparms (p1, p2)
+	      && !has_different_version_attributes (function, fndecl)
 	      && (!is_template
 		  || comp_template_parms (template_parms,
 					  DECL_TEMPLATE_PARMS (fndecl)))
Index: gcc/cp/call.c
===================================================================
--- gcc/cp/call.c	(revision 186883)
+++ gcc/cp/call.c	(working copy)
@@ -41,6 +41,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "langhooks.h"
 #include "c-family/c-objc.h"
 #include "timevar.h"
+#include "multiversion.h"
 
 /* The various kinds of conversion.  */
 
@@ -3903,6 +3904,16 @@ build_new_function_call (tree fn, VEC(tree,gc) **a
     {
       if (complain & tf_error)
 	{
+	  /* If the call is to a multiversioned function without
+	     a default version, overload resolution will fail.  */
+	  if (candidates
+	      && TREE_CODE (candidates->fn) == FUNCTION_DECL
+	      && DECL_FUNCTION_VERSIONED (candidates->fn))
+	    error_at (location_of (DECL_NAME (OVL_CURRENT (fn))),
+		      "call to multiversioned function %<%D(%A)%> with"
+		      " no default version", DECL_NAME (OVL_CURRENT (fn)),
+		      build_tree_list_vec (*args));
+
 	  if (!any_viable_p && candidates && ! candidates->next
 	      && (TREE_CODE (candidates->fn) == FUNCTION_DECL))
 	    return cp_build_function_call_vec (candidates->fn, args, complain);
@@ -6809,6 +6820,18 @@ build_over_call (struct z_candidate *cand, int fla
   if (!already_used)
     mark_used (fn);
 
+  /* For a call to a multi-versioned function, the call should actually be to
+     the dispatcher.  */
+  if (DECL_FUNCTION_VERSIONED (fn)
+      && is_default_function (fn))
+    {
+      tree ifunc_decl;
+      ifunc_decl = get_ifunc_for_version (fn);
+      gcc_assert (ifunc_decl != NULL);
+      return build_call_expr_loc_array (UNKNOWN_LOCATION, ifunc_decl,
+					nargs, argarray);
+    }
+
   if (DECL_VINDEX (fn) && (flags & LOOKUP_NONVIRTUAL) == 0)
     {
       tree t;
@@ -8067,6 +8090,60 @@ joust (struct z_candidate *cand1, struct z_candida
   size_t i;
   size_t len;
 
+  /* For candidates of a multi-versioned function, first check whether
+     the target flags of the caller match any of the candidates.  If so,
+     the caller can call that candidate directly; otherwise the one marked
+     default wins.  This is because the default decl is used as key to
+     aggregate all the other versions provided for it in multiversion.c.
+     When generating the actual call, the appropriate dispatcher is created
+     to call the right function version at run-time.  */
+
+  if ((TREE_CODE (cand1->fn) == FUNCTION_DECL
+       && DECL_FUNCTION_VERSIONED (cand1->fn))
+      ||(TREE_CODE (cand2->fn) == FUNCTION_DECL
+	 && DECL_FUNCTION_VERSIONED (cand2->fn)))
+    {
+      /* Both functions must be marked versioned.  */
+      gcc_assert (DECL_FUNCTION_VERSIONED (cand1->fn)
+		  && DECL_FUNCTION_VERSIONED (cand2->fn));
+
+      /* Try to see if a direct call can be made to a version.  This is
+	 possible if the caller and callee have the same target flags.
+	 If cand->fn is marked with target attributes,  check if the
+	 target approves inlining this into the caller.  If so, this is
+	 the version we want.  */
+
+      if (is_target_attribute_set (cand1->fn)
+	  && targetm.target_option.can_inline_p (current_function_decl,
+						 cand1->fn))
+	return 1;
+
+      if (is_target_attribute_set (cand2->fn)
+	  && targetm.target_option.can_inline_p (current_function_decl,
+						 cand2->fn))
+	return -1;
+
+      /* A direct call to a version is not possible, so find the default
+	 function and return it.  This will later be converted to dispatch
+	 the right version at run time.  */
+
+      if (is_default_function (cand1->fn))
+	{
+          mark_used (cand2->fn);
+	  return 1;
+	}
+
+      if (is_default_function (cand2->fn))
+	{
+          mark_used (cand1->fn);
+	  return -1;
+	}
+
+      /* If a default function is absent, this will never get resolved leading
+	 to an ambiguous call error.  */
+      return 0;
+    }
+
   /* Candidates that involve bad conversions are always worse than those
      that don't.  */
   if (cand1->viable > cand2->viable)
Index: gcc/timevar.def
===================================================================
--- gcc/timevar.def	(revision 186883)
+++ gcc/timevar.def	(working copy)
@@ -253,6 +253,7 @@ DEFTIMEVAR (TV_TREE_IFCOMBINE        , "tree if-co
 DEFTIMEVAR (TV_TREE_UNINIT           , "uninit var analysis")
 DEFTIMEVAR (TV_PLUGIN_INIT           , "plugin initialization")
 DEFTIMEVAR (TV_PLUGIN_RUN            , "plugin execution")
+DEFTIMEVAR (TV_MULTIVERSION_DISPATCH , "multiversion dispatch")
 
 /* Everything else in rest_of_compilation not included above.  */
 DEFTIMEVAR (TV_EARLY_LOCAL	     , "early local passes")
Index: gcc/Makefile.in
===================================================================
--- gcc/Makefile.in	(revision 186883)
+++ gcc/Makefile.in	(working copy)
@@ -1294,6 +1294,7 @@ OBJS = \
 	mcf.o \
 	mode-switching.o \
 	modulo-sched.o \
+	multiversion.o \
 	omega.o \
 	omp-low.o \
 	optabs.o \
@@ -3030,6 +3031,11 @@ ree.o : ree.c $(CONFIG_H) $(SYSTEM_H) coretypes.h
    $(DF_H) $(TIMEVAR_H) tree-pass.h $(RECOG_H) $(EXPR_H) \
    $(REGS_H) $(TREE_H) $(TM_P_H) insn-config.h $(INSN_ATTR_H) $(DIAGNOSTIC_CORE_H) \
    $(TARGET_H) $(OPTABS_H) insn-codes.h rtlhooks-def.h $(PARAMS_H) $(CGRAPH_H)
+multiversion.o : multiversion.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
+   $(TREE_H) langhooks.h $(TREE_INLINE_H) $(FLAGS_H) $(CGRAPH_H) intl.h \
+   $(DIAGNOSTIC_H) $(FIBHEAP_H) $(PARAMS_H) $(TIMEVAR_H) tree-pass.h \
+   $(HASHTAB_H) $(COVERAGE_H) $(GGC_H) $(TREE_FLOW_H) $(RTL_H) $(IPA_PROP_H) \
+   $(BASIC_BLOCK_H) $(TOPLEV_H) $(TREE_DUMP_H) ipa-inline.h
 cprop.o : cprop.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
    $(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h $(GGC_H) \
    $(RECOG_H) $(EXPR_H) $(BASIC_BLOCK_H) $(FUNCTION_H) output.h toplev.h $(DIAGNOSTIC_CORE_H) \
Index: gcc/passes.c
===================================================================
--- gcc/passes.c	(revision 186883)
+++ gcc/passes.c	(working copy)
@@ -1287,6 +1287,7 @@ init_optimization_passes (void)
   NEXT_PASS (pass_build_cfg);
   NEXT_PASS (pass_warn_function_return);
   NEXT_PASS (pass_build_cgraph_edges);
+  NEXT_PASS (pass_dispatch_versions);
   *p = NULL;
 
   /* Interprocedural optimization passes.  */
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 186883)
+++ gcc/config/i386/i386.c	(working copy)
@@ -27678,6 +27678,324 @@ ix86_init_mmx_sse_builtins (void)
     }
 }
 
+
+/* This adds a condition to the basic_block NEW_BB in function FUNCTION_DECL
+   to return a pointer to VERSION_DECL if the outcome of the expression
+   formed by PREDICATE_CHAIN is true.  This function will be called during
+   version dispatch to decide which function version to execute.  It returns
+   the basic block at the end to which more conditions can be added.  */
+
+static basic_block
+add_condition_to_bb (tree function_decl, tree version_decl,
+		     tree predicate_chain, basic_block new_bb)
+{
+  gimple return_stmt;
+  tree convert_expr, result_var;
+  gimple convert_stmt;
+  gimple call_cond_stmt;
+  gimple if_else_stmt;
+
+  basic_block bb1, bb2, bb3;
+  edge e12, e23;
+
+  tree cond_var, and_expr_var = NULL_TREE;
+  gimple_seq gseq;
+
+  tree old_current_function_decl;
+  tree predicate_decl, predicate_arg;
+
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (function_decl));
+  current_function_decl = function_decl;
+
+  gcc_assert (new_bb != NULL);
+  gseq = bb_seq (new_bb);
+
+
+  convert_expr = build1 (CONVERT_EXPR, ptr_type_node,
+	     		 build_fold_addr_expr (version_decl));
+  result_var = create_tmp_var (ptr_type_node, NULL);
+  convert_stmt = gimple_build_assign (result_var, convert_expr); 
+  return_stmt = gimple_build_return (result_var);
+
+  if (predicate_chain == NULL_TREE)
+    {
+      gimple_seq_add_stmt (&gseq, convert_stmt);
+      gimple_seq_add_stmt (&gseq, return_stmt);
+      set_bb_seq (new_bb, gseq);
+      gimple_set_bb (convert_stmt, new_bb);
+      gimple_set_bb (return_stmt, new_bb);
+      pop_cfun ();
+      current_function_decl = old_current_function_decl;
+      return new_bb;
+    }
+
+  while (predicate_chain != NULL)
+    {
+      cond_var = create_tmp_var (integer_type_node, NULL);
+      predicate_decl = TREE_PURPOSE (predicate_chain);
+      predicate_arg = TREE_VALUE (predicate_chain);
+      call_cond_stmt = gimple_build_call (predicate_decl, 1, predicate_arg);
+      gimple_call_set_lhs (call_cond_stmt, cond_var);
+
+      gimple_set_block (call_cond_stmt, DECL_INITIAL (function_decl));
+      gimple_set_bb (call_cond_stmt, new_bb);
+      gimple_seq_add_stmt (&gseq, call_cond_stmt);
+
+      predicate_chain = TREE_CHAIN (predicate_chain);
+      
+      if (and_expr_var == NULL)
+        and_expr_var = cond_var;
+      else
+	{
+	  gimple assign_stmt;
+	  assign_stmt = gimple_build_assign_with_ops (BIT_AND_EXPR,
+						      and_expr_var,
+						      cond_var, and_expr_var);
+
+	  gimple_set_block (assign_stmt, DECL_INITIAL (function_decl));
+	  gimple_set_bb (assign_stmt, new_bb);
+	  gimple_seq_add_stmt (&gseq, assign_stmt);
+	}
+    }
+
+  if_else_stmt = gimple_build_cond (GT_EXPR, and_expr_var,
+	  		            integer_zero_node,
+				    NULL_TREE, NULL_TREE);
+  gimple_set_block (if_else_stmt, DECL_INITIAL (function_decl));
+  gimple_set_bb (if_else_stmt, new_bb);
+  gimple_seq_add_stmt (&gseq, if_else_stmt);
+
+  gimple_seq_add_stmt (&gseq, convert_stmt);
+  gimple_seq_add_stmt (&gseq, return_stmt);
+  set_bb_seq (new_bb, gseq);
+
+  bb1 = new_bb;
+  e12 = split_block (bb1, if_else_stmt);
+  bb2 = e12->dest;
+  e12->flags &= ~EDGE_FALLTHRU;
+  e12->flags |= EDGE_TRUE_VALUE;
+
+  e23 = split_block (bb2, return_stmt);
+
+  gimple_set_bb (convert_stmt, bb2);
+  gimple_set_bb (return_stmt, bb2);
+
+  bb3 = e23->dest;
+  make_edge (bb1, bb3, EDGE_FALSE_VALUE); 
+
+  remove_edge (e23);
+  make_edge (bb2, EXIT_BLOCK_PTR, 0);
+
+  rebuild_cgraph_edges ();
+
+  pop_cfun ();
+  current_function_decl = old_current_function_decl;
+
+  return bb3;
+}
+
+/* This parses the attribute arguments to target in DECL and determines
+   the right builtins to use to match the platform specification.  An
+   optional "arch=" argument and any number of ISA names are handled.  */
+
+static tree 
+get_builtin_code_for_version (tree decl)
+{
+  tree attrs;
+  struct cl_target_option cur_target;
+  tree target_node;
+  struct cl_target_option *new_target;
+  const char *arg_str = NULL;
+  const char *attrs_str = NULL;
+  char *tok_str = NULL;
+  char *token;
+  /* These are the target attribute strings for which a dispatcher is
+     available, from fold_builtin_cpu.  */
+  const char *feature_list[] = {"mmx", "popcnt", "sse", "sse2", "sse3",
+				"ssse3", "sse4.1", "sse4.2", "avx", "avx2"};
+  unsigned int NUM_FEATURES = sizeof (feature_list) / sizeof (const char *);
+  unsigned int i;
+  tree predicate_chain = NULL_TREE;
+  tree predicate_decl, predicate_arg;
+
+  attrs = lookup_attribute ("target", DECL_ATTRIBUTES (decl));
+  gcc_assert (attrs != NULL);
+
+  attrs = TREE_VALUE (TREE_VALUE (attrs));
+
+  gcc_assert (TREE_CODE (attrs) == STRING_CST);
+  attrs_str = TREE_STRING_POINTER (attrs);
+
+  /* Handle arch= if specified.  */
+  if (strstr (attrs_str, "arch=") != NULL)
+    {
+      cl_target_option_save (&cur_target, &global_options);
+      target_node = ix86_valid_target_attribute_tree (attrs);
+    
+      gcc_assert (target_node);
+      new_target = TREE_TARGET_OPTION (target_node);
+      gcc_assert (new_target);
+      
+      if (new_target->arch_specified && new_target->arch > 0)
+	{
+	  switch (new_target->arch)
+	    {
+	    case PROCESSOR_CORE2_32:
+	    case PROCESSOR_CORE2_64:
+	      arg_str = "core2";
+	      break;
+	    case PROCESSOR_COREI7_32:
+	    case PROCESSOR_COREI7_64:
+	      arg_str = "corei7";
+	      break;
+	    case PROCESSOR_ATOM:
+	      arg_str = "atom";
+	      break;
+	    case PROCESSOR_AMDFAM10:
+	      arg_str = "amdfam10h";
+	      break;
+	    case PROCESSOR_BDVER1:
+	      arg_str = "bdver1";
+	      break;
+	    case PROCESSOR_BDVER2:
+	      arg_str = "bdver2";
+	      break;
+	    }  
+	}    
+    
+      cl_target_option_restore (&global_options, &cur_target);
+      if (arg_str == NULL)
+	{
+	  error_at (DECL_SOURCE_LOCATION (decl),
+	    	"No dispatcher found for the versioning attributes");
+	  return NULL;
+	}
+    
+      predicate_decl = ix86_builtins [(int) IX86_BUILTIN_CPU_IS];
+      /* For a C string literal the length includes the trailing NUL.  */
+      predicate_arg = build_string_literal (strlen (arg_str) + 1, arg_str);
+      predicate_chain = tree_cons (predicate_decl, predicate_arg,
+				   predicate_chain);
+    }
+
+  /* Process feature name.  */
+  tok_str =  (char *) xmalloc (strlen (attrs_str) + 1);
+  strcpy (tok_str, attrs_str);
+  token = strtok (tok_str, ",");
+  predicate_decl = ix86_builtins [(int) IX86_BUILTIN_CPU_SUPPORTS];
+
+  while (token != NULL)
+    {
+      /* Do not process "arch="  */
+      if (strncmp (token, "arch=", 5) == 0)
+	{
+	  token = strtok (NULL, ",");
+	  continue;
+	}
+      for (i = 0; i < NUM_FEATURES; ++i)
+	{
+	  if (strcmp (token, feature_list[i]) == 0)
+	    {
+	      predicate_arg = build_string_literal (
+				strlen (feature_list[i]) + 1,
+				feature_list[i]);
+	      predicate_chain = tree_cons (predicate_decl, predicate_arg,
+					   predicate_chain);
+	      break;
+	    }
+	}
+      if (i == NUM_FEATURES)
+	{
+	  error_at (DECL_SOURCE_LOCATION (decl),
+		    "No dispatcher found for %s", token);
+	  return NULL;
+	}
+      token = strtok (NULL, ",");
+    }
+  free (tok_str);
+
+  if (predicate_chain == NULL_TREE)
+    {
+      error_at (DECL_SOURCE_LOCATION (decl),
+	        "No dispatcher found for the versioning attributes : %s",
+	        attrs_str);
+      return NULL;
+    }
+
+  predicate_chain = nreverse (predicate_chain);
+  return predicate_chain; 
+} 
+
+/* This is the target hook to generate the dispatch function for
+   multi-versioned functions.  DISPATCH_DECL is the function which will
+   contain the dispatch logic.  FNDECLS are the function choices for
+   dispatch, and is a tree chain.  EMPTY_BB is the basic block pointer
+   in DISPATCH_DECL in which the dispatch code is generated.  */
+
+static int
+ix86_dispatch_version (tree dispatch_decl,
+		       void *fndecls_p,
+		       basic_block *empty_bb)
+{
+  tree default_decl;
+  gimple ifunc_cpu_init_stmt;
+  gimple_seq gseq;
+  tree old_current_function_decl;
+  int ix;
+  tree ele;
+  VEC (tree, heap) *fndecls;
+
+  gcc_assert (dispatch_decl != NULL
+	      && fndecls_p != NULL
+	      && empty_bb != NULL);
+
+  /* fndecls_p is actually a vector.  */
+  fndecls = (VEC (tree, heap) *)fndecls_p;
+
+  /* At least one more version other than the default.  */
+  gcc_assert (VEC_length (tree, fndecls) >= 2);
+
+  /* The first version in the vector is the default decl.  */
+  default_decl = VEC_index (tree, fndecls, 0);
+
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (dispatch_decl));
+  current_function_decl = dispatch_decl;
+
+  gseq = bb_seq (*empty_bb);
+  ifunc_cpu_init_stmt = gimple_build_call_vec (
+                     ix86_builtins [(int) IX86_BUILTIN_CPU_INIT], NULL);
+  gimple_seq_add_stmt (&gseq, ifunc_cpu_init_stmt);
+  gimple_set_bb (ifunc_cpu_init_stmt, *empty_bb);
+  set_bb_seq (*empty_bb, gseq);
+
+  pop_cfun ();
+  current_function_decl = old_current_function_decl;
+
+
+  for (ix = 1; VEC_iterate (tree, fndecls, ix, ele); ++ix)
+    {
+      tree version_decl = ele;
+      tree predicate_chain = NULL_TREE;
+      /* Get attribute string, parse it and find the right predicate decl.
+         The predicate function could be a lengthy combination of many
+	 features, like arch-type and various isa-variants.  */
+      predicate_chain = get_builtin_code_for_version (version_decl);
+
+      if (predicate_chain == NULL_TREE)
+	continue;
+
+      *empty_bb = add_condition_to_bb (dispatch_decl, version_decl,
+				       predicate_chain, *empty_bb);
+
+    }
+  /* Dispatch the default version at the end.  */
+  *empty_bb = add_condition_to_bb (dispatch_decl, default_decl,
+				   NULL, *empty_bb);
+  return 0;
+}
+
 /* This builds the processor_model struct type defined in
    libgcc/config/i386/i386-cpuinfo.c  */
 
@@ -39463,6 +39781,9 @@ ix86_autovectorize_vector_sizes (void)
 #undef TARGET_FOLD_BUILTIN
 #define TARGET_FOLD_BUILTIN ix86_fold_builtin
 
+#undef TARGET_DISPATCH_VERSION
+#define TARGET_DISPATCH_VERSION ix86_dispatch_version
+
 #undef TARGET_ENUM_VA_LIST_P
 #define TARGET_ENUM_VA_LIST_P ix86_enum_va_list
 


* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-04-27  5:09     ` Sriraman Tallam
@ 2012-04-27 13:39       ` H.J. Lu
  2012-04-27 14:35         ` Sriraman Tallam
  0 siblings, 1 reply; 93+ messages in thread
From: H.J. Lu @ 2012-04-27 13:39 UTC (permalink / raw)
  To: Sriraman Tallam
  Cc: Richard Guenther, Jan Hubicka, Uros Bizjak, reply, gcc-patches, David Li

On Thu, Apr 26, 2012 at 10:08 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> Hi,
>
>   I have made the following changes in this new patch which is attached:
>
> * Use target attribute itself to create function versions.
> * Handle any number of ISA names and arch=  args to target attribute,
> generating the right dispatchers.
> * Integrate with the CPU runtime detection checked in this week.
> * Overload resolution: If the caller's target matches any of the
> version function's target, then a direct call to the version is
> generated, no need to go through the dispatching.
>
> Patch also available for review here:
> http://codereview.appspot.com/5752064
>

Does it work with

int foo ();
int foo () __attribute__ ((targetv("arch=corei7")));

int (*foo_p) () = foo?

Does it support C++?

Thanks.

-- 
H.J.


* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-04-27 13:39       ` H.J. Lu
@ 2012-04-27 14:35         ` Sriraman Tallam
  2012-04-27 14:39           ` H.J. Lu
  0 siblings, 1 reply; 93+ messages in thread
From: Sriraman Tallam @ 2012-04-27 14:35 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Richard Guenther, Jan Hubicka, Uros Bizjak, reply, gcc-patches, David Li

On Fri, Apr 27, 2012 at 6:38 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Thu, Apr 26, 2012 at 10:08 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>> Hi,
>>
>>   I have made the following changes in this new patch which is attached:
>>
>> * Use target attribute itself to create function versions.
>> * Handle any number of ISA names and arch=  args to target attribute,
>> generating the right dispatchers.
>> * Integrate with the CPU runtime detection checked in this week.
>> * Overload resolution: If the caller's target matches any of the
>> version function's target, then a direct call to the version is
>> generated, no need to go through the dispatching.
>>
>> Patch also available for review here:
>> http://codereview.appspot.com/5752064
>>
>
> Does it work with
>
> int foo ();
> int foo () __attribute__ ((targetv("arch=corei7")));
>
> int (*foo_p) () = foo?

Yes, this will work. foo_p will be the address of the dispatcher
function and hence doing (*foo_p)() will call the right version.

>
> Does it support C++?

Partially, no support for virtual function versioning yet. I will add
it in the next iteration.

Thanks,
-Sri.

>
> Thanks.
>
> --
> H.J.


* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-04-27 14:35         ` Sriraman Tallam
@ 2012-04-27 14:39           ` H.J. Lu
  2012-04-27 14:53             ` Sriraman Tallam
  0 siblings, 1 reply; 93+ messages in thread
From: H.J. Lu @ 2012-04-27 14:39 UTC (permalink / raw)
  To: Sriraman Tallam
  Cc: Richard Guenther, Jan Hubicka, Uros Bizjak, reply, gcc-patches, David Li

On Fri, Apr 27, 2012 at 7:35 AM, Sriraman Tallam <tmsriram@google.com> wrote:
> On Fri, Apr 27, 2012 at 6:38 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Thu, Apr 26, 2012 at 10:08 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>> Hi,
>>>
>>>   I have made the following changes in this new patch which is attached:
>>>
>>> * Use target attribute itself to create function versions.
>>> * Handle any number of ISA names and arch=  args to target attribute,
>>> generating the right dispatchers.
>>> * Integrate with the CPU runtime detection checked in this week.
>>> * Overload resolution: If the caller's target matches any of the
>>> version function's target, then a direct call to the version is
>>> generated, no need to go through the dispatching.
>>>
>>> Patch also available for review here:
>>> http://codereview.appspot.com/5752064
>>>
>>
>> Does it work with
>>
>> int foo ();
>> int foo () __attribute__ ((targetv("arch=corei7")));
>>
>> int (*foo_p) () = foo?
>
> Yes, this will work. foo_p will be the address of the dispatcher
> function and hence doing (*foo_p)() will call the right version.

Even when foo_p is a global variable and compiled with -fPIC?

Thanks.

-- 
H.J.


* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-04-27 14:39           ` H.J. Lu
@ 2012-04-27 14:53             ` Sriraman Tallam
  2012-04-27 15:36               ` H.J. Lu
  0 siblings, 1 reply; 93+ messages in thread
From: Sriraman Tallam @ 2012-04-27 14:53 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Richard Guenther, Jan Hubicka, Uros Bizjak, reply, gcc-patches, David Li

On Fri, Apr 27, 2012 at 7:38 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, Apr 27, 2012 at 7:35 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>> On Fri, Apr 27, 2012 at 6:38 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Thu, Apr 26, 2012 at 10:08 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>> Hi,
>>>>
>>>>   I have made the following changes in this new patch which is attached:
>>>>
>>>> * Use target attribute itself to create function versions.
>>>> * Handle any number of ISA names and arch=  args to target attribute,
>>>> generating the right dispatchers.
>>>> * Integrate with the CPU runtime detection checked in this week.
>>>> * Overload resolution: If the caller's target matches any of the
>>>> version function's target, then a direct call to the version is
>>>> generated, no need to go through the dispatching.
>>>>
>>>> Patch also available for review here:
>>>> http://codereview.appspot.com/5752064
>>>>
>>>
>>> Does it work with
>>>
>>> int foo ();
>>> int foo () __attribute__ ((targetv("arch=corei7")));
>>>
>>> int (*foo_p) () = foo?
>>
>> Yes, this will work. foo_p will be the address of the dispatcher
>> function and hence doing (*foo_p)() will call the right version.
>
> Even when foo_p is a global variable and compiled with -fPIC?

I am not sure I understand what the complication is here, but FWIW, I
tried this example and it works:

int foo ()
{
 return 0;
}

int __attribute__ ((target ("arch=corei7")))
foo ()
{
 return 1;
}

int (*foo_p)() = foo;
int main ()
{
 return (*foo_p)();
}

g++ -fPIC -O2 example.cc


Did you have something else in mind? Could you please elaborate if you
have a particular case in mind.

The way I handle function pointers is straightforward. When the
front-end sees a pointer to a function that is versioned, it returns
the pointer to the dispatcher instead.

Thanks,
-Sri.

>
> Thanks.
>
> --
> H.J.


* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-04-27 14:53             ` Sriraman Tallam
@ 2012-04-27 15:36               ` H.J. Lu
  2012-04-27 15:45                 ` Sriraman Tallam
  2012-05-01 23:51                 ` Sriraman Tallam
  0 siblings, 2 replies; 93+ messages in thread
From: H.J. Lu @ 2012-04-27 15:36 UTC (permalink / raw)
  To: Sriraman Tallam
  Cc: Richard Guenther, Jan Hubicka, Uros Bizjak, reply, gcc-patches, David Li

On Fri, Apr 27, 2012 at 7:53 AM, Sriraman Tallam <tmsriram@google.com> wrote:
> On Fri, Apr 27, 2012 at 7:38 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Fri, Apr 27, 2012 at 7:35 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>>> On Fri, Apr 27, 2012 at 6:38 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>> On Thu, Apr 26, 2012 at 10:08 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>>> Hi,
>>>>>
>>>>>   I have made the following changes in this new patch which is attached:
>>>>>
>>>>> * Use target attribute itself to create function versions.
>>>>> * Handle any number of ISA names and arch=  args to target attribute,
>>>>> generating the right dispatchers.
>>>>> * Integrate with the CPU runtime detection checked in this week.
>>>>> * Overload resolution: If the caller's target matches any of the
>>>>> version function's target, then a direct call to the version is
>>>>> generated, no need to go through the dispatching.
>>>>>
>>>>> Patch also available for review here:
>>>>> http://codereview.appspot.com/5752064
>>>>>
>>>>
>>>> Does it work with
>>>>
>>>> int foo ();
>>>> int foo () __attribute__ ((targetv("arch=corei7")));
>>>>
>>>> int (*foo_p) () = foo?
>>>
>>> Yes, this will work. foo_p will be the address of the dispatcher
>>> function and hence doing (*foo_p)() will call the right version.
>>
>> Even when foo_p is a global variable and compiled with -fPIC?
>
> I am not sure I understand what the complication is here, but FWIW, I
> tried this example and it works
>
> int foo ()
> {
>  return 0;
> }
>
> int  __attribute__ ((target ("arch=corei7)))
> foo ()
> {
>  return 1;
> }
>
> int (*foo_p)() = foo;
> int main ()
> {
>  return (*foo_p)();
> }
>
> g++ -fPIC -O2 example.cc
>
>
> Did you have something else in mind? Could you please elaborate if you
> a have a particular case in mind.
>

That is what I meant.  But I didn't see it in your testcase.
Can you add it to your testcase?

Also you should verify the correct function is called in
your testcase at run-time.


Thanks.


-- 
H.J.


* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-04-27 15:36               ` H.J. Lu
@ 2012-04-27 15:45                 ` Sriraman Tallam
  2012-05-01 23:51                 ` Sriraman Tallam
  1 sibling, 0 replies; 93+ messages in thread
From: Sriraman Tallam @ 2012-04-27 15:45 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Richard Guenther, Jan Hubicka, Uros Bizjak, reply, gcc-patches, David Li

On Fri, Apr 27, 2012 at 8:36 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, Apr 27, 2012 at 7:53 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>> On Fri, Apr 27, 2012 at 7:38 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Fri, Apr 27, 2012 at 7:35 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>> On Fri, Apr 27, 2012 at 6:38 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>> On Thu, Apr 26, 2012 at 10:08 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>>>> Hi,
>>>>>>
>>>>>>   I have made the following changes in this new patch which is attached:
>>>>>>
>>>>>> * Use target attribute itself to create function versions.
>>>>>> * Handle any number of ISA names and arch=  args to target attribute,
>>>>>> generating the right dispatchers.
>>>>>> * Integrate with the CPU runtime detection checked in this week.
>>>>>> * Overload resolution: If the caller's target matches any of the
>>>>>> version function's target, then a direct call to the version is
>>>>>> generated, no need to go through the dispatching.
>>>>>>
>>>>>> Patch also available for review here:
>>>>>> http://codereview.appspot.com/5752064
>>>>>>
>>>>>
>>>>> Does it work with
>>>>>
>>>>> int foo ();
>>>>> int foo () __attribute__ ((targetv("arch=corei7")));
>>>>>
>>>>> int (*foo_p) () = foo?
>>>>
>>>> Yes, this will work. foo_p will be the address of the dispatcher
>>>> function and hence doing (*foo_p)() will call the right version.
>>>
>>> Even when foo_p is a global variable and compiled with -fPIC?
>>
>> I am not sure I understand what the complication is here, but FWIW, I
>> tried this example and it works
>>
>> int foo ()
>> {
>>  return 0;
>> }
>>
>> int  __attribute__ ((target ("arch=corei7)))
>> foo ()
>> {
>>  return 1;
>> }
>>
>> int (*foo_p)() = foo;
>> int main ()
>> {
>>  return (*foo_p)();
>> }
>>
>> g++ -fPIC -O2 example.cc
>>
>>
>> Did you have something else in mind? Could you please elaborate if you
>> a have a particular case in mind.
>>
>
> That is what I meant.  But I didn't see it in your testcase.
> Can you add it to your testcase?
>
> Also you should verify the correct function is called in
> your testcase at run-time.

OK, I will update the patch.

Thanks,
-Sri.

>
>
> Thanks.
>
>
> --
> H.J.


* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-04-27 15:36               ` H.J. Lu
  2012-04-27 15:45                 ` Sriraman Tallam
@ 2012-05-01 23:51                 ` Sriraman Tallam
  2012-05-02  0:09                   ` H.J. Lu
  1 sibling, 1 reply; 93+ messages in thread
From: Sriraman Tallam @ 2012-05-01 23:51 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Richard Guenther, Jan Hubicka, Uros Bizjak, reply, gcc-patches, David Li

[-- Attachment #1: Type: text/plain, Size: 2443 bytes --]

Hi,

New patch attached, with an updated test case and fixes for bugs related to
__PRETTY_FUNCTION__.

Patch also available for review here:  http://codereview.appspot.com/5752064

Thanks,
-Sri.

On Fri, Apr 27, 2012 at 8:36 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, Apr 27, 2012 at 7:53 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>> On Fri, Apr 27, 2012 at 7:38 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Fri, Apr 27, 2012 at 7:35 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>> On Fri, Apr 27, 2012 at 6:38 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>> On Thu, Apr 26, 2012 at 10:08 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>>>> Hi,
>>>>>>
>>>>>>   I have made the following changes in this new patch which is attached:
>>>>>>
>>>>>> * Use target attribute itself to create function versions.
>>>>>> * Handle any number of ISA names and arch=  args to target attribute,
>>>>>> generating the right dispatchers.
>>>>>> * Integrate with the CPU runtime detection checked in this week.
>>>>>> * Overload resolution: If the caller's target matches any of the
>>>>>> version function's target, then a direct call to the version is
>>>>>> generated, no need to go through the dispatching.
>>>>>>
>>>>>> Patch also available for review here:
>>>>>> http://codereview.appspot.com/5752064
>>>>>>
>>>>>
>>>>> Does it work with
>>>>>
>>>>> int foo ();
>>>>> int foo () __attribute__ ((targetv("arch=corei7")));
>>>>>
>>>>> int (*foo_p) () = foo?
>>>>
>>>> Yes, this will work. foo_p will be the address of the dispatcher
>>>> function and hence doing (*foo_p)() will call the right version.
>>>
>>> Even when foo_p is a global variable and compiled with -fPIC?
>>
>> I am not sure I understand what the complication is here, but FWIW, I
>> tried this example and it works
>>
>> int foo ()
>> {
>>  return 0;
>> }
>>
>> int  __attribute__ ((target ("arch=corei7)))
>> foo ()
>> {
>>  return 1;
>> }
>>
>> int (*foo_p)() = foo;
>> int main ()
>> {
>>  return (*foo_p)();
>> }
>>
>> g++ -fPIC -O2 example.cc
>>
>>
>> Did you have something else in mind? Could you please elaborate if you
>> a have a particular case in mind.
>>
>
> That is what I meant.  But I didn't see it in your testcase.
> Can you add it to your testcase?
>
> Also you should verify the correct function is called in
> your testcase at run-time.
>
>
> Thanks.
>
>
> --
> H.J.

[-- Attachment #2: mv_fe_patch.txt --]
[-- Type: text/plain, Size: 65575 bytes --]

Overview of the patch which adds front-end support to specify function versions.

Example:

int foo ();  /* Default version */
int foo () __attribute__ ((target("avx,popcnt")));/*Specialized for avx and popcnt */
int foo () __attribute__ ((target("arch=core2,ssse3")));/*Specialized for core2 and ssse3*/

int main ()
{
 int (*p)() = &foo;
 return foo () + (*p)();
}

int foo ()
{
 return 0;
}

int __attribute__ ((target("avx,popcnt")))
foo ()
{
 return 0;
}

int __attribute__ ((target("arch=core2,ssse3")))
foo ()
{
 return 0;
}

The above example has foo defined 3 times, but all 3 definitions of foo are
different versions of the same function. The call to foo in main, directly and
via a pointer, are calls to the multi-versioned function foo which is dispatched
to the right foo at run-time.

What does the patch do?

* Tracking decls that correspond to function versions of function
name, say "foo":

When the front-end sees more than one decl for "foo", with at least one decl
tagged with "target" attributes, it marks them as function versions. To
prevent duplicate definition errors with other versions of "foo", I change
the "decls_match" function in cp/decl.c to return false when 2 decls have the
same signature but different target attributes. This causes all function
versions of "foo" to be added to the overload list of "foo".
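
As a rough illustration of the decls_match change described above (a
simplified model, not the code in cp/decl.c; the Decl struct and the
decls_match_model name are hypothetical and exist only for this sketch),
two declarations with the same signature are treated as the same decl
only when their target strings also agree:

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a function decl: only the fields this sketch needs.
struct Decl {
  std::string signature;  // e.g. "int foo()"
  std::string target;     // argument of the target attribute, "" if absent
};

// Simplified model of the modified check: same signature but different
// target attributes means "not the same declaration", so both decls stay
// in the overload set instead of triggering a redefinition error.
bool decls_match_model(const Decl& a, const Decl& b) {
  if (a.signature != b.signature)
    return false;
  return a.target == b.target;
}
```

With this model, the default "int foo()" and the "avx,popcnt" version do
not match each other, so both would be kept as overloads of "foo".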

* Change the assembler names of the function versions.

The front-end changes the assembler names of the function versions by appending
the sorted list of args to "target" to the function name of "foo". For example,
the assembler name of "void foo () __attribute__ ((target ("sse4")))" will
become _Z3foov.sse4.
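
The naming scheme can be sketched roughly as follows. This is only an
illustration: the helper name versioned_assembler_name is hypothetical, and
the "_" separator between multiple sorted arguments is an assumption, since
the overview only shows the single-argument case.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: builds a versioned assembler name from the mangled
// name and the target attribute string.  The "_" separator between sorted
// arguments is an assumption; only the "." before the first one is shown
// in the overview.
std::string versioned_assembler_name(const std::string& mangled,
                                     const std::string& target_attr) {
  // Split the attribute string on commas.
  std::vector<std::string> args;
  std::stringstream ss(target_attr);
  std::string tok;
  while (std::getline(ss, tok, ','))
    if (!tok.empty())
      args.push_back(tok);

  // Sort so that "avx,popcnt" and "popcnt,avx" yield the same name.
  std::sort(args.begin(), args.end());

  std::string name = mangled;
  for (std::size_t i = 0; i < args.size(); ++i) {
    name += (i == 0) ? "." : "_";
    name += args[i];
  }
  return name;
}
```

Sorting first means the order in which the user writes the attribute
arguments does not change the resulting symbol name.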

* Separately group all function versions of "foo" together, in multiversion.c:

File multiversion.c maintains a hashtab, decl_version_htab,  that maps
the  default function decl of "foo" to the list of all other versions
of this function "foo". This is used when creating the dispatcher for
this function.
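
A minimal model of this grouping, assuming decls can be identified by their
assembler names (the VersionTable class and its member names are hypothetical
and are not the interface of multiversion.c):

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of decl_version_htab: the default decl is the key and
// the value lists the other versions, in the order they were recorded.
class VersionTable {
 public:
  void add_version(const std::string& default_decl,
                   const std::string& version_decl) {
    table_[default_decl].push_back(version_decl);
  }

  // Returns the recorded versions, or an empty list for an unversioned decl.
  std::vector<std::string> versions_of(const std::string& default_decl) const {
    std::map<std::string, std::vector<std::string> >::const_iterator it =
        table_.find(default_decl);
    return it == table_.end() ? std::vector<std::string>() : it->second;
  }

 private:
  std::map<std::string, std::vector<std::string> > table_;
};
```

Keying on the default decl is what lets the dispatcher pass find every
version of "foo" starting from the one decl that call-sites resolve to.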

* Overload resolution:

Function "build_over_call" in cp/call.c sees a call to function
"foo", which is multi-versioned. The overload resolution happens in
function "joust" in cp/call.c. Here, the call to "foo" has all
possible versions of "foo" as candidates. If the caller has target
attributes and they match the target attributes of one of the function
versions, then a direct call is made to that function version.

For example:

int __attribute__ ((target ("avx,popcnt")))
baz ()
{
  return foo ();
}

If baz calls foo, which is multi-versioned, then the call to foo here
will become a direct call to the version of foo targeted to avx,popcnt.

When a direct call to a version cannot be made, the default version of
"foo" is the winning candidate. However, "build_over_call" recognizes
that this is a versioned function and replaces the call-site of foo with
an "ifunc" call for foo, by querying a function in "multiversion.c" which
builds the ifunc decl. After this, all call-sites of "foo" contain the
call to the ifunc.
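
To make the rule above concrete, here is a small standalone model of the
decision. It is only a sketch, not the logic in joust or build_over_call.
The struct and function names are hypothetical, and it assumes target
strings are compared exactly, after any normalization has been done.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one function version.  TARGET is NULL for the
   default version, otherwise the "target" attribute string.  */
struct version
{
  const char *target;
};

/* Return the index of the version that the caller can call directly,
   or -1 when no version matches and the call has to go through the
   dispatcher (the ifunc).  */
int
pick_direct_version (const char *caller_target,
		     const struct version *versions, size_t n)
{
  assert (versions != NULL || n == 0);

  /* A caller without target attributes never gets a direct call.  */
  if (caller_target == NULL)
    return -1;

  for (size_t i = 0; i < n; i++)
    if (versions[i].target != NULL
	&& strcmp (versions[i].target, caller_target) == 0)
      return (int) i;

  return -1;
}
```

In this model, a caller tagged "avx,popcnt" gets the index of the
"avx,popcnt" version, while an untagged caller such as main gets -1 and
ends up calling the dispatcher.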

* Creating the dispatcher:

The dispatcher is independently created in a new pass, called
"pass_dispatch_versions", that runs immediately after cfg and cgraph are
created. The dispatcher looks at all possible versions and queries the
target to give it the CPU detection predicates it must use to dispatch
each version. Then, the dispatcher body is created and the ifunc is
mapped to use this dispatcher.

Notice that only the dispatcher creation is done after the front-end.
Everything else occurs in the front-end itself. I could have created
the dispatcher in the front-end as well. I did not do so because
keeping it as a separate pass makes it easier to add more dispatch
mechanisms. For example, when IFUNC is not available, it could be
replaced with control-flow that makes direct calls to the function
versions. Also, creating the dispatcher after the cfg is built was easy.
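
For illustration, a control-flow dispatch of the kind mentioned above could
look roughly like the hand-written sketch below. This is not what the patch
generates. It assumes a GCC-compatible compiler that provides
__builtin_cpu_init and __builtin_cpu_supports on x86, and the names
foo_default, foo_avx and select_foo are hypothetical. On other targets the
sketch always falls back to the default version.

```c
#include <assert.h>
#include <stddef.h>

#if defined (__GNUC__) && (defined (__x86_64__) || defined (__i386__))
# define HAVE_X86_DISPATCH 1
# define TARGET_AVX __attribute__ ((target ("avx")))
#else
# define HAVE_X86_DISPATCH 0
# define TARGET_AVX
#endif

typedef int (*foo_fn) (void);

/* Default version.  Runs on any CPU.  */
int
foo_default (void)
{
  return 0;
}

/* Version compiled for AVX.  It returns a different value only so that
   the choice made by the dispatch code is observable.  */
TARGET_AVX int
foo_avx (void)
{
  return 1;
}

/* Check the CPU and return the version that should run on it.  */
foo_fn
select_foo (void)
{
#if HAVE_X86_DISPATCH
  __builtin_cpu_init ();
  if (__builtin_cpu_supports ("avx"))
    return foo_avx;
#endif
  return foo_default;
}

/* Stand-in for the multi-versioned foo: pick a version on the first
   call, then call it directly on every later call.  */
int
foo (void)
{
  static foo_fn chosen = NULL;

  if (chosen == NULL)
    chosen = select_foo ();
  return chosen ();
}
```

Compared with IFUNC, the check here happens in user code on the first call
instead of in the dynamic loader, and every later call pays for one extra
load and an indirect call.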


	* doc/tm.texi.in: Add description for TARGET_DISPATCH_VERSION.
	* doc/tm.texi: Regenerate.
	* target.def (dispatch_version): New target hook.
	* tree.h (DECL_FUNCTION_VERSIONED): New macro.
	(tree_function_decl): New bit-field versioned_function.
	* tree-pass.h (pass_dispatch_versions): New pass.
	* multiversion.c: New file.
	* multiversion.h: New file.
	* cgraphunit.c (cgraph_finalize_function): Force output of versioned
	inline functions.
	* cp/class.c: Include multiversion.h.
	(add_method): Aggregate function versions.  Change assembler names of
	versioned functions.
	(resolve_address_of_overloaded_function): Match address of function
	version with default function.  Return address of ifunc dispatcher
	for address of versioned functions.
	(cxx_comdat_group): Use decl names for comdat groups of versioned
	functions.
	* cp/decl.c (decls_match): Make decls unmatched for versioned
	functions.
	(duplicate_decls): Remove ambiguity for versioned functions. Notify
	of deleted function version decls.
	(start_decl): Change assembler name of versioned functions.
	(start_function): Change assembler name of versioned functions.
	(cxx_comdat_group): Make comdat group of versioned functions be the
	same.
	* cp/semantics.c (expand_or_defer_fn_1): Mark as needed versioned
	functions that are also marked inline.
	* cp/decl2.c: Include multiversion.h.
	(check_classfn): Check attributes of versioned functions for match.
	* cp/call.c: Include multiversion.h.
	(build_over_call): Make calls to multiversioned functions to call the
	dispatcher.
	(joust): For calls to multi-versioned functions, make the default
	function win.
	* cp/mangle.c (write_unqualified_name): Use assembler name for
	versioned functions.
	* timevar.def (TV_MULTIVERSION_DISPATCH): New time var.
	* varasm.c (finish_aliases_1): Check if the alias points to a function
	with a body before giving an error.
	* Makefile.in: Add multiversion.o.
	* passes.c: Add pass_dispatch_versions to the pass list.
	* config/i386/i386.c (add_condition_to_bb): New function.
	(get_builtin_code_for_version): New function.
	(ix86_dispatch_version): New function.
	(TARGET_DISPATCH_VERSION): New macro.
	* testsuite/g++.dg/mv1.C: New test.

Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi	(revision 186883)
+++ gcc/doc/tm.texi	(working copy)
@@ -10997,6 +10997,14 @@ The result is another tree containing a simplified
 call's result.  If @var{ignore} is true the value will be ignored.
 @end deftypefn
 
+@deftypefn {Target Hook} int TARGET_DISPATCH_VERSION (tree @var{dispatch_decl}, void *@var{fndecls}, basic_block *@var{empty_bb})
+For a multi-versioned function, this hook sets up the dispatcher.
+@var{dispatch_decl} is the function that will be used to dispatch the
+version. @var{fndecls} are the function choices for dispatch.
+@var{empty_bb} is a basic block in @var{dispatch_decl} where the
+code to do the dispatch will be added.
+@end deftypefn
+
 @deftypefn {Target Hook} {const char *} TARGET_INVALID_WITHIN_DOLOOP (const_rtx @var{insn})
 
 Take an instruction in @var{insn} and return NULL if it is valid within a
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in	(revision 186883)
+++ gcc/doc/tm.texi.in	(working copy)
@@ -10877,6 +10877,14 @@ The result is another tree containing a simplified
 call's result.  If @var{ignore} is true the value will be ignored.
 @end deftypefn
 
+@hook TARGET_DISPATCH_VERSION
+For a multi-versioned function, this hook sets up the dispatcher.
+@var{dispatch_decl} is the function that will be used to dispatch the
+version. @var{fndecls} are the function choices for dispatch.
+@var{empty_bb} is a basic block in @var{dispatch_decl} where the
+code to do the dispatch will be added.
+@end deftypefn
+
 @hook TARGET_INVALID_WITHIN_DOLOOP
 
 Take an instruction in @var{insn} and return NULL if it is valid within a
Index: gcc/target.def
===================================================================
--- gcc/target.def	(revision 186883)
+++ gcc/target.def	(working copy)
@@ -1249,6 +1249,15 @@ DEFHOOK
  tree, (tree fndecl, int n_args, tree *argp, bool ignore),
  hook_tree_tree_int_treep_bool_null)
 
+/* Target hook to generate the dispatching code for calls to multi-versioned
+   functions.  DISPATCH_DECL is the function that will have the dispatching
+   logic.  FNDECLS are the list of choices for dispatch and EMPTY_BB is the
+   basic block in DISPATCH_DECL which will contain the code.  */
+DEFHOOK
+(dispatch_version,
+ "",
+ int, (tree dispatch_decl, void *fndecls, basic_block *empty_bb), NULL)
+
 /* Returns a code for a target-specific builtin that implements
    reciprocal of the function, or NULL_TREE if not available.  */
 DEFHOOK
Index: gcc/tree.h
===================================================================
--- gcc/tree.h	(revision 186883)
+++ gcc/tree.h	(working copy)
@@ -3539,6 +3539,12 @@ extern VEC(tree, gc) **decl_debug_args_insert (tre
 #define DECL_FUNCTION_SPECIFIC_OPTIMIZATION(NODE) \
    (FUNCTION_DECL_CHECK (NODE)->function_decl.function_specific_optimization)
 
+/* In FUNCTION_DECL, this is set if this function has other versions generated
+   using "target" attributes.  The default version is the one which does not
+   have any "target" attribute set. */
+#define DECL_FUNCTION_VERSIONED(NODE)\
+   (FUNCTION_DECL_CHECK (NODE)->function_decl.versioned_function)
+
 /* FUNCTION_DECL inherits from DECL_NON_COMMON because of the use of the
    arguments/result/saved_tree fields by front ends.   It was either inherit
    FUNCTION_DECL from non_common, or inherit non_common from FUNCTION_DECL,
@@ -3583,8 +3589,8 @@ struct GTY(()) tree_function_decl {
   unsigned looping_const_or_pure_flag : 1;
   unsigned has_debug_args_flag : 1;
   unsigned tm_clone_flag : 1;
-
-  /* 1 bit left */
+  unsigned versioned_function : 1;
+  /* No bits left.  */
 };
 
 /* The source language of the translation-unit.  */
Index: gcc/tree-pass.h
===================================================================
--- gcc/tree-pass.h	(revision 186883)
+++ gcc/tree-pass.h	(working copy)
@@ -453,6 +453,7 @@ extern struct gimple_opt_pass pass_tm_memopt;
 extern struct gimple_opt_pass pass_tm_edges;
 extern struct gimple_opt_pass pass_split_functions;
 extern struct gimple_opt_pass pass_feedback_split_functions;
+extern struct gimple_opt_pass pass_dispatch_versions;
 
 /* IPA Passes */
 extern struct simple_ipa_opt_pass pass_ipa_lower_emutls;
Index: gcc/multiversion.c
===================================================================
--- gcc/multiversion.c	(revision 0)
+++ gcc/multiversion.c	(revision 0)
@@ -0,0 +1,832 @@
+/* Function Multiversioning.
+   Copyright (C) 2012 Free Software Foundation, Inc.
+   Contributed by Sriraman Tallam (tmsriram@google.com)
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>. */
+
+/* Holds the state for multi-versioned functions here. The front-end
+   updates the state as and when function versions are encountered.
+   This is then used to generate the dispatch code.  Also, the
+   optimization passes to clone hot paths involving versioned functions
+   will be done here.
+
+   Function versions are created by using the same function signature but
+   also tagging attribute "target" to specify the platform type for which
+   the version must be executed.  Here is an example:
+
+   int foo ()
+   {
+     printf ("Execute as default");
+     return 0;
+   }
+
+   int  __attribute__ ((target ("arch=corei7")))
+   foo ()
+   {
+     printf ("Execute for corei7");
+     return 0;
+   }
+   
+   int main ()
+   {
+     return foo ();
+   } 
+
+   The call to foo in main is replaced with a call to an IFUNC function that
+   contains the dispatch code to call the correct function version at
+   run-time.  */
+
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "tree.h"
+#include "tree-inline.h"
+#include "langhooks.h"
+#include "flags.h"
+#include "cgraph.h"
+#include "diagnostic.h"
+#include "toplev.h"
+#include "timevar.h"
+#include "params.h"
+#include "fibheap.h"
+#include "intl.h"
+#include "tree-pass.h"
+#include "hashtab.h"
+#include "coverage.h"
+#include "ggc.h"
+#include "tree-flow.h"
+#include "rtl.h"
+#include "ipa-prop.h"
+#include "basic-block.h"
+#include "toplev.h"
+#include "dbgcnt.h"
+#include "tree-dump.h"
+#include "output.h"
+#include "vecprim.h"
+#include "gimple-pretty-print.h"
+#include "ipa-inline.h"
+#include "target.h"
+#include "multiversion.h"
+
+typedef void * void_p;
+
+DEF_VEC_P (void_p);
+DEF_VEC_ALLOC_P (void_p, heap);
+
+/* Each function decl that is a function version gets an instance of this
+   structure.   Since this is called by the front-end, decl merging can
+   happen, where a decl created for a new declaration is merged with 
+   the old. In this case, the new decl is deleted and the IS_DELETED
+   field is set for the struct instance corresponding to the new decl.
+   IFUNC_DECL is the decl of the ifunc function for default decls.
+   IFUNC_RESOLVER_DECL is the decl of the dispatch function.  VERSIONS
+   is a vector containing the list of function versions  that are
+   the candidates for dispatch.  */
+
+typedef struct version_function_d {
+  tree decl;
+  tree ifunc_decl;
+  tree ifunc_resolver_decl;
+  VEC (void_p, heap) *versions;
+  bool is_deleted;
+} version_function;
+
+/* Hashmap has an entry for every function decl that has other function
+   versions.  For function decls that are the default, it also stores the
+   list of all the other function versions.  Each entry is a structure
+   of type version_function_d.  */
+static htab_t decl_version_htab = NULL;
+
+/* Hashtable helpers for decl_version_htab. */
+
+static hashval_t
+decl_version_htab_hash_descriptor (const void *p)
+{
+  const version_function *t = (const version_function *) p;
+  return htab_hash_pointer (t->decl);
+}
+
+/* Hashtable helper for decl_version_htab. */
+
+static int
+decl_version_htab_eq_descriptor (const void *p1, const void *p2)
+{
+  const version_function *t1 = (const version_function *) p1;
+  return htab_eq_pointer ((const void_p) t1->decl, p2);
+}
+
+/* Create the decl_version_htab.  */
+static void
+create_decl_version_htab (void)
+{
+  if (decl_version_htab == NULL)
+    decl_version_htab = htab_create (10, decl_version_htab_hash_descriptor,
+				     decl_version_htab_eq_descriptor, NULL);
+}
+
+/* Creates an instance of version_function for decl DECL.  */
+
+static version_function*
+new_version_function (const tree decl)
+{
+  version_function *v;
+  v = (version_function *)xmalloc(sizeof (version_function));
+  v->decl = decl;
+  v->ifunc_decl = NULL;
+  v->ifunc_resolver_decl = NULL;
+  v->versions = NULL;
+  v->is_deleted = false;
+  return v;
+}
+
+/* Comparator function to be used in qsort routine to sort attribute
+   specification strings to "target".  */
+
+static int
+attr_strcmp (const void *v1, const void *v2)
+{
+  const char *c1 = *(char *const*)v1;
+  const char *c2 = *(char *const*)v2;
+  return strcmp (c1, c2);
+}
+
+/* STR is the argument to target attribute.  This function tokenizes
+   the comma separated arguments, sorts them and returns a string which
+   is a unique identifier for the comma separated arguments.  */
+
+static char *
+sorted_attr_string (const char *str)
+{
+  char **args = NULL;
+  char *attr_str, *ret_str;
+  char *attr = NULL;
+  unsigned int argnum = 1;
+  unsigned int i;
+
+  for (i = 0; i < strlen (str); i++)
+    if (str[i] == ',')
+      argnum++;
+
+  attr_str = (char *)xmalloc (strlen (str) + 1);
+  strcpy (attr_str, str);
+
+  for (i = 0; i < strlen (attr_str); i++)
+    if (attr_str[i] == '=')
+      attr_str[i] = '_';
+
+  if (argnum == 1)
+    return attr_str;
+
+  args = (char **)xmalloc (argnum * sizeof (char *));
+
+  i = 0;
+  attr = strtok (attr_str, ",");
+  while (attr != NULL)
+    {
+      args[i] = attr;
+      i++;
+      attr = strtok (NULL, ",");
+    }
+
+  qsort (args, argnum, sizeof (char*), attr_strcmp);
+
+  ret_str = (char *)xmalloc (strlen (str) + 1);
+  strcpy (ret_str, args[0]);
+  for (i = 1; i < argnum; i++)
+    {
+      strcat (ret_str, "_");
+      strcat (ret_str, args[i]);
+    }
+
+  free (args);
+  free (attr_str);
+  return ret_str;
+}
+
+/* Returns true when only one of DECL1 and DECL2 is marked with "target"
+   or if the "target" attribute strings of DECL1 and DECL2 do not match.  */
+
+bool
+has_different_version_attributes (const tree decl1, const tree decl2)
+{
+  tree attr1, attr2;
+  char *c1, *c2;
+  bool ret = false;
+
+  if (TREE_CODE (decl1) != FUNCTION_DECL
+      || TREE_CODE (decl2) != FUNCTION_DECL)
+    return false;
+
+  attr1 = lookup_attribute ("target", DECL_ATTRIBUTES (decl1));
+  attr2 = lookup_attribute ("target", DECL_ATTRIBUTES (decl2));
+
+  if (attr1 == NULL_TREE && attr2 == NULL_TREE)
+    return false;
+
+  if ((attr1 == NULL_TREE && attr2 != NULL_TREE)
+      || (attr1 != NULL_TREE && attr2 == NULL_TREE))
+    return true;
+
+  c1 = sorted_attr_string (
+	TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr1))));
+  c2 = sorted_attr_string (
+	TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr2))));
+
+  if (strcmp (c1, c2) != 0)
+     ret = true;
+
+  free (c1);
+  free (c2);
+
+  return ret;
+}
+
+/* If this decl corresponds to a function and has "target" attribute,
+   append the attribute string to its assembler name.  */
+
+static void
+version_assembler_name (const tree decl)
+{
+  tree version_attr;
+  const char *orig_name, *version_string, *attr_str;
+  char *assembler_name;
+  tree assembler_name_tree;
+  
+  if (TREE_CODE (decl) != FUNCTION_DECL)
+    return;
+
+  if (DECL_DECLARED_INLINE_P (decl)
+      &&lookup_attribute ("gnu_inline",
+			  DECL_ATTRIBUTES (decl)))
+    error_at (DECL_SOURCE_LOCATION (decl),
+	      "Function versions cannot be marked as gnu_inline,"
+	      " bodies have to be generated\n");
+
+  if (DECL_VIRTUAL_P (decl)
+      || DECL_VINDEX (decl))
+    error_at (DECL_SOURCE_LOCATION (decl),
+	      "Virtual function versioning not supported\n");
+
+  version_attr = lookup_attribute ("target", DECL_ATTRIBUTES (decl));
+  /* target attribute string is NULL for default functions.  */
+  if (version_attr == NULL_TREE)
+    return;
+
+  orig_name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
+  version_string
+    = TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (version_attr)));
+
+  attr_str = sorted_attr_string (version_string);
+  assembler_name = (char *) xmalloc (strlen (orig_name)
+				     + strlen (attr_str) + 2);
+
+  sprintf (assembler_name, "%s.%s", orig_name, attr_str);
+  if (dump_file)
+    fprintf (dump_file, "Assembler name set to %s for function version %s\n",
+	     assembler_name, IDENTIFIER_POINTER (DECL_NAME (decl)));
+
+  assembler_name_tree = get_identifier (assembler_name);
+
+  SET_DECL_ASSEMBLER_NAME (decl, assembler_name_tree);
+  SET_DECL_RTL (decl, NULL);
+}
+
+void
+mark_function_as_version (const tree decl)
+{
+  if (DECL_FUNCTION_VERSIONED (decl))
+    return;
+  DECL_FUNCTION_VERSIONED (decl) = 1;
+  version_assembler_name (decl);
+}
+
+/* Returns true if function DECL has target attribute set.  This could be
+   a version.  */
+
+bool
+is_target_attribute_set (const tree decl)
+{
+  return (TREE_CODE (decl) == FUNCTION_DECL
+	  && (lookup_attribute ("target", DECL_ATTRIBUTES (decl))
+	      != NULL_TREE));
+}
+
+/* Returns true if decl is multi-versioned and DECL is the default function,
+   that is it is not tagged with "target" attribute.  */
+
+bool
+is_default_function (const tree decl)
+{
+  return (TREE_CODE (decl) == FUNCTION_DECL
+	  && DECL_FUNCTION_VERSIONED (decl)
+	  && (lookup_attribute ("target", DECL_ATTRIBUTES (decl))
+	      == NULL_TREE));	
+}
+
+/* For function decl DECL, find the version_function struct in the
+   decl_version_htab.  */
+
+static version_function *
+find_function_version (const tree decl)
+{
+  void *slot;
+
+  if (!DECL_FUNCTION_VERSIONED (decl))
+    return NULL;
+
+  if (!decl_version_htab)
+    return NULL;
+
+  slot = htab_find_with_hash (decl_version_htab, decl,
+                              htab_hash_pointer (decl));
+
+  if (slot != NULL)
+    return (version_function *)slot;
+
+  return NULL;
+}
+
+/* Record DECL as a function version by creating a version_function struct
+   for it and storing it in the hashtable.  */
+
+static version_function *
+add_function_version (const tree decl)
+{
+  void **slot;
+  version_function *v;
+
+  if (!DECL_FUNCTION_VERSIONED (decl))
+    return NULL;
+
+  create_decl_version_htab ();
+
+  slot = htab_find_slot_with_hash (decl_version_htab, (const void_p)decl,
+                                   htab_hash_pointer ((const void_p)decl),
+				   INSERT);
+
+  if (*slot != NULL)
+    return (version_function *)*slot;
+
+  v = new_version_function (decl);
+  *slot = v;
+
+  return v;
+}
+
+/* Push V into VEC only if it is not already present.  If already present
+   returns false.  */
+
+static bool
+push_function_version (version_function *v, VEC (void_p, heap) **vec)
+{
+  int ix;
+  void_p ele; 
+  for (ix = 0; VEC_iterate (void_p, *vec, ix, ele); ++ix)
+    {
+      if (ele == (void_p)v)
+        return false;
+    }
+
+  VEC_safe_push (void_p, heap, *vec, (void*)v);
+  return true;
+}
+ 
+/* Mark DECL as deleted.  This is called by the front-end when a duplicate
+   decl is merged with the original decl and the duplicate decl is deleted.
+   This function marks the duplicate_decl as invalid.  Called by
+   duplicate_decls in cp/decl.c.  */
+
+void
+mark_delete_decl_version (const tree decl)
+{
+  version_function *decl_v;
+
+  decl_v = find_function_version (decl);
+
+  if (decl_v == NULL)
+    return;
+
+  decl_v->is_deleted = true;
+
+  if (is_default_function (decl)
+      && decl_v->versions != NULL)
+    {
+      VEC_truncate (void_p, decl_v->versions, 0);
+      VEC_free (void_p, heap, decl_v->versions);
+      decl_v->versions = NULL;
+    }
+}
+
+/* Mark DECL1 and DECL2 to be function versions in the same group.  One
+   of DECL1 and DECL2 must be the default, otherwise this function does
+   nothing.  This function aggregates the versions.  */
+
+int
+group_function_versions (const tree decl1, const tree decl2)
+{
+  tree default_decl, version_decl;
+  version_function *default_v, *version_v;
+
+  gcc_assert (DECL_FUNCTION_VERSIONED (decl1)
+	      && DECL_FUNCTION_VERSIONED (decl2));
+
+  /* The version decls are added only to the default decl.  */
+  if (!is_default_function (decl1)
+      && !is_default_function (decl2))
+    return 0;
+
+  /* This can happen with duplicate declarations.  Just ignore.  */
+  if (is_default_function (decl1)
+      && is_default_function (decl2))
+    return 0;
+
+  default_decl = (is_default_function (decl1)) ? decl1 : decl2;
+  version_decl = (default_decl == decl1) ? decl2 : decl1;
+
+  gcc_assert (default_decl != version_decl);
+  create_decl_version_htab ();
+
+  /* If the version function is found, it has been added.  */
+  if (find_function_version (version_decl))
+    return 0;
+
+  default_v = add_function_version (default_decl);
+  version_v = add_function_version (version_decl);
+
+  if (default_v->versions == NULL)
+    default_v->versions = VEC_alloc (void_p, heap, 1);
+
+  push_function_version (version_v, &default_v->versions);
+  return 0;
+}
+
+/* Makes a function attribute of the form NAME(ARG_NAME) and chains
+   it to CHAIN.  */
+
+static tree
+make_attribute (const char *name, const char *arg_name, tree chain)
+{
+  tree attr_name;
+  tree attr_arg_name;
+  tree attr_args;
+  tree attr;
+
+  attr_name = get_identifier (name);
+  attr_arg_name = build_string (strlen (arg_name), arg_name);
+  attr_args = tree_cons (NULL_TREE, attr_arg_name, NULL_TREE);
+  attr = tree_cons (attr_name, attr_args, chain);
+  return attr;
+}
+
+/* Return a new name by appending SUFFIX to the DECL name.  If
+   make_unique is true, append the full path name.  */
+
+static char *
+make_name (tree decl, const char *suffix, bool make_unique)
+{
+  char *global_var_name;
+  int name_len;
+  const char *name;
+  const char *unique_name = NULL;
+
+  name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
+
+  /* Get a unique name that can be used globally without any chances
+     of collision at link time.  */
+  if (make_unique)
+    unique_name = IDENTIFIER_POINTER (get_file_function_name ("\0"));
+
+  name_len = strlen (name) + strlen (suffix) + 2;
+
+  if (make_unique)
+    name_len += strlen (unique_name) + 1;
+  global_var_name = (char *) xmalloc (name_len);
+
+  /* Use '.' to concatenate names as it is demangler friendly.  */
+  if (make_unique)
+      snprintf (global_var_name, name_len, "%s.%s.%s", name,
+		unique_name, suffix);
+  else
+      snprintf (global_var_name, name_len, "%s.%s", name, suffix);
+
+  return global_var_name;
+}
+
+/* Make the resolver function decl for ifunc (IFUNC_DECL) to dispatch
+   the versions of multi-versioned function DEFAULT_DECL.  Create an
+   empty basic block in the resolver and store the pointer in
+   EMPTY_BB.  Return the decl of the resolver function.  */
+
+static tree
+make_ifunc_resolver_func (const tree default_decl,
+			  const tree ifunc_decl,
+			  basic_block *empty_bb)
+{
+  char *resolver_name;
+  tree decl, type, decl_name, t;
+  basic_block new_bb;
+  tree old_current_function_decl;
+  bool make_unique = false;
+
+  /* IFUNC's have to be globally visible.  So, if the default_decl is
+     not, then the name of the IFUNC should be made unique.  */
+  if (TREE_PUBLIC (default_decl) == 0)
+    make_unique = true;
+
+  /* Append the filename to the resolver function if the versions are
+     not externally visible.  This is because the resolver function has
+     to be externally visible for the loader to find it.  So, appending
+     the filename will prevent conflicts with a resolver function from
+     another module which is based on the same version name.  */
+  resolver_name = make_name (default_decl, "resolver", make_unique);
+
+  /* The resolver function should return a (void *). */
+  type = build_function_type_list (ptr_type_node, NULL_TREE);
+
+  decl = build_fn_decl (resolver_name, type);
+  decl_name = get_identifier (resolver_name);
+  SET_DECL_ASSEMBLER_NAME (decl, decl_name);
+
+  DECL_NAME (decl) = decl_name;
+  TREE_USED (decl) = TREE_USED (default_decl);
+  DECL_ARTIFICIAL (decl) = 1;
+  DECL_IGNORED_P (decl) = 0;
+  /* IFUNC resolvers have to be externally visible.  */
+  TREE_PUBLIC (decl) = 1;
+  DECL_UNINLINABLE (decl) = 1;
+
+  DECL_EXTERNAL (decl) = DECL_EXTERNAL (default_decl);
+  DECL_EXTERNAL (ifunc_decl) = 0;
+
+  DECL_CONTEXT (decl) = NULL_TREE;
+  DECL_INITIAL (decl) = make_node (BLOCK);
+  DECL_STATIC_CONSTRUCTOR (decl) = 0;
+  TREE_READONLY (decl) = 0;
+  DECL_PURE_P (decl) = 0;
+  DECL_COMDAT (decl) = DECL_COMDAT (default_decl);
+  if (DECL_COMDAT_GROUP (default_decl))
+    {
+      make_decl_one_only (decl, DECL_COMDAT_GROUP (default_decl));
+    }
+  /* Build result decl and add to function_decl. */
+  t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, ptr_type_node);
+  DECL_ARTIFICIAL (t) = 1;
+  DECL_IGNORED_P (t) = 1;
+  DECL_RESULT (decl) = t;
+
+  gimplify_function_tree (decl);
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (decl));
+  current_function_decl = decl;
+  init_empty_tree_cfg_for_function (DECL_STRUCT_FUNCTION (decl));
+  cfun->curr_properties |=
+    (PROP_gimple_lcf | PROP_gimple_leh | PROP_cfg | PROP_referenced_vars |
+     PROP_ssa);
+  new_bb = create_empty_bb (ENTRY_BLOCK_PTR);
+  make_edge (ENTRY_BLOCK_PTR, new_bb, EDGE_FALLTHRU);
+  make_edge (new_bb, EXIT_BLOCK_PTR, 0);
+  *empty_bb = new_bb;
+
+  cgraph_add_new_function (decl, true);
+  cgraph_call_function_insertion_hooks (cgraph_get_create_node (decl));
+  cgraph_mark_force_output_node (cgraph_get_create_node (decl));
+
+  if (DECL_COMDAT_GROUP (default_decl))
+    {
+      gcc_assert (cgraph_get_node (default_decl));
+      cgraph_add_to_same_comdat_group (cgraph_get_node (decl),
+				       cgraph_get_node (default_decl));
+    }
+
+  pop_cfun ();
+  current_function_decl = old_current_function_decl;
+
+  gcc_assert (ifunc_decl != NULL);
+  /* Mark ifunc_decl as "ifunc" with resolver as resolver_name.  */
+  DECL_ATTRIBUTES (ifunc_decl) 
+    = make_attribute ("ifunc", resolver_name, DECL_ATTRIBUTES (ifunc_decl));
+
+  /* Create the alias here.  */
+  cgraph_create_function_alias (ifunc_decl, decl);
+  return decl;
+}
+
+/* Make an ifunc declaration for the multi-versioned function DECL.  Calls to
+   DECL function will be replaced with calls to the ifunc.   Return the decl
+   of the ifunc created.  */
+
+static tree
+make_ifunc_func (const tree decl)
+{
+  tree ifunc_decl;
+  char *ifunc_name, *resolver_name;
+  tree fn_type, ifunc_type;
+  bool make_unique = false;
+
+  if (TREE_PUBLIC (decl) == 0)
+    make_unique = true;
+
+  ifunc_name = make_name (decl, "ifunc", make_unique);
+  resolver_name = make_name (decl, "resolver", make_unique);
+  gcc_assert (resolver_name);
+
+  fn_type = TREE_TYPE (decl);
+  ifunc_type = build_function_type (TREE_TYPE (fn_type),
+				    TYPE_ARG_TYPES (fn_type));
+  
+  ifunc_decl = build_fn_decl (ifunc_name, ifunc_type);
+  TREE_USED (ifunc_decl) = 1;
+  DECL_CONTEXT (ifunc_decl) = NULL_TREE;
+  DECL_INITIAL (ifunc_decl) = error_mark_node;
+  DECL_ARTIFICIAL (ifunc_decl) = 1;
+  /* Mark this ifunc as external, the resolver will flip it again if
+     it gets generated.  */
+  DECL_EXTERNAL (ifunc_decl) = 1;
+  /* IFUNCs have to be externally visible.  */
+  TREE_PUBLIC (ifunc_decl) = 1;
+
+  return ifunc_decl;  
+}
+
+/* For multi-versioned function decl, which should also be the default,
+   return the decl of the ifunc resolver, create it if it does not
+   exist.  */
+
+tree
+get_ifunc_for_version (const tree decl)
+{
+  version_function *decl_v;
+  int ix;
+  void_p ele;
+
+  /* DECL has to be the default version, otherwise it is missing and
+     that is not allowed.  */
+  if (!is_default_function (decl))
+    {
+      error_at (DECL_SOURCE_LOCATION (decl), "Default version not found");
+      return decl;
+    }
+
+  decl_v = find_function_version (decl);
+  gcc_assert (decl_v != NULL);
+  if (decl_v->ifunc_decl == NULL)
+    {
+      tree ifunc_decl;
+      ifunc_decl = make_ifunc_func (decl);
+      decl_v->ifunc_decl = ifunc_decl;
+    }
+
+  if (cgraph_get_node (decl))
+    cgraph_mark_force_output_node (cgraph_get_node (decl));
+
+  for (ix = 0; VEC_iterate (void_p, decl_v->versions, ix, ele); ++ix)
+    {
+      version_function *v = (version_function *) ele;
+      /* This could be a deleted version.  Happens with
+	 duplicate declarations. */
+      if (v->is_deleted)
+	continue;
+      gcc_assert (v->decl != NULL);
+      if (cgraph_get_node (v->decl))
+	cgraph_mark_force_output_node (cgraph_get_node (v->decl));
+    }
+
+  return decl_v->ifunc_decl;
+}
+
+/* Generate the dispatching code to dispatch multi-versioned function
+   DECL.  Make a new function decl for dispatching and call the target
+   hook to process the "target" attributes and provide the code to
+   dispatch the right function at run-time.  */
+
+static tree
+make_ifunc_resolver_for_version (const tree decl)
+{
+  version_function *decl_v;
+  tree ifunc_resolver_decl, ifunc_decl;
+  basic_block empty_bb;
+  int ix;
+  void_p ele;
+  VEC (tree, heap) *fn_ver_vec = NULL;
+  tree old_current_function_decl;
+
+  gcc_assert (is_default_function (decl));
+
+  decl_v = find_function_version (decl);
+  gcc_assert (decl_v != NULL);
+
+  if (decl_v->ifunc_resolver_decl != NULL)
+    return decl_v->ifunc_resolver_decl;
+
+  ifunc_decl = decl_v->ifunc_decl;
+
+  if (ifunc_decl == NULL)
+    ifunc_decl = decl_v->ifunc_decl = make_ifunc_func (decl);
+
+  ifunc_resolver_decl = make_ifunc_resolver_func (decl, ifunc_decl,
+						  &empty_bb);
+
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (ifunc_resolver_decl));
+  current_function_decl = ifunc_resolver_decl;
+
+  fn_ver_vec = VEC_alloc (tree, heap, 2);
+  VEC_safe_push (tree, heap, fn_ver_vec, decl);
+
+  for (ix = 0; VEC_iterate (void_p, decl_v->versions, ix, ele); ++ix)
+    {
+      version_function *v = (version_function *) ele;
+      gcc_assert (v->decl != NULL);
+      /* Check for virtual functions here again, as by this time it should
+	 have been determined if this function needs a vtable index or
+	 not.  This happens for methods in derived classes that override
+	 virtual methods in base classes but are not explicitly marked as
+	 virtual.  */
+      if (DECL_VINDEX (v->decl))
+        error_at (DECL_SOURCE_LOCATION (v->decl),
+		  "Virtual function versioning not supported\n");
+      if (!v->is_deleted)
+	VEC_safe_push (tree, heap, fn_ver_vec, v->decl);
+    }
+
+  gcc_assert (targetm.dispatch_version);
+  targetm.dispatch_version (ifunc_resolver_decl, fn_ver_vec, &empty_bb);
+  decl_v->ifunc_resolver_decl = ifunc_resolver_decl;
+
+  pop_cfun ();
+  current_function_decl = old_current_function_decl;
+  return ifunc_resolver_decl;
+}
+
+/* Main entry point to pass_dispatch_versions. For multi-versioned functions,
+   generate the dispatching code.  */
+
+static unsigned int
+do_dispatch_versions (void)
+{
+  /* A new pass for generating dispatch code for multi-versioned functions.
+     Other forms of dispatch can be added when ifunc support is not available
+     like just calling the function directly after checking for target type.
+     Currently, dispatching is done through IFUNC.  This pass will become
+     more meaningful when other dispatch mechanisms are added.  */
+
+  /* Cloning a function to produce more versions will happen here when the
+     user requests that via the target attribute. For example,
+     int foo () __attribute__ ((target(("arch=core2"), ("arch=corei7"))));
+     means that the user wants the same body of foo to be versioned for core2
+     and corei7.  In that case, this function will be cloned during this
+     pass.  */
+  
+  if (DECL_FUNCTION_VERSIONED (current_function_decl)
+      && is_default_function (current_function_decl))
+    {
+      tree decl = make_ifunc_resolver_for_version (current_function_decl);
+      if (dump_file && decl)
+	dump_function_to_file (decl, dump_file, TDF_BLOCKS);
+    }
+  return 0;
+}
+
+static bool
+gate_dispatch_versions (void)
+{
+  return true;
+}
+
+/* A pass to generate the dispatch code to execute the appropriate version
+   of a multi-versioned function at run-time.  */
+
+struct gimple_opt_pass pass_dispatch_versions =
+{
+ {
+  GIMPLE_PASS,
+  "dispatch_multiversion_functions",    /* name */
+  gate_dispatch_versions,		/* gate */
+  do_dispatch_versions,			/* execute */
+  NULL,					/* sub */
+  NULL,					/* next */
+  0,					/* static_pass_number */
+  TV_MULTIVERSION_DISPATCH,		/* tv_id */
+  PROP_cfg,				/* properties_required */
+  PROP_cfg,				/* properties_provided */
+  0,					/* properties_destroyed */
+  0,					/* todo_flags_start */
+  0					/* todo_flags_finish */
+ }
+};
Index: gcc/cgraphunit.c
===================================================================
--- gcc/cgraphunit.c	(revision 186883)
+++ gcc/cgraphunit.c	(working copy)
@@ -411,6 +411,13 @@ cgraph_finalize_function (tree decl, bool nested)
       && !DECL_DISREGARD_INLINE_LIMITS (decl))
     node->symbol.force_output = 1;
 
+  /* With function versions, keep inline functions and do not worry about
+     inline limits.  */
+  if (DECL_FUNCTION_VERSIONED (decl)
+      && DECL_DECLARED_INLINE_P (decl)
+      && !DECL_EXTERNAL (decl))
+    node->symbol.force_output = 1;
+
   /* When not optimizing, also output the static functions. (see
      PR24561), but don't do so for always_inline functions, functions
      declared inline and nested functions.  These were optimized out
Index: gcc/multiversion.h
===================================================================
--- gcc/multiversion.h	(revision 0)
+++ gcc/multiversion.h	(revision 0)
@@ -0,0 +1,55 @@
+/* Function Multiversioning.
+   Copyright (C) 2012 Free Software Foundation, Inc.
+   Contributed by Sriraman Tallam (tmsriram@google.com)
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>. */
+
+/* This is the header file which provides the functions to keep track
+   of functions that are multi-versioned and to generate the dispatch
+   code to call the right version at run-time.  */
+
+#ifndef GCC_MULTIVERSION_H
+#define GCC_MULTIVERSION_H
+
+#include "tree.h"
+
+/* Mark DECL1 and DECL2 as function versions.  */
+int group_function_versions (const tree decl1, const tree decl2);
+
+/* Mark DECL as deleted and no longer a version.  */
+void mark_delete_decl_version (const tree decl);
+
+/* Returns true if DECL is the default version to be executed if all
+   other versions are inappropriate at run-time.  */
+bool is_default_function (const tree decl);
+
+/* Gets the IFUNC dispatcher for this multi-versioned function DECL. DECL
+   must be the default function in the multi-versioned group.  */
+tree get_ifunc_for_version (const tree decl);
+
+/* Returns true when only one of DECL1 and DECL2 is marked with "target"
+   or if the "target" attribute strings of DECL1 and DECL2 don't match.  */
+bool has_different_version_attributes (const tree decl1, const tree decl2);
+
+/* Function DECL is marked to be a multi-versioned function.  If DECL is
+   not the default version, the assembler name of DECL is changed to include
+   the attribute string to keep the name unambiguous.  */
+void mark_function_as_version (const tree decl);
+
+/* Check if decl is FUNCTION_DECL with target attribute set.  */
+bool is_target_attribute_set (const tree decl);
+#endif
Index: gcc/testsuite/g++.dg/mv1.C
===================================================================
--- gcc/testsuite/g++.dg/mv1.C	(revision 0)
+++ gcc/testsuite/g++.dg/mv1.C	(revision 0)
@@ -0,0 +1,39 @@
+/* Simple test case to check if Multiversioning works.  */
+/* { dg-do run } */
+/* { dg-options "-O2 -fPIC" } */
+
+#include <assert.h>
+
+int foo ();
+int foo () __attribute__ ((target("arch=corei7,sse4.2,popcnt")));
+/* The target operands in this declaration and the definition are re-ordered.
+   This should still work.  */
+int foo () __attribute__ ((target("ssse3,avx2")));
+
+int (*p)() = &foo;
+int main ()
+{
+  return foo () + (*p)();
+}
+
+int foo ()
+{
+  return 0;
+}
+
+int __attribute__ ((target("arch=corei7,sse4.2,popcnt")))
+foo ()
+{
+  assert (__builtin_cpu_is ("corei7")
+	  && __builtin_cpu_supports ("sse4.2")
+	  && __builtin_cpu_supports ("popcnt"));
+  return 0;
+}
+
+int __attribute__ ((target("avx2,ssse3")))
+foo ()
+{
+  assert (__builtin_cpu_supports ("avx2")
+	  && __builtin_cpu_supports ("ssse3"));
+  return 0;
+}
Index: gcc/cp/class.c
===================================================================
--- gcc/cp/class.c	(revision 186883)
+++ gcc/cp/class.c	(working copy)
@@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-dump.h"
 #include "splay-tree.h"
 #include "pointer-set.h"
+#include "multiversion.h"
 
 /* The number of nested classes being processed.  If we are not in the
    scope of any class, this is zero.  */
@@ -1092,7 +1093,21 @@ add_method (tree type, tree method, tree using_dec
 	      || same_type_p (TREE_TYPE (fn_type),
 			      TREE_TYPE (method_type))))
 	{
-	  if (using_decl)
+	  /* For function versions, their parms and types match
+	     but they are not duplicates.  Record function versions
+	     as and when they are found.  */
+	  if (TREE_CODE (fn) == FUNCTION_DECL
+	      && TREE_CODE (method) == FUNCTION_DECL
+	      && (is_target_attribute_set (fn)
+		  || is_target_attribute_set (method))
+	      && has_different_version_attributes (fn, method))
+ 	    {
+	      mark_function_as_version (fn);
+	      mark_function_as_version (method);
+	      group_function_versions (fn, method);
+	      continue;
+	    }
+	  else if (using_decl)
 	    {
 	      if (DECL_CONTEXT (fn) == type)
 		/* Defer to the local function.  */
@@ -1150,6 +1165,7 @@ add_method (tree type, tree method, tree using_dec
   else
     /* Replace the current slot.  */
     VEC_replace (tree, method_vec, slot, overload);
+
   return true;
 }
 
@@ -6930,8 +6946,11 @@ resolve_address_of_overloaded_function (tree targe
 	  if (DECL_ANTICIPATED (fn))
 	    continue;
 
-	  /* See if there's a match.  */
-	  if (same_type_p (target_fn_type, static_fn_type (fn)))
+	  /* See if there's a match.  For multi-versioned functions, match
+	     only the default version.  */
+	  if (same_type_p (target_fn_type, static_fn_type (fn))
+	      && (!DECL_FUNCTION_VERSIONED (fn)
+		  || is_default_function (fn)))
 	    matches = tree_cons (fn, NULL_TREE, matches);
 	}
     }
@@ -7093,6 +7112,22 @@ resolve_address_of_overloaded_function (tree targe
       perform_or_defer_access_check (access_path, fn, fn);
     }
 
+  /* If a pointer to a function that is multi-versioned is requested, the
+     pointer to the dispatcher function is returned instead.  This works
+     well because indirectly calling the function will dispatch the right
+     function version at run-time. Also, the function address is kept
+     unique.  */
+  if (DECL_FUNCTION_VERSIONED (fn)
+      && is_default_function (fn))
+    {
+      tree ifunc_decl;
+      ifunc_decl = get_ifunc_for_version (fn);
+      gcc_assert (ifunc_decl != NULL);
+      retrofit_lang_decl (ifunc_decl);
+      mark_used (fn);
+      return build_fold_addr_expr (ifunc_decl);
+    }
+
   if (TYPE_PTRFN_P (target_type) || TYPE_PTRMEMFUNC_P (target_type))
     return cp_build_addr_expr (fn, flags);
   else
Index: gcc/cp/decl.c
===================================================================
--- gcc/cp/decl.c	(revision 186883)
+++ gcc/cp/decl.c	(working copy)
@@ -54,6 +54,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "pointer-set.h"
 #include "splay-tree.h"
 #include "plugin.h"
+#include "multiversion.h"
 
 /* Possible cases of bad specifiers type used by bad_specifiers. */
 enum bad_spec_place {
@@ -973,6 +974,21 @@ decls_match (tree newdecl, tree olddecl)
       if (t1 != t2)
 	return 0;
 
+      /* The decls don't match if they correspond to two different versions
+	 of the same function.  */
+      if (compparms (p1, p2)
+	  && same_type_p (TREE_TYPE (f1), TREE_TYPE (f2)) 
+	  && has_different_version_attributes (newdecl, olddecl))
+	{
+	  /* One of the decls could be the default without the "target"
+	     attribute. Set it to be a versioned function here.  */
+	  mark_function_as_version (newdecl);
+	  mark_function_as_version (olddecl);
+	  /* Accumulate all the versions of a function.  */
+	  group_function_versions (olddecl, newdecl);
+	  return 0;
+	}
+
       if (CP_DECL_CONTEXT (newdecl) != CP_DECL_CONTEXT (olddecl)
 	  && ! (DECL_EXTERN_C_P (newdecl)
 		&& DECL_EXTERN_C_P (olddecl)))
@@ -1490,7 +1506,11 @@ duplicate_decls (tree newdecl, tree olddecl, bool
 	      error ("previous declaration %q+#D here", olddecl);
 	      return NULL_TREE;
 	    }
-	  else if (compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)),
+	  /* For function versions, params and types match, but they
+	     are not ambiguous.  */
+	  else if ((!DECL_FUNCTION_VERSIONED (newdecl)
+		    && !DECL_FUNCTION_VERSIONED (olddecl))
+		   && compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)),
 			      TYPE_ARG_TYPES (TREE_TYPE (olddecl))))
 	    {
 	      error ("new declaration %q#D", newdecl);
@@ -2262,6 +2282,16 @@ duplicate_decls (tree newdecl, tree olddecl, bool
   else if (DECL_PRESERVE_P (newdecl))
     DECL_PRESERVE_P (olddecl) = 1;
 
+  /* If the olddecl is a version, so is the newdecl.  */
+  if (TREE_CODE (newdecl) == FUNCTION_DECL
+      && DECL_FUNCTION_VERSIONED (olddecl))
+    {
+      DECL_FUNCTION_VERSIONED (newdecl) = 1;
+      /* Record that newdecl is not a valid version and has
+	 been deleted.  */
+      mark_delete_decl_version (newdecl);
+    }
+
   if (TREE_CODE (newdecl) == FUNCTION_DECL)
     {
       int function_size;
@@ -3810,6 +3840,7 @@ cp_make_fname_decl (location_t loc, tree id, int t
 			    ? NULL : fname_as_string (type_dep));
   tree type;
   tree init = cp_fname_init (name, &type);
+
   tree decl = build_decl (loc, VAR_DECL, id, type);
 
   if (name)
@@ -14035,7 +14066,11 @@ cxx_comdat_group (tree decl)
 	  else
 	    break;
 	}
-      name = DECL_ASSEMBLER_NAME (decl);
+      if (TREE_CODE (decl) == FUNCTION_DECL
+	  && DECL_FUNCTION_VERSIONED (decl))
+	name = DECL_NAME (decl);
+      else
+        name = DECL_ASSEMBLER_NAME (decl);
     }
 
   return name;
Index: gcc/cp/semantics.c
===================================================================
--- gcc/cp/semantics.c	(revision 186883)
+++ gcc/cp/semantics.c	(working copy)
@@ -3783,8 +3783,11 @@ expand_or_defer_fn_1 (tree fn)
       /* If the user wants us to keep all inline functions, then mark
 	 this function as needed so that finish_file will make sure to
 	 output it later.  Similarly, all dllexport'd functions must
-	 be emitted; there may be callers in other DLLs.  */
-      if ((flag_keep_inline_functions
+	 be emitted; there may be callers in other DLLs.
+	 Also, mark this function as needed if it is marked inline but
+	 is a multi-versioned function.  */
+      if (((flag_keep_inline_functions
+	    || DECL_FUNCTION_VERSIONED (fn))
 	   && DECL_DECLARED_INLINE_P (fn)
 	   && !DECL_REALLY_EXTERN (fn))
 	  || (flag_keep_inline_dllexport
Index: gcc/cp/decl2.c
===================================================================
--- gcc/cp/decl2.c	(revision 186883)
+++ gcc/cp/decl2.c	(working copy)
@@ -53,6 +53,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "splay-tree.h"
 #include "langhooks.h"
 #include "c-family/c-ada-spec.h"
+#include "multiversion.h"
 
 extern cpp_reader *parse_in;
 
@@ -677,9 +678,13 @@ check_classfn (tree ctype, tree function, tree tem
 	  if (is_template != (TREE_CODE (fndecl) == TEMPLATE_DECL))
 	    continue;
 
+	  /* While finding a match, same types and params are not enough
+	     if the function is versioned.  Also check version ("target")
+	     attributes.  */
 	  if (same_type_p (TREE_TYPE (TREE_TYPE (function)),
 			   TREE_TYPE (TREE_TYPE (fndecl)))
 	      && compparms (p1, p2)
+	      && !has_different_version_attributes (function, fndecl)
 	      && (!is_template
 		  || comp_template_parms (template_parms,
 					  DECL_TEMPLATE_PARMS (fndecl)))
Index: gcc/cp/call.c
===================================================================
--- gcc/cp/call.c	(revision 186883)
+++ gcc/cp/call.c	(working copy)
@@ -41,6 +41,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "langhooks.h"
 #include "c-family/c-objc.h"
 #include "timevar.h"
+#include "multiversion.h"
 
 /* The various kinds of conversion.  */
 
@@ -3903,6 +3904,16 @@ build_new_function_call (tree fn, VEC(tree,gc) **a
     {
       if (complain & tf_error)
 	{
+	  /* If the call is to a multiversioned function without
+	     a default version, overload resolution will fail.  */
+	  if (candidates
+	      && TREE_CODE (candidates->fn) == FUNCTION_DECL
+	      && DECL_FUNCTION_VERSIONED (candidates->fn))
+	    error_at (location_of (DECL_NAME (OVL_CURRENT (fn))),
+		      "Call to multiversioned function %<%D(%A)%> with"
+		      " no default version", DECL_NAME (OVL_CURRENT (fn)),
+		      build_tree_list_vec (*args));
+
 	  if (!any_viable_p && candidates && ! candidates->next
 	      && (TREE_CODE (candidates->fn) == FUNCTION_DECL))
 	    return cp_build_function_call_vec (candidates->fn, args, complain);
@@ -6809,6 +6820,19 @@ build_over_call (struct z_candidate *cand, int fla
   if (!already_used)
     mark_used (fn);
 
+  /* For a call to a multi-versioned function, the call should actually be to
+     the dispatcher.  */
+  if (DECL_FUNCTION_VERSIONED (fn)
+      && is_default_function (fn))
+    {
+      tree ifunc_decl;
+      ifunc_decl = get_ifunc_for_version (fn);
+      gcc_assert (ifunc_decl != NULL);
+      retrofit_lang_decl (ifunc_decl);
+      return build_call_expr_loc_array (UNKNOWN_LOCATION, ifunc_decl,
+					nargs, argarray);
+    }
+
   if (DECL_VINDEX (fn) && (flags & LOOKUP_NONVIRTUAL) == 0)
     {
       tree t;
@@ -8067,6 +8091,60 @@ joust (struct z_candidate *cand1, struct z_candida
   size_t i;
   size_t len;
 
+  /* For candidates of a multi-versioned function, first check if the
+     target flags of the caller match any of the candidates.  If so,
+     the caller can directly call this candidate; otherwise the one marked
+     default wins.  This is because the default decl is used as key to
+     aggregate all the other versions provided for it in multiversion.c.
+     When generating the actual call, the appropriate dispatcher is created
+     to call the right function version at run-time.  */
+
+  if ((TREE_CODE (cand1->fn) == FUNCTION_DECL
+       && DECL_FUNCTION_VERSIONED (cand1->fn))
+      ||(TREE_CODE (cand2->fn) == FUNCTION_DECL
+	 && DECL_FUNCTION_VERSIONED (cand2->fn)))
+    {
+      /* Both functions must be marked versioned.  */
+      gcc_assert (DECL_FUNCTION_VERSIONED (cand1->fn)
+		  && DECL_FUNCTION_VERSIONED (cand2->fn));
+
+      /* Try to see if a direct call can be made to a version.  This is
+	 possible if the caller and callee have the same target flags.
+	 If cand->fn is marked with target attributes, check if the
+	 target approves inlining this into the caller.  If so, this is
+	 the version we want.  */
+
+      if (is_target_attribute_set (cand1->fn)
+	  && targetm.target_option.can_inline_p (current_function_decl,
+						 cand1->fn))
+	return 1;
+
+      if (is_target_attribute_set (cand2->fn)
+	  && targetm.target_option.can_inline_p (current_function_decl,
+						 cand2->fn))
+	return -1;
+
+      /* A direct call to a version is not possible, so find the default
+	 function and return it.  This will later be converted to dispatch
+	 the right version at run time.  */
+
+      if (is_default_function (cand1->fn))
+	{
+          mark_used (cand2->fn);
+	  return 1;
+	}
+
+      if (is_default_function (cand2->fn))
+	{
+          mark_used (cand1->fn);
+	  return -1;
+	}
+
+      /* If a default function is absent, this will never get resolved leading
+	 to an ambiguous call error.  */
+      return 0;
+    }
+
   /* Candidates that involve bad conversions are always worse than those
      that don't.  */
   if (cand1->viable > cand2->viable)
Index: gcc/timevar.def
===================================================================
--- gcc/timevar.def	(revision 186883)
+++ gcc/timevar.def	(working copy)
@@ -253,6 +253,7 @@ DEFTIMEVAR (TV_TREE_IFCOMBINE        , "tree if-co
 DEFTIMEVAR (TV_TREE_UNINIT           , "uninit var analysis")
 DEFTIMEVAR (TV_PLUGIN_INIT           , "plugin initialization")
 DEFTIMEVAR (TV_PLUGIN_RUN            , "plugin execution")
+DEFTIMEVAR (TV_MULTIVERSION_DISPATCH , "multiversion dispatch")
 
 /* Everything else in rest_of_compilation not included above.  */
 DEFTIMEVAR (TV_EARLY_LOCAL	     , "early local passes")
Index: gcc/Makefile.in
===================================================================
--- gcc/Makefile.in	(revision 186883)
+++ gcc/Makefile.in	(working copy)
@@ -1294,6 +1294,7 @@ OBJS = \
 	mcf.o \
 	mode-switching.o \
 	modulo-sched.o \
+	multiversion.o \
 	omega.o \
 	omp-low.o \
 	optabs.o \
@@ -3030,6 +3031,11 @@ ree.o : ree.c $(CONFIG_H) $(SYSTEM_H) coretypes.h
    $(DF_H) $(TIMEVAR_H) tree-pass.h $(RECOG_H) $(EXPR_H) \
    $(REGS_H) $(TREE_H) $(TM_P_H) insn-config.h $(INSN_ATTR_H) $(DIAGNOSTIC_CORE_H) \
    $(TARGET_H) $(OPTABS_H) insn-codes.h rtlhooks-def.h $(PARAMS_H) $(CGRAPH_H)
+multiversion.o : multiversion.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
+   $(TREE_H) langhooks.h $(TREE_INLINE_H) $(FLAGS_H) $(CGRAPH_H) intl.h \
+   $(DIAGNOSTIC_H) $(FIBHEAP_H) $(PARAMS_H) $(TIMEVAR_H) tree-pass.h \
+   $(HASHTAB_H) $(COVERAGE_H) $(GGC_H) $(TREE_FLOW_H) $(RTL_H) $(IPA_PROP_H) \
+   $(BASIC_BLOCK_H) $(TOPLEV_H) $(TREE_DUMP_H) ipa-inline.h
 cprop.o : cprop.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
    $(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h $(GGC_H) \
    $(RECOG_H) $(EXPR_H) $(BASIC_BLOCK_H) $(FUNCTION_H) output.h toplev.h $(DIAGNOSTIC_CORE_H) \
Index: gcc/passes.c
===================================================================
--- gcc/passes.c	(revision 186883)
+++ gcc/passes.c	(working copy)
@@ -1287,6 +1287,7 @@ init_optimization_passes (void)
   NEXT_PASS (pass_build_cfg);
   NEXT_PASS (pass_warn_function_return);
   NEXT_PASS (pass_build_cgraph_edges);
+  NEXT_PASS (pass_dispatch_versions);
   *p = NULL;
 
   /* Interprocedural optimization passes.  */
Index: gcc/cp/mangle.c
===================================================================
--- gcc/cp/mangle.c	(revision 186883)
+++ gcc/cp/mangle.c	(working copy)
@@ -1245,7 +1245,12 @@ write_unqualified_name (const tree decl)
     {
       MANGLE_TRACE_TREE ("local-source-name", decl);
       write_char ('L');
-      write_source_name (DECL_NAME (decl));
+      if (TREE_CODE (decl) == FUNCTION_DECL
+	  && DECL_FUNCTION_VERSIONED (decl)
+	  && DECL_ASSEMBLER_NAME_SET_P (decl))
+	write_source_name (DECL_ASSEMBLER_NAME (decl));
+      else
+	write_source_name (DECL_NAME (decl));
       /* The default discriminator is 1, and that's all we ever use,
 	 so there's no code to output one here.  */
     }
@@ -1260,7 +1265,14 @@ write_unqualified_name (const tree decl)
                && LAMBDA_TYPE_P (type))
         write_closure_type_name (type);
       else
-        write_source_name (DECL_NAME (decl));
+	{
+	  if (TREE_CODE (decl) == FUNCTION_DECL
+	      && DECL_FUNCTION_VERSIONED (decl)
+	      && DECL_ASSEMBLER_NAME_SET_P (decl))
+	    write_source_name (DECL_ASSEMBLER_NAME (decl));
+	  else
+	    write_source_name (DECL_NAME (decl));
+	}
     }
 }
 
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 186883)
+++ gcc/config/i386/i386.c	(working copy)
@@ -27678,6 +27678,326 @@ ix86_init_mmx_sse_builtins (void)
     }
 }
 
+
+/* This adds a condition to the basic_block NEW_BB in function FUNCTION_DECL
+   to return a pointer to VERSION_DECL if the outcome of the expression
+   formed by PREDICATE_CHAIN is true.  This function will be called during
+   version dispatch to decide which function version to execute.  It returns
+   the basic block at the end to which more conditions can be added.  */
+
+static basic_block
+add_condition_to_bb (tree function_decl, tree version_decl,
+		     tree predicate_chain, basic_block new_bb)
+{
+  gimple return_stmt;
+  tree convert_expr, result_var;
+  gimple convert_stmt;
+  gimple call_cond_stmt;
+  gimple if_else_stmt;
+
+  basic_block bb1, bb2, bb3;
+  edge e12, e23;
+
+  tree cond_var, and_expr_var = NULL_TREE;
+  gimple_seq gseq;
+
+  tree old_current_function_decl;
+  tree predicate_decl, predicate_arg;
+
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (function_decl));
+  current_function_decl = function_decl;
+
+  gcc_assert (new_bb != NULL);
+  gseq = bb_seq (new_bb);
+
+
+  convert_expr = build1 (CONVERT_EXPR, ptr_type_node,
+	     		 build_fold_addr_expr (version_decl));
+  result_var = create_tmp_var (ptr_type_node, NULL);
+  convert_stmt = gimple_build_assign (result_var, convert_expr); 
+  return_stmt = gimple_build_return (result_var);
+
+  if (predicate_chain == NULL_TREE)
+    {
+      gimple_seq_add_stmt (&gseq, convert_stmt);
+      gimple_seq_add_stmt (&gseq, return_stmt);
+      set_bb_seq (new_bb, gseq);
+      gimple_set_bb (convert_stmt, new_bb);
+      gimple_set_bb (return_stmt, new_bb);
+      pop_cfun ();
+      current_function_decl = old_current_function_decl;
+      return new_bb;
+    }
+
+  while (predicate_chain != NULL)
+    {
+      cond_var = create_tmp_var (integer_type_node, NULL);
+      predicate_decl = TREE_PURPOSE (predicate_chain);
+      predicate_arg = TREE_VALUE (predicate_chain);
+      call_cond_stmt = gimple_build_call (predicate_decl, 1, predicate_arg);
+      gimple_call_set_lhs (call_cond_stmt, cond_var);
+
+      gimple_set_block (call_cond_stmt, DECL_INITIAL (function_decl));
+      gimple_set_bb (call_cond_stmt, new_bb);
+      gimple_seq_add_stmt (&gseq, call_cond_stmt);
+
+      predicate_chain = TREE_CHAIN (predicate_chain);
+      
+      if (and_expr_var == NULL)
+        and_expr_var = cond_var;
+      else
+	{
+	  gimple assign_stmt;
+	  /* Use MIN_EXPR to check whether any of the integers is zero:
+	     and_expr_var = min_expr <cond_var, and_expr_var>  */
+	  assign_stmt = gimple_build_assign (and_expr_var,
+			  build2 (MIN_EXPR, integer_type_node,
+				  cond_var, and_expr_var));
+
+	  gimple_set_block (assign_stmt, DECL_INITIAL (function_decl));
+	  gimple_set_bb (assign_stmt, new_bb);
+	  gimple_seq_add_stmt (&gseq, assign_stmt);
+	}
+    }
+
+  if_else_stmt = gimple_build_cond (GT_EXPR, and_expr_var,
+	  		            integer_zero_node,
+				    NULL_TREE, NULL_TREE);
+  gimple_set_block (if_else_stmt, DECL_INITIAL (function_decl));
+  gimple_set_bb (if_else_stmt, new_bb);
+  gimple_seq_add_stmt (&gseq, if_else_stmt);
+
+  gimple_seq_add_stmt (&gseq, convert_stmt);
+  gimple_seq_add_stmt (&gseq, return_stmt);
+  set_bb_seq (new_bb, gseq);
+
+  bb1 = new_bb;
+  e12 = split_block (bb1, if_else_stmt);
+  bb2 = e12->dest;
+  e12->flags &= ~EDGE_FALLTHRU;
+  e12->flags |= EDGE_TRUE_VALUE;
+
+  e23 = split_block (bb2, return_stmt);
+
+  gimple_set_bb (convert_stmt, bb2);
+  gimple_set_bb (return_stmt, bb2);
+
+  bb3 = e23->dest;
+  make_edge (bb1, bb3, EDGE_FALSE_VALUE); 
+
+  remove_edge (e23);
+  make_edge (bb2, EXIT_BLOCK_PTR, 0);
+
+  rebuild_cgraph_edges ();
+
+  pop_cfun ();
+  current_function_decl = old_current_function_decl;
+
+  return bb3;
+}
+
+/* This parses the attribute arguments to target in DECL and determines
+   the right builtin to use to match the platform specification.
+   For now, only one target argument ("arch=" or "<-m>xxx") is allowed.  */
+
+static tree 
+get_builtin_code_for_version (tree decl)
+{
+  tree attrs;
+  struct cl_target_option cur_target;
+  tree target_node;
+  struct cl_target_option *new_target;
+  const char *arg_str = NULL;
+  const char *attrs_str = NULL;
+  char *tok_str = NULL;
+  char *token;
+  /* These are the target attribute strings for which a dispatcher is
+     available, from fold_builtin_cpu.  */
+  const char *feature_list[] = {"mmx", "popcnt", "sse", "sse2", "sse3",
+				"ssse3", "sse4.1", "sse4.2", "avx", "avx2"};
+  unsigned int NUM_FEATURES = sizeof (feature_list) / sizeof (const char *);
+  unsigned int i;
+  tree predicate_chain = NULL_TREE;
+  tree predicate_decl, predicate_arg;
+
+  attrs = lookup_attribute ("target", DECL_ATTRIBUTES (decl));
+  gcc_assert (attrs != NULL);
+
+  attrs = TREE_VALUE (TREE_VALUE (attrs));
+
+  gcc_assert (TREE_CODE (attrs) == STRING_CST);
+  attrs_str = TREE_STRING_POINTER (attrs);
+
+  /* Handle arch= if specified.  */
+  if (strstr (attrs_str, "arch=") != NULL)
+    {
+      cl_target_option_save (&cur_target, &global_options);
+      target_node = ix86_valid_target_attribute_tree (attrs);
+    
+      gcc_assert (target_node);
+      new_target = TREE_TARGET_OPTION (target_node);
+      gcc_assert (new_target);
+      
+      if (new_target->arch_specified && new_target->arch > 0)
+	{
+	  switch (new_target->arch)
+	    {
+	    case PROCESSOR_CORE2_32:
+	    case PROCESSOR_CORE2_64:
+	      arg_str = "core2";
+	      break;
+	    case PROCESSOR_COREI7_32:
+	    case PROCESSOR_COREI7_64:
+	      arg_str = "corei7";
+	      break;
+	    case PROCESSOR_ATOM:
+	      arg_str = "atom";
+	      break;
+	    case PROCESSOR_AMDFAM10:
+	      arg_str = "amdfam10h";
+	      break;
+	    case PROCESSOR_BDVER1:
+	      arg_str = "bdver1";
+	      break;
+	    case PROCESSOR_BDVER2:
+	      arg_str = "bdver2";
+	      break;
+	    }  
+	}    
+    
+      cl_target_option_restore (&global_options, &cur_target);
+      if (arg_str == NULL)
+	{
+	  error_at (DECL_SOURCE_LOCATION (decl),
+	    	"No dispatcher found for the versioning attributes");
+	  return NULL;
+	}
+    
+      predicate_decl = ix86_builtins [(int) IX86_BUILTIN_CPU_IS];
+      /* For a C string literal the length includes the trailing NUL.  */
+      predicate_arg = build_string_literal (strlen (arg_str) + 1, arg_str);
+      predicate_chain = tree_cons (predicate_decl, predicate_arg,
+				   predicate_chain);
+    }
+
+  /* Process feature name.  */
+  tok_str =  (char *) xmalloc (strlen (attrs_str) + 1);
+  strcpy (tok_str, attrs_str);
+  token = strtok (tok_str, ",");
+  predicate_decl = ix86_builtins [(int) IX86_BUILTIN_CPU_SUPPORTS];
+
+  while (token != NULL)
+    {
+      /* Do not process "arch="  */
+      if (strncmp (token, "arch=", 5) == 0)
+	{
+	  token = strtok (NULL, ",");
+	  continue;
+	}
+      for (i = 0; i < NUM_FEATURES; ++i)
+	{
+	  if (strcmp (token, feature_list[i]) == 0)
+	    {
+	      predicate_arg = build_string_literal (
+				strlen (feature_list[i]) + 1,
+				feature_list[i]);
+	      predicate_chain = tree_cons (predicate_decl, predicate_arg,
+					   predicate_chain);
+	      break;
+	    }
+	}
+      if (i == NUM_FEATURES)
+	{
+	  error_at (DECL_SOURCE_LOCATION (decl),
+		    "No dispatcher found for %s", token);
+	  return NULL;
+	}
+      token = strtok (NULL, ",");
+    }
+  free (tok_str);
+
+  if (predicate_chain == NULL_TREE)
+    {
+      error_at (DECL_SOURCE_LOCATION (decl),
+	        "No dispatcher found for the versioning attributes : %s",
+	        attrs_str);
+      return NULL;
+    }
+
+  predicate_chain = nreverse (predicate_chain);
+  return predicate_chain; 
+} 
+
+/* This is the target hook to generate the dispatch function for
+   multi-versioned functions.  DISPATCH_DECL is the function which will
+   contain the dispatch logic.  FNDECLS_P points to a vector of the function
+   choices for dispatch, default first.  EMPTY_BB is the basic block pointer
+   in DISPATCH_DECL in which the dispatch code is generated.  */
+
+static int
+ix86_dispatch_version (tree dispatch_decl,
+		       void *fndecls_p,
+		       basic_block *empty_bb)
+{
+  tree default_decl;
+  gimple ifunc_cpu_init_stmt;
+  gimple_seq gseq;
+  tree old_current_function_decl;
+  int ix;
+  tree ele;
+  VEC (tree, heap) *fndecls;
+
+  gcc_assert (dispatch_decl != NULL
+	      && fndecls_p != NULL
+	      && empty_bb != NULL);
+
+  /* fndecls_p is actually a vector.  */
+  fndecls = (VEC (tree, heap) *)fndecls_p;
+
+  /* At least one version other than the default.  */
+  gcc_assert (VEC_length (tree, fndecls) >= 2);
+
+  /* The first version in the vector is the default decl.  */
+  default_decl = VEC_index (tree, fndecls, 0);
+
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (dispatch_decl));
+  current_function_decl = dispatch_decl;
+
+  gseq = bb_seq (*empty_bb);
+  ifunc_cpu_init_stmt = gimple_build_call_vec (
+                     ix86_builtins [(int) IX86_BUILTIN_CPU_INIT], NULL);
+  gimple_seq_add_stmt (&gseq, ifunc_cpu_init_stmt);
+  gimple_set_bb (ifunc_cpu_init_stmt, *empty_bb);
+  set_bb_seq (*empty_bb, gseq);
+
+  pop_cfun ();
+  current_function_decl = old_current_function_decl;
+
+
+  for (ix = 1; VEC_iterate (tree, fndecls, ix, ele); ++ix)
+    {
+      tree version_decl = ele;
+      tree predicate_chain = NULL_TREE;
+      /* Get attribute string, parse it and find the right predicate decl.
+         The predicate function could be a lengthy combination of many
+	 features, like arch-type and various isa-variants.  */
+      predicate_chain = get_builtin_code_for_version (version_decl);
+
+      if (predicate_chain == NULL_TREE)
+	continue;
+
+      *empty_bb = add_condition_to_bb (dispatch_decl, version_decl,
+				       predicate_chain, *empty_bb);
+
+    }
+  /* Dispatch the default version at the end.  */
+  *empty_bb = add_condition_to_bb (dispatch_decl, default_decl,
+				   NULL, *empty_bb);
+  return 0;
+}
+
 /* This builds the processor_model struct type defined in
    libgcc/config/i386/i386-cpuinfo.c  */
 
@@ -39463,6 +39783,9 @@ ix86_autovectorize_vector_sizes (void)
 #undef TARGET_FOLD_BUILTIN
 #define TARGET_FOLD_BUILTIN ix86_fold_builtin
 
+#undef TARGET_DISPATCH_VERSION
+#define TARGET_DISPATCH_VERSION ix86_dispatch_version
+
 #undef TARGET_ENUM_VA_LIST_P
 #define TARGET_ENUM_VA_LIST_P ix86_enum_va_list
 

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-05-01 23:51                 ` Sriraman Tallam
@ 2012-05-02  0:09                   ` H.J. Lu
  2012-05-02  2:45                     ` Sriraman Tallam
  0 siblings, 1 reply; 93+ messages in thread
From: H.J. Lu @ 2012-05-02  0:09 UTC (permalink / raw)
  To: Sriraman Tallam
  Cc: Richard Guenther, Jan Hubicka, Uros Bizjak, reply, gcc-patches, David Li

On Tue, May 1, 2012 at 4:51 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> Hi,
>
> New patch attached, updated test case and fixed bugs related to
> __PRETTY_FUNCTION__.
>
> Patch also available for review here:  http://codereview.appspot.com/5752064

@@ -0,0 +1,39 @@
+/* Simple test case to check if Multiversioning works.  */
+/* { dg-do run } */
+/* { dg-options "-O2 -fPIC" } */
+
+#include <assert.h>
+
+int foo ();
+int foo () __attribute__ ((target("arch=corei7,sse4.2,popcnt")));
+/* The target operands in this declaration and the definition are re-ordered.
+   This should still work.  */
+int foo () __attribute__ ((target("ssse3,avx2")));
+
+int (*p)() = &foo;
+int main ()
+{
+  return foo () + (*p)();
+}
+
+int foo ()
+{
+  return 0;
+}
+
+int __attribute__ ((target("arch=corei7,sse4.2,popcnt")))
+foo ()
+{
+  assert (__builtin_cpu_is ("corei7")
+	  && __builtin_cpu_supports ("sse4.2")
+	  && __builtin_cpu_supports ("popcnt"));
+  return 0;
+}
+
+int __attribute__ ((target("avx2,ssse3")))
+foo ()
+{
+  assert (__builtin_cpu_supports ("avx2")
+	  && __builtin_cpu_supports ("ssse3"));
+  return 0;
+}

This test will pass if

int foo ()
{
 return 0;
}

is selected on processors with AVX.  The run-time test should
check that the right function is selected on the target processor,
not that the selected function matches the target attribute. You can
do it by returning different values for each foo and calling cpuid
to check if the right foo is selected.

You should add a testcase for __builtin_cpu_supports to check
all valid arguments.

-- 
H.J.

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-05-02  0:09                   ` H.J. Lu
@ 2012-05-02  2:45                     ` Sriraman Tallam
  2012-05-02 13:42                       ` H.J. Lu
  0 siblings, 1 reply; 93+ messages in thread
From: Sriraman Tallam @ 2012-05-02  2:45 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Richard Guenther, Jan Hubicka, Uros Bizjak, reply, gcc-patches, David Li

[-- Attachment #1: Type: text/plain, Size: 1904 bytes --]

Hi H.J,

   Done now. Patch attached.

Thanks,
-Sri.

On Tue, May 1, 2012 at 5:08 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Tue, May 1, 2012 at 4:51 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>> Hi,
>>
>> New patch attached, updated test case and fixed bugs related to
>> __PRETTY_FUNCTION__.
>>
>> Patch also available for review here:  http://codereview.appspot.com/5752064
>
> @@ -0,0 +1,39 @@
> +/* Simple test case to check if Multiversioning works.  */
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fPIC" } */
> +
> +#include <assert.h>
> +
> +int foo ();
> +int foo () __attribute__ ((target("arch=corei7,sse4.2,popcnt")));
> +/* The target operands in this declaration and the definition are re-ordered.
> +   This should still work.  */
> +int foo () __attribute__ ((target("ssse3,avx2")));
> +
> +int (*p)() = &foo;
> +int main ()
> +{
> +  return foo () + (*p)();
> +}
> +
> +int foo ()
> +{
> +  return 0;
> +}
> +
> +int __attribute__ ((target("arch=corei7,sse4.2,popcnt")))
> +foo ()
> +{
> +  assert (__builtin_cpu_is ("corei7")
> +         && __builtin_cpu_supports ("sse4.2")
> +         && __builtin_cpu_supports ("popcnt"));
> +  return 0;
> +}
> +
> +int __attribute__ ((target("avx2,ssse3")))
> +foo ()
> +{
> +  assert (__builtin_cpu_supports ("avx2")
> +         && __builtin_cpu_supports ("ssse3"));
> +  return 0;
> +}
>
> This test will pass if
>
> int foo ()
> {
>  return 0;
> }
>
> is selected on processors with AVX.  The run-time test should
> check that the right function is selected on the target processor,
> not that the selected function matches the target attribute. You can
> do it by returning different values for each foo and calling cpuid
> to check if the right foo is selected.
>
> You should add a testcase for __builtin_cpu_supports to check
> all valid arguments.
>
> --
> H.J.

[-- Attachment #2: mv_fe_patch.txt --]
[-- Type: text/plain, Size: 69163 bytes --]

Overview of the patch which adds front-end support to specify function versions.

Example:

int foo ();  /* Default version */
int foo () __attribute__ ((target("avx,popcnt")));/*Specialized for avx and popcnt */
int foo () __attribute__ ((target("arch=core2,ssse3")));/*Specialized for core2 and ssse3*/

int main ()
{
 int (*p)() = &foo;
 return foo () + (*p)();
}

int foo ()
{
 return 0;
}

int __attribute__ ((target("avx,popcnt")))
foo ()
{
 return 0;
}

int __attribute__ ((target("arch=core2,ssse3")))
foo ()
{
 return 0;
}

The above example has foo defined 3 times, but all 3 definitions of foo are
different versions of the same function. The calls to foo in main, directly and
via a pointer, are calls to the multi-versioned function foo, which is dispatched
to the right foo at run-time.

What does the patch do?

* Tracking decls that correspond to function versions of function
name, say "foo":

When the front-end sees more than one decl for "foo", with at least one decl
tagged with the "target" attribute, it marks them as function versions. To
prevent duplicate definition errors with other versions of "foo", I change the
"decls_match" function in cp/decl.c to return false when 2 decls have the
same signature but different target attributes. This causes all function
versions of "foo" to be added to the overload list of "foo".

* Change the assembler names of the function versions.

The front-end changes the assembler names of the function versions by appending
the sorted list of "target" arguments to the assembler name of "foo". For
example, the assembler name of "void foo () __attribute__ ((target ("sse4")))"
will become _Z3foov.sse4.
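
A minimal standalone sketch of this naming scheme, modeled on
sorted_attr_string and version_assembler_name in multiversion.c from this
patch: '=' is replaced by '_', the comma-separated arguments are sorted and
joined with '_', and the result is appended to the mangled name after a '.'.
The helper name versioned_asm_name, the fixed buffer size and MAX_ARGS are
assumptions made for illustration; they are not part of the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ARGS 16  /* Illustrative limit, not taken from the patch.  */

/* qsort comparator for an array of C strings.  */
static int
cmp_str (const void *a, const void *b)
{
  return strcmp (*(char *const *) a, *(char *const *) b);
}

/* Write "<MANGLED>.<sorted TARGET args>" into OUT, which holds OUT_LEN
   bytes.  Returns 0 on success, -1 if the input does not fit.  */
static int
versioned_asm_name (const char *mangled, const char *target,
                    char *out, size_t out_len)
{
  char buf[256];
  char *args[MAX_ARGS];
  size_t n = 0, i;
  char *tok;

  if (strlen (target) >= sizeof buf)
    return -1;
  strcpy (buf, target);

  /* "arch=core2" becomes "arch_core2", as in sorted_attr_string.  */
  for (i = 0; buf[i] != '\0'; i++)
    if (buf[i] == '=')
      buf[i] = '_';

  /* Split on commas; strtok skips empty tokens.  */
  for (tok = strtok (buf, ","); tok != NULL; tok = strtok (NULL, ","))
    {
      if (n == MAX_ARGS)
        return -1;
      args[n++] = tok;
    }
  if (n == 0)
    return -1;

  qsort (args, n, sizeof args[0], cmp_str);

  /* The joined arguments are never longer than TARGET itself.  */
  if (strlen (mangled) + 1 + strlen (target) + 1 > out_len)
    return -1;

  strcpy (out, mangled);
  strcat (out, ".");
  for (i = 0; i < n; i++)
    {
      if (i > 0)
        strcat (out, "_");
      strcat (out, args[i]);
    }
  return 0;
}
```

Because the arguments are sorted before they are joined, "ssse3,avx2" and
"avx2,ssse3" produce the same suffix, so a declaration and a definition that
list the operands in a different order still name the same version.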

* Separately group all function versions of "foo" together, in multiversion.c:

File multiversion.c maintains a hashtab, decl_version_htab, that maps
the default function decl of "foo" to the list of all other versions
of this function "foo". This is used when creating the dispatcher for
this function.

* Overload resolution:

Function "build_over_call" in cp/call.c sees a call to function
"foo", which is multi-versioned. The overload resolution happens in
function "joust" in "cp/call.c". Here, the call to "foo" has all
possible versions of "foo" as candidates. If the caller has target
attributes that match the target attributes of one of the function
versions, then a direct call is made to that function version.

For example:

int __attribute__ ((target ("avx,popcnt")))
baz ()
{
  return foo ();
}

If baz calls foo, which is multi-versioned, then the call to foo here
will become a direct call to the version of foo targeted to avx,popcnt.

When a direct call to a version cannot be made, the default
version of "foo" is the winning candidate. But "build_over_call" realizes
that this is a versioned function and replaces the call-site of foo with an
"ifunc" call for foo, by querying a function in "multiversion.c" which
builds the ifunc decl. After this, all such call-sites of "foo" contain the
call to the ifunc.
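
The decision described above can be sketched in plain C, outside the
compiler. This is only an illustration: struct version and
pick_direct_version are hypothetical names, and comparing already
normalized attribute strings for exact equality is an assumption about how
a match is determined, not the logic in "joust".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one function version.  TARGET is the
   normalized attribute string, or NULL for the default version.  */
struct version
{
  const char *target;
};

/* Return the index of the version a call can bind to directly, or -1
   when the call has to go through the ifunc dispatcher instead.  */
static int
pick_direct_version (const char *caller_target,
                     const struct version *versions, size_t n)
{
  size_t i;

  /* A caller without target attributes never gets a direct call.  */
  if (caller_target == NULL)
    return -1;

  for (i = 0; i < n; i++)
    if (versions[i].target != NULL
        && strcmp (versions[i].target, caller_target) == 0)
      return (int) i;

  return -1;
}
```

In this model a caller compiled for "avx_popcnt" binds straight to the
matching version, while every other caller gets -1 and is routed through the
dispatcher.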

* Creating the dispatcher:

The dispatcher is independently created in a new pass, called
"pass_dispatch_versions", that runs immediately after cfg and cgraph are
created. The dispatcher looks at all possible versions and queries the
target to give it the CPU detection predicates it must use to dispatch
each version. Then, the dispatcher body is created and the ifunc is
mapped to use this dispatcher.

Notice that only the dispatcher creation is done after the front-end.
Everything else occurs in the front-end itself. I could have created
the dispatcher also in the front-end. I did not do so because I
thought keeping it as a separate pass would make it easier to add more
dispatch mechanisms, for example replacing IFUNC, when it is not
available, with control-flow that makes direct calls to the function
versions. Also, making the dispatcher after cfg is created was easy.
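
As a rough picture of what the generated dispatcher body does, here is a
hand-written C model: each version is guarded by a predicate, the versions
are tested in turn, and the default version is returned at the end, which
mirrors the loop in ix86_dispatch_version. struct cpu_info, the predicate
functions and resolve_foo are hypothetical names; the real predicates come
from the target hook, and the real resolver takes no argument and is wired
up through IFUNC rather than called by hand. Each version returns a
different value so that a test can tell which one was selected.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*foo_fn) (void);

/* Hypothetical CPU description used only by this sketch.  */
struct cpu_info
{
  int has_avx;
  int has_popcnt;
  int is_core2;
  int has_ssse3;
};

/* The three versions of foo, distinguishable by return value.  */
static int foo_default (void) { return 0; }
static int foo_avx_popcnt (void) { return 1; }
static int foo_core2_ssse3 (void) { return 2; }

/* One predicate per non-default version.  */
static int
pred_avx_popcnt (const struct cpu_info *c)
{
  return c->has_avx && c->has_popcnt;
}

static int
pred_core2_ssse3 (const struct cpu_info *c)
{
  return c->is_core2 && c->has_ssse3;
}

struct candidate
{
  int (*predicate) (const struct cpu_info *);
  foo_fn fn;
};

/* Test the versions in list order and fall back to the default.  */
static foo_fn
resolve_foo (const struct cpu_info *cpu)
{
  static const struct candidate candidates[] = {
    { pred_avx_popcnt, foo_avx_popcnt },
    { pred_core2_ssse3, foo_core2_ssse3 },
  };
  size_t i;

  assert (cpu != NULL);
  for (i = 0; i < sizeof candidates / sizeof candidates[0]; i++)
    if (candidates[i].predicate (cpu))
      return candidates[i].fn;

  /* The default version is dispatched at the end.  */
  return foo_default;
}
```

The order in which versions are tested here is simply the order of the
table; how the patch orders versions when more than one predicate holds is
not modeled.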


	* doc/tm.texi.in: Add description for TARGET_DISPATCH_VERSION.
	* doc/tm.texi: Regenerate.
	* target.def (dispatch_version): New target hook.
	* tree.h (DECL_FUNCTION_VERSIONED): New macro.
	(tree_function_decl): New bit-field versioned_function.
	* tree-pass.h (pass_dispatch_versions): New pass.
	* multiversion.c: New file.
	* multiversion.h: New file.
	* cgraphunit.c:
	(cgraph_finalize_function): Force output of versioned inline
	functions.
	* cp/class.c: Include multiversion.h
	(add_method): aggregate function versions. Change assembler names of
	versioned functions.
	(resolve_address_of_overloaded_function): Match address of function
	version with default function.  Return address of ifunc dispatcher
	for address of versioned functions.
	(cxx_comdat_group): Use decl names for comdat groups of versioned
	functions.
	* cp/decl.c (decls_match): Make decls unmatched for versioned
	functions.
	(duplicate_decls): Remove ambiguity for versioned functions. Notify
	of deleted function version decls.
	(start_decl): Change assembler name of versioned functions.
	(start_function): Change assembler name of versioned functions.
	(cxx_comdat_group): Make comdat group of versioned functions be the
	same.
	* cp/semantics.c (expand_or_defer_fn_1): Mark as needed versioned
	functions that are also marked inline.
	* cp/decl2.c: Include multiversion.h
	(check_classfn): Check attributes of versioned functions for match.
	* cp/call.c: Include multiversion.h
	(build_over_call): Make calls to multiversioned functions to call the
	dispatcher.
	(joust): For calls to multi-versioned functions, make the default
	function win.
	* cp/mangle.c (write_unqualified_name): Use assembler name for
	versioned functions.
	* timevar.def (TV_MULTIVERSION_DISPATCH): New time var.
	* varasm.c (finish_aliases_1): Check if the alias points to a function
	with a body before giving an error.
	* Makefile.in: Add multiversion.o
	* passes.c: Add pass_dispatch_versions to the pass list.
	* config/i386/i386.c (add_condition_to_bb): New function.
	(get_builtin_code_for_version): New function.
	(ix86_dispatch_version): New function.
	(TARGET_DISPATCH_VERSION): New macro.
	* testsuite/g++.dg/mv1.C: New test.

Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi	(revision 186883)
+++ gcc/doc/tm.texi	(working copy)
@@ -10997,6 +10997,14 @@ The result is another tree containing a simplified
 call's result.  If @var{ignore} is true the value will be ignored.
 @end deftypefn
 
+@deftypefn {Target Hook} int TARGET_DISPATCH_VERSION (tree @var{dispatch_decl}, void *@var{fndecls}, basic_block *@var{empty_bb})
+For a multi-versioned function, this hook sets up the dispatcher.
+@var{dispatch_decl} is the function that will be used to dispatch the
+version. @var{fndecls} are the function choices for dispatch.
+@var{empty_bb} is a basic block in @var{dispatch_decl} where the
+code to do the dispatch will be added.
+@end deftypefn
+
 @deftypefn {Target Hook} {const char *} TARGET_INVALID_WITHIN_DOLOOP (const_rtx @var{insn})
 
 Take an instruction in @var{insn} and return NULL if it is valid within a
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in	(revision 186883)
+++ gcc/doc/tm.texi.in	(working copy)
@@ -10877,6 +10877,14 @@ The result is another tree containing a simplified
 call's result.  If @var{ignore} is true the value will be ignored.
 @end deftypefn
 
+@hook TARGET_DISPATCH_VERSION
+For a multi-versioned function, this hook sets up the dispatcher.
+@var{dispatch_decl} is the function that will be used to dispatch the
+version. @var{fndecls} are the function choices for dispatch.
+@var{empty_bb} is a basic block in @var{dispatch_decl} where the
+code to do the dispatch will be added.
+@end deftypefn
+
 @hook TARGET_INVALID_WITHIN_DOLOOP
 
 Take an instruction in @var{insn} and return NULL if it is valid within a
Index: gcc/target.def
===================================================================
--- gcc/target.def	(revision 186883)
+++ gcc/target.def	(working copy)
@@ -1249,6 +1249,15 @@ DEFHOOK
  tree, (tree fndecl, int n_args, tree *argp, bool ignore),
  hook_tree_tree_int_treep_bool_null)
 
+/* Target hook to generate the dispatching code for calls to multi-versioned
+   functions.  DISPATCH_DECL is the function that will have the dispatching
+   logic.  FNDECLS is the list of choices for dispatch and EMPTY_BB is the
+   basic block in DISPATCH_DECL which will contain the code.  */
+DEFHOOK
+(dispatch_version,
+ "",
+ int, (tree dispatch_decl, void *fndecls, basic_block *empty_bb), NULL)
+
 /* Returns a code for a target-specific builtin that implements
    reciprocal of the function, or NULL_TREE if not available.  */
 DEFHOOK
Index: gcc/tree.h
===================================================================
--- gcc/tree.h	(revision 186883)
+++ gcc/tree.h	(working copy)
@@ -3539,6 +3539,12 @@ extern VEC(tree, gc) **decl_debug_args_insert (tre
 #define DECL_FUNCTION_SPECIFIC_OPTIMIZATION(NODE) \
    (FUNCTION_DECL_CHECK (NODE)->function_decl.function_specific_optimization)
 
+/* In FUNCTION_DECL, this is set if this function has other versions generated
+   using "target" attributes.  The default version is the one which does not
+   have any "target" attribute set. */
+#define DECL_FUNCTION_VERSIONED(NODE)\
+   (FUNCTION_DECL_CHECK (NODE)->function_decl.versioned_function)
+
 /* FUNCTION_DECL inherits from DECL_NON_COMMON because of the use of the
    arguments/result/saved_tree fields by front ends.   It was either inherit
    FUNCTION_DECL from non_common, or inherit non_common from FUNCTION_DECL,
@@ -3583,8 +3589,8 @@ struct GTY(()) tree_function_decl {
   unsigned looping_const_or_pure_flag : 1;
   unsigned has_debug_args_flag : 1;
   unsigned tm_clone_flag : 1;
-
-  /* 1 bit left */
+  unsigned versioned_function : 1;
+  /* No bits left.  */
 };
 
 /* The source language of the translation-unit.  */
Index: gcc/tree-pass.h
===================================================================
--- gcc/tree-pass.h	(revision 186883)
+++ gcc/tree-pass.h	(working copy)
@@ -453,6 +453,7 @@ extern struct gimple_opt_pass pass_tm_memopt;
 extern struct gimple_opt_pass pass_tm_edges;
 extern struct gimple_opt_pass pass_split_functions;
 extern struct gimple_opt_pass pass_feedback_split_functions;
+extern struct gimple_opt_pass pass_dispatch_versions;
 
 /* IPA Passes */
 extern struct simple_ipa_opt_pass pass_ipa_lower_emutls;
Index: gcc/multiversion.c
===================================================================
--- gcc/multiversion.c	(revision 0)
+++ gcc/multiversion.c	(revision 0)
@@ -0,0 +1,832 @@
+/* Function Multiversioning.
+   Copyright (C) 2012 Free Software Foundation, Inc.
+   Contributed by Sriraman Tallam (tmsriram@google.com)
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>. */
+
+/* Holds the state for multi-versioned functions here. The front-end
+   updates the state as and when function versions are encountered.
+   This is then used to generate the dispatch code.  Also, the
+   optimization passes to clone hot paths involving versioned functions
+   will be done here.
+
+   Function versions are created by using the same function signature but
+   also tagging attribute "target" to specify the platform type for which
+   the version must be executed.  Here is an example:
+
+   int foo ()
+   {
+     printf ("Execute as default");
+     return 0;
+   }
+
+   int  __attribute__ ((target ("arch=corei7")))
+   foo ()
+   {
+     printf ("Execute for corei7");
+     return 0;
+   }
+   
+   int main ()
+   {
+     return foo ();
+   } 
+
+   The call to foo in main is replaced with a call to an IFUNC function that
+   contains the dispatch code to call the correct function version at
+   run-time.  */
+
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "tree.h"
+#include "tree-inline.h"
+#include "langhooks.h"
+#include "flags.h"
+#include "cgraph.h"
+#include "diagnostic.h"
+#include "toplev.h"
+#include "timevar.h"
+#include "params.h"
+#include "fibheap.h"
+#include "intl.h"
+#include "tree-pass.h"
+#include "hashtab.h"
+#include "coverage.h"
+#include "ggc.h"
+#include "tree-flow.h"
+#include "rtl.h"
+#include "ipa-prop.h"
+#include "basic-block.h"
+#include "toplev.h"
+#include "dbgcnt.h"
+#include "tree-dump.h"
+#include "output.h"
+#include "vecprim.h"
+#include "gimple-pretty-print.h"
+#include "ipa-inline.h"
+#include "target.h"
+#include "multiversion.h"
+
+typedef void * void_p;
+
+DEF_VEC_P (void_p);
+DEF_VEC_ALLOC_P (void_p, heap);
+
+/* Each function decl that is a function version gets an instance of this
+   structure.   Since this is called by the front-end, decl merging can
+   happen, where a decl created for a new declaration is merged with 
+   the old. In this case, the new decl is deleted and the IS_DELETED
+   field is set for the struct instance corresponding to the new decl.
+   IFUNC_DECL is the decl of the ifunc function for default decls.
+   IFUNC_RESOLVER_DECL is the decl of the dispatch function.  VERSIONS
+   is a vector containing the list of function versions  that are
+   the candidates for dispatch.  */
+
+typedef struct version_function_d {
+  tree decl;
+  tree ifunc_decl;
+  tree ifunc_resolver_decl;
+  VEC (void_p, heap) *versions;
+  bool is_deleted;
+} version_function;
+
+/* Hashmap has an entry for every function decl that has other function
+   versions.  For function decls that are the default, it also stores the
+   list of all the other function versions.  Each entry is a structure
+   of type version_function_d.  */
+static htab_t decl_version_htab = NULL;
+
+/* Hashtable helpers for decl_version_htab. */
+
+static hashval_t
+decl_version_htab_hash_descriptor (const void *p)
+{
+  const version_function *t = (const version_function *) p;
+  return htab_hash_pointer (t->decl);
+}
+
+/* Hashtable helper for decl_version_htab. */
+
+static int
+decl_version_htab_eq_descriptor (const void *p1, const void *p2)
+{
+  const version_function *t1 = (const version_function *) p1;
+  return htab_eq_pointer ((const void_p) t1->decl, p2);
+}
+
+/* Create the decl_version_htab.  */
+static void
+create_decl_version_htab (void)
+{
+  if (decl_version_htab == NULL)
+    decl_version_htab = htab_create (10, decl_version_htab_hash_descriptor,
+				     decl_version_htab_eq_descriptor, NULL);
+}
+
+/* Creates an instance of version_function for decl DECL.  */
+
+static version_function*
+new_version_function (const tree decl)
+{
+  version_function *v;
+  v = (version_function *)xmalloc(sizeof (version_function));
+  v->decl = decl;
+  v->ifunc_decl = NULL;
+  v->ifunc_resolver_decl = NULL;
+  v->versions = NULL;
+  v->is_deleted = false;
+  return v;
+}
+
+/* Comparator function to be used in qsort routine to sort attribute
+   specification strings to "target".  */
+
+static int
+attr_strcmp (const void *v1, const void *v2)
+{
+  const char *c1 = *(char *const*)v1;
+  const char *c2 = *(char *const*)v2;
+  return strcmp (c1, c2);
+}
+
+/* STR is the argument to target attribute.  This function tokenizes
+   the comma separated arguments, sorts them and returns a string which
+   is a unique identifier for the comma separated arguments.  */
+
+static char *
+sorted_attr_string (const char *str)
+{
+  char **args = NULL;
+  char *attr_str, *ret_str;
+  char *attr = NULL;
+  unsigned int argnum = 1;
+  unsigned int i;
+
+  for (i = 0; i < strlen (str); i++)
+    if (str[i] == ',')
+      argnum++;
+
+  attr_str = (char *)xmalloc (strlen (str) + 1);
+  strcpy (attr_str, str);
+
+  for (i = 0; i < strlen (attr_str); i++)
+    if (attr_str[i] == '=')
+      attr_str[i] = '_';
+
+  if (argnum == 1)
+    return attr_str;
+
+  args = (char **)xmalloc (argnum * sizeof (char *));
+
+  i = 0;
+  attr = strtok (attr_str, ",");
+  while (attr != NULL)
+    {
+      args[i] = attr;
+      i++;
+      attr = strtok (NULL, ",");
+    }
+
+  qsort (args, argnum, sizeof (char*), attr_strcmp);
+
+  ret_str = (char *)xmalloc (strlen (str) + 1);
+  strcpy (ret_str, args[0]);
+  for (i = 1; i < argnum; i++)
+    {
+      strcat (ret_str, "_");
+      strcat (ret_str, args[i]);
+    }
+
+  free (args);
+  free (attr_str);
+  return ret_str;
+}
+
+/* Returns true when only one of DECL1 and DECL2 is marked with "target"
+   or if the "target" attribute strings of DECL1 and DECL2 don't match.  */
+
+bool
+has_different_version_attributes (const tree decl1, const tree decl2)
+{
+  tree attr1, attr2;
+  char *c1, *c2;
+  bool ret = false;
+
+  if (TREE_CODE (decl1) != FUNCTION_DECL
+      || TREE_CODE (decl2) != FUNCTION_DECL)
+    return false;
+
+  attr1 = lookup_attribute ("target", DECL_ATTRIBUTES (decl1));
+  attr2 = lookup_attribute ("target", DECL_ATTRIBUTES (decl2));
+
+  if (attr1 == NULL_TREE && attr2 == NULL_TREE)
+    return false;
+
+  if ((attr1 == NULL_TREE && attr2 != NULL_TREE)
+      || (attr1 != NULL_TREE && attr2 == NULL_TREE))
+    return true;
+
+  c1 = sorted_attr_string (
+	TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr1))));
+  c2 = sorted_attr_string (
+	TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr2))));
+
+  if (strcmp (c1, c2) != 0)
+     ret = true;
+
+  free (c1);
+  free (c2);
+
+  return ret;
+}
+
+/* If this decl corresponds to a function and has "target" attribute,
+   append the attribute string to its assembler name.  */
+
+static void
+version_assembler_name (const tree decl)
+{
+  tree version_attr;
+  const char *orig_name, *version_string, *attr_str;
+  char *assembler_name;
+  tree assembler_name_tree;
+  
+  if (TREE_CODE (decl) != FUNCTION_DECL)
+    return;
+
+  if (DECL_DECLARED_INLINE_P (decl)
+      &&lookup_attribute ("gnu_inline",
+			  DECL_ATTRIBUTES (decl)))
+    error_at (DECL_SOURCE_LOCATION (decl),
+	      "Function versions cannot be marked as gnu_inline,"
+	      " bodies have to be generated\n");
+
+  if (DECL_VIRTUAL_P (decl)
+      || DECL_VINDEX (decl))
+    error_at (DECL_SOURCE_LOCATION (decl),
+	      "Virtual function versioning not supported\n");
+
+  version_attr = lookup_attribute ("target", DECL_ATTRIBUTES (decl));
+  /* target attribute string is NULL for default functions.  */
+  if (version_attr == NULL_TREE)
+    return;
+
+  orig_name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
+  version_string
+    = TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (version_attr)));
+
+  attr_str = sorted_attr_string (version_string);
+  assembler_name = (char *) xmalloc (strlen (orig_name)
+				     + strlen (attr_str) + 2);
+
+  sprintf (assembler_name, "%s.%s", orig_name, attr_str);
+  if (dump_file)
+    fprintf (dump_file, "Assembler name set to %s for function version %s\n",
+	     assembler_name, IDENTIFIER_POINTER (DECL_NAME (decl)));
+
+  assembler_name_tree = get_identifier (assembler_name);
+
+  SET_DECL_ASSEMBLER_NAME (decl, assembler_name_tree);
+  SET_DECL_RTL (decl, NULL);
+}
+
+void
+mark_function_as_version (const tree decl)
+{
+  if (DECL_FUNCTION_VERSIONED (decl))
+    return;
+  DECL_FUNCTION_VERSIONED (decl) = 1;
+  version_assembler_name (decl);
+}
+
+/* Returns true if function DECL has target attribute set.  This could be
+   a version.  */
+
+bool
+is_target_attribute_set (const tree decl)
+{
+  return (TREE_CODE (decl) == FUNCTION_DECL
+	  && (lookup_attribute ("target", DECL_ATTRIBUTES (decl))
+	      != NULL_TREE));
+}
+
+/* Returns true if decl is multi-versioned and DECL is the default function,
+   that is it is not tagged with "target" attribute.  */
+
+bool
+is_default_function (const tree decl)
+{
+  return (TREE_CODE (decl) == FUNCTION_DECL
+	  && DECL_FUNCTION_VERSIONED (decl)
+	  && (lookup_attribute ("target", DECL_ATTRIBUTES (decl))
+	      == NULL_TREE));	
+}
+
+/* For function decl DECL, find the version_function struct in the
+   decl_version_htab.  */
+
+static version_function *
+find_function_version (const tree decl)
+{
+  void *slot;
+
+  if (!DECL_FUNCTION_VERSIONED (decl))
+    return NULL;
+
+  if (!decl_version_htab)
+    return NULL;
+
+  slot = htab_find_with_hash (decl_version_htab, decl,
+                              htab_hash_pointer (decl));
+
+  if (slot != NULL)
+    return (version_function *)slot;
+
+  return NULL;
+}
+
+/* Record DECL as a function version by creating a version_function struct
+   for it and storing it in the hashtable.  */
+
+static version_function *
+add_function_version (const tree decl)
+{
+  void **slot;
+  version_function *v;
+
+  if (!DECL_FUNCTION_VERSIONED (decl))
+    return NULL;
+
+  create_decl_version_htab ();
+
+  slot = htab_find_slot_with_hash (decl_version_htab, (const void_p)decl,
+                                   htab_hash_pointer ((const void_p)decl),
+				   INSERT);
+
+  if (*slot != NULL)
+    return (version_function *)*slot;
+
+  v = new_version_function (decl);
+  *slot = v;
+
+  return v;
+}
+
+/* Push V into VEC only if it is not already present.  If already present
+   returns false.  */
+
+static bool
+push_function_version (version_function *v, VEC (void_p, heap) **vec)
+{
+  int ix;
+  void_p ele; 
+  for (ix = 0; VEC_iterate (void_p, *vec, ix, ele); ++ix)
+    {
+      if (ele == (void_p)v)
+        return false;
+    }
+
+  VEC_safe_push (void_p, heap, *vec, (void*)v);
+  return true;
+}
+ 
+/* Mark DECL as deleted.  This is called by the front-end when a duplicate
+   decl is merged with the original decl and the duplicate decl is deleted.
+   This function marks the duplicate_decl as invalid.  Called by
+   duplicate_decls in cp/decl.c.  */
+
+void
+mark_delete_decl_version (const tree decl)
+{
+  version_function *decl_v;
+
+  decl_v = find_function_version (decl);
+
+  if (decl_v == NULL)
+    return;
+
+  decl_v->is_deleted = true;
+
+  if (is_default_function (decl)
+      && decl_v->versions != NULL)
+    {
+      VEC_truncate (void_p, decl_v->versions, 0);
+      VEC_free (void_p, heap, decl_v->versions);
+      decl_v->versions = NULL;
+    }
+}
+
+/* Mark DECL1 and DECL2 to be function versions in the same group.  One
+   of DECL1 and DECL2 must be the default, otherwise this function does
+   nothing.  This function aggregates the versions.  */
+
+int
+group_function_versions (const tree decl1, const tree decl2)
+{
+  tree default_decl, version_decl;
+  version_function *default_v, *version_v;
+
+  gcc_assert (DECL_FUNCTION_VERSIONED (decl1)
+	      && DECL_FUNCTION_VERSIONED (decl2));
+
+  /* The version decls are added only to the default decl.  */
+  if (!is_default_function (decl1)
+      && !is_default_function (decl2))
+    return 0;
+
+  /* This can happen with duplicate declarations.  Just ignore.  */
+  if (is_default_function (decl1)
+      && is_default_function (decl2))
+    return 0;
+
+  default_decl = (is_default_function (decl1)) ? decl1 : decl2;
+  version_decl = (default_decl == decl1) ? decl2 : decl1;
+
+  gcc_assert (default_decl != version_decl);
+  create_decl_version_htab ();
+
+  /* If the version function is found, it has been added.  */
+  if (find_function_version (version_decl))
+    return 0;
+
+  default_v = add_function_version (default_decl);
+  version_v = add_function_version (version_decl);
+
+  if (default_v->versions == NULL)
+    default_v->versions = VEC_alloc (void_p, heap, 1);
+
+  push_function_version (version_v, &default_v->versions);
+  return 0;
+}
+
+/* Makes a function attribute of the form NAME(ARG_NAME) and chains
+   it to CHAIN.  */
+
+static tree
+make_attribute (const char *name, const char *arg_name, tree chain)
+{
+  tree attr_name;
+  tree attr_arg_name;
+  tree attr_args;
+  tree attr;
+
+  attr_name = get_identifier (name);
+  attr_arg_name = build_string (strlen (arg_name), arg_name);
+  attr_args = tree_cons (NULL_TREE, attr_arg_name, NULL_TREE);
+  attr = tree_cons (attr_name, attr_args, chain);
+  return attr;
+}
+
+/* Return a new name by appending SUFFIX to the DECL name.  If
+   make_unique is true, append the full path name.  */
+
+static char *
+make_name (tree decl, const char *suffix, bool make_unique)
+{
+  char *global_var_name;
+  int name_len;
+  const char *name;
+  const char *unique_name = NULL;
+
+  name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
+
+  /* Get a unique name that can be used globally without any chances
+     of collision at link time.  */
+  if (make_unique)
+    unique_name = IDENTIFIER_POINTER (get_file_function_name ("\0"));
+
+  name_len = strlen (name) + strlen (suffix) + 2;
+
+  if (make_unique)
+    name_len += strlen (unique_name) + 1;
+  global_var_name = (char *) xmalloc (name_len);
+
+  /* Use '.' to concatenate names as it is demangler friendly.  */
+  if (make_unique)
+      snprintf (global_var_name, name_len, "%s.%s.%s", name,
+		unique_name, suffix);
+  else
+      snprintf (global_var_name, name_len, "%s.%s", name, suffix);
+
+  return global_var_name;
+}
+
+/* Make the resolver function decl for ifunc (IFUNC_DECL) to dispatch
+   the versions of multi-versioned function DEFAULT_DECL.  Create an
+   empty basic block in the resolver and store the pointer in
+   EMPTY_BB.  Return the decl of the resolver function.  */
+
+static tree
+make_ifunc_resolver_func (const tree default_decl,
+			  const tree ifunc_decl,
+			  basic_block *empty_bb)
+{
+  char *resolver_name;
+  tree decl, type, decl_name, t;
+  basic_block new_bb;
+  tree old_current_function_decl;
+  bool make_unique = false;
+
+  /* IFUNC's have to be globally visible.  So, if the default_decl is
+     not, then the name of the IFUNC should be made unique.  */
+  if (TREE_PUBLIC (default_decl) == 0)
+    make_unique = true;
+
+  /* Append the filename to the resolver function if the versions are
+     not externally visible.  This is because the resolver function has
+     to be externally visible for the loader to find it.  So, appending
+     the filename will prevent conflicts with a resolver function from
+     another module which is based on the same version name.  */
+  resolver_name = make_name (default_decl, "resolver", make_unique);
+
+  /* The resolver function should return a (void *). */
+  type = build_function_type_list (ptr_type_node, NULL_TREE);
+
+  decl = build_fn_decl (resolver_name, type);
+  decl_name = get_identifier (resolver_name);
+  SET_DECL_ASSEMBLER_NAME (decl, decl_name);
+
+  DECL_NAME (decl) = decl_name;
+  TREE_USED (decl) = TREE_USED (default_decl);
+  DECL_ARTIFICIAL (decl) = 1;
+  DECL_IGNORED_P (decl) = 0;
+  /* IFUNC resolvers have to be externally visible.  */
+  TREE_PUBLIC (decl) = 1;
+  DECL_UNINLINABLE (decl) = 1;
+
+  DECL_EXTERNAL (decl) = DECL_EXTERNAL (default_decl);
+  DECL_EXTERNAL (ifunc_decl) = 0;
+
+  DECL_CONTEXT (decl) = NULL_TREE;
+  DECL_INITIAL (decl) = make_node (BLOCK);
+  DECL_STATIC_CONSTRUCTOR (decl) = 0;
+  TREE_READONLY (decl) = 0;
+  DECL_PURE_P (decl) = 0;
+  DECL_COMDAT (decl) = DECL_COMDAT (default_decl);
+  if (DECL_COMDAT_GROUP (default_decl))
+    {
+      make_decl_one_only (decl, DECL_COMDAT_GROUP (default_decl));
+    }
+  /* Build result decl and add to function_decl. */
+  t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, ptr_type_node);
+  DECL_ARTIFICIAL (t) = 1;
+  DECL_IGNORED_P (t) = 1;
+  DECL_RESULT (decl) = t;
+
+  gimplify_function_tree (decl);
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (decl));
+  current_function_decl = decl;
+  init_empty_tree_cfg_for_function (DECL_STRUCT_FUNCTION (decl));
+  cfun->curr_properties |=
+    (PROP_gimple_lcf | PROP_gimple_leh | PROP_cfg | PROP_referenced_vars |
+     PROP_ssa);
+  new_bb = create_empty_bb (ENTRY_BLOCK_PTR);
+  make_edge (ENTRY_BLOCK_PTR, new_bb, EDGE_FALLTHRU);
+  make_edge (new_bb, EXIT_BLOCK_PTR, 0);
+  *empty_bb = new_bb;
+
+  cgraph_add_new_function (decl, true);
+  cgraph_call_function_insertion_hooks (cgraph_get_create_node (decl));
+  cgraph_mark_force_output_node (cgraph_get_create_node (decl));
+
+  if (DECL_COMDAT_GROUP (default_decl))
+    {
+      gcc_assert (cgraph_get_node (default_decl));
+      cgraph_add_to_same_comdat_group (cgraph_get_node (decl),
+				       cgraph_get_node (default_decl));
+    }
+
+  pop_cfun ();
+  current_function_decl = old_current_function_decl;
+
+  gcc_assert (ifunc_decl != NULL);
+  /* Mark ifunc_decl as "ifunc" with resolver as resolver_name.  */
+  DECL_ATTRIBUTES (ifunc_decl) 
+    = make_attribute ("ifunc", resolver_name, DECL_ATTRIBUTES (ifunc_decl));
+
+  /* Create the alias here.  */
+  cgraph_create_function_alias (ifunc_decl, decl);
+  return decl;
+}
+
+/* Make an ifunc declaration for the multi-versioned function DECL.  Calls to
+   DECL will be replaced with calls to the ifunc.  Return the decl of the
+   ifunc created.  */
+
+static tree
+make_ifunc_func (const tree decl)
+{
+  tree ifunc_decl;
+  char *ifunc_name, *resolver_name;
+  tree fn_type, ifunc_type;
+  bool make_unique = false;
+
+  if (TREE_PUBLIC (decl) == 0)
+    make_unique = true;
+
+  ifunc_name = make_name (decl, "ifunc", make_unique);
+  resolver_name = make_name (decl, "resolver", make_unique);
+  gcc_assert (resolver_name);
+
+  fn_type = TREE_TYPE (decl);
+  ifunc_type = build_function_type (TREE_TYPE (fn_type),
+				    TYPE_ARG_TYPES (fn_type));
+  
+  ifunc_decl = build_fn_decl (ifunc_name, ifunc_type);
+  TREE_USED (ifunc_decl) = 1;
+  DECL_CONTEXT (ifunc_decl) = NULL_TREE;
+  DECL_INITIAL (ifunc_decl) = error_mark_node;
+  DECL_ARTIFICIAL (ifunc_decl) = 1;
+  /* Mark this ifunc as external; the resolver will clear the flag if
+     it gets generated.  */
+  DECL_EXTERNAL (ifunc_decl) = 1;
+  /* IFUNCs have to be externally visible.  */
+  TREE_PUBLIC (ifunc_decl) = 1;
+
+  return ifunc_decl;  
+}
+
+/* For multi-versioned function DECL, which must also be the default
+   version, return the decl of the ifunc, creating it if it does not
+   exist.  */
+
+tree
+get_ifunc_for_version (const tree decl)
+{
+  version_function *decl_v;
+  int ix;
+  void_p ele;
+
+  /* DECL has to be the default version, otherwise it is missing and
+     that is not allowed.  */
+  if (!is_default_function (decl))
+    {
+      error_at (DECL_SOURCE_LOCATION (decl), "default version not found");
+      return decl;
+    }
+
+  decl_v = find_function_version (decl);
+  gcc_assert (decl_v != NULL);
+  if (decl_v->ifunc_decl == NULL)
+    {
+      tree ifunc_decl;
+      ifunc_decl = make_ifunc_func (decl);
+      decl_v->ifunc_decl = ifunc_decl;
+    }
+
+  if (cgraph_get_node (decl))
+    cgraph_mark_force_output_node (cgraph_get_node (decl));
+
+  for (ix = 0; VEC_iterate (void_p, decl_v->versions, ix, ele); ++ix)
+    {
+      version_function *v = (version_function *) ele;
+      /* This could be a deleted version.  Happens with
+	 duplicate declarations. */
+      if (v->is_deleted)
+	continue;
+      gcc_assert (v->decl != NULL);
+      if (cgraph_get_node (v->decl))
+	cgraph_mark_force_output_node (cgraph_get_node (v->decl));
+    }
+
+  return decl_v->ifunc_decl;
+}
+
+/* Generate the dispatching code to dispatch multi-versioned function
+   DECL.  Make a new function decl for dispatching and call the target
+   hook to process the "target" attributes and provide the code to
+   dispatch the right function at run-time.  */
+
+static tree
+make_ifunc_resolver_for_version (const tree decl)
+{
+  version_function *decl_v;
+  tree ifunc_resolver_decl, ifunc_decl;
+  basic_block empty_bb;
+  int ix;
+  void_p ele;
+  VEC (tree, heap) *fn_ver_vec = NULL;
+  tree old_current_function_decl;
+
+  gcc_assert (is_default_function (decl));
+
+  decl_v = find_function_version (decl);
+  gcc_assert (decl_v != NULL);
+
+  if (decl_v->ifunc_resolver_decl != NULL)
+    return decl_v->ifunc_resolver_decl;
+
+  ifunc_decl = decl_v->ifunc_decl;
+
+  if (ifunc_decl == NULL)
+    ifunc_decl = decl_v->ifunc_decl = make_ifunc_func (decl);
+
+  ifunc_resolver_decl = make_ifunc_resolver_func (decl, ifunc_decl,
+						  &empty_bb);
+
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (ifunc_resolver_decl));
+  current_function_decl = ifunc_resolver_decl;
+
+  fn_ver_vec = VEC_alloc (tree, heap, 2);
+  VEC_safe_push (tree, heap, fn_ver_vec, decl);
+
+  for (ix = 0; VEC_iterate (void_p, decl_v->versions, ix, ele); ++ix)
+    {
+      version_function *v = (version_function *) ele;
+      gcc_assert (v->decl != NULL);
+      /* Check for virtual functions here again, as by this time it should
+	 have been determined if this function needs a vtable index or
+	 not.  This happens for methods in derived classes that override
+	 virtual methods in base classes but are not explicitly marked as
+	 virtual.  */
+      if (DECL_VINDEX (v->decl))
+        error_at (DECL_SOURCE_LOCATION (v->decl),
+		  "virtual function versioning not supported");
+      if (!v->is_deleted)
+	VEC_safe_push (tree, heap, fn_ver_vec, v->decl);
+    }
+
+  gcc_assert (targetm.dispatch_version);
+  targetm.dispatch_version (ifunc_resolver_decl, fn_ver_vec, &empty_bb);
+  decl_v->ifunc_resolver_decl = ifunc_resolver_decl;
+
+  pop_cfun ();
+  current_function_decl = old_current_function_decl;
+  return ifunc_resolver_decl;
+}
+
+/* Main entry point to pass_dispatch_versions. For multi-versioned functions,
+   generate the dispatching code.  */
+
+static unsigned int
+do_dispatch_versions (void)
+{
+  /* A new pass for generating dispatch code for multi-versioned functions.
+     Currently, dispatching is done through IFUNC.  Other forms of dispatch
+     can be added for targets without IFUNC support, such as calling the
+     right version directly after checking the target type.  This pass will
+     become more meaningful when other dispatch mechanisms are added.  */
+
+  /* Cloning a function to produce more versions will happen here when the
+     user requests that via the target attribute. For example,
+     int foo () __attribute__ ((target(("arch=core2"), ("arch=corei7"))));
+     means that the user wants the same body of foo to be versioned for core2
+     and corei7.  In that case, this function will be cloned during this
+     pass.  */
+  
+  if (DECL_FUNCTION_VERSIONED (current_function_decl)
+      && is_default_function (current_function_decl))
+    {
+      tree decl = make_ifunc_resolver_for_version (current_function_decl);
+      if (dump_file && decl)
+	dump_function_to_file (decl, dump_file, TDF_BLOCKS);
+    }
+  return 0;
+}
+
+static bool
+gate_dispatch_versions (void)
+{
+  return true;
+}
+
+/* A pass to generate the dispatch code to execute the appropriate version
+   of a multi-versioned function at run-time.  */
+
+struct gimple_opt_pass pass_dispatch_versions =
+{
+ {
+  GIMPLE_PASS,
+  "dispatch_multiversion_functions",    /* name */
+  gate_dispatch_versions,		/* gate */
+  do_dispatch_versions,			/* execute */
+  NULL,					/* sub */
+  NULL,					/* next */
+  0,					/* static_pass_number */
+  TV_MULTIVERSION_DISPATCH,		/* tv_id */
+  PROP_cfg,				/* properties_required */
+  PROP_cfg,				/* properties_provided */
+  0,					/* properties_destroyed */
+  0,					/* todo_flags_start */
+  0					/* todo_flags_finish */
+ }
+};
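
For readers who have not used IFUNC before, here is a minimal sketch, in
portable C, of the behaviour the generated resolver and ifunc are meant to
provide: the resolver runs once, picks a version, and every later call goes
through the bound pointer.  This is not the code the pass emits.  The names
foo_fn, cpu_is, detected_cpu, foo_bound and the distinct return values are
hypothetical and exist only for illustration, the order of the checks is
arbitrary, and a typed function pointer is used where the patch's resolver
returns a (void *).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*foo_fn) (void);

/* Three versions of foo.  Distinct return values make the selection
   observable; the example in the cover letter returns 0 from all three.  */
static int foo_default (void) { return 0; }
static int foo_corei7 (void) { return 1; }
static int foo_core2 (void) { return 2; }

/* Hypothetical CPU probe, standing in for the check emitted by the
   target hook.  */
static const char *detected_cpu = "generic";

static int
cpu_is (const char *name)
{
  return strcmp (detected_cpu, name) == 0;
}

/* The resolver: return the first version whose check passes and fall
   back to the default version.  */
static foo_fn
foo_resolver (void)
{
  if (cpu_is ("corei7"))
    return foo_corei7;
  if (cpu_is ("core2"))
    return foo_core2;
  return foo_default;
}

/* Stand-in for the binding the dynamic loader performs for an IFUNC
   symbol: resolve on first use, then reuse the result.  */
static foo_fn foo_bound = NULL;

static int
foo (void)
{
  if (foo_bound == NULL)
    foo_bound = foo_resolver ();
  assert (foo_bound != NULL);
  return foo_bound ();
}
```

Because the binding is cached, changing detected_cpu after the first call to
foo has no effect in this sketch.  That is the property that keeps the
per-call dispatch overhead low.
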
Index: gcc/cgraphunit.c
===================================================================
--- gcc/cgraphunit.c	(revision 186883)
+++ gcc/cgraphunit.c	(working copy)
@@ -411,6 +411,13 @@ cgraph_finalize_function (tree decl, bool nested)
       && !DECL_DISREGARD_INLINE_LIMITS (decl))
     node->symbol.force_output = 1;
 
+  /* With function versions, keep inline functions and do not worry about
+     inline limits.  */
+  if (DECL_FUNCTION_VERSIONED (decl)
+      && DECL_DECLARED_INLINE_P (decl)
+      && !DECL_EXTERNAL (decl))
+    node->symbol.force_output = 1;
+
   /* When not optimizing, also output the static functions. (see
      PR24561), but don't do so for always_inline functions, functions
      declared inline and nested functions.  These were optimized out
Index: gcc/multiversion.h
===================================================================
--- gcc/multiversion.h	(revision 0)
+++ gcc/multiversion.h	(revision 0)
@@ -0,0 +1,55 @@
+/* Function Multiversioning.
+   Copyright (C) 2012 Free Software Foundation, Inc.
+   Contributed by Sriraman Tallam (tmsriram@google.com)
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>. */
+
+/* This is the header file which provides the functions to keep track
+   of functions that are multi-versioned and to generate the dispatch
+   code to call the right version at run-time.  */
+
+#ifndef GCC_MULTIVERSION_H
+#define GCC_MULTIVERSION_H
+
+#include "tree.h"
+
+/* Mark DECL1 and DECL2 as function versions.  */
+int group_function_versions (const tree decl1, const tree decl2);
+
+/* Mark DECL as deleted and no longer a version.  */
+void mark_delete_decl_version (const tree decl);
+
+/* Returns true if DECL is the default version to be executed if all
+   other versions are inappropriate at run-time.  */
+bool is_default_function (const tree decl);
+
+/* Gets the IFUNC dispatcher for this multi-versioned function DECL. DECL
+   must be the default function in the multi-versioned group.  */
+tree get_ifunc_for_version (const tree decl);
+
+/* Returns true when only one of DECL1 and DECL2 is marked with "target"
+   or if the "target" attribute strings of DECL1 and DECL2 don't match.  */
+bool has_different_version_attributes (const tree decl1, const tree decl2);
+
+/* Function DECL is marked to be a multi-versioned function.  If DECL is
+   not the default version, the assembler name of DECL is changed to include
+   the attribute string to keep the name unambiguous.  */
+void mark_function_as_version (const tree decl);
+
+/* Check if decl is FUNCTION_DECL with target attribute set.  */
+bool is_target_attribute_set (const tree decl);
+#endif
Index: gcc/cp/class.c
===================================================================
--- gcc/cp/class.c	(revision 186883)
+++ gcc/cp/class.c	(working copy)
@@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-dump.h"
 #include "splay-tree.h"
 #include "pointer-set.h"
+#include "multiversion.h"
 
 /* The number of nested classes being processed.  If we are not in the
    scope of any class, this is zero.  */
@@ -1092,7 +1093,21 @@ add_method (tree type, tree method, tree using_dec
 	      || same_type_p (TREE_TYPE (fn_type),
 			      TREE_TYPE (method_type))))
 	{
-	  if (using_decl)
+	  /* For function versions, their parms and types match
+	     but they are not duplicates.  Record function versions
+	     as and when they are found.  */
+	  if (TREE_CODE (fn) == FUNCTION_DECL
+	      && TREE_CODE (method) == FUNCTION_DECL
+	      && (is_target_attribute_set (fn)
+		  || is_target_attribute_set (method))
+	      && has_different_version_attributes (fn, method))
+ 	    {
+	      mark_function_as_version (fn);
+	      mark_function_as_version (method);
+	      group_function_versions (fn, method);
+	      continue;
+	    }
+	  else if (using_decl)
 	    {
 	      if (DECL_CONTEXT (fn) == type)
 		/* Defer to the local function.  */
@@ -1150,6 +1165,7 @@ add_method (tree type, tree method, tree using_dec
   else
     /* Replace the current slot.  */
     VEC_replace (tree, method_vec, slot, overload);
+
   return true;
 }
 
@@ -6930,8 +6946,11 @@ resolve_address_of_overloaded_function (tree targe
 	  if (DECL_ANTICIPATED (fn))
 	    continue;
 
-	  /* See if there's a match.  */
-	  if (same_type_p (target_fn_type, static_fn_type (fn)))
+	  /* See if there's a match.  For multi-versioned functions, only
+	     the default version matches.  */
+	  if (same_type_p (target_fn_type, static_fn_type (fn))
+	      && (!DECL_FUNCTION_VERSIONED (fn)
+		  || is_default_function (fn)))
 	    matches = tree_cons (fn, NULL_TREE, matches);
 	}
     }
@@ -7093,6 +7112,22 @@ resolve_address_of_overloaded_function (tree targe
       perform_or_defer_access_check (access_path, fn, fn);
     }
 
+  /* If a pointer to a function that is multi-versioned is requested, the
+     pointer to the dispatcher function is returned instead.  This works
+     well because indirectly calling the function will dispatch the right
+     function version at run-time. Also, the function address is kept
+     unique.  */
+  if (DECL_FUNCTION_VERSIONED (fn)
+      && is_default_function (fn))
+    {
+      tree ifunc_decl;
+      ifunc_decl = get_ifunc_for_version (fn);
+      gcc_assert (ifunc_decl != NULL);
+      retrofit_lang_decl (ifunc_decl);
+      mark_used (fn);
+      return build_fold_addr_expr (ifunc_decl);
+    }
+
   if (TYPE_PTRFN_P (target_type) || TYPE_PTRMEMFUNC_P (target_type))
     return cp_build_addr_expr (fn, flags);
   else
Index: gcc/cp/decl.c
===================================================================
--- gcc/cp/decl.c	(revision 186883)
+++ gcc/cp/decl.c	(working copy)
@@ -54,6 +54,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "pointer-set.h"
 #include "splay-tree.h"
 #include "plugin.h"
+#include "multiversion.h"
 
 /* Possible cases of bad specifiers type used by bad_specifiers. */
 enum bad_spec_place {
@@ -973,6 +974,21 @@ decls_match (tree newdecl, tree olddecl)
       if (t1 != t2)
 	return 0;
 
+      /* The decls don't match if they correspond to two different versions
+	 of the same function.  */
+      if (compparms (p1, p2)
+	  && same_type_p (TREE_TYPE (f1), TREE_TYPE (f2)) 
+	  && has_different_version_attributes (newdecl, olddecl))
+	{
+	  /* One of the decls could be the default without the "target"
+	     attribute. Set it to be a versioned function here.  */
+	  mark_function_as_version (newdecl);
+	  mark_function_as_version (olddecl);
+	  /* Accumulate all the versions of a function.  */
+	  group_function_versions (olddecl, newdecl);
+	  return 0;
+	}
+
       if (CP_DECL_CONTEXT (newdecl) != CP_DECL_CONTEXT (olddecl)
 	  && ! (DECL_EXTERN_C_P (newdecl)
 		&& DECL_EXTERN_C_P (olddecl)))
@@ -1490,7 +1506,11 @@ duplicate_decls (tree newdecl, tree olddecl, bool
 	      error ("previous declaration %q+#D here", olddecl);
 	      return NULL_TREE;
 	    }
-	  else if (compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)),
+	  /* For function versions, params and types match, but they
+	     are not ambiguous.  */
+	  else if ((!DECL_FUNCTION_VERSIONED (newdecl)
+		    && !DECL_FUNCTION_VERSIONED (olddecl))
+		   && compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)),
 			      TYPE_ARG_TYPES (TREE_TYPE (olddecl))))
 	    {
 	      error ("new declaration %q#D", newdecl);
@@ -2262,6 +2282,16 @@ duplicate_decls (tree newdecl, tree olddecl, bool
   else if (DECL_PRESERVE_P (newdecl))
     DECL_PRESERVE_P (olddecl) = 1;
 
+  /* If the olddecl is a version, so is the newdecl.  */
+  if (TREE_CODE (newdecl) == FUNCTION_DECL
+      && DECL_FUNCTION_VERSIONED (olddecl))
+    {
+      DECL_FUNCTION_VERSIONED (newdecl) = 1;
+      /* Record that newdecl is not a valid version and has
+	 been deleted.  */
+      mark_delete_decl_version (newdecl);
+    }
+
   if (TREE_CODE (newdecl) == FUNCTION_DECL)
     {
       int function_size;
@@ -3810,6 +3840,7 @@ cp_make_fname_decl (location_t loc, tree id, int t
 			    ? NULL : fname_as_string (type_dep));
   tree type;
   tree init = cp_fname_init (name, &type);
+
   tree decl = build_decl (loc, VAR_DECL, id, type);
 
   if (name)
@@ -14035,7 +14066,11 @@ cxx_comdat_group (tree decl)
 	  else
 	    break;
 	}
-      name = DECL_ASSEMBLER_NAME (decl);
+      if (TREE_CODE (decl) == FUNCTION_DECL
+	  && DECL_FUNCTION_VERSIONED (decl))
+	name = DECL_NAME (decl);
+      else
+        name = DECL_ASSEMBLER_NAME (decl);
     }
 
   return name;
Index: gcc/cp/semantics.c
===================================================================
--- gcc/cp/semantics.c	(revision 186883)
+++ gcc/cp/semantics.c	(working copy)
@@ -3783,8 +3783,11 @@ expand_or_defer_fn_1 (tree fn)
       /* If the user wants us to keep all inline functions, then mark
 	 this function as needed so that finish_file will make sure to
 	 output it later.  Similarly, all dllexport'd functions must
-	 be emitted; there may be callers in other DLLs.  */
-      if ((flag_keep_inline_functions
+	 be emitted; there may be callers in other DLLs.
+	 Also, mark this function as needed if it is marked inline but
+	 is a multi-versioned function.  */
+      if (((flag_keep_inline_functions
+	    || DECL_FUNCTION_VERSIONED (fn))
 	   && DECL_DECLARED_INLINE_P (fn)
 	   && !DECL_REALLY_EXTERN (fn))
 	  || (flag_keep_inline_dllexport
Index: gcc/cp/decl2.c
===================================================================
--- gcc/cp/decl2.c	(revision 186883)
+++ gcc/cp/decl2.c	(working copy)
@@ -53,6 +53,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "splay-tree.h"
 #include "langhooks.h"
 #include "c-family/c-ada-spec.h"
+#include "multiversion.h"
 
 extern cpp_reader *parse_in;
 
@@ -677,9 +678,13 @@ check_classfn (tree ctype, tree function, tree tem
 	  if (is_template != (TREE_CODE (fndecl) == TEMPLATE_DECL))
 	    continue;
 
+	  /* While finding a match, same types and params are not enough
+	     if the function is versioned.  Also check version ("target")
+	     attributes.  */
 	  if (same_type_p (TREE_TYPE (TREE_TYPE (function)),
 			   TREE_TYPE (TREE_TYPE (fndecl)))
 	      && compparms (p1, p2)
+	      && !has_different_version_attributes (function, fndecl)
 	      && (!is_template
 		  || comp_template_parms (template_parms,
 					  DECL_TEMPLATE_PARMS (fndecl)))
Index: gcc/cp/call.c
===================================================================
--- gcc/cp/call.c	(revision 186883)
+++ gcc/cp/call.c	(working copy)
@@ -41,6 +41,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "langhooks.h"
 #include "c-family/c-objc.h"
 #include "timevar.h"
+#include "multiversion.h"
 
 /* The various kinds of conversion.  */
 
@@ -3903,6 +3904,16 @@ build_new_function_call (tree fn, VEC(tree,gc) **a
     {
       if (complain & tf_error)
 	{
+	  /* If the call is to a multiversioned function without
+	     a default version, overload resolution will fail.  */
+	  if (candidates
+	      && TREE_CODE (candidates->fn) == FUNCTION_DECL
+	      && DECL_FUNCTION_VERSIONED (candidates->fn))
+	    error_at (location_of (DECL_NAME (OVL_CURRENT (fn))),
+		      "Call to multiversioned function %<%D(%A)%> with"
+		      " no default version", DECL_NAME (OVL_CURRENT (fn)),
+		      build_tree_list_vec (*args));
+
 	  if (!any_viable_p && candidates && ! candidates->next
 	      && (TREE_CODE (candidates->fn) == FUNCTION_DECL))
 	    return cp_build_function_call_vec (candidates->fn, args, complain);
@@ -6809,6 +6820,19 @@ build_over_call (struct z_candidate *cand, int fla
   if (!already_used)
     mark_used (fn);
 
+  /* For a call to a multi-versioned function, the call should actually be to
+     the dispatcher.  */
+  if (DECL_FUNCTION_VERSIONED (fn)
+      && is_default_function (fn))
+    {
+      tree ifunc_decl;
+      ifunc_decl = get_ifunc_for_version (fn);
+      gcc_assert (ifunc_decl != NULL);
+      retrofit_lang_decl (ifunc_decl);
+      return build_call_expr_loc_array (UNKNOWN_LOCATION, ifunc_decl,
+					nargs, argarray);
+    }
+
   if (DECL_VINDEX (fn) && (flags & LOOKUP_NONVIRTUAL) == 0)
     {
       tree t;
@@ -8067,6 +8091,60 @@ joust (struct z_candidate *cand1, struct z_candida
   size_t i;
   size_t len;
 
+  /* For candidates of a multi-versioned function, first check if the
+     target flags of the caller match any of the candidates.  If so,
+     the caller can call that candidate directly; otherwise the one marked
+     default wins.  This is because the default decl is used as the key to
+     aggregate all the other versions provided for it in multiversion.c.
+     When generating the actual call, the appropriate dispatcher is created
+     to call the right function version at run-time.  */
+
+  if ((TREE_CODE (cand1->fn) == FUNCTION_DECL
+       && DECL_FUNCTION_VERSIONED (cand1->fn))
+      || (TREE_CODE (cand2->fn) == FUNCTION_DECL
+	  && DECL_FUNCTION_VERSIONED (cand2->fn)))
+    {
+      /* Both functions must be marked versioned.  */
+      gcc_assert (DECL_FUNCTION_VERSIONED (cand1->fn)
+		  && DECL_FUNCTION_VERSIONED (cand2->fn));
+
+      /* Try to see if a direct call can be made to a version.  This is
+	 possible if the caller and callee have the same target flags.
+	 If cand->fn is marked with target attributes,  check if the
+	 target approves inlining this into the caller.  If so, this is
+	 the version we want.  */
+
+      if (is_target_attribute_set (cand1->fn)
+	  && targetm.target_option.can_inline_p (current_function_decl,
+						 cand1->fn))
+	return 1;
+
+      if (is_target_attribute_set (cand2->fn)
+	  && targetm.target_option.can_inline_p (current_function_decl,
+						 cand2->fn))
+	return -1;
+
+      /* A direct call to a version is not possible, so find the default
+	 function and return it.  This will later be converted to dispatch
+	 the right version at run time.  */
+
+      if (is_default_function (cand1->fn))
+	{
+          mark_used (cand2->fn);
+	  return 1;
+	}
+
+      if (is_default_function (cand2->fn))
+	{
+          mark_used (cand1->fn);
+	  return -1;
+	}
+
+      /* If a default function is absent, this will never get resolved leading
+	 to an ambiguous call error.  */
+      return 0;
+    }
+
   /* Candidates that involve bad conversions are always worse than those
      that don't.  */
   if (cand1->viable > cand2->viable)
Index: gcc/timevar.def
===================================================================
--- gcc/timevar.def	(revision 186883)
+++ gcc/timevar.def	(working copy)
@@ -253,6 +253,7 @@ DEFTIMEVAR (TV_TREE_IFCOMBINE        , "tree if-co
 DEFTIMEVAR (TV_TREE_UNINIT           , "uninit var analysis")
 DEFTIMEVAR (TV_PLUGIN_INIT           , "plugin initialization")
 DEFTIMEVAR (TV_PLUGIN_RUN            , "plugin execution")
+DEFTIMEVAR (TV_MULTIVERSION_DISPATCH , "multiversion dispatch")
 
 /* Everything else in rest_of_compilation not included above.  */
 DEFTIMEVAR (TV_EARLY_LOCAL	     , "early local passes")
Index: gcc/Makefile.in
===================================================================
--- gcc/Makefile.in	(revision 186883)
+++ gcc/Makefile.in	(working copy)
@@ -1294,6 +1294,7 @@ OBJS = \
 	mcf.o \
 	mode-switching.o \
 	modulo-sched.o \
+	multiversion.o \
 	omega.o \
 	omp-low.o \
 	optabs.o \
@@ -3030,6 +3031,11 @@ ree.o : ree.c $(CONFIG_H) $(SYSTEM_H) coretypes.h
    $(DF_H) $(TIMEVAR_H) tree-pass.h $(RECOG_H) $(EXPR_H) \
    $(REGS_H) $(TREE_H) $(TM_P_H) insn-config.h $(INSN_ATTR_H) $(DIAGNOSTIC_CORE_H) \
    $(TARGET_H) $(OPTABS_H) insn-codes.h rtlhooks-def.h $(PARAMS_H) $(CGRAPH_H)
+multiversion.o : multiversion.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
+   $(TREE_H) langhooks.h $(TREE_INLINE_H) $(FLAGS_H) $(CGRAPH_H) intl.h \
+   $(DIAGNOSTIC_H) $(FIBHEAP_H) $(PARAMS_H) $(TIMEVAR_H) tree-pass.h \
+   $(HASHTAB_H) $(COVERAGE_H) $(GGC_H) $(TREE_FLOW_H) $(RTL_H) $(IPA_PROP_H) \
+   $(BASIC_BLOCK_H) $(TOPLEV_H) $(TREE_DUMP_H) ipa-inline.h
 cprop.o : cprop.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
    $(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h $(GGC_H) \
    $(RECOG_H) $(EXPR_H) $(BASIC_BLOCK_H) $(FUNCTION_H) output.h toplev.h $(DIAGNOSTIC_CORE_H) \
Index: gcc/passes.c
===================================================================
--- gcc/passes.c	(revision 186883)
+++ gcc/passes.c	(working copy)
@@ -1287,6 +1287,7 @@ init_optimization_passes (void)
   NEXT_PASS (pass_build_cfg);
   NEXT_PASS (pass_warn_function_return);
   NEXT_PASS (pass_build_cgraph_edges);
+  NEXT_PASS (pass_dispatch_versions);
   *p = NULL;
 
   /* Interprocedural optimization passes.  */
Index: gcc/cp/mangle.c
===================================================================
--- gcc/cp/mangle.c	(revision 186883)
+++ gcc/cp/mangle.c	(working copy)
@@ -1245,7 +1245,12 @@ write_unqualified_name (const tree decl)
     {
       MANGLE_TRACE_TREE ("local-source-name", decl);
       write_char ('L');
-      write_source_name (DECL_NAME (decl));
+      if (TREE_CODE (decl) == FUNCTION_DECL
+	  && DECL_FUNCTION_VERSIONED (decl)
+	  && DECL_ASSEMBLER_NAME_SET_P (decl))
+	write_source_name (DECL_ASSEMBLER_NAME (decl));
+      else
+	write_source_name (DECL_NAME (decl));
       /* The default discriminator is 1, and that's all we ever use,
 	 so there's no code to output one here.  */
     }
@@ -1260,7 +1265,14 @@ write_unqualified_name (const tree decl)
                && LAMBDA_TYPE_P (type))
         write_closure_type_name (type);
       else
-        write_source_name (DECL_NAME (decl));
+	{
+	  if (TREE_CODE (decl) == FUNCTION_DECL
+	      && DECL_FUNCTION_VERSIONED (decl)
+	      && DECL_ASSEMBLER_NAME_SET_P (decl))
+	    write_source_name (DECL_ASSEMBLER_NAME (decl));
+	  else
+	    write_source_name (DECL_NAME (decl));
+	}
     }
 }
 
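
The i386 changes that follow build, for each version, a chain of predicate
calls and combine their results with MIN_EXPR before testing the result
against zero.  Here is a minimal sketch of that combination in portable C,
assuming each predicate returns zero for "no" and a positive integer for
"yes".  The names predicate, cpu_supports, available and chain_holds are
hypothetical, and cpu_supports is a stub rather than the real builtin.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*predicate_fn) (const char *);

/* One link of the chain: a predicate and its string argument.  */
struct predicate
{
  predicate_fn check;
  const char *arg;
};

/* Hypothetical feature probe: pretend that only these features exist.  */
static const char *const available[] = { "sse2", "popcnt" };

static int
cpu_supports (const char *feature)
{
  size_t i;
  for (i = 0; i < sizeof (available) / sizeof (available[0]); ++i)
    if (strcmp (available[i], feature) == 0)
      return 1;
  return 0;
}

static int
min_int (int a, int b)
{
  return a < b ? a : b;
}

/* Logical AND of all predicates, computed as a running minimum: the
   minimum is positive only if no predicate returned zero.  */
static int
chain_holds (const struct predicate *chain, size_t n)
{
  int and_var;
  size_t i;

  assert (n > 0);
  and_var = chain[0].check (chain[0].arg);
  for (i = 1; i < n; ++i)
    and_var = min_int (chain[i].check (chain[i].arg), and_var);
  return and_var > 0;
}
```

Unlike a short-circuit &&, the running minimum evaluates every predicate and
needs only one conditional branch at the end, so the whole chain fits in a
single basic block.
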
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 186883)
+++ gcc/config/i386/i386.c	(working copy)
@@ -27678,6 +27678,326 @@ ix86_init_mmx_sse_builtins (void)
     }
 }
 
+
+/* This adds a condition to the basic_block NEW_BB in function FUNCTION_DECL
+   to return a pointer to VERSION_DECL if the outcome of the expression
+   formed by PREDICATE_CHAIN is true.  This function will be called during
+   version dispatch to decide which function version to execute.  It returns
+   the basic block at the end to which more conditions can be added.  */
+
+static basic_block
+add_condition_to_bb (tree function_decl, tree version_decl,
+		     tree predicate_chain, basic_block new_bb)
+{
+  gimple return_stmt;
+  tree convert_expr, result_var;
+  gimple convert_stmt;
+  gimple call_cond_stmt;
+  gimple if_else_stmt;
+
+  basic_block bb1, bb2, bb3;
+  edge e12, e23;
+
+  tree cond_var, and_expr_var = NULL_TREE;
+  gimple_seq gseq;
+
+  tree old_current_function_decl;
+  tree predicate_decl, predicate_arg;
+
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (function_decl));
+  current_function_decl = function_decl;
+
+  gcc_assert (new_bb != NULL);
+  gseq = bb_seq (new_bb);
+
+
+  convert_expr = build1 (CONVERT_EXPR, ptr_type_node,
+	     		 build_fold_addr_expr (version_decl));
+  result_var = create_tmp_var (ptr_type_node, NULL);
+  convert_stmt = gimple_build_assign (result_var, convert_expr); 
+  return_stmt = gimple_build_return (result_var);
+
+  if (predicate_chain == NULL_TREE)
+    {
+      gimple_seq_add_stmt (&gseq, convert_stmt);
+      gimple_seq_add_stmt (&gseq, return_stmt);
+      set_bb_seq (new_bb, gseq);
+      gimple_set_bb (convert_stmt, new_bb);
+      gimple_set_bb (return_stmt, new_bb);
+      pop_cfun ();
+      current_function_decl = old_current_function_decl;
+      return new_bb;
+    }
+
+  while (predicate_chain != NULL)
+    {
+      cond_var = create_tmp_var (integer_type_node, NULL);
+      predicate_decl = TREE_PURPOSE (predicate_chain);
+      predicate_arg = TREE_VALUE (predicate_chain);
+      call_cond_stmt = gimple_build_call (predicate_decl, 1, predicate_arg);
+      gimple_call_set_lhs (call_cond_stmt, cond_var);
+
+      gimple_set_block (call_cond_stmt, DECL_INITIAL (function_decl));
+      gimple_set_bb (call_cond_stmt, new_bb);
+      gimple_seq_add_stmt (&gseq, call_cond_stmt);
+
+      predicate_chain = TREE_CHAIN (predicate_chain);
+      
+      if (and_expr_var == NULL)
+        and_expr_var = cond_var;
+      else
+	{
+	  gimple assign_stmt;
+	  /* Use MIN_EXPR to detect whether any predicate returned zero:
+	     and_expr_var = min_expr <cond_var, and_expr_var>  */
+	  assign_stmt = gimple_build_assign (and_expr_var,
+			  build2 (MIN_EXPR, integer_type_node,
+				  cond_var, and_expr_var));
+
+	  gimple_set_block (assign_stmt, DECL_INITIAL (function_decl));
+	  gimple_set_bb (assign_stmt, new_bb);
+	  gimple_seq_add_stmt (&gseq, assign_stmt);
+	}
+    }
+
+  if_else_stmt = gimple_build_cond (GT_EXPR, and_expr_var,
+	  		            integer_zero_node,
+				    NULL_TREE, NULL_TREE);
+  gimple_set_block (if_else_stmt, DECL_INITIAL (function_decl));
+  gimple_set_bb (if_else_stmt, new_bb);
+  gimple_seq_add_stmt (&gseq, if_else_stmt);
+
+  gimple_seq_add_stmt (&gseq, convert_stmt);
+  gimple_seq_add_stmt (&gseq, return_stmt);
+  set_bb_seq (new_bb, gseq);
+
+  bb1 = new_bb;
+  e12 = split_block (bb1, if_else_stmt);
+  bb2 = e12->dest;
+  e12->flags &= ~EDGE_FALLTHRU;
+  e12->flags |= EDGE_TRUE_VALUE;
+
+  e23 = split_block (bb2, return_stmt);
+
+  gimple_set_bb (convert_stmt, bb2);
+  gimple_set_bb (return_stmt, bb2);
+
+  bb3 = e23->dest;
+  make_edge (bb1, bb3, EDGE_FALSE_VALUE); 
+
+  remove_edge (e23);
+  make_edge (bb2, EXIT_BLOCK_PTR, 0);
+
+  rebuild_cgraph_edges ();
+
+  pop_cfun ();
+  current_function_decl = old_current_function_decl;
+
+  return bb3;
+}
+
+/* This parses the attribute arguments to target in DECL and determines
+   the right builtin to use to match the platform specification.
+   For now, only one target argument ("arch=" or "<-m>xxx") is allowed.  */
+
+static tree 
+get_builtin_code_for_version (tree decl)
+{
+  tree attrs;
+  struct cl_target_option cur_target;
+  tree target_node;
+  struct cl_target_option *new_target;
+  const char *arg_str = NULL;
+  const char *attrs_str = NULL;
+  char *tok_str = NULL;
+  char *token;
+  /* These are the target attribute strings for which a dispatcher is
+     available, from fold_builtin_cpu.  */
+  const char *feature_list[] = {"mmx", "popcnt", "sse", "sse2", "sse3",
+				"ssse3", "sse4.1", "sse4.2", "avx", "avx2"};
+  unsigned int NUM_FEATURES = sizeof (feature_list) / sizeof (const char *);
+  unsigned int i;
+  tree predicate_chain = NULL_TREE;
+  tree predicate_decl, predicate_arg;
+
+  attrs = lookup_attribute ("target", DECL_ATTRIBUTES (decl));
+  gcc_assert (attrs != NULL);
+
+  attrs = TREE_VALUE (TREE_VALUE (attrs));
+
+  gcc_assert (TREE_CODE (attrs) == STRING_CST);
+  attrs_str = TREE_STRING_POINTER (attrs);
+
+  /* Handle arch= if specified.  */
+  if (strstr (attrs_str, "arch=") != NULL)
+    {
+      cl_target_option_save (&cur_target, &global_options);
+      target_node = ix86_valid_target_attribute_tree (attrs);
+    
+      gcc_assert (target_node);
+      new_target = TREE_TARGET_OPTION (target_node);
+      gcc_assert (new_target);
+      
+      if (new_target->arch_specified && new_target->arch > 0)
+	{
+	  switch (new_target->arch)
+	    {
+	    case PROCESSOR_CORE2_32:
+	    case PROCESSOR_CORE2_64:
+	      arg_str = "core2";
+	      break;
+	    case PROCESSOR_COREI7_32:
+	    case PROCESSOR_COREI7_64:
+	      arg_str = "corei7";
+	      break;
+	    case PROCESSOR_ATOM:
+	      arg_str = "atom";
+	      break;
+	    case PROCESSOR_AMDFAM10:
+	      arg_str = "amdfam10h";
+	      break;
+	    case PROCESSOR_BDVER1:
+	      arg_str = "bdver1";
+	      break;
+	    case PROCESSOR_BDVER2:
+	      arg_str = "bdver2";
+	      break;
+	    }  
+	}    
+    
+      cl_target_option_restore (&global_options, &cur_target);
+      if (arg_str == NULL)
+	{
+	  error_at (DECL_SOURCE_LOCATION (decl),
+	    	"No dispatcher found for the versioning attributes");
+	  return NULL;
+	}
+    
+      predicate_decl = ix86_builtins [(int) IX86_BUILTIN_CPU_IS];
+      /* For a C string literal the length includes the trailing NUL.  */
+      predicate_arg = build_string_literal (strlen (arg_str) + 1, arg_str);
+      predicate_chain = tree_cons (predicate_decl, predicate_arg,
+				   predicate_chain);
+    }
+
+  /* Process feature name.  */
+  tok_str =  (char *) xmalloc (strlen (attrs_str) + 1);
+  strcpy (tok_str, attrs_str);
+  token = strtok (tok_str, ",");
+  predicate_decl = ix86_builtins [(int) IX86_BUILTIN_CPU_SUPPORTS];
+
+  while (token != NULL)
+    {
+      /* Do not process "arch="  */
+      if (strncmp (token, "arch=", 5) == 0)
+	{
+	  token = strtok (NULL, ",");
+	  continue;
+	}
+      for (i = 0; i < NUM_FEATURES; ++i)
+	{
+	  if (strcmp (token, feature_list[i]) == 0)
+	    {
+	      predicate_arg = build_string_literal (
+				strlen (feature_list[i]) + 1,
+				feature_list[i]);
+	      predicate_chain = tree_cons (predicate_decl, predicate_arg,
+					   predicate_chain);
+	      break;
+	    }
+	}
+      if (i == NUM_FEATURES)
+	{
+	  error_at (DECL_SOURCE_LOCATION (decl),
+		    "No dispatcher found for %s", token);
+	  return NULL;
+	}
+      token = strtok (NULL, ",");
+    }
+  free (tok_str);
+
+  if (predicate_chain == NULL_TREE)
+    {
+      error_at (DECL_SOURCE_LOCATION (decl),
+	        "No dispatcher found for the versioning attributes : %s",
+	        attrs_str);
+      return NULL;
+    }
+
+  predicate_chain = nreverse (predicate_chain);
+  return predicate_chain; 
+} 
+
+/* This is the target hook to generate the dispatch function for
+   multi-versioned functions.  DISPATCH_DECL is the function which will
+   contain the dispatch logic.  FNDECLS_P is a vector of the function
+   choices for dispatch, default first.  EMPTY_BB is the basic block
+   pointer in DISPATCH_DECL in which the dispatch code is generated.  */
+
+static int
+ix86_dispatch_version (tree dispatch_decl,
+		       void *fndecls_p,
+		       basic_block *empty_bb)
+{
+  tree default_decl;
+  gimple ifunc_cpu_init_stmt;
+  gimple_seq gseq;
+  tree old_current_function_decl;
+  int ix;
+  tree ele;
+  VEC (tree, heap) *fndecls;
+
+  gcc_assert (dispatch_decl != NULL
+	      && fndecls_p != NULL
+	      && empty_bb != NULL);
+
+  /* fndecls_p is actually a vector.  */
+  fndecls = (VEC (tree, heap) *) fndecls_p;
+
+  /* At least one more version other than the default.  */
+  gcc_assert (VEC_length (tree, fndecls) >= 2);
+
+  /* The first version in the vector is the default decl.  */
+  default_decl = VEC_index (tree, fndecls, 0);
+
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (dispatch_decl));
+  current_function_decl = dispatch_decl;
+
+  gseq = bb_seq (*empty_bb);
+  ifunc_cpu_init_stmt = gimple_build_call_vec (
+                     ix86_builtins [(int) IX86_BUILTIN_CPU_INIT], NULL);
+  gimple_seq_add_stmt (&gseq, ifunc_cpu_init_stmt);
+  gimple_set_bb (ifunc_cpu_init_stmt, *empty_bb);
+  set_bb_seq (*empty_bb, gseq);
+
+  pop_cfun ();
+  current_function_decl = old_current_function_decl;
+
+
+  for (ix = 1; VEC_iterate (tree, fndecls, ix, ele); ++ix)
+    {
+      tree version_decl = ele;
+      tree predicate_chain = NULL_TREE;
+      /* Get attribute string, parse it and find the right predicate decl.
+         The predicate function could be a lengthy combination of many
+	 features, like arch-type and various isa-variants.  */
+      predicate_chain = get_builtin_code_for_version (version_decl);
+
+      if (predicate_chain == NULL_TREE)
+	continue;
+
+      *empty_bb = add_condition_to_bb (dispatch_decl, version_decl,
+				       predicate_chain, *empty_bb);
+
+    }
+  /* Dispatch the default version at the end.  */
+  *empty_bb = add_condition_to_bb (dispatch_decl, default_decl,
+				   NULL, *empty_bb);
+  return 0;
+}
+
 /* This builds the processor_model struct type defined in
    libgcc/config/i386/i386-cpuinfo.c  */
 
@@ -39463,6 +39783,9 @@ ix86_autovectorize_vector_sizes (void)
 #undef TARGET_FOLD_BUILTIN
 #define TARGET_FOLD_BUILTIN ix86_fold_builtin
 
+#undef TARGET_DISPATCH_VERSION
+#define TARGET_DISPATCH_VERSION ix86_dispatch_version
+
 #undef TARGET_ENUM_VA_LIST_P
 #define TARGET_ENUM_VA_LIST_P ix86_enum_va_list
 
Index: gcc/testsuite/g++.dg/mv1.C
===================================================================
--- gcc/testsuite/g++.dg/mv1.C	(revision 0)
+++ gcc/testsuite/g++.dg/mv1.C	(revision 0)
@@ -0,0 +1,204 @@
+/* Test case to check if Multiversioning works.  */
+/* { dg-do run } */
+/* { dg-options "-O2 -fPIC" } */
+
+#include <assert.h>
+
+/* Default version.  */
+int foo ();
+/* Check combination of target attributes.  */
+int foo () __attribute__ ((target("arch=corei7,sse4.2,popcnt")));
+/* The target operands in this declaration and the definition are re-ordered.
+   This should still work.  */
+int foo () __attribute__ ((target("ssse3,avx2")));
+
+/* Check for all target attributes for which dispatchers are available.  */
+/* Check arch= */
+int foo () __attribute__((target("arch=core2")));
+int foo () __attribute__((target("arch=corei7")));
+int foo () __attribute__((target("arch=atom")));
+int foo () __attribute__((target("arch=amdfam10")));
+int foo () __attribute__((target("arch=bdver1")));
+int foo () __attribute__((target("arch=bdver2")));
+/* Check ISA, latest features should be ahead.  */
+int foo () __attribute__((target("avx2")));
+int foo () __attribute__((target("avx")));
+int foo () __attribute__((target("popcnt")));
+int foo () __attribute__((target("sse4.2")));
+int foo () __attribute__((target("sse4.1")));
+int foo () __attribute__((target("ssse3")));
+int foo () __attribute__((target("mmx")));
+int foo () __attribute__((target("sse3")));
+int foo () __attribute__((target("sse2")));
+int foo () __attribute__((target("sse")));
+
+int (*p)() = &foo;
+
+int main ()
+{
+  int val = foo ();
+
+  /* Check if calling foo via ptr is the same.  */
+  assert (val ==  (*p)());
+
+  if (__builtin_cpu_is ("corei7")
+      && __builtin_cpu_supports ("sse4.2")
+      && __builtin_cpu_supports ("popcnt"))
+    assert (val == 1);
+  else if (__builtin_cpu_supports ("avx2")
+	   && __builtin_cpu_supports ("ssse3"))
+    assert (val == 2);
+  else if (__builtin_cpu_is ("core2"))
+    assert (val == 3);
+  else if (__builtin_cpu_is ("corei7"))
+    assert (val == 4);
+  else if (__builtin_cpu_is ("atom"))
+    assert (val == 5);
+  else if (__builtin_cpu_is ("amdfam10h"))
+    assert (val == 6);
+  else if (__builtin_cpu_is ("bdver1"))
+    assert (val == 7);
+  else if (__builtin_cpu_is ("bdver2"))
+    assert (val == 8);
+  else if (__builtin_cpu_supports ("avx2"))
+    assert (val == -1);
+  else if (__builtin_cpu_supports ("avx"))
+    assert (val == -2);
+  else if (__builtin_cpu_supports ("popcnt"))
+    assert (val == -3);
+  else if (__builtin_cpu_supports ("sse4.2"))
+    assert (val == -4);
+  else if (__builtin_cpu_supports ("sse4.1"))
+    assert (val == -5);
+  else if (__builtin_cpu_supports ("ssse3"))
+    assert (val == -6);
+  else if (__builtin_cpu_supports ("mmx"))
+    assert (val == -7);
+  else if (__builtin_cpu_supports ("sse3"))
+    assert (val == -8);
+  else if (__builtin_cpu_supports ("sse2"))
+    assert (val == -9);
+  else if (__builtin_cpu_supports ("sse"))
+    assert (val == -10);
+  else
+    assert (val == 0);
+
+  return 0;
+}
+
+int foo ()
+{
+  return 0;
+}
+
+int __attribute__ ((target("arch=corei7,sse4.2,popcnt")))
+foo ()
+{
+  assert (__builtin_cpu_is ("corei7")
+	  && __builtin_cpu_supports ("sse4.2")
+	  && __builtin_cpu_supports ("popcnt"));
+  return 1;
+}
+
+int __attribute__ ((target("avx2,ssse3")))
+foo ()
+{
+  assert (__builtin_cpu_supports ("avx2")
+	  && __builtin_cpu_supports ("ssse3"));
+  return 2;
+}
+int __attribute__ ((target("arch=core2")))
+foo ()
+{
+  return 3;
+}
+
+int __attribute__ ((target("arch=corei7")))
+foo ()
+{
+  return 4;
+}
+
+int __attribute__ ((target("arch=atom")))
+foo ()
+{
+  return 5;
+}
+
+int __attribute__ ((target("arch=amdfam10")))
+foo ()
+{
+  return 6;
+}
+
+int __attribute__ ((target("arch=bdver1")))
+foo ()
+{
+  return 7;
+}
+
+int __attribute__ ((target("arch=bdver2")))
+foo ()
+{
+  return 8;
+}
+
+int __attribute__ ((target("avx2")))
+foo ()
+{
+  return -1;
+}
+
+int __attribute__ ((target("avx")))
+foo ()
+{
+  return -2;
+}
+
+int __attribute__ ((target("popcnt")))
+foo ()
+{
+  return -3;
+}
+
+int __attribute__ ((target("sse4.2")))
+foo ()
+{
+  return -4;
+}
+
+int __attribute__ ((target("sse4.1")))
+foo ()
+{
+  return -5;
+}
+
+int __attribute__ ((target("ssse3")))
+foo ()
+{
+  return -6;
+}
+
+int __attribute__ ((target("mmx")))
+foo ()
+{
+  return -7;
+}
+
+int __attribute__ ((target("sse3")))
+foo ()
+{
+  return -8;
+}
+
+int __attribute__ ((target("sse2")))
+foo ()
+{
+  return -9;
+}
+
+int __attribute__ ((target("sse")))
+foo ()
+{
+  return -10;
+}


* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-05-02  2:45                     ` Sriraman Tallam
@ 2012-05-02 13:42                       ` H.J. Lu
  2012-05-02 15:08                         ` Sriraman Tallam
  0 siblings, 1 reply; 93+ messages in thread
From: H.J. Lu @ 2012-05-02 13:42 UTC (permalink / raw)
  To: Sriraman Tallam
  Cc: Richard Guenther, Jan Hubicka, Uros Bizjak, reply, gcc-patches, David Li

On Tue, May 1, 2012 at 7:45 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> Hi H.J,
>
>   Done now. Patch attached.
>
> Thanks,
> -Sri.
>
> On Tue, May 1, 2012 at 5:08 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Tue, May 1, 2012 at 4:51 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>> Hi,
>>>
>>> New patch attached, updated test case and fixed bugs related to
>>> __PRETTY_FUNCTION_.
>>>
>>> Patch also available for review here:  http://codereview.appspot.com/5752064
>>
>> @@ -0,0 +1,39 @@
>> +/* Simple test case to check if Multiversioning works.  */
>> +/* { dg-do run } */
>> +/* { dg-options "-O2 -fPIC" } */
>> +
>> +#include <assert.h>
>> +
>> +int foo ();
>> +int foo () __attribute__ ((target("arch=corei7,sse4.2,popcnt")));
>> +/* The target operands in this declaration and the definition are re-ordered.
>> +   This should still work.  */
>> +int foo () __attribute__ ((target("ssse3,avx2")));
>> +
>> +int (*p)() = &foo;
>> +int main ()
>> +{
>> +  return foo () + (*p)();
>> +}
>> +
>> +int foo ()
>> +{
>> +  return 0;
>> +}
>> +
>> +int __attribute__ ((target("arch=corei7,sse4.2,popcnt")))
>> +foo ()
>> +{
>> +  assert (__builtin_cpu_is ("corei7")
>> +         && __builtin_cpu_supports ("sse4.2")
>> +         && __builtin_cpu_supports ("popcnt"));
>> +  return 0;
>> +}
>> +
>> +int __attribute__ ((target("avx2,ssse3")))
>> +foo ()
>> +{
>> +  assert (__builtin_cpu_supports ("avx2")
>> +         && __builtin_cpu_supports ("ssse3"));
>> +  return 0;
>> +}
>>
>> This test will pass if
>>
>> int foo ()
>> {
>>  return 0;
>> }
>>
>> is selected on processors with AVX.  The run-time test should
>> check that the right function is selected on the target processor,
>> not the selected function matches the target attribute. You can
>> do it by returning different values for each foo and call cpuid
>> to check if the right foo is selected.
>>
>> You should add a testcase for __builtin_cpu_supports to check
>> all valid arguments.
>>
>> --
>> H.J.

2 questions:

1.  Since AVX > SSE4 > SSSE3 > SSE3 > SSE2 > SSE, with
foo for AVX and SSE3, on AVX processors, which foo will be
selected?
2.  I don't see any tests for __builtin_cpu_supports ("XXX")
nor __builtin_cpu_is ("XXX").  I think you need tests for
them.

-- 
H.J.


* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-05-02 13:42                       ` H.J. Lu
@ 2012-05-02 15:08                         ` Sriraman Tallam
  2012-05-02 16:06                           ` H.J. Lu
  0 siblings, 1 reply; 93+ messages in thread
From: Sriraman Tallam @ 2012-05-02 15:08 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Richard Guenther, Jan Hubicka, Uros Bizjak, reply, gcc-patches, David Li

On Wed, May 2, 2012 at 6:42 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Tue, May 1, 2012 at 7:45 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>> Hi H.J,
>>
>>   Done now. Patch attached.
>>
>> Thanks,
>> -Sri.
>>
>> On Tue, May 1, 2012 at 5:08 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Tue, May 1, 2012 at 4:51 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>> Hi,
>>>>
>>>> New patch attached, updated test case and fixed bugs related to
>>>> __PRETTY_FUNCTION_.
>>>>
>>>> Patch also available for review here:  http://codereview.appspot.com/5752064
>>>
>>> @@ -0,0 +1,39 @@
>>> +/* Simple test case to check if Multiversioning works.  */
>>> +/* { dg-do run } */
>>> +/* { dg-options "-O2 -fPIC" } */
>>> +
>>> +#include <assert.h>
>>> +
>>> +int foo ();
>>> +int foo () __attribute__ ((target("arch=corei7,sse4.2,popcnt")));
>>> +/* The target operands in this declaration and the definition are re-ordered.
>>> +   This should still work.  */
>>> +int foo () __attribute__ ((target("ssse3,avx2")));
>>> +
>>> +int (*p)() = &foo;
>>> +int main ()
>>> +{
>>> +  return foo () + (*p)();
>>> +}
>>> +
>>> +int foo ()
>>> +{
>>> +  return 0;
>>> +}
>>> +
>>> +int __attribute__ ((target("arch=corei7,sse4.2,popcnt")))
>>> +foo ()
>>> +{
>>> +  assert (__builtin_cpu_is ("corei7")
>>> +         && __builtin_cpu_supports ("sse4.2")
>>> +         && __builtin_cpu_supports ("popcnt"));
>>> +  return 0;
>>> +}
>>> +
>>> +int __attribute__ ((target("avx2,ssse3")))
>>> +foo ()
>>> +{
>>> +  assert (__builtin_cpu_supports ("avx2")
>>> +         && __builtin_cpu_supports ("ssse3"));
>>> +  return 0;
>>> +}
>>>
>>> This test will pass if
>>>
>>> int foo ()
>>> {
>>>  return 0;
>>> }
>>>
>>> is selected on processors with AVX.  The run-time test should
>>> check that the right function is selected on the target processor,
>>> not the selected function matches the target attribute. You can
>>> do it by returning different values for each foo and call cpuid
>>> to check if the right foo is selected.
>>>
>>> You should add a testcase for __builtin_cpu_supports to check
>>> all valid arguments.
>>>
>>> --
>>> H.J.
>
> 2 questions:
>
> 1.  Since AVX > SSE4 > SSSE3 > SSE3 > SSE2 > SSE, with
> foo for AVX and SSE3, on AVX processors, which foo will be
> selected?

foo for AVX will get called, since it appears first in the source.

The dispatching is done in the same order in which the functions are
specified. If, potentially, two foo versions can be dispatched for an
architecture, the first foo will get called.  There is no way right
now to specify the order in which the dispatching should be done.
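
As a rough illustration of the source-order rule described above (this is
not code from the patch; the feature bits, the table and the function names
are all made up for the example), a first-match dispatcher could be
sketched like this:

```c
#include <stddef.h>

/* Hypothetical feature bits, standing in for what
   __builtin_cpu_supports would report at run time.  */
enum { FEAT_SSE3 = 1u << 0, FEAT_AVX = 1u << 1 };

typedef int (*foo_fn) (void);

struct foo_version
{
  unsigned required;   /* Features this version needs; 0 means default.  */
  foo_fn fn;
};

static int foo_default (void) { return 0; }
static int foo_avx (void) { return 1; }
static int foo_sse3 (void) { return 2; }

/* Versions in the order they appear in the source, default last.  */
static const struct foo_version foo_versions[] = {
  { FEAT_AVX, foo_avx },
  { FEAT_SSE3, foo_sse3 },
  { 0, foo_default },
};

/* Return the first version whose required features are all present.  */
static foo_fn
resolve_foo (unsigned cpu_features)
{
  size_t i;

  for (i = 0; i < sizeof foo_versions / sizeof foo_versions[0]; ++i)
    if ((cpu_features & foo_versions[i].required)
        == foo_versions[i].required)
      return foo_versions[i].fn;
  return foo_default;
}
```

With this table, a CPU reporting both AVX and SSE3 resolves to foo_avx only
because that entry comes first; swapping the first two entries would select
foo_sse3 instead.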


> 2.  I don't see any tests for __builtin_cpu_supports ("XXX")
> nor __builtin_cpu_is ("XXX").  I think you need tests for
> them.

This is already there as part of the previous CPU detection patch that
was submitted. Please see gcc.target/i386/builtin_target.c. Did you
want something else?

>
> --
> H.J.


* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-05-02 15:08                         ` Sriraman Tallam
@ 2012-05-02 16:06                           ` H.J. Lu
  2012-05-02 17:44                             ` Sriraman Tallam
  0 siblings, 1 reply; 93+ messages in thread
From: H.J. Lu @ 2012-05-02 16:06 UTC (permalink / raw)
  To: Sriraman Tallam
  Cc: Richard Guenther, Jan Hubicka, Uros Bizjak, reply, gcc-patches, David Li

On Wed, May 2, 2012 at 8:08 AM, Sriraman Tallam <tmsriram@google.com> wrote:
> On Wed, May 2, 2012 at 6:42 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Tue, May 1, 2012 at 7:45 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>> Hi H.J,
>>>
>>>   Done now. Patch attached.
>>>
>>> Thanks,
>>> -Sri.
>>>
>>> On Tue, May 1, 2012 at 5:08 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>> On Tue, May 1, 2012 at 4:51 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>>> Hi,
>>>>>
>>>>> New patch attached, updated test case and fixed bugs related to
>>>>> __PRETTY_FUNCTION_.
>>>>>
>>>>> Patch also available for review here:  http://codereview.appspot.com/5752064
>>>>
>>>> @@ -0,0 +1,39 @@
>>>> +/* Simple test case to check if Multiversioning works.  */
>>>> +/* { dg-do run } */
>>>> +/* { dg-options "-O2 -fPIC" } */
>>>> +
>>>> +#include <assert.h>
>>>> +
>>>> +int foo ();
>>>> +int foo () __attribute__ ((target("arch=corei7,sse4.2,popcnt")));
>>>> +/* The target operands in this declaration and the definition are re-ordered.
>>>> +   This should still work.  */
>>>> +int foo () __attribute__ ((target("ssse3,avx2")));
>>>> +
>>>> +int (*p)() = &foo;
>>>> +int main ()
>>>> +{
>>>> +  return foo () + (*p)();
>>>> +}
>>>> +
>>>> +int foo ()
>>>> +{
>>>> +  return 0;
>>>> +}
>>>> +
>>>> +int __attribute__ ((target("arch=corei7,sse4.2,popcnt")))
>>>> +foo ()
>>>> +{
>>>> +  assert (__builtin_cpu_is ("corei7")
>>>> +         && __builtin_cpu_supports ("sse4.2")
>>>> +         && __builtin_cpu_supports ("popcnt"));
>>>> +  return 0;
>>>> +}
>>>> +
>>>> +int __attribute__ ((target("avx2,ssse3")))
>>>> +foo ()
>>>> +{
>>>> +  assert (__builtin_cpu_supports ("avx2")
>>>> +         && __builtin_cpu_supports ("ssse3"));
>>>> +  return 0;
>>>> +}
>>>>
>>>> This test will pass if
>>>>
>>>> int foo ()
>>>> {
>>>>  return 0;
>>>> }
>>>>
>>>> is selected on processors with AVX.  The run-time test should
>>>> check that the right function is selected on the target processor,
>>>> not the selected function matches the target attribute. You can
>>>> do it by returning different values for each foo and call cpuid
>>>> to check if the right foo is selected.
>>>>
>>>> You should add a testcase for __builtin_cpu_supports to check
>>>> all valid arguments.
>>>>
>>>> --
>>>> H.J.
>>
>> 2 questions:
>>
>> 1.  Since AVX > SSE4 > SSSE3 > SSE3 > SSE2 > SSE, with
>> foo for AVX and SSE3, on AVX processors, which foo will be
>> selected?
>
> foo for AVX will get called since that appears ahead.
>
> The dispatching is done in the same order in which the functions are
> specified. If, potentially, two foo versions can be dispatched for an
> architecture, the first foo will get called.  There is no way right
> now to specify the order in which the dispatching should be done.

This is very fragile.  We know ISAs and processors.  The source
order should be irrelevant.

>
>> 2.  I don't see any tests for __builtin_cpu_supports ("XXX")
>> nor __builtin_cpu_is ("XXX").  I think you need tests for
>> them.
>
> This is already there as part of the previous CPU detection patch that
> was submitted. Please see gcc.target/i386/builtin_target.c. Did you
> want something else?

gcc.target/i386/builtin_target.c doesn't test if __builtin_cpu_supports ("XXX")
and __builtin_cpu_is ("XXX") are implemented correctly.


-- 
H.J.


* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-05-02 16:06                           ` H.J. Lu
@ 2012-05-02 17:44                             ` Sriraman Tallam
  2012-05-02 18:04                               ` H.J. Lu
  0 siblings, 1 reply; 93+ messages in thread
From: Sriraman Tallam @ 2012-05-02 17:44 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Richard Guenther, Jan Hubicka, Uros Bizjak, reply, gcc-patches, David Li

On Wed, May 2, 2012 at 9:05 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Wed, May 2, 2012 at 8:08 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>> On Wed, May 2, 2012 at 6:42 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Tue, May 1, 2012 at 7:45 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>> Hi H.J,
>>>>
>>>>   Done now. Patch attached.
>>>>
>>>> Thanks,
>>>> -Sri.
>>>>
>>>> On Tue, May 1, 2012 at 5:08 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>> On Tue, May 1, 2012 at 4:51 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> New patch attached, updated test case and fixed bugs related to
>>>>>> __PRETTY_FUNCTION_.
>>>>>>
>>>>>> Patch also available for review here:  http://codereview.appspot.com/5752064
>>>>>
>>>>> @@ -0,0 +1,39 @@
>>>>> +/* Simple test case to check if Multiversioning works.  */
>>>>> +/* { dg-do run } */
>>>>> +/* { dg-options "-O2 -fPIC" } */
>>>>> +
>>>>> +#include <assert.h>
>>>>> +
>>>>> +int foo ();
>>>>> +int foo () __attribute__ ((target("arch=corei7,sse4.2,popcnt")));
>>>>> +/* The target operands in this declaration and the definition are re-ordered.
>>>>> +   This should still work.  */
>>>>> +int foo () __attribute__ ((target("ssse3,avx2")));
>>>>> +
>>>>> +int (*p)() = &foo;
>>>>> +int main ()
>>>>> +{
>>>>> +  return foo () + (*p)();
>>>>> +}
>>>>> +
>>>>> +int foo ()
>>>>> +{
>>>>> +  return 0;
>>>>> +}
>>>>> +
>>>>> +int __attribute__ ((target("arch=corei7,sse4.2,popcnt")))
>>>>> +foo ()
>>>>> +{
>>>>> +  assert (__builtin_cpu_is ("corei7")
>>>>> +         && __builtin_cpu_supports ("sse4.2")
>>>>> +         && __builtin_cpu_supports ("popcnt"));
>>>>> +  return 0;
>>>>> +}
>>>>> +
>>>>> +int __attribute__ ((target("avx2,ssse3")))
>>>>> +foo ()
>>>>> +{
>>>>> +  assert (__builtin_cpu_supports ("avx2")
>>>>> +         && __builtin_cpu_supports ("ssse3"));
>>>>> +  return 0;
>>>>> +}
>>>>>
>>>>> This test will pass if
>>>>>
>>>>> int foo ()
>>>>> {
>>>>>  return 0;
>>>>> }
>>>>>
>>>>> is selected on processors with AVX.  The run-time test should
>>>>> check that the right function is selected on the target processor,
>>>>> not the selected function matches the target attribute. You can
>>>>> do it by returning different values for each foo and call cpuid
>>>>> to check if the right foo is selected.
>>>>>
>>>>> You should add a testcase for __builtin_cpu_supports to check
>>>>> all valid arguments.
>>>>>
>>>>> --
>>>>> H.J.
>>>
>>> 2 questions:
>>>
>>> 1.  Since AVX > SSE4 > SSSE3 > SSE3 > SSE2 > SSE, with
>>> foo for AVX and SSE3, on AVX processors, which foo will be
>>> selected?
>>
>> foo for AVX will get called since that appears ahead.
>>
>> The dispatching is done in the same order in which the functions are
>> specified. If, potentially, two foo versions can be dispatched for an
>> architecture, the first foo will get called.  There is no way right
>> now to specify the order in which the dispatching should be done.
>
> This is very fragile.  We know ISAs and processors.  The source
> order should be irrelevant.

I am not sure it is always possible to keep this dispatching unambiguous
to the user. It might be better to let the user specify a priority for
each version to control the order of dispatching.

Still, one way to implement what you said is to assign a significance
number to each ISA, where the number for sse4 is greater than the one
for sse, for instance. Then, the dispatching can be done in descending
order of significance. What do you think?

I thought about this earlier and I was thinking along the lines of
letting the user specify a priority for each version, when there is
ambiguity.
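
To make the idea concrete, here is a minimal sketch of ordering versions by
a significance number before dispatching. It is only an illustration: the
rank values, the struct and the function names are assumptions for this
example and do not come from the patch.

```c
#include <stdlib.h>

/* Hypothetical significance ranks; a larger value means a newer ISA.  */
enum isa_rank
{
  RANK_DEFAULT = 0,
  RANK_SSE,
  RANK_SSE2,
  RANK_SSE3,
  RANK_SSSE3,
  RANK_SSE4,
  RANK_AVX
};

struct ranked_version
{
  enum isa_rank rank;   /* Significance of the ISA this version targets.  */
  int id;               /* Identifies which foo this entry stands for.  */
};

/* qsort comparator: higher rank sorts first.  */
static int
by_rank_desc (const void *a, const void *b)
{
  const struct ranked_version *x = (const struct ranked_version *) a;
  const struct ranked_version *y = (const struct ranked_version *) b;

  return (y->rank > x->rank) - (y->rank < x->rank);
}

/* Reorder V so that the dispatcher tests the most significant ISA
   first and the default version last.  */
static void
order_for_dispatch (struct ranked_version *v, size_t n)
{
  qsort (v, n, sizeof *v, by_rank_desc);
}
```

Once the versions are sorted this way, the source order no longer matters
for versions with different ranks. Entries with equal rank can end up in
either order, since qsort is not guaranteed to be stable, so a user-supplied
priority would still be needed to break such ties.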

>
>>
>>> 2.  I don't see any tests for __builtin_cpu_supports ("XXX")
>>> nor __builtin_cpu_is ("XXX").  I think you need tests for
>>> them.
>>
>> This is already there as part of the previous CPU detection patch that
>> was submitted. Please see gcc.target/i386/builtin_target.c. Did you
>> want something else?
>
> gcc.target/i386/builtin_target.c doesn't test if __builtin_cpu_supports ("XXX")
> and __builtin_cpu_is ("XXX") are implemented correctly.

Oh, you mean like doing a CPUID again in the test case itself and checking, ok.
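
A minimal sketch of such a cross-check, assuming GCC or Clang on x86 with
<cpuid.h> available. The helper name is made up for the example, and only
SSE2 is checked because it maps to a single CPUID bit; features such as AVX
also depend on operating-system state, so comparing them takes more than
one bit.

```c
#if defined (__x86_64__) || defined (__i386__)
#include <cpuid.h>
#endif

/* Return 1 if __builtin_cpu_supports ("sse2") agrees with the SSE2 bit
   read directly from CPUID leaf 1, and 0 if the two disagree.  On
   non-x86 targets there is nothing to compare, so return 1.  */
static int
sse2_check_agrees (void)
{
#if defined (__x86_64__) || defined (__i386__)
  unsigned int eax, ebx, ecx, edx;
  int from_cpuid, from_builtin;

  /* __get_cpuid returns 0 when the requested leaf is unsupported.  */
  if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx))
    return 1;

  __builtin_cpu_init ();
  from_cpuid = (edx & bit_SSE2) != 0;
  from_builtin = __builtin_cpu_supports ("sse2") != 0;
  return from_cpuid == from_builtin;
#else
  return 1;
#endif
}
```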

Thanks,
-Sri.

>
>
> --
> H.J.


* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-05-02 17:44                             ` Sriraman Tallam
@ 2012-05-02 18:04                               ` H.J. Lu
  2012-05-07 16:58                                 ` Sriraman Tallam
  0 siblings, 1 reply; 93+ messages in thread
From: H.J. Lu @ 2012-05-02 18:04 UTC (permalink / raw)
  To: Sriraman Tallam
  Cc: Richard Guenther, Jan Hubicka, Uros Bizjak, reply, gcc-patches, David Li

On Wed, May 2, 2012 at 10:44 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>>
>>>> 1.  Since AVX > SSE4 > SSSE3 > SSE3 > SSE2 > SSE, with
>>>> foo for AVX and SSE3, on AVX processors, which foo will be
>>>> selected?
>>>
>>> foo for AVX will get called since that appears ahead.
>>>
>>> The dispatching is done in the same order in which the functions are
>>> specified. If, potentially, two foo versions can be dispatched for an
>>> architecture, the first foo will get called.  There is no way right
>>> now to specify the order in which the dispatching should be done.
>>
>> This is very fragile.  We know ISAs and processors.  The source
>> order should be irrelevant.
>
> I am not sure it is always possible keep this dispatching unambiguous
> to the user. It might be better to let the user specify a priority for
> each version to control the order of dispatching.
>
>  Still, one way to implement what you said is to assign a significance
> number to each ISA, where the number of sse4 > sse, for instance.
> Then, the dispatching can be done in the descending order of
> significance. What do you think?

This sounds reasonable.  You should also take processor into
account when doing this.

> I thought about this earlier and I was thinking along the lines of
> letting the user specify a priority for each version, when there is
> ambiguity.
>
>>
>>>
>>>> 2.  I don't see any tests for __builtin_cpu_supports ("XXX")
>>>> nor __builtin_cpu_is ("XXX").  I think you need tests for
>>>> them.
>>>
>>> This is already there as part of the previous CPU detection patch that
>>> was submitted. Please see gcc.target/i386/builtin_target.c. Did you
>>> want something else?
>>
>> gcc.target/i386/builtin_target.c doesn't test if __builtin_cpu_supports ("XXX")
>> and __builtin_cpu_is ("XXX") are implemented correctly.
>
> Oh, you mean like doing a CPUID again in the test case itself and checking, ok.
>

Yes. BTW,  I think you should also add FMA support to
config/i386/i386-cpuinfo.c.

Thanks.

-- 
H.J.


* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-05-02 18:04                               ` H.J. Lu
@ 2012-05-07 16:58                                 ` Sriraman Tallam
  2012-05-09 19:01                                   ` Sriraman Tallam
  0 siblings, 1 reply; 93+ messages in thread
From: Sriraman Tallam @ 2012-05-07 16:58 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Richard Guenther, Jan Hubicka, Uros Bizjak, reply, gcc-patches, David Li

On Wed, May 2, 2012 at 11:04 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Wed, May 2, 2012 at 10:44 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>>>
>>>>> 1.  Since AVX > SSE4 > SSSE3 > SSE3 > SSE2 > SSE, with
>>>>> foo for AVX and SSE3, on AVX processors, which foo will be
>>>>> selected?
>>>>
>>>> foo for AVX will get called since that appears ahead.
>>>>
>>>> The dispatching is done in the same order in which the functions are
>>>> specified. If, potentially, two foo versions can be dispatched for an
>>>> architecture, the first foo will get called.  There is no way right
>>>> now to specify the order in which the dispatching should be done.
>>>
>>> This is very fragile.  We know ISAs and processors.  The source
>>> order should be irrelevant.
>>
>> I am not sure it is always possible keep this dispatching unambiguous
>> to the user. It might be better to let the user specify a priority for
>> each version to control the order of dispatching.
>>
>>  Still, one way to implement what you said is to assign a significance
>> number to each ISA, where the number of sse4 > sse, for instance.
>> Then, the dispatching can be done in the descending order of
>> significance. What do you think?
>
> This sounds reasonable.  You should also take processor into
> account when doing this.
>
>> I thought about this earlier and I was thinking along the lines of
>> letting the user specify a priority for each version, when there is
>> ambiguity.
>>
>>>
>>>>
>>>>> 2.  I don't see any tests for __builtin_cpu_supports ("XXX")
>>>>> nor __builtin_cpu_is ("XXX").  I think you need tests for
>>>>> them.
>>>>
>>>> This is already there as part of the previous CPU detection patch that
>>>> was submitted. Please see gcc.target/i386/builtin_target.c. Did you
>>>> want something else?
>>>
>>> gcc.target/i386/builtin_target.c doesn't test if __builtin_cpu_supports ("XXX")
>>> and __builtin_cpu_is ("XXX") are implemented correctly.
>>
>> Oh, you mean like doing a CPUID again in the test case itself and checking, ok.
>>
>
> Yes. BTW,  I think you should also add FMA support to
> config/i386/i386-cpuinfo.c.

I am preparing a patch for this. I will send it your way soon enough.

Thanks,
-Sri.

>
> Thanks.
>
> --
> H.J.


* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-05-07 16:58                                 ` Sriraman Tallam
@ 2012-05-09 19:01                                   ` Sriraman Tallam
  2012-05-10 17:55                                     ` H.J. Lu
  0 siblings, 1 reply; 93+ messages in thread
From: Sriraman Tallam @ 2012-05-09 19:01 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Richard Guenther, Jan Hubicka, Uros Bizjak, reply, gcc-patches, David Li

[-- Attachment #1: Type: text/plain, Size: 2677 bytes --]

Hi,

Attached is a new patch with more bug fixes. I will fix the dispatching
method to use the priority of attributes in the next iteration.

Patch also available for review here:  http://codereview.appspot.com/5752064

Thanks,
-Sri.

On Mon, May 7, 2012 at 9:58 AM, Sriraman Tallam <tmsriram@google.com> wrote:
> On Wed, May 2, 2012 at 11:04 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Wed, May 2, 2012 at 10:44 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>>>>
>>>>>> 1.  Since AVX > SSE4 > SSSE3 > SSE3 > SSE2 > SSE, with
>>>>>> foo for AVX and SSE3, on AVX processors, which foo will be
>>>>>> selected?
>>>>>
>>>>> foo for AVX will get called since that appears ahead.
>>>>>
>>>>> The dispatching is done in the same order in which the functions are
>>>>> specified. If, potentially, two foo versions can be dispatched for an
>>>>> architecture, the first foo will get called.  There is no way right
>>>>> now to specify the order in which the dispatching should be done.
>>>>
>>>> This is very fragile.  We know ISAs and processors.  The source
>>>> order should be irrelevant.
>>>
>>> I am not sure it is always possible keep this dispatching unambiguous
>>> to the user. It might be better to let the user specify a priority for
>>> each version to control the order of dispatching.
>>>
>>>  Still, one way to implement what you said is to assign a significance
>>> number to each ISA, where the number of sse4 > sse, for instance.
>>> Then, the dispatching can be done in the descending order of
>>> significance. What do you think?
>>
>> This sounds reasonable.  You should also take processor into
>> account when doing this.
>>
>>> I thought about this earlier and I was thinking along the lines of
>>> letting the user specify a priority for each version, when there is
>>> ambiguity.
>>>
>>>>
>>>>>
>>>>>> 2.  I don't see any tests for __builtin_cpu_supports ("XXX")
>>>>>> nor __builtin_cpu_is ("XXX").  I think you need tests for
>>>>>> them.
>>>>>
>>>>> This is already there as part of the previous CPU detection patch that
>>>>> was submitted. Please see gcc.target/i386/builtin_target.c. Did you
>>>>> want something else?
>>>>
>>>> gcc.target/i386/builtin_target.c doesn't test if __builtin_cpu_supports ("XXX")
>>>> and __builtin_cpu_is ("XXX") are implemented correctly.
>>>
>>> Oh, you mean like doing a CPUID again in the test case itself and checking, ok.
>>>
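A test along those lines might look like the sketch below. It is an
assumption about what such a test could check, not the test that was
eventually added: the function name count_cpu_supports_mismatches and the
HAVE_X86_CPUID macro are hypothetical, and only features that need no OS
support check (sse2, sse3, ssse3, popcnt) are compared against the raw
CPUID leaf 1 bits from <cpuid.h>. On non-x86 targets the check is skipped.

```c
#include <assert.h>

#if defined (__GNUC__) && (defined (__x86_64__) || defined (__i386__))
#define HAVE_X86_CPUID 1
#include <cpuid.h>
#endif

/* Return the number of features for which __builtin_cpu_supports
   disagrees with the raw CPUID bits.  0 means the two agree, or that
   the check does not apply on this target.  */
int
count_cpu_supports_mismatches (void)
{
  int mismatches = 0;
#ifdef HAVE_X86_CPUID
  unsigned int eax, ebx, ecx, edx;

  /* Harmless when already initialized; required before main runs.  */
  __builtin_cpu_init ();

  /* Leaf 1 holds the basic feature bits in ECX and EDX.  */
  if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx))
    return 0;

  mismatches += (!!__builtin_cpu_supports ("sse2") != !!(edx & bit_SSE2));
  mismatches += (!!__builtin_cpu_supports ("sse3") != !!(ecx & bit_SSE3));
  mismatches += (!!__builtin_cpu_supports ("ssse3") != !!(ecx & bit_SSSE3));
  mismatches += (!!__builtin_cpu_supports ("popcnt") != !!(ecx & bit_POPCNT));
#endif
  return mismatches;
}
```

AVX and later features are left out on purpose, since checking them
correctly also requires confirming OS support for the extended state.
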
>>
>> Yes. BTW,  I think you should also add FMA support to
>> config/i386/i386-cpuinfo.c.
>
> I am preparing a patch for this. I will send it your way soon enough.
>
> Thanks,
> -Sri.
>
>>
>> Thanks.
>>
>> --
>> H.J.

[-- Attachment #2: mv_fe_patch.txt --]
[-- Type: text/plain, Size: 63692 bytes --]

Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi	(revision 187346)
+++ gcc/doc/tm.texi	(working copy)
@@ -10997,6 +10997,14 @@ The result is another tree containing a simplified
 call's result.  If @var{ignore} is true the value will be ignored.
 @end deftypefn
 
+@deftypefn {Target Hook} int TARGET_DISPATCH_VERSION (tree @var{dispatch_decl}, void *@var{fndecls}, basic_block *@var{empty_bb})
+For a multi-versioned function, this hook sets up the dispatcher.
+@var{dispatch_decl} is the function that will be used to dispatch the
+version. @var{fndecls} are the function choices for dispatch.
+@var{empty_bb} is a basic block in @var{dispatch_decl} where the
+code to do the dispatch will be added.
+@end deftypefn
+
 @deftypefn {Target Hook} {const char *} TARGET_INVALID_WITHIN_DOLOOP (const_rtx @var{insn})
 
 Take an instruction in @var{insn} and return NULL if it is valid within a
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in	(revision 187346)
+++ gcc/doc/tm.texi.in	(working copy)
@@ -10877,6 +10877,14 @@ The result is another tree containing a simplified
 call's result.  If @var{ignore} is true the value will be ignored.
 @end deftypefn
 
+@hook TARGET_DISPATCH_VERSION
+For a multi-versioned function, this hook sets up the dispatcher.
+@var{dispatch_decl} is the function that will be used to dispatch the
+version. @var{fndecls} are the function choices for dispatch.
+@var{empty_bb} is a basic block in @var{dispatch_decl} where the
+code to do the dispatch will be added.
+@end deftypefn
+
 @hook TARGET_INVALID_WITHIN_DOLOOP
 
 Take an instruction in @var{insn} and return NULL if it is valid within a
Index: gcc/target.def
===================================================================
--- gcc/target.def	(revision 187346)
+++ gcc/target.def	(working copy)
@@ -1249,6 +1249,15 @@ DEFHOOK
  tree, (tree fndecl, int n_args, tree *argp, bool ignore),
  hook_tree_tree_int_treep_bool_null)
 
+/* Target hook to generate the dispatching code for calls to multi-versioned
+   functions.  DISPATCH_DECL is the function that will have the dispatching
+   logic.  FNDECLS are the list of choices for dispatch and EMPTY_BB is the
+   basic block in DISPATCH_DECL which will contain the code.  */
+DEFHOOK
+(dispatch_version,
+ "",
+ int, (tree dispatch_decl, void *fndecls, basic_block *empty_bb), NULL)
+
 /* Returns a code for a target-specific builtin that implements
    reciprocal of the function, or NULL_TREE if not available.  */
 DEFHOOK
Index: gcc/tree.h
===================================================================
--- gcc/tree.h	(revision 187346)
+++ gcc/tree.h	(working copy)
@@ -3539,6 +3539,12 @@ extern VEC(tree, gc) **decl_debug_args_insert (tre
 #define DECL_FUNCTION_SPECIFIC_OPTIMIZATION(NODE) \
    (FUNCTION_DECL_CHECK (NODE)->function_decl.function_specific_optimization)
 
+/* In FUNCTION_DECL, this is set if this function has other versions generated
+   using "target" attributes.  The default version is the one which does not
+   have any "target" attribute set. */
+#define DECL_FUNCTION_VERSIONED(NODE)\
+   (FUNCTION_DECL_CHECK (NODE)->function_decl.versioned_function)
+
 /* FUNCTION_DECL inherits from DECL_NON_COMMON because of the use of the
    arguments/result/saved_tree fields by front ends.   It was either inherit
    FUNCTION_DECL from non_common, or inherit non_common from FUNCTION_DECL,
@@ -3583,8 +3589,8 @@ struct GTY(()) tree_function_decl {
   unsigned looping_const_or_pure_flag : 1;
   unsigned has_debug_args_flag : 1;
   unsigned tm_clone_flag : 1;
-
-  /* 1 bit left */
+  unsigned versioned_function : 1;
+  /* No bits left.  */
 };
 
 /* The source language of the translation-unit.  */
Index: gcc/tree-pass.h
===================================================================
--- gcc/tree-pass.h	(revision 187346)
+++ gcc/tree-pass.h	(working copy)
@@ -453,6 +453,7 @@ extern struct gimple_opt_pass pass_tm_memopt;
 extern struct gimple_opt_pass pass_tm_edges;
 extern struct gimple_opt_pass pass_split_functions;
 extern struct gimple_opt_pass pass_feedback_split_functions;
+extern struct gimple_opt_pass pass_dispatch_versions;
 
 /* IPA Passes */
 extern struct simple_ipa_opt_pass pass_ipa_lower_emutls;
Index: gcc/multiversion.c
===================================================================
--- gcc/multiversion.c	(revision 0)
+++ gcc/multiversion.c	(revision 0)
@@ -0,0 +1,833 @@
+/* Function Multiversioning.
+   Copyright (C) 2012 Free Software Foundation, Inc.
+   Contributed by Sriraman Tallam (tmsriram@google.com)
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>. */
+
+/* Holds the state for multi-versioned functions here. The front-end
+   updates the state as and when function versions are encountered.
+   This is then used to generate the dispatch code.  Also, the
+   optimization passes to clone hot paths involving versioned functions
+   will be done here.
+
+   Function versions are created by using the same function signature but
+   also tagging attribute "target" to specify the platform type for which
+   the version must be executed.  Here is an example:
+
+   int foo ()
+   {
+     printf ("Execute as default");
+     return 0;
+   }
+
+   int  __attribute__ ((target ("arch=corei7")))
+   foo ()
+   {
+     printf ("Execute for corei7");
+     return 0;
+   }
+   
+   int main ()
+   {
+     return foo ();
+   } 
+
+   The call to foo in main is replaced with a call to an IFUNC function that
+   contains the dispatch code to call the correct function version at
+   run-time.  */
+
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "tree.h"
+#include "tree-inline.h"
+#include "langhooks.h"
+#include "flags.h"
+#include "cgraph.h"
+#include "diagnostic.h"
+#include "toplev.h"
+#include "timevar.h"
+#include "params.h"
+#include "fibheap.h"
+#include "intl.h"
+#include "tree-pass.h"
+#include "hashtab.h"
+#include "coverage.h"
+#include "ggc.h"
+#include "tree-flow.h"
+#include "rtl.h"
+#include "ipa-prop.h"
+#include "basic-block.h"
+#include "toplev.h"
+#include "dbgcnt.h"
+#include "tree-dump.h"
+#include "output.h"
+#include "vecprim.h"
+#include "gimple-pretty-print.h"
+#include "ipa-inline.h"
+#include "target.h"
+#include "multiversion.h"
+
+typedef void * void_p;
+
+DEF_VEC_P (void_p);
+DEF_VEC_ALLOC_P (void_p, heap);
+
+/* Each function decl that is a function version gets an instance of this
+   structure.   Since this is called by the front-end, decl merging can
+   happen, where a decl created for a new declaration is merged with 
+   the old. In this case, the new decl is deleted and the IS_DELETED
+   field is set for the struct instance corresponding to the new decl.
+   IFUNC_DECL is the decl of the ifunc function for default decls.
+   IFUNC_RESOLVER_DECL is the decl of the dispatch function.  VERSIONS
+   is a vector containing the list of function versions that are
+   the candidates for dispatch.  */
+
+typedef struct version_function_d {
+  tree decl;
+  tree ifunc_decl;
+  tree ifunc_resolver_decl;
+  VEC (void_p, heap) *versions;
+  bool is_deleted;
+} version_function;
+
+/* Hashmap has an entry for every function decl that has other function
+   versions.  For function decls that are the default, it also stores the
+   list of all the other function versions.  Each entry is a structure
+   of type version_function_d.  */
+static htab_t decl_version_htab = NULL;
+
+/* Hashtable helpers for decl_version_htab. */
+
+static hashval_t
+decl_version_htab_hash_descriptor (const void *p)
+{
+  const version_function *t = (const version_function *) p;
+  return htab_hash_pointer (t->decl);
+}
+
+/* Hashtable helper for decl_version_htab. */
+
+static int
+decl_version_htab_eq_descriptor (const void *p1, const void *p2)
+{
+  const version_function *t1 = (const version_function *) p1;
+  return htab_eq_pointer ((const void_p) t1->decl, p2);
+}
+
+/* Create the decl_version_htab.  */
+static void
+create_decl_version_htab (void)
+{
+  if (decl_version_htab == NULL)
+    decl_version_htab = htab_create (10, decl_version_htab_hash_descriptor,
+				     decl_version_htab_eq_descriptor, NULL);
+}
+
+/* Creates an instance of version_function for decl DECL.  */
+
+static version_function*
+new_version_function (const tree decl)
+{
+  version_function *v;
+  v = (version_function *)xmalloc(sizeof (version_function));
+  v->decl = decl;
+  v->ifunc_decl = NULL;
+  v->ifunc_resolver_decl = NULL;
+  v->versions = NULL;
+  v->is_deleted = false;
+  return v;
+}
+
+/* Comparator function to be used in qsort routine to sort attribute
+   specification strings to "target".  */
+
+static int
+attr_strcmp (const void *v1, const void *v2)
+{
+  const char *c1 = *(char *const*)v1;
+  const char *c2 = *(char *const*)v2;
+  return strcmp (c1, c2);
+}
+
+/* STR is the argument to target attribute.  This function tokenizes
+   the comma separated arguments, sorts them and returns a string which
+   is a unique identifier for the comma separated arguments.  */
+
+static char *
+sorted_attr_string (const char *str)
+{
+  char **args = NULL;
+  char *attr_str, *ret_str;
+  char *attr = NULL;
+  unsigned int argnum = 1;
+  unsigned int i;
+
+  for (i = 0; i < strlen (str); i++)
+    if (str[i] == ',')
+      argnum++;
+
+  attr_str = (char *)xmalloc (strlen (str) + 1);
+  strcpy (attr_str, str);
+
+  for (i = 0; i < strlen (attr_str); i++)
+    if (attr_str[i] == '=')
+      attr_str[i] = '_';
+
+  if (argnum == 1)
+    return attr_str;
+
+  args = (char **)xmalloc (argnum * sizeof (char *));
+
+  i = 0;
+  attr = strtok (attr_str, ",");
+  while (attr != NULL)
+    {
+      args[i] = attr;
+      i++;
+      attr = strtok (NULL, ",");
+    }
+
+  qsort (args, argnum, sizeof (char*), attr_strcmp);
+
+  ret_str = (char *)xmalloc (strlen (str) + 1);
+  strcpy (ret_str, args[0]);
+  for (i = 1; i < argnum; i++)
+    {
+      strcat (ret_str, "_");
+      strcat (ret_str, args[i]);
+    }
+
+  free (args);
+  free (attr_str);
+  return ret_str;
+}
+
+/* Returns true when only one of DECL1 and DECL2 is marked with "target"
+   or if the "target" attribute strings of DECL1 and DECL2 do not match.  */
+
+bool
+has_different_version_attributes (const tree decl1, const tree decl2)
+{
+  tree attr1, attr2;
+  char *c1, *c2;
+  bool ret = false;
+
+  if (TREE_CODE (decl1) != FUNCTION_DECL
+      || TREE_CODE (decl2) != FUNCTION_DECL)
+    return false;
+
+  attr1 = lookup_attribute ("target", DECL_ATTRIBUTES (decl1));
+  attr2 = lookup_attribute ("target", DECL_ATTRIBUTES (decl2));
+
+  if (attr1 == NULL_TREE && attr2 == NULL_TREE)
+    return false;
+
+  if ((attr1 == NULL_TREE && attr2 != NULL_TREE)
+      || (attr1 != NULL_TREE && attr2 == NULL_TREE))
+    return true;
+
+  c1 = sorted_attr_string (
+	TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr1))));
+  c2 = sorted_attr_string (
+	TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr2))));
+
+  if (strcmp (c1, c2) != 0)
+     ret = true;
+
+  free (c1);
+  free (c2);
+
+  return ret;
+}
+
+/* If this decl corresponds to a function and has "target" attribute,
+   append the attribute string to its assembler name.  */
+
+static void
+version_assembler_name (const tree decl)
+{
+  tree version_attr;
+  const char *orig_name, *version_string, *attr_str;
+  char *assembler_name;
+  tree assembler_name_tree;
+  
+  if (TREE_CODE (decl) != FUNCTION_DECL)
+    return;
+
+  if (DECL_DECLARED_INLINE_P (decl)
+      && lookup_attribute ("gnu_inline",
+			  DECL_ATTRIBUTES (decl)))
+    error_at (DECL_SOURCE_LOCATION (decl),
+	      "Function versions cannot be marked as gnu_inline,"
+	      " bodies have to be generated\n");
+
+  if (DECL_VIRTUAL_P (decl)
+      || DECL_VINDEX (decl))
+    error_at (DECL_SOURCE_LOCATION (decl),
+	      "Virtual function versioning not supported\n");
+
+  version_attr = lookup_attribute ("target", DECL_ATTRIBUTES (decl));
+  /* target attribute string is NULL for default functions.  */
+  if (version_attr == NULL_TREE)
+    return;
+
+  orig_name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
+  version_string
+    = TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (version_attr)));
+
+  attr_str = sorted_attr_string (version_string);
+  assembler_name = (char *) xmalloc (strlen (orig_name)
+				     + strlen (attr_str) + 2);
+
+  sprintf (assembler_name, "%s.%s", orig_name, attr_str);
+  if (dump_file)
+    fprintf (dump_file, "Assembler name set to %s for function version %s\n",
+	     assembler_name, IDENTIFIER_POINTER (DECL_NAME (decl)));
+
+  assembler_name_tree = get_identifier (assembler_name);
+
+  SET_DECL_ASSEMBLER_NAME (decl, assembler_name_tree);
+  SET_DECL_RTL (decl, NULL);
+}
+
+void
+mark_function_as_version (const tree decl)
+{
+  if (DECL_FUNCTION_VERSIONED (decl))
+    return;
+  DECL_FUNCTION_VERSIONED (decl) = 1;
+  version_assembler_name (decl);
+}
+
+/* Returns true if function DECL has target attribute set.  This could be
+   a version.  */
+
+bool
+is_target_attribute_set (const tree decl)
+{
+  return (TREE_CODE (decl) == FUNCTION_DECL
+	  && (lookup_attribute ("target", DECL_ATTRIBUTES (decl))
+	      != NULL_TREE));
+}
+
+/* Returns true if DECL is multi-versioned and is the default function,
+   that is, it is not tagged with the "target" attribute.  */
+
+bool
+is_default_function (const tree decl)
+{
+  return (TREE_CODE (decl) == FUNCTION_DECL
+	  && DECL_FUNCTION_VERSIONED (decl)
+	  && (lookup_attribute ("target", DECL_ATTRIBUTES (decl))
+	      == NULL_TREE));	
+}
+
+/* For function decl DECL, find the version_function struct in the
+   decl_version_htab.  */
+
+static version_function *
+find_function_version (const tree decl)
+{
+  void *slot;
+
+  if (!DECL_FUNCTION_VERSIONED (decl))
+    return NULL;
+
+  if (!decl_version_htab)
+    return NULL;
+
+  slot = htab_find_with_hash (decl_version_htab, decl,
+                              htab_hash_pointer (decl));
+
+  if (slot != NULL)
+    return (version_function *)slot;
+
+  return NULL;
+}
+
+/* Record DECL as a function version by creating a version_function struct
+   for it and storing it in the hashtable.  */
+
+static version_function *
+add_function_version (const tree decl)
+{
+  void **slot;
+  version_function *v;
+
+  if (!DECL_FUNCTION_VERSIONED (decl))
+    return NULL;
+
+  create_decl_version_htab ();
+
+  slot = htab_find_slot_with_hash (decl_version_htab, (const void_p)decl,
+                                   htab_hash_pointer ((const void_p)decl),
+				   INSERT);
+
+  if (*slot != NULL)
+    return (version_function *)*slot;
+
+  v = new_version_function (decl);
+  *slot = v;
+
+  return v;
+}
+
+/* Push V into VEC only if it is not already present.  If already present
+   returns false.  */
+
+static bool
+push_function_version (version_function *v, VEC (void_p, heap) **vec)
+{
+  int ix;
+  void_p ele; 
+  for (ix = 0; VEC_iterate (void_p, *vec, ix, ele); ++ix)
+    {
+      if (ele == (void_p)v)
+        return false;
+    }
+
+  VEC_safe_push (void_p, heap, *vec, (void*)v);
+  return true;
+}
+ 
+/* Mark DECL as deleted.  This is called by the front-end when a duplicate
+   decl is merged with the original decl and the duplicate decl is deleted.
+   This function marks the duplicate_decl as invalid.  Called by
+   duplicate_decls in cp/decl.c.  */
+
+void
+mark_delete_decl_version (const tree decl)
+{
+  version_function *decl_v;
+
+  decl_v = find_function_version (decl);
+
+  if (decl_v == NULL)
+    return;
+
+  decl_v->is_deleted = true;
+
+  if (is_default_function (decl)
+      && decl_v->versions != NULL)
+    {
+      VEC_truncate (void_p, decl_v->versions, 0);
+      VEC_free (void_p, heap, decl_v->versions);
+      decl_v->versions = NULL;
+    }
+}
+
+/* Mark DECL1 and DECL2 to be function versions in the same group.  One
+   of DECL1 and DECL2 must be the default, otherwise this function does
+   nothing.  This function aggregates the versions.  */
+
+int
+group_function_versions (const tree decl1, const tree decl2)
+{
+  tree default_decl, version_decl;
+  version_function *default_v, *version_v;
+
+  gcc_assert (DECL_FUNCTION_VERSIONED (decl1)
+	      && DECL_FUNCTION_VERSIONED (decl2));
+
+  /* The version decls are added only to the default decl.  */
+  if (!is_default_function (decl1)
+      && !is_default_function (decl2))
+    return 0;
+
+  /* This can happen with duplicate declarations.  Just ignore.  */
+  if (is_default_function (decl1)
+      && is_default_function (decl2))
+    return 0;
+
+  default_decl = (is_default_function (decl1)) ? decl1 : decl2;
+  version_decl = (default_decl == decl1) ? decl2 : decl1;
+
+  gcc_assert (default_decl != version_decl);
+  create_decl_version_htab ();
+
+  /* If the version function is found, it has been added.  */
+  if (find_function_version (version_decl))
+    return 0;
+
+  default_v = add_function_version (default_decl);
+  version_v = add_function_version (version_decl);
+
+  if (default_v->versions == NULL)
+    default_v->versions = VEC_alloc (void_p, heap, 1);
+
+  push_function_version (version_v, &default_v->versions);
+  return 0;
+}
+
+/* Makes a function attribute of the form NAME(ARG_NAME) and chains
+   it to CHAIN.  */
+
+static tree
+make_attribute (const char *name, const char *arg_name, tree chain)
+{
+  tree attr_name;
+  tree attr_arg_name;
+  tree attr_args;
+  tree attr;
+
+  attr_name = get_identifier (name);
+  attr_arg_name = build_string (strlen (arg_name), arg_name);
+  attr_args = tree_cons (NULL_TREE, attr_arg_name, NULL_TREE);
+  attr = tree_cons (attr_name, attr_args, chain);
+  return attr;
+}
+
+/* Return a new name by appending SUFFIX to the DECL name.  If
+   make_unique is true, append the full path name.  */
+
+static char *
+make_name (tree decl, const char *suffix, bool make_unique)
+{
+  char *global_var_name;
+  int name_len;
+  const char *name;
+  const char *unique_name = NULL;
+
+  name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
+
+  /* Get a unique name that can be used globally without any chances
+     of collision at link time.  */
+  if (make_unique)
+    unique_name = IDENTIFIER_POINTER (get_file_function_name ("\0"));
+
+  name_len = strlen (name) + strlen (suffix) + 2;
+
+  if (make_unique)
+    name_len += strlen (unique_name) + 1;
+  global_var_name = (char *) xmalloc (name_len);
+
+  /* Use '.' to concatenate names as it is demangler friendly.  */
+  if (make_unique)
+      snprintf (global_var_name, name_len, "%s.%s.%s", name,
+		unique_name, suffix);
+  else
+      snprintf (global_var_name, name_len, "%s.%s", name, suffix);
+
+  return global_var_name;
+}
+
+/* Make the resolver function decl for ifunc (IFUNC_DECL) to dispatch
+   the versions of multi-versioned function DEFAULT_DECL.  Create an
+   empty basic block in the resolver and store the pointer in
+   EMPTY_BB.  Return the decl of the resolver function.  */
+
+static tree
+make_ifunc_resolver_func (const tree default_decl,
+			  const tree ifunc_decl,
+			  basic_block *empty_bb)
+{
+  char *resolver_name;
+  tree decl, type, decl_name, t;
+  basic_block new_bb;
+  tree old_current_function_decl;
+  bool make_unique = false;
+
+  /* IFUNC's have to be globally visible.  So, if the default_decl is
+     not, then the name of the IFUNC should be made unique.  */
+  if (TREE_PUBLIC (default_decl) == 0)
+    make_unique = true;
+
+  /* Append the filename to the resolver function if the versions are
+     not externally visible.  This is because the resolver function has
+     to be externally visible for the loader to find it.  So, appending
+     the filename will prevent conflicts with a resolver function from
+     another module which is based on the same version name.  */
+  resolver_name = make_name (default_decl, "resolver", make_unique);
+
+  /* The resolver function should return a (void *). */
+  type = build_function_type_list (ptr_type_node, NULL_TREE);
+
+  decl = build_fn_decl (resolver_name, type);
+  decl_name = get_identifier (resolver_name);
+  SET_DECL_ASSEMBLER_NAME (decl, decl_name);
+
+  DECL_NAME (decl) = decl_name;
+  TREE_USED (decl) = TREE_USED (default_decl);
+  DECL_ARTIFICIAL (decl) = 1;
+  DECL_IGNORED_P (decl) = 0;
+  /* IFUNC resolvers have to be externally visible.  */
+  TREE_PUBLIC (decl) = 1;
+  DECL_UNINLINABLE (decl) = 1;
+
+  DECL_EXTERNAL (decl) = DECL_EXTERNAL (default_decl);
+  DECL_EXTERNAL (ifunc_decl) = 0;
+
+  DECL_CONTEXT (decl) = NULL_TREE;
+  DECL_INITIAL (decl) = make_node (BLOCK);
+  DECL_STATIC_CONSTRUCTOR (decl) = 0;
+  TREE_READONLY (decl) = 0;
+  DECL_PURE_P (decl) = 0;
+  DECL_COMDAT (decl) = DECL_COMDAT (default_decl);
+  if (DECL_COMDAT_GROUP (default_decl))
+    {
+      make_decl_one_only (decl, DECL_COMDAT_GROUP (default_decl));
+    }
+  /* Build result decl and add to function_decl. */
+  t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, ptr_type_node);
+  DECL_ARTIFICIAL (t) = 1;
+  DECL_IGNORED_P (t) = 1;
+  DECL_RESULT (decl) = t;
+
+  gimplify_function_tree (decl);
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (decl));
+  current_function_decl = decl;
+  init_empty_tree_cfg_for_function (DECL_STRUCT_FUNCTION (decl));
+  cfun->curr_properties |=
+    (PROP_gimple_lcf | PROP_gimple_leh | PROP_cfg | PROP_referenced_vars |
+     PROP_ssa);
+  new_bb = create_empty_bb (ENTRY_BLOCK_PTR);
+  make_edge (ENTRY_BLOCK_PTR, new_bb, EDGE_FALLTHRU);
+  make_edge (new_bb, EXIT_BLOCK_PTR, 0);
+  *empty_bb = new_bb;
+
+  cgraph_add_new_function (decl, true);
+  cgraph_call_function_insertion_hooks (cgraph_get_create_node (decl));
+  cgraph_mark_force_output_node (cgraph_get_create_node (decl));
+
+  if (DECL_COMDAT_GROUP (default_decl))
+    {
+      gcc_assert (cgraph_get_node (default_decl));
+      symtab_add_to_same_comdat_group (
+	(symtab_node) cgraph_get_node (decl),
+	(symtab_node) cgraph_get_node (default_decl));
+    }
+
+  pop_cfun ();
+  current_function_decl = old_current_function_decl;
+
+  gcc_assert (ifunc_decl != NULL);
+  /* Mark ifunc_decl as "ifunc" with resolver as resolver_name.  */
+  DECL_ATTRIBUTES (ifunc_decl) 
+    = make_attribute ("ifunc", resolver_name, DECL_ATTRIBUTES (ifunc_decl));
+
+  /* Create the alias here.  */
+  cgraph_create_function_alias (ifunc_decl, decl);
+  return decl;
+}
+
+/* Make an ifunc declaration for the multi-versioned function DECL.  Calls to
+   DECL function will be replaced with calls to the ifunc.   Return the decl
+   of the ifunc created.  */
+
+static tree
+make_ifunc_func (const tree decl)
+{
+  tree ifunc_decl;
+  char *ifunc_name, *resolver_name;
+  tree fn_type, ifunc_type;
+  bool make_unique = false;
+
+  if (TREE_PUBLIC (decl) == 0)
+    make_unique = true;
+
+  ifunc_name = make_name (decl, "ifunc", make_unique);
+  resolver_name = make_name (decl, "resolver", make_unique);
+  gcc_assert (resolver_name);
+
+  fn_type = TREE_TYPE (decl);
+  ifunc_type = build_function_type (TREE_TYPE (fn_type),
+				    TYPE_ARG_TYPES (fn_type));
+  
+  ifunc_decl = build_fn_decl (ifunc_name, ifunc_type);
+  TREE_USED (ifunc_decl) = 1;
+  DECL_CONTEXT (ifunc_decl) = NULL_TREE;
+  DECL_INITIAL (ifunc_decl) = error_mark_node;
+  DECL_ARTIFICIAL (ifunc_decl) = 1;
+  /* Mark this ifunc as external, the resolver will flip it again if
+     it gets generated.  */
+  DECL_EXTERNAL (ifunc_decl) = 1;
+  /* IFUNCs have to be externally visible.  */
+  TREE_PUBLIC (ifunc_decl) = 1;
+
+  return ifunc_decl;  
+}
+
+/* For multi-versioned function decl, which should also be the default,
+   return the decl of the ifunc, creating it if it does not
+   exist.  */
+
+tree
+get_ifunc_for_version (const tree decl)
+{
+  version_function *decl_v;
+  int ix;
+  void_p ele;
+
+  /* DECL has to be the default version, otherwise it is missing and
+     that is not allowed.  */
+  if (!is_default_function (decl))
+    {
+      error_at (DECL_SOURCE_LOCATION (decl), "Default version not found");
+      return decl;
+    }
+
+  decl_v = find_function_version (decl);
+  gcc_assert (decl_v != NULL);
+  if (decl_v->ifunc_decl == NULL)
+    {
+      tree ifunc_decl;
+      ifunc_decl = make_ifunc_func (decl);
+      decl_v->ifunc_decl = ifunc_decl;
+    }
+
+  if (cgraph_get_node (decl))
+    cgraph_mark_force_output_node (cgraph_get_node (decl));
+
+  for (ix = 0; VEC_iterate (void_p, decl_v->versions, ix, ele); ++ix)
+    {
+      version_function *v = (version_function *) ele;
+      /* This could be a deleted version.  Happens with
+	 duplicate declarations. */
+      if (v->is_deleted)
+	continue;
+      gcc_assert (v->decl != NULL);
+      if (cgraph_get_node (v->decl))
+	cgraph_mark_force_output_node (cgraph_get_node (v->decl));
+    }
+
+  return decl_v->ifunc_decl;
+}
+
+/* Generate the dispatching code to dispatch multi-versioned function
+   DECL.  Make a new function decl for dispatching and call the target
+   hook to process the "target" attributes and provide the code to
+   dispatch the right function at run-time.  */
+
+static tree
+make_ifunc_resolver_for_version (const tree decl)
+{
+  version_function *decl_v;
+  tree ifunc_resolver_decl, ifunc_decl;
+  basic_block empty_bb;
+  int ix;
+  void_p ele;
+  VEC (tree, heap) *fn_ver_vec = NULL;
+  tree old_current_function_decl;
+
+  gcc_assert (is_default_function (decl));
+
+  decl_v = find_function_version (decl);
+  gcc_assert (decl_v != NULL);
+
+  if (decl_v->ifunc_resolver_decl != NULL)
+    return decl_v->ifunc_resolver_decl;
+
+  ifunc_decl = decl_v->ifunc_decl;
+
+  if (ifunc_decl == NULL)
+    ifunc_decl = decl_v->ifunc_decl = make_ifunc_func (decl);
+
+  ifunc_resolver_decl = make_ifunc_resolver_func (decl, ifunc_decl,
+						  &empty_bb);
+
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (ifunc_resolver_decl));
+  current_function_decl = ifunc_resolver_decl;
+
+  fn_ver_vec = VEC_alloc (tree, heap, 2);
+  VEC_safe_push (tree, heap, fn_ver_vec, decl);
+
+  for (ix = 0; VEC_iterate (void_p, decl_v->versions, ix, ele); ++ix)
+    {
+      version_function *v = (version_function *) ele;
+      gcc_assert (v->decl != NULL);
+      /* Check for virtual functions here again, as by this time it should
+	 have been determined if this function needs a vtable index or
+	 not.  This happens for methods in derived classes that override
+	 virtual methods in base classes but are not explicitly marked as
+	 virtual.  */
+      if (DECL_VINDEX (v->decl))
+        error_at (DECL_SOURCE_LOCATION (v->decl),
+		  "Virtual function versioning not supported\n");
+      if (!v->is_deleted)
+	VEC_safe_push (tree, heap, fn_ver_vec, v->decl);
+    }
+
+  gcc_assert (targetm.dispatch_version);
+  targetm.dispatch_version (ifunc_resolver_decl, fn_ver_vec, &empty_bb);
+  decl_v->ifunc_resolver_decl = ifunc_resolver_decl;
+
+  pop_cfun ();
+  current_function_decl = old_current_function_decl;
+  return ifunc_resolver_decl;
+}
+
+/* Main entry point to pass_dispatch_versions. For multi-versioned functions,
+   generate the dispatching code.  */
+
+static unsigned int
+do_dispatch_versions (void)
+{
+  /* A new pass for generating dispatch code for multi-versioned functions.
+     Other forms of dispatch can be added when ifunc support is not available
+     like just calling the function directly after checking for target type.
+     Currently, dispatching is done through IFUNC.  This pass will become
+     more meaningful when other dispatch mechanisms are added.  */
+
+  /* Cloning a function to produce more versions will happen here when the
+     user requests that via the target attribute. For example,
+     int foo () __attribute__ ((target(("arch=core2"), ("arch=corei7"))));
+     means that the user wants the same body of foo to be versioned for core2
+     and corei7.  In that case, this function will be cloned during this
+     pass.  */
+  
+  if (DECL_FUNCTION_VERSIONED (current_function_decl)
+      && is_default_function (current_function_decl))
+    {
+      tree decl = make_ifunc_resolver_for_version (current_function_decl);
+      if (dump_file && decl)
+	dump_function_to_file (decl, dump_file, TDF_BLOCKS);
+    }
+  return 0;
+}
+
+static bool
+gate_dispatch_versions (void)
+{
+  return true;
+}
+
+/* A pass to generate the dispatch code to execute the appropriate version
+   of a multi-versioned function at run-time.  */
+
+struct gimple_opt_pass pass_dispatch_versions =
+{
+ {
+  GIMPLE_PASS,
+  "dispatch_multiversion_functions",    /* name */
+  gate_dispatch_versions,		/* gate */
+  do_dispatch_versions,			/* execute */
+  NULL,					/* sub */
+  NULL,					/* next */
+  0,					/* static_pass_number */
+  TV_MULTIVERSION_DISPATCH,		/* tv_id */
+  PROP_cfg,				/* properties_required */
+  PROP_cfg,				/* properties_provided */
+  0,					/* properties_destroyed */
+  0,					/* todo_flags_start */
+  0					/* todo_flags_finish */
+ }
+};
Index: gcc/cgraphunit.c
===================================================================
--- gcc/cgraphunit.c	(revision 187346)
+++ gcc/cgraphunit.c	(working copy)
@@ -420,6 +420,13 @@ cgraph_finalize_function (tree decl, bool nested)
       && !DECL_DISREGARD_INLINE_LIMITS (decl))
     node->symbol.force_output = 1;
 
+  /* With function versions, keep inline functions and do not worry about
+     inline limits.  */
+  if (DECL_FUNCTION_VERSIONED (decl)
+      && DECL_DECLARED_INLINE_P (decl)
+      && !DECL_EXTERNAL (decl))
+    node->symbol.force_output = 1;
+
   /* When not optimizing, also output the static functions. (see
      PR24561), but don't do so for always_inline functions, functions
      declared inline and nested functions.  These were optimized out
Index: gcc/multiversion.h
===================================================================
--- gcc/multiversion.h	(revision 0)
+++ gcc/multiversion.h	(revision 0)
@@ -0,0 +1,55 @@
+/* Function Multiversioning.
+   Copyright (C) 2012 Free Software Foundation, Inc.
+   Contributed by Sriraman Tallam (tmsriram@google.com)
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>. */
+
+/* This is the header file which provides the functions to keep track
+   of functions that are multi-versioned and to generate the dispatch
+   code to call the right version at run-time.  */
+
+#ifndef GCC_MULTIVERSION_H
+#define GCC_MULTIVERSION_H
+
+#include "tree.h"
+
+/* Mark DECL1 and DECL2 as function versions.  */
+int group_function_versions (const tree decl1, const tree decl2);
+
+/* Mark DECL as deleted and no longer a version.  */
+void mark_delete_decl_version (const tree decl);
+
+/* Returns true if DECL is the default version to be executed if all
+   other versions are inappropriate at run-time.  */
+bool is_default_function (const tree decl);
+
+/* Gets the IFUNC dispatcher for this multi-versioned function DECL. DECL
+   must be the default function in the multi-versioned group.  */
+tree get_ifunc_for_version (const tree decl);
+
+/* Returns true when only one of DECL1 and DECL2 is marked with "target"
+   or if the "target" attribute strings of DECL1 and DECL2 don't match.  */
+bool has_different_version_attributes (const tree decl1, const tree decl2);
+
+/* Function DECL is marked to be a multi-versioned function.  If DECL is
+   not the default version, the assembler name of DECL is changed to include
+   the attribute string to keep the name unambiguous.  */
+void mark_function_as_version (const tree decl);
+
+/* Check if decl is FUNCTION_DECL with target attribute set.  */
+bool is_target_attribute_set (const tree decl);
+#endif
Index: gcc/testsuite/g++.dg/mv1.C
===================================================================
--- gcc/testsuite/g++.dg/mv1.C	(revision 0)
+++ gcc/testsuite/g++.dg/mv1.C	(revision 0)
@@ -0,0 +1,199 @@
+/* Test case to check if Multiversioning works.  */
+/* { dg-do run } */
+/* { dg-options "-O2 -fPIC" } */
+
+#include <assert.h>
+
+/* Default version.  */
+int foo ();
+/* Check combination of target attributes.  */
+int foo () __attribute__ ((target("arch=corei7,sse4.2,popcnt")));
+/* The target operands in this declaration and the definition are re-ordered.
+   This should still work.  */
+int foo () __attribute__ ((target("ssse3,avx2")));
+
+/* Check for all target attributes for which dispatchers are available.  */
+/* Check arch= */
+int foo () __attribute__((target("arch=core2")));
+int foo () __attribute__((target("arch=corei7")));
+int foo () __attribute__((target("arch=atom")));
+int foo () __attribute__((target("arch=amdfam10")));
+int foo () __attribute__((target("arch=bdver1")));
+int foo () __attribute__((target("arch=bdver2")));
+/* Check ISA, latest features should be ahead.  */
+int foo () __attribute__((target("avx2")));
+int foo () __attribute__((target("avx")));
+int foo () __attribute__((target("popcnt")));
+int foo () __attribute__((target("sse4.2")));
+int foo () __attribute__((target("sse4.1")));
+int foo () __attribute__((target("ssse3")));
+int foo () __attribute__((target("mmx")));
+int foo () __attribute__((target("sse3")));
+int foo () __attribute__((target("sse2")));
+int foo () __attribute__((target("sse")));
+
+int (*p)() = &foo;
+
+int main ()
+{
+  int val = foo ();
+
+  /* Check if calling foo via ptr is the same.  */
+  assert (val ==  (*p)());
+
+  if (__builtin_cpu_is ("corei7")
+      && __builtin_cpu_supports ("sse4.2")
+      && __builtin_cpu_supports ("popcnt"))
+    assert (val == 1);
+  else if (__builtin_cpu_supports ("avx2")
+	   && __builtin_cpu_supports ("ssse3"))
+    assert (val == 2);
+  else if (__builtin_cpu_is ("core2"))
+    assert (val == 3);
+  else if (__builtin_cpu_is ("corei7"))
+    assert (val == 4);
+  else if (__builtin_cpu_is ("atom"))
+    assert (val == 5);
+  else if (__builtin_cpu_is ("amdfam10h"))
+    assert (val == 6);
+  else if (__builtin_cpu_is ("bdver1"))
+    assert (val == 7);
+  else if (__builtin_cpu_is ("bdver2"))
+    assert (val == 8);
+  else if (__builtin_cpu_supports ("avx2"))
+    assert (val == -1);
+  else if (__builtin_cpu_supports ("avx"))
+    assert (val == -2);
+  else if (__builtin_cpu_supports ("popcnt"))
+    assert (val == -3);
+  else if (__builtin_cpu_supports ("sse4.2"))
+    assert (val == -4);
+  else if (__builtin_cpu_supports ("sse4.1"))
+    assert (val == -5);
+  else if (__builtin_cpu_supports ("ssse3"))
+    assert (val == -6);
+  else if (__builtin_cpu_supports ("mmx"))
+    assert (val == -7);
+  else if (__builtin_cpu_supports ("sse3"))
+    assert (val == -8);
+  else if (__builtin_cpu_supports ("sse2"))
+    assert (val == -9);
+  else if (__builtin_cpu_supports ("sse"))
+    assert (val == -10);
+  else
+    assert (val == 0);
+
+  return 0;
+}
+
+int foo ()
+{
+  return 0;
+}
+
+int __attribute__ ((target("arch=corei7,sse4.2,popcnt")))
+foo ()
+{
+  return 1;
+}
+
+int __attribute__ ((target("avx2,ssse3")))
+foo ()
+{
+  return 2;
+}
+int __attribute__ ((target("arch=core2")))
+foo ()
+{
+  return 3;
+}
+
+int __attribute__ ((target("arch=corei7")))
+foo ()
+{
+  return 4;
+}
+
+int __attribute__ ((target("arch=atom")))
+foo ()
+{
+  return 5;
+}
+
+int __attribute__ ((target("arch=amdfam10")))
+foo ()
+{
+  return 6;
+}
+
+int __attribute__ ((target("arch=bdver1")))
+foo ()
+{
+  return 7;
+}
+
+int __attribute__ ((target("arch=bdver2")))
+foo ()
+{
+  return 8;
+}
+
+int __attribute__ ((target("avx2")))
+foo ()
+{
+  return -1;
+}
+
+int __attribute__ ((target("avx")))
+foo ()
+{
+  return -2;
+}
+
+int __attribute__ ((target("popcnt")))
+foo ()
+{
+  return -3;
+}
+
+int __attribute__ ((target("sse4.2")))
+foo ()
+{
+  return -4;
+}
+
+int __attribute__ ((target("sse4.1")))
+foo ()
+{
+  return -5;
+}
+
+int __attribute__ ((target("ssse3")))
+foo ()
+{
+  return -6;
+}
+
+int __attribute__ ((target("mmx")))
+foo ()
+{
+  return -7;
+}
+
+int __attribute__ ((target("sse3")))
+foo ()
+{
+  return -8;
+}
+
+int __attribute__ ((target("sse2")))
+foo ()
+{
+  return -9;
+}
+
+int __attribute__ ((target("sse")))
+foo ()
+{
+  return -10;
+}
Index: gcc/cp/class.c
===================================================================
--- gcc/cp/class.c	(revision 187346)
+++ gcc/cp/class.c	(working copy)
@@ -39,6 +39,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-dump.h"
 #include "splay-tree.h"
 #include "pointer-set.h"
+#include "multiversion.h"
 
 /* The number of nested classes being processed.  If we are not in the
    scope of any class, this is zero.  */
@@ -1093,7 +1094,21 @@ add_method (tree type, tree method, tree using_dec
 	      || same_type_p (TREE_TYPE (fn_type),
 			      TREE_TYPE (method_type))))
 	{
-	  if (using_decl)
+	  /* For function versions, their parms and types match
+	     but they are not duplicates.  Record function versions
+	     as and when they are found.  */
+	  if (TREE_CODE (fn) == FUNCTION_DECL
+	      && TREE_CODE (method) == FUNCTION_DECL
+	      && (is_target_attribute_set (fn)
+		  || is_target_attribute_set (method))
+	      && has_different_version_attributes (fn, method))
+ 	    {
+	      mark_function_as_version (fn);
+	      mark_function_as_version (method);
+	      group_function_versions (fn, method);
+	      continue;
+	    }
+	  else if (using_decl)
 	    {
 	      if (DECL_CONTEXT (fn) == type)
 		/* Defer to the local function.  */
@@ -1151,6 +1166,7 @@ add_method (tree type, tree method, tree using_dec
   else
     /* Replace the current slot.  */
     VEC_replace (tree, method_vec, slot, overload);
+
   return true;
 }
 
@@ -6928,8 +6944,11 @@ resolve_address_of_overloaded_function (tree targe
 	  if (DECL_ANTICIPATED (fn))
 	    continue;
 
-	  /* See if there's a match.  */
-	  if (same_type_p (target_fn_type, static_fn_type (fn)))
+	  /* See if there's a match.  For multi-versioned functions, only the
+	     default version is a match.  */
+	  if (same_type_p (target_fn_type, static_fn_type (fn))
+	      && (!DECL_FUNCTION_VERSIONED (fn)
+		  || is_default_function (fn)))
 	    matches = tree_cons (fn, NULL_TREE, matches);
 	}
     }
@@ -7091,6 +7110,22 @@ resolve_address_of_overloaded_function (tree targe
       perform_or_defer_access_check (access_path, fn, fn);
     }
 
+  /* If a pointer to a function that is multi-versioned is requested, the
+     pointer to the dispatcher function is returned instead.  This works
+     well because indirectly calling the function will dispatch the right
+     function version at run-time. Also, the function address is kept
+     unique.  */
+  if (DECL_FUNCTION_VERSIONED (fn)
+      && is_default_function (fn))
+    {
+      tree ifunc_decl;
+      ifunc_decl = get_ifunc_for_version (fn);
+      gcc_assert (ifunc_decl != NULL);
+      retrofit_lang_decl (ifunc_decl);
+      mark_used (fn);
+      return build_fold_addr_expr (ifunc_decl);
+    }
+
   if (TYPE_PTRFN_P (target_type) || TYPE_PTRMEMFUNC_P (target_type))
     return cp_build_addr_expr (fn, flags);
   else
Index: gcc/cp/decl.c
===================================================================
--- gcc/cp/decl.c	(revision 187346)
+++ gcc/cp/decl.c	(working copy)
@@ -54,6 +54,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "pointer-set.h"
 #include "splay-tree.h"
 #include "plugin.h"
+#include "multiversion.h"
 
 /* Possible cases of bad specifiers type used by bad_specifiers. */
 enum bad_spec_place {
@@ -973,6 +974,21 @@ decls_match (tree newdecl, tree olddecl)
       if (t1 != t2)
 	return 0;
 
+      /* The decls don't match if they correspond to two different versions
+	 of the same function.  */
+      if (compparms (p1, p2)
+	  && same_type_p (TREE_TYPE (f1), TREE_TYPE (f2)) 
+	  && has_different_version_attributes (newdecl, olddecl))
+	{
+	  /* One of the decls could be the default without the "target"
+	     attribute. Set it to be a versioned function here.  */
+	  mark_function_as_version (newdecl);
+	  mark_function_as_version (olddecl);
+	  /* Accumulate all the versions of a function.  */
+	  group_function_versions (olddecl, newdecl);
+	  return 0;
+	}
+
       if (CP_DECL_CONTEXT (newdecl) != CP_DECL_CONTEXT (olddecl)
 	  && ! (DECL_EXTERN_C_P (newdecl)
 		&& DECL_EXTERN_C_P (olddecl)))
@@ -1490,7 +1506,11 @@ duplicate_decls (tree newdecl, tree olddecl, bool
 	      error ("previous declaration %q+#D here", olddecl);
 	      return NULL_TREE;
 	    }
-	  else if (compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)),
+	  /* For function versions, params and types match, but they
+	     are not ambiguous.  */
+	  else if ((!DECL_FUNCTION_VERSIONED (newdecl)
+		    && !DECL_FUNCTION_VERSIONED (olddecl))
+		   && compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)),
 			      TYPE_ARG_TYPES (TREE_TYPE (olddecl))))
 	    {
 	      error ("new declaration %q#D", newdecl);
@@ -2262,6 +2282,16 @@ duplicate_decls (tree newdecl, tree olddecl, bool
   else if (DECL_PRESERVE_P (newdecl))
     DECL_PRESERVE_P (olddecl) = 1;
 
+  /* If the olddecl is a version, so is the newdecl.  */
+  if (TREE_CODE (newdecl) == FUNCTION_DECL
+      && DECL_FUNCTION_VERSIONED (olddecl))
+    {
+      DECL_FUNCTION_VERSIONED (newdecl) = 1;
+      /* Record that newdecl is not a valid version and has
+	 been deleted.  */
+      mark_delete_decl_version (newdecl);
+    }
+
   if (TREE_CODE (newdecl) == FUNCTION_DECL)
     {
       int function_size;
@@ -3810,6 +3840,7 @@ cp_make_fname_decl (location_t loc, tree id, int t
 			    ? NULL : fname_as_string (type_dep));
   tree type;
   tree init = cp_fname_init (name, &type);
+
   tree decl = build_decl (loc, VAR_DECL, id, type);
 
   if (name)
@@ -14036,7 +14067,11 @@ cxx_comdat_group (tree decl)
 	  else
 	    break;
 	}
-      name = DECL_ASSEMBLER_NAME (decl);
+      if (TREE_CODE (decl) == FUNCTION_DECL
+	  && DECL_FUNCTION_VERSIONED (decl))
+	name = DECL_NAME (decl);
+      else
+        name = DECL_ASSEMBLER_NAME (decl);
     }
 
   return name;
Index: gcc/cp/semantics.c
===================================================================
--- gcc/cp/semantics.c	(revision 187346)
+++ gcc/cp/semantics.c	(working copy)
@@ -3783,8 +3783,11 @@ expand_or_defer_fn_1 (tree fn)
       /* If the user wants us to keep all inline functions, then mark
 	 this function as needed so that finish_file will make sure to
 	 output it later.  Similarly, all dllexport'd functions must
-	 be emitted; there may be callers in other DLLs.  */
-      if ((flag_keep_inline_functions
+	 be emitted; there may be callers in other DLLs.
+	 Also, mark this function as needed if it is marked inline but
+	 is a multi-versioned function.  */
+      if (((flag_keep_inline_functions
+	    || DECL_FUNCTION_VERSIONED (fn))
 	   && DECL_DECLARED_INLINE_P (fn)
 	   && !DECL_REALLY_EXTERN (fn))
 	  || (flag_keep_inline_dllexport
Index: gcc/cp/decl2.c
===================================================================
--- gcc/cp/decl2.c	(revision 187346)
+++ gcc/cp/decl2.c	(working copy)
@@ -53,6 +53,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "splay-tree.h"
 #include "langhooks.h"
 #include "c-family/c-ada-spec.h"
+#include "multiversion.h"
 
 extern cpp_reader *parse_in;
 
@@ -677,9 +678,13 @@ check_classfn (tree ctype, tree function, tree tem
 	  if (is_template != (TREE_CODE (fndecl) == TEMPLATE_DECL))
 	    continue;
 
+	  /* While finding a match, same types and params are not enough
+	     if the function is versioned.  Also check version ("target")
+	     attributes.  */
 	  if (same_type_p (TREE_TYPE (TREE_TYPE (function)),
 			   TREE_TYPE (TREE_TYPE (fndecl)))
 	      && compparms (p1, p2)
+	      && !has_different_version_attributes (function, fndecl)
 	      && (!is_template
 		  || comp_template_parms (template_parms,
 					  DECL_TEMPLATE_PARMS (fndecl)))
Index: gcc/cp/call.c
===================================================================
--- gcc/cp/call.c	(revision 187346)
+++ gcc/cp/call.c	(working copy)
@@ -41,6 +41,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "langhooks.h"
 #include "c-family/c-objc.h"
 #include "timevar.h"
+#include "multiversion.h"
 
 /* The various kinds of conversion.  */
 
@@ -3903,6 +3904,16 @@ build_new_function_call (tree fn, VEC(tree,gc) **a
     {
       if (complain & tf_error)
 	{
+	  /* If the call is to a multiversioned function without
+	     a default version, overload resolution will fail.  */
+	  if (candidates
+	      && TREE_CODE (candidates->fn) == FUNCTION_DECL
+	      && DECL_FUNCTION_VERSIONED (candidates->fn))
+	    error_at (location_of (DECL_NAME (OVL_CURRENT (fn))),
+		      "Call to multiversioned function %<%D(%A)%> with"
+		      " no default version", DECL_NAME (OVL_CURRENT (fn)),
+		      build_tree_list_vec (*args));
+
 	  if (!any_viable_p && candidates && ! candidates->next
 	      && (TREE_CODE (candidates->fn) == FUNCTION_DECL))
 	    return cp_build_function_call_vec (candidates->fn, args, complain);
@@ -6824,6 +6835,19 @@ build_over_call (struct z_candidate *cand, int fla
   if (!already_used)
     mark_used (fn);
 
+  /* For a call to a multi-versioned function, the call should actually be to
+     the dispatcher.  */
+  if (DECL_FUNCTION_VERSIONED (fn)
+      && is_default_function (fn))
+    {
+      tree ifunc_decl;
+      ifunc_decl = get_ifunc_for_version (fn);
+      gcc_assert (ifunc_decl != NULL);
+      retrofit_lang_decl (ifunc_decl);
+      return build_call_expr_loc_array (UNKNOWN_LOCATION, ifunc_decl,
+					nargs, argarray);
+    }
+
   if (DECL_VINDEX (fn) && (flags & LOOKUP_NONVIRTUAL) == 0)
     {
       tree t;
@@ -8081,6 +8105,60 @@ joust (struct z_candidate *cand1, struct z_candida
   size_t i;
   size_t len;
 
+  /* For candidates of a multi-versioned function, first check if the
+     target flags of the caller match any of the candidates.  If so,
+     the caller can directly call this candidate; otherwise the one marked
+     default wins.  This is because the default decl is used as key to
+     aggregate all the other versions provided for it in multiversion.c.
+     When generating the actual call, the appropriate dispatcher is created
+     to call the right function version at run-time.  */
+
+  if ((TREE_CODE (cand1->fn) == FUNCTION_DECL
+       && DECL_FUNCTION_VERSIONED (cand1->fn))
+      ||(TREE_CODE (cand2->fn) == FUNCTION_DECL
+	 && DECL_FUNCTION_VERSIONED (cand2->fn)))
+    {
+      /* Both functions must be marked versioned.  */
+      gcc_assert (DECL_FUNCTION_VERSIONED (cand1->fn)
+		  && DECL_FUNCTION_VERSIONED (cand2->fn));
+
+      /* Try to see if a direct call can be made to a version.  This is
+	 possible if the caller and callee have the same target flags.
+	 If cand->fn is marked with target attributes,  check if the
+	 target approves inlining this into the caller.  If so, this is
+	 the version we want.  */
+
+      if (is_target_attribute_set (cand1->fn)
+	  && targetm.target_option.can_inline_p (current_function_decl,
+						 cand1->fn))
+	return 1;
+
+      if (is_target_attribute_set (cand2->fn)
+	  && targetm.target_option.can_inline_p (current_function_decl,
+						 cand2->fn))
+	return -1;
+
+      /* A direct call to a version is not possible, so find the default
+	 function and return it.  This will later be converted to dispatch
+	 the right version at run time.  */
+
+      if (is_default_function (cand1->fn))
+	{
+          mark_used (cand2->fn);
+	  return 1;
+	}
+
+      if (is_default_function (cand2->fn))
+	{
+          mark_used (cand1->fn);
+	  return -1;
+	}
+
+      /* If a default function is absent, this will never get resolved leading
+	 to an ambiguous call error.  */
+      return 0;
+    }
+
   /* Candidates that involve bad conversions are always worse than those
      that don't.  */
   if (cand1->viable > cand2->viable)
Index: gcc/timevar.def
===================================================================
--- gcc/timevar.def	(revision 187346)
+++ gcc/timevar.def	(working copy)
@@ -253,6 +253,7 @@ DEFTIMEVAR (TV_TREE_IFCOMBINE        , "tree if-co
 DEFTIMEVAR (TV_TREE_UNINIT           , "uninit var analysis")
 DEFTIMEVAR (TV_PLUGIN_INIT           , "plugin initialization")
 DEFTIMEVAR (TV_PLUGIN_RUN            , "plugin execution")
+DEFTIMEVAR (TV_MULTIVERSION_DISPATCH , "multiversion dispatch")
 
 /* Everything else in rest_of_compilation not included above.  */
 DEFTIMEVAR (TV_EARLY_LOCAL	     , "early local passes")
Index: gcc/Makefile.in
===================================================================
--- gcc/Makefile.in	(revision 187346)
+++ gcc/Makefile.in	(working copy)
@@ -1297,6 +1297,7 @@ OBJS = \
 	mcf.o \
 	mode-switching.o \
 	modulo-sched.o \
+	multiversion.o \
 	omega.o \
 	omp-low.o \
 	optabs.o \
@@ -3042,6 +3043,11 @@ ree.o : ree.c $(CONFIG_H) $(SYSTEM_H) coretypes.h
    $(DF_H) $(TIMEVAR_H) tree-pass.h $(RECOG_H) $(EXPR_H) \
    $(REGS_H) $(TREE_H) $(TM_P_H) insn-config.h $(INSN_ATTR_H) $(DIAGNOSTIC_CORE_H) \
    $(TARGET_H) $(OPTABS_H) insn-codes.h rtlhooks-def.h $(PARAMS_H) $(CGRAPH_H)
+multiversion.o : multiversion.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
+   $(TREE_H) langhooks.h $(TREE_INLINE_H) $(FLAGS_H) $(CGRAPH_H) intl.h \
+   $(DIAGNOSTIC_H) $(FIBHEAP_H) $(PARAMS_H) $(TIMEVAR_H) tree-pass.h \
+   $(HASHTAB_H) $(COVERAGE_H) $(GGC_H) $(TREE_FLOW_H) $(RTL_H) $(IPA_PROP_H) \
+   $(BASIC_BLOCK_H) $(TOPLEV_H) $(TREE_DUMP_H) ipa-inline.h
 cprop.o : cprop.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
    $(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h $(GGC_H) \
    $(RECOG_H) $(EXPR_H) $(BASIC_BLOCK_H) $(FUNCTION_H) output.h toplev.h $(DIAGNOSTIC_CORE_H) \
Index: gcc/passes.c
===================================================================
--- gcc/passes.c	(revision 187346)
+++ gcc/passes.c	(working copy)
@@ -1293,6 +1293,7 @@ init_optimization_passes (void)
   NEXT_PASS (pass_build_cfg);
   NEXT_PASS (pass_warn_function_return);
   NEXT_PASS (pass_build_cgraph_edges);
+  NEXT_PASS (pass_dispatch_versions);
   *p = NULL;
 
   /* Interprocedural optimization passes.  */
Index: gcc/cp/mangle.c
===================================================================
--- gcc/cp/mangle.c	(revision 187346)
+++ gcc/cp/mangle.c	(working copy)
@@ -1245,7 +1245,12 @@ write_unqualified_name (const tree decl)
     {
       MANGLE_TRACE_TREE ("local-source-name", decl);
       write_char ('L');
-      write_source_name (DECL_NAME (decl));
+      if (TREE_CODE (decl) == FUNCTION_DECL
+	  && DECL_FUNCTION_VERSIONED (decl)
+	  && DECL_ASSEMBLER_NAME_SET_P (decl))
+	write_source_name (DECL_ASSEMBLER_NAME (decl));
+      else
+	write_source_name (DECL_NAME (decl));
       /* The default discriminator is 1, and that's all we ever use,
 	 so there's no code to output one here.  */
     }
@@ -1260,7 +1265,14 @@ write_unqualified_name (const tree decl)
                && LAMBDA_TYPE_P (type))
         write_closure_type_name (type);
       else
-        write_source_name (DECL_NAME (decl));
+	{
+	  if (TREE_CODE (decl) == FUNCTION_DECL
+	      && DECL_FUNCTION_VERSIONED (decl)
+	      && DECL_ASSEMBLER_NAME_SET_P (decl))
+	    write_source_name (DECL_ASSEMBLER_NAME (decl));
+	  else
+	    write_source_name (DECL_NAME (decl));
+	}
     }
 }
 
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 187346)
+++ gcc/config/i386/i386.c	(working copy)
@@ -27715,6 +27715,326 @@ ix86_init_mmx_sse_builtins (void)
     }
 }
 
+
+/* This adds a condition to the basic_block NEW_BB in function FUNCTION_DECL
+   to return a pointer to VERSION_DECL if the outcome of the expression
+   formed by PREDICATE_CHAIN is true.  This function will be called during
+   version dispatch to decide which function version to execute.  It returns
+   the basic block at the end to which more conditions can be added.  */
+
+static basic_block
+add_condition_to_bb (tree function_decl, tree version_decl,
+		     tree predicate_chain, basic_block new_bb)
+{
+  gimple return_stmt;
+  tree convert_expr, result_var;
+  gimple convert_stmt;
+  gimple call_cond_stmt;
+  gimple if_else_stmt;
+
+  basic_block bb1, bb2, bb3;
+  edge e12, e23;
+
+  tree cond_var, and_expr_var = NULL_TREE;
+  gimple_seq gseq;
+
+  tree old_current_function_decl;
+  tree predicate_decl, predicate_arg;
+
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (function_decl));
+  current_function_decl = function_decl;
+
+  gcc_assert (new_bb != NULL);
+  gseq = bb_seq (new_bb);
+
+
+  convert_expr = build1 (CONVERT_EXPR, ptr_type_node,
+	     		 build_fold_addr_expr (version_decl));
+  result_var = create_tmp_var (ptr_type_node, NULL);
+  convert_stmt = gimple_build_assign (result_var, convert_expr); 
+  return_stmt = gimple_build_return (result_var);
+
+  if (predicate_chain == NULL_TREE)
+    {
+      gimple_seq_add_stmt (&gseq, convert_stmt);
+      gimple_seq_add_stmt (&gseq, return_stmt);
+      set_bb_seq (new_bb, gseq);
+      gimple_set_bb (convert_stmt, new_bb);
+      gimple_set_bb (return_stmt, new_bb);
+      pop_cfun ();
+      current_function_decl = old_current_function_decl;
+      return new_bb;
+    }
+
+  while (predicate_chain != NULL)
+    {
+      cond_var = create_tmp_var (integer_type_node, NULL);
+      predicate_decl = TREE_PURPOSE (predicate_chain);
+      predicate_arg = TREE_VALUE (predicate_chain);
+      call_cond_stmt = gimple_build_call (predicate_decl, 1, predicate_arg);
+      gimple_call_set_lhs (call_cond_stmt, cond_var);
+
+      gimple_set_block (call_cond_stmt, DECL_INITIAL (function_decl));
+      gimple_set_bb (call_cond_stmt, new_bb);
+      gimple_seq_add_stmt (&gseq, call_cond_stmt);
+
+      predicate_chain = TREE_CHAIN (predicate_chain);
+      
+      if (and_expr_var == NULL)
+        and_expr_var = cond_var;
+      else
+	{
+	  gimple assign_stmt;
+	  /* Use MIN_EXPR to check whether any of the integers is zero:
+	     and_expr_var = min_expr <cond_var, and_expr_var>  */
+	  assign_stmt = gimple_build_assign (and_expr_var,
+			  build2 (MIN_EXPR, integer_type_node,
+				  cond_var, and_expr_var));
+
+	  gimple_set_block (assign_stmt, DECL_INITIAL (function_decl));
+	  gimple_set_bb (assign_stmt, new_bb);
+	  gimple_seq_add_stmt (&gseq, assign_stmt);
+	}
+    }
+
+  if_else_stmt = gimple_build_cond (GT_EXPR, and_expr_var,
+	  		            integer_zero_node,
+				    NULL_TREE, NULL_TREE);
+  gimple_set_block (if_else_stmt, DECL_INITIAL (function_decl));
+  gimple_set_bb (if_else_stmt, new_bb);
+  gimple_seq_add_stmt (&gseq, if_else_stmt);
+
+  gimple_seq_add_stmt (&gseq, convert_stmt);
+  gimple_seq_add_stmt (&gseq, return_stmt);
+  set_bb_seq (new_bb, gseq);
+
+  bb1 = new_bb;
+  e12 = split_block (bb1, if_else_stmt);
+  bb2 = e12->dest;
+  e12->flags &= ~EDGE_FALLTHRU;
+  e12->flags |= EDGE_TRUE_VALUE;
+
+  e23 = split_block (bb2, return_stmt);
+
+  gimple_set_bb (convert_stmt, bb2);
+  gimple_set_bb (return_stmt, bb2);
+
+  bb3 = e23->dest;
+  make_edge (bb1, bb3, EDGE_FALSE_VALUE); 
+
+  remove_edge (e23);
+  make_edge (bb2, EXIT_BLOCK_PTR, 0);
+
+  rebuild_cgraph_edges ();
+
+  pop_cfun ();
+  current_function_decl = old_current_function_decl;
+
+  return bb3;
+}
+
+/* This parses the attribute arguments to target in DECL and builds the
+   chain of builtin predicates matching the platform specification.  An
+   "arch=" argument and comma-separated ISA feature names are accepted.  */
+
+static tree 
+get_builtin_code_for_version (tree decl)
+{
+  tree attrs;
+  struct cl_target_option cur_target;
+  tree target_node;
+  struct cl_target_option *new_target;
+  const char *arg_str = NULL;
+  const char *attrs_str = NULL;
+  char *tok_str = NULL;
+  char *token;
+  /* These are the target attribute strings for which a dispatcher is
+     available, from fold_builtin_cpu.  */
+  const char *feature_list[] = {"mmx", "popcnt", "sse", "sse2", "sse3",
+				"ssse3", "sse4.1", "sse4.2", "avx", "avx2"};
+  unsigned int NUM_FEATURES = sizeof (feature_list) / sizeof (const char *);
+  unsigned int i;
+  tree predicate_chain = NULL_TREE;
+  tree predicate_decl, predicate_arg;
+
+  attrs = lookup_attribute ("target", DECL_ATTRIBUTES (decl));
+  gcc_assert (attrs != NULL);
+
+  attrs = TREE_VALUE (TREE_VALUE (attrs));
+
+  gcc_assert (TREE_CODE (attrs) == STRING_CST);
+  attrs_str = TREE_STRING_POINTER (attrs);
+
+  /* Handle arch= if specified.  */
+  if (strstr (attrs_str, "arch=") != NULL)
+    {
+      cl_target_option_save (&cur_target, &global_options);
+      target_node = ix86_valid_target_attribute_tree (attrs);
+    
+      gcc_assert (target_node);
+      new_target = TREE_TARGET_OPTION (target_node);
+      gcc_assert (new_target);
+      
+      if (new_target->arch_specified && new_target->arch > 0)
+	{
+	  switch (new_target->arch)
+	    {
+	    case PROCESSOR_CORE2_32:
+	    case PROCESSOR_CORE2_64:
+	      arg_str = "core2";
+	      break;
+	    case PROCESSOR_COREI7_32:
+	    case PROCESSOR_COREI7_64:
+	      arg_str = "corei7";
+	      break;
+	    case PROCESSOR_ATOM:
+	      arg_str = "atom";
+	      break;
+	    case PROCESSOR_AMDFAM10:
+	      arg_str = "amdfam10h";
+	      break;
+	    case PROCESSOR_BDVER1:
+	      arg_str = "bdver1";
+	      break;
+	    case PROCESSOR_BDVER2:
+	      arg_str = "bdver2";
+	      break;
+	    }  
+	}    
+    
+      cl_target_option_restore (&global_options, &cur_target);
+      if (arg_str == NULL)
+	{
+	  error_at (DECL_SOURCE_LOCATION (decl),
+	    	"No dispatcher found for the versioning attributes");
+	  return NULL;
+	}
+    
+      predicate_decl = ix86_builtins [(int) IX86_BUILTIN_CPU_IS];
+      /* For a C string literal the length includes the trailing NUL.  */
+      predicate_arg = build_string_literal (strlen (arg_str) + 1, arg_str);
+      predicate_chain = tree_cons (predicate_decl, predicate_arg,
+				   predicate_chain);
+    }
+
+  /* Process feature name.  */
+  tok_str =  (char *) xmalloc (strlen (attrs_str) + 1);
+  strcpy (tok_str, attrs_str);
+  token = strtok (tok_str, ",");
+  predicate_decl = ix86_builtins [(int) IX86_BUILTIN_CPU_SUPPORTS];
+
+  while (token != NULL)
+    {
+      /* Do not process "arch="  */
+      if (strncmp (token, "arch=", 5) == 0)
+	{
+	  token = strtok (NULL, ",");
+	  continue;
+	}
+      for (i = 0; i < NUM_FEATURES; ++i)
+	{
+	  if (strcmp (token, feature_list[i]) == 0)
+	    {
+	      predicate_arg = build_string_literal (
+				strlen (feature_list[i]) + 1,
+				feature_list[i]);
+	      predicate_chain = tree_cons (predicate_decl, predicate_arg,
+					   predicate_chain);
+	      break;
+	    }
+	}
+      if (i == NUM_FEATURES)
+	{
+	  error_at (DECL_SOURCE_LOCATION (decl),
+		    "No dispatcher found for %s", token);
+	  return NULL;
+	}
+      token = strtok (NULL, ",");
+    }
+  free (tok_str);
+
+  if (predicate_chain == NULL_TREE)
+    {
+      error_at (DECL_SOURCE_LOCATION (decl),
+	        "No dispatcher found for the versioning attributes : %s",
+	        attrs_str);
+      return NULL;
+    }
+
+  predicate_chain = nreverse (predicate_chain);
+  return predicate_chain; 
+} 
+
+/* This is the target hook to generate the dispatch function for
+   multi-versioned functions.  DISPATCH_DECL is the function which will
+   contain the dispatch logic.  FNDECLS are the function choices for
+   dispatch, and is a tree chain.  EMPTY_BB is the basic block pointer
+   in DISPATCH_DECL in which the dispatch code is generated.  */
+
+static int
+ix86_dispatch_version (tree dispatch_decl,
+		       void *fndecls_p,
+		       basic_block *empty_bb)
+{
+  tree default_decl;
+  gimple ifunc_cpu_init_stmt;
+  gimple_seq gseq;
+  tree old_current_function_decl;
+  int ix;
+  tree ele;
+  VEC (tree, heap) *fndecls;
+
+  gcc_assert (dispatch_decl != NULL
+	      && fndecls_p != NULL
+	      && empty_bb != NULL);
+
+  /* fndecls_p is actually a vector.  */
+  fndecls = (VEC (tree, heap) *)fndecls_p;
+
+  /* At least one more version other than the default.  */
+  gcc_assert (VEC_length (tree, fndecls) >= 2);
+
+  /* The first version in the vector is the default decl.  */
+  default_decl = VEC_index (tree, fndecls, 0);
+
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (dispatch_decl));
+  current_function_decl = dispatch_decl;
+
+  gseq = bb_seq (*empty_bb);
+  ifunc_cpu_init_stmt = gimple_build_call_vec (
+                     ix86_builtins [(int) IX86_BUILTIN_CPU_INIT], NULL);
+  gimple_seq_add_stmt (&gseq, ifunc_cpu_init_stmt);
+  gimple_set_bb (ifunc_cpu_init_stmt, *empty_bb);
+  set_bb_seq (*empty_bb, gseq);
+
+  pop_cfun ();
+  current_function_decl = old_current_function_decl;
+
+
+  for (ix = 1; VEC_iterate (tree, fndecls, ix, ele); ++ix)
+    {
+      tree version_decl = ele;
+      tree predicate_chain = NULL_TREE;
+      /* Get attribute string, parse it and find the right predicate decl.
+         The predicate function could be a lengthy combination of many
+	 features, like arch-type and various isa-variants.  */
+      predicate_chain = get_builtin_code_for_version (version_decl);
+
+      if (predicate_chain == NULL_TREE)
+	continue;
+
+      *empty_bb = add_condition_to_bb (dispatch_decl, version_decl,
+				       predicate_chain, *empty_bb);
+
+    }
+  /* Dispatch the default version at the end.  */
+  *empty_bb = add_condition_to_bb (dispatch_decl, default_decl,
+				   NULL, *empty_bb);
+  return 0;
+}
+
 /* This builds the processor_model struct type defined in
    libgcc/config/i386/i386-cpuinfo.c  */
 
@@ -39591,6 +39911,9 @@ ix86_memmodel_check (unsigned HOST_WIDE_INT val)
 #undef TARGET_FOLD_BUILTIN
 #define TARGET_FOLD_BUILTIN ix86_fold_builtin
 
+#undef TARGET_DISPATCH_VERSION
+#define TARGET_DISPATCH_VERSION ix86_dispatch_version
+
 #undef TARGET_ENUM_VA_LIST_P
 #define TARGET_ENUM_VA_LIST_P ix86_enum_va_list
 
Index: gcc/cp/error.c
===================================================================
--- gcc/cp/error.c	(revision 187346)
+++ gcc/cp/error.c	(working copy)
@@ -1534,8 +1534,15 @@ dump_exception_spec (tree t, int flags)
 static void
 dump_function_name (tree t, int flags)
 {
-  tree name = DECL_NAME (t);
+  tree name;
 
+  /* For function versions, use the assembler name as the decl name is
+     the same for all versions.  */
+  if (DECL_FUNCTION_VERSIONED (t))
+    name = DECL_ASSEMBLER_NAME (t);
+  else
+    name = DECL_NAME (t);
+
   /* We can get here with a decl that was synthesized by language-
      independent machinery (e.g. coverage.c) in which case it won't
      have a lang_specific structure attached and DECL_CONSTRUCTOR_P

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-05-09 19:01                                   ` Sriraman Tallam
@ 2012-05-10 17:55                                     ` H.J. Lu
  2012-05-12  2:04                                       ` Sriraman Tallam
  0 siblings, 1 reply; 93+ messages in thread
From: H.J. Lu @ 2012-05-10 17:55 UTC (permalink / raw)
  To: Sriraman Tallam
  Cc: Richard Guenther, Jan Hubicka, Uros Bizjak, reply, gcc-patches, David Li

On Wed, May 9, 2012 at 12:01 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> Hi,
>
> Attached new patch with more bug fixes. I will fix the dispatching
> method to use the priority of attributes in the next iteration.
>
> Patch also available for review here:  http://codereview.appspot.com/5752064
>

The patch looks OK to me.  Since testcase depends on the dispatching
method,  I'd like to see the whole patch with the updated dispatching
method.

Thanks.

-- 
H.J.

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-05-10 17:55                                     ` H.J. Lu
@ 2012-05-12  2:04                                       ` Sriraman Tallam
  2012-05-12 13:38                                         ` H.J. Lu
  0 siblings, 1 reply; 93+ messages in thread
From: Sriraman Tallam @ 2012-05-12  2:04 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Richard Guenther, Jan Hubicka, Uros Bizjak, reply, gcc-patches, David Li

[-- Attachment #1: Type: text/plain, Size: 891 bytes --]

Hi H.J.,

   I have updated the patch to improve the dispatching method like we
discussed. Each feature gets a priority now, and the dispatching is
done in priority order. Please see i386.c for the changes.
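
To illustrate what "dispatching in priority order" means, here is a minimal
stand-alone sketch.  It is not the code from i386.c: the feature bits, the
struct version type, select_version and the foo_* functions are all
hypothetical names made up for this example, and the assumption that a
higher number means "tried earlier" is mine.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical feature bits.  The real patch derives its predicates
   from the "target" attribute string; these are illustration only.  */
enum { FEAT_SSE4_2 = 1, FEAT_AVX = 2 };

typedef int (*impl_fn) (void);

/* One candidate version (hypothetical layout).  */
struct version
{
  unsigned required;  /* Feature bits this version needs.  */
  int priority;       /* Assumed: higher value is tried earlier.  */
  impl_fn fn;         /* The specialized implementation.  */
};

int foo_default (void) { return 0; }
int foo_sse42 (void) { return 1; }
int foo_avx (void) { return 2; }

/* qsort comparator: order versions by descending priority.  */
static int
by_priority_desc (const void *a, const void *b)
{
  const struct version *va = (const struct version *) a;
  const struct version *vb = (const struct version *) b;
  return (vb->priority > va->priority) - (vb->priority < va->priority);
}

/* Sort VERSIONS in place by priority, then return the first one whose
   required features are all present in CPU_FEATURES.  If none match,
   return DEFAULT_FN, mirroring "dispatch default version at the end".  */
impl_fn
select_version (struct version *versions, size_t n,
		unsigned cpu_features, impl_fn default_fn)
{
  size_t i;

  qsort (versions, n, sizeof versions[0], by_priority_desc);
  for (i = 0; i < n; i++)
    if ((cpu_features & versions[i].required) == versions[i].required)
      return versions[i].fn;
  return default_fn;
}
```

With two versions registered at priorities 1 (SSE4.2) and 2 (AVX), a CPU
reporting both features would get the AVX version, a CPU reporting only
SSE4.2 would get the SSE4.2 version, and anything else would fall through
to the default.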

Patch also available for review here:  http://codereview.appspot.com/5752064

Thanks,
-Sri.

On Thu, May 10, 2012 at 10:55 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Wed, May 9, 2012 at 12:01 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>> Hi,
>>
>> Attached new patch with more bug fixes. I will fix the dispatching
>> method to use the priority of attributes in the next iteration.
>>
>> Patch also available for review here:  http://codereview.appspot.com/5752064
>>
>
> The patch looks OK to me.  Since testcase depends on the dispatching
> method,  I'd like to see the whole patch with the updated dispatching
> method.
>
> Thanks.
>
> --
> H.J.

[-- Attachment #2: mv_fe_patch.txt --]
[-- Type: text/plain, Size: 67215 bytes --]

Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi	(revision 187371)
+++ gcc/doc/tm.texi	(working copy)
@@ -10997,6 +10997,14 @@ The result is another tree containing a simplified
 call's result.  If @var{ignore} is true the value will be ignored.
 @end deftypefn
 
+@deftypefn {Target Hook} int TARGET_DISPATCH_VERSION (tree @var{dispatch_decl}, void *@var{fndecls}, basic_block *@var{empty_bb})
+For a multi-versioned function, this hook sets up the dispatcher.
+@var{dispatch_decl} is the function that will be used to dispatch the
+version.  @var{fndecls} are the function choices for dispatch.
+@var{empty_bb} is a basic block in @var{dispatch_decl} where the
+code to do the dispatch will be added.
+@end deftypefn
+
 @deftypefn {Target Hook} {const char *} TARGET_INVALID_WITHIN_DOLOOP (const_rtx @var{insn})
 
 Take an instruction in @var{insn} and return NULL if it is valid within a
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in	(revision 187371)
+++ gcc/doc/tm.texi.in	(working copy)
@@ -10877,6 +10877,14 @@ The result is another tree containing a simplified
 call's result.  If @var{ignore} is true the value will be ignored.
 @end deftypefn
 
+@hook TARGET_DISPATCH_VERSION
+For a multi-versioned function, this hook sets up the dispatcher.
+@var{dispatch_decl} is the function that will be used to dispatch the
+version.  @var{fndecls} are the function choices for dispatch.
+@var{empty_bb} is a basic block in @var{dispatch_decl} where the
+code to do the dispatch will be added.
+@end deftypefn
+
 @hook TARGET_INVALID_WITHIN_DOLOOP
 
 Take an instruction in @var{insn} and return NULL if it is valid within a
Index: gcc/target.def
===================================================================
--- gcc/target.def	(revision 187371)
+++ gcc/target.def	(working copy)
@@ -1249,6 +1249,15 @@ DEFHOOK
  tree, (tree fndecl, int n_args, tree *argp, bool ignore),
  hook_tree_tree_int_treep_bool_null)
 
+/* Target hook to generate the dispatching code for calls to multi-versioned
+   functions.  DISPATCH_DECL is the function that will have the dispatching
+   logic.  FNDECLS is the list of choices for dispatch and EMPTY_BB is the
+   basic block in DISPATCH_DECL which will contain the code.  */
+DEFHOOK
+(dispatch_version,
+ "",
+ int, (tree dispatch_decl, void *fndecls, basic_block *empty_bb), NULL)
+
 /* Returns a code for a target-specific builtin that implements
    reciprocal of the function, or NULL_TREE if not available.  */
 DEFHOOK
Index: gcc/tree.h
===================================================================
--- gcc/tree.h	(revision 187371)
+++ gcc/tree.h	(working copy)
@@ -3528,6 +3528,12 @@ extern VEC(tree, gc) **decl_debug_args_insert (tre
 #define DECL_FUNCTION_SPECIFIC_OPTIMIZATION(NODE) \
    (FUNCTION_DECL_CHECK (NODE)->function_decl.function_specific_optimization)
 
+/* In FUNCTION_DECL, this is set if this function has other versions generated
+   using "target" attributes.  The default version is the one which does not
+   have any "target" attribute set. */
+#define DECL_FUNCTION_VERSIONED(NODE)\
+   (FUNCTION_DECL_CHECK (NODE)->function_decl.versioned_function)
+
 /* FUNCTION_DECL inherits from DECL_NON_COMMON because of the use of the
    arguments/result/saved_tree fields by front ends.   It was either inherit
    FUNCTION_DECL from non_common, or inherit non_common from FUNCTION_DECL,
@@ -3572,8 +3578,8 @@ struct GTY(()) tree_function_decl {
   unsigned looping_const_or_pure_flag : 1;
   unsigned has_debug_args_flag : 1;
   unsigned tm_clone_flag : 1;
-
-  /* 1 bit left */
+  unsigned versioned_function : 1;
+  /* No bits left.  */
 };
 
 /* The source language of the translation-unit.  */
Index: gcc/tree-pass.h
===================================================================
--- gcc/tree-pass.h	(revision 187371)
+++ gcc/tree-pass.h	(working copy)
@@ -453,6 +453,7 @@ extern struct gimple_opt_pass pass_tm_memopt;
 extern struct gimple_opt_pass pass_tm_edges;
 extern struct gimple_opt_pass pass_split_functions;
 extern struct gimple_opt_pass pass_feedback_split_functions;
+extern struct gimple_opt_pass pass_dispatch_versions;
 
 /* IPA Passes */
 extern struct simple_ipa_opt_pass pass_ipa_lower_emutls;
Index: gcc/multiversion.c
===================================================================
--- gcc/multiversion.c	(revision 0)
+++ gcc/multiversion.c	(revision 0)
@@ -0,0 +1,833 @@
+/* Function Multiversioning.
+   Copyright (C) 2012 Free Software Foundation, Inc.
+   Contributed by Sriraman Tallam (tmsriram@google.com)
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>. */
+
+/* This file holds the state for multi-versioned functions.  The front-end
+   updates the state as and when function versions are encountered.
+   This is then used to generate the dispatch code.  Also, the
+   optimization passes to clone hot paths involving versioned functions
+   will be done here.
+
+   Function versions are created by using the same function signature but
+   also tagging attribute "target" to specify the platform type for which
+   the version must be executed.  Here is an example:
+
+   int foo ()
+   {
+     printf ("Execute as default");
+     return 0;
+   }
+
+   int  __attribute__ ((target ("arch=corei7")))
+   foo ()
+   {
+     printf ("Execute for corei7");
+     return 0;
+   }
+   
+   int main ()
+   {
+     return foo ();
+   } 
+
+   The call to foo in main is replaced with a call to an IFUNC function that
+   contains the dispatch code to call the correct function version at
+   run-time.  */
+
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "tree.h"
+#include "tree-inline.h"
+#include "langhooks.h"
+#include "flags.h"
+#include "cgraph.h"
+#include "diagnostic.h"
+#include "toplev.h"
+#include "timevar.h"
+#include "params.h"
+#include "fibheap.h"
+#include "intl.h"
+#include "tree-pass.h"
+#include "hashtab.h"
+#include "coverage.h"
+#include "ggc.h"
+#include "tree-flow.h"
+#include "rtl.h"
+#include "ipa-prop.h"
+#include "basic-block.h"
+#include "toplev.h"
+#include "dbgcnt.h"
+#include "tree-dump.h"
+#include "output.h"
+#include "vecprim.h"
+#include "gimple-pretty-print.h"
+#include "ipa-inline.h"
+#include "target.h"
+#include "multiversion.h"
+
+typedef void * void_p;
+
+DEF_VEC_P (void_p);
+DEF_VEC_ALLOC_P (void_p, heap);
+
+/* Each function decl that is a function version gets an instance of this
+   structure.   Since this is called by the front-end, decl merging can
+   happen, where a decl created for a new declaration is merged with 
+   the old. In this case, the new decl is deleted and the IS_DELETED
+   field is set for the struct instance corresponding to the new decl.
+   IFUNC_DECL is the decl of the ifunc function for default decls.
+   IFUNC_RESOLVER_DECL is the decl of the dispatch function.  VERSIONS
+   is a vector containing the list of function versions  that are
+   the candidates for dispatch.  */
+
+typedef struct version_function_d {
+  tree decl;
+  tree ifunc_decl;
+  tree ifunc_resolver_decl;
+  VEC (void_p, heap) *versions;
+  bool is_deleted;
+} version_function;
+
+/* Hashmap has an entry for every function decl that has other function
+   versions.  For function decls that are the default, it also stores the
+   list of all the other function versions.  Each entry is a structure
+   of type version_function_d.  */
+static htab_t decl_version_htab = NULL;
+
+/* Hashtable helpers for decl_version_htab. */
+
+static hashval_t
+decl_version_htab_hash_descriptor (const void *p)
+{
+  const version_function *t = (const version_function *) p;
+  return htab_hash_pointer (t->decl);
+}
+
+/* Hashtable helper for decl_version_htab. */
+
+static int
+decl_version_htab_eq_descriptor (const void *p1, const void *p2)
+{
+  const version_function *t1 = (const version_function *) p1;
+  return htab_eq_pointer ((const void_p) t1->decl, p2);
+}
+
+/* Create the decl_version_htab.  */
+static void
+create_decl_version_htab (void)
+{
+  if (decl_version_htab == NULL)
+    decl_version_htab = htab_create (10, decl_version_htab_hash_descriptor,
+				     decl_version_htab_eq_descriptor, NULL);
+}
+
+/* Creates an instance of version_function for decl DECL.  */
+
+static version_function*
+new_version_function (const tree decl)
+{
+  version_function *v;
+  v = (version_function *)xmalloc(sizeof (version_function));
+  v->decl = decl;
+  v->ifunc_decl = NULL;
+  v->ifunc_resolver_decl = NULL;
+  v->versions = NULL;
+  v->is_deleted = false;
+  return v;
+}
+
+/* Comparator function to be used in qsort routine to sort attribute
+   specification strings to "target".  */
+
+static int
+attr_strcmp (const void *v1, const void *v2)
+{
+  const char *c1 = *(char *const*)v1;
+  const char *c2 = *(char *const*)v2;
+  return strcmp (c1, c2);
+}
+
+/* STR is the argument to target attribute.  This function tokenizes
+   the comma separated arguments, sorts them and returns a string which
+   is a unique identifier for the comma separated arguments.  */
+
+static char *
+sorted_attr_string (const char *str)
+{
+  char **args = NULL;
+  char *attr_str, *ret_str;
+  char *attr = NULL;
+  unsigned int argnum = 1;
+  unsigned int i;
+
+  for (i = 0; i < strlen (str); i++)
+    if (str[i] == ',')
+      argnum++;
+
+  attr_str = (char *)xmalloc (strlen (str) + 1);
+  strcpy (attr_str, str);
+
+  for (i = 0; i < strlen (attr_str); i++)
+    if (attr_str[i] == '=')
+      attr_str[i] = '_';
+
+  if (argnum == 1)
+    return attr_str;
+
+  args = (char **)xmalloc (argnum * sizeof (char *));
+
+  i = 0;
+  attr = strtok (attr_str, ",");
+  while (attr != NULL)
+    {
+      args[i] = attr;
+      i++;
+      attr = strtok (NULL, ",");
+    }
+
+  qsort (args, argnum, sizeof (char*), attr_strcmp);
+
+  ret_str = (char *)xmalloc (strlen (str) + 1);
+  strcpy (ret_str, args[0]);
+  for (i = 1; i < argnum; i++)
+    {
+      strcat (ret_str, "_");
+      strcat (ret_str, args[i]);
+    }
+
+  free (args);
+  free (attr_str);
+  return ret_str;
+}
+
+/* Returns true when only one of DECL1 and DECL2 is marked with "target"
+   or if the "target" attribute strings of DECL1 and DECL2 do not match.  */
+
+bool
+has_different_version_attributes (const tree decl1, const tree decl2)
+{
+  tree attr1, attr2;
+  char *c1, *c2;
+  bool ret = false;
+
+  if (TREE_CODE (decl1) != FUNCTION_DECL
+      || TREE_CODE (decl2) != FUNCTION_DECL)
+    return false;
+
+  attr1 = lookup_attribute ("target", DECL_ATTRIBUTES (decl1));
+  attr2 = lookup_attribute ("target", DECL_ATTRIBUTES (decl2));
+
+  if (attr1 == NULL_TREE && attr2 == NULL_TREE)
+    return false;
+
+  if ((attr1 == NULL_TREE && attr2 != NULL_TREE)
+      || (attr1 != NULL_TREE && attr2 == NULL_TREE))
+    return true;
+
+  c1 = sorted_attr_string (
+	TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr1))));
+  c2 = sorted_attr_string (
+	TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr2))));
+
+  if (strcmp (c1, c2) != 0)
+     ret = true;
+
+  free (c1);
+  free (c2);
+
+  return ret;
+}
+
+/* If this decl corresponds to a function and has "target" attribute,
+   append the attribute string to its assembler name.  */
+
+static void
+version_assembler_name (const tree decl)
+{
+  tree version_attr;
+  const char *orig_name, *version_string, *attr_str;
+  char *assembler_name;
+  tree assembler_name_tree;
+  
+  if (TREE_CODE (decl) != FUNCTION_DECL)
+    return;
+
+  if (DECL_DECLARED_INLINE_P (decl)
+      && lookup_attribute ("gnu_inline",
+			   DECL_ATTRIBUTES (decl)))
+    error_at (DECL_SOURCE_LOCATION (decl),
+	      "Function versions cannot be marked as gnu_inline,"
+	      " bodies have to be generated\n");
+
+  if (DECL_VIRTUAL_P (decl)
+      || DECL_VINDEX (decl))
+    error_at (DECL_SOURCE_LOCATION (decl),
+	      "Virtual function versioning not supported\n");
+
+  version_attr = lookup_attribute ("target", DECL_ATTRIBUTES (decl));
+  /* target attribute string is NULL for default functions.  */
+  if (version_attr == NULL_TREE)
+    return;
+
+  orig_name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
+  version_string
+    = TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (version_attr)));
+
+  attr_str = sorted_attr_string (version_string);
+  assembler_name = (char *) xmalloc (strlen (orig_name)
+				     + strlen (attr_str) + 2);
+
+  sprintf (assembler_name, "%s.%s", orig_name, attr_str);
+  if (dump_file)
+    fprintf (dump_file, "Assembler name set to %s for function version %s\n",
+	     assembler_name, IDENTIFIER_POINTER (DECL_NAME (decl)));
+
+  assembler_name_tree = get_identifier (assembler_name);
+
+  SET_DECL_ASSEMBLER_NAME (decl, assembler_name_tree);
+  SET_DECL_RTL (decl, NULL);
+}
+
+void
+mark_function_as_version (const tree decl)
+{
+  if (DECL_FUNCTION_VERSIONED (decl))
+    return;
+  DECL_FUNCTION_VERSIONED (decl) = 1;
+  version_assembler_name (decl);
+}
+
+/* Returns true if function DECL has target attribute set.  This could be
+   a version.  */
+
+bool
+is_target_attribute_set (const tree decl)
+{
+  return (TREE_CODE (decl) == FUNCTION_DECL
+	  && (lookup_attribute ("target", DECL_ATTRIBUTES (decl))
+	      != NULL_TREE));
+}
+
+/* Returns true if decl is multi-versioned and DECL is the default function,
+   that is it is not tagged with "target" attribute.  */
+
+bool
+is_default_function (const tree decl)
+{
+  return (TREE_CODE (decl) == FUNCTION_DECL
+	  && DECL_FUNCTION_VERSIONED (decl)
+	  && (lookup_attribute ("target", DECL_ATTRIBUTES (decl))
+	      == NULL_TREE));	
+}
+
+/* For function decl DECL, find the version_function struct in the
+   decl_version_htab.  */
+
+static version_function *
+find_function_version (const tree decl)
+{
+  void *slot;
+
+  if (!DECL_FUNCTION_VERSIONED (decl))
+    return NULL;
+
+  if (!decl_version_htab)
+    return NULL;
+
+  slot = htab_find_with_hash (decl_version_htab, decl,
+                              htab_hash_pointer (decl));
+
+  if (slot != NULL)
+    return (version_function *)slot;
+
+  return NULL;
+}
+
+/* Record DECL as a function version by creating a version_function struct
+   for it and storing it in the hashtable.  */
+
+static version_function *
+add_function_version (const tree decl)
+{
+  void **slot;
+  version_function *v;
+
+  if (!DECL_FUNCTION_VERSIONED (decl))
+    return NULL;
+
+  create_decl_version_htab ();
+
+  slot = htab_find_slot_with_hash (decl_version_htab, (const void_p)decl,
+                                   htab_hash_pointer ((const void_p)decl),
+				   INSERT);
+
+  if (*slot != NULL)
+    return (version_function *)*slot;
+
+  v = new_version_function (decl);
+  *slot = v;
+
+  return v;
+}
+
+/* Push V into VEC only if it is not already present.  If already present
+   returns false.  */
+
+static bool
+push_function_version (version_function *v, VEC (void_p, heap) **vec)
+{
+  int ix;
+  void_p ele; 
+  for (ix = 0; VEC_iterate (void_p, *vec, ix, ele); ++ix)
+    {
+      if (ele == (void_p)v)
+        return false;
+    }
+
+  VEC_safe_push (void_p, heap, *vec, (void*)v);
+  return true;
+}
+ 
+/* Mark DECL as deleted.  This is called by the front-end when a duplicate
+   decl is merged with the original decl and the duplicate decl is deleted.
+   This function marks the duplicate_decl as invalid.  Called by
+   duplicate_decls in cp/decl.c.  */
+
+void
+mark_delete_decl_version (const tree decl)
+{
+  version_function *decl_v;
+
+  decl_v = find_function_version (decl);
+
+  if (decl_v == NULL)
+    return;
+
+  decl_v->is_deleted = true;
+
+  if (is_default_function (decl)
+      && decl_v->versions != NULL)
+    {
+      VEC_truncate (void_p, decl_v->versions, 0);
+      VEC_free (void_p, heap, decl_v->versions);
+      decl_v->versions = NULL;
+    }
+}
+
+/* Mark DECL1 and DECL2 to be function versions in the same group.  One
+   of DECL1 and DECL2 must be the default, otherwise this function does
+   nothing.  This function aggregates the versions.  */
+
+int
+group_function_versions (const tree decl1, const tree decl2)
+{
+  tree default_decl, version_decl;
+  version_function *default_v, *version_v;
+
+  gcc_assert (DECL_FUNCTION_VERSIONED (decl1)
+	      && DECL_FUNCTION_VERSIONED (decl2));
+
+  /* The version decls are added only to the default decl.  */
+  if (!is_default_function (decl1)
+      && !is_default_function (decl2))
+    return 0;
+
+  /* This can happen with duplicate declarations.  Just ignore.  */
+  if (is_default_function (decl1)
+      && is_default_function (decl2))
+    return 0;
+
+  default_decl = (is_default_function (decl1)) ? decl1 : decl2;
+  version_decl = (default_decl == decl1) ? decl2 : decl1;
+
+  gcc_assert (default_decl != version_decl);
+  create_decl_version_htab ();
+
+  /* If the version function is found, it has been added.  */
+  if (find_function_version (version_decl))
+    return 0;
+
+  default_v = add_function_version (default_decl);
+  version_v = add_function_version (version_decl);
+
+  if (default_v->versions == NULL)
+    default_v->versions = VEC_alloc (void_p, heap, 1);
+
+  push_function_version (version_v, &default_v->versions);
+  return 0;
+}
+
+/* Makes a function attribute of the form NAME(ARG_NAME) and chains
+   it to CHAIN.  */
+
+static tree
+make_attribute (const char *name, const char *arg_name, tree chain)
+{
+  tree attr_name;
+  tree attr_arg_name;
+  tree attr_args;
+  tree attr;
+
+  attr_name = get_identifier (name);
+  attr_arg_name = build_string (strlen (arg_name), arg_name);
+  attr_args = tree_cons (NULL_TREE, attr_arg_name, NULL_TREE);
+  attr = tree_cons (attr_name, attr_args, chain);
+  return attr;
+}
+
+/* Return a new name by appending SUFFIX to the DECL name.  If
+   make_unique is true, append the full path name.  */
+
+static char *
+make_name (tree decl, const char *suffix, bool make_unique)
+{
+  char *global_var_name;
+  int name_len;
+  const char *name;
+  const char *unique_name = NULL;
+
+  name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
+
+  /* Get a unique name that can be used globally without any chances
+     of collision at link time.  */
+  if (make_unique)
+    unique_name = IDENTIFIER_POINTER (get_file_function_name ("\0"));
+
+  name_len = strlen (name) + strlen (suffix) + 2;
+
+  if (make_unique)
+    name_len += strlen (unique_name) + 1;
+  global_var_name = (char *) xmalloc (name_len);
+
+  /* Use '.' to concatenate names as it is demangler friendly.  */
+  if (make_unique)
+      snprintf (global_var_name, name_len, "%s.%s.%s", name,
+		unique_name, suffix);
+  else
+      snprintf (global_var_name, name_len, "%s.%s", name, suffix);
+
+  return global_var_name;
+}
+
+/* Make the resolver function decl for ifunc (IFUNC_DECL) to dispatch
+   the versions of multi-versioned function DEFAULT_DECL.  Create an
+   empty basic block in the resolver and store the pointer in
+   EMPTY_BB.  Return the decl of the resolver function.  */
+
+static tree
+make_ifunc_resolver_func (const tree default_decl,
+			  const tree ifunc_decl,
+			  basic_block *empty_bb)
+{
+  char *resolver_name;
+  tree decl, type, decl_name, t;
+  basic_block new_bb;
+  tree old_current_function_decl;
+  bool make_unique = false;
+
+  /* IFUNCs have to be globally visible.  So, if the default_decl is
+     not, then the name of the IFUNC should be made unique.  */
+  if (TREE_PUBLIC (default_decl) == 0)
+    make_unique = true;
+
+  /* Append the filename to the resolver function if the versions are
+     not externally visible.  This is because the resolver function has
+     to be externally visible for the loader to find it.  So, appending
+     the filename will prevent conflicts with a resolver function from
+     another module which is based on the same version name.  */
+  resolver_name = make_name (default_decl, "resolver", make_unique);
+
+  /* The resolver function should return a (void *). */
+  type = build_function_type_list (ptr_type_node, NULL_TREE);
+
+  decl = build_fn_decl (resolver_name, type);
+  decl_name = get_identifier (resolver_name);
+  SET_DECL_ASSEMBLER_NAME (decl, decl_name);
+
+  DECL_NAME (decl) = decl_name;
+  TREE_USED (decl) = TREE_USED (default_decl);
+  DECL_ARTIFICIAL (decl) = 1;
+  DECL_IGNORED_P (decl) = 0;
+  /* IFUNC resolvers have to be externally visible.  */
+  TREE_PUBLIC (decl) = 1;
+  DECL_UNINLINABLE (decl) = 1;
+
+  DECL_EXTERNAL (decl) = DECL_EXTERNAL (default_decl);
+  DECL_EXTERNAL (ifunc_decl) = 0;
+
+  DECL_CONTEXT (decl) = NULL_TREE;
+  DECL_INITIAL (decl) = make_node (BLOCK);
+  DECL_STATIC_CONSTRUCTOR (decl) = 0;
+  TREE_READONLY (decl) = 0;
+  DECL_PURE_P (decl) = 0;
+  DECL_COMDAT (decl) = DECL_COMDAT (default_decl);
+  if (DECL_COMDAT_GROUP (default_decl))
+    {
+      make_decl_one_only (decl, DECL_COMDAT_GROUP (default_decl));
+    }
+  /* Build result decl and add to function_decl. */
+  t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, ptr_type_node);
+  DECL_ARTIFICIAL (t) = 1;
+  DECL_IGNORED_P (t) = 1;
+  DECL_RESULT (decl) = t;
+
+  gimplify_function_tree (decl);
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (decl));
+  current_function_decl = decl;
+  init_empty_tree_cfg_for_function (DECL_STRUCT_FUNCTION (decl));
+  cfun->curr_properties |=
+    (PROP_gimple_lcf | PROP_gimple_leh | PROP_cfg | PROP_referenced_vars |
+     PROP_ssa);
+  new_bb = create_empty_bb (ENTRY_BLOCK_PTR);
+  make_edge (ENTRY_BLOCK_PTR, new_bb, EDGE_FALLTHRU);
+  make_edge (new_bb, EXIT_BLOCK_PTR, 0);
+  *empty_bb = new_bb;
+
+  cgraph_add_new_function (decl, true);
+  cgraph_call_function_insertion_hooks (cgraph_get_create_node (decl));
+  cgraph_mark_force_output_node (cgraph_get_create_node (decl));
+
+  if (DECL_COMDAT_GROUP (default_decl))
+    {
+      gcc_assert (cgraph_get_node (default_decl));
+      symtab_add_to_same_comdat_group (
+	(symtab_node) cgraph_get_node (decl),
+	(symtab_node) cgraph_get_node (default_decl));
+    }
+
+  pop_cfun ();
+  current_function_decl = old_current_function_decl;
+
+  gcc_assert (ifunc_decl != NULL);
+  /* Mark ifunc_decl as "ifunc" with resolver as resolver_name.  */
+  DECL_ATTRIBUTES (ifunc_decl) 
+    = make_attribute ("ifunc", resolver_name, DECL_ATTRIBUTES (ifunc_decl));
+
+  /* Create the alias here.  */
+  cgraph_create_function_alias (ifunc_decl, decl);
+  return decl;
+}
+
+/* Make an ifunc declaration for the multi-versioned function DECL.  Calls to
+   DECL function will be replaced with calls to the ifunc.   Return the decl
+   of the ifunc created.  */
+
+static tree
+make_ifunc_func (const tree decl)
+{
+  tree ifunc_decl;
+  char *ifunc_name, *resolver_name;
+  tree fn_type, ifunc_type;
+  bool make_unique = false;
+
+  if (TREE_PUBLIC (decl) == 0)
+    make_unique = true;
+
+  ifunc_name = make_name (decl, "ifunc", make_unique);
+  resolver_name = make_name (decl, "resolver", make_unique);
+  gcc_assert (resolver_name);
+
+  fn_type = TREE_TYPE (decl);
+  ifunc_type = build_function_type (TREE_TYPE (fn_type),
+				    TYPE_ARG_TYPES (fn_type));
+  
+  ifunc_decl = build_fn_decl (ifunc_name, ifunc_type);
+  TREE_USED (ifunc_decl) = 1;
+  DECL_CONTEXT (ifunc_decl) = NULL_TREE;
+  DECL_INITIAL (ifunc_decl) = error_mark_node;
+  DECL_ARTIFICIAL (ifunc_decl) = 1;
+  /* Mark this ifunc as external, the resolver will flip it again if
+     it gets generated.  */
+  DECL_EXTERNAL (ifunc_decl) = 1;
+  /* IFUNCs have to be externally visible.  */
+  TREE_PUBLIC (ifunc_decl) = 1;
+
+  return ifunc_decl;  
+}
+
+/* For a multi-versioned function decl, which should also be the default,
+   return the decl of the ifunc, creating it if it does not
+   exist.  */
+
+tree
+get_ifunc_for_version (const tree decl)
+{
+  version_function *decl_v;
+  int ix;
+  void_p ele;
+
+  /* DECL has to be the default version, otherwise it is missing and
+     that is not allowed.  */
+  if (!is_default_function (decl))
+    {
+      error_at (DECL_SOURCE_LOCATION (decl), "Default version not found");
+      return decl;
+    }
+
+  decl_v = find_function_version (decl);
+  gcc_assert (decl_v != NULL);
+  if (decl_v->ifunc_decl == NULL)
+    {
+      tree ifunc_decl;
+      ifunc_decl = make_ifunc_func (decl);
+      decl_v->ifunc_decl = ifunc_decl;
+    }
+
+  if (cgraph_get_node (decl))
+    cgraph_mark_force_output_node (cgraph_get_node (decl));
+
+  for (ix = 0; VEC_iterate (void_p, decl_v->versions, ix, ele); ++ix)
+    {
+      version_function *v = (version_function *) ele;
+      /* This could be a deleted version.  Happens with
+	 duplicate declarations. */
+      if (v->is_deleted)
+	continue;
+      gcc_assert (v->decl != NULL);
+      if (cgraph_get_node (v->decl))
+	cgraph_mark_force_output_node (cgraph_get_node (v->decl));
+    }
+
+  return decl_v->ifunc_decl;
+}
+
+/* Generate the dispatching code to dispatch multi-versioned function
+   DECL.  Make a new function decl for dispatching and call the target
+   hook to process the "target" attributes and provide the code to
+   dispatch the right function at run-time.  */
+
+static tree
+make_ifunc_resolver_for_version (const tree decl)
+{
+  version_function *decl_v;
+  tree ifunc_resolver_decl, ifunc_decl;
+  basic_block empty_bb;
+  int ix;
+  void_p ele;
+  VEC (tree, heap) *fn_ver_vec = NULL;
+  tree old_current_function_decl;
+
+  gcc_assert (is_default_function (decl));
+
+  decl_v = find_function_version (decl);
+  gcc_assert (decl_v != NULL);
+
+  if (decl_v->ifunc_resolver_decl != NULL)
+    return decl_v->ifunc_resolver_decl;
+
+  ifunc_decl = decl_v->ifunc_decl;
+
+  if (ifunc_decl == NULL)
+    ifunc_decl = decl_v->ifunc_decl = make_ifunc_func (decl);
+
+  ifunc_resolver_decl = make_ifunc_resolver_func (decl, ifunc_decl,
+						  &empty_bb);
+
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (ifunc_resolver_decl));
+  current_function_decl = ifunc_resolver_decl;
+
+  fn_ver_vec = VEC_alloc (tree, heap, 2);
+  VEC_safe_push (tree, heap, fn_ver_vec, decl);
+
+  for (ix = 0; VEC_iterate (void_p, decl_v->versions, ix, ele); ++ix)
+    {
+      version_function *v = (version_function *) ele;
+      gcc_assert (v->decl != NULL);
+      /* Check for virtual functions here again, as by this time it should
+	 have been determined if this function needs a vtable index or
+	 not.  This happens for methods in derived classes that override
+	 virtual methods in base classes but are not explicitly marked as
+	 virtual.  */
+      if (DECL_VINDEX (v->decl))
+        error_at (DECL_SOURCE_LOCATION (v->decl),
+		  "Virtual function versioning not supported\n");
+      if (!v->is_deleted)
+	VEC_safe_push (tree, heap, fn_ver_vec, v->decl);
+    }
+
+  gcc_assert (targetm.dispatch_version);
+  targetm.dispatch_version (ifunc_resolver_decl, fn_ver_vec, &empty_bb);
+  decl_v->ifunc_resolver_decl = ifunc_resolver_decl;
+
+  pop_cfun ();
+  current_function_decl = old_current_function_decl;
+  return ifunc_resolver_decl;
+}
+
+/* Main entry point to pass_dispatch_versions. For multi-versioned functions,
+   generate the dispatching code.  */
+
+static unsigned int
+do_dispatch_versions (void)
+{
+  /* A new pass for generating dispatch code for multi-versioned functions.
+     Other forms of dispatch can be added when ifunc support is not available
+     like just calling the function directly after checking for target type.
+     Currently, dispatching is done through IFUNC.  This pass will become
+     more meaningful when other dispatch mechanisms are added.  */
+
+  /* Cloning a function to produce more versions will happen here when the
+     user requests that via the target attribute. For example,
+     int foo () __attribute__ ((target(("arch=core2"), ("arch=corei7"))));
+     means that the user wants the same body of foo to be versioned for core2
+     and corei7.  In that case, this function will be cloned during this
+     pass.  */
+  
+  if (DECL_FUNCTION_VERSIONED (current_function_decl)
+      && is_default_function (current_function_decl))
+    {
+      tree decl = make_ifunc_resolver_for_version (current_function_decl);
+      if (dump_file && decl)
+	dump_function_to_file (decl, dump_file, TDF_BLOCKS);
+    }
+  return 0;
+}
+
+static  bool
+gate_dispatch_versions (void)
+{
+  return true;
+}
+
+/* A pass to generate the dispatch code to execute the appropriate version
+   of a multi-versioned function at run-time.  */
+
+struct gimple_opt_pass pass_dispatch_versions =
+{
+ {
+  GIMPLE_PASS,
+  "dispatch_multiversion_functions",    /* name */
+  gate_dispatch_versions,		/* gate */
+  do_dispatch_versions,			/* execute */
+  NULL,					/* sub */
+  NULL,					/* next */
+  0,					/* static_pass_number */
+  TV_MULTIVERSION_DISPATCH,		/* tv_id */
+  PROP_cfg,				/* properties_required */
+  PROP_cfg,				/* properties_provided */
+  0,					/* properties_destroyed */
+  0,					/* todo_flags_start */
+  0					/* todo_flags_finish */
+ }
+};
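
For readers unfamiliar with IFUNC-style dispatch, here is a minimal plain-C
sketch of what a resolver conceptually does: it inspects the CPU and returns a
pointer to the version that should run.  The names cpu_features, resolve_foo
and the foo_* functions are hypothetical and are not part of this patch; the
real resolver is generated by the compiler and tests the CPU with the
__builtin_cpu_is and __builtin_cpu_supports builtins instead of a struct.

```c
#include <assert.h>
#include <stddef.h>

/* Pointer to one version of the multi-versioned function.  */
typedef int (*foo_fn) (void);

/* Three hypothetical versions; the return values only identify them.  */
static int foo_default (void) { return 0; }
static int foo_sse4_2 (void) { return 9; }
static int foo_avx2 (void) { return 4; }

/* Hypothetical CPU description standing in for the run-time checks.  */
struct cpu_features
{
  int has_avx2;
  int has_sse4_2;
};

/* Return the most specialized version that CPU can execute.  The checks
   go from highest to lowest priority and fall back to the default.  */
static foo_fn
resolve_foo (const struct cpu_features *cpu)
{
  assert (cpu != NULL);
  if (cpu->has_avx2)
    return foo_avx2;
  if (cpu->has_sse4_2)
    return foo_sse4_2;
  return foo_default;
}
```

A caller would invoke the returned pointer, for example
resolve_foo (&cpu) (), which mirrors how both direct calls and calls through
a function pointer end up in the selected version.
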
Index: gcc/cgraphunit.c
===================================================================
--- gcc/cgraphunit.c	(revision 187371)
+++ gcc/cgraphunit.c	(working copy)
@@ -420,6 +420,13 @@ cgraph_finalize_function (tree decl, bool nested)
       && !DECL_DISREGARD_INLINE_LIMITS (decl))
     node->symbol.force_output = 1;
 
+  /* With function versions, keep inline functions and do not worry about
+     inline limits.  */
+  if (DECL_FUNCTION_VERSIONED (decl)
+      && DECL_DECLARED_INLINE_P (decl)
+      && !DECL_EXTERNAL (decl))
+    node->symbol.force_output = 1;
+
   /* When not optimizing, also output the static functions. (see
      PR24561), but don't do so for always_inline functions, functions
      declared inline and nested functions.  These were optimized out
Index: gcc/multiversion.h
===================================================================
--- gcc/multiversion.h	(revision 0)
+++ gcc/multiversion.h	(revision 0)
@@ -0,0 +1,55 @@
+/* Function Multiversioning.
+   Copyright (C) 2012 Free Software Foundation, Inc.
+   Contributed by Sriraman Tallam (tmsriram@google.com)
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+/* This is the header file which provides the functions to keep track
+   of functions that are multi-versioned and to generate the dispatch
+   code to call the right version at run-time.  */
+
+#ifndef GCC_MULTIVERSION_H
+#define GCC_MULTIVERSION_H
+
+#include "tree.h"
+
+/* Mark DECL1 and DECL2 as function versions.  */
+int group_function_versions (const tree decl1, const tree decl2);
+
+/* Mark DECL as deleted and no longer a version.  */
+void mark_delete_decl_version (const tree decl);
+
+/* Returns true if DECL is the default version to be executed if all
+   other versions are inappropriate at run-time.  */
+bool is_default_function (const tree decl);
+
+/* Gets the IFUNC dispatcher for this multi-versioned function DECL.  DECL
+   must be the default function in the multi-versioned group.  */
+tree get_ifunc_for_version (const tree decl);
+
+/* Returns true when only one of DECL1 and DECL2 is marked with "target"
+   or if the "target" attribute strings of DECL1 and DECL2 do not match.  */
+bool has_different_version_attributes (const tree decl1, const tree decl2);
+
+/* Function DECL is marked to be a multi-versioned function.  If DECL is
+   not the default version, the assembler name of DECL is changed to include
+   the attribute string to keep the name unambiguous.  */
+void mark_function_as_version (const tree decl);
+
+/* Returns true if DECL is a FUNCTION_DECL with the "target" attribute set.  */
+bool is_target_attribute_set (const tree decl);
+#endif
Index: gcc/testsuite/g++.dg/mv1.C
===================================================================
--- gcc/testsuite/g++.dg/mv1.C	(revision 0)
+++ gcc/testsuite/g++.dg/mv1.C	(revision 0)
@@ -0,0 +1,200 @@
+/* Test case to check if Multiversioning works.  */
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-require-ifunc "" }  */
+/* { dg-options "-O2 -fPIC" } */
+
+#include <assert.h>
+
+/* Default version.  */
+int foo ();
+/* The other versions of foo.  Mix up the declaration order and
+   check that dispatching still follows the order of priority.  */
+/* Check combination of target attributes.  */
+int foo () __attribute__ ((target("arch=corei7,popcnt")));
+/* The target operands in this declaration and the definition are re-ordered.
+   This should still work.  */
+int foo () __attribute__ ((target("ssse3,avx2")));
+
+/* Check for all target attributes for which dispatchers are available.  */
+/* Check arch= */
+int foo () __attribute__((target("arch=core2")));
+int foo () __attribute__((target("arch=corei7")));
+int foo () __attribute__((target("arch=atom")));
+int foo () __attribute__((target("arch=amdfam10")));
+int foo () __attribute__((target("arch=bdver1")));
+int foo () __attribute__((target("arch=bdver2")));
+/* Check ISAs  */
+int foo () __attribute__((target("sse3")));
+int foo () __attribute__((target("sse2")));
+int foo () __attribute__((target("sse")));
+int foo () __attribute__((target("avx")));
+int foo () __attribute__((target("sse4.2")));
+int foo () __attribute__((target("popcnt")));
+int foo () __attribute__((target("sse4.1")));
+int foo () __attribute__((target("ssse3")));
+int foo () __attribute__((target("mmx")));
+int foo () __attribute__((target("avx2")));
+int (*p)() = &foo;
+int main ()
+{
+  int val = foo ();
+  assert (val == (*p)());
+
+  /* Check in the exact same order in which the dispatching
+     is expected to happen.  */
+  if (__builtin_cpu_is ("bdver1"))
+    assert (val == 1);
+  else if (__builtin_cpu_is ("bdver2"))
+    assert (val == 2);
+  else if (__builtin_cpu_supports ("avx2")
+	   && __builtin_cpu_supports ("ssse3"))
+    assert (val == 3);
+  else if (__builtin_cpu_supports ("avx2"))
+    assert (val == 4);
+  else if (__builtin_cpu_supports ("avx"))
+    assert (val == 5);
+  else if (__builtin_cpu_is ("corei7")
+	   && __builtin_cpu_supports ("popcnt"))
+    assert (val == 6);
+  else if (__builtin_cpu_supports ("popcnt"))
+    assert (val == 7);
+  else if (__builtin_cpu_is ("corei7"))
+    assert (val == 8);
+  else if (__builtin_cpu_supports ("sse4.2"))
+    assert (val == 9);
+  else if (__builtin_cpu_supports ("sse4.1"))
+    assert (val == 10);
+  else if (__builtin_cpu_is ("amdfam10h"))
+    assert (val == 11);
+  else if (__builtin_cpu_is ("core2"))
+    assert (val == 12);
+  else if (__builtin_cpu_is ("atom"))
+    assert (val == 13);
+  else if (__builtin_cpu_supports ("ssse3"))
+    assert (val == 14);
+  else if (__builtin_cpu_supports ("sse3"))
+    assert (val == 15);
+  else if (__builtin_cpu_supports ("sse2"))
+    assert (val == 16);
+  else if (__builtin_cpu_supports ("sse"))
+    assert (val == 17);
+  else if (__builtin_cpu_supports ("mmx"))
+    assert (val == 18);
+  else
+    assert (val == 0);
+  
+  return 0;
+}
+
+int foo ()
+{
+  return 0;
+}
+
+int __attribute__ ((target("arch=bdver1")))
+foo ()
+{
+  return 1;
+}
+
+int __attribute__ ((target("arch=bdver2")))
+foo ()
+{
+  return 2;
+}
+
+int __attribute__ ((target("avx2,ssse3")))
+foo ()
+{
+  return 3;
+}
+
+int __attribute__ ((target("avx2")))
+foo ()
+{
+  return 4;
+}
+
+int __attribute__ ((target("avx")))
+foo ()
+{
+  return 5;
+}
+
+int __attribute__ ((target("arch=corei7,popcnt")))
+foo ()
+{
+  return 6;
+}
+
+int __attribute__ ((target("popcnt")))
+foo ()
+{
+  return 7;
+}
+
+int __attribute__ ((target("arch=corei7")))
+foo ()
+{
+  return 8;
+}
+
+int __attribute__ ((target("sse4.2")))
+foo ()
+{
+  return 9;
+}
+
+int __attribute__ ((target("sse4.1")))
+foo ()
+{
+  return 10;
+}
+
+int __attribute__ ((target("arch=amdfam10")))
+foo ()
+{
+  return 11;
+}
+
+int __attribute__ ((target("arch=core2")))
+foo ()
+{
+  return 12;
+}
+
+int __attribute__ ((target("arch=atom")))
+foo ()
+{
+  return 13;
+}
+
+int __attribute__ ((target("ssse3")))
+foo ()
+{
+  return 14;
+}
+
+int __attribute__ ((target("sse3")))
+foo ()
+{
+  return 15;
+}
+
+int __attribute__ ((target("sse2")))
+foo ()
+{
+  return 16;
+}
+
+int __attribute__ ((target("sse")))
+foo ()
+{
+  return 17;
+}
+
+int __attribute__ ((target("mmx")))
+foo ()
+{
+  return 18;
+}
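
The test above relies on the versions being checked in priority order at run
time.  A minimal sketch of that selection logic in plain C follows, assuming a
hypothetical version_info record whose "supported" flag stands in for the
run-time CPU checks; neither the record nor the helper names are taken from
this patch.  The comparator compares the priorities directly rather than
subtracting them, since they are unsigned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record for one function version; not the patch's type.  */
struct version_info
{
  int id;                 /* Value the version would return.  */
  unsigned int priority;  /* A greater value is checked earlier.  */
  int supported;          /* Stand-in for a __builtin_cpu_* predicate.  */
};

/* Order versions by descending priority.  */
static int
compare_priority (const void *v1, const void *v2)
{
  const struct version_info *a = (const struct version_info *) v1;
  const struct version_info *b = (const struct version_info *) v2;
  return (a->priority < b->priority) - (a->priority > b->priority);
}

/* Return the id of the highest-priority supported version, or
   DEFAULT_ID when no specialized version applies.  */
static int
select_version (struct version_info *versions, size_t n, int default_id)
{
  size_t i;

  assert (versions != NULL);
  qsort (versions, n, sizeof (versions[0]), compare_priority);
  for (i = 0; i < n; i++)
    if (versions[i].supported)
      return versions[i].id;
  return default_id;
}
```

The order in which the versions are declared does not matter in this sketch
because the array is sorted before it is scanned, which is the property the
shuffled declarations in the test are meant to exercise.
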
Index: gcc/cp/class.c
===================================================================
--- gcc/cp/class.c	(revision 187371)
+++ gcc/cp/class.c	(working copy)
@@ -39,6 +39,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-dump.h"
 #include "splay-tree.h"
 #include "pointer-set.h"
+#include "multiversion.h"
 
 /* The number of nested classes being processed.  If we are not in the
    scope of any class, this is zero.  */
@@ -1093,7 +1094,21 @@ add_method (tree type, tree method, tree using_dec
 	      || same_type_p (TREE_TYPE (fn_type),
 			      TREE_TYPE (method_type))))
 	{
-	  if (using_decl)
+	  /* For function versions, their parms and types match
+	     but they are not duplicates.  Record function versions
+	     as and when they are found.  */
+	  if (TREE_CODE (fn) == FUNCTION_DECL
+	      && TREE_CODE (method) == FUNCTION_DECL
+	      && (is_target_attribute_set (fn)
+		  || is_target_attribute_set (method))
+	      && has_different_version_attributes (fn, method))
+	    {
+	      mark_function_as_version (fn);
+	      mark_function_as_version (method);
+	      group_function_versions (fn, method);
+	      continue;
+	    }
+	  else if (using_decl)
 	    {
 	      if (DECL_CONTEXT (fn) == type)
 		/* Defer to the local function.  */
@@ -1151,6 +1166,7 @@ add_method (tree type, tree method, tree using_dec
   else
     /* Replace the current slot.  */
     VEC_replace (tree, method_vec, slot, overload);
+
   return true;
 }
 
@@ -6928,8 +6944,11 @@ resolve_address_of_overloaded_function (tree targe
 	  if (DECL_ANTICIPATED (fn))
 	    continue;
 
-	  /* See if there's a match.  */
-	  if (same_type_p (target_fn_type, static_fn_type (fn)))
+	  /* See if there's a match.  For multi-versioned functions, match
+	     only the default function.  */
+	  if (same_type_p (target_fn_type, static_fn_type (fn))
+	      && (!DECL_FUNCTION_VERSIONED (fn)
+		  || is_default_function (fn)))
 	    matches = tree_cons (fn, NULL_TREE, matches);
 	}
     }
@@ -7091,6 +7110,22 @@ resolve_address_of_overloaded_function (tree targe
       perform_or_defer_access_check (access_path, fn, fn);
     }
 
+  /* If a pointer to a function that is multi-versioned is requested, the
+     pointer to the dispatcher function is returned instead.  This works
+     well because indirectly calling the function will dispatch the right
+     function version at run-time. Also, the function address is kept
+     unique.  */
+  if (DECL_FUNCTION_VERSIONED (fn)
+      && is_default_function (fn))
+    {
+      tree ifunc_decl;
+      ifunc_decl = get_ifunc_for_version (fn);
+      gcc_assert (ifunc_decl != NULL);
+      retrofit_lang_decl (ifunc_decl);
+      mark_used (fn);
+      return build_fold_addr_expr (ifunc_decl);
+    }
+
   if (TYPE_PTRFN_P (target_type) || TYPE_PTRMEMFUNC_P (target_type))
     return cp_build_addr_expr (fn, flags);
   else
Index: gcc/cp/decl.c
===================================================================
--- gcc/cp/decl.c	(revision 187371)
+++ gcc/cp/decl.c	(working copy)
@@ -54,6 +54,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "pointer-set.h"
 #include "splay-tree.h"
 #include "plugin.h"
+#include "multiversion.h"
 
 /* Possible cases of bad specifiers type used by bad_specifiers. */
 enum bad_spec_place {
@@ -973,6 +974,21 @@ decls_match (tree newdecl, tree olddecl)
       if (t1 != t2)
 	return 0;
 
+      /* The decls do not match if they correspond to two different versions
+	 of the same function.  */
+      if (compparms (p1, p2)
+	  && same_type_p (TREE_TYPE (f1), TREE_TYPE (f2)) 
+	  && has_different_version_attributes (newdecl, olddecl))
+	{
+	  /* One of the decls could be the default without the "target"
+	     attribute. Set it to be a versioned function here.  */
+	  mark_function_as_version (newdecl);
+	  mark_function_as_version (olddecl);
+	  /* Accumulate all the versions of a function.  */
+	  group_function_versions (olddecl, newdecl);
+	  return 0;
+	}
+
       if (CP_DECL_CONTEXT (newdecl) != CP_DECL_CONTEXT (olddecl)
 	  && ! (DECL_EXTERN_C_P (newdecl)
 		&& DECL_EXTERN_C_P (olddecl)))
@@ -1490,7 +1506,11 @@ duplicate_decls (tree newdecl, tree olddecl, bool
 	      error ("previous declaration %q+#D here", olddecl);
 	      return NULL_TREE;
 	    }
-	  else if (compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)),
+	  /* For function versions, params and types match, but they
+	     are not ambiguous.  */
+	  else if ((!DECL_FUNCTION_VERSIONED (newdecl)
+		    && !DECL_FUNCTION_VERSIONED (olddecl))
+		   && compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)),
 			      TYPE_ARG_TYPES (TREE_TYPE (olddecl))))
 	    {
 	      error ("new declaration %q#D", newdecl);
@@ -2262,6 +2282,16 @@ duplicate_decls (tree newdecl, tree olddecl, bool
   else if (DECL_PRESERVE_P (newdecl))
     DECL_PRESERVE_P (olddecl) = 1;
 
+  /* If the olddecl is a version, so is the newdecl.  */
+  if (TREE_CODE (newdecl) == FUNCTION_DECL
+      && DECL_FUNCTION_VERSIONED (olddecl))
+    {
+      DECL_FUNCTION_VERSIONED (newdecl) = 1;
+      /* Record that newdecl is not a valid version and has
+	 been deleted.  */
+      mark_delete_decl_version (newdecl);
+    }
+
   if (TREE_CODE (newdecl) == FUNCTION_DECL)
     {
       int function_size;
@@ -3810,6 +3840,7 @@ cp_make_fname_decl (location_t loc, tree id, int t
 			    ? NULL : fname_as_string (type_dep));
   tree type;
   tree init = cp_fname_init (name, &type);
+
   tree decl = build_decl (loc, VAR_DECL, id, type);
 
   if (name)
@@ -14036,7 +14067,11 @@ cxx_comdat_group (tree decl)
 	  else
 	    break;
 	}
-      name = DECL_ASSEMBLER_NAME (decl);
+      if (TREE_CODE (decl) == FUNCTION_DECL
+	  && DECL_FUNCTION_VERSIONED (decl))
+	name = DECL_NAME (decl);
+      else
+	name = DECL_ASSEMBLER_NAME (decl);
     }
 
   return name;
Index: gcc/cp/semantics.c
===================================================================
--- gcc/cp/semantics.c	(revision 187371)
+++ gcc/cp/semantics.c	(working copy)
@@ -3783,8 +3783,11 @@ expand_or_defer_fn_1 (tree fn)
       /* If the user wants us to keep all inline functions, then mark
 	 this function as needed so that finish_file will make sure to
 	 output it later.  Similarly, all dllexport'd functions must
-	 be emitted; there may be callers in other DLLs.  */
-      if ((flag_keep_inline_functions
+	 be emitted; there may be callers in other DLLs.
+	 Also, mark this function as needed if it is marked inline but
+	 is a multi-versioned function.  */
+      if (((flag_keep_inline_functions
+	    || DECL_FUNCTION_VERSIONED (fn))
 	   && DECL_DECLARED_INLINE_P (fn)
 	   && !DECL_REALLY_EXTERN (fn))
 	  || (flag_keep_inline_dllexport
Index: gcc/cp/decl2.c
===================================================================
--- gcc/cp/decl2.c	(revision 187371)
+++ gcc/cp/decl2.c	(working copy)
@@ -53,6 +53,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "splay-tree.h"
 #include "langhooks.h"
 #include "c-family/c-ada-spec.h"
+#include "multiversion.h"
 
 extern cpp_reader *parse_in;
 
@@ -677,9 +678,13 @@ check_classfn (tree ctype, tree function, tree tem
 	  if (is_template != (TREE_CODE (fndecl) == TEMPLATE_DECL))
 	    continue;
 
+	  /* While finding a match, same types and params are not enough
+	     if the function is versioned.  Also check version ("target")
+	     attributes.  */
 	  if (same_type_p (TREE_TYPE (TREE_TYPE (function)),
 			   TREE_TYPE (TREE_TYPE (fndecl)))
 	      && compparms (p1, p2)
+	      && !has_different_version_attributes (function, fndecl)
 	      && (!is_template
 		  || comp_template_parms (template_parms,
 					  DECL_TEMPLATE_PARMS (fndecl)))
Index: gcc/cp/call.c
===================================================================
--- gcc/cp/call.c	(revision 187371)
+++ gcc/cp/call.c	(working copy)
@@ -41,6 +41,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "langhooks.h"
 #include "c-family/c-objc.h"
 #include "timevar.h"
+#include "multiversion.h"
 
 /* The various kinds of conversion.  */
 
@@ -3903,6 +3904,16 @@ build_new_function_call (tree fn, VEC(tree,gc) **a
     {
       if (complain & tf_error)
 	{
+	  /* If the call is to a multiversioned function without
+	     a default version, overload resolution will fail.  */
+	  if (candidates
+	      && TREE_CODE (candidates->fn) == FUNCTION_DECL
+	      && DECL_FUNCTION_VERSIONED (candidates->fn))
+	    error_at (location_of (DECL_NAME (OVL_CURRENT (fn))),
+		      "call to multiversioned function %<%D(%A)%> with"
+		      " no default version", DECL_NAME (OVL_CURRENT (fn)),
+		      build_tree_list_vec (*args));
+
 	  if (!any_viable_p && candidates && ! candidates->next
 	      && (TREE_CODE (candidates->fn) == FUNCTION_DECL))
 	    return cp_build_function_call_vec (candidates->fn, args, complain);
@@ -6824,6 +6835,19 @@ build_over_call (struct z_candidate *cand, int fla
   if (!already_used)
     mark_used (fn);
 
+  /* For a call to a multi-versioned function, the call should actually be to
+     the dispatcher.  */
+  if (DECL_FUNCTION_VERSIONED (fn)
+      && is_default_function (fn))
+    {
+      tree ifunc_decl;
+      ifunc_decl = get_ifunc_for_version (fn);
+      gcc_assert (ifunc_decl != NULL);
+      retrofit_lang_decl (ifunc_decl);
+      return build_call_expr_loc_array (UNKNOWN_LOCATION, ifunc_decl,
+					nargs, argarray);
+    }
+
   if (DECL_VINDEX (fn) && (flags & LOOKUP_NONVIRTUAL) == 0)
     {
       tree t;
@@ -8081,6 +8105,60 @@ joust (struct z_candidate *cand1, struct z_candida
   size_t i;
   size_t len;
 
+  /* For candidates of a multi-versioned function, first check if the
+     target flags of the caller match any of the candidates.  If so,
+     the caller can call that candidate directly; otherwise the one marked
+     default wins.  This is because the default decl is used as the key to
+     aggregate all the other versions provided for it in multiversion.c.
+     When generating the actual call, the appropriate dispatcher is created
+     to call the right function version at run-time.  */
+
+  if ((TREE_CODE (cand1->fn) == FUNCTION_DECL
+       && DECL_FUNCTION_VERSIONED (cand1->fn))
+      || (TREE_CODE (cand2->fn) == FUNCTION_DECL
+	  && DECL_FUNCTION_VERSIONED (cand2->fn)))
+    {
+      /* Both functions must be marked versioned.  */
+      gcc_assert (DECL_FUNCTION_VERSIONED (cand1->fn)
+		  && DECL_FUNCTION_VERSIONED (cand2->fn));
+
+      /* Try to see if a direct call can be made to a version.  This is
+	 possible if the caller and callee have the same target flags.
+	 If cand->fn is marked with target attributes, check if the
+	 target approves inlining this into the caller.  If so, this is
+	 the version we want.  */
+
+      if (is_target_attribute_set (cand1->fn)
+	  && targetm.target_option.can_inline_p (current_function_decl,
+						 cand1->fn))
+	return 1;
+
+      if (is_target_attribute_set (cand2->fn)
+	  && targetm.target_option.can_inline_p (current_function_decl,
+						 cand2->fn))
+	return -1;
+
+      /* A direct call to a version is not possible, so find the default
+	 function and return it.  This will later be converted to dispatch
+	 the right version at run time.  */
+
+      if (is_default_function (cand1->fn))
+	{
+	  mark_used (cand2->fn);
+	  return 1;
+	}
+
+      if (is_default_function (cand2->fn))
+	{
+	  mark_used (cand1->fn);
+	  return -1;
+	}
+
+      /* If a default function is absent, this will never get resolved,
+	 leading to an ambiguous call error.  */
+      return 0;
+    }
+
   /* Candidates that involve bad conversions are always worse than those
      that don't.  */
   if (cand1->viable > cand2->viable)
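
The tie-breaking rule added to joust above can be summarized by the following
stand-alone sketch.  The candidate struct and its boolean fields are
hypothetical stand-ins for is_target_attribute_set, the target's can_inline_p
hook and is_default_function; it only illustrates the order of the checks and
is not the compiler's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of one overload candidate.  */
struct candidate
{
  bool has_target_attr;  /* Declared with a "target" attribute.  */
  bool matches_caller;   /* Caller's target flags allow a direct call.  */
  bool is_default;       /* The version without a "target" attribute.  */
};

/* Return 1 if C1 wins, -1 if C2 wins and 0 if neither can be chosen.
   A version that matches the caller's target flags is preferred;
   otherwise the default version wins so a dispatcher can be used.  */
static int
compare_versions (const struct candidate *c1, const struct candidate *c2)
{
  assert (c1 != NULL && c2 != NULL);
  if (c1->has_target_attr && c1->matches_caller)
    return 1;
  if (c2->has_target_attr && c2->matches_caller)
    return -1;
  if (c1->is_default)
    return 1;
  if (c2->is_default)
    return -1;
  /* No default version: the call stays ambiguous.  */
  return 0;
}
```

A result of 0 for two specialized versions corresponds to the case where no
default version exists and overload resolution fails.
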
Index: gcc/timevar.def
===================================================================
--- gcc/timevar.def	(revision 187371)
+++ gcc/timevar.def	(working copy)
@@ -253,6 +253,7 @@ DEFTIMEVAR (TV_TREE_IFCOMBINE        , "tree if-co
 DEFTIMEVAR (TV_TREE_UNINIT           , "uninit var analysis")
 DEFTIMEVAR (TV_PLUGIN_INIT           , "plugin initialization")
 DEFTIMEVAR (TV_PLUGIN_RUN            , "plugin execution")
+DEFTIMEVAR (TV_MULTIVERSION_DISPATCH , "multiversion dispatch")
 
 /* Everything else in rest_of_compilation not included above.  */
 DEFTIMEVAR (TV_EARLY_LOCAL	     , "early local passes")
Index: gcc/Makefile.in
===================================================================
--- gcc/Makefile.in	(revision 187371)
+++ gcc/Makefile.in	(working copy)
@@ -1297,6 +1297,7 @@ OBJS = \
 	mcf.o \
 	mode-switching.o \
 	modulo-sched.o \
+	multiversion.o \
 	omega.o \
 	omp-low.o \
 	optabs.o \
@@ -3042,6 +3043,11 @@ ree.o : ree.c $(CONFIG_H) $(SYSTEM_H) coretypes.h
    $(DF_H) $(TIMEVAR_H) tree-pass.h $(RECOG_H) $(EXPR_H) \
    $(REGS_H) $(TREE_H) $(TM_P_H) insn-config.h $(INSN_ATTR_H) $(DIAGNOSTIC_CORE_H) \
    $(TARGET_H) $(OPTABS_H) insn-codes.h rtlhooks-def.h $(PARAMS_H) $(CGRAPH_H)
+multiversion.o : multiversion.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
+   $(TREE_H) langhooks.h $(TREE_INLINE_H) $(FLAGS_H) $(CGRAPH_H) intl.h \
+   $(DIAGNOSTIC_H) $(FIBHEAP_H) $(PARAMS_H) $(TIMEVAR_H) tree-pass.h \
+   $(HASHTAB_H) $(COVERAGE_H) $(GGC_H) $(TREE_FLOW_H) $(RTL_H) $(IPA_PROP_H) \
+   $(BASIC_BLOCK_H) $(TOPLEV_H) $(TREE_DUMP_H) ipa-inline.h
 cprop.o : cprop.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
    $(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h $(GGC_H) \
    $(RECOG_H) $(EXPR_H) $(BASIC_BLOCK_H) $(FUNCTION_H) output.h toplev.h $(DIAGNOSTIC_CORE_H) \
Index: gcc/passes.c
===================================================================
--- gcc/passes.c	(revision 187371)
+++ gcc/passes.c	(working copy)
@@ -1293,6 +1293,7 @@ init_optimization_passes (void)
   NEXT_PASS (pass_build_cfg);
   NEXT_PASS (pass_warn_function_return);
   NEXT_PASS (pass_build_cgraph_edges);
+  NEXT_PASS (pass_dispatch_versions);
   *p = NULL;
 
   /* Interprocedural optimization passes.  */
Index: gcc/cp/mangle.c
===================================================================
--- gcc/cp/mangle.c	(revision 187371)
+++ gcc/cp/mangle.c	(working copy)
@@ -1245,7 +1245,12 @@ write_unqualified_name (const tree decl)
     {
       MANGLE_TRACE_TREE ("local-source-name", decl);
       write_char ('L');
-      write_source_name (DECL_NAME (decl));
+      if (TREE_CODE (decl) == FUNCTION_DECL
+	  && DECL_FUNCTION_VERSIONED (decl)
+	  && DECL_ASSEMBLER_NAME_SET_P (decl))
+	write_source_name (DECL_ASSEMBLER_NAME (decl));
+      else
+	write_source_name (DECL_NAME (decl));
       /* The default discriminator is 1, and that's all we ever use,
 	 so there's no code to output one here.  */
     }
@@ -1260,7 +1265,14 @@ write_unqualified_name (const tree decl)
                && LAMBDA_TYPE_P (type))
         write_closure_type_name (type);
       else
-        write_source_name (DECL_NAME (decl));
+	{
+	  if (TREE_CODE (decl) == FUNCTION_DECL
+	      && DECL_FUNCTION_VERSIONED (decl)
+	      && DECL_ASSEMBLER_NAME_SET_P (decl))
+	    write_source_name (DECL_ASSEMBLER_NAME (decl));
+	  else
+	    write_source_name (DECL_NAME (decl));
+	}
     }
 }
 
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 187371)
+++ gcc/config/i386/i386.c	(working copy)
@@ -27664,6 +27664,438 @@ ix86_init_mmx_sse_builtins (void)
     }
 }
 
+
+/* This adds a condition to the basic_block NEW_BB in function FUNCTION_DECL
+   to return a pointer to VERSION_DECL if the outcome of the expression
+   formed by PREDICATE_CHAIN is true.  This function will be called during
+   version dispatch to decide which function version to execute.  It returns
+   the basic block at the end to which more conditions can be added.  */
+
+static basic_block
+add_condition_to_bb (tree function_decl, tree version_decl,
+		     tree predicate_chain, basic_block new_bb)
+{
+  gimple return_stmt;
+  tree convert_expr, result_var;
+  gimple convert_stmt;
+  gimple call_cond_stmt;
+  gimple if_else_stmt;
+
+  basic_block bb1, bb2, bb3;
+  edge e12, e23;
+
+  tree cond_var, and_expr_var = NULL_TREE;
+  gimple_seq gseq;
+
+  tree old_current_function_decl;
+  tree predicate_decl, predicate_arg;
+
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (function_decl));
+  current_function_decl = function_decl;
+
+  gcc_assert (new_bb != NULL);
+  gseq = bb_seq (new_bb);
+
+
+  convert_expr = build1 (CONVERT_EXPR, ptr_type_node,
+	     		 build_fold_addr_expr (version_decl));
+  result_var = create_tmp_var (ptr_type_node, NULL);
+  convert_stmt = gimple_build_assign (result_var, convert_expr); 
+  return_stmt = gimple_build_return (result_var);
+
+  if (predicate_chain == NULL_TREE)
+    {
+      gimple_seq_add_stmt (&gseq, convert_stmt);
+      gimple_seq_add_stmt (&gseq, return_stmt);
+      set_bb_seq (new_bb, gseq);
+      gimple_set_bb (convert_stmt, new_bb);
+      gimple_set_bb (return_stmt, new_bb);
+      pop_cfun ();
+      current_function_decl = old_current_function_decl;
+      return new_bb;
+    }
+
+  while (predicate_chain != NULL)
+    {
+      cond_var = create_tmp_var (integer_type_node, NULL);
+      predicate_decl = TREE_PURPOSE (predicate_chain);
+      predicate_arg = TREE_VALUE (predicate_chain);
+      call_cond_stmt = gimple_build_call (predicate_decl, 1, predicate_arg);
+      gimple_call_set_lhs (call_cond_stmt, cond_var);
+
+      gimple_set_block (call_cond_stmt, DECL_INITIAL (function_decl));
+      gimple_set_bb (call_cond_stmt, new_bb);
+      gimple_seq_add_stmt (&gseq, call_cond_stmt);
+
+      predicate_chain = TREE_CHAIN (predicate_chain);
+      
+      if (and_expr_var == NULL)
+        and_expr_var = cond_var;
+      else
+	{
+	  gimple assign_stmt;
+	  /* Use MIN_EXPR to check whether any predicate result is zero:
+	     and_expr_var = min_expr <cond_var, and_expr_var>  */
+	  assign_stmt = gimple_build_assign (and_expr_var,
+			  build2 (MIN_EXPR, integer_type_node,
+				  cond_var, and_expr_var));
+
+	  gimple_set_block (assign_stmt, DECL_INITIAL (function_decl));
+	  gimple_set_bb (assign_stmt, new_bb);
+	  gimple_seq_add_stmt (&gseq, assign_stmt);
+	}
+    }
+
+  if_else_stmt = gimple_build_cond (GT_EXPR, and_expr_var,
+	  		            integer_zero_node,
+				    NULL_TREE, NULL_TREE);
+  gimple_set_block (if_else_stmt, DECL_INITIAL (function_decl));
+  gimple_set_bb (if_else_stmt, new_bb);
+  gimple_seq_add_stmt (&gseq, if_else_stmt);
+
+  gimple_seq_add_stmt (&gseq, convert_stmt);
+  gimple_seq_add_stmt (&gseq, return_stmt);
+  set_bb_seq (new_bb, gseq);
+
+  bb1 = new_bb;
+  e12 = split_block (bb1, if_else_stmt);
+  bb2 = e12->dest;
+  e12->flags &= ~EDGE_FALLTHRU;
+  e12->flags |= EDGE_TRUE_VALUE;
+
+  e23 = split_block (bb2, return_stmt);
+
+  gimple_set_bb (convert_stmt, bb2);
+  gimple_set_bb (return_stmt, bb2);
+
+  bb3 = e23->dest;
+  make_edge (bb1, bb3, EDGE_FALSE_VALUE); 
+
+  remove_edge (e23);
+  make_edge (bb2, EXIT_BLOCK_PTR, 0);
+
+  rebuild_cgraph_edges ();
+
+  pop_cfun ();
+  current_function_decl = old_current_function_decl;
+
+  return bb3;
+}
+
+/* This parses the attribute arguments to target in DECL and builds the
+   chain of builtin predicate calls that matches the platform specification.
+   The dispatch priority of the version is stored in *PRIORITY.  */
+
+static tree 
+get_builtin_code_for_version (tree decl, unsigned int *priority)
+{
+  tree attrs;
+  struct cl_target_option cur_target;
+  tree target_node;
+  struct cl_target_option *new_target;
+  const char *arg_str = NULL;
+  const char *attrs_str = NULL;
+  char *tok_str = NULL;
+  char *token;
+
+  /* Priority of i386 features; a greater value means higher priority.  This
+     is used to decide the order in which function dispatch must happen.  For
+     instance, a version specialized for SSE4.2 should be checked for dispatch
+     before a version for SSE3, as SSE4.2 implies SSE3.  */
+  enum feature_priority
+  {
+    P_ZERO = 0,
+    P_MMX,
+    P_SSE,
+    P_SSE2,
+    P_SSE3,
+    P_SSSE3,
+    P_PROC_SSSE3,
+    P_SSE4_a,
+    P_PROC_SSE4_a,
+    P_SSE4_1,
+    P_SSE4_2,
+    P_PROC_SSE4_2,
+    P_POPCNT,
+    P_AVX,
+    P_AVX2,
+    P_FMA,
+    P_PROC_FMA
+  };
+
+  /* These are the target attribute strings for which a dispatcher is
+     available, from fold_builtin_cpu.  */
+
+  static struct _feature_list
+    {
+      const char *const name;
+      const enum feature_priority priority;
+    }
+  const feature_list[] =
+    {
+      {"mmx", P_MMX},
+      {"sse", P_SSE},
+      {"sse2", P_SSE2},
+      {"sse3", P_SSE3},
+      {"ssse3", P_SSSE3},
+      {"sse4.1", P_SSE4_1},
+      {"sse4.2", P_SSE4_2},
+      {"popcnt", P_POPCNT},
+      {"avx", P_AVX},
+      {"avx2", P_AVX2}
+    };
+
+
+  static unsigned int NUM_FEATURES
+    = sizeof (feature_list) / sizeof (struct _feature_list);
+
+  unsigned int i;
+
+  tree predicate_chain = NULL_TREE;
+  tree predicate_decl, predicate_arg;
+
+  attrs = lookup_attribute ("target", DECL_ATTRIBUTES (decl));
+  gcc_assert (attrs != NULL);
+
+  attrs = TREE_VALUE (TREE_VALUE (attrs));
+
+  gcc_assert (TREE_CODE (attrs) == STRING_CST);
+  attrs_str = TREE_STRING_POINTER (attrs);
+
+  *priority = 0;
+
+  /* Handle arch= if specified.  For priority, set it to be 1 more than
+     the best instruction set the processor can handle.  For instance, if
+     there is a version for atom and a version for ssse3 (the highest ISA
+     priority for atom), the atom version must be checked for dispatch
+     before the ssse3 version.  */
+  if (strstr (attrs_str, "arch=") != NULL)
+    {
+      cl_target_option_save (&cur_target, &global_options);
+      target_node = ix86_valid_target_attribute_tree (attrs);
+    
+      gcc_assert (target_node);
+      new_target = TREE_TARGET_OPTION (target_node);
+      gcc_assert (new_target);
+      
+      if (new_target->arch_specified && new_target->arch > 0)
+	{
+	  switch (new_target->arch)
+	    {
+	    case PROCESSOR_CORE2_32:
+	    case PROCESSOR_CORE2_64:
+	      arg_str = "core2";
+	      *priority = P_PROC_SSSE3;
+	      break;
+	    case PROCESSOR_COREI7_32:
+	    case PROCESSOR_COREI7_64:
+	      arg_str = "corei7";
+	      *priority = P_PROC_SSE4_2;
+	      break;
+	    case PROCESSOR_ATOM:
+	      arg_str = "atom";
+	      *priority = P_PROC_SSSE3;
+	      break;
+	    case PROCESSOR_AMDFAM10:
+	      arg_str = "amdfam10h";
+	      *priority = P_PROC_SSE4_a;
+	      break;
+	    case PROCESSOR_BDVER1:
+	      arg_str = "bdver1";
+	      *priority = P_PROC_FMA;
+	      break;
+	    case PROCESSOR_BDVER2:
+	      arg_str = "bdver2";
+	      *priority = P_PROC_FMA;
+	      break;
+	    }  
+	}    
+    
+      cl_target_option_restore (&global_options, &cur_target);
+      if (arg_str == NULL)
+	{
+	  error_at (DECL_SOURCE_LOCATION (decl),
+		    "no dispatcher found for the versioning attributes");
+	  return NULL;
+	}
+    
+      predicate_decl = ix86_builtins [(int) IX86_BUILTIN_CPU_IS];
+      /* For a C string literal the length includes the trailing NUL.  */
+      predicate_arg = build_string_literal (strlen (arg_str) + 1, arg_str);
+      predicate_chain = tree_cons (predicate_decl, predicate_arg,
+				   predicate_chain);
+    }
+
+  /* Process feature name.  */
+  tok_str = xstrdup (attrs_str);
+  token = strtok (tok_str, ",");
+  predicate_decl = ix86_builtins [(int) IX86_BUILTIN_CPU_SUPPORTS];
+
+  while (token != NULL)
+    {
+      /* Do not process "arch=".  */
+      if (strncmp (token, "arch=", 5) == 0)
+	{
+	  token = strtok (NULL, ",");
+	  continue;
+	}
+      for (i = 0; i < NUM_FEATURES; ++i)
+	{
+	  if (strcmp (token, feature_list[i].name) == 0)
+	    {
+	      predicate_arg = build_string_literal (
+				strlen (feature_list[i].name) + 1,
+				feature_list[i].name);
+	      predicate_chain = tree_cons (predicate_decl, predicate_arg,
+					   predicate_chain);
+	      /* Find the maximum priority feature.  */
+	      if (feature_list[i].priority > *priority)
+		*priority = feature_list[i].priority;
+
+	      break;
+	    }
+	}
+      if (i == NUM_FEATURES)
+	{
+	  error_at (DECL_SOURCE_LOCATION (decl),
+		    "no dispatcher found for %s", token);
+	  free (tok_str);
+	  return NULL;
+	}
+      token = strtok (NULL, ",");
+    }
+  free (tok_str);
+
+  if (predicate_chain == NULL_TREE)
+    {
+      error_at (DECL_SOURCE_LOCATION (decl),
+	        "No dispatcher found for the versioning attributes: %s",
+	        attrs_str);
+      return NULL;
+    }
+
+  predicate_chain = nreverse (predicate_chain);
+  return predicate_chain; 
+}
+ 
+static int
+feature_compare (const void *v1, const void *v2)
+{
+  typedef struct _function_version_info
+    {
+      tree version_decl;
+      tree predicate_chain;
+      unsigned int dispatch_priority;
+    } function_version_info;
+
+  unsigned int p1 = ((const function_version_info *) v1)->dispatch_priority;
+  unsigned int p2 = ((const function_version_info *) v2)->dispatch_priority;
+  return (p2 > p1) - (p2 < p1);
+}
+
+/* This is the target hook to generate the dispatch function for
+   multi-versioned functions.  DISPATCH_DECL is the function which will
+   contain the dispatch logic.  FNDECLS_P is a vector of the function
+   choices for dispatch, default version first.  EMPTY_BB is the basic block
+   pointer in DISPATCH_DECL in which the dispatch code is generated.  */
+
+static int
+ix86_dispatch_version (tree dispatch_decl,
+		       void *fndecls_p,
+		       basic_block *empty_bb)
+{
+  tree default_decl;
+  gimple ifunc_cpu_init_stmt;
+  gimple_seq gseq;
+  tree old_current_function_decl;
+  int ix;
+  tree ele;
+  VEC (tree, heap) *fndecls;
+  unsigned int num_versions = 0;
+  unsigned int actual_versions = 0;
+  unsigned int i;
+
+  struct _function_version_info
+    {
+      tree version_decl;
+      tree predicate_chain;
+      unsigned int dispatch_priority;
+    }*function_version_info;
+
+  gcc_assert (dispatch_decl != NULL
+	      && fndecls_p != NULL
+	      && empty_bb != NULL);
+
+  /* fndecls_p is actually a vector.  */
+  fndecls = (VEC (tree, heap) *)fndecls_p;
+
+  /* At least one more version other than the default.  */
+  num_versions = VEC_length (tree, fndecls);
+  gcc_assert (num_versions >= 2);
+
+  function_version_info = (struct _function_version_info *)
+    xmalloc ((num_versions - 1) * sizeof (struct _function_version_info));
+
+  /* The first version in the vector is the default decl.  */
+  default_decl = VEC_index (tree, fndecls, 0);
+
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (dispatch_decl));
+  current_function_decl = dispatch_decl;
+
+  gseq = bb_seq (*empty_bb);
+  ifunc_cpu_init_stmt = gimple_build_call_vec (
+                     ix86_builtins [(int) IX86_BUILTIN_CPU_INIT], NULL);
+  gimple_seq_add_stmt (&gseq, ifunc_cpu_init_stmt);
+  gimple_set_bb (ifunc_cpu_init_stmt, *empty_bb);
+  set_bb_seq (*empty_bb, gseq);
+
+  pop_cfun ();
+  current_function_decl = old_current_function_decl;
+
+
+  for (ix = 1; VEC_iterate (tree, fndecls, ix, ele); ++ix)
+    {
+      tree version_decl = ele;
+      tree predicate_chain = NULL_TREE;
+      unsigned int priority;
+      /* Get attribute string, parse it and find the right predicate decl.
+         The predicate function could be a lengthy combination of many
+	 features, like arch-type and various isa-variants.  */
+      predicate_chain = get_builtin_code_for_version (version_decl, &priority);
+
+      if (predicate_chain == NULL_TREE)
+	continue;
+
+      function_version_info [actual_versions].version_decl = version_decl;
+      function_version_info [actual_versions].predicate_chain = predicate_chain;
+      function_version_info [actual_versions].dispatch_priority = priority;
+      actual_versions++;
+    }
+
+  /* Sort the versions according to descending order of dispatch priority.  The
+     priority is based on the ISA.  This is not a perfect solution.  There
+     could still be ambiguity.  If more than one function version is suitable
+     to execute, which one should be dispatched?  In future, allow the user
+     to specify a dispatch priority next to the version.  */
+  qsort (function_version_info, actual_versions,
+         sizeof (struct _function_version_info), feature_compare);
+
+  for (i = 0; i < actual_versions; ++i)
+    *empty_bb = add_condition_to_bb (dispatch_decl,
+				     function_version_info[i].version_decl,
+				     function_version_info[i].predicate_chain,
+				     *empty_bb);
+
+  /* Dispatch the default version at the end.  */
+  *empty_bb = add_condition_to_bb (dispatch_decl, default_decl,
+				   NULL, *empty_bb);
+
+  free (function_version_info);
+  return 0;
+}
+
 /* This builds the processor_model struct type defined in
    libgcc/config/i386/cpuinfo.c  */
 
@@ -39539,6 +39971,9 @@ ix86_memmodel_check (unsigned HOST_WIDE_INT val)
 #undef TARGET_FOLD_BUILTIN
 #define TARGET_FOLD_BUILTIN ix86_fold_builtin
 
+#undef TARGET_DISPATCH_VERSION
+#define TARGET_DISPATCH_VERSION ix86_dispatch_version
+
 #undef TARGET_ENUM_VA_LIST_P
 #define TARGET_ENUM_VA_LIST_P ix86_enum_va_list
 
Index: gcc/cp/error.c
===================================================================
--- gcc/cp/error.c	(revision 187371)
+++ gcc/cp/error.c	(working copy)
@@ -1534,8 +1534,15 @@ dump_exception_spec (tree t, int flags)
 static void
 dump_function_name (tree t, int flags)
 {
-  tree name = DECL_NAME (t);
+  tree name;
 
+  /* For function versions, use the assembler name as the decl name is
+     the same for all versions.  */
+  if (DECL_FUNCTION_VERSIONED (t))
+    name = DECL_ASSEMBLER_NAME (t);
+  else
+    name = DECL_NAME (t);
+
   /* We can get here with a decl that was synthesized by language-
      independent machinery (e.g. coverage.c) in which case it won't
      have a lang_specific structure attached and DECL_CONSTRUCTOR_P

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-05-12  2:04                                       ` Sriraman Tallam
@ 2012-05-12 13:38                                         ` H.J. Lu
  2012-05-14 18:29                                           ` Sriraman Tallam
  0 siblings, 1 reply; 93+ messages in thread
From: H.J. Lu @ 2012-05-12 13:38 UTC (permalink / raw)
  To: Sriraman Tallam
  Cc: Richard Guenther, Jan Hubicka, Uros Bizjak, reply, gcc-patches, David Li

On Fri, May 11, 2012 at 7:04 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> Hi H.J.,
>
>   I have updated the patch to improve the dispatching method like we
> discussed. Each feature gets a priority now, and the dispatching is
> done in priority order. Please see i386.c for the changes.
>
> Patch also available for review here:  http://codereview.appspot.com/5752064
>

I think you need 3 tests:

1.  Only with ISA.
2.  Only with arch
3.  Mixed with ISA and arch

since a test that mixes ISA and arch may hide issues with ISA only or arch only.
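
To make the three categories concrete, here is a small standalone C sketch
(not part of the patch) that classifies a target attribute string the way
the dispatcher's tokenizer does: tokens starting with "arch=" count as arch,
everything else counts as an ISA feature.  The enum and the function name
classify_target_string are made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical labels for the three test categories.  */
enum attr_kind { ATTR_NONE, ATTR_ISA_ONLY, ATTR_ARCH_ONLY, ATTR_MIXED };

/* Classify a comma-separated target string such as "arch=corei7,popcnt".
   Tokens starting with "arch=" count as arch; all others count as ISA
   features, mirroring the strncmp (token, "arch=", 5) test in the patch.  */
static enum attr_kind
classify_target_string (const char *attrs)
{
  int has_arch = 0, has_isa = 0;
  size_t len;
  char *copy, *token;

  assert (attrs != NULL);
  len = strlen (attrs);

  /* strtok modifies its argument, so work on a private copy.  */
  copy = (char *) malloc (len + 1);
  if (copy == NULL)
    return ATTR_NONE;
  memcpy (copy, attrs, len + 1);

  for (token = strtok (copy, ","); token != NULL; token = strtok (NULL, ","))
    {
      if (strncmp (token, "arch=", 5) == 0)
        has_arch = 1;
      else
        has_isa = 1;
    }
  free (copy);

  if (has_arch && has_isa)
    return ATTR_MIXED;
  if (has_arch)
    return ATTR_ARCH_ONLY;
  if (has_isa)
    return ATTR_ISA_ONLY;
  return ATTR_NONE;
}
```

With this, "sse4.2,popcnt" falls in category 1, "arch=corei7" in category 2,
and "arch=core2,ssse3" in category 3.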

-- 
H.J.

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-05-12 13:38                                         ` H.J. Lu
@ 2012-05-14 18:29                                           ` Sriraman Tallam
  2012-05-26  0:07                                             ` H.J. Lu
  2012-06-04 19:01                                             ` Sriraman Tallam
  0 siblings, 2 replies; 93+ messages in thread
From: Sriraman Tallam @ 2012-05-14 18:29 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Richard Guenther, Jan Hubicka, Uros Bizjak, reply, gcc-patches, David Li

[-- Attachment #1: Type: text/plain, Size: 996 bytes --]

Hi H.J,

   Attaching a new patch with 2 test cases: mv2.C checks ISAs only and
mv1.C checks ISAs and arches mixed. Right now, checking only arches is
not needed: they are mutually exclusive, so any order should be fine.
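
For reference, priority-ordered dispatch can be sketched in plain C as
below.  This is only an illustration under assumptions, not the code in
i386.c: struct version_info, select_version and the feature bitmask are
hypothetical stand-ins for the version decls and the
IX86_BUILTIN_CPU_SUPPORTS predicates.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for one function version: the feature bits it
   requires and its dispatch priority (higher is tried first).  */
struct version_info
{
  const char *name;
  unsigned int required_features;
  unsigned int priority;
};

/* Order by descending priority.  Compare rather than subtract so the
   result is correct for any unsigned values.  */
static int
priority_compare (const void *v1, const void *v2)
{
  const struct version_info *c1 = (const struct version_info *) v1;
  const struct version_info *c2 = (const struct version_info *) v2;
  return (c2->priority > c1->priority) - (c2->priority < c1->priority);
}

/* Return the first version, in priority order, whose required features
   are all present in CPU_FEATURES.  NULL means no specialized version
   matched, so the caller falls back to the default version.  */
static const struct version_info *
select_version (struct version_info *versions, size_t n,
                unsigned int cpu_features)
{
  size_t i;

  assert (versions != NULL);
  qsort (versions, n, sizeof (versions[0]), priority_compare);

  for (i = 0; i < n; i++)
    if ((versions[i].required_features & cpu_features)
        == versions[i].required_features)
      return &versions[i];

  return NULL;
}
```

Two versions keyed on different arches can never both match on one CPU, so
their relative order does not change the result; ordering only matters when
several ISA predicates can hold at once.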

Patch also available for review here:  http://codereview.appspot.com/5752064

Thanks,
-Sri.

On Sat, May 12, 2012 at 6:37 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, May 11, 2012 at 7:04 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>> Hi H.J.,
>>
>>   I have updated the patch to improve the dispatching method like we
>> discussed. Each feature gets a priority now, and the dispatching is
>> done in priority order. Please see i386.c for the changes.
>>
>> Patch also available for review here:  http://codereview.appspot.com/5752064
>>
>
> I think you need 3 tests:
>
> 1.  Only with ISA.
> 2.  Only with arch
> 3.  Mixed with ISA and arch
>
> since a test that mixes ISA and arch may hide issues with ISA only or arch only.
>
> --
> H.J.

[-- Attachment #2: mv_fe_patch.txt --]
[-- Type: text/plain, Size: 69812 bytes --]

Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi	(revision 187371)
+++ gcc/doc/tm.texi	(working copy)
@@ -10997,6 +10997,14 @@ The result is another tree containing a simplified
 call's result.  If @var{ignore} is true the value will be ignored.
 @end deftypefn
 
+@deftypefn {Target Hook} int TARGET_DISPATCH_VERSION (tree @var{dispatch_decl}, void *@var{fndecls}, basic_block *@var{empty_bb})
+For a multi-versioned function, this hook sets up the dispatcher.
+@var{dispatch_decl} is the function that will be used to dispatch the
+version.  @var{fndecls} are the function choices for dispatch.
+@var{empty_bb} is a basic block in @var{dispatch_decl} where the
+code to do the dispatch will be added.
+@end deftypefn
+
 @deftypefn {Target Hook} {const char *} TARGET_INVALID_WITHIN_DOLOOP (const_rtx @var{insn})
 
 Take an instruction in @var{insn} and return NULL if it is valid within a
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in	(revision 187371)
+++ gcc/doc/tm.texi.in	(working copy)
@@ -10877,6 +10877,14 @@ The result is another tree containing a simplified
 call's result.  If @var{ignore} is true the value will be ignored.
 @end deftypefn
 
+@hook TARGET_DISPATCH_VERSION
+For a multi-versioned function, this hook sets up the dispatcher.
+@var{dispatch_decl} is the function that will be used to dispatch the
+version.  @var{fndecls} are the function choices for dispatch.
+@var{empty_bb} is a basic block in @var{dispatch_decl} where the
+code to do the dispatch will be added.
+@end deftypefn
+
 @hook TARGET_INVALID_WITHIN_DOLOOP
 
 Take an instruction in @var{insn} and return NULL if it is valid within a
Index: gcc/target.def
===================================================================
--- gcc/target.def	(revision 187371)
+++ gcc/target.def	(working copy)
@@ -1249,6 +1249,15 @@ DEFHOOK
  tree, (tree fndecl, int n_args, tree *argp, bool ignore),
  hook_tree_tree_int_treep_bool_null)
 
+/* Target hook to generate the dispatching code for calls to multi-versioned
+   functions.  DISPATCH_DECL is the function that will have the dispatching
+   logic.  FNDECLS is the list of choices for dispatch and EMPTY_BB is the
+   basic block in DISPATCH_DECL which will contain the code.  */
+DEFHOOK
+(dispatch_version,
+ "",
+ int, (tree dispatch_decl, void *fndecls, basic_block *empty_bb), NULL)
+
 /* Returns a code for a target-specific builtin that implements
    reciprocal of the function, or NULL_TREE if not available.  */
 DEFHOOK
Index: gcc/tree.h
===================================================================
--- gcc/tree.h	(revision 187371)
+++ gcc/tree.h	(working copy)
@@ -3528,6 +3528,12 @@ extern VEC(tree, gc) **decl_debug_args_insert (tre
 #define DECL_FUNCTION_SPECIFIC_OPTIMIZATION(NODE) \
    (FUNCTION_DECL_CHECK (NODE)->function_decl.function_specific_optimization)
 
+/* In FUNCTION_DECL, this is set if this function has other versions generated
+   using "target" attributes.  The default version is the one which does not
+   have any "target" attribute set. */
+#define DECL_FUNCTION_VERSIONED(NODE)\
+   (FUNCTION_DECL_CHECK (NODE)->function_decl.versioned_function)
+
 /* FUNCTION_DECL inherits from DECL_NON_COMMON because of the use of the
    arguments/result/saved_tree fields by front ends.   It was either inherit
    FUNCTION_DECL from non_common, or inherit non_common from FUNCTION_DECL,
@@ -3572,8 +3578,8 @@ struct GTY(()) tree_function_decl {
   unsigned looping_const_or_pure_flag : 1;
   unsigned has_debug_args_flag : 1;
   unsigned tm_clone_flag : 1;
-
-  /* 1 bit left */
+  unsigned versioned_function : 1;
+  /* No bits left.  */
 };
 
 /* The source language of the translation-unit.  */
Index: gcc/tree-pass.h
===================================================================
--- gcc/tree-pass.h	(revision 187371)
+++ gcc/tree-pass.h	(working copy)
@@ -453,6 +453,7 @@ extern struct gimple_opt_pass pass_tm_memopt;
 extern struct gimple_opt_pass pass_tm_edges;
 extern struct gimple_opt_pass pass_split_functions;
 extern struct gimple_opt_pass pass_feedback_split_functions;
+extern struct gimple_opt_pass pass_dispatch_versions;
 
 /* IPA Passes */
 extern struct simple_ipa_opt_pass pass_ipa_lower_emutls;
Index: gcc/multiversion.c
===================================================================
--- gcc/multiversion.c	(revision 0)
+++ gcc/multiversion.c	(revision 0)
@@ -0,0 +1,833 @@
+/* Function Multiversioning.
+   Copyright (C) 2012 Free Software Foundation, Inc.
+   Contributed by Sriraman Tallam (tmsriram@google.com)
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>. */
+
+/* This file holds the state for multi-versioned functions.  The front-end
+   updates the state as and when function versions are encountered.
+   This is then used to generate the dispatch code.  Also, the
+   optimization passes to clone hot paths involving versioned functions
+   will be done here.
+
+   Function versions are created by using the same function signature but
+   also tagging attribute "target" to specify the platform type for which
+   the version must be executed.  Here is an example:
+
+   int foo ()
+   {
+     printf ("Execute as default");
+     return 0;
+   }
+
+   int  __attribute__ ((target ("arch=corei7")))
+   foo ()
+   {
+     printf ("Execute for corei7");
+     return 0;
+   }
+   
+   int main ()
+   {
+     return foo ();
+   } 
+
+   The call to foo in main is replaced with a call to an IFUNC function that
+   contains the dispatch code to call the correct function version at
+   run-time.  */
+
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "tree.h"
+#include "tree-inline.h"
+#include "langhooks.h"
+#include "flags.h"
+#include "cgraph.h"
+#include "diagnostic.h"
+#include "toplev.h"
+#include "timevar.h"
+#include "params.h"
+#include "fibheap.h"
+#include "intl.h"
+#include "tree-pass.h"
+#include "hashtab.h"
+#include "coverage.h"
+#include "ggc.h"
+#include "tree-flow.h"
+#include "rtl.h"
+#include "ipa-prop.h"
+#include "basic-block.h"
+#include "toplev.h"
+#include "dbgcnt.h"
+#include "tree-dump.h"
+#include "output.h"
+#include "vecprim.h"
+#include "gimple-pretty-print.h"
+#include "ipa-inline.h"
+#include "target.h"
+#include "multiversion.h"
+
+typedef void * void_p;
+
+DEF_VEC_P (void_p);
+DEF_VEC_ALLOC_P (void_p, heap);
+
+/* Each function decl that is a function version gets an instance of this
+   structure.   Since this is called by the front-end, decl merging can
+   happen, where a decl created for a new declaration is merged with 
+   the old. In this case, the new decl is deleted and the IS_DELETED
+   field is set for the struct instance corresponding to the new decl.
+   IFUNC_DECL is the decl of the ifunc function for default decls.
+   IFUNC_RESOLVER_DECL is the decl of the dispatch function.  VERSIONS
+   is a vector containing the list of function versions that are
+   the candidates for dispatch.  */
+
+typedef struct version_function_d {
+  tree decl;
+  tree ifunc_decl;
+  tree ifunc_resolver_decl;
+  VEC (void_p, heap) *versions;
+  bool is_deleted;
+} version_function;
+
+/* Hashmap has an entry for every function decl that has other function
+   versions.  For function decls that are the default, it also stores the
+   list of all the other function versions.  Each entry is a structure
+   of type version_function_d.  */
+static htab_t decl_version_htab = NULL;
+
+/* Hashtable helpers for decl_version_htab. */
+
+static hashval_t
+decl_version_htab_hash_descriptor (const void *p)
+{
+  const version_function *t = (const version_function *) p;
+  return htab_hash_pointer (t->decl);
+}
+
+/* Hashtable helper for decl_version_htab. */
+
+static int
+decl_version_htab_eq_descriptor (const void *p1, const void *p2)
+{
+  const version_function *t1 = (const version_function *) p1;
+  return htab_eq_pointer ((const void_p) t1->decl, p2);
+}
+
+/* Create the decl_version_htab.  */
+static void
+create_decl_version_htab (void)
+{
+  if (decl_version_htab == NULL)
+    decl_version_htab = htab_create (10, decl_version_htab_hash_descriptor,
+				     decl_version_htab_eq_descriptor, NULL);
+}
+
+/* Creates an instance of version_function for decl DECL.  */
+
+static version_function*
+new_version_function (const tree decl)
+{
+  version_function *v;
+  v = (version_function *) xmalloc (sizeof (version_function));
+  v->decl = decl;
+  v->ifunc_decl = NULL;
+  v->ifunc_resolver_decl = NULL;
+  v->versions = NULL;
+  v->is_deleted = false;
+  return v;
+}
+
+/* Comparator function to be used in qsort routine to sort attribute
+   specification strings to "target".  */
+
+static int
+attr_strcmp (const void *v1, const void *v2)
+{
+  const char *c1 = *(char *const*)v1;
+  const char *c2 = *(char *const*)v2;
+  return strcmp (c1, c2);
+}
+
+/* STR is the argument to target attribute.  This function tokenizes
+   the comma separated arguments, sorts them and returns a string which
+   is a unique identifier for the comma separated arguments.  */
+
+static char *
+sorted_attr_string (const char *str)
+{
+  char **args = NULL;
+  char *attr_str, *ret_str;
+  char *attr = NULL;
+  unsigned int argnum = 1;
+  unsigned int i;
+
+  for (i = 0; i < strlen (str); i++)
+    if (str[i] == ',')
+      argnum++;
+
+  attr_str = (char *)xmalloc (strlen (str) + 1);
+  strcpy (attr_str, str);
+
+  for (i = 0; i < strlen (attr_str); i++)
+    if (attr_str[i] == '=')
+      attr_str[i] = '_';
+
+  if (argnum == 1)
+    return attr_str;
+
+  args = (char **)xmalloc (argnum * sizeof (char *));
+
+  i = 0;
+  attr = strtok (attr_str, ",");
+  while (attr != NULL)
+    {
+      args[i] = attr;
+      i++;
+      attr = strtok (NULL, ",");
+    }
+
+  qsort (args, argnum, sizeof (char*), attr_strcmp);
+
+  ret_str = (char *)xmalloc (strlen (str) + 1);
+  strcpy (ret_str, args[0]);
+  for (i = 1; i < argnum; i++)
+    {
+      strcat (ret_str, "_");
+      strcat (ret_str, args[i]);
+    }
+
+  free (args);
+  free (attr_str);
+  return ret_str;
+}
+
+/* Returns true when only one of DECL1 and DECL2 is marked with "target"
+   or if the "target" attribute strings of DECL1 and DECL2 do not match.  */
+
+bool
+has_different_version_attributes (const tree decl1, const tree decl2)
+{
+  tree attr1, attr2;
+  char *c1, *c2;
+  bool ret = false;
+
+  if (TREE_CODE (decl1) != FUNCTION_DECL
+      || TREE_CODE (decl2) != FUNCTION_DECL)
+    return false;
+
+  attr1 = lookup_attribute ("target", DECL_ATTRIBUTES (decl1));
+  attr2 = lookup_attribute ("target", DECL_ATTRIBUTES (decl2));
+
+  if (attr1 == NULL_TREE && attr2 == NULL_TREE)
+    return false;
+
+  if ((attr1 == NULL_TREE && attr2 != NULL_TREE)
+      || (attr1 != NULL_TREE && attr2 == NULL_TREE))
+    return true;
+
+  c1 = sorted_attr_string (
+	TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr1))));
+  c2 = sorted_attr_string (
+	TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr2))));
+
+  if (strcmp (c1, c2) != 0)
+     ret = true;
+
+  free (c1);
+  free (c2);
+
+  return ret;
+}
+
+/* If this decl corresponds to a function and has "target" attribute,
+   append the attribute string to its assembler name.  */
+
+static void
+version_assembler_name (const tree decl)
+{
+  tree version_attr;
+  const char *orig_name, *version_string, *attr_str;
+  char *assembler_name;
+  tree assembler_name_tree;
+  
+  if (TREE_CODE (decl) != FUNCTION_DECL)
+    return;
+
+  if (DECL_DECLARED_INLINE_P (decl)
+      && lookup_attribute ("gnu_inline",
+			  DECL_ATTRIBUTES (decl)))
+    error_at (DECL_SOURCE_LOCATION (decl),
+	      "Function versions cannot be marked as gnu_inline,"
+	      " bodies have to be generated");
+
+  if (DECL_VIRTUAL_P (decl)
+      || DECL_VINDEX (decl))
+    error_at (DECL_SOURCE_LOCATION (decl),
+	      "Virtual function versioning not supported");
+
+  version_attr = lookup_attribute ("target", DECL_ATTRIBUTES (decl));
+  /* target attribute string is NULL for default functions.  */
+  if (version_attr == NULL_TREE)
+    return;
+
+  orig_name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
+  version_string
+    = TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (version_attr)));
+
+  attr_str = sorted_attr_string (version_string);
+  assembler_name = (char *) xmalloc (strlen (orig_name)
+				     + strlen (attr_str) + 2);
+
+  sprintf (assembler_name, "%s.%s", orig_name, attr_str);
+  if (dump_file)
+    fprintf (dump_file, "Assembler name set to %s for function version %s\n",
+	     assembler_name, IDENTIFIER_POINTER (DECL_NAME (decl)));
+
+  assembler_name_tree = get_identifier (assembler_name);
+
+  SET_DECL_ASSEMBLER_NAME (decl, assembler_name_tree);
+  SET_DECL_RTL (decl, NULL);
+}
+
+void
+mark_function_as_version (const tree decl)
+{
+  if (DECL_FUNCTION_VERSIONED (decl))
+    return;
+  DECL_FUNCTION_VERSIONED (decl) = 1;
+  version_assembler_name (decl);
+}
+
+/* Returns true if function DECL has target attribute set.  This could be
+   a version.  */
+
+bool
+is_target_attribute_set (const tree decl)
+{
+  return (TREE_CODE (decl) == FUNCTION_DECL
+	  && (lookup_attribute ("target", DECL_ATTRIBUTES (decl))
+	      != NULL_TREE));
+}
+
+/* Returns true if DECL is multi-versioned and is the default function,
+   that is, it is not tagged with the "target" attribute.  */
+
+bool
+is_default_function (const tree decl)
+{
+  return (TREE_CODE (decl) == FUNCTION_DECL
+	  && DECL_FUNCTION_VERSIONED (decl)
+	  && (lookup_attribute ("target", DECL_ATTRIBUTES (decl))
+	      == NULL_TREE));	
+}
+
+/* For function decl DECL, find the version_function struct in the
+   decl_version_htab.  */
+
+static version_function *
+find_function_version (const tree decl)
+{
+  void *slot;
+
+  if (!DECL_FUNCTION_VERSIONED (decl))
+    return NULL;
+
+  if (!decl_version_htab)
+    return NULL;
+
+  slot = htab_find_with_hash (decl_version_htab, decl,
+                              htab_hash_pointer (decl));
+
+  if (slot != NULL)
+    return (version_function *)slot;
+
+  return NULL;
+}
+
+/* Record DECL as a function version by creating a version_function struct
+   for it and storing it in the hashtable.  */
+
+static version_function *
+add_function_version (const tree decl)
+{
+  void **slot;
+  version_function *v;
+
+  if (!DECL_FUNCTION_VERSIONED (decl))
+    return NULL;
+
+  create_decl_version_htab ();
+
+  slot = htab_find_slot_with_hash (decl_version_htab, (const void_p)decl,
+                                   htab_hash_pointer ((const void_p)decl),
+				   INSERT);
+
+  if (*slot != NULL)
+    return (version_function *)*slot;
+
+  v = new_version_function (decl);
+  *slot = v;
+
+  return v;
+}
+
+/* Push V into VEC only if it is not already present.  If already present
+   returns false.  */
+
+static bool
+push_function_version (version_function *v, VEC (void_p, heap) **vec)
+{
+  int ix;
+  void_p ele; 
+  for (ix = 0; VEC_iterate (void_p, *vec, ix, ele); ++ix)
+    {
+      if (ele == (void_p)v)
+        return false;
+    }
+
+  VEC_safe_push (void_p, heap, *vec, (void*)v);
+  return true;
+}
+ 
+/* Mark DECL as deleted.  This is called by the front-end when a duplicate
+   decl is merged with the original decl and the duplicate decl is deleted.
+   This function marks the duplicate_decl as invalid.  Called by
+   duplicate_decls in cp/decl.c.  */
+
+void
+mark_delete_decl_version (const tree decl)
+{
+  version_function *decl_v;
+
+  decl_v = find_function_version (decl);
+
+  if (decl_v == NULL)
+    return;
+
+  decl_v->is_deleted = true;
+
+  if (is_default_function (decl)
+      && decl_v->versions != NULL)
+    {
+      VEC_truncate (void_p, decl_v->versions, 0);
+      VEC_free (void_p, heap, decl_v->versions);
+      decl_v->versions = NULL;
+    }
+}
+
+/* Mark DECL1 and DECL2 to be function versions in the same group.  One
+   of DECL1 and DECL2 must be the default, otherwise this function does
+   nothing.  This function aggregates the versions.  */
+
+int
+group_function_versions (const tree decl1, const tree decl2)
+{
+  tree default_decl, version_decl;
+  version_function *default_v, *version_v;
+
+  gcc_assert (DECL_FUNCTION_VERSIONED (decl1)
+	      && DECL_FUNCTION_VERSIONED (decl2));
+
+  /* The version decls are added only to the default decl.  */
+  if (!is_default_function (decl1)
+      && !is_default_function (decl2))
+    return 0;
+
+  /* This can happen with duplicate declarations.  Just ignore.  */
+  if (is_default_function (decl1)
+      && is_default_function (decl2))
+    return 0;
+
+  default_decl = (is_default_function (decl1)) ? decl1 : decl2;
+  version_decl = (default_decl == decl1) ? decl2 : decl1;
+
+  gcc_assert (default_decl != version_decl);
+  create_decl_version_htab ();
+
+  /* If the version function is found, it has been added.  */
+  if (find_function_version (version_decl))
+    return 0;
+
+  default_v = add_function_version (default_decl);
+  version_v = add_function_version (version_decl);
+
+  if (default_v->versions == NULL)
+    default_v->versions = VEC_alloc (void_p, heap, 1);
+
+  push_function_version (version_v, &default_v->versions);
+  return 0;
+}
+
+/* Makes a function attribute of the form NAME(ARG_NAME) and chains
+   it to CHAIN.  */
+
+static tree
+make_attribute (const char *name, const char *arg_name, tree chain)
+{
+  tree attr_name;
+  tree attr_arg_name;
+  tree attr_args;
+  tree attr;
+
+  attr_name = get_identifier (name);
+  attr_arg_name = build_string (strlen (arg_name), arg_name);
+  attr_args = tree_cons (NULL_TREE, attr_arg_name, NULL_TREE);
+  attr = tree_cons (attr_name, attr_args, chain);
+  return attr;
+}
+
+/* Return a new name by appending SUFFIX to the DECL name.  If
+   MAKE_UNIQUE is true, also append a file-specific unique name.  */
+
+static char *
+make_name (tree decl, const char *suffix, bool make_unique)
+{
+  char *global_var_name;
+  int name_len;
+  const char *name;
+  const char *unique_name = NULL;
+
+  name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
+
+  /* Get a unique name that can be used globally without any chances
+     of collision at link time.  */
+  if (make_unique)
+    unique_name = IDENTIFIER_POINTER (get_file_function_name ("\0"));
+
+  name_len = strlen (name) + strlen (suffix) + 2;
+
+  if (make_unique)
+    name_len += strlen (unique_name) + 1;
+  global_var_name = (char *) xmalloc (name_len);
+
+  /* Use '.' to concatenate names as it is demangler friendly.  */
+  if (make_unique)
+      snprintf (global_var_name, name_len, "%s.%s.%s", name,
+		unique_name, suffix);
+  else
+      snprintf (global_var_name, name_len, "%s.%s", name, suffix);
+
+  return global_var_name;
+}
+
+/* Make the resolver function decl for ifunc (IFUNC_DECL) to dispatch
+   the versions of multi-versioned function DEFAULT_DECL.  Create an
+   empty basic block in the resolver and store the pointer in
+   EMPTY_BB.  Return the decl of the resolver function.  */
+
+static tree
+make_ifunc_resolver_func (const tree default_decl,
+			  const tree ifunc_decl,
+			  basic_block *empty_bb)
+{
+  char *resolver_name;
+  tree decl, type, decl_name, t;
+  basic_block new_bb;
+  tree old_current_function_decl;
+  bool make_unique = false;
+
+  /* IFUNCs have to be globally visible.  So, if the default_decl is
+     not, then the name of the IFUNC should be made unique.  */
+  if (TREE_PUBLIC (default_decl) == 0)
+    make_unique = true;
+
+  /* Append the filename to the resolver function if the versions are
+     not externally visible.  This is because the resolver function has
+     to be externally visible for the loader to find it.  So, appending
+     the filename will prevent conflicts with a resolver function from
+     another module which is based on the same version name.  */
+  resolver_name = make_name (default_decl, "resolver", make_unique);
+
+  /* The resolver function should return a (void *). */
+  type = build_function_type_list (ptr_type_node, NULL_TREE);
+
+  decl = build_fn_decl (resolver_name, type);
+  decl_name = get_identifier (resolver_name);
+  SET_DECL_ASSEMBLER_NAME (decl, decl_name);
+
+  DECL_NAME (decl) = decl_name;
+  TREE_USED (decl) = TREE_USED (default_decl);
+  DECL_ARTIFICIAL (decl) = 1;
+  DECL_IGNORED_P (decl) = 0;
+  /* IFUNC resolvers have to be externally visible.  */
+  TREE_PUBLIC (decl) = 1;
+  DECL_UNINLINABLE (decl) = 1;
+
+  DECL_EXTERNAL (decl) = DECL_EXTERNAL (default_decl);
+  DECL_EXTERNAL (ifunc_decl) = 0;
+
+  DECL_CONTEXT (decl) = NULL_TREE;
+  DECL_INITIAL (decl) = make_node (BLOCK);
+  DECL_STATIC_CONSTRUCTOR (decl) = 0;
+  TREE_READONLY (decl) = 0;
+  DECL_PURE_P (decl) = 0;
+  DECL_COMDAT (decl) = DECL_COMDAT (default_decl);
+  if (DECL_COMDAT_GROUP (default_decl))
+    {
+      make_decl_one_only (decl, DECL_COMDAT_GROUP (default_decl));
+    }
+  /* Build result decl and add to function_decl. */
+  t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, ptr_type_node);
+  DECL_ARTIFICIAL (t) = 1;
+  DECL_IGNORED_P (t) = 1;
+  DECL_RESULT (decl) = t;
+
+  gimplify_function_tree (decl);
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (decl));
+  current_function_decl = decl;
+  init_empty_tree_cfg_for_function (DECL_STRUCT_FUNCTION (decl));
+  cfun->curr_properties |=
+    (PROP_gimple_lcf | PROP_gimple_leh | PROP_cfg | PROP_referenced_vars |
+     PROP_ssa);
+  new_bb = create_empty_bb (ENTRY_BLOCK_PTR);
+  make_edge (ENTRY_BLOCK_PTR, new_bb, EDGE_FALLTHRU);
+  make_edge (new_bb, EXIT_BLOCK_PTR, 0);
+  *empty_bb = new_bb;
+
+  cgraph_add_new_function (decl, true);
+  cgraph_call_function_insertion_hooks (cgraph_get_create_node (decl));
+  cgraph_mark_force_output_node (cgraph_get_create_node (decl));
+
+  if (DECL_COMDAT_GROUP (default_decl))
+    {
+      gcc_assert (cgraph_get_node (default_decl));
+      symtab_add_to_same_comdat_group (
+	(symtab_node) cgraph_get_node (decl),
+	(symtab_node) cgraph_get_node (default_decl));
+    }
+
+  pop_cfun ();
+  current_function_decl = old_current_function_decl;
+
+  gcc_assert (ifunc_decl != NULL);
+  /* Mark ifunc_decl as "ifunc" with resolver as resolver_name.  */
+  DECL_ATTRIBUTES (ifunc_decl) 
+    = make_attribute ("ifunc", resolver_name, DECL_ATTRIBUTES (ifunc_decl));
+
+  /* Create the alias here.  */
+  cgraph_create_function_alias (ifunc_decl, decl);
+  return decl;
+}
+
+/* Make an ifunc declaration for the multi-versioned function DECL.  Calls to
+   DECL function will be replaced with calls to the ifunc.  Return the decl
+   of the ifunc created.  */
+
+static tree
+make_ifunc_func (const tree decl)
+{
+  tree ifunc_decl;
+  char *ifunc_name, *resolver_name;
+  tree fn_type, ifunc_type;
+  bool make_unique = false;
+
+  if (TREE_PUBLIC (decl) == 0)
+    make_unique = true;
+
+  ifunc_name = make_name (decl, "ifunc", make_unique);
+  resolver_name = make_name (decl, "resolver", make_unique);
+  gcc_assert (resolver_name);
+
+  fn_type = TREE_TYPE (decl);
+  ifunc_type = build_function_type (TREE_TYPE (fn_type),
+				    TYPE_ARG_TYPES (fn_type));
+  
+  ifunc_decl = build_fn_decl (ifunc_name, ifunc_type);
+  TREE_USED (ifunc_decl) = 1;
+  DECL_CONTEXT (ifunc_decl) = NULL_TREE;
+  DECL_INITIAL (ifunc_decl) = error_mark_node;
+  DECL_ARTIFICIAL (ifunc_decl) = 1;
+  /* Mark this ifunc as external; the resolver will flip it again if
+     it gets generated.  */
+  DECL_EXTERNAL (ifunc_decl) = 1;
+  /* IFUNCs have to be externally visible.  */
+  TREE_PUBLIC (ifunc_decl) = 1;
+
+  return ifunc_decl;  
+}
+
+/* For multi-versioned function DECL, which must also be the default
+   version, return the decl of its ifunc, creating it if it does not
+   exist.  */
+
+tree
+get_ifunc_for_version (const tree decl)
+{
+  version_function *decl_v;
+  int ix;
+  void_p ele;
+
+  /* DECL has to be the default version, otherwise it is missing and
+     that is not allowed.  */
+  if (!is_default_function (decl))
+    {
+      error_at (DECL_SOURCE_LOCATION (decl), "Default version not found");
+      return decl;
+    }
+
+  decl_v = find_function_version (decl);
+  gcc_assert (decl_v != NULL);
+  if (decl_v->ifunc_decl == NULL)
+    {
+      tree ifunc_decl;
+      ifunc_decl = make_ifunc_func (decl);
+      decl_v->ifunc_decl = ifunc_decl;
+    }
+
+  if (cgraph_get_node (decl))
+    cgraph_mark_force_output_node (cgraph_get_node (decl));
+
+  for (ix = 0; VEC_iterate (void_p, decl_v->versions, ix, ele); ++ix)
+    {
+      version_function *v = (version_function *) ele;
+      /* This could be a deleted version.  Happens with
+	 duplicate declarations. */
+      if (v->is_deleted)
+	continue;
+      gcc_assert (v->decl != NULL);
+      if (cgraph_get_node (v->decl))
+	cgraph_mark_force_output_node (cgraph_get_node (v->decl));
+    }
+
+  return decl_v->ifunc_decl;
+}
+
+/* Generate the dispatching code to dispatch multi-versioned function
+   DECL.  Make a new function decl for dispatching and call the target
+   hook to process the "target" attributes and provide the code to
+   dispatch the right function at run-time.  */
+
+static tree
+make_ifunc_resolver_for_version (const tree decl)
+{
+  version_function *decl_v;
+  tree ifunc_resolver_decl, ifunc_decl;
+  basic_block empty_bb;
+  int ix;
+  void_p ele;
+  VEC (tree, heap) *fn_ver_vec = NULL;
+  tree old_current_function_decl;
+
+  gcc_assert (is_default_function (decl));
+
+  decl_v = find_function_version (decl);
+  gcc_assert (decl_v != NULL);
+
+  if (decl_v->ifunc_resolver_decl != NULL)
+    return decl_v->ifunc_resolver_decl;
+
+  ifunc_decl = decl_v->ifunc_decl;
+
+  if (ifunc_decl == NULL)
+    ifunc_decl = decl_v->ifunc_decl = make_ifunc_func (decl);
+
+  ifunc_resolver_decl = make_ifunc_resolver_func (decl, ifunc_decl,
+						  &empty_bb);
+
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (ifunc_resolver_decl));
+  current_function_decl = ifunc_resolver_decl;
+
+  fn_ver_vec = VEC_alloc (tree, heap, 2);
+  VEC_safe_push (tree, heap, fn_ver_vec, decl);
+
+  for (ix = 0; VEC_iterate (void_p, decl_v->versions, ix, ele); ++ix)
+    {
+      version_function *v = (version_function *) ele;
+      gcc_assert (v->decl != NULL);
+      /* Check for virtual functions here again, as by this time it should
+	 have been determined if this function needs a vtable index or
+	 not.  This happens for methods in derived classes that override
+	 virtual methods in base classes but are not explicitly marked as
+	 virtual.  */
+      if (DECL_VINDEX (v->decl))
+        error_at (DECL_SOURCE_LOCATION (v->decl),
+		  "Virtual function versioning not supported");
+      if (!v->is_deleted)
+	VEC_safe_push (tree, heap, fn_ver_vec, v->decl);
+    }
+
+  gcc_assert (targetm.dispatch_version);
+  targetm.dispatch_version (ifunc_resolver_decl, fn_ver_vec, &empty_bb);
+  decl_v->ifunc_resolver_decl = ifunc_resolver_decl;
+
+  pop_cfun ();
+  current_function_decl = old_current_function_decl;
+  return ifunc_resolver_decl;
+}
+
+/* Main entry point to pass_dispatch_versions. For multi-versioned functions,
+   generate the dispatching code.  */
+
+static unsigned int
+do_dispatch_versions (void)
+{
+  /* A new pass for generating dispatch code for multi-versioned functions.
+     Currently, dispatching is done through IFUNC.  Other forms of dispatch
+     can be added for when IFUNC support is not available, such as calling
+     the function directly after checking the target type.  This pass will
+     become more meaningful when other dispatch mechanisms are added.  */
+
+  /* Cloning a function to produce more versions will happen here when the
+     user requests that via the target attribute. For example,
+     int foo () __attribute__ ((target(("arch=core2"), ("arch=corei7"))));
+     means that the user wants the same body of foo to be versioned for core2
+     and corei7.  In that case, this function will be cloned during this
+     pass.  */
+  
+  if (DECL_FUNCTION_VERSIONED (current_function_decl)
+      && is_default_function (current_function_decl))
+    {
+      tree decl = make_ifunc_resolver_for_version (current_function_decl);
+      if (dump_file && decl)
+	dump_function_to_file (decl, dump_file, TDF_BLOCKS);
+    }
+  return 0;
+}
+
+static bool
+gate_dispatch_versions (void)
+{
+  return true;
+}
+
+/* A pass to generate the dispatch code to execute the appropriate version
+   of a multi-versioned function at run-time.  */
+
+struct gimple_opt_pass pass_dispatch_versions =
+{
+ {
+  GIMPLE_PASS,
+  "dispatch_multiversion_functions",    /* name */
+  gate_dispatch_versions,		/* gate */
+  do_dispatch_versions,			/* execute */
+  NULL,					/* sub */
+  NULL,					/* next */
+  0,					/* static_pass_number */
+  TV_MULTIVERSION_DISPATCH,		/* tv_id */
+  PROP_cfg,				/* properties_required */
+  PROP_cfg,				/* properties_provided */
+  0,					/* properties_destroyed */
+  0,					/* todo_flags_start */
+  0					/* todo_flags_finish */
+ }
+};
Index: gcc/cgraphunit.c
===================================================================
--- gcc/cgraphunit.c	(revision 187371)
+++ gcc/cgraphunit.c	(working copy)
@@ -420,6 +420,13 @@ cgraph_finalize_function (tree decl, bool nested)
       && !DECL_DISREGARD_INLINE_LIMITS (decl))
     node->symbol.force_output = 1;
 
+  /* With function versions, keep inline functions and do not worry about
+     inline limits.  */
+  if (DECL_FUNCTION_VERSIONED (decl)
+      && DECL_DECLARED_INLINE_P (decl)
+      && !DECL_EXTERNAL (decl))
+    node->symbol.force_output = 1;
+
   /* When not optimizing, also output the static functions. (see
      PR24561), but don't do so for always_inline functions, functions
      declared inline and nested functions.  These were optimized out
Index: gcc/multiversion.h
===================================================================
--- gcc/multiversion.h	(revision 0)
+++ gcc/multiversion.h	(revision 0)
@@ -0,0 +1,55 @@
+/* Function Multiversioning.
+   Copyright (C) 2012 Free Software Foundation, Inc.
+   Contributed by Sriraman Tallam (tmsriram@google.com)
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>. */
+
+/* This is the header file which provides the functions to keep track
+   of functions that are multi-versioned and to generate the dispatch
+   code to call the right version at run-time.  */
+
+#ifndef GCC_MULTIVERSION_H
+#define GCC_MULTIVERSION_H
+
+#include "tree.h"
+
+/* Mark DECL1 and DECL2 as function versions.  */
+int group_function_versions (const tree decl1, const tree decl2);
+
+/* Mark DECL as deleted and no longer a version.  */
+void mark_delete_decl_version (const tree decl);
+
+/* Returns true if DECL is the default version to be executed if all
+   other versions are inappropriate at run-time.  */
+bool is_default_function (const tree decl);
+
+/* Gets the IFUNC dispatcher for this multi-versioned function DECL. DECL
+   must be the default function in the multi-versioned group.  */
+tree get_ifunc_for_version (const tree decl);
+
+/* Returns true when only one of DECL1 and DECL2 is marked with "target"
+   or if the "target" attribute strings of DECL1 and DECL2 do not match.  */
+bool has_different_version_attributes (const tree decl1, const tree decl2);
+
+/* Function DECL is marked to be a multi-versioned function.  If DECL is
+   not the default version, the assembler name of DECL is changed to include
+   the attribute string to keep the name unambiguous.  */
+void mark_function_as_version (const tree decl);
+
+/* Check if decl is FUNCTION_DECL with target attribute set.  */
+bool is_target_attribute_set (const tree decl);
+#endif
Index: gcc/testsuite/g++.dg/mv1.C
===================================================================
--- gcc/testsuite/g++.dg/mv1.C	(revision 0)
+++ gcc/testsuite/g++.dg/mv1.C	(revision 0)
@@ -0,0 +1,202 @@
+/* Test case to check if Multiversioning works.  */
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-require-ifunc "" }  */
+/* { dg-options "-O2 -fPIC" } */
+
+#include <assert.h>
+
+/* Default version.  */
+int foo ();
+/* The other versions of foo.  Mix up the ordering and 
+   check if the dispatching does it in the order of priority. */
+/* Check combination of target attributes.  */
+int foo () __attribute__ ((target("arch=corei7,popcnt")));
+/* The target operands in this declaration and the definition are re-ordered.
+   This should still work.  */
+int foo () __attribute__ ((target("ssse3,avx2")));
+
+/* Check for all target attributes for which dispatchers are available.  */
+/* Check arch= */
+int foo () __attribute__((target("arch=core2")));
+int foo () __attribute__((target("arch=corei7")));
+int foo () __attribute__((target("arch=atom")));
+/* Check ISAs  */
+int foo () __attribute__((target("sse3")));
+int foo () __attribute__((target("sse2")));
+int foo () __attribute__((target("sse")));
+int foo () __attribute__((target("avx")));
+int foo () __attribute__((target("sse4.2")));
+int foo () __attribute__((target("popcnt")));
+int foo () __attribute__((target("sse4.1")));
+int foo () __attribute__((target("ssse3")));
+int foo () __attribute__((target("mmx")));
+int foo () __attribute__((target("avx2")));
+/* Check more arch=.  */
+int foo () __attribute__((target("arch=amdfam10")));
+int foo () __attribute__((target("arch=bdver1")));
+int foo () __attribute__((target("arch=bdver2")));
+
+int (*p)() = &foo;
+int main ()
+{
+  int val = foo ();
+  assert (val ==  (*p)());
+
+  /* Check in the exact same order in which the dispatching
+     is expected to happen.  */
+  if (__builtin_cpu_is ("bdver1"))
+    assert (val == 1);
+  else if (__builtin_cpu_is ("bdver2"))
+    assert (val == 2);
+  else if (__builtin_cpu_supports ("avx2")
+	   && __builtin_cpu_supports ("ssse3"))
+    assert (val == 3);
+  else if (__builtin_cpu_supports ("avx2"))
+    assert (val == 4);
+  else if (__builtin_cpu_supports ("avx"))
+    assert (val == 5);
+  else if (__builtin_cpu_is ("corei7")
+	   && __builtin_cpu_supports ("popcnt"))
+    assert (val == 6);
+  else if (__builtin_cpu_supports ("popcnt"))
+    assert (val == 7);
+  else if (__builtin_cpu_is ("corei7"))
+    assert (val == 8);
+  else if (__builtin_cpu_supports ("sse4.2"))
+    assert (val == 9);
+  else if (__builtin_cpu_supports ("sse4.1"))
+    assert (val == 10);
+  else if (__builtin_cpu_is ("amdfam10h"))
+    assert (val == 11);
+  else if (__builtin_cpu_is ("core2"))
+    assert (val == 12);
+  else if (__builtin_cpu_is ("atom"))
+    assert (val == 13);
+  else if (__builtin_cpu_supports ("ssse3"))
+    assert (val == 14);
+  else if (__builtin_cpu_supports ("sse3"))
+    assert (val == 15);
+  else if (__builtin_cpu_supports ("sse2"))
+    assert (val == 16);
+  else if (__builtin_cpu_supports ("sse"))
+    assert (val == 17);
+  else if (__builtin_cpu_supports ("mmx"))
+    assert (val == 18);
+  else
+    assert (val == 0);
+  
+  return 0;
+}
+
+int foo ()
+{
+  return 0;
+}
+
+int __attribute__ ((target("arch=bdver1")))
+foo ()
+{
+  return 1;
+}
+
+int __attribute__ ((target("arch=bdver2")))
+foo ()
+{
+  return 2;
+}
+
+int __attribute__ ((target("avx2,ssse3")))
+foo ()
+{
+  return 3;
+}
+
+int __attribute__ ((target("avx2")))
+foo ()
+{
+  return 4;
+}
+
+int __attribute__ ((target("avx")))
+foo ()
+{
+  return 5;
+}
+
+int __attribute__ ((target("arch=corei7,popcnt")))
+foo ()
+{
+  return 6;
+}
+
+int __attribute__ ((target("popcnt")))
+foo ()
+{
+  return 7;
+}
+
+int __attribute__ ((target("arch=corei7")))
+foo ()
+{
+  return 8;
+}
+
+int __attribute__ ((target("sse4.2")))
+foo ()
+{
+  return 9;
+}
+
+int __attribute__ ((target("sse4.1")))
+foo ()
+{
+  return 10;
+}
+
+int __attribute__ ((target("arch=amdfam10")))
+foo ()
+{
+  return 11;
+}
+
+int __attribute__ ((target("arch=core2")))
+foo ()
+{
+  return 12;
+}
+
+int __attribute__ ((target("arch=atom")))
+foo ()
+{
+  return 13;
+}
+
+int __attribute__ ((target("ssse3")))
+foo ()
+{
+  return 14;
+}
+
+int __attribute__ ((target("sse3")))
+foo ()
+{
+  return 15;
+}
+
+int __attribute__ ((target("sse2")))
+foo ()
+{
+  return 16;
+}
+
+int __attribute__ ((target("sse")))
+foo ()
+{
+  return 17;
+}
+
+int __attribute__ ((target("mmx")))
+foo ()
+{
+  return 18;
+}
Index: gcc/testsuite/g++.dg/mv2.C
===================================================================
--- gcc/testsuite/g++.dg/mv2.C	(revision 0)
+++ gcc/testsuite/g++.dg/mv2.C	(revision 0)
@@ -0,0 +1,119 @@
+/* Test case to check if Multiversioning chooses the correct
+   dispatching order when versions are for various ISAs.  */
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-require-ifunc "" }  */
+/* { dg-options "-O2" } */
+
+#include <assert.h>
+
+/* Default version.  */
+int foo ();
+/* The dispatch checks should be in the exact reverse order of the
+   declarations below.  */
+int foo () __attribute__ ((target ("mmx")));
+int foo () __attribute__ ((target ("sse")));
+int foo () __attribute__ ((target ("sse2")));
+int foo () __attribute__ ((target ("sse3")));
+int foo () __attribute__ ((target ("ssse3")));
+int foo () __attribute__ ((target ("sse4.1")));
+int foo () __attribute__ ((target ("sse4.2")));
+int foo () __attribute__ ((target ("popcnt")));
+int foo () __attribute__ ((target ("avx")));
+int foo () __attribute__ ((target ("avx2")));
+
+int main ()
+{
+
+  int val = foo ();
+
+  if (__builtin_cpu_supports ("avx2"))
+    assert (val == 1);
+  else if (__builtin_cpu_supports ("avx"))
+    assert (val == 2);
+  else if (__builtin_cpu_supports ("popcnt"))
+    assert (val == 3);
+  else if (__builtin_cpu_supports ("sse4.2"))
+    assert (val == 4);
+  else if (__builtin_cpu_supports ("sse4.1"))
+    assert (val == 5);
+  else if (__builtin_cpu_supports ("ssse3"))
+    assert (val == 6);
+  else if (__builtin_cpu_supports ("sse3"))
+    assert (val == 7);
+  else if (__builtin_cpu_supports ("sse2"))
+    assert (val == 8);
+  else if (__builtin_cpu_supports ("sse"))
+    assert (val == 9);
+  else if (__builtin_cpu_supports ("mmx"))
+    assert (val == 10);
+  else
+    assert (val == 0);
+
+  return 0;
+}
+
+int
+foo ()
+{
+  return 0;
+}
+
+int __attribute__ ((target("avx2")))
+foo ()
+{
+  return 1;
+}
+
+int __attribute__ ((target("avx")))
+foo ()
+{
+  return 2;
+}
+
+int __attribute__ ((target("popcnt")))
+foo ()
+{
+  return 3;
+}
+
+int __attribute__ ((target("sse4.2")))
+foo ()
+{
+  return 4;
+}
+
+int __attribute__ ((target("sse4.1")))
+foo ()
+{
+  return 5;
+}
+
+int __attribute__ ((target("ssse3")))
+foo ()
+{
+  return 6;
+}
+
+int __attribute__ ((target("sse3")))
+foo ()
+{
+  return 7;
+}
+
+int __attribute__ ((target("sse2")))
+foo ()
+{
+  return 8;
+}
+
+int __attribute__ ((target("sse")))
+foo ()
+{
+  return 9;
+}
+
+int __attribute__ ((target("mmx")))
+foo ()
+{
+  return 10;
+}
Index: gcc/cp/class.c
===================================================================
--- gcc/cp/class.c	(revision 187371)
+++ gcc/cp/class.c	(working copy)
@@ -39,6 +39,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-dump.h"
 #include "splay-tree.h"
 #include "pointer-set.h"
+#include "multiversion.h"
 
 /* The number of nested classes being processed.  If we are not in the
    scope of any class, this is zero.  */
@@ -1093,7 +1094,21 @@ add_method (tree type, tree method, tree using_dec
 	      || same_type_p (TREE_TYPE (fn_type),
 			      TREE_TYPE (method_type))))
 	{
-	  if (using_decl)
+	  /* For function versions, their parms and types match
+	     but they are not duplicates.  Record function versions
+	     as and when they are found.  */
+	  if (TREE_CODE (fn) == FUNCTION_DECL
+	      && TREE_CODE (method) == FUNCTION_DECL
+	      && (is_target_attribute_set (fn)
+		  || is_target_attribute_set (method))
+	      && has_different_version_attributes (fn, method))
+ 	    {
+	      mark_function_as_version (fn);
+	      mark_function_as_version (method);
+	      group_function_versions (fn, method);
+	      continue;
+	    }
+	  else if (using_decl)
 	    {
 	      if (DECL_CONTEXT (fn) == type)
 		/* Defer to the local function.  */
@@ -1151,6 +1166,7 @@ add_method (tree type, tree method, tree using_dec
   else
     /* Replace the current slot.  */
     VEC_replace (tree, method_vec, slot, overload);
+
   return true;
 }
 
@@ -6928,8 +6944,11 @@ resolve_address_of_overloaded_function (tree targe
 	  if (DECL_ANTICIPATED (fn))
 	    continue;
 
-	  /* See if there's a match.  */
-	  if (same_type_p (target_fn_type, static_fn_type (fn)))
+	  /* See if there's a match.  For multi-versioned functions, match
+	     only the default version.  */
+	  if (same_type_p (target_fn_type, static_fn_type (fn))
+	      && (!DECL_FUNCTION_VERSIONED (fn)
+		  || is_default_function (fn)))
 	    matches = tree_cons (fn, NULL_TREE, matches);
 	}
     }
@@ -7091,6 +7110,22 @@ resolve_address_of_overloaded_function (tree targe
       perform_or_defer_access_check (access_path, fn, fn);
     }
 
+  /* If a pointer to a function that is multi-versioned is requested, the
+     pointer to the dispatcher function is returned instead.  This works
+     well because indirectly calling the function will dispatch the right
+     function version at run-time. Also, the function address is kept
+     unique.  */
+  if (DECL_FUNCTION_VERSIONED (fn)
+      && is_default_function (fn))
+    {
+      tree ifunc_decl;
+      ifunc_decl = get_ifunc_for_version (fn);
+      gcc_assert (ifunc_decl != NULL);
+      retrofit_lang_decl (ifunc_decl);
+      mark_used (fn);
+      return build_fold_addr_expr (ifunc_decl);
+    }
+
   if (TYPE_PTRFN_P (target_type) || TYPE_PTRMEMFUNC_P (target_type))
     return cp_build_addr_expr (fn, flags);
   else
Index: gcc/cp/decl.c
===================================================================
--- gcc/cp/decl.c	(revision 187371)
+++ gcc/cp/decl.c	(working copy)
@@ -54,6 +54,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "pointer-set.h"
 #include "splay-tree.h"
 #include "plugin.h"
+#include "multiversion.h"
 
 /* Possible cases of bad specifiers type used by bad_specifiers. */
 enum bad_spec_place {
@@ -973,6 +974,21 @@ decls_match (tree newdecl, tree olddecl)
       if (t1 != t2)
 	return 0;
 
+      /* The decls do not match if they correspond to two different versions
+	 of the same function.  */
+      if (compparms (p1, p2)
+	  && same_type_p (TREE_TYPE (f1), TREE_TYPE (f2)) 
+	  && has_different_version_attributes (newdecl, olddecl))
+	{
+	  /* One of the decls could be the default without the "target"
+	     attribute. Set it to be a versioned function here.  */
+	  mark_function_as_version (newdecl);
+	  mark_function_as_version (olddecl);
+	  /* Accumulate all the versions of a function.  */
+	  group_function_versions (olddecl, newdecl);
+	  return 0;
+	}
+
       if (CP_DECL_CONTEXT (newdecl) != CP_DECL_CONTEXT (olddecl)
 	  && ! (DECL_EXTERN_C_P (newdecl)
 		&& DECL_EXTERN_C_P (olddecl)))
@@ -1490,7 +1506,11 @@ duplicate_decls (tree newdecl, tree olddecl, bool
 	      error ("previous declaration %q+#D here", olddecl);
 	      return NULL_TREE;
 	    }
-	  else if (compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)),
+	  /* For function versions, params and types match, but they
+	     are not ambiguous.  */
+	  else if ((!DECL_FUNCTION_VERSIONED (newdecl)
+		    && !DECL_FUNCTION_VERSIONED (olddecl))
+		   && compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)),
 			      TYPE_ARG_TYPES (TREE_TYPE (olddecl))))
 	    {
 	      error ("new declaration %q#D", newdecl);
@@ -2262,6 +2282,16 @@ duplicate_decls (tree newdecl, tree olddecl, bool
   else if (DECL_PRESERVE_P (newdecl))
     DECL_PRESERVE_P (olddecl) = 1;
 
+  /* If the olddecl is a version, so is the newdecl.  */
+  if (TREE_CODE (newdecl) == FUNCTION_DECL
+      && DECL_FUNCTION_VERSIONED (olddecl))
+    {
+      DECL_FUNCTION_VERSIONED (newdecl) = 1;
+      /* Record that newdecl is not a valid version and has
+	 been deleted.  */
+      mark_delete_decl_version (newdecl);
+    }
+
   if (TREE_CODE (newdecl) == FUNCTION_DECL)
     {
       int function_size;
@@ -3810,6 +3840,7 @@ cp_make_fname_decl (location_t loc, tree id, int t
 			    ? NULL : fname_as_string (type_dep));
   tree type;
   tree init = cp_fname_init (name, &type);
+
   tree decl = build_decl (loc, VAR_DECL, id, type);
 
   if (name)
@@ -14036,7 +14067,11 @@ cxx_comdat_group (tree decl)
 	  else
 	    break;
 	}
-      name = DECL_ASSEMBLER_NAME (decl);
+      if (TREE_CODE (decl) == FUNCTION_DECL
+	  && DECL_FUNCTION_VERSIONED (decl))
+	name = DECL_NAME (decl);
+      else
+        name = DECL_ASSEMBLER_NAME (decl);
     }
 
   return name;
Index: gcc/cp/semantics.c
===================================================================
--- gcc/cp/semantics.c	(revision 187371)
+++ gcc/cp/semantics.c	(working copy)
@@ -3783,8 +3783,11 @@ expand_or_defer_fn_1 (tree fn)
       /* If the user wants us to keep all inline functions, then mark
 	 this function as needed so that finish_file will make sure to
 	 output it later.  Similarly, all dllexport'd functions must
-	 be emitted; there may be callers in other DLLs.  */
-      if ((flag_keep_inline_functions
+	 be emitted; there may be callers in other DLLs.
+	 Also, mark this function as needed if it is marked inline but
+	 is a multi-versioned function.  */
+      if (((flag_keep_inline_functions
+	    || DECL_FUNCTION_VERSIONED (fn))
 	   && DECL_DECLARED_INLINE_P (fn)
 	   && !DECL_REALLY_EXTERN (fn))
 	  || (flag_keep_inline_dllexport
Index: gcc/cp/decl2.c
===================================================================
--- gcc/cp/decl2.c	(revision 187371)
+++ gcc/cp/decl2.c	(working copy)
@@ -53,6 +53,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "splay-tree.h"
 #include "langhooks.h"
 #include "c-family/c-ada-spec.h"
+#include "multiversion.h"
 
 extern cpp_reader *parse_in;
 
@@ -677,9 +678,13 @@ check_classfn (tree ctype, tree function, tree tem
 	  if (is_template != (TREE_CODE (fndecl) == TEMPLATE_DECL))
 	    continue;
 
+	  /* While finding a match, same types and params are not enough
+	     if the function is versioned.  Also check version ("target")
+	     attributes.  */
 	  if (same_type_p (TREE_TYPE (TREE_TYPE (function)),
 			   TREE_TYPE (TREE_TYPE (fndecl)))
 	      && compparms (p1, p2)
+	      && !has_different_version_attributes (function, fndecl)
 	      && (!is_template
 		  || comp_template_parms (template_parms,
 					  DECL_TEMPLATE_PARMS (fndecl)))
Index: gcc/cp/call.c
===================================================================
--- gcc/cp/call.c	(revision 187371)
+++ gcc/cp/call.c	(working copy)
@@ -41,6 +41,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "langhooks.h"
 #include "c-family/c-objc.h"
 #include "timevar.h"
+#include "multiversion.h"
 
 /* The various kinds of conversion.  */
 
@@ -3903,6 +3904,16 @@ build_new_function_call (tree fn, VEC(tree,gc) **a
     {
       if (complain & tf_error)
 	{
+	  /* If the call is to a multiversioned function without
+	     a default version, overload resolution will fail.  */
+	  if (candidates
+	      && TREE_CODE (candidates->fn) == FUNCTION_DECL
+	      && DECL_FUNCTION_VERSIONED (candidates->fn))
+	    error_at (location_of (DECL_NAME (OVL_CURRENT (fn))),
+		      "Call to multiversioned function %<%D(%A)%> with"
+		      " no default version", DECL_NAME (OVL_CURRENT (fn)),
+		      build_tree_list_vec (*args));
+
 	  if (!any_viable_p && candidates && ! candidates->next
 	      && (TREE_CODE (candidates->fn) == FUNCTION_DECL))
 	    return cp_build_function_call_vec (candidates->fn, args, complain);
@@ -6824,6 +6835,19 @@ build_over_call (struct z_candidate *cand, int fla
   if (!already_used)
     mark_used (fn);
 
+  /* For a call to a multi-versioned function, the call should actually be to
+     the dispatcher.  */
+  if (DECL_FUNCTION_VERSIONED (fn)
+      && is_default_function (fn))
+    {
+      tree ifunc_decl;
+      ifunc_decl = get_ifunc_for_version (fn);
+      gcc_assert (ifunc_decl != NULL);
+      retrofit_lang_decl (ifunc_decl);
+      return build_call_expr_loc_array (UNKNOWN_LOCATION, ifunc_decl,
+					nargs, argarray);
+    }
+
   if (DECL_VINDEX (fn) && (flags & LOOKUP_NONVIRTUAL) == 0)
     {
       tree t;
@@ -8081,6 +8105,60 @@ joust (struct z_candidate *cand1, struct z_candida
   size_t i;
   size_t len;
 
+  /* For candidates of a multi-versioned function, first check whether the
+     target flags of the caller match any of the candidates.  If so, the
+     caller can call that candidate directly; otherwise the one marked
+     default wins.  This is because the default decl is used as the key to
+     aggregate all the other versions provided for it in multiversion.c.
+     When generating the actual call, the appropriate dispatcher is created
+     to call the right function version at run-time.  */
+
+  if ((TREE_CODE (cand1->fn) == FUNCTION_DECL
+       && DECL_FUNCTION_VERSIONED (cand1->fn))
+      || (TREE_CODE (cand2->fn) == FUNCTION_DECL
+	  && DECL_FUNCTION_VERSIONED (cand2->fn)))
+    {
+      /* Both functions must be marked versioned.  */
+      gcc_assert (DECL_FUNCTION_VERSIONED (cand1->fn)
+		  && DECL_FUNCTION_VERSIONED (cand2->fn));
+
+      /* Try to see if a direct call can be made to a version.  This is
+	 possible if the caller and callee have the same target flags.
+	 If cand->fn is marked with target attributes,  check if the
+	 target approves inlining this into the caller.  If so, this is
+	 the version we want.  */
+
+      if (is_target_attribute_set (cand1->fn)
+	  && targetm.target_option.can_inline_p (current_function_decl,
+						 cand1->fn))
+	return 1;
+
+      if (is_target_attribute_set (cand2->fn)
+	  && targetm.target_option.can_inline_p (current_function_decl,
+						 cand2->fn))
+	return -1;
+
+      /* A direct call to a version is not possible, so find the default
+	 function and return it.  This will later be converted to dispatch
+	 the right version at run time.  */
+
+      if (is_default_function (cand1->fn))
+	{
+          mark_used (cand2->fn);
+	  return 1;
+	}
+
+      if (is_default_function (cand2->fn))
+	{
+          mark_used (cand1->fn);
+	  return -1;
+	}
+
+      /* If a default function is absent, this is never resolved, leading
+	 to an ambiguous call error.  */
+      return 0;
+    }
+
   /* Candidates that involve bad conversions are always worse than those
      that don't.  */
   if (cand1->viable > cand2->viable)
Index: gcc/timevar.def
===================================================================
--- gcc/timevar.def	(revision 187371)
+++ gcc/timevar.def	(working copy)
@@ -253,6 +253,7 @@ DEFTIMEVAR (TV_TREE_IFCOMBINE        , "tree if-co
 DEFTIMEVAR (TV_TREE_UNINIT           , "uninit var analysis")
 DEFTIMEVAR (TV_PLUGIN_INIT           , "plugin initialization")
 DEFTIMEVAR (TV_PLUGIN_RUN            , "plugin execution")
+DEFTIMEVAR (TV_MULTIVERSION_DISPATCH , "multiversion dispatch")
 
 /* Everything else in rest_of_compilation not included above.  */
 DEFTIMEVAR (TV_EARLY_LOCAL	     , "early local passes")
Index: gcc/Makefile.in
===================================================================
--- gcc/Makefile.in	(revision 187371)
+++ gcc/Makefile.in	(working copy)
@@ -1297,6 +1297,7 @@ OBJS = \
 	mcf.o \
 	mode-switching.o \
 	modulo-sched.o \
+	multiversion.o \
 	omega.o \
 	omp-low.o \
 	optabs.o \
@@ -3042,6 +3043,11 @@ ree.o : ree.c $(CONFIG_H) $(SYSTEM_H) coretypes.h
    $(DF_H) $(TIMEVAR_H) tree-pass.h $(RECOG_H) $(EXPR_H) \
    $(REGS_H) $(TREE_H) $(TM_P_H) insn-config.h $(INSN_ATTR_H) $(DIAGNOSTIC_CORE_H) \
    $(TARGET_H) $(OPTABS_H) insn-codes.h rtlhooks-def.h $(PARAMS_H) $(CGRAPH_H)
+multiversion.o : multiversion.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
+   $(TREE_H) langhooks.h $(TREE_INLINE_H) $(FLAGS_H) $(CGRAPH_H) intl.h \
+   $(DIAGNOSTIC_H) $(FIBHEAP_H) $(PARAMS_H) $(TIMEVAR_H) tree-pass.h \
+   $(HASHTAB_H) $(COVERAGE_H) $(GGC_H) $(TREE_FLOW_H) $(RTL_H) $(IPA_PROP_H) \
+   $(BASIC_BLOCK_H) $(TOPLEV_H) $(TREE_DUMP_H) ipa-inline.h
 cprop.o : cprop.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
    $(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h $(GGC_H) \
    $(RECOG_H) $(EXPR_H) $(BASIC_BLOCK_H) $(FUNCTION_H) output.h toplev.h $(DIAGNOSTIC_CORE_H) \
Index: gcc/passes.c
===================================================================
--- gcc/passes.c	(revision 187371)
+++ gcc/passes.c	(working copy)
@@ -1293,6 +1293,7 @@ init_optimization_passes (void)
   NEXT_PASS (pass_build_cfg);
   NEXT_PASS (pass_warn_function_return);
   NEXT_PASS (pass_build_cgraph_edges);
+  NEXT_PASS (pass_dispatch_versions);
   *p = NULL;
 
   /* Interprocedural optimization passes.  */
Index: gcc/cp/mangle.c
===================================================================
--- gcc/cp/mangle.c	(revision 187371)
+++ gcc/cp/mangle.c	(working copy)
@@ -1245,7 +1245,12 @@ write_unqualified_name (const tree decl)
     {
       MANGLE_TRACE_TREE ("local-source-name", decl);
       write_char ('L');
-      write_source_name (DECL_NAME (decl));
+      if (TREE_CODE (decl) == FUNCTION_DECL
+	  && DECL_FUNCTION_VERSIONED (decl)
+	  && DECL_ASSEMBLER_NAME_SET_P (decl))
+	write_source_name (DECL_ASSEMBLER_NAME (decl));
+      else
+	write_source_name (DECL_NAME (decl));
       /* The default discriminator is 1, and that's all we ever use,
 	 so there's no code to output one here.  */
     }
@@ -1260,7 +1265,14 @@ write_unqualified_name (const tree decl)
                && LAMBDA_TYPE_P (type))
         write_closure_type_name (type);
       else
-        write_source_name (DECL_NAME (decl));
+	{
+	  if (TREE_CODE (decl) == FUNCTION_DECL
+	      && DECL_FUNCTION_VERSIONED (decl)
+	      && DECL_ASSEMBLER_NAME_SET_P (decl))
+	    write_source_name (DECL_ASSEMBLER_NAME (decl));
+	  else
+	    write_source_name (DECL_NAME (decl));
+	}
     }
 }
 
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 187371)
+++ gcc/config/i386/i386.c	(working copy)
@@ -27664,6 +27664,438 @@ ix86_init_mmx_sse_builtins (void)
     }
 }
 
+
+/* This adds a condition to the basic_block NEW_BB in function FUNCTION_DECL
+   to return a pointer to VERSION_DECL if the outcome of the expression
+   formed by PREDICATE_CHAIN is true.  This function will be called during
+   version dispatch to decide which function version to execute.  It returns
+   the basic block at the end to which more conditions can be added.  */
+
+static basic_block
+add_condition_to_bb (tree function_decl, tree version_decl,
+		     tree predicate_chain, basic_block new_bb)
+{
+  gimple return_stmt;
+  tree convert_expr, result_var;
+  gimple convert_stmt;
+  gimple call_cond_stmt;
+  gimple if_else_stmt;
+
+  basic_block bb1, bb2, bb3;
+  edge e12, e23;
+
+  tree cond_var, and_expr_var = NULL_TREE;
+  gimple_seq gseq;
+
+  tree old_current_function_decl;
+  tree predicate_decl, predicate_arg;
+
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (function_decl));
+  current_function_decl = function_decl;
+
+  gcc_assert (new_bb != NULL);
+  gseq = bb_seq (new_bb);
+
+
+  convert_expr = build1 (CONVERT_EXPR, ptr_type_node,
+	     		 build_fold_addr_expr (version_decl));
+  result_var = create_tmp_var (ptr_type_node, NULL);
+  convert_stmt = gimple_build_assign (result_var, convert_expr); 
+  return_stmt = gimple_build_return (result_var);
+
+  if (predicate_chain == NULL_TREE)
+    {
+      gimple_seq_add_stmt (&gseq, convert_stmt);
+      gimple_seq_add_stmt (&gseq, return_stmt);
+      set_bb_seq (new_bb, gseq);
+      gimple_set_bb (convert_stmt, new_bb);
+      gimple_set_bb (return_stmt, new_bb);
+      pop_cfun ();
+      current_function_decl = old_current_function_decl;
+      return new_bb;
+    }
+
+  while (predicate_chain != NULL)
+    {
+      cond_var = create_tmp_var (integer_type_node, NULL);
+      predicate_decl = TREE_PURPOSE (predicate_chain);
+      predicate_arg = TREE_VALUE (predicate_chain);
+      call_cond_stmt = gimple_build_call (predicate_decl, 1, predicate_arg);
+      gimple_call_set_lhs (call_cond_stmt, cond_var);
+
+      gimple_set_block (call_cond_stmt, DECL_INITIAL (function_decl));
+      gimple_set_bb (call_cond_stmt, new_bb);
+      gimple_seq_add_stmt (&gseq, call_cond_stmt);
+
+      predicate_chain = TREE_CHAIN (predicate_chain);
+      
+      if (and_expr_var == NULL)
+        and_expr_var = cond_var;
+      else
+	{
+	  gimple assign_stmt;
+	  /* Use MIN_EXPR to check whether any predicate result is zero:
+	     and_expr_var = min_expr <cond_var, and_expr_var>  */
+	  assign_stmt = gimple_build_assign (and_expr_var,
+			  build2 (MIN_EXPR, integer_type_node,
+				  cond_var, and_expr_var));
+
+	  gimple_set_block (assign_stmt, DECL_INITIAL (function_decl));
+	  gimple_set_bb (assign_stmt, new_bb);
+	  gimple_seq_add_stmt (&gseq, assign_stmt);
+	}
+    }
+
+  if_else_stmt = gimple_build_cond (GT_EXPR, and_expr_var,
+	  		            integer_zero_node,
+				    NULL_TREE, NULL_TREE);
+  gimple_set_block (if_else_stmt, DECL_INITIAL (function_decl));
+  gimple_set_bb (if_else_stmt, new_bb);
+  gimple_seq_add_stmt (&gseq, if_else_stmt);
+
+  gimple_seq_add_stmt (&gseq, convert_stmt);
+  gimple_seq_add_stmt (&gseq, return_stmt);
+  set_bb_seq (new_bb, gseq);
+
+  bb1 = new_bb;
+  e12 = split_block (bb1, if_else_stmt);
+  bb2 = e12->dest;
+  e12->flags &= ~EDGE_FALLTHRU;
+  e12->flags |= EDGE_TRUE_VALUE;
+
+  e23 = split_block (bb2, return_stmt);
+
+  gimple_set_bb (convert_stmt, bb2);
+  gimple_set_bb (return_stmt, bb2);
+
+  bb3 = e23->dest;
+  make_edge (bb1, bb3, EDGE_FALSE_VALUE); 
+
+  remove_edge (e23);
+  make_edge (bb2, EXIT_BLOCK_PTR, 0);
+
+  rebuild_cgraph_edges ();
+
+  pop_cfun ();
+  current_function_decl = old_current_function_decl;
+
+  return bb3;
+}
+
+/* This parses the attribute arguments to target in DECL and determines
+   the right builtin to use to match the platform specification.
+   For now, only one target argument ("arch=" or "<-m>xxx") is allowed.  */
+
+static tree 
+get_builtin_code_for_version (tree decl, unsigned int *priority)
+{
+  tree attrs;
+  struct cl_target_option cur_target;
+  tree target_node;
+  struct cl_target_option *new_target;
+  const char *arg_str = NULL;
+  const char *attrs_str = NULL;
+  char *tok_str = NULL;
+  char *token;
+
+  /* Priority of i386 features, greater value is higher priority.   This is
+     used to decide the order in which function dispatch must happen.  For
+     instance, a version specialized for SSE4.2 should be checked for dispatch
+     before a version for SSE3, as SSE4.2 implies SSE3.  */
+  enum feature_priority
+  {
+    P_ZERO = 0,
+    P_MMX,
+    P_SSE,
+    P_SSE2,
+    P_SSE3,
+    P_SSSE3,
+    P_PROC_SSSE3,
+    P_SSE4_a,
+    P_PROC_SSE4_a,
+    P_SSE4_1,
+    P_SSE4_2,
+    P_PROC_SSE4_2,
+    P_POPCNT,
+    P_AVX,
+    P_AVX2,
+    P_FMA,
+    P_PROC_FMA
+  };
+
+  /* These are the target attribute strings for which a dispatcher is
+     available, from fold_builtin_cpu.  */
+
+  static struct _feature_list
+    {
+      const char *const name;
+      const enum feature_priority priority;
+    }
+  const feature_list[] =
+    {
+      {"mmx", P_MMX},
+      {"sse", P_SSE},
+      {"sse2", P_SSE2},
+      {"sse3", P_SSE3},
+      {"ssse3", P_SSSE3},
+      {"sse4.1", P_SSE4_1},
+      {"sse4.2", P_SSE4_2},
+      {"popcnt", P_POPCNT},
+      {"avx", P_AVX},
+      {"avx2", P_AVX2}
+    };
+
+
+  static unsigned int NUM_FEATURES
+    = sizeof (feature_list) / sizeof (struct _feature_list);
+
+  unsigned int i;
+
+  tree predicate_chain = NULL_TREE;
+  tree predicate_decl, predicate_arg;
+
+  attrs = lookup_attribute ("target", DECL_ATTRIBUTES (decl));
+  gcc_assert (attrs != NULL);
+
+  attrs = TREE_VALUE (TREE_VALUE (attrs));
+
+  gcc_assert (TREE_CODE (attrs) == STRING_CST);
+  attrs_str = TREE_STRING_POINTER (attrs);
+
+  *priority = 0;
+
+  /* Handle arch= if specified.  For priority, set it to be 1 more than
+     the best instruction set the processor can handle.  For instance, if
+     there is a version for atom and a version for ssse3 (the highest ISA
+     priority for atom), the atom version must be checked for dispatch
+     before the ssse3 version. */
+  if (strstr (attrs_str, "arch=") != NULL)
+    {
+      cl_target_option_save (&cur_target, &global_options);
+      target_node = ix86_valid_target_attribute_tree (attrs);
+    
+      gcc_assert (target_node);
+      new_target = TREE_TARGET_OPTION (target_node);
+      gcc_assert (new_target);
+      
+      if (new_target->arch_specified && new_target->arch > 0)
+	{
+	  switch (new_target->arch)
+	    {
+	    case PROCESSOR_CORE2_32:
+	    case PROCESSOR_CORE2_64:
+	      arg_str = "core2";
+	      *priority = P_PROC_SSSE3;
+	      break;
+	    case PROCESSOR_COREI7_32:
+	    case PROCESSOR_COREI7_64:
+	      arg_str = "corei7";
+	      *priority = P_PROC_SSE4_2;
+	      break;
+	    case PROCESSOR_ATOM:
+	      arg_str = "atom";
+	      *priority = P_PROC_SSSE3;
+	      break;
+	    case PROCESSOR_AMDFAM10:
+	      arg_str = "amdfam10h";
+	      *priority = P_PROC_SSE4_a;
+	      break;
+	    case PROCESSOR_BDVER1:
+	      arg_str = "bdver1";
+	      *priority = P_PROC_FMA;
+	      break;
+	    case PROCESSOR_BDVER2:
+	      arg_str = "bdver2";
+	      *priority = P_PROC_FMA;
+	      break;
+	    }  
+	}    
+    
+      cl_target_option_restore (&global_options, &cur_target);
+      if (arg_str == NULL)
+	{
+	  error_at (DECL_SOURCE_LOCATION (decl),
+	    	"No dispatcher found for the versioning attributes");
+	  return NULL;
+	}
+    
+      predicate_decl = ix86_builtins [(int) IX86_BUILTIN_CPU_IS];
+      /* For a C string literal the length includes the trailing NUL.  */
+      predicate_arg = build_string_literal (strlen (arg_str) + 1, arg_str);
+      predicate_chain = tree_cons (predicate_decl, predicate_arg,
+				   predicate_chain);
+    }
+
+  /* Process feature name.  */
+  tok_str =  (char *) xmalloc (strlen (attrs_str) + 1);
+  strcpy (tok_str, attrs_str);
+  token = strtok (tok_str, ",");
+  predicate_decl = ix86_builtins [(int) IX86_BUILTIN_CPU_SUPPORTS];
+
+  while (token != NULL)
+    {
+      /* Do not process "arch="  */
+      if (strncmp (token, "arch=", 5) == 0)
+	{
+	  token = strtok (NULL, ",");
+	  continue;
+	}
+      for (i = 0; i < NUM_FEATURES; ++i)
+	{
+	  if (strcmp (token, feature_list[i].name) == 0)
+	    {
+	      predicate_arg = build_string_literal (
+				strlen (feature_list[i].name) + 1,
+				feature_list[i].name);
+	      predicate_chain = tree_cons (predicate_decl, predicate_arg,
+					   predicate_chain);
+	      /* Find the maximum priority feature.  */
+	      if (feature_list[i].priority > *priority)
+		*priority = feature_list[i].priority;
+
+	      break;
+	    }
+	}
+      if (i == NUM_FEATURES)
+	{
+	  error_at (DECL_SOURCE_LOCATION (decl),
+		    "No dispatcher found for %s", token);
+	  return NULL;
+	}
+      token = strtok (NULL, ",");
+    }
+  free (tok_str);
+
+  if (predicate_chain == NULL_TREE)
+    {
+      error_at (DECL_SOURCE_LOCATION (decl),
+	        "No dispatcher found for the versioning attributes : %s",
+	        attrs_str);
+      return NULL;
+    }
+
+  predicate_chain = nreverse (predicate_chain);
+  return predicate_chain; 
+}
+ 
+static int
+feature_compare (const void *v1, const void *v2)
+{
+  typedef struct _function_version_info
+    {
+      tree version_decl;
+      tree predicate_chain;
+      unsigned int dispatch_priority;
+    } function_version_info;
+
+  const function_version_info c1 = *(const function_version_info *)v1;
+  const function_version_info c2 = *(const function_version_info *)v2;
+  return (c2.dispatch_priority - c1.dispatch_priority);
+}
+
+/* This is the target hook to generate the dispatch function for
+   multi-versioned functions.  DISPATCH_DECL is the function which will
+   contain the dispatch logic.  FNDECLS are the function choices for
+   dispatch, and is a tree chain.  EMPTY_BB is the basic block pointer
+   in DISPATCH_DECL in which the dispatch code is generated.  */
+
+static int
+ix86_dispatch_version (tree dispatch_decl,
+		       void *fndecls_p,
+		       basic_block *empty_bb)
+{
+  tree default_decl;
+  gimple ifunc_cpu_init_stmt;
+  gimple_seq gseq;
+  tree old_current_function_decl;
+  int ix;
+  tree ele;
+  VEC (tree, heap) *fndecls;
+  unsigned int num_versions = 0;
+  unsigned int actual_versions = 0;
+  unsigned int i;
+
+  struct _function_version_info
+    {
+      tree version_decl;
+      tree predicate_chain;
+      unsigned int dispatch_priority;
+    }*function_version_info;
+
+  gcc_assert (dispatch_decl != NULL
+	      && fndecls_p != NULL
+	      && empty_bb != NULL);
+
+  /* fndecls_p is actually a vector.  */
+  fndecls = (VEC (tree, heap) *)fndecls_p;
+
+  /* At least one version other than the default is required.  */
+  num_versions = VEC_length (tree, fndecls);
+  gcc_assert (num_versions >= 2);
+
+  function_version_info = (struct _function_version_info *)
+    xmalloc ((num_versions - 1) * sizeof (struct _function_version_info));
+
+  /* The first version in the vector is the default decl.  */
+  default_decl = VEC_index (tree, fndecls, 0);
+
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (dispatch_decl));
+  current_function_decl = dispatch_decl;
+
+  gseq = bb_seq (*empty_bb);
+  ifunc_cpu_init_stmt = gimple_build_call_vec (
+                     ix86_builtins [(int) IX86_BUILTIN_CPU_INIT], NULL);
+  gimple_seq_add_stmt (&gseq, ifunc_cpu_init_stmt);
+  gimple_set_bb (ifunc_cpu_init_stmt, *empty_bb);
+  set_bb_seq (*empty_bb, gseq);
+
+  pop_cfun ();
+  current_function_decl = old_current_function_decl;
+
+
+  for (ix = 1; VEC_iterate (tree, fndecls, ix, ele); ++ix)
+    {
+      tree version_decl = ele;
+      tree predicate_chain = NULL_TREE;
+      unsigned int priority;
+      /* Get attribute string, parse it and find the right predicate decl.
+         The predicate function could be a lengthy combination of many
+	 features, like arch-type and various isa-variants.  */
+      predicate_chain = get_builtin_code_for_version (version_decl, &priority);
+
+      if (predicate_chain == NULL_TREE)
+	continue;
+
+      actual_versions++;
+      function_version_info [ix - 1].version_decl = version_decl;
+      function_version_info [ix - 1].predicate_chain = predicate_chain;
+      function_version_info [ix - 1].dispatch_priority = priority;
+    }
+
+  /* Sort the versions in descending order of dispatch priority.  The
+     priority is based on the ISA.  This is not a perfect solution.  There
+     could still be ambiguity.  If more than one function version is suitable
+     to execute, which one should be dispatched?  In future, allow the user
+     to specify a dispatch priority next to the version.  */
+  qsort (function_version_info, actual_versions,
+         sizeof (struct _function_version_info), feature_compare);
+
+  for  (i = 0; i < actual_versions; ++i)
+    *empty_bb = add_condition_to_bb (dispatch_decl,
+				     function_version_info[i].version_decl,
+				     function_version_info[i].predicate_chain,
+				     *empty_bb);
+
+  /* Dispatch the default version at the end.  */
+  *empty_bb = add_condition_to_bb (dispatch_decl, default_decl,
+				   NULL, *empty_bb);
+
+  free (function_version_info);
+  return 0;
+}
+
 /* This builds the processor_model struct type defined in
    libgcc/config/i386/cpuinfo.c  */
 
@@ -39539,6 +39971,9 @@ ix86_memmodel_check (unsigned HOST_WIDE_INT val)
 #undef TARGET_FOLD_BUILTIN
 #define TARGET_FOLD_BUILTIN ix86_fold_builtin
 
+#undef TARGET_DISPATCH_VERSION
+#define TARGET_DISPATCH_VERSION ix86_dispatch_version
+
 #undef TARGET_ENUM_VA_LIST_P
 #define TARGET_ENUM_VA_LIST_P ix86_enum_va_list
 
Index: gcc/cp/error.c
===================================================================
--- gcc/cp/error.c	(revision 187371)
+++ gcc/cp/error.c	(working copy)
@@ -1534,8 +1534,15 @@ dump_exception_spec (tree t, int flags)
 static void
 dump_function_name (tree t, int flags)
 {
-  tree name = DECL_NAME (t);
+  tree name;
 
+  /* For function versions, use the assembler name as the decl name is
+     the same for all versions.  */
+  if (DECL_FUNCTION_VERSIONED (t))
+    name = DECL_ASSEMBLER_NAME (t);
+  else
+    name = DECL_NAME (t);
+
   /* We can get here with a decl that was synthesized by language-
      independent machinery (e.g. coverage.c) in which case it won't
      have a lang_specific structure attached and DECL_CONSTRUCTOR_P
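
A note on the MIN_EXPR chain in add_condition_to_bb above: it appears to act as a logical AND only because every predicate result is non-negative (0 for "not supported", a positive value otherwise), so the running minimum is greater than zero exactly when all predicates hold. Below is a minimal C sketch of that folding. The helper name all_predicates_hold is hypothetical and not part of the patch, and the non-negative result convention is an assumption about the predicate builtins.

```c
#include <assert.h>

/* Hypothetical helper, not part of the patch.  Folds N predicate
   results the way the dispatcher's MIN_EXPR chain does.  Each result
   is assumed to be non-negative, with 0 meaning "predicate failed".  */
static int
all_predicates_hold (const int *results, int n)
{
  int acc;
  int i;

  assert (results != 0 && n > 0);

  acc = results[0];
  for (i = 1; i < n; ++i)
    acc = results[i] < acc ? results[i] : acc;   /* MIN_EXPR */

  return acc > 0;                                /* GT_EXPR against zero */
}
```

If a predicate could ever return a negative value for "true", the minimum would no longer work as a conjunction, so the sketch only holds under the stated assumption.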

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-05-14 18:29                                           ` Sriraman Tallam
@ 2012-05-26  0:07                                             ` H.J. Lu
  2012-05-26  0:16                                               ` Sriraman Tallam
  2012-06-04 19:01                                             ` Sriraman Tallam
  1 sibling, 1 reply; 93+ messages in thread
From: H.J. Lu @ 2012-05-26  0:07 UTC (permalink / raw)
  To: Sriraman Tallam
  Cc: Richard Guenther, Jan Hubicka, Uros Bizjak, reply, gcc-patches, David Li

On Mon, May 14, 2012 at 11:28 AM, Sriraman Tallam <tmsriram@google.com> wrote:
> Hi H.J,
>
>   Attaching new patch with 2 test cases, mv2.C checks ISAs only and
> mv1.C checks ISAs and arches mixed. Right now, checking only arches is
> not needed as they are mutually exclusive, any order should be fine.
>
> Patch also available for review here:  http://codereview.appspot.com/5752064

Sorry for the delay.  It looks OK except for the function order in testcases.
I think you should rearrange them so that they are not in the same order
as the priority.

Thanks.

H.J.
> Thanks,
> -Sri.
>
> On Sat, May 12, 2012 at 6:37 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Fri, May 11, 2012 at 7:04 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>> Hi H.J.,
>>>
>>>   I have updated the patch to improve the dispatching method like we
>>> discussed. Each feature gets a priority now, and the dispatching is
>>> done in priority order. Please see i386.c for the changes.
>>>
>>> Patch also available for review here:  http://codereview.appspot.com/5752064
>>>
>>
>> I think you need 3 tests:
>>
>> 1.  Only with ISA.
>> 2.  Only with arch
>> 3.  Mixed with ISA and arch
>>
>> since test mixed ISA and arch may hide issues with ISA only or arch only.
>>
>> --

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-05-26  0:07                                             ` H.J. Lu
@ 2012-05-26  0:16                                               ` Sriraman Tallam
  2012-05-26  0:27                                                 ` H.J. Lu
  0 siblings, 1 reply; 93+ messages in thread
From: Sriraman Tallam @ 2012-05-26  0:16 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Richard Guenther, Jan Hubicka, Uros Bizjak, reply, gcc-patches, David Li

Hi H.J.,

On Fri, May 25, 2012 at 5:07 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Mon, May 14, 2012 at 11:28 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>> Hi H.J,
>>
>>   Attaching new patch with 2 test cases, mv2.C checks ISAs only and
>> mv1.C checks ISAs and arches mixed. Right now, checking only arches is
>> not needed as they are mutually exclusive, any order should be fine.
>>
>> Patch also available for review here:  http://codereview.appspot.com/5752064
>
> Sorry for the delay.  It looks OK except for the function order in testcases.
> I think you should rearrange them so that they are not in the same order
> as the priority.

I am not sure I understand. The function order is mixed up in the
declarations, and I have explicitly commented on this. Only the
checking is done in priority order, which it must be, right?


Thanks,
-Sri.

>
> Thanks.
>
> H.J.
>> Thanks,
>> -Sri.
>>
>> On Sat, May 12, 2012 at 6:37 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Fri, May 11, 2012 at 7:04 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>> Hi H.J.,
>>>>
>>>>   I have updated the patch to improve the dispatching method like we
>>>> discussed. Each feature gets a priority now, and the dispatching is
>>>> done in priority order. Please see i386.c for the changes.
>>>>
>>>> Patch also available for review here:  http://codereview.appspot.com/5752064
>>>>
>>>
>>> I think you need 3 tests:
>>>
>>> 1.  Only with ISA.
>>> 2.  Only with arch
>>> 3.  Mixed with ISA and arch
>>>
>>> since test mixed ISA and arch may hide issues with ISA only or arch only.
>>>
>>> --

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-05-26  0:16                                               ` Sriraman Tallam
@ 2012-05-26  0:27                                                 ` H.J. Lu
  2012-05-26  1:54                                                   ` Sriraman Tallam
  0 siblings, 1 reply; 93+ messages in thread
From: H.J. Lu @ 2012-05-26  0:27 UTC (permalink / raw)
  To: Sriraman Tallam
  Cc: Richard Guenther, Jan Hubicka, Uros Bizjak, reply, gcc-patches, David Li

On Fri, May 25, 2012 at 5:16 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> Hi H.J.,
>
> On Fri, May 25, 2012 at 5:07 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Mon, May 14, 2012 at 11:28 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>>> Hi H.J,
>>>
>>>   Attaching new patch with 2 test cases, mv2.C checks ISAs only and
>>> mv1.C checks ISAs and arches mixed. Right now, checking only arches is
>>> not needed as they are mutually exclusive, any order should be fine.
>>>
>>> Patch also available for review here:  http://codereview.appspot.com/5752064
>>
>> Sorry for the delay.  It looks OK except for the function order in testcases.
>> I think you should rearrange them so that they are not in the same order
>> as the priority.
>
> I am not sure I understand. The function order is mixed up in the
> declarations, I have explicitly commented about this. I only do the
> checking in order which I must, right?
>
>

gcc/testsuite/g++.dg/mv2.C has

int __attribute__ ((target("avx2")))
foo ()
{
  return 1;
}

int __attribute__ ((target("avx")))
foo ()
{
  return 2;
}

int __attribute__ ((target("popcnt")))
foo ()
{
  return 3;
}

int __attribute__ ((target("sse4.2")))
foo ()
{
  return 4;
}

int __attribute__ ((target("sse4.1")))
foo ()
{
  return 5;
}

int __attribute__ ((target("ssse3")))
foo ()
{
  return 6;
}

int __attribute__ ((target("sse3")))
foo ()
{
  return 7;
}

int __attribute__ ((target("sse2")))
foo ()
{
  return 8;
}

int __attribute__ ((target("sse")))
foo ()
{
  return 9;
}

int __attribute__ ((target("mmx")))
foo ()
{
  return 10;
}

It is mostly in priority order.
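
To illustrate why definition order should not influence the result: the dispatcher is expected to sort the versions by descending priority before emitting the checks. The following is a rough standalone C sketch of that step, not the code from i386.c. The struct, the function names, and the numeric priorities are illustrative assumptions.

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative stand-in for one function version.  */
struct version
{
  const char *feature;
  unsigned int priority;
};

/* Order versions so that the highest priority comes first.  The
   comparison avoids subtracting unsigned values, which could wrap.  */
static int
by_descending_priority (const void *a, const void *b)
{
  const struct version *x = (const struct version *) a;
  const struct version *y = (const struct version *) b;

  return (x->priority < y->priority) - (x->priority > y->priority);
}

/* Sort N versions in place, highest priority first.  */
static void
sort_versions (struct version *v, size_t n)
{
  assert (v != NULL || n == 0);
  qsort (v, n, sizeof *v, by_descending_priority);
}
```

A test whose definitions are shuffled (say sse2, avx2, mmx, sse4.2) should still be checked in the order avx2, sse4.2, sse2, mmx after this sort, which is what a rearranged mv2.C would exercise.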

BTW, I noticed:

[hjl@gnu-6 pr14170]$ readelf -sW libgcc.a | grep __cpu_model
    20: 0000000000000010    16 OBJECT  GLOBAL HIDDEN   COM __cpu_model
[hjl@gnu-6 pr14170]$ readelf -sW libgcc_s.so | grep __cpu_model
    82: 0000000000214ff0    16 OBJECT  GLOBAL DEFAULT   24
__cpu_model@@GCC_4.8.0
   310: 0000000000214ff0    16 OBJECT  GLOBAL DEFAULT   24 __cpu_model
[hjl@gnu-6 pr14170]$

Why is __cpu_model in both libgcc.a and libgcc_s.so?


H.J.

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-05-26  0:27                                                 ` H.J. Lu
@ 2012-05-26  1:54                                                   ` Sriraman Tallam
       [not found]                                                     ` <CAMe9rOowm9K7r1xnRdRjW5Y4Ay+WxgSsBLTgGvq24z=i42AS+g@mail.gmail.com>
  0 siblings, 1 reply; 93+ messages in thread
From: Sriraman Tallam @ 2012-05-26  1:54 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Richard Guenther, Jan Hubicka, Uros Bizjak, reply, gcc-patches, David Li

On Fri, May 25, 2012 at 5:27 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, May 25, 2012 at 5:16 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>> Hi H.J.,
>>
>> On Fri, May 25, 2012 at 5:07 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Mon, May 14, 2012 at 11:28 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>> Hi H.J,
>>>>
>>>>   Attaching new patch with 2 test cases, mv2.C checks ISAs only and
>>>> mv1.C checks ISAs and arches mixed. Right now, checking only arches is
>>>> not needed as they are mutually exclusive, any order should be fine.
>>>>
>>>> Patch also available for review here:  http://codereview.appspot.com/5752064
>>>
>>> Sorry for the delay.  It looks OK except for the function order in testcases.
>>> I think you should rearrange them so that they are not in the same order
>>> as the priority.
>>
>> I am not sure I understand. The function order is mixed up in the
>> declarations, I have explicitly commented about this. I only do the
>> checking in order which I must, right?
>>
>>
>
> gcc/testsuite/g++.dg/mv2.C has
>
> int __attribute__ ((target("avx2")))
> foo ()
> {
>  return 1;
> }
>
> int __attribute__ ((target("avx")))
> foo ()
> {
>  return 2;
> }
>
> int __attribute__ ((target("popcnt")))
> foo ()
> {
>  return 3;
> }
>
> int __attribute__ ((target("sse4.2")))
> foo ()
> {
>  return 4;
> }
>
> int __attribute__ ((target("sse4.1")))
> foo ()
> {
>  return 5;
> }
>
> int __attribute__ ((target("ssse3")))
> foo ()
> {
>  return 6;
> }
>
> int __attribute__ ((target("sse3")))
> foo ()
> {
>  return 7;
> }
>
> int __attribute__ ((target("sse2")))
> foo ()
> {
>  return 8;
> }
>
> int __attribute__ ((target("sse")))
> foo ()
> {
>  return 9;
> }
>
> int __attribute__ ((target("mmx")))
> foo ()
> {
>  return 10;
> }
>
> It is mostly in priority order.

Ah! OK, got it. I kept it that way because it is really the order of
the declarations before the call that matters, but I will rearrange the
definitions too, to be clear.

>
> BTW, I noticed:
>
> [hjl@gnu-6 pr14170]$ readelf -sW libgcc.a | grep __cpu_model
>    20: 0000000000000010    16 OBJECT  GLOBAL HIDDEN   COM __cpu_model
> [hjl@gnu-6 pr14170]$ readelf -sW libgcc_s.so | grep __cpu_model
>    82: 0000000000214ff0    16 OBJECT  GLOBAL DEFAULT   24
> __cpu_model@@GCC_4.8.0
>   310: 0000000000214ff0    16 OBJECT  GLOBAL DEFAULT   24 __cpu_model
> [hjl@gnu-6 pr14170]$
>
> Why is __cpu_model in both libgcc.a and libgcc_s.so?

How do I disallow this in libgcc_s.so? Looks like t-cpuinfo file is
wrong but I cannot figure out the fix.

Thanks,
-Sri.

>
>
> H.J.

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
       [not found]                                                       ` <CAAs8HmzeQigcLQyfkC02u=6gCTLkjLLa_jYmp+b1HEtpMCrYWw@mail.gmail.com>
@ 2012-05-26  5:06                                                         ` H.J. Lu
  2012-05-26 22:35                                                           ` Sriraman Tallam
  0 siblings, 1 reply; 93+ messages in thread
From: H.J. Lu @ 2012-05-26  5:06 UTC (permalink / raw)
  To: Sriraman Tallam
  Cc: Xinliang David Li, Richard Guenther, Jan Hubicka, Uros Bizjak,
	reply, gcc-patches

On Fri, May 25, 2012 at 8:38 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>
> On May 25, 2012 7:15 PM, "H.J. Lu" <hjl.tools@gmail.com> wrote:
>>
>>
>> On May 25, 2012 6:54 PM, "Sriraman Tallam" <tmsriram@google.com> wrote:
>> >
>> >
>> > >>
>> > >> On Fri, May 25, 2012 at 5:0 > > BTW, I noticed:
>>
>> > >
>> > > [hjl@gnu-6 pr14170]$ readelf -sW libgcc.a | grep __cpu_model
>> > >    20: 0000000000000010    16 OBJECT  GLOBAL HIDDEN   COM __cpu_model
>> > > [hjl@gnu-6 pr14170]$ readelf -sW libgcc_s.so | grep __cpu_model
>> > >    82: 0000000000214ff0    16 OBJECT  GLOBAL DEFAULT   24
>> > > __cpu_model@@GCC_4.8.0
>> > >   310: 0000000000214ff0    16 OBJECT  GLOBAL DEFAULT   24 __cpu_model
>> > > [hjl@gnu-6 pr14170]$
>> > >
>> > > Why is __cpu_model in both libgcc.a and libgcc_s.so?
>> >
>> > How do I disallow this in libgcc_s.so? Looks like t-cpuinfo file is
>> > wrong but I cannot figure out the fix.
>> >
>> Why don't you want it in libgcc_s.so?
>
> I thought libgcc.a is always linked in for static and dynamic builds. So
> having it in libgcc_s.so is redundant.
>

[hjl@gnu-6 pr14170]$ readelf -sW libgcc.a | grep _cpu_
    20: 0000000000000010    16 OBJECT  GLOBAL HIDDEN   COM __cpu_model
    21: 0000000000000110   612 FUNC    GLOBAL HIDDEN     4 __cpu_indicator_init
[hjl@gnu-6 pr14170]$ readelf -sW libgcc_s.so.1 | grep _cpu_
    82: 0000000000214ff0    16 OBJECT  GLOBAL DEFAULT   24
__cpu_model@@GCC_4.8.0
   223: 0000000000002b60   560 FUNC    LOCAL  DEFAULT   11 __cpu_indicator_init
   310: 0000000000214ff0    16 OBJECT  GLOBAL DEFAULT   24 __cpu_model
[hjl@gnu-6 pr14170]$

I think there should be only one copy of __cpu_model in the process.
It should be in libgcc_s.so. Why isn't  __cpu_indicator_init exported
from libgcc_s.so?

-- 
H.J.

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-05-26  5:06                                                         ` H.J. Lu
@ 2012-05-26 22:35                                                           ` Sriraman Tallam
  2012-05-26 23:56                                                             ` H.J. Lu
  0 siblings, 1 reply; 93+ messages in thread
From: Sriraman Tallam @ 2012-05-26 22:35 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Xinliang David Li, Richard Guenther, Jan Hubicka, Uros Bizjak,
	reply, gcc-patches, Ian Lance Taylor

On Fri, May 25, 2012 at 10:05 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, May 25, 2012 at 8:38 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>
>> On May 25, 2012 7:15 PM, "H.J. Lu" <hjl.tools@gmail.com> wrote:
>>>
>>>
>>> On May 25, 2012 6:54 PM, "Sriraman Tallam" <tmsriram@google.com> wrote:
>>> >
>>> >
>>> > >>
>>> > >> On Fri, May 25, 2012 at 5:0 > > BTW, I noticed:
>>>
>>> > >
>>> > > [hjl@gnu-6 pr14170]$ readelf -sW libgcc.a | grep __cpu_model
>>> > >    20: 0000000000000010    16 OBJECT  GLOBAL HIDDEN   COM __cpu_model
>>> > > [hjl@gnu-6 pr14170]$ readelf -sW libgcc_s.so | grep __cpu_model
>>> > >    82: 0000000000214ff0    16 OBJECT  GLOBAL DEFAULT   24
>>> > > __cpu_model@@GCC_4.8.0
>>> > >   310: 0000000000214ff0    16 OBJECT  GLOBAL DEFAULT   24 __cpu_model
>>> > > [hjl@gnu-6 pr14170]$
>>> > >
>>> > > Why is __cpu_model in both libgcc.a and libgcc_s.so?
>>> >
>>> > How do I disallow this in libgcc_s.so? Looks like t-cpuinfo file is
>>> > wrong but I cannot figure out the fix.
>>> >
>>> Why don't you want it in libgcc_s.so?
>>
>> I thought libgcc.a is always linked in for static and dynamic builds. So
>> having it in libgcc_s.so is redundant.
>>
>
> [hjl@gnu-6 pr14170]$ readelf -sW libgcc.a | grep _cpu_
>    20: 0000000000000010    16 OBJECT  GLOBAL HIDDEN   COM __cpu_model
>    21: 0000000000000110   612 FUNC    GLOBAL HIDDEN     4 __cpu_indicator_init
> [hjl@gnu-6 pr14170]$ readelf -sW libgcc_s.so.1 | grep _cpu_
>    82: 0000000000214ff0    16 OBJECT  GLOBAL DEFAULT   24
> __cpu_model@@GCC_4.8.0
>   223: 0000000000002b60   560 FUNC    LOCAL  DEFAULT   11 __cpu_indicator_init
>   310: 0000000000214ff0    16 OBJECT  GLOBAL DEFAULT   24 __cpu_model
> [hjl@gnu-6 pr14170]$
>
> I think there should be only one copy of __cpu_model in the process.
> It should be in libgcc_s.so. Why isn't  __cpu_indicator_init exported
> from libgcc_s.so?

Ok, I am elaborating so that I understand the issue clearly.

The dynamic symbol table of libgcc_s.so:

$ objdump -T libgcc_s.so | grep __cpu

0000000000015fd0 g    DO .bss	0000000000000010  GCC_4.8.0   __cpu_model

It only has __cpu_model, not __cpu_indicator_init just like you
pointed out. I will fix this by adding a versioned symbol of
__cpu_indicator_init to the *.ver files.

Do you see any other issues here? I don't get the duplicate entries
part you are referring to. The static symbol table also contains
references to __cpu_model and __cpu_indicator_init, but that is
expected right?

In libgcc.a:

readelf -sWt /g/tmsriram/GCC_trunk_svn_mv_fe_at_nfs/native_builds/bld1/install/lib/gcc/x86_64-unknown-linux-gnu/libgcc.a
| grep __cpu

   20: 0000000000000010    16 OBJECT  GLOBAL HIDDEN  COM __cpu_model
    21: 0000000000000110   612 FUNC    GLOBAL HIDDEN    4 __cpu_indicator_init

libgcc.a has __cpu_model and __cpu_indicator_init as GLOBAL syms with
HIDDEN visibility. Is this an issue? Is this not needed for static
linking?

Further thoughts:

* It looks like libgcc.a is always linked for both static and dynamic
links. It occurred to me when you brought this up. So, I thought why
not exclude the symbols from libgcc_s.so! Is there any problem here?

Example:

file:test.c

int main ()
{
  return (int) __builtin_cpu_is ("corei7");
}

Case I : Use gcc to build dynamic

$ gcc test.c -Wl,-y,__cpu_model

libgcc.a(cpuinfo.o): reference to __cpu_model
libgcc_s.so: definition of __cpu_model

Case II: Use g++ to build dynamic

$ g++ test.c -Wl,-y,__cpu_model
fe1.o: reference to __cpu_model
libgcc_s.so: definition of __cpu_model

Case III: Use gcc to link static

$ gcc test.c -Wl,-y,__cpu_model -static
fe1.o: reference to __cpu_model
libgcc.a(cpuinfo.o): reference to __cpu_model


Please note that in all 3 cases, libgcc.a was linked in. Hence,
removing these symbols from the dynamic symbol table of libgcc_s.so
should have no issues.
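
For reference, here is a slightly larger variant of test.c that does by hand roughly what the generated dispatcher is meant to do. It is only a sketch: the foo_* functions and pick_foo are hypothetical names, the feature strings are examples, and the x86 guard is an assumption so that the file still builds elsewhere. The CPU builtins are what pull in the __cpu_model reference discussed above.

```c
#include <assert.h>

/* Hypothetical stand-ins for three versions of foo.  */
static int foo_default (void) { return 0; }
static int foo_sse42 (void) { return 4; }
static int foo_avx2 (void) { return 1; }

typedef int (*foo_fn) (void);

/* Pick a version by checking features from highest to lowest
   priority, falling back to the default version.  */
static foo_fn
pick_foo (void)
{
  foo_fn chosen = foo_default;

#if defined (__GNUC__) && (defined (__x86_64__) || defined (__i386__))
  /* Fill in the CPU model data before querying it.  */
  __builtin_cpu_init ();
  if (__builtin_cpu_supports ("avx2"))
    chosen = foo_avx2;
  else if (__builtin_cpu_supports ("sse4.2"))
    chosen = foo_sse42;
#endif

  assert (chosen != 0);
  return chosen;
}
```

Building this with -Wl,-y,__cpu_model, as in the cases above, should show which library ends up providing the definition.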

Thanks,
-Sri.







>
> --
> H.J.

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-05-26 22:35                                                           ` Sriraman Tallam
@ 2012-05-26 23:56                                                             ` H.J. Lu
  2012-05-27  0:24                                                               ` Sriraman Tallam
  0 siblings, 1 reply; 93+ messages in thread
From: H.J. Lu @ 2012-05-26 23:56 UTC (permalink / raw)
  To: Sriraman Tallam
  Cc: Xinliang David Li, Richard Guenther, Jan Hubicka, Uros Bizjak,
	reply, gcc-patches, Ian Lance Taylor

On Sat, May 26, 2012 at 3:34 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> On Fri, May 25, 2012 at 10:05 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Fri, May 25, 2012 at 8:38 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>
>>> On May 25, 2012 7:15 PM, "H.J. Lu" <hjl.tools@gmail.com> wrote:
>>>>
>>>>
>>>> On May 25, 2012 6:54 PM, "Sriraman Tallam" <tmsriram@google.com> wrote:
>>>> >
>>>> >
>>>> > >>
>>>> > >> On Fri, May 25, 2012 at 5:0 > > BTW, I noticed:
>>>>
>>>> > >
>>>> > > [hjl@gnu-6 pr14170]$ readelf -sW libgcc.a | grep __cpu_model
>>>> > >    20: 0000000000000010    16 OBJECT  GLOBAL HIDDEN   COM __cpu_model
>>>> > > [hjl@gnu-6 pr14170]$ readelf -sW libgcc_s.so | grep __cpu_model
>>>> > >    82: 0000000000214ff0    16 OBJECT  GLOBAL DEFAULT   24
>>>> > > __cpu_model@@GCC_4.8.0
>>>> > >   310: 0000000000214ff0    16 OBJECT  GLOBAL DEFAULT   24 __cpu_model
>>>> > > [hjl@gnu-6 pr14170]$
>>>> > >
>>>> > > Why is __cpu_model in both libgcc.a and libgcc_s.so?
>>>> >
>>>> > How do I disallow this in libgcc_s.so? Looks like t-cpuinfo file is
>>>> > wrong but I cannot figure out the fix.
>>>> >
>>>> Why don't you want it in libgcc_s.so?
>>>
>>> I thought libgcc.a is always linked in for static and dynamic builds. So
>>> having it in libgcc_s.so is redundant.
>>>
>>
>> [hjl@gnu-6 pr14170]$ readelf -sW libgcc.a | grep _cpu_
>>    20: 0000000000000010    16 OBJECT  GLOBAL HIDDEN   COM __cpu_model
>>    21: 0000000000000110   612 FUNC    GLOBAL HIDDEN     4 __cpu_indicator_init
>> [hjl@gnu-6 pr14170]$ readelf -sW libgcc_s.so.1 | grep _cpu_
>>    82: 0000000000214ff0    16 OBJECT  GLOBAL DEFAULT   24
>> __cpu_model@@GCC_4.8.0
>>   223: 0000000000002b60   560 FUNC    LOCAL  DEFAULT   11 __cpu_indicator_init
>>   310: 0000000000214ff0    16 OBJECT  GLOBAL DEFAULT   24 __cpu_model
>> [hjl@gnu-6 pr14170]$
>>
>> I think there should be only one copy of __cpu_model in the process.
>> It should be in libgcc_s.so. Why isn't  __cpu_indicator_init exported
>> from libgcc_s.so?
>
> Ok, I am elaborating so that I understand the issue clearly.
>
> The dynamic symbol table of libgcc_s.so:
>
> $ objdump -T libgcc_s.so | grep __cpu
>
> 0000000000015fd0 g    DO .bss   0000000000000010  GCC_4.8.0   __cpu_model
>
> It only has __cpu_model, not __cpu_indicator_init just like you
> pointed out. I will fix this by adding a versioned symbol of
> __cpu_indicator_init to the *.ver files.

That will be great.

> Do you see any other issues here? I dont get the duplicate entries
> part you are referring to. The static symbol table also contains
> references to __cpu_model and __cpu_indicator_init, but that is
> expected right?

Duplication comes from static and dynamic symbol tables.

> In libgcc.a:
>
> readelf -sWt /g/tmsriram/GCC_trunk_svn_mv_fe_at_nfs/native_builds/bld1/install/lib/gcc/x86_64-unknown-linux-gnu/libgcc.a
> | grep __cpu
>
>   20: 0000000000000010    16 OBJECT  GLOBAL HIDDEN  COM __cpu_model
>    21: 0000000000000110   612 FUNC    GLOBAL HIDDEN    4 __cpu_indicator_init
>
> libgcc.a has __cpu_model and __cpu_indicator_init as GLOBAL syms with
> HIDDEN visibility. Is this an issue? Is this not needed for static
> linking?
>
> Further thoughts:
>
> * It looks like libgcc.a is always linked for both static and dynamic
> links. It occurred to me when you brought this up. So, I thought why
> not exclude the symbols from libgcc_s.so! Is there any problem here?
>

You don't want one copy of those 2 symbols in each DSO where
they are used.

-- 
H.J.

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-05-26 23:56                                                             ` H.J. Lu
@ 2012-05-27  0:24                                                               ` Sriraman Tallam
  2012-05-27  2:06                                                                 ` H.J. Lu
  0 siblings, 1 reply; 93+ messages in thread
From: Sriraman Tallam @ 2012-05-27  0:24 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Xinliang David Li, Richard Guenther, Jan Hubicka, Uros Bizjak,
	reply, gcc-patches, Ian Lance Taylor

On Sat, May 26, 2012 at 4:56 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Sat, May 26, 2012 at 3:34 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>> On Fri, May 25, 2012 at 10:05 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Fri, May 25, 2012 at 8:38 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>>
>>>> On May 25, 2012 7:15 PM, "H.J. Lu" <hjl.tools@gmail.com> wrote:
>>>>>
>>>>>
>>>>> On May 25, 2012 6:54 PM, "Sriraman Tallam" <tmsriram@google.com> wrote:
>>>>> >
>>>>> >
>>>>> > >>
>>>>> > >> On Fri, May 25, 2012 at 5:0 > > BTW, I noticed:
>>>>>
>>>>> > >
>>>>> > > [hjl@gnu-6 pr14170]$ readelf -sW libgcc.a | grep __cpu_model
>>>>> > >    20: 0000000000000010    16 OBJECT  GLOBAL HIDDEN   COM __cpu_model
>>>>> > > [hjl@gnu-6 pr14170]$ readelf -sW libgcc_s.so | grep __cpu_model
>>>>> > >    82: 0000000000214ff0    16 OBJECT  GLOBAL DEFAULT   24
>>>>> > > __cpu_model@@GCC_4.8.0
>>>>> > >   310: 0000000000214ff0    16 OBJECT  GLOBAL DEFAULT   24 __cpu_model
>>>>> > > [hjl@gnu-6 pr14170]$
>>>>> > >
>>>>> > > Why is __cpu_model in both libgcc.a and libgcc_s.so?
>>>>> >
>>>>> > How do I disallow this in libgcc_s.so? Looks like t-cpuinfo file is
>>>>> > wrong but I cannot figure out the fix.
>>>>> >
>>>>> Why don't you want it in libgcc_s.so?
>>>>
>>>> I thought libgcc.a is always linked in for static and dynamic builds. So
>>>> having it in libgcc_s.so is redundant.
>>>>
>>>
>>> [hjl@gnu-6 pr14170]$ readelf -sW libgcc.a | grep _cpu_
>>>    20: 0000000000000010    16 OBJECT  GLOBAL HIDDEN   COM __cpu_model
>>>    21: 0000000000000110   612 FUNC    GLOBAL HIDDEN     4 __cpu_indicator_init
>>> [hjl@gnu-6 pr14170]$ readelf -sW libgcc_s.so.1 | grep _cpu_
>>>    82: 0000000000214ff0    16 OBJECT  GLOBAL DEFAULT   24
>>> __cpu_model@@GCC_4.8.0
>>>   223: 0000000000002b60   560 FUNC    LOCAL  DEFAULT   11 __cpu_indicator_init
>>>   310: 0000000000214ff0    16 OBJECT  GLOBAL DEFAULT   24 __cpu_model
>>> [hjl@gnu-6 pr14170]$
>>>
>>> I think there should be only one copy of __cpu_model in the process.
>>> It should be in libgcc_s.so. Why isn't  __cpu_indicator_init exported
>>> from libgcc_s.so?
>>
>> Ok, I am elaborating so that I understand the issue clearly.
>>
>> The dynamic symbol table of libgcc_s.so:
>>
>> $ objdump -T libgcc_s.so | grep __cpu
>>
>> 0000000000015fd0 g    DO .bss   0000000000000010  GCC_4.8.0   __cpu_model
>>
>> It only has __cpu_model, not __cpu_indicator_init just like you
>> pointed out. I will fix this by adding a versioned symbol of
>> __cpu_indicator_init to the *.ver files.
>
> That will be great.
>
>> Do you see any other issues here? I dont get the duplicate entries
>> part you are referring to. The static symbol table also contains
>> references to __cpu_model and __cpu_indicator_init, but that is
>> expected right?
>
> Duplication comes from static and dynamic symbol tables.
>
>> In libgcc.a:
>>
>> readelf -sWt /g/tmsriram/GCC_trunk_svn_mv_fe_at_nfs/native_builds/bld1/install/lib/gcc/x86_64-unknown-linux-gnu/libgcc.a
>> | grep __cpu
>>
>>   20: 0000000000000010    16 OBJECT  GLOBAL HIDDEN  COM __cpu_model
>>    21: 0000000000000110   612 FUNC    GLOBAL HIDDEN    4 __cpu_indicator_init
>>
>> libgcc.a has __cpu_model and __cpu_indicator_init as GLOBAL syms with
>> HIDDEN visibility. Is this an issue? Is this not needed for static
>> linking?
>>
>> Further thoughts:
>>
>> * It looks like libgcc.a is always linked for both static and dynamic
>> links. It occurred to me when you brought this up. So, I thought why
>> not exclude the symbols from libgcc_s.so! Is there any problem here?
>>
>
> You don't want one copy of those 2 symbols in each DSO where
> they are used.

Right, I agree. But this problem exists right now even if libgcc_s.so
is provided with these symbols. Please see example below:

Example:

dso.c
-------

int some_func ()
{
   return (int) __builtin_cpu_is ("corei7");
}

Build with gcc driver:
$ gcc dso.c -fPIC -shared -o dso.so
$ nm dso.so | grep __cpu
0000000000000780 t __cpu_indicator_init
0000000000001e00 b __cpu_model

This DSO is getting its own local copy of __cpu_model. This is fine
functionally but this is not the behaviour you have in mind.

Whereas, if I build with the g++ driver:

$ g++ dso.c -fPIC -shared -o dso.so
$ nm dso.so | grep __cpu
                 U __cpu_model

This is as we would like, __cpu_model is undefined.

The difference is that with the gcc driver, the link line is -lgcc
-lgcc_s, whereas with the g++ driver -lgcc is not even present!

Should I fix the gcc driver instead? This double-standard is not clear to me.

Thanks,
-Sri.










>
> --
> H.J.

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-05-27  0:24                                                               ` Sriraman Tallam
@ 2012-05-27  2:06                                                                 ` H.J. Lu
  2012-05-27  2:23                                                                   ` Sriraman Tallam
  0 siblings, 1 reply; 93+ messages in thread
From: H.J. Lu @ 2012-05-27  2:06 UTC (permalink / raw)
  To: Sriraman Tallam
  Cc: Xinliang David Li, Richard Guenther, Jan Hubicka, Uros Bizjak,
	reply, gcc-patches, Ian Lance Taylor

On Sat, May 26, 2012 at 5:23 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> On Sat, May 26, 2012 at 4:56 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Sat, May 26, 2012 at 3:34 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>> On Fri, May 25, 2012 at 10:05 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>> On Fri, May 25, 2012 at 8:38 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>>>
>>>>> On May 25, 2012 7:15 PM, "H.J. Lu" <hjl.tools@gmail.com> wrote:
>>>>>>
>>>>>>
>>>>>> On May 25, 2012 6:54 PM, "Sriraman Tallam" <tmsriram@google.com> wrote:
>>>>>> >
>>>>>> >
>>>>>> > >>
>>>>>> > >> On Fri, May 25, 2012 at 5:0 > > BTW, I noticed:
>>>>>>
>>>>>> > >
>>>>>> > > [hjl@gnu-6 pr14170]$ readelf -sW libgcc.a | grep __cpu_model
>>>>>> > >    20: 0000000000000010    16 OBJECT  GLOBAL HIDDEN   COM __cpu_model
>>>>>> > > [hjl@gnu-6 pr14170]$ readelf -sW libgcc_s.so | grep __cpu_model
>>>>>> > >    82: 0000000000214ff0    16 OBJECT  GLOBAL DEFAULT   24
>>>>>> > > __cpu_model@@GCC_4.8.0
>>>>>> > >   310: 0000000000214ff0    16 OBJECT  GLOBAL DEFAULT   24 __cpu_model
>>>>>> > > [hjl@gnu-6 pr14170]$
>>>>>> > >
>>>>>> > > Why is __cpu_model in both libgcc.a and libgcc_s.so?
>>>>>> >
>>>>>> > How do I disallow this in libgcc_s.so? Looks like t-cpuinfo file is
>>>>>> > wrong but I cannot figure out the fix.
>>>>>> >
>>>>>> Why don't you want it in libgcc_s.so?
>>>>>
>>>>> I thought libgcc.a is always linked in for static and dynamic builds. So
>>>>> having it in libgcc_s.so is redundant.
>>>>>
>>>>
>>>> [hjl@gnu-6 pr14170]$ readelf -sW libgcc.a | grep _cpu_
>>>>    20: 0000000000000010    16 OBJECT  GLOBAL HIDDEN   COM __cpu_model
>>>>    21: 0000000000000110   612 FUNC    GLOBAL HIDDEN     4 __cpu_indicator_init
>>>> [hjl@gnu-6 pr14170]$ readelf -sW libgcc_s.so.1 | grep _cpu_
>>>>    82: 0000000000214ff0    16 OBJECT  GLOBAL DEFAULT   24
>>>> __cpu_model@@GCC_4.8.0
>>>>   223: 0000000000002b60   560 FUNC    LOCAL  DEFAULT   11 __cpu_indicator_init
>>>>   310: 0000000000214ff0    16 OBJECT  GLOBAL DEFAULT   24 __cpu_model
>>>> [hjl@gnu-6 pr14170]$
>>>>
>>>> I think there should be only one copy of __cpu_model in the process.
>>>> It should be in libgcc_s.so. Why isn't  __cpu_indicator_init exported
>>>> from libgcc_s.so?
>>>
>>> Ok, I am elaborating so that I understand the issue clearly.
>>>
>>> The dynamic symbol table of libgcc_s.so:
>>>
>>> $ objdump -T libgcc_s.so | grep __cpu
>>>
>>> 0000000000015fd0 g    DO .bss   0000000000000010  GCC_4.8.0   __cpu_model
>>>
>>> It only has __cpu_model, not __cpu_indicator_init just like you
>>> pointed out. I will fix this by adding a versioned symbol of
>>> __cpu_indicator_init to the *.ver files.
>>
>> That will be great.
>>
>>> Do you see any other issues here? I dont get the duplicate entries
>>> part you are referring to. The static symbol table also contains
>>> references to __cpu_model and __cpu_indicator_init, but that is
>>> expected right?
>>
>> Duplication comes from static and dynamic symbol tables.
>>
>>> In libgcc.a:
>>>
>>> readelf -sWt /g/tmsriram/GCC_trunk_svn_mv_fe_at_nfs/native_builds/bld1/install/lib/gcc/x86_64-unknown-linux-gnu/libgcc.a
>>> | grep __cpu
>>>
>>>   20: 0000000000000010    16 OBJECT  GLOBAL HIDDEN  COM __cpu_model
>>>    21: 0000000000000110   612 FUNC    GLOBAL HIDDEN    4 __cpu_indicator_init
>>>
>>> libgcc.a has __cpu_model and __cpu_indicator_init as GLOBAL syms with
>>> HIDDEN visibility. Is this an issue? Is this not needed for static
>>> linking?
>>>
>>> Further thoughts:
>>>
>>> * It looks like libgcc.a is always linked for both static and dynamic
>>> links. It occurred to me when you brought this up. So, I thought why
>>> not exclude the symbols from libgcc_s.so! Is there any problem here?
>>>
>>
>> You don't want one copy of those 2 symbols in each DSO where
>> they are used.
>
> Right, I agree. But this problem exists right now even if libgcc_s.so
> is provided with these symbols. Please see example below:
>
> Example:
>
> dso.c
> -------
>
> int some_func ()
> {
>   return (int) __builtin_cpu_is ("corei7");
> }
>
> Build with gcc driver:
> $ gcc dso.c -fPIC -shared -o dso.so
> $ nm dso.so | grep __cpu
> 0000000000000780 t __cpu_indicator_init
> 0000000000001e00 b __cpu_model
>
> This DSO is getting its own local copy of __cpu_model. This is fine
> functionally but this is not the behaviour you have in mind.
>
> whereas, if I build with g++ driver:
>
> $ g++ dso.c -fPIC -shared -o dso.so
> $ nm dso.so | grep __cpu
>                 U __cpu_model
>
> This is as we would like, __cpu_model is undefined.
>
> The difference is that with the gcc driver, the link line is -lgcc
> -lgcc_s, whereas with the g++ driver -lgcc is not even present!
>
> Should I fix the gcc driver instead? This double-standard is not clear to me.
>

That is because libgcc_s.so is preferred by g++. We can do one
of 3 things:

1. Abuse libgcc_eh.a by moving __cpu_model and __cpu_indicator_init
from libgcc.a to libgcc_eh.a.
2. Rename libgcc_eh.a to libgcc_static.a and move __cpu_model and
__cpu_indicator_init from libgcc.a to libgcc_static.a.
3. Add libgcc_static.a and move __cpu_model and __cpu_indicator_init
from libgcc.a to libgcc_static.a.  We treat libgcc_static.a similar to
libgcc_eh.a.


-- 
H.J.

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-05-27  2:06                                                                 ` H.J. Lu
@ 2012-05-27  2:23                                                                   ` Sriraman Tallam
  2012-05-27  2:31                                                                     ` H.J. Lu
  2012-05-27 19:02                                                                     ` Ian Lance Taylor
  0 siblings, 2 replies; 93+ messages in thread
From: Sriraman Tallam @ 2012-05-27  2:23 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Xinliang David Li, Richard Guenther, Jan Hubicka, Uros Bizjak,
	reply, gcc-patches, Ian Lance Taylor

On Sat, May 26, 2012 at 7:06 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Sat, May 26, 2012 at 5:23 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>> On Sat, May 26, 2012 at 4:56 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Sat, May 26, 2012 at 3:34 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>> On Fri, May 25, 2012 at 10:05 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>> On Fri, May 25, 2012 at 8:38 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>>>>
>>>>>> On May 25, 2012 7:15 PM, "H.J. Lu" <hjl.tools@gmail.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>> On May 25, 2012 6:54 PM, "Sriraman Tallam" <tmsriram@google.com> wrote:
>>>>>>> >
>>>>>>> >
>>>>>>> > >>
>>>>>>> > >> On Fri, May 25, 2012 at 5:0 > > BTW, I noticed:
>>>>>>>
>>>>>>> > >
>>>>>>> > > [hjl@gnu-6 pr14170]$ readelf -sW libgcc.a | grep __cpu_model
>>>>>>> > >    20: 0000000000000010    16 OBJECT  GLOBAL HIDDEN   COM __cpu_model
>>>>>>> > > [hjl@gnu-6 pr14170]$ readelf -sW libgcc_s.so | grep __cpu_model
>>>>>>> > >    82: 0000000000214ff0    16 OBJECT  GLOBAL DEFAULT   24
>>>>>>> > > __cpu_model@@GCC_4.8.0
>>>>>>> > >   310: 0000000000214ff0    16 OBJECT  GLOBAL DEFAULT   24 __cpu_model
>>>>>>> > > [hjl@gnu-6 pr14170]$
>>>>>>> > >
>>>>>>> > > Why is __cpu_model in both libgcc.a and libgcc_s.so?
>>>>>>> >
>>>>>>> > How do I disallow this in libgcc_s.so? Looks like t-cpuinfo file is
>>>>>>> > wrong but I cannot figure out the fix.
>>>>>>> >
>>>>>>> Why don't you want it in libgcc_s.so?
>>>>>>
>>>>>> I thought libgcc.a is always linked in for static and dynamic builds. So
>>>>>> having it in libgcc_s.so is redundant.
>>>>>>
>>>>>
>>>>> [hjl@gnu-6 pr14170]$ readelf -sW libgcc.a | grep _cpu_
>>>>>    20: 0000000000000010    16 OBJECT  GLOBAL HIDDEN   COM __cpu_model
>>>>>    21: 0000000000000110   612 FUNC    GLOBAL HIDDEN     4 __cpu_indicator_init
>>>>> [hjl@gnu-6 pr14170]$ readelf -sW libgcc_s.so.1 | grep _cpu_
>>>>>    82: 0000000000214ff0    16 OBJECT  GLOBAL DEFAULT   24
>>>>> __cpu_model@@GCC_4.8.0
>>>>>   223: 0000000000002b60   560 FUNC    LOCAL  DEFAULT   11 __cpu_indicator_init
>>>>>   310: 0000000000214ff0    16 OBJECT  GLOBAL DEFAULT   24 __cpu_model
>>>>> [hjl@gnu-6 pr14170]$
>>>>>
>>>>> I think there should be only one copy of __cpu_model in the process.
>>>>> It should be in libgcc_s.so. Why isn't  __cpu_indicator_init exported
>>>>> from libgcc_s.so?
>>>>
>>>> Ok, I am elaborating so that I understand the issue clearly.
>>>>
>>>> The dynamic symbol table of libgcc_s.so:
>>>>
>>>> $ objdump -T libgcc_s.so | grep __cpu
>>>>
>>>> 0000000000015fd0 g    DO .bss   0000000000000010  GCC_4.8.0   __cpu_model
>>>>
>>>> It only has __cpu_model, not __cpu_indicator_init just like you
>>>> pointed out. I will fix this by adding a versioned symbol of
>>>> __cpu_indicator_init to the *.ver files.
>>>
>>> That will be great.
>>>
>>>> Do you see any other issues here? I dont get the duplicate entries
>>>> part you are referring to. The static symbol table also contains
>>>> references to __cpu_model and __cpu_indicator_init, but that is
>>>> expected right?
>>>
>>> Duplication comes from static and dynamic symbol tables.
>>>
>>>> In libgcc.a:
>>>>
>>>> readelf -sWt /g/tmsriram/GCC_trunk_svn_mv_fe_at_nfs/native_builds/bld1/install/lib/gcc/x86_64-unknown-linux-gnu/libgcc.a
>>>> | grep __cpu
>>>>
>>>>   20: 0000000000000010    16 OBJECT  GLOBAL HIDDEN  COM __cpu_model
>>>>    21: 0000000000000110   612 FUNC    GLOBAL HIDDEN    4 __cpu_indicator_init
>>>>
>>>> libgcc.a has __cpu_model and __cpu_indicator_init as GLOBAL syms with
>>>> HIDDEN visibility. Is this an issue? Is this not needed for static
>>>> linking?
>>>>
>>>> Further thoughts:
>>>>
>>>> * It looks like libgcc.a is always linked for both static and dynamic
>>>> links. It occurred to me when you brought this up. So, I thought why
>>>> not exclude the symbols from libgcc_s.so! Is there any problem here?
>>>>
>>>
>>> You don't want one copy of those 2 symbols in each DSO where
>>> they are used.
>>
>> Right, I agree. But this problem exists right now even if libgcc_s.so
>> is provided with these symbols. Please see example below:
>>
>> Example:
>>
>> dso.c
>> -------
>>
>> int some_func ()
>> {
>>   return (int) __builtin_cpu_is ("corei7");
>> }
>>
>> Build with gcc driver:
>> $ gcc dso.c -fPIC -shared -o dso.so
>> $ nm dso.so | grep __cpu
>> 0000000000000780 t __cpu_indicator_init
>> 0000000000001e00 b __cpu_model
>>
>> This DSO is getting its own local copy of __cpu_model. This is fine
>> functionally but this is not the behaviour you have in mind.
>>
>> whereas, if I build with g++ driver:
>>
>> $ g++ dso.c -fPIC -shared -o dso.so
>> $ nm dso.so | grep __cpu
>>                 U __cpu_model
>>
>> This is as we would like, __cpu_model is undefined.
>>
>> The difference is that with the gcc driver, the link line is -lgcc
>> -lgcc_s, whereas with the g++ driver -lgcc is not even present!
>>
>> Should I fix the gcc driver instead? This double-standard is not clear to me.
>>
>
> That is because libgcc_s.so is preferred by g++. We can do one
> of 3 things:
>
> 1. Abuse libgcc_eh.a by moving __cpu_model and __cpu_indicator_init
> from libgcc.a to libgcc_eh.a.
> 2. Rename libgcc_eh.a to libgcc_static.a and move __cpu_model and
> __cpu_indicator_init from libgcc.a to libgcc_static.a.
> 3. Add libgcc_static.a and move __cpu_model and __cpu_indicator_init
> from libgcc.a to libgcc_static.a.  We treat libgcc_static.a similar to
> libgcc_eh.a.

Any reason why gcc should not be made to prefer libgcc_s.so too like g++?

Thanks for clearing this up. I will take a stab at it.

-Sri.

>
>
> --
> H.J.

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-05-27  2:23                                                                   ` Sriraman Tallam
@ 2012-05-27  2:31                                                                     ` H.J. Lu
  2012-05-27 19:02                                                                     ` Ian Lance Taylor
  1 sibling, 0 replies; 93+ messages in thread
From: H.J. Lu @ 2012-05-27  2:31 UTC (permalink / raw)
  To: Sriraman Tallam
  Cc: Xinliang David Li, Richard Guenther, Jan Hubicka, Uros Bizjak,
	reply, gcc-patches, Ian Lance Taylor

On Sat, May 26, 2012 at 7:23 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>
>> That is because libgcc_s.so is preferred by g++. We can do one
>> of 3 things:
>>
>> 1. Abuse libgcc_eh.a by moving __cpu_model and __cpu_indicator_init
>> from libgcc.a to libgcc_eh.a.
>> 2. Rename libgcc_eh.a to libgcc_static.a and move __cpu_model and
>> __cpu_indicator_init from libgcc.a to libgcc_static.a.
>> 3. Add libgcc_static.a and move __cpu_model and __cpu_indicator_init
>> from libgcc.a to libgcc_static.a.  We treat libgcc_static.a similar to
>> libgcc_eh.a.
>
> Any reason why gcc should not be made to prefer libgcc_s.so too like g++?
>
> Thanks for clearing this up. I will take a stab at it.
>

This is a long story.  The short answer is people didn't want
to add libgcc_s.so to DT_NEEDED for C programs.  But
it is no longer an issue since we now pass

 -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed
-lgcc_s --no-as-needed

to the linker.


-- 
H.J.

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-05-27  2:23                                                                   ` Sriraman Tallam
  2012-05-27  2:31                                                                     ` H.J. Lu
@ 2012-05-27 19:02                                                                     ` Ian Lance Taylor
  1 sibling, 0 replies; 93+ messages in thread
From: Ian Lance Taylor @ 2012-05-27 19:02 UTC (permalink / raw)
  To: Sriraman Tallam
  Cc: H.J. Lu, Xinliang David Li, Richard Guenther, Jan Hubicka,
	Uros Bizjak, reply, gcc-patches

Sriraman Tallam <tmsriram@google.com> writes:

> Any reason why gcc should not be made to prefer libgcc_s.so too like g++?

It's controlled by the -shared-libgcc and -static-libgcc options.

The -shared-libgcc option is the default for g++ because several years
ago a shared libgcc was required to make exception handling work
correctly when exceptions were thrown across shared library boundaries.
That is no longer true when using GNU ld or gold on a GNU/Linux system,
but it is still true on some systems.

The -static-libgcc option is the default for gcc because the assumption
is that most C programs do not throw exceptions.  The -shared-libgcc
option is available for those that do.

Ian

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-05-14 18:29                                           ` Sriraman Tallam
  2012-05-26  0:07                                             ` H.J. Lu
@ 2012-06-04 19:01                                             ` Sriraman Tallam
  2012-06-04 21:36                                               ` H.J. Lu
  2012-06-14 20:35                                               ` Sriraman Tallam
  1 sibling, 2 replies; 93+ messages in thread
From: Sriraman Tallam @ 2012-06-04 19:01 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Richard Guenther, Jan Hubicka, Uros Bizjak, reply, gcc-patches, David Li

[-- Attachment #1: Type: text/plain, Size: 2030 bytes --]

Hi,

   Attaching updated patch for function multiversioning which brings
in plenty of changes.

* As suggested by Richard earlier, I have made cgraph aware of
function versions. All nodes of function versions are chained and the
dispatcher bodies are created on demand while building cgraph edges.
The dispatcher body will be created if and only if there is a call or
reference to a versioned function. Previously, I was maintaining the
list of versions separately in a hash map; all that is gone now.
* Now, the file multiversion.c has some helper routines that are used
in the context of function versioning. There are no new passes and no
new globals.
* More tests, updated existing tests.
* Fixed lots of bugs.
* Updated patch description.

Patch attached. Patch also available for review at
http://codereview.appspot.com/5752064

Please let me know what you think,

Thanks,
-Sri.


On Mon, May 14, 2012 at 11:28 AM, Sriraman Tallam <tmsriram@google.com> wrote:
> Hi H.J,
>
>   Attaching new patch with 2 test cases, mv2.C checks ISAs only and
> mv1.C checks ISAs and arches mixed. Right now, checking only arches is
> not needed as they are mutually exclusive, any order should be fine.
>
> Patch also available for review here:  http://codereview.appspot.com/5752064
>
> Thanks,
> -Sri.
>
> On Sat, May 12, 2012 at 6:37 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Fri, May 11, 2012 at 7:04 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>> Hi H.J.,
>>>
>>>   I have updated the patch to improve the dispatching method like we
>>> discussed. Each feature gets a priority now, and the dispatching is
>>> done in priority order. Please see i386.c for the changes.
>>>
>>> Patch also available for review here:  http://codereview.appspot.com/5752064
>>>
>>
>> I think you need 3 tests:
>>
>> 1.  Only with ISA.
>> 2.  Only with arch
>> 3.  Mixed with ISA and arch
>>
>> since test mixed ISA and arch may hide issues with ISA only or arch only.
>>
>> --
>> H.J.

[-- Attachment #2: mv_fe_patch.txt --]
[-- Type: text/plain, Size: 74230 bytes --]


Overview of the patch which adds front-end support to specify function versions.

Example:

int foo ();  /* Default version */
int foo () __attribute__ ((target("avx,popcnt")));/*Specialized for avx and popcnt */
int foo () __attribute__ ((target("arch=core2,ssse3")));/*Specialized for core2 and ssse3*/

int main ()
{
 int (*p)() = &foo;
 return foo () + (*p)();
}

int foo ()
{
 return 0;
}

int __attribute__ ((target("avx,popcnt")))
foo ()
{
 return 0;
}

int __attribute__ ((target("arch=core2,ssse3")))
foo ()
{
 return 0;
}

The above example has foo defined 3 times, but all 3 definitions of foo are
different versions of the same function. The call to foo in main, directly and
via a pointer, are calls to the multi-versioned function foo which is dispatched
to the right foo at run-time.
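
For comparison, here is a rough hand-written analogue of the dispatch that the
patch automates. It is only a sketch: the names foo_default, foo_popcnt and
select_foo are hypothetical, the selection uses the GCC x86 builtins
__builtin_cpu_init and __builtin_cpu_supports, and it goes through a cached
function pointer rather than through the dispatcher the patch generates. On
non-x86 targets the sketch simply falls back to the default version.

```c
/* Hand-written analogue of compiler-generated version dispatch.
   All names below are hypothetical and exist only for illustration.  */

typedef int (*foo_fn) (void);

static int foo_default (void) { return 0; }  /* Default version.  */
static int foo_popcnt (void) { return 1; }   /* Stand-in for a popcnt version.  */

/* Pick a version based on what the running CPU supports.  */
static foo_fn
select_foo (void)
{
  int use_popcnt = 0;
#if defined (__GNUC__) && (defined (__x86_64__) || defined (__i386__))
  __builtin_cpu_init ();
  use_popcnt = __builtin_cpu_supports ("popcnt");
#endif
  return use_popcnt ? foo_popcnt : foo_default;
}

/* Callers use this wrapper, much as they would call the dispatcher.  */
int
foo (void)
{
  static foo_fn chosen;  /* Cached after the first call.  */
  if (!chosen)
    chosen = select_foo ();
  return chosen ();
}
```

Unlike the real example above, the two versions here return different values
so that the selected version is observable by the caller.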

What does the patch do?

* Tracking decls that correspond to function versions of function
name, say "foo":

When the front-end sees more than one decl for "foo", with at least one decl
tagged with "target" attributes, it marks them as function versions. To
prevent duplicate definition errors with other versions of "foo", the
"decls_match" function in cp/decl.c is made to return false when 2 decls have
the same signature but different target attributes. This causes all function
versions of "foo" to be added to the overload list of "foo".

* Change the assembler names of the function versions.

The front-end changes the assembler names of the function versions by suffixing
the sorted list of args to "target" to the function name of "foo". For example,
the assembler name of "void foo () __attribute__ ((target ("sse4")))" will
become _Z3foov.sse4.
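
The renaming rule can be sketched in plain C as follows. This is not the
code from the patch: versioned_asm_name is a hypothetical helper, the fixed
buffer sizes are arbitrary, and joining multiple sorted args with "," is an
assumption, since the description above only shows the single-argument case.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C strings.  */
static int
cmp_str (const void *a, const void *b)
{
  return strcmp (*(char *const *) a, *(char *const *) b);
}

/* Write "<mangled>.<sorted target args>" into OUT (OUT_SIZE bytes).
   Returns 0 on success, -1 if something does not fit.
   ASSUMPTION: multiple args are joined with ','.  */
int
versioned_asm_name (const char *mangled, const char *target_args,
                    char *out, size_t out_size)
{
  char copy[128];
  char *parts[16];
  size_t n = 0;

  if (strlen (target_args) >= sizeof copy)
    return -1;
  strcpy (copy, target_args);

  /* Split the attribute string on commas.  */
  for (char *tok = strtok (copy, ","); tok; tok = strtok (NULL, ","))
    {
      if (n == sizeof parts / sizeof parts[0])
        return -1;
      parts[n++] = tok;
    }
  qsort (parts, n, sizeof parts[0], cmp_str);

  int len = snprintf (out, out_size, "%s.", mangled);
  if (len < 0 || (size_t) len >= out_size)
    return -1;
  size_t used = (size_t) len;

  /* Append the sorted args after the '.' suffix marker.  */
  for (size_t i = 0; i < n; i++)
    {
      len = snprintf (out + used, out_size - used, "%s%s",
                      i ? "," : "", parts[i]);
      if (len < 0 || (size_t) len >= out_size - used)
        return -1;
      used += (size_t) len;
    }
  return 0;
}
```

With "_Z3foov" and "sse4" this produces _Z3foov.sse4, matching the example
above. Sorting means that "ssse3,arch=core2" and "arch=core2,ssse3" give the
same name.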

* Overload resolution:

 Function "build_over_call" in cp/call.c sees a call to function
"foo", which is multi-versioned. The overload resolution happens in
function "joust" in "cp/call.c". Here, the call to "foo" has all
possible versions of "foo" as candidates. All the candidates of "foo" are
stored in the cgraph data structures. Each version of foo is chained in a 
doubly-linked list with the default function as the first element.  This allows
any pass to access all the semantically identical versions. Also, a dispatcher
decl is created; calls go to it, and at run-time it dispatches to the right
function version.
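
The chaining described above can be pictured with a small stand-alone
sketch. struct version_node is a hypothetical stand-in for a cgraph node;
only the field names prev_function_version and next_function_version are
taken from the ChangeLog below, and the helper functions are invented for
illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for a call-graph node.  */
struct version_node
{
  const char *asm_name;
  struct version_node *prev_function_version;
  struct version_node *next_function_version;
};

/* Append NODE to the chain whose first element is DEFAULT_NODE.  */
void
chain_version (struct version_node *default_node, struct version_node *node)
{
  struct version_node *last = default_node;
  while (last->next_function_version)
    last = last->next_function_version;
  last->next_function_version = node;
  node->prev_function_version = last;
  node->next_function_version = NULL;
}

/* From any version, walk back to the default at the head of the chain.  */
struct version_node *
default_version (struct version_node *node)
{
  while (node->prev_function_version)
    node = node->prev_function_version;
  return node;
}

/* Count all versions reachable from any member of the chain.  */
size_t
count_versions (struct version_node *node)
{
  size_t n = 0;
  for (node = default_version (node); node; node = node->next_function_version)
    n++;
  return n;
}
```

Because the default version is always first, a pass that holds any one
version can reach the default by following prev_function_version and then
enumerate every other version from there.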

Also, in joust, where overload resolution happens, a multiversioned function
resolution is made to return the most specialized version.  This is the version
that will be checked for dispatching first and is determined by the target.
Now, if the caller can inline this function version then a direct call is made
to this function version rather than go through the dispatcher. When a direct
call cannot be made, a call to the dispatcher function is created.

* Creating the dispatcher body.

The dispatcher body, called the resolver, is made only when there is a call to a
multiversioned function dispatcher or the address of a function is taken. This
is generated during build_cgraph_edges for a call or cgraph_mark_address_taken
for a pointer reference.

* Dispatch ordering.

The order in which the function versions are checked during dispatch is based
on a priority value assigned to the ISA that each version targets. More
specialized versions are checked for dispatching first.  This is to mitigate the
ambiguity that can arise when more than one function version is valid for
execution on a particular platform.  This is not a perfect solution and in the
future, the user should be allowed to assign a dispatching priority value to
each version.
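
A minimal model of priority-ordered dispatch is sketched below. It is not the
i386.c implementation: the feature bits, the priority numbers, struct version
and pick_version are all made up for illustration, and the CPU feature set is
passed in as a plain bitmask instead of being read from the hardware.

```c
#include <stdlib.h>

typedef int (*version_fn) (void);

/* Hypothetical feature bits.  */
enum { FEAT_SSSE3 = 1 << 0, FEAT_POPCNT = 1 << 1, FEAT_AVX = 1 << 2 };

struct version
{
  unsigned required;  /* Feature bits this version needs.  */
  int priority;       /* Higher means more specialized (made-up values).  */
  version_fn fn;
};

/* Example versions; the return value identifies which one ran.  */
int ver_default (void) { return 0; }
int ver_ssse3 (void) { return 1; }
int ver_avx_popcnt (void) { return 2; }

/* Sort helper: highest priority first.  */
static int
by_priority_desc (const void *a, const void *b)
{
  const struct version *va = a, *vb = b;
  return (vb->priority > va->priority) - (vb->priority < va->priority);
}

/* Check versions in priority order and return the first one whose
   requirements are all present in CPU_FEATURES, else FALLBACK.  */
version_fn
pick_version (struct version *versions, size_t n, unsigned cpu_features,
              version_fn fallback)
{
  qsort (versions, n, sizeof versions[0], by_priority_desc);
  for (size_t i = 0; i < n; i++)
    if ((versions[i].required & cpu_features) == versions[i].required)
      return versions[i].fn;
  return fallback;
}
```

On a CPU that has ssse3, avx and popcnt, both non-default versions are valid;
checking in priority order makes the choice deterministic, which is the
ambiguity the ordering is meant to mitigate.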


	* doc/tm.texi.in (TARGET_DISPATCH_VERSION): New hook description.
	(TARGET_COMPARE_VERSIONS): New hook description.
	* doc/tm.texi: Regenerate.
	* cgraphbuild.c (build_cgraph_edges): Generate body of multiversion
	function dispatcher.
	* c-family/c-common.c (handle_target_attribute): Always keep target
	attributes tagged.
	* target.def (dispatch_version): New target hook.
	(compare_versions): New hook.
	* cgraph.c (cgraph_mark_address_taken_node): Generate body of multiversion
	function dispatcher.
	* cgraph.h (cgraph_node): New members dispatcher_fndecl, resolver_fndecl,
	prev_function_version, next_function_version, dispatcher_function.
	(is_default_function_version): New function.
	(mark_function_as_version): New function.
	(has_different_version_attributes): New function.
	(function_target_attribute): New function.
	(build_dispatcher_for_function_versions): New function.
	(build_resolver_for_function_versions): New function.
	* tree.h (DECL_FUNCTION_VERSIONED): New macro.
	(tree_function_decl): New bit-field versioned_function.
	* multiversion.c: New file.
	* testsuite/g++.dg/mv1.C: New test.
	* testsuite/g++.dg/mv2.C: New test.
	* testsuite/g++.dg/mv3.C: New test.
	* testsuite/g++.dg/mv4.C: New test.
	* cp/class.c (add_method): Change assembler names of function versions.
	(resolve_address_of_overloaded_function): Save all function
	version candidates. Create dispatcher decl and return address of
	dispatcher instead.
	* cp/decl.c (decls_match): Make decls unmatched for versioned
	functions.
	(duplicate_decls): Remove ambiguity for versioned functions. 
	(cxx_comdat_group): Make comdat group of versioned functions be the
	same.
	* cp/error.c (dump_exception_spec): Dump assembler name for function
	versions.
	* cp/semantics.c (expand_or_defer_fn_1): Mark as needed versioned
	functions that are also marked inline.
	* cp/decl2.c (check_classfn): Check attributes of versioned functions
	for match.
	* cp/call.c (build_new_function_call): Check if versioned functions
	have a default version.
	(build_over_call): Make calls to multiversioned functions
	to call the dispatcher.
	(joust): For calls to multi-versioned functions, make the most
	specialized function version win.
	(tourney): Generate dispatcher decl for function versions.
	* cp/mangle.c (write_unqualified_name): Use assembler name for
	versioned functions.
	* Makefile.in: Add multiversion.o.
	* config/i386/i386.c (add_condition_to_bb): New function.
	(get_builtin_code_for_version): New function.
	(ix86_compare_versions): New function.
	(feature_compare): New function.
	(ix86_dispatch_version): New function.
	(TARGET_DISPATCH_VERSION): New macro.
	(TARGET_COMPARE_VERSIONS): New macro.

Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi	(revision 187817)
+++ gcc/doc/tm.texi	(working copy)
@@ -10997,6 +10997,21 @@ The result is another tree containing a simplified
 call's result.  If @var{ignore} is true the value will be ignored.
 @end deftypefn
 
+@deftypefn {Target Hook} int TARGET_DISPATCH_VERSION (tree @var{dispatch_decl}, void *@var{fndecls}, basic_block *@var{empty_bb})
+For a multi-versioned function, this hook sets up the dispatcher.
+@var{dispatch_decl} is the function that will be used to dispatch the
+version.  @var{fndecls} are the function choices for dispatch.
+@var{empty_bb} is a basic block in @var{dispatch_decl} where the
+code to do the dispatch will be added.
+@end deftypefn
+
+@deftypefn {Target Hook} int TARGET_COMPARE_VERSIONS (tree @var{decl1}, tree @var{decl2})
+This hook is used to compare the target attributes in two functions to
+figure out which function's features get higher priority.  This is used
+during multi-versioning to figure out the order of dispatching. @var{decl1}
+and @var{decl2} are the two function decls that will be compared.
+@end deftypefn
+
 @deftypefn {Target Hook} {const char *} TARGET_INVALID_WITHIN_DOLOOP (const_rtx @var{insn})
 
 Take an instruction in @var{insn} and return NULL if it is valid within a
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in	(revision 187817)
+++ gcc/doc/tm.texi.in	(working copy)
@@ -10877,6 +10877,21 @@ The result is another tree containing a simplified
 call's result.  If @var{ignore} is true the value will be ignored.
 @end deftypefn
 
+@hook TARGET_DISPATCH_VERSION
+For a multi-versioned function, this hook sets up the dispatcher.
+@var{dispatch_decl} is the function that will be used to dispatch the
+version.  @var{fndecls} are the function choices for dispatch.
+@var{empty_bb} is a basic block in @var{dispatch_decl} where the
+code to do the dispatch will be added.
+@end deftypefn
+
+@hook TARGET_COMPARE_VERSIONS
+This hook is used to compare the target attributes in two functions to
+figure out which function's features get higher priority.  This is used
+during multi-versioning to figure out the order of dispatching. @var{decl1}
+and @var{decl2} are the two function decls that will be compared.
+@end deftypefn
+
 @hook TARGET_INVALID_WITHIN_DOLOOP
 
 Take an instruction in @var{insn} and return NULL if it is valid within a
Index: gcc/cgraphbuild.c
===================================================================
--- gcc/cgraphbuild.c	(revision 187817)
+++ gcc/cgraphbuild.c	(working copy)
@@ -288,7 +288,6 @@ mark_store (gimple stmt, tree t, void *data)
      }
   return false;
 }
-
 /* Create cgraph edges for function calls.
    Also look for functions and variables having addresses taken.  */
 
@@ -316,6 +315,20 @@ build_cgraph_edges (void)
 	      int freq = compute_call_stmt_bb_frequency (current_function_decl,
 							 bb);
 	      decl = gimple_call_fndecl (stmt);
+	      /* If a call to a multiversioned function dispatcher is found,
+		 generate the body to dispatch the right function
+		 at run-time.  */
+	      if (decl && cgraph_get_node (decl)
+		  && cgraph_get_node (decl)->dispatcher_function)
+		{
+		  tree resolver_decl;
+		  struct cgraph_node *curr_node = cgraph_get_node (decl);
+		  gcc_assert (curr_node->next_function_version);
+		  resolver_decl
+		    = build_resolver_for_function_versions (curr_node);
+		  gcc_assert (resolver_decl);
+		}
+
 	      if (decl)
 		cgraph_create_edge (node, cgraph_get_create_node (decl),
 				    stmt, bb->count, freq);
Index: gcc/c-family/c-common.c
===================================================================
--- gcc/c-family/c-common.c	(revision 187817)
+++ gcc/c-family/c-common.c	(working copy)
@@ -8246,9 +8246,15 @@ handle_target_attribute (tree *node, tree name, tr
       warning (OPT_Wattributes, "%qE attribute ignored", name);
       *no_add_attrs = true;
     }
-  else if (! targetm.target_option.valid_attribute_p (*node, name, args,
-						      flags))
-    *no_add_attrs = true;
+  else
+    {
+      /* When a target attribute is invalid, it may also be because the
+	 target for the compilation unit and the attribute match.  For
+         instance, target attribute "xxx" is invalid when -mxxx is used.
+         When used with multiversioning, removing the attribute can lead
+         to duplicate definitions.  So, keep the attribute tagged.  */
+      targetm.target_option.valid_attribute_p (*node, name, args, flags);
+    }
 
   return NULL_TREE;
 }
Index: gcc/target.def
===================================================================
--- gcc/target.def	(revision 187817)
+++ gcc/target.def	(working copy)
@@ -1249,6 +1249,24 @@ DEFHOOK
  tree, (tree fndecl, int n_args, tree *argp, bool ignore),
  hook_tree_tree_int_treep_bool_null)
 
+/* Target hook to generate the dispatching code for calls to multi-versioned
+   functions.  DISPATCH_DECL is the function that will have the dispatching
+   logic.  FNDECLS are the list of choices for dispatch and EMPTY_BB is the
+   basic block in DISPATCH_DECL which will contain the code.  */
+DEFHOOK
+(dispatch_version,
+ "",
+ int, (tree dispatch_decl, void *fndecls, basic_block *empty_bb), NULL)
+
+/* Target hook to compare the priority of target features in function DECL1
+   and DECL2.  It returns positive value if DECL1 is higher priority,
+   negative value if DECL2 is higher priority and 0 if they are the
+   same.  */
+DEFHOOK
+(compare_versions,
+ "",
+ int, (tree decl1, tree decl2), NULL)
+
 /* Returns a code for a target-specific builtin that implements
    reciprocal of the function, or NULL_TREE if not available.  */
 DEFHOOK
Index: gcc/cgraph.c
===================================================================
--- gcc/cgraph.c	(revision 187817)
+++ gcc/cgraph.c	(working copy)
@@ -1278,6 +1278,14 @@ cgraph_mark_address_taken_node (struct cgraph_node
   node->symbol.address_taken = 1;
   node = cgraph_function_or_thunk_node (node, NULL);
   node->symbol.address_taken = 1;
+  /* If the address of a multiversioned function dispatcher is taken,
+     generate the body to dispatch the right function at run-time.  This
+     is needed as the address can be used to do an indirect call.  */
+  if (node->dispatcher_function)
+    {
+      gcc_assert (node->next_function_version);
+      build_resolver_for_function_versions (node);
+    }
 }
 
 /* Return local info for the compiled function.  */
Index: gcc/cgraph.h
===================================================================
--- gcc/cgraph.h	(revision 187817)
+++ gcc/cgraph.h	(working copy)
@@ -220,6 +220,19 @@ struct GTY(()) cgraph_node {
   struct cgraph_node *prev_sibling_clone;
   struct cgraph_node *clones;
   struct cgraph_node *clone_of;
+
+  /* If this node corresponds to a function version, this points
+     to the dispatcher function.  */
+  tree dispatcher_fndecl;
+  /* If this node is a dispatcher for function versions, this points
+     to the resolver function.  */
+  tree resolver_fndecl;
+  /* Chains all the semantically identical function versions.  The
+     first function in this chain is the default function.  */
+  struct cgraph_node *prev_function_version;
+  /* If this node is a dispatcher for function versions, this also points
+     to the default function version.  */
+  struct cgraph_node *next_function_version;
   /* For functions with many calls sites it holds map from call expression
      to the edge to speed up cgraph_edge function.  */
   htab_t GTY((param_is (struct cgraph_edge))) call_site_hash;
@@ -271,6 +284,7 @@ struct GTY(()) cgraph_node {
   /* ?? We should be able to remove this.  We have enough bits in
      cgraph to calculate it.  */
   unsigned tm_clone : 1;
+  unsigned dispatcher_function : 1;
 };
 
 typedef struct cgraph_node *cgraph_node_ptr;
@@ -636,6 +650,22 @@ void cgraph_rebuild_references (void);
 int compute_call_stmt_bb_frequency (tree, basic_block bb);
 void record_references_in_initializer (tree, bool);
 
+/* In multiversion.c  */
+/* Returns true if DECL is a function version and is the default version.  */
+bool is_default_function_version (tree decl);
+void mark_function_as_version (tree);
+/* Returns true if the "target" attribute strings of DECL1 and DECL2
+   don't match.  */
+bool has_different_version_attributes (const tree decl1, const tree decl2);
+/* Return the target attribute if decl is FUNCTION_DECL. */
+tree function_target_attribute (const tree decl);
+/* Builds the dispatcher decl for function versions in VEC.  */
+tree build_dispatcher_for_function_versions (VEC (tree,heap) *vec);
+/* Builds the resolver function which picks the right function version at
+   run-time.  NODE is the cgraph node of the dispatcher which points to
+   the various function versions to be resolved.  */
+tree build_resolver_for_function_versions (struct cgraph_node *node);
+
 /* In ipa.c  */
 bool symtab_remove_unreachable_nodes (bool, FILE *);
 cgraph_node_set cgraph_node_set_new (void);
Index: gcc/tree.h
===================================================================
--- gcc/tree.h	(revision 187817)
+++ gcc/tree.h	(working copy)
@@ -3534,6 +3534,12 @@ extern VEC(tree, gc) **decl_debug_args_insert (tre
 #define DECL_FUNCTION_SPECIFIC_OPTIMIZATION(NODE) \
    (FUNCTION_DECL_CHECK (NODE)->function_decl.function_specific_optimization)
 
+/* In FUNCTION_DECL, this is set if this function has other versions generated
+   using "target" attributes.  The default version is the one which does not
+   have any "target" attribute set. */
+#define DECL_FUNCTION_VERSIONED(NODE)\
+   (FUNCTION_DECL_CHECK (NODE)->function_decl.versioned_function)
+
 /* FUNCTION_DECL inherits from DECL_NON_COMMON because of the use of the
    arguments/result/saved_tree fields by front ends.   It was either inherit
    FUNCTION_DECL from non_common, or inherit non_common from FUNCTION_DECL,
@@ -3578,8 +3584,8 @@ struct GTY(()) tree_function_decl {
   unsigned looping_const_or_pure_flag : 1;
   unsigned has_debug_args_flag : 1;
   unsigned tm_clone_flag : 1;
-
-  /* 1 bit left */
+  unsigned versioned_function : 1;
+  /* No bits left.  */
 };
 
 /* The source language of the translation-unit.  */
Index: gcc/multiversion.c
===================================================================
--- gcc/multiversion.c	(revision 0)
+++ gcc/multiversion.c	(revision 0)
@@ -0,0 +1,572 @@
+/* Function Multiversioning.
+   Copyright (C) 2012 Free Software Foundation, Inc.
+   Contributed by Sriraman Tallam (tmsriram@google.com)
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>. */
+
+/* This file contains routines for handling multiversioned functions.
+
+   Function versions are created by using the same function signature but
+   also tagging attribute "target" to specify the platform type for which
+   the version must be executed.  Here is an example:
+
+   int foo ()
+   {
+     printf ("Execute as default");
+     return 0;
+   }
+
+   int  __attribute__ ((target ("arch=corei7")))
+   foo ()
+   {
+     printf ("Execute for corei7");
+     return 0;
+   }
+   
+   int main ()
+   {
+     return foo ();
+   } 
+
+   The call to foo in main is replaced with a call to a dispatcher function
+   that contains the resolver code to call the correct function version at
+   run-time.  */
+
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "tree.h"
+#include "langhooks.h"
+#include "flags.h"
+#include "cgraph.h"
+#include "diagnostic.h"
+#include "toplev.h"
+#include "params.h"
+#include "coverage.h"
+#include "ggc.h"
+#include "basic-block.h"
+#include "toplev.h"
+#include "tree-dump.h"
+#include "output.h"
+#include "gimple-pretty-print.h"
+#include "target.h"
+#include "tree-flow.h"
+
+/* Comparator function to be used in qsort routine to sort attribute
+   specification strings to "target".  */
+
+static int
+attr_strcmp (const void *v1, const void *v2)
+{
+  const char *c1 = *(char *const*)v1;
+  const char *c2 = *(char *const*)v2;
+  return strcmp (c1, c2);
+}
+
+/* STR is the argument to target attribute.  This function tokenizes
+   the comma separated arguments, sorts them and returns a string which
+   is a unique identifier for the comma separated arguments.  */
+
+static char *
+sorted_attr_string (const char *str)
+{
+  char **args = NULL;
+  char *attr_str, *ret_str;
+  char *attr = NULL;
+  unsigned int argnum = 1;
+  unsigned int i;
+
+  for (i = 0; i < strlen (str); i++)
+    if (str[i] == ',')
+      argnum++;
+
+  attr_str = (char *)xmalloc (strlen (str) + 1);
+  strcpy (attr_str, str);
+
+  for (i = 0; i < strlen (attr_str); i++)
+    if (attr_str[i] == '=')
+      attr_str[i] = '_';
+
+  if (argnum == 1)
+    return attr_str;
+
+  args = (char **)xmalloc (argnum * sizeof (char *));
+
+  i = 0;
+  attr = strtok (attr_str, ",");
+  while (attr != NULL)
+    {
+      args[i] = attr;
+      i++;
+      attr = strtok (NULL, ",");
+    }
+
+  qsort (args, argnum, sizeof (char*), attr_strcmp);
+
+  ret_str = (char *)xmalloc (strlen (str) + 1);
+  strcpy (ret_str, args[0]);
+  for (i = 1; i < argnum; i++)
+    {
+      strcat (ret_str, "_");
+      strcat (ret_str, args[i]);
+    }
+
+  free (args);
+  free (attr_str);
+  return ret_str;
+}
+
+/* Returns true when only one of DECL1 and DECL2 is marked with "target"
+   or if the "target" attribute strings of DECL1 and DECL2 don't match.  */
+
+bool
+has_different_version_attributes (const tree decl1, const tree decl2)
+{
+  tree attr1, attr2;
+  char *c1, *c2;
+  bool ret = false;
+
+  if (TREE_CODE (decl1) != FUNCTION_DECL
+      || TREE_CODE (decl2) != FUNCTION_DECL)
+    return false;
+
+  attr1 = function_target_attribute (decl1);
+  attr2 = function_target_attribute (decl2);
+
+  if (attr1 == NULL_TREE && attr2 == NULL_TREE)
+    return false;
+
+  if ((attr1 == NULL_TREE && attr2 != NULL_TREE)
+      || (attr1 != NULL_TREE && attr2 == NULL_TREE))
+    return true;
+
+  c1 = sorted_attr_string (
+	TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr1))));
+  c2 = sorted_attr_string (
+	TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr2))));
+
+  if (strcmp (c1, c2) != 0)
+     ret = true;
+
+  free (c1);
+  free (c2);
+
+  return ret;
+}
+
+/* If this decl corresponds to a function and has "target" attribute,
+   append the attribute string to its assembler name.  */
+
+static void
+version_assembler_name (const tree decl)
+{
+  tree version_attr;
+  const char *orig_name, *version_string, *attr_str;
+  char *assembler_name;
+  tree assembler_name_tree;
+  
+  if (TREE_CODE (decl) != FUNCTION_DECL)
+    return;
+
+  if (DECL_DECLARED_INLINE_P (decl)
+      && lookup_attribute ("gnu_inline",
+			   DECL_ATTRIBUTES (decl)))
+    error_at (DECL_SOURCE_LOCATION (decl),
+	      "Function versions cannot be marked as gnu_inline,"
+	      " bodies have to be generated\n");
+
+  if (DECL_VIRTUAL_P (decl)
+      || DECL_VINDEX (decl))
+    error_at (DECL_SOURCE_LOCATION (decl),
+	      "Virtual function versioning not supported\n");
+
+  version_attr = function_target_attribute (decl);
+  /* target attribute string is NULL for default functions.  */
+  if (version_attr == NULL_TREE)
+    return;
+
+  orig_name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
+  version_string
+    = TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (version_attr)));
+
+  attr_str = sorted_attr_string (version_string);
+  assembler_name = (char *) xmalloc (strlen (orig_name)
+				     + strlen (attr_str) + 2);
+
+  sprintf (assembler_name, "%s.%s", orig_name, attr_str);
+  if (dump_file)
+    fprintf (dump_file, "Assembler name set to %s for function version %s\n",
+	     assembler_name, IDENTIFIER_POINTER (DECL_NAME (decl)));
+
+  assembler_name_tree = get_identifier (assembler_name);
+
+  SET_DECL_ASSEMBLER_NAME (decl, assembler_name_tree);
+  SET_DECL_RTL (decl, NULL);
+}
+
+void
+mark_function_as_version (const tree decl)
+{
+  if (DECL_FUNCTION_VERSIONED (decl))
+    return;
+  DECL_FUNCTION_VERSIONED (decl) = 1;
+  version_assembler_name (decl);
+}
+
+/* Returns the target attribute tree if DECL is a FUNCTION_DECL; returns
+   NULL otherwise.  */
+
+tree
+function_target_attribute (const tree decl)
+{
+  if (TREE_CODE (decl) == FUNCTION_DECL)
+    return lookup_attribute ("target", DECL_ATTRIBUTES (decl));
+  return NULL;
+}
+
+/* Returns true if DECL is multi-versioned and is the default function,
+   that is, it is not tagged with the "target" attribute.  */
+
+bool
+is_default_function_version (const tree decl)
+{
+  return (TREE_CODE (decl) == FUNCTION_DECL
+	  && DECL_FUNCTION_VERSIONED (decl)
+	  && (function_target_attribute (decl) == NULL_TREE));
+}
+
+/* Makes a function attribute of the form NAME(ARG_NAME) and chains
+   it to CHAIN.  */
+
+static tree
+make_attribute (const char *name, const char *arg_name, tree chain)
+{
+  tree attr_name;
+  tree attr_arg_name;
+  tree attr_args;
+  tree attr;
+
+  attr_name = get_identifier (name);
+  attr_arg_name = build_string (strlen (arg_name), arg_name);
+  attr_args = tree_cons (NULL_TREE, attr_arg_name, NULL_TREE);
+  attr = tree_cons (attr_name, attr_args, chain);
+  return attr;
+}
+
+/* Return a new name by appending SUFFIX to the DECL name.  If
+   make_unique is true, append the full path name.  */
+
+static char *
+make_name (tree decl, const char *suffix, bool make_unique)
+{
+  char *global_var_name;
+  int name_len;
+  const char *name;
+  const char *unique_name = NULL;
+
+  name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
+
+  /* Get a unique name that can be used globally without any chances
+     of collision at link time.  */
+  if (make_unique)
+    unique_name = IDENTIFIER_POINTER (get_file_function_name ("\0"));
+
+  name_len = strlen (name) + strlen (suffix) + 2;
+
+  if (make_unique)
+    name_len += strlen (unique_name) + 1;
+  global_var_name = (char *) xmalloc (name_len);
+
+  /* Use '.' to concatenate names as it is demangler friendly.  */
+  if (make_unique)
+      snprintf (global_var_name, name_len, "%s.%s.%s", name,
+		unique_name, suffix);
+  else
+      snprintf (global_var_name, name_len, "%s.%s", name, suffix);
+
+  return global_var_name;
+}
+
+
+/* Make the resolver function decl to dispatch the versions of
+   a multi-versioned function,  DEFAULT_DECL.  Create an
+   empty basic block in the resolver and store the pointer in
+   EMPTY_BB.  Return the decl of the resolver function.  */
+
+static tree
+make_resolver_func (const tree default_decl,
+		    const tree dispatch_decl,
+		    basic_block *empty_bb)
+{
+  char *resolver_name;
+  tree decl, type, decl_name, t;
+  basic_block new_bb;
+  tree old_current_function_decl;
+  bool is_uniq = false;
+
+  /* IFUNC's have to be globally visible.  So, if the default_decl is
+     not, then the name of the IFUNC should be made unique.  */
+  if (TREE_PUBLIC (default_decl) == 0)
+    is_uniq = true;
+
+  /* Append the filename to the resolver function if the versions are
+     not externally visible.  This is because the resolver function has
+     to be externally visible for the loader to find it.  So, appending
+     the filename will prevent conflicts with a resolver function from
+     another module which is based on the same version name.  */
+  resolver_name = make_name (default_decl, "resolver", is_uniq);
+
+  /* The resolver function should return a (void *). */
+  type = build_function_type_list (ptr_type_node, NULL_TREE);
+
+  decl = build_fn_decl (resolver_name, type);
+  decl_name = get_identifier (resolver_name);
+  SET_DECL_ASSEMBLER_NAME (decl, decl_name);
+
+  DECL_NAME (decl) = decl_name;
+  TREE_USED (decl) = 1;
+  DECL_ARTIFICIAL (decl) = 1;
+  DECL_IGNORED_P (decl) = 0;
+  /* IFUNC resolvers have to be externally visible.  */
+  TREE_PUBLIC (decl) = 1;
+  DECL_UNINLINABLE (decl) = 1;
+
+  DECL_EXTERNAL (decl) = 0;
+  DECL_EXTERNAL (dispatch_decl) = 0;
+
+  DECL_CONTEXT (decl) = NULL_TREE;
+  DECL_INITIAL (decl) = make_node (BLOCK);
+  DECL_STATIC_CONSTRUCTOR (decl) = 0;
+  TREE_READONLY (decl) = 0;
+  DECL_PURE_P (decl) = 0;
+
+  if (DECL_COMDAT_GROUP (default_decl))
+    {
+      DECL_COMDAT (decl) = DECL_COMDAT (default_decl);
+      make_decl_one_only (decl, DECL_COMDAT_GROUP (default_decl));
+    }
+  else if (TREE_PUBLIC (default_decl))
+    {
+      /* In this case, each translation unit with a call to this
+	 versioned function will put out a resolver.  Ensure it
+	 is comdat to keep just one copy.  */
+      DECL_COMDAT (decl) = 1;
+      make_decl_one_only (decl, DECL_ASSEMBLER_NAME (decl));
+    }
+  /* Build result decl and add to function_decl. */
+  t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, ptr_type_node);
+  DECL_ARTIFICIAL (t) = 1;
+  DECL_IGNORED_P (t) = 1;
+  DECL_RESULT (decl) = t;
+
+  gimplify_function_tree (decl);
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (decl));
+  current_function_decl = decl;
+  gimple_register_cfg_hooks ();
+  init_empty_tree_cfg_for_function (DECL_STRUCT_FUNCTION (decl));
+  cfun->curr_properties |=
+    (PROP_gimple_lcf | PROP_gimple_leh | PROP_cfg | PROP_referenced_vars |
+     PROP_gimple_any);
+  new_bb = create_empty_bb (ENTRY_BLOCK_PTR);
+  make_edge (ENTRY_BLOCK_PTR, new_bb, EDGE_FALLTHRU);
+  make_edge (new_bb, EXIT_BLOCK_PTR, 0);
+  *empty_bb = new_bb;
+
+  cgraph_add_new_function (decl, true);
+  cgraph_call_function_insertion_hooks (cgraph_get_create_node (decl));
+
+  if (DECL_COMDAT_GROUP (default_decl))
+    {
+      gcc_assert (cgraph_get_node (default_decl));
+      symtab_add_to_same_comdat_group (
+	(symtab_node) cgraph_get_node (decl),
+	(symtab_node) cgraph_get_node (default_decl));
+    }
+
+  pop_cfun ();
+  current_function_decl = old_current_function_decl;
+
+  gcc_assert (dispatch_decl != NULL);
+  /* Mark dispatch_decl as "ifunc" with resolver as resolver_name.  */
+  DECL_ATTRIBUTES (dispatch_decl) 
+    = make_attribute ("ifunc", resolver_name, DECL_ATTRIBUTES (dispatch_decl));
+
+  /* Create the alias for dispatch to resolver here.  */
+  cgraph_create_function_alias (dispatch_decl, decl);
+  return decl;
+}
+
+/* Generate the dispatching code body to dispatch multi-versioned function
+   DECL.  The target hook is called to process the "target" attributes and
+   provide the code to dispatch the right function at run-time.  NODE points
+   to the dispatcher decl whose body will be created.  */
+
+tree 
+build_resolver_for_function_versions (struct cgraph_node *node)
+{
+  tree resolver_decl;
+  basic_block empty_bb;
+  VEC (tree, heap) *fn_ver_vec = NULL;
+  tree old_current_function_decl;
+  tree default_ver_decl;
+  struct cgraph_node *versn;
+
+  if (node->resolver_fndecl)
+    return node->resolver_fndecl;
+
+  default_ver_decl = node->next_function_version->symbol.decl;
+  resolver_decl = make_resolver_func (default_ver_decl,
+			  		    node->symbol.decl,
+					    &empty_bb);
+  node->resolver_fndecl = resolver_decl;
+
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (resolver_decl));
+  current_function_decl = resolver_decl;
+
+  fn_ver_vec = VEC_alloc (tree, heap, 2);
+
+  for (versn = node->next_function_version; versn;
+       versn = versn->next_function_version)
+    {
+      /* Check for virtual functions here again, as by this time it should
+	 have been determined if this function needs a vtable index or
+	 not.  This happens for methods in derived classes that override
+	 virtual methods in base classes but are not explicitly marked as
+	 virtual.  */
+      if (DECL_VINDEX (versn->symbol.decl))
+        error_at (DECL_SOURCE_LOCATION (versn->symbol.decl),
+		  "Virtual function multiversioning not supported");
+      VEC_safe_push (tree, heap, fn_ver_vec, versn->symbol.decl);
+    }
+
+  gcc_assert (targetm.dispatch_version);
+  targetm.dispatch_version (resolver_decl, fn_ver_vec, &empty_bb);
+
+  rebuild_cgraph_edges (); 
+  pop_cfun ();
+  current_function_decl = old_current_function_decl;
+  return resolver_decl;
+}
+
+/* Make a dispatcher declaration for the multi-versioned function DECL.
+   Calls to DECL function will be replaced with calls to the dispatcher.
+   Return the decl created.  */
+
+static tree
+make_dispatcher_decl (const tree decl)
+{
+  tree func_decl;
+  char *func_name, *resolver_name;
+  tree fn_type, func_type;
+  bool is_uniq = false;
+
+  if (TREE_PUBLIC (decl) == 0)
+    is_uniq = true;
+
+  func_name = make_name (decl, "ifunc", is_uniq);
+  resolver_name = make_name (decl, "resolver", is_uniq);
+  gcc_assert (resolver_name);
+
+  fn_type = TREE_TYPE (decl);
+  func_type = build_function_type (TREE_TYPE (fn_type),
+				    TYPE_ARG_TYPES (fn_type));
+  
+  func_decl = build_fn_decl (func_name, func_type);
+  TREE_USED (func_decl) = 1;
+  DECL_CONTEXT (func_decl) = NULL_TREE;
+  DECL_INITIAL (func_decl) = error_mark_node;
+  DECL_ARTIFICIAL (func_decl) = 1;
+  /* Mark this func as external, the resolver will flip it again if
+     it gets generated.  */
+  DECL_EXTERNAL (func_decl) = 1;
+  /* This will be of type IFUNC; IFUNCs have to be externally visible.  */
+  TREE_PUBLIC (func_decl) = 1;
+
+  return func_decl;  
+}
+
+tree
+build_dispatcher_for_function_versions (VEC (tree,heap) *fn_ver_vec)
+{
+  struct cgraph_node *node = NULL;
+  struct cgraph_node *default_node = NULL;
+  struct cgraph_node *curr_node = NULL;
+  int ix;
+  tree ele;
+  tree dispatch_decl = NULL;
+
+  gcc_assert (fn_ver_vec != NULL);
+
+  /* Find the default version.  */
+  for (ix = 0; VEC_iterate (tree, fn_ver_vec, ix, ele); ++ix)
+    {
+      if (is_default_function_version (ele))
+	{
+	  default_node = cgraph_get_create_node (ele);
+	  break;
+	}
+    }
+
+  /* If there is no default node, just return NULL.  */
+  if (!default_node)
+    return NULL;
+
+  if (default_node->dispatcher_fndecl)
+    return default_node->dispatcher_fndecl;
+
+#if defined (ASM_OUTPUT_TYPE_DIRECTIVE) && HAVE_GNU_INDIRECT_FUNCTION
+  /* Right now, the dispatching is done via ifunc.  */
+  dispatch_decl = make_dispatcher_decl (default_node->symbol.decl); 
+#else
+  error_at (DECL_SOURCE_LOCATION (default_node->symbol.decl),
+	    "Multiversioning needs ifunc which is not supported "
+	    "in this configuration");
+#endif
+  default_node->dispatcher_fndecl = dispatch_decl;
+  curr_node = cgraph_get_create_node (dispatch_decl);
+  gcc_assert (curr_node);
+  curr_node->dispatcher_function = 1;
+  cgraph_mark_address_taken_node (default_node);
+
+  for (ix = 0; VEC_iterate (tree, fn_ver_vec, ix, ele); ++ix)
+    {
+      node = cgraph_get_create_node (ele);
+      gcc_assert (node != NULL && DECL_FUNCTION_VERSIONED (ele));
+      if (node == default_node)
+	continue;
+      gcc_assert (function_target_attribute (ele) != NULL_TREE);
+      if (curr_node->next_function_version)
+ 	{
+	  node->next_function_version = curr_node->next_function_version;
+	  curr_node->next_function_version->prev_function_version = node;
+	}
+      curr_node->next_function_version = node;
+      node->prev_function_version = curr_node;
+      node->dispatcher_fndecl = dispatch_decl;
+    }
+
+  /* The default version should be the first node.  */
+  default_node->next_function_version = curr_node->next_function_version;
+  curr_node->next_function_version->prev_function_version = default_node;
+  curr_node->next_function_version = default_node;
+  
+  return dispatch_decl; 
+}
Index: gcc/cgraphunit.c
===================================================================
--- gcc/cgraphunit.c	(revision 187817)
+++ gcc/cgraphunit.c	(working copy)
@@ -940,7 +940,7 @@ cgraph_analyze_functions (void)
 
 	      for (edge = cnode->callees; edge; edge = edge->next_callee)
 		if (edge->callee->local.finalized)
-		  enqueue_node ((symtab_node)edge->callee);
+                 enqueue_node ((symtab_node)edge->callee);
 
 	      /* If decl is a clone of an abstract function, mark that abstract
 		 function so that we don't release its body. The DECL_INITIAL() of that
Index: gcc/testsuite/g++.dg/mv2.C
===================================================================
--- gcc/testsuite/g++.dg/mv2.C	(revision 0)
+++ gcc/testsuite/g++.dg/mv2.C	(revision 0)
@@ -0,0 +1,119 @@
+/* Test case to check if Multiversioning chooses the correct
+   dispatching order when versions are for various ISAs.  */
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-require-ifunc "" }  */
+/* { dg-options "-O2" } */
+
+#include <assert.h>
+
+/* Default version.  */
+int foo ();
+/* The dispatch checks should be in the exact reverse order of the
+   declarations below.  */
+int foo () __attribute__ ((target ("mmx")));
+int foo () __attribute__ ((target ("sse")));
+int foo () __attribute__ ((target ("sse2")));
+int foo () __attribute__ ((target ("sse3")));
+int foo () __attribute__ ((target ("ssse3")));
+int foo () __attribute__ ((target ("sse4.1")));
+int foo () __attribute__ ((target ("sse4.2")));
+int foo () __attribute__ ((target ("popcnt")));
+int foo () __attribute__ ((target ("avx")));
+int foo () __attribute__ ((target ("avx2")));
+
+int main ()
+{
+
+  int val = foo ();
+
+  if (__builtin_cpu_supports ("avx2"))
+    assert (val == 1);
+  else if (__builtin_cpu_supports ("avx"))
+    assert (val == 2);
+  else if (__builtin_cpu_supports ("popcnt"))
+    assert (val == 3);
+  else if (__builtin_cpu_supports ("sse4.2"))
+    assert (val == 4);
+  else if (__builtin_cpu_supports ("sse4.1"))
+    assert (val == 5);
+  else if (__builtin_cpu_supports ("ssse3"))
+    assert (val == 6);
+  else if (__builtin_cpu_supports ("sse3"))
+    assert (val == 7);
+  else if (__builtin_cpu_supports ("sse2"))
+    assert (val == 8);
+  else if (__builtin_cpu_supports ("sse"))
+    assert (val == 9);
+  else if (__builtin_cpu_supports ("mmx"))
+    assert (val == 10);
+  else
+    assert (val == 0);
+
+  return 0;
+}
+
+int
+foo ()
+{
+  return 0;
+}
+
+int __attribute__ ((target("mmx")))
+foo ()
+{
+  return 10;
+}
+
+int __attribute__ ((target("sse")))
+foo ()
+{
+  return 9;
+}
+
+int __attribute__ ((target("sse2")))
+foo ()
+{
+  return 8;
+}
+
+int __attribute__ ((target("sse3")))
+foo ()
+{
+  return 7;
+}
+
+int __attribute__ ((target("ssse3")))
+foo ()
+{
+  return 6;
+}
+
+int __attribute__ ((target("sse4.1")))
+foo ()
+{
+  return 5;
+}
+
+int __attribute__ ((target("sse4.2")))
+foo ()
+{
+  return 4;
+}
+
+int __attribute__ ((target("popcnt")))
+foo ()
+{
+  return 3;
+}
+
+int __attribute__ ((target("avx")))
+foo ()
+{
+  return 2;
+}
+
+int __attribute__ ((target("avx2")))
+foo ()
+{
+  return 1;
+}
Index: gcc/testsuite/g++.dg/mv4.C
===================================================================
--- gcc/testsuite/g++.dg/mv4.C	(revision 0)
+++ gcc/testsuite/g++.dg/mv4.C	(revision 0)
@@ -0,0 +1,23 @@
+/* Test case to check if the compiler generates an error message
+   when the default version of a multiversioned function is absent
+   and its pointer is taken.  */
+
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2" } */
+
+int __attribute__ ((target ("sse")))
+foo ()
+{
+  return 1;
+}
+int __attribute__ ((target ("popcnt")))
+foo ()
+{
+  return 0;
+}
+
+int main ()
+{
+  int (*p)() = &foo; /* { dg-error "Pointer to a multiversioned function without a default is not allowed" {} } */
+  return (p)();
+}
Index: gcc/testsuite/g++.dg/mv1.C
===================================================================
--- gcc/testsuite/g++.dg/mv1.C	(revision 0)
+++ gcc/testsuite/g++.dg/mv1.C	(revision 0)
@@ -0,0 +1,202 @@
+/* Test case to check if Multiversioning works.  */
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-require-ifunc "" }  */
+/* { dg-options "-O2 -fPIC" } */
+
+#include <assert.h>
+
+/* Default version.  */
+int foo ();
+/* The other versions of foo.  Mix up the ordering and 
+   check if the dispatching does it in the order of priority. */
+/* Check combination of target attributes.  */
+int foo () __attribute__ ((target("arch=corei7,popcnt")));
+/* The target operands in this declaration and the definition are re-ordered.
+   This should still work.  */
+int foo () __attribute__ ((target("ssse3,avx2")));
+
+/* Check for all target attributes for which dispatchers are available.  */
+/* Check arch= */
+int foo () __attribute__((target("arch=core2")));
+int foo () __attribute__((target("arch=corei7")));
+int foo () __attribute__((target("arch=atom")));
+/* Check ISAs  */
+int foo () __attribute__((target("sse3")));
+int foo () __attribute__((target("sse2")));
+int foo () __attribute__((target("sse")));
+int foo () __attribute__((target("avx")));
+int foo () __attribute__((target("sse4.2")));
+int foo () __attribute__((target("popcnt")));
+int foo () __attribute__((target("sse4.1")));
+int foo () __attribute__((target("ssse3")));
+int foo () __attribute__((target("mmx")));
+int foo () __attribute__((target("avx2")));
+/* Check more arch=.  */
+int foo () __attribute__((target("arch=amdfam10")));
+int foo () __attribute__((target("arch=bdver1")));
+int foo () __attribute__((target("arch=bdver2")));
+
+int (*p)() = &foo;
+int main ()
+{
+  int val = foo ();
+  assert (val ==  (*p)());
+
+  /* Check in the exact same order in which the dispatching
+     is expected to happen.  */
+  if (__builtin_cpu_is ("bdver1"))
+    assert (val == 1);
+  else if (__builtin_cpu_is ("bdver2"))
+    assert (val == 2);
+  else if (__builtin_cpu_supports ("avx2")
+	   && __builtin_cpu_supports ("ssse3"))
+    assert (val == 3);
+  else if (__builtin_cpu_supports ("avx2"))
+    assert (val == 4);
+  else if (__builtin_cpu_supports ("avx"))
+    assert (val == 5);
+  else if (__builtin_cpu_is ("corei7")
+	   && __builtin_cpu_supports ("popcnt"))
+    assert (val == 6);
+  else if (__builtin_cpu_supports ("popcnt"))
+    assert (val == 7);
+  else if (__builtin_cpu_is ("corei7"))
+    assert (val == 8);
+  else if (__builtin_cpu_supports ("sse4.2"))
+    assert (val == 9);
+  else if (__builtin_cpu_supports ("sse4.1"))
+    assert (val == 10);
+  else if (__builtin_cpu_is ("amdfam10h"))
+    assert (val == 11);
+  else if (__builtin_cpu_is ("core2"))
+    assert (val == 12);
+  else if (__builtin_cpu_is ("atom"))
+    assert (val == 13);
+  else if (__builtin_cpu_supports ("ssse3"))
+    assert (val == 14);
+  else if (__builtin_cpu_supports ("sse3"))
+    assert (val == 15);
+  else if (__builtin_cpu_supports ("sse2"))
+    assert (val == 16);
+  else if (__builtin_cpu_supports ("sse"))
+    assert (val == 17);
+  else if (__builtin_cpu_supports ("mmx"))
+    assert (val == 18);
+  else
+    assert (val == 0);
+  
+  return 0;
+}
+
+int foo ()
+{
+  return 0;
+}
+
+int __attribute__ ((target("arch=corei7,popcnt")))
+foo ()
+{
+  return 6;
+}
+
+int __attribute__ ((target("avx2,ssse3")))
+foo ()
+{
+  return 3;
+}
+
+int __attribute__ ((target("arch=core2")))
+foo ()
+{
+  return 12;
+}
+
+int __attribute__ ((target("arch=corei7")))
+foo ()
+{
+  return 8;
+}
+
+int __attribute__ ((target("arch=atom")))
+foo ()
+{
+  return 13;
+}
+
+int __attribute__ ((target("sse3")))
+foo ()
+{
+  return 15;
+}
+
+int __attribute__ ((target("sse2")))
+foo ()
+{
+  return 16;
+}
+
+int __attribute__ ((target("sse")))
+foo ()
+{
+  return 17;
+}
+
+int __attribute__ ((target("avx")))
+foo ()
+{
+  return 5;
+}
+
+int __attribute__ ((target("sse4.2")))
+foo ()
+{
+  return 9;
+}
+
+int __attribute__ ((target("popcnt")))
+foo ()
+{
+  return 7;
+}
+
+int __attribute__ ((target("sse4.1")))
+foo ()
+{
+  return 10;
+}
+
+int __attribute__ ((target("ssse3")))
+foo ()
+{
+  return 14;
+}
+
+int __attribute__ ((target("mmx")))
+foo ()
+{
+  return 18;
+}
+
+int __attribute__ ((target("avx2")))
+foo ()
+{
+  return 4;
+}
+
+int __attribute__ ((target("arch=amdfam10")))
+foo ()
+{
+  return 11;
+}
+
+int __attribute__ ((target("arch=bdver1")))
+foo ()
+{
+  return 1;
+}
+
+int __attribute__ ((target("arch=bdver2")))
+foo ()
+{
+  return 2;
+}
Index: gcc/testsuite/g++.dg/mv3.C
===================================================================
--- gcc/testsuite/g++.dg/mv3.C	(revision 0)
+++ gcc/testsuite/g++.dg/mv3.C	(revision 0)
@@ -0,0 +1,37 @@
+/* Test case to check if a call to a multiversioned function
+   is replaced with a direct call to the particular version when
+   the most specialized version's target attributes match the
+   caller.  
+  
+   In this program, foo is multiversioned but there is no default
+   function.  This is an error if the call has to go through a
+   dispatcher.  However, the call to foo in bar can be replaced
+   with a direct call to the popcnt version of foo.  Hence, this
+   test should pass.  */
+
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2" } */
+
+
+/* No default version: every foo below has a target attribute.  */
+int __attribute__ ((target ("sse")))
+foo ()
+{
+  return 1;
+}
+int __attribute__ ((target ("popcnt")))
+foo ()
+{
+  return 0;
+}
+
+int __attribute__ ((target ("popcnt")))
+bar ()
+{
+  return foo ();
+}
+
+int main ()
+{
+  return bar ();
+}
Index: gcc/cp/class.c
===================================================================
--- gcc/cp/class.c	(revision 187817)
+++ gcc/cp/class.c	(working copy)
@@ -1093,7 +1093,20 @@ add_method (tree type, tree method, tree using_dec
 	      || same_type_p (TREE_TYPE (fn_type),
 			      TREE_TYPE (method_type))))
 	{
-	  if (using_decl)
+	  /* For function versions, their parms and types match
+	     but they are not duplicates.  Record function versions
+	     as and when they are found.  */
+	  if (TREE_CODE (fn) == FUNCTION_DECL
+	      && TREE_CODE (method) == FUNCTION_DECL
+	      && (function_target_attribute (fn)
+		  || function_target_attribute (method))
+	      && has_different_version_attributes (fn, method))
+ 	    {
+	      mark_function_as_version (fn);
+	      mark_function_as_version (method);
+	      continue;
+	    }
+	  else if (using_decl)
 	    {
 	      if (DECL_CONTEXT (fn) == type)
 		/* Defer to the local function.  */
@@ -6863,6 +6876,7 @@ resolve_address_of_overloaded_function (tree targe
   tree matches = NULL_TREE;
   tree fn;
   tree target_fn_type;
+  VEC (tree, heap) *fn_ver_vec = NULL;
 
   /* By the time we get here, we should be seeing only real
      pointer-to-member types, not the internal POINTER_TYPE to
@@ -6927,9 +6941,19 @@ resolve_address_of_overloaded_function (tree targe
 	  if (DECL_ANTICIPATED (fn))
 	    continue;
 
-	  /* See if there's a match.  */
+	  /* See if there's a match.  For functions that are multi-versioned,
+	     all the versions match.  */
 	  if (same_type_p (target_fn_type, static_fn_type (fn)))
-	    matches = tree_cons (fn, NULL_TREE, matches);
+	    {
+	      matches = tree_cons (fn, NULL_TREE, matches);
+	      /* If versioned, push all possible versions into a vector.  */
+	      if (DECL_FUNCTION_VERSIONED (fn))
+		{
+		  if (fn_ver_vec == NULL)
+		   fn_ver_vec = VEC_alloc (tree, heap, 2);
+		  VEC_safe_push (tree, heap, fn_ver_vec, fn); 
+		}
+	    }
 	}
     }
 
@@ -7024,10 +7048,15 @@ resolve_address_of_overloaded_function (tree targe
       tree match;
 
       fn = TREE_PURPOSE (matches);
-      for (match = TREE_CHAIN (matches); match; match = TREE_CHAIN (match))
-	if (!decls_match (fn, TREE_PURPOSE (match)))
-	  break;
 
+      /* For multi-versioned functions, more than one match is just fine.  */
+      if (DECL_FUNCTION_VERSIONED (fn))
+	match = NULL_TREE;
+      else
+        for (match = TREE_CHAIN (matches); match; match = TREE_CHAIN (match))
+  	  if (!decls_match (fn, TREE_PURPOSE (match)))
+	    break;
+
       if (match)
 	{
 	  if (flags & tf_error)
@@ -7090,6 +7119,28 @@ resolve_address_of_overloaded_function (tree targe
       perform_or_defer_access_check (access_path, fn, fn);
     }
 
+  /* If a pointer to a function that is multi-versioned is requested, the
+     pointer to the dispatcher function is returned instead.  This works
+     well because indirectly calling the function will dispatch the right
+     function version at run-time. Also, the function address is kept
+     unique.  */
+  if (DECL_FUNCTION_VERSIONED (fn))
+    {
+      tree dispatcher_decl;
+      gcc_assert (fn_ver_vec != NULL);
+      dispatcher_decl = build_dispatcher_for_function_versions (fn_ver_vec);
+      if (!dispatcher_decl)
+	{
+	  error_at (input_location, "Pointer to a multiversioned function"
+		    " without a default is not allowed");
+	  return error_mark_node;
+	}
+      retrofit_lang_decl (dispatcher_decl);
+      mark_used (fn);
+      VEC_free (tree, heap, fn_ver_vec);
+      fn = dispatcher_decl;
+    }
+
   if (TYPE_PTRFN_P (target_type) || TYPE_PTRMEMFUNC_P (target_type))
     return cp_build_addr_expr (fn, flags);
   else
Index: gcc/cp/decl.c
===================================================================
--- gcc/cp/decl.c	(revision 187817)
+++ gcc/cp/decl.c	(working copy)
@@ -54,6 +54,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "pointer-set.h"
 #include "splay-tree.h"
 #include "plugin.h"
+#include "cgraph.h"
 
 /* Possible cases of bad specifiers type used by bad_specifiers. */
 enum bad_spec_place {
@@ -973,6 +974,19 @@ decls_match (tree newdecl, tree olddecl)
       if (t1 != t2)
 	return 0;
 
+      /* The decls don't match if they correspond to two different versions
+	 of the same function.  */
+      if (compparms (p1, p2)
+	  && same_type_p (TREE_TYPE (f1), TREE_TYPE (f2)) 
+	  && has_different_version_attributes (newdecl, olddecl))
+	{
+	  /* One of the decls could be the default without the "target"
+	     attribute. Set it to be a versioned function here.  */
+	  mark_function_as_version (newdecl);
+	  mark_function_as_version (olddecl);
+	  return 0;
+	}
+
       if (CP_DECL_CONTEXT (newdecl) != CP_DECL_CONTEXT (olddecl)
 	  && ! (DECL_EXTERN_C_P (newdecl)
 		&& DECL_EXTERN_C_P (olddecl)))
@@ -1490,7 +1504,11 @@ duplicate_decls (tree newdecl, tree olddecl, bool
 	      error ("previous declaration %q+#D here", olddecl);
 	      return NULL_TREE;
 	    }
-	  else if (compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)),
+	  /* For function versions, params and types match, but they
+	     are not ambiguous.  */
+	  else if ((!DECL_FUNCTION_VERSIONED (newdecl)
+		    && !DECL_FUNCTION_VERSIONED (olddecl))
+		   && compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)),
 			      TYPE_ARG_TYPES (TREE_TYPE (olddecl))))
 	    {
 	      error ("new declaration %q#D", newdecl);
@@ -2262,6 +2280,11 @@ duplicate_decls (tree newdecl, tree olddecl, bool
   else if (DECL_PRESERVE_P (newdecl))
     DECL_PRESERVE_P (olddecl) = 1;
 
+  /* If the olddecl is a version, so is the newdecl.  */
+  if (TREE_CODE (newdecl) == FUNCTION_DECL
+      && DECL_FUNCTION_VERSIONED (olddecl))
+    DECL_FUNCTION_VERSIONED (newdecl) = 1;
+
   if (TREE_CODE (newdecl) == FUNCTION_DECL)
     {
       int function_size;
@@ -14043,7 +14066,11 @@ cxx_comdat_group (tree decl)
 	  else
 	    break;
 	}
-      name = DECL_ASSEMBLER_NAME (decl);
+      if (TREE_CODE (decl) == FUNCTION_DECL
+	  && DECL_FUNCTION_VERSIONED (decl))
+	name = DECL_NAME (decl);
+      else
+        name = DECL_ASSEMBLER_NAME (decl);
     }
 
   return name;
Index: gcc/cp/error.c
===================================================================
--- gcc/cp/error.c	(revision 187817)
+++ gcc/cp/error.c	(working copy)
@@ -1534,8 +1534,15 @@ dump_exception_spec (tree t, int flags)
 static void
 dump_function_name (tree t, int flags)
 {
-  tree name = DECL_NAME (t);
+  tree name;
 
+  /* For function versions, use the assembler name as the decl name is
+     the same for all versions.  */
+  if (DECL_FUNCTION_VERSIONED (t))
+    name = DECL_ASSEMBLER_NAME (t);
+  else
+    name = DECL_NAME (t);
+
   /* We can get here with a decl that was synthesized by language-
      independent machinery (e.g. coverage.c) in which case it won't
      have a lang_specific structure attached and DECL_CONSTRUCTOR_P
Index: gcc/cp/semantics.c
===================================================================
--- gcc/cp/semantics.c	(revision 187817)
+++ gcc/cp/semantics.c	(working copy)
@@ -3784,8 +3784,11 @@ expand_or_defer_fn_1 (tree fn)
       /* If the user wants us to keep all inline functions, then mark
 	 this function as needed so that finish_file will make sure to
 	 output it later.  Similarly, all dllexport'd functions must
-	 be emitted; there may be callers in other DLLs.  */
-      if ((flag_keep_inline_functions
+	 be emitted; there may be callers in other DLLs.
+	 Also, mark this function as needed if it is marked inline but
+	 is a multi-versioned function.  */
+      if (((flag_keep_inline_functions
+	    || DECL_FUNCTION_VERSIONED (fn))
 	   && DECL_DECLARED_INLINE_P (fn)
 	   && !DECL_REALLY_EXTERN (fn))
 	  || (flag_keep_inline_dllexport
Index: gcc/cp/decl2.c
===================================================================
--- gcc/cp/decl2.c	(revision 187817)
+++ gcc/cp/decl2.c	(working copy)
@@ -675,9 +675,13 @@ check_classfn (tree ctype, tree function, tree tem
 	  if (is_template != (TREE_CODE (fndecl) == TEMPLATE_DECL))
 	    continue;
 
+	  /* While finding a match, same types and params are not enough
+	     if the function is versioned.  Also check version ("target")
+	     attributes.  */
 	  if (same_type_p (TREE_TYPE (TREE_TYPE (function)),
 			   TREE_TYPE (TREE_TYPE (fndecl)))
 	      && compparms (p1, p2)
+	      && !has_different_version_attributes (function, fndecl)
 	      && (!is_template
 		  || comp_template_parms (template_parms,
 					  DECL_TEMPLATE_PARMS (fndecl)))
Index: gcc/cp/call.c
===================================================================
--- gcc/cp/call.c	(revision 187817)
+++ gcc/cp/call.c	(working copy)
@@ -41,6 +41,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "langhooks.h"
 #include "c-family/c-objc.h"
 #include "timevar.h"
+#include "cgraph.h"
 
 /* The various kinds of conversion.  */
 
@@ -3905,6 +3906,16 @@ build_new_function_call (tree fn, VEC(tree,gc) **a
     {
       if (complain & tf_error)
 	{
+	  /* If the call is to a multiversioned function without
+	     a default version, overload resolution will fail.  */
+	  if (candidates
+	      && TREE_CODE (candidates->fn) == FUNCTION_DECL
+	      && DECL_FUNCTION_VERSIONED (candidates->fn))
+	    error_at (location_of (DECL_NAME (OVL_CURRENT (fn))),
+		      "Call to multiversioned function %<%D(%A)%> with"
+		      " no default version", DECL_NAME (OVL_CURRENT (fn)),
+		      build_tree_list_vec (*args));
+
 	  if (!any_viable_p && candidates && ! candidates->next
 	      && (TREE_CODE (candidates->fn) == FUNCTION_DECL))
 	    return cp_build_function_call_vec (candidates->fn, args, complain);
@@ -6829,6 +6840,30 @@ build_over_call (struct z_candidate *cand, int fla
   if (!already_used)
     mark_used (fn);
 
+  /* For calls to a multi-versioned function, overload resolution
+     returns the function with the highest target priority, that is,
+     the version that will be checked for dispatching first.  If this
+     version is inlinable, a direct call can be made; otherwise the
+     call should go through the dispatcher.  */
+
+  if (DECL_FUNCTION_VERSIONED (fn)
+      && !targetm.target_option.can_inline_p (current_function_decl, fn))
+    {
+      tree dispatcher_decl = NULL;
+      struct cgraph_node *node = cgraph_get_node (fn);
+      if (node != NULL)
+        dispatcher_decl = node->dispatcher_fndecl;
+      if (dispatcher_decl == NULL)
+	{
+	  error_at (input_location, "Call to multiversioned function"
+		    " without a default is not allowed");
+	  return NULL;
+	}
+      retrofit_lang_decl (dispatcher_decl);
+      gcc_assert (dispatcher_decl != NULL);
+      fn = dispatcher_decl;
+    }
+
   if (DECL_VINDEX (fn) && (flags & LOOKUP_NONVIRTUAL) == 0)
     {
       tree t;
@@ -8086,6 +8121,29 @@ joust (struct z_candidate *cand1, struct z_candida
   size_t i;
   size_t len;
 
+  /* For candidates of a multi-versioned function, make the version with
+     the most specialized target attributes (the highest priority) win.  This
+     version will be checked for dispatching first.  If this version can
+     be inlined into the caller the front-end will simply make a direct
+     call to this function.  */
+
+  if ((TREE_CODE (cand1->fn) == FUNCTION_DECL
+       && DECL_FUNCTION_VERSIONED (cand1->fn))
+      ||(TREE_CODE (cand2->fn) == FUNCTION_DECL
+	 && DECL_FUNCTION_VERSIONED (cand2->fn)))
+    {
+      /* Both functions must be marked versioned.  */
+      gcc_assert (DECL_FUNCTION_VERSIONED (cand1->fn)
+		  && DECL_FUNCTION_VERSIONED (cand2->fn));
+
+      /* Always make the version with the higher priority, more
+	 specialized, win.  */
+      if (targetm.compare_versions (cand1->fn, cand2->fn) >= 0)
+	return 1;
+      else
+	return -1;
+    }
+
   /* Candidates that involve bad conversions are always worse than those
      that don't.  */
   if (cand1->viable > cand2->viable)
@@ -8431,6 +8489,20 @@ tourney (struct z_candidate *candidates, tsubst_fl
   int fate;
   int champ_compared_to_predecessor = 0;
 
+  /* For multiversioned functions, aggregate all the versions here for
+     generating the dispatcher body later if necessary.  */
+
+  if (DECL_FUNCTION_VERSIONED (candidates->fn))
+    {
+      VEC (tree, heap) *fn_ver_vec = NULL;
+      struct z_candidate *ver = candidates;
+      fn_ver_vec = VEC_alloc (tree, heap, 2);
+      for (;ver; ver = ver->next)
+        VEC_safe_push (tree, heap, fn_ver_vec, ver->fn);
+      build_dispatcher_for_function_versions (fn_ver_vec);
+      VEC_free (tree, heap, fn_ver_vec);
+    }
+
   /* Walk through the list once, comparing each current champ to the next
      candidate, knocking out a candidate or two with each comparison.  */
 
Index: gcc/cp/mangle.c
===================================================================
--- gcc/cp/mangle.c	(revision 187817)
+++ gcc/cp/mangle.c	(working copy)
@@ -1245,7 +1245,12 @@ write_unqualified_name (const tree decl)
     {
       MANGLE_TRACE_TREE ("local-source-name", decl);
       write_char ('L');
-      write_source_name (DECL_NAME (decl));
+      if (TREE_CODE (decl) == FUNCTION_DECL
+	  && DECL_FUNCTION_VERSIONED (decl)
+	  && DECL_ASSEMBLER_NAME_SET_P (decl))
+	write_source_name (DECL_ASSEMBLER_NAME (decl));
+      else
+	write_source_name (DECL_NAME (decl));
       /* The default discriminator is 1, and that's all we ever use,
 	 so there's no code to output one here.  */
     }
@@ -1260,7 +1265,14 @@ write_unqualified_name (const tree decl)
                && LAMBDA_TYPE_P (type))
         write_closure_type_name (type);
       else
-        write_source_name (DECL_NAME (decl));
+	{
+	  if (TREE_CODE (decl) == FUNCTION_DECL
+	      && DECL_FUNCTION_VERSIONED (decl)
+	      && DECL_ASSEMBLER_NAME_SET_P (decl))
+	    write_source_name (DECL_ASSEMBLER_NAME (decl));
+	  else
+	    write_source_name (DECL_NAME (decl));
+	}
     }
 }
 
Index: gcc/Makefile.in
===================================================================
--- gcc/Makefile.in	(revision 187817)
+++ gcc/Makefile.in	(working copy)
@@ -1297,6 +1297,7 @@ OBJS = \
 	mcf.o \
 	mode-switching.o \
 	modulo-sched.o \
+	multiversion.o \
 	omega.o \
 	omp-low.o \
 	optabs.o \
@@ -3042,6 +3043,11 @@ ree.o : ree.c $(CONFIG_H) $(SYSTEM_H) coretypes.h
    $(DF_H) $(TIMEVAR_H) tree-pass.h $(RECOG_H) $(EXPR_H) \
    $(REGS_H) $(TREE_H) $(TM_P_H) insn-config.h $(INSN_ATTR_H) $(DIAGNOSTIC_CORE_H) \
    $(TARGET_H) $(OPTABS_H) insn-codes.h rtlhooks-def.h $(PARAMS_H) $(CGRAPH_H)
+multiversion.o : multiversion.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
+   $(TREE_H) langhooks.h $(TREE_INLINE_H) $(FLAGS_H) $(CGRAPH_H) intl.h \
+   $(DIAGNOSTIC_H) $(FIBHEAP_H) $(PARAMS_H) $(TIMEVAR_H) tree-pass.h \
+   $(HASHTAB_H) $(COVERAGE_H) $(GGC_H) $(TREE_FLOW_H) $(RTL_H) $(IPA_PROP_H) \
+   $(BASIC_BLOCK_H) $(TOPLEV_H) $(TREE_DUMP_H) ipa-inline.h
 cprop.o : cprop.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
    $(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h $(GGC_H) \
    $(RECOG_H) $(EXPR_H) $(BASIC_BLOCK_H) $(FUNCTION_H) output.h toplev.h $(DIAGNOSTIC_CORE_H) \
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 187817)
+++ gcc/config/i386/i386.c	(working copy)
@@ -27626,6 +27626,473 @@ ix86_init_mmx_sse_builtins (void)
     }
 }
 
+
+/* This adds a condition to the basic_block NEW_BB in function FUNCTION_DECL
+   to return a pointer to VERSION_DECL if the outcome of the expression
+   formed by PREDICATE_CHAIN is true.  This function will be called during
+   version dispatch to decide which function version to execute.  It returns
+   the basic block at the end to which more conditions can be added.  */
+
+static basic_block
+add_condition_to_bb (tree function_decl, tree version_decl,
+		     tree predicate_chain, basic_block new_bb)
+{
+  gimple return_stmt;
+  tree convert_expr, result_var;
+  gimple convert_stmt;
+  gimple call_cond_stmt;
+  gimple if_else_stmt;
+
+  basic_block bb1, bb2, bb3;
+  edge e12, e23;
+
+  tree cond_var, and_expr_var = NULL_TREE;
+  gimple_seq gseq;
+
+  tree old_current_function_decl;
+  tree predicate_decl, predicate_arg;
+
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (function_decl));
+  current_function_decl = function_decl;
+
+  gcc_assert (new_bb != NULL);
+  gseq = bb_seq (new_bb);
+
+
+  convert_expr = build1 (CONVERT_EXPR, ptr_type_node,
+	     		 build_fold_addr_expr (version_decl));
+  result_var = create_tmp_var (ptr_type_node, NULL);
+  convert_stmt = gimple_build_assign (result_var, convert_expr); 
+  return_stmt = gimple_build_return (result_var);
+
+  if (predicate_chain == NULL_TREE)
+    {
+      gimple_seq_add_stmt (&gseq, convert_stmt);
+      gimple_seq_add_stmt (&gseq, return_stmt);
+      set_bb_seq (new_bb, gseq);
+      gimple_set_bb (convert_stmt, new_bb);
+      gimple_set_bb (return_stmt, new_bb);
+      pop_cfun ();
+      current_function_decl = old_current_function_decl;
+      return new_bb;
+    }
+
+  while (predicate_chain != NULL)
+    {
+      cond_var = create_tmp_var (integer_type_node, NULL);
+      predicate_decl = TREE_PURPOSE (predicate_chain);
+      predicate_arg = TREE_VALUE (predicate_chain);
+      call_cond_stmt = gimple_build_call (predicate_decl, 1, predicate_arg);
+      gimple_call_set_lhs (call_cond_stmt, cond_var);
+
+      gimple_set_block (call_cond_stmt, DECL_INITIAL (function_decl));
+      gimple_set_bb (call_cond_stmt, new_bb);
+      gimple_seq_add_stmt (&gseq, call_cond_stmt);
+
+      predicate_chain = TREE_CHAIN (predicate_chain);
+      
+      if (and_expr_var == NULL)
+        and_expr_var = cond_var;
+      else
+	{
+	  gimple assign_stmt;
+	  /* Use MIN_EXPR to check if any integer is zero.
+	     and_expr_var = min_expr <cond_var, and_expr_var>  */
+	  assign_stmt = gimple_build_assign (and_expr_var,
+			  build2 (MIN_EXPR, integer_type_node,
+				  cond_var, and_expr_var));
+
+	  gimple_set_block (assign_stmt, DECL_INITIAL (function_decl));
+	  gimple_set_bb (assign_stmt, new_bb);
+	  gimple_seq_add_stmt (&gseq, assign_stmt);
+	}
+    }
+
+  if_else_stmt = gimple_build_cond (GT_EXPR, and_expr_var,
+	  		            integer_zero_node,
+				    NULL_TREE, NULL_TREE);
+  gimple_set_block (if_else_stmt, DECL_INITIAL (function_decl));
+  gimple_set_bb (if_else_stmt, new_bb);
+  gimple_seq_add_stmt (&gseq, if_else_stmt);
+
+  gimple_seq_add_stmt (&gseq, convert_stmt);
+  gimple_seq_add_stmt (&gseq, return_stmt);
+  set_bb_seq (new_bb, gseq);
+
+  bb1 = new_bb;
+  e12 = split_block (bb1, if_else_stmt);
+  bb2 = e12->dest;
+  e12->flags &= ~EDGE_FALLTHRU;
+  e12->flags |= EDGE_TRUE_VALUE;
+
+  e23 = split_block (bb2, return_stmt);
+
+  gimple_set_bb (convert_stmt, bb2);
+  gimple_set_bb (return_stmt, bb2);
+
+  bb3 = e23->dest;
+  make_edge (bb1, bb3, EDGE_FALSE_VALUE); 
+
+  remove_edge (e23);
+  make_edge (bb2, EXIT_BLOCK_PTR, 0);
+
+  pop_cfun ();
+  current_function_decl = old_current_function_decl;
+
+  return bb3;
+}
+
+/* This parses the attribute arguments to target in DECL and determines
+   the right builtin to use to match the platform specification.
+   The string may combine one "arch=" argument with ISA names ("<-m>xxx").
+   It returns the priority value for this version decl.  If PREDICATE_LIST
+   is not NULL, it stores the list of cpu features that need to be checked
+   before dispatching this function.  */
+
+static unsigned int
+get_builtin_code_for_version (tree decl, tree *predicate_list)
+{
+  tree attrs;
+  struct cl_target_option cur_target;
+  tree target_node;
+  struct cl_target_option *new_target;
+  const char *arg_str = NULL;
+  const char *attrs_str = NULL;
+  char *tok_str = NULL;
+  char *token;
+  unsigned int priority = 0;
+
+  /* Priority of i386 features, greater value is higher priority.  This is
+     used to decide the order in which function dispatch must happen.  For
+     instance, a version specialized for SSE4.2 should be checked for dispatch
+     before a version for SSE3, as SSE4.2 implies SSE3.  */
+  enum feature_priority
+  {
+    P_ZERO = 0,
+    P_MMX,
+    P_SSE,
+    P_SSE2,
+    P_SSE3,
+    P_SSSE3,
+    P_PROC_SSSE3,
+    P_SSE4_a,
+    P_PROC_SSE4_a,
+    P_SSE4_1,
+    P_SSE4_2,
+    P_PROC_SSE4_2,
+    P_POPCNT,
+    P_AVX,
+    P_AVX2,
+    P_FMA,
+    P_PROC_FMA
+  };
+
+  /* These are the target attribute strings for which a dispatcher is
+     available, from fold_builtin_cpu.  */
+
+  static struct _feature_list
+    {
+      const char *const name;
+      const enum feature_priority priority;
+    }
+  const feature_list[] =
+    {
+      {"mmx", P_MMX},
+      {"sse", P_SSE},
+      {"sse2", P_SSE2},
+      {"sse3", P_SSE3},
+      {"ssse3", P_SSSE3},
+      {"sse4.1", P_SSE4_1},
+      {"sse4.2", P_SSE4_2},
+      {"popcnt", P_POPCNT},
+      {"avx", P_AVX},
+      {"avx2", P_AVX2}
+    };
+
+
+  static unsigned int NUM_FEATURES
+    = sizeof (feature_list) / sizeof (struct _feature_list);
+
+  unsigned int i;
+
+  tree predicate_chain = NULL_TREE;
+  tree predicate_decl, predicate_arg;
+
+  attrs = lookup_attribute ("target", DECL_ATTRIBUTES (decl));
+  gcc_assert (attrs != NULL);
+
+  attrs = TREE_VALUE (TREE_VALUE (attrs));
+
+  gcc_assert (TREE_CODE (attrs) == STRING_CST);
+  attrs_str = TREE_STRING_POINTER (attrs);
+
+
+  /* Handle arch= if specified.  For priority, set it to be 1 more than
+     the best instruction set the processor can handle.  For instance, if
+     there is a version for atom and a version for ssse3 (the highest ISA
+     priority for atom), the atom version must be checked for dispatch
+     before the ssse3 version. */
+  if (strstr (attrs_str, "arch=") != NULL)
+    {
+      cl_target_option_save (&cur_target, &global_options);
+      target_node = ix86_valid_target_attribute_tree (attrs);
+    
+      gcc_assert (target_node);
+      new_target = TREE_TARGET_OPTION (target_node);
+      gcc_assert (new_target);
+      
+      if (new_target->arch_specified && new_target->arch > 0)
+	{
+	  switch (new_target->arch)
+	    {
+	    case PROCESSOR_CORE2_32:
+	    case PROCESSOR_CORE2_64:
+	      arg_str = "core2";
+	      priority = P_PROC_SSSE3;
+	      break;
+	    case PROCESSOR_COREI7_32:
+	    case PROCESSOR_COREI7_64:
+	      arg_str = "corei7";
+	      priority = P_PROC_SSE4_2;
+	      break;
+	    case PROCESSOR_ATOM:
+	      arg_str = "atom";
+	      priority = P_PROC_SSSE3;
+	      break;
+	    case PROCESSOR_AMDFAM10:
+	      arg_str = "amdfam10h";
+	      priority = P_PROC_SSE4_a;
+	      break;
+	    case PROCESSOR_BDVER1:
+	      arg_str = "bdver1";
+	      priority = P_PROC_FMA;
+	      break;
+	    case PROCESSOR_BDVER2:
+	      arg_str = "bdver2";
+	      priority = P_PROC_FMA;
+	      break;
+	    }  
+	}    
+    
+      cl_target_option_restore (&global_options, &cur_target);
+	
+      if (predicate_list && arg_str == NULL)
+	{
+	  error_at (DECL_SOURCE_LOCATION (decl),
+	    	"No dispatcher found for the versioning attributes");
+	  return 0;
+	}
+    
+      if (predicate_list)
+	{
+          predicate_decl = ix86_builtins [(int) IX86_BUILTIN_CPU_IS];
+          /* For a C string literal the length includes the trailing NUL.  */
+          predicate_arg = build_string_literal (strlen (arg_str) + 1, arg_str);
+          predicate_chain = tree_cons (predicate_decl, predicate_arg,
+				       predicate_chain);
+	}
+    }
+
+  /* Process feature name.  */
+  tok_str =  (char *) xmalloc (strlen (attrs_str) + 1);
+  strcpy (tok_str, attrs_str);
+  token = strtok (tok_str, ",");
+  predicate_decl = ix86_builtins [(int) IX86_BUILTIN_CPU_SUPPORTS];
+
+  while (token != NULL)
+    {
+      /* Do not process "arch="  */
+      if (strncmp (token, "arch=", 5) == 0)
+	{
+	  token = strtok (NULL, ",");
+	  continue;
+	}
+      for (i = 0; i < NUM_FEATURES; ++i)
+	{
+	  if (strcmp (token, feature_list[i].name) == 0)
+	    {
+	      if (predicate_list)
+		{
+		  predicate_arg = build_string_literal (
+				  strlen (feature_list[i].name) + 1,
+				  feature_list[i].name);
+		  predicate_chain = tree_cons (predicate_decl, predicate_arg,
+					       predicate_chain);
+		}
+	      /* Find the maximum priority feature.  */
+	      if (feature_list[i].priority > priority)
+		priority = feature_list[i].priority;
+
+	      break;
+	    }
+	}
+      if (predicate_list && i == NUM_FEATURES)
+	{
+	  error_at (DECL_SOURCE_LOCATION (decl),
+		    "No dispatcher found for %s", token);
+	  return 0;
+	}
+      token = strtok (NULL, ",");
+    }
+  free (tok_str);
+
+  if (predicate_list && predicate_chain == NULL_TREE)
+    {
+      error_at (DECL_SOURCE_LOCATION (decl),
+	        "No dispatcher found for the versioning attributes : %s",
+	        attrs_str);
+      return 0;
+    }
+  else if (predicate_list)
+    {
+      predicate_chain = nreverse (predicate_chain);
+      *predicate_list = predicate_chain;
+    }
+
+  return priority; 
+}
+
+/* This compares the priority of target features in function DECL1
+   and DECL2.  It returns positive value if DECL1 is higher priority,
+   negative value if DECL2 is higher priority and 0 if they are the
+   same.  */
+
+static int
+ix86_compare_versions (tree decl1, tree decl2)
+{
+  unsigned int priority1 = 0;
+  unsigned int priority2 = 0;
+
+  if (lookup_attribute ("target", DECL_ATTRIBUTES (decl1)) != NULL)
+    priority1 = get_builtin_code_for_version (decl1, NULL);
+
+  if (lookup_attribute ("target", DECL_ATTRIBUTES (decl2)) != NULL)
+    priority2 = get_builtin_code_for_version (decl2, NULL);
+
+  return (int)priority1 - (int)priority2;
+}
+ 
+static int
+feature_compare (const void *v1, const void *v2)
+{
+  typedef struct _function_version_info
+    {
+      tree version_decl;
+      tree predicate_chain;
+      unsigned int dispatch_priority;
+    } function_version_info;
+
+  const function_version_info c1 = *(const function_version_info *)v1;
+  const function_version_info c2 = *(const function_version_info *)v2;
+  return (c2.dispatch_priority - c1.dispatch_priority);
+}
+
+/* This is the target hook to generate the dispatch function for
+   multi-versioned functions.  DISPATCH_DECL is the function which will
+   contain the dispatch logic.  FNDECLS are the function choices for
+   dispatch, and is a tree chain.  EMPTY_BB is the basic block pointer
+   in DISPATCH_DECL in which the dispatch code is generated.  */
+
+static int
+ix86_dispatch_version (tree dispatch_decl,
+		       void *fndecls_p,
+		       basic_block *empty_bb)
+{
+  tree default_decl;
+  gimple ifunc_cpu_init_stmt;
+  gimple_seq gseq;
+  tree old_current_function_decl;
+  int ix;
+  tree ele;
+  VEC (tree, heap) *fndecls;
+  unsigned int num_versions = 0;
+  unsigned int actual_versions = 0;
+  unsigned int i;
+
+  struct _function_version_info
+    {
+      tree version_decl;
+      tree predicate_chain;
+      unsigned int dispatch_priority;
+    }*function_version_info;
+
+  gcc_assert (dispatch_decl != NULL
+	      && fndecls_p != NULL
+	      && empty_bb != NULL);
+
+  /* fndecls_p is actually a vector.  */
+  fndecls = (VEC (tree, heap) *)fndecls_p;
+
+  /* At least one more version other than the default.  */
+  num_versions = VEC_length (tree, fndecls);
+  gcc_assert (num_versions >= 2);
+
+  function_version_info = (struct _function_version_info *)
+    xmalloc ((num_versions - 1) * sizeof (struct _function_version_info));
+
+  /* The first version in the vector is the default decl.  */
+  default_decl = VEC_index (tree, fndecls, 0);
+
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (dispatch_decl));
+  current_function_decl = dispatch_decl;
+
+  gseq = bb_seq (*empty_bb);
+  /* Function version dispatch is via IFUNC.  IFUNC resolvers fire before
+     constructors, so explicitly call __builtin_cpu_init here.  */
+  ifunc_cpu_init_stmt = gimple_build_call_vec (
+                     ix86_builtins [(int) IX86_BUILTIN_CPU_INIT], NULL);
+  gimple_seq_add_stmt (&gseq, ifunc_cpu_init_stmt);
+  gimple_set_bb (ifunc_cpu_init_stmt, *empty_bb);
+  set_bb_seq (*empty_bb, gseq);
+
+  pop_cfun ();
+  current_function_decl = old_current_function_decl;
+
+
+  for (ix = 1; VEC_iterate (tree, fndecls, ix, ele); ++ix)
+    {
+      tree version_decl = ele;
+      tree predicate_chain = NULL_TREE;
+      unsigned int priority;
+      /* Get attribute string, parse it and find the right predicate decl.
+         The predicate function could be a lengthy combination of many
+	 features, like arch-type and various isa-variants.  */
+      priority = get_builtin_code_for_version (version_decl,
+	 			               &predicate_chain);
+
+      if (predicate_chain == NULL_TREE)
+	continue;
+
+      function_version_info [actual_versions].version_decl = version_decl;
+      function_version_info [actual_versions].predicate_chain = predicate_chain;
+      function_version_info [actual_versions].dispatch_priority = priority;
+      actual_versions++;
+    }
+
+  /* Sort the versions according to descending order of dispatch priority.  The
+     priority is based on the ISA.  This is not a perfect solution.  There
+     could still be ambiguity.  If more than one function version is suitable
+     to execute,  which one should be dispatched?  In future, allow the user
+     to specify a dispatch  priority next to the version.  */
+  qsort (function_version_info, actual_versions,
+         sizeof (struct _function_version_info), feature_compare);
+
+  for  (i = 0; i < actual_versions; ++i)
+    *empty_bb = add_condition_to_bb (dispatch_decl,
+				     function_version_info[i].version_decl,
+				     function_version_info[i].predicate_chain,
+				     *empty_bb);
+
+  /* dispatch default version at the end.  */
+  *empty_bb = add_condition_to_bb (dispatch_decl, default_decl,
+				   NULL, *empty_bb);
+
+  free (function_version_info);
+  return 0;
+}
+
 /* This builds the processor_model struct type defined in
    libgcc/config/i386/cpuinfo.c  */
 
@@ -39571,6 +40038,12 @@ ix86_memmodel_check (unsigned HOST_WIDE_INT val)
 #undef TARGET_FOLD_BUILTIN
 #define TARGET_FOLD_BUILTIN ix86_fold_builtin
 
+#undef TARGET_DISPATCH_VERSION
+#define TARGET_DISPATCH_VERSION ix86_dispatch_version
+
+#undef TARGET_COMPARE_VERSIONS
+#define TARGET_COMPARE_VERSIONS ix86_compare_versions
+
 #undef TARGET_ENUM_VA_LIST_P
 #define TARGET_ENUM_VA_LIST_P ix86_enum_va_list
 


* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-06-04 19:01                                             ` Sriraman Tallam
@ 2012-06-04 21:36                                               ` H.J. Lu
  2012-06-04 22:29                                                 ` Sriraman Tallam
  2012-06-14 20:35                                               ` Sriraman Tallam
  1 sibling, 1 reply; 93+ messages in thread
From: H.J. Lu @ 2012-06-04 21:36 UTC (permalink / raw)
  To: Sriraman Tallam
  Cc: Richard Guenther, Jan Hubicka, Uros Bizjak, reply, gcc-patches, David Li

On Mon, Jun 4, 2012 at 11:59 AM, Sriraman Tallam <tmsriram@google.com> wrote:
> Hi,
>
>   Attaching updated patch for function multiversioning which brings
> in plenty of changes.
>
> * As suggested by Richard earlier, I have made cgraph aware of
> function versions. All nodes of function versions are chained and the
> dispatcher bodies are created on demand while building cgraph edges.
> The dispatcher body will be created if and only if there is a call or
> reference to a versioned function. Previously, I was maintaining the
> list of versions separately in a hash map, all that is gone now.
> * Now, the file multiversion.c has some helper routines that are used
> in the context of function versioning. There are no new passes and no
> new globals.
> * More tests, updated existing tests.
> * Fixed lots of bugs.
> * Updated patch description.
>
> Patch attached. Patch also available for review at
> http://codereview.appspot.com/5752064
>
> Please let me know what you think,
>

Build failed in libstdc++-v3:

/export/build/gnu/gcc/build-x86_64-linux/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits/locale_classes.h:546:59:
internal compiler error: tree check: expected function_decl, have
identifier_node in tourney, at cp/call.c:8498
  for (size_t __i = 0; __ret && __i < _S_categories_size - 1; ++__i)
                                                           ^
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.
make[5]: *** [x86_64-unknown-linux-gnu/bits/stdc++.h.gch/O2g.gch] Erro

on Linux/x86-64.


-- 
H.J.


* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-06-04 21:36                                               ` H.J. Lu
@ 2012-06-04 22:29                                                 ` Sriraman Tallam
  2012-06-05 13:56                                                   ` H.J. Lu
  0 siblings, 1 reply; 93+ messages in thread
From: Sriraman Tallam @ 2012-06-04 22:29 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Richard Guenther, Jan Hubicka, Uros Bizjak, reply, gcc-patches, David Li

[-- Attachment #1: Type: text/plain, Size: 1904 bytes --]

Bug fixed and new patch attached.

Patch also available for review at http://codereview.appspot.com/5752064

Thanks,
-Sri.

On Mon, Jun 4, 2012 at 2:36 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Mon, Jun 4, 2012 at 11:59 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>> Hi,
>>
>>   Attaching updated patch for function multiversioning which brings
>> in plenty of changes.
>>
>> * As suggested by Richard earlier, I have made cgraph aware of
>> function versions. All nodes of function versions are chained and the
>> dispatcher bodies are created on demand while building cgraph edges.
>> The dispatcher body will be created if and only if there is a call or
>> reference to a versioned function. Previously, I was maintaining the
>> list of versions separately in a hash map, all that is gone now.
>> * Now, the file multiversion.c has some helper routines that are used
>> in the context of function versioning. There are no new passes and no
>> new globals.
>> * More tests, updated existing tests.
>> * Fixed lots of bugs.
>> * Updated patch description.
>>
>> Patch attached. Patch also available for review at
>> http://codereview.appspot.com/5752064
>>
>> Please let me know what you think,
>>
>
> Build failed in libstdc++-v3:
>
> /export/build/gnu/gcc/build-x86_64-linux/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits/locale_classes.h:546:59:
> internal compiler error: tree check: expected function_decl, have
> identifier_node in tourney, at cp/call.c:8498
>  for (size_t __i = 0; __ret && __i < _S_categories_size - 1; ++__i)
>                                                           ^
> Please submit a full bug report,
> with preprocessed source if appropriate.
> See <http://gcc.gnu.org/bugs.html> for instructions.
> make[5]: *** [x86_64-unknown-linux-gnu/bits/stdc++.h.gch/O2g.gch] Erro
>
> on Linux/x86-64.
>
>
> --
> H.J.

[-- Attachment #2: mv_fe_patch.txt --]
[-- Type: text/plain, Size: 74269 bytes --]


Overview of the patch which adds front-end support to specify function versions.

Example:

int foo ();  /* Default version */
int foo () __attribute__ ((target("avx,popcnt")));/*Specialized for avx and popcnt */
int foo () __attribute__ ((target("arch=core2,ssse3")));/*Specialized for core2 and ssse3*/

int main ()
{
 int (*p)() = &foo;
 return foo () + (*p)();
}

int foo ()
{
 return 0;
}

int __attribute__ ((target("avx,popcnt")))
foo ()
{
 return 0;
}

int __attribute__ ((target("arch=core2,ssse3")))
foo ()
{
 return 0;
}

The above example has foo defined 3 times, but all 3 definitions of foo are
different versions of the same function. The call to foo in main, directly and
via a pointer, are calls to the multi-versioned function foo which is dispatched
to the right foo at run-time.

What does the patch do?

* Tracking decls that correspond to function versions of function
name, say "foo":

When the front-end sees more than one decl for "foo", with at least one decl
tagged with a "target" attribute, it marks them as function versions. To
prevent duplicate definition errors with other versions of "foo", the
"decls_match" function in cp/decl.c is made to return false when two decls have
the same signature but different target attributes. This causes all function
versions of "foo" to be added to the overload list of "foo".

* Change the assembler names of the function versions.

The front-end changes the assembler names of the function versions by appending
the sorted list of "target" arguments to the assembler name of "foo". For
example, the assembler name of "void foo () __attribute__ ((target ("sse4")))"
will become _Z3foov.sse4.
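
A minimal sketch of this naming scheme, assuming it follows what
sorted_attr_string and version_assembler_name in multiversion.c do in this
patch: '=' becomes '_', the comma-separated arguments are sorted, joined with
'_', and appended to the original name after a '.'. The helper name
versioned_name is hypothetical and is not part of the patch.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C strings.  */
static int
cmp_str (const void *a, const void *b)
{
  return strcmp (*(char *const *) a, *(char *const *) b);
}

/* Hypothetical helper (not in the patch): return a malloc'd
   "BASE.SUFFIX" name, where SUFFIX is the comma-separated TARGET_ARGS
   with '=' turned into '_', sorted, and joined with '_'.  Returns NULL
   on allocation failure.  The caller frees the result.  */
static char *
versioned_name (const char *base, const char *target_args)
{
  size_t len = strlen (target_args);
  char *copy = malloc (len + 1);
  char **tok = malloc ((len + 1) * sizeof *tok);  /* Upper bound on tokens.  */
  char *out = malloc (strlen (base) + len + 2);   /* base + '.' + args + NUL.  */
  size_t n = 0;

  if (!copy || !tok || !out)
    {
      free (copy);
      free (tok);
      free (out);
      return NULL;
    }

  strcpy (copy, target_args);
  for (char *p = copy; *p; p++)
    if (*p == '=')
      *p = '_';

  /* Split on commas, then sort so that argument order does not matter.  */
  for (char *t = strtok (copy, ","); t; t = strtok (NULL, ","))
    tok[n++] = t;
  qsort (tok, n, sizeof *tok, cmp_str);

  strcpy (out, base);
  for (size_t i = 0; i < n; i++)
    {
      strcat (out, i == 0 ? "." : "_");
      strcat (out, tok[i]);
    }

  free (tok);
  free (copy);
  return out;
}
```

With this sketch, versioned_name ("_Z3foov", "popcnt,avx") and
versioned_name ("_Z3foov", "avx,popcnt") give the same string, which is why
sorting makes the suffix usable as an identifier for a version.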

* Overload resolution:

Function "build_over_call" in cp/call.c sees a call to function
"foo", which is multi-versioned. The overload resolution happens in
function "joust" in cp/call.c. Here, the call to "foo" has all
possible versions of "foo" as candidates. All the candidates of "foo" are
stored in the cgraph data structures. Each version of foo is chained in a
doubly-linked list with the default function as the first element.  This allows
any pass to access all the semantically identical versions. Also, a dispatcher
decl is created; calls go to this decl, which at run-time dispatches to the
right function version.

Also, in joust, where overload resolution happens, a multiversioned function
resolution is made to return the most specialized version.  This is the version
that will be checked for dispatching first and is determined by the target.
Now, if the caller can inline this function version then a direct call is made
to this function version rather than go through the dispatcher. When a direct
call cannot be made, a call to the dispatcher function is created.
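
The version chaining described above can be pictured with a small standalone
sketch. It assumes a simplified node type, version_node, standing in for the
prev_function_version and next_function_version fields this patch adds to
struct cgraph_node; the type and the helper names are hypothetical and are
not GCC code.

```c
#include <stddef.h>

/* Hypothetical stand-in for the version-related fields of a cgraph node.  */
struct version_node
{
  const char *name;            /* For example, the version's assembler name.  */
  struct version_node *prev;   /* NULL for the default version.  */
  struct version_node *next;   /* NULL for the last version in the chain.  */
};

/* Append V to the chain whose first element is DEFAULT_VERSION.  */
static void
chain_version (struct version_node *default_version, struct version_node *v)
{
  struct version_node *last = default_version;

  while (last->next)
    last = last->next;
  last->next = v;
  v->prev = last;
  v->next = NULL;
}

/* From any version, walk back to the default (first) element.  */
static struct version_node *
default_of (struct version_node *v)
{
  while (v->prev)
    v = v->prev;
  return v;
}

/* Count all versions reachable from V, including the default.  */
static size_t
count_versions (struct version_node *v)
{
  size_t n = 0;

  for (v = default_of (v); v; v = v->next)
    n++;
  return n;
}
```

Because the default version is always first, a pass holding any one version
can reach the default and every sibling by following the links, without a
separate lookup table.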

* Creating the dispatcher body.

The dispatcher body, called the resolver, is made only when there is a call to
a multiversioned function dispatcher or the address of a function is taken.
This is generated during build_cgraph_edges for a call or
cgraph_mark_address_taken_node for a pointer reference.

* Dispatch ordering.

The order in which the function versions are checked during dispatch is based
on a priority value assigned to the ISA that each version caters to. More
specialized versions are checked for dispatching first.  This is to mitigate
the ambiguity that can arise when more than one function version is valid for
execution on a particular platform.  This is not a perfect solution and in the
future, the user should be allowed to assign a dispatching priority value to
each version.
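
The ordering rule can be sketched outside the compiler as follows. This is an
illustration only, under several assumptions: each version requires a single
feature, the CPU is modelled as a comma-separated feature string, and the
priority numbers are made up. The names version, cpu_supports and dispatch
are hypothetical; the patch itself emits GIMPLE that calls
__builtin_cpu_supports rather than running code like this.

```c
#include <stdlib.h>
#include <string.h>

typedef int (*version_fn) (void);

/* Hypothetical description of one function version.  */
struct version
{
  const char *feature;     /* Single feature the version requires.  */
  unsigned int priority;   /* Larger means more specialized (made-up scale).  */
  version_fn fn;
};

/* Example versions; the return value identifies which one ran.  */
static int foo_default (void) { return 0; }
static int foo_sse4 (void) { return 1; }
static int foo_avx (void) { return 2; }

/* qsort comparator: higher priority sorts first.  */
static int
by_descending_priority (const void *a, const void *b)
{
  const struct version *va = a, *vb = b;
  return (va->priority < vb->priority) - (va->priority > vb->priority);
}

/* Hypothetical CPU model: return 1 if FEATURE is an element of the
   comma-separated list CPU_FEATURES, 0 otherwise.  */
static int
cpu_supports (const char *cpu_features, const char *feature)
{
  size_t len = strlen (feature);

  for (const char *p = cpu_features; *p; )
    {
      size_t n = strcspn (p, ",");
      if (n == len && strncmp (p, feature, len) == 0)
        return 1;
      p += n;
      if (*p == ',')
        p++;
    }
  return 0;
}

/* Check versions from most to least specialized and return the first
   one the CPU can run; fall back to DEFAULT_FN.  Sorts V in place.  */
static version_fn
dispatch (struct version *v, size_t n, const char *cpu_features,
          version_fn default_fn)
{
  qsort (v, n, sizeof *v, by_descending_priority);
  for (size_t i = 0; i < n; i++)
    if (cpu_supports (cpu_features, v[i].feature))
      return v[i].fn;
  return default_fn;
}
```

On a CPU that supports both features, two versions are valid; sorting by
priority is what makes the choice deterministic, and the default version is
only reached when no specialized version applies.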


	* doc/tm.texi.in (TARGET_DISPATCH_VERSION): New hook description.
	(TARGET_COMPARE_VERSIONS): New hook description.
	* doc/tm.texi: Regenerate.
	* cgraphbuild.c (build_cgraph_edges): Generate body of multiversion
	function dispatcher.
	* c-family/c-common.c (handle_target_attribute): Always keep target
	attributes tagged.
	* target.def (dispatch_version): New target hook.
	(compare_versions): New hook.
	* cgraph.c (cgraph_mark_address_taken_node): Generate body of multiversion
	function dispatcher.
	* cgraph.h (cgraph_node): New members dispatcher_fndecl, resolver_fndecl,
	prev_function_version, next_function_version, dispatcher_function.
	(is_default_function_version): New function.
	(mark_function_as_version): New function.
	(has_different_version_attributes): New function.
	(function_target_attribute): New function.
	(build_dispatcher_for_function_versions): New function.
	(build_resolver_for_function_versions): New function.
	* tree.h (DECL_FUNCTION_VERSIONED): New macro.
	(tree_function_decl): New bit-field versioned_function.
	* multiversion.c: New file.
	* testsuite/g++.dg/mv1.C: New test.
	* testsuite/g++.dg/mv2.C: New test.
	* testsuite/g++.dg/mv3.C: New test.
	* testsuite/g++.dg/mv4.C: New test.
	* cp/class.c (add_method): Change assembler names of function versions.
	(resolve_address_of_overloaded_function): Save all function
	version candidates. Create dispatcher decl and return address of
	dispatcher instead.
	* cp/decl.c (decls_match): Make decls unmatched for versioned
	functions.
	(duplicate_decls): Remove ambiguity for versioned functions. 
	(cxx_comdat_group): Make comdat group of versioned functions be the
	same.
	* cp/error.c (dump_exception_spec): Dump assembler name for function
	versions.
	* cp/semantics.c (expand_or_defer_fn_1): Mark as needed versioned
	functions that are also marked inline.
	* cp/decl2.c (check_classfn): Check attributes of versioned functions
	for match.
	* cp/call.c (build_new_function_call): Check if versioned functions
	have a default version.
	(build_over_call): Make calls to multiversioned functions
	to call the dispatcher.
	(joust): For calls to multi-versioned functions, make the most
	specialized function version win.
	(tourney): Generate dispatcher decl for function versions.
	* cp/mangle.c (write_unqualified_name): Use assembler name for
	versioned functions.
	* Makefile.in: Add multiversion.o
	* config/i386/i386.c (add_condition_to_bb): New function.
	(get_builtin_code_for_version): New function.
	(ix86_compare_versions): New function.
	(feature_compare): New function.
	(ix86_dispatch_version): New function.
	(TARGET_DISPATCH_VERSION): New macro.
	(TARGET_COMPARE_VERSIONS): New macro.

Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi	(revision 188209)
+++ gcc/doc/tm.texi	(working copy)
@@ -11001,6 +11001,21 @@ The result is another tree containing a simplified
 call's result.  If @var{ignore} is true the value will be ignored.
 @end deftypefn
 
+@deftypefn {Target Hook} int TARGET_DISPATCH_VERSION (tree @var{dispatch_decl}, void *@var{fndecls}, basic_block *@var{empty_bb})
+For multi-versioned functions, this hook sets up the dispatcher.
+@var{dispatch_decl} is the function that will be used to dispatch the
+version. @var{fndecls} are the function choices for dispatch.
+@var{empty_bb} is a basic block in @var{dispatch_decl} where the
+code to do the dispatch will be added.
+@end deftypefn
+
+@deftypefn {Target Hook} int TARGET_COMPARE_VERSIONS (tree @var{decl1}, tree @var{decl2})
+This hook is used to compare the target attributes in two functions to
+figure out which function's features get higher priority.  This is used
+during multi-versioning to figure out the order of dispatching. @var{decl1}
+and @var{decl2} are the two function decls that will be compared.
+@end deftypefn
+
 @deftypefn {Target Hook} {const char *} TARGET_INVALID_WITHIN_DOLOOP (const_rtx @var{insn})
 
 Take an instruction in @var{insn} and return NULL if it is valid within a
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in	(revision 188209)
+++ gcc/doc/tm.texi.in	(working copy)
@@ -10879,6 +10879,21 @@ The result is another tree containing a simplified
 call's result.  If @var{ignore} is true the value will be ignored.
 @end deftypefn
 
+@hook TARGET_DISPATCH_VERSION
+For multi-versioned functions, this hook sets up the dispatcher.
+@var{dispatch_decl} is the function that will be used to dispatch the
+version. @var{fndecls} are the function choices for dispatch.
+@var{empty_bb} is a basic block in @var{dispatch_decl} where the
+code to do the dispatch will be added.
+@end deftypefn
+
+@hook TARGET_COMPARE_VERSIONS
+This hook is used to compare the target attributes in two functions to
+figure out which function's features get higher priority.  This is used
+during multi-versioning to figure out the order of dispatching. @var{decl1}
+and @var{decl2} are the two function decls that will be compared.
+@end deftypefn
+
 @hook TARGET_INVALID_WITHIN_DOLOOP
 
 Take an instruction in @var{insn} and return NULL if it is valid within a
Index: gcc/cgraphbuild.c
===================================================================
--- gcc/cgraphbuild.c	(revision 188209)
+++ gcc/cgraphbuild.c	(working copy)
@@ -288,7 +288,6 @@ mark_store (gimple stmt, tree t, void *data)
      }
   return false;
 }
-
 /* Create cgraph edges for function calls.
    Also look for functions and variables having addresses taken.  */
 
@@ -316,6 +315,20 @@ build_cgraph_edges (void)
 	      int freq = compute_call_stmt_bb_frequency (current_function_decl,
 							 bb);
 	      decl = gimple_call_fndecl (stmt);
+	      /* If a call to a multiversioned function dispatcher is found,
+		 generate the body to dispatch the right function
+		 at run-time.  */
+	      if (decl && cgraph_get_node (decl)
+		  && cgraph_get_node (decl)->dispatcher_function)
+		{
+		  tree resolver_decl;
+		  struct cgraph_node *curr_node = cgraph_get_node (decl);
+		  gcc_assert (curr_node->next_function_version);
+		  resolver_decl
+		    = build_resolver_for_function_versions (curr_node);
+		  gcc_assert (resolver_decl);
+		}
+
 	      if (decl)
 		cgraph_create_edge (node, cgraph_get_create_node (decl),
 				    stmt, bb->count, freq);
Index: gcc/c-family/c-common.c
===================================================================
--- gcc/c-family/c-common.c	(revision 188209)
+++ gcc/c-family/c-common.c	(working copy)
@@ -8245,9 +8245,15 @@ handle_target_attribute (tree *node, tree name, tr
       warning (OPT_Wattributes, "%qE attribute ignored", name);
       *no_add_attrs = true;
     }
-  else if (! targetm.target_option.valid_attribute_p (*node, name, args,
-						      flags))
-    *no_add_attrs = true;
+  else
+    {
+      /* When a target attribute is invalid, it may also be because the
+	 target for the compilation unit and the attribute match.  For
+         instance, target attribute "xxx" is invalid when -mxxx is used.
+         When used with multiversioning, removing the attribute can lead
+         to duplicate definitions.  So, keep the attribute tagged.  */
+      targetm.target_option.valid_attribute_p (*node, name, args, flags);
+    }
 
   return NULL_TREE;
 }
Index: gcc/target.def
===================================================================
--- gcc/target.def	(revision 188209)
+++ gcc/target.def	(working copy)
@@ -1249,6 +1249,24 @@ DEFHOOK
  tree, (tree fndecl, int n_args, tree *argp, bool ignore),
  hook_tree_tree_int_treep_bool_null)
 
+/* Target hook to generate the dispatching code for calls to multi-versioned
+   functions.  DISPATCH_DECL is the function that will have the dispatching
+   logic.  FNDECLS are the list of choices for dispatch and EMPTY_BB is the
+   basic block in DISPATCH_DECL which will contain the code.  */
+DEFHOOK
+(dispatch_version,
+ "",
+ int, (tree dispatch_decl, void *fndecls, basic_block *empty_bb), NULL)
+
+/* Target hook to compare the priority of target features in function DECL1
+   and DECL2.  It returns positive value if DECL1 is higher priority,
+   negative value if DECL2 is higher priority and 0 if they are the
+   same.  */
+DEFHOOK
+(compare_versions,
+ "",
+ int, (tree decl1, tree decl2), NULL)
+
 /* Returns a code for a target-specific builtin that implements
    reciprocal of the function, or NULL_TREE if not available.  */
 DEFHOOK
Index: gcc/cgraph.c
===================================================================
--- gcc/cgraph.c	(revision 188209)
+++ gcc/cgraph.c	(working copy)
@@ -1277,6 +1277,14 @@ cgraph_mark_address_taken_node (struct cgraph_node
   node->symbol.address_taken = 1;
   node = cgraph_function_or_thunk_node (node, NULL);
   node->symbol.address_taken = 1;
+  /* If the address of a multiversioned function dispatcher is taken,
+     generate the body to dispatch the right function at run-time.  This
+     is needed as the address can be used to do an indirect call.  */
+  if (node->dispatcher_function)
+    {
+      gcc_assert (node->next_function_version);
+      build_resolver_for_function_versions (node);
+    }
 }
 
 /* Return local info for the compiled function.  */
Index: gcc/cgraph.h
===================================================================
--- gcc/cgraph.h	(revision 188209)
+++ gcc/cgraph.h	(working copy)
@@ -220,6 +220,19 @@ struct GTY(()) cgraph_node {
   struct cgraph_node *prev_sibling_clone;
   struct cgraph_node *clones;
   struct cgraph_node *clone_of;
+
+  /* If this node corresponds to a function version, this points
+     to the dispatcher function.  */
+  tree dispatcher_fndecl;
+  /* If this node is a dispatcher for function versions, this points
+     to resolver function.  */
+  tree resolver_fndecl;
+  /* Chains all the semantically identical function versions.  The
+     first function in this chain is the default function.  */
+  struct cgraph_node *prev_function_version;
+  /* If this node is a dispatcher for function versions, this also points
+     to the default function version.  */
+  struct cgraph_node *next_function_version;
   /* For functions with many calls sites it holds map from call expression
      to the edge to speed up cgraph_edge function.  */
   htab_t GTY((param_is (struct cgraph_edge))) call_site_hash;
@@ -271,6 +284,7 @@ struct GTY(()) cgraph_node {
   /* ?? We should be able to remove this.  We have enough bits in
      cgraph to calculate it.  */
   unsigned tm_clone : 1;
+  unsigned dispatcher_function : 1;
 };
 
 typedef struct cgraph_node *cgraph_node_ptr;
@@ -636,6 +650,22 @@ void cgraph_rebuild_references (void);
 int compute_call_stmt_bb_frequency (tree, basic_block bb);
 void record_references_in_initializer (tree, bool);
 
+/* In multiversion.c  */
+/* Returns true if DECL is a function version and is the default version.  */
+bool is_default_function_version (tree decl);
+void mark_function_as_version (tree);
+/* Returns true if the "target" attribute strings of DECL1 and DECL2
+   do not match.  */
+bool has_different_version_attributes (const tree decl1, const tree decl2);
+/* Return the target attribute if decl is FUNCTION_DECL. */
+tree function_target_attribute (const tree decl);
+/* Builds the dispatcher decl for function versions in VEC.  */
+tree build_dispatcher_for_function_versions (VEC (tree,heap) *vec);
+/* Builds the resolver function which picks the right function version at
+   run-time.  NODE is the cgraph node of the dispatcher which points to
+   the various function versions to be resolved.  */
+tree build_resolver_for_function_versions (struct cgraph_node *node);
+
 /* In ipa.c  */
 bool symtab_remove_unreachable_nodes (bool, FILE *);
 cgraph_node_set cgraph_node_set_new (void);
Index: gcc/tree.h
===================================================================
--- gcc/tree.h	(revision 188209)
+++ gcc/tree.h	(working copy)
@@ -3523,6 +3523,12 @@ extern VEC(tree, gc) **decl_debug_args_insert (tre
 #define DECL_FUNCTION_SPECIFIC_OPTIMIZATION(NODE) \
    (FUNCTION_DECL_CHECK (NODE)->function_decl.function_specific_optimization)
 
+/* In FUNCTION_DECL, this is set if this function has other versions generated
+   using "target" attributes.  The default version is the one which does not
+   have any "target" attribute set. */
+#define DECL_FUNCTION_VERSIONED(NODE)\
+   (FUNCTION_DECL_CHECK (NODE)->function_decl.versioned_function)
+
 /* FUNCTION_DECL inherits from DECL_NON_COMMON because of the use of the
    arguments/result/saved_tree fields by front ends.   It was either inherit
    FUNCTION_DECL from non_common, or inherit non_common from FUNCTION_DECL,
@@ -3567,8 +3573,8 @@ struct GTY(()) tree_function_decl {
   unsigned looping_const_or_pure_flag : 1;
   unsigned has_debug_args_flag : 1;
   unsigned tm_clone_flag : 1;
-
-  /* 1 bit left */
+  unsigned versioned_function : 1;
+  /* No bits left.  */
 };
 
 /* The source language of the translation-unit.  */
Index: gcc/multiversion.c
===================================================================
--- gcc/multiversion.c	(revision 0)
+++ gcc/multiversion.c	(revision 0)
@@ -0,0 +1,572 @@
+/* Function Multiversioning.
+   Copyright (C) 2012 Free Software Foundation, Inc.
+   Contributed by Sriraman Tallam (tmsriram@google.com)
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>. */
+
+/* This file contains routines for handling multiversioned functions.
+
+   Function versions are created by using the same function signature but
+   also tagging attribute "target" to specify the platform type for which
+   the version must be executed.  Here is an example:
+
+   int foo ()
+   {
+     printf ("Execute as default");
+     return 0;
+   }
+
+   int  __attribute__ ((target ("arch=corei7")))
+   foo ()
+   {
+     printf ("Execute for corei7");
+     return 0;
+   }
+   
+   int main ()
+   {
+     return foo ();
+   } 
+
+   The call to foo in main is replaced with a call to a dispatcher function
+   that contains the resolver code to call the correct function version at
+   run-time.  */
+
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "tree.h"
+#include "langhooks.h"
+#include "flags.h"
+#include "cgraph.h"
+#include "diagnostic.h"
+#include "toplev.h"
+#include "params.h"
+#include "coverage.h"
+#include "ggc.h"
+#include "basic-block.h"
+#include "toplev.h"
+#include "tree-dump.h"
+#include "output.h"
+#include "gimple-pretty-print.h"
+#include "target.h"
+#include "tree-flow.h"
+
+/* Comparator function to be used in qsort routine to sort attribute
+   specification strings to "target".  */
+
+static int
+attr_strcmp (const void *v1, const void *v2)
+{
+  const char *c1 = *(char *const*)v1;
+  const char *c2 = *(char *const*)v2;
+  return strcmp (c1, c2);
+}
+
+/* STR is the argument to target attribute.  This function tokenizes
+   the comma separated arguments, sorts them and returns a string which
+   is a unique identifier for the comma separated arguments.  */
+
+static char *
+sorted_attr_string (const char *str)
+{
+  char **args = NULL;
+  char *attr_str, *ret_str;
+  char *attr = NULL;
+  unsigned int argnum = 1;
+  unsigned int i;
+
+  for (i = 0; i < strlen (str); i++)
+    if (str[i] == ',')
+      argnum++;
+
+  attr_str = (char *)xmalloc (strlen (str) + 1);
+  strcpy (attr_str, str);
+
+  for (i = 0; i < strlen (attr_str); i++)
+    if (attr_str[i] == '=')
+      attr_str[i] = '_';
+
+  if (argnum == 1)
+    return attr_str;
+
+  args = (char **)xmalloc (argnum * sizeof (char *));
+
+  i = 0;
+  attr = strtok (attr_str, ",");
+  while (attr != NULL)
+    {
+      args[i] = attr;
+      i++;
+      attr = strtok (NULL, ",");
+    }
+
+  qsort (args, argnum, sizeof (char*), attr_strcmp);
+
+  ret_str = (char *)xmalloc (strlen (str) + 1);
+  strcpy (ret_str, args[0]);
+  for (i = 1; i < argnum; i++)
+    {
+      strcat (ret_str, "_");
+      strcat (ret_str, args[i]);
+    }
+
+  free (args);
+  free (attr_str);
+  return ret_str;
+}
+
+/* Returns true when only one of DECL1 and DECL2 is marked with "target"
+   or if the "target" attribute strings of DECL1 and DECL2 do not match.  */
+
+bool
+has_different_version_attributes (const tree decl1, const tree decl2)
+{
+  tree attr1, attr2;
+  char *c1, *c2;
+  bool ret = false;
+
+  if (TREE_CODE (decl1) != FUNCTION_DECL
+      || TREE_CODE (decl2) != FUNCTION_DECL)
+    return false;
+
+  attr1 = function_target_attribute (decl1);
+  attr2 = function_target_attribute (decl2);
+
+  if (attr1 == NULL_TREE && attr2 == NULL_TREE)
+    return false;
+
+  if ((attr1 == NULL_TREE && attr2 != NULL_TREE)
+      || (attr1 != NULL_TREE && attr2 == NULL_TREE))
+    return true;
+
+  c1 = sorted_attr_string (
+	TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr1))));
+  c2 = sorted_attr_string (
+	TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr2))));
+
+  if (strcmp (c1, c2) != 0)
+     ret = true;
+
+  free (c1);
+  free (c2);
+
+  return ret;
+}
+
+/* If this decl corresponds to a function and has "target" attribute,
+   append the attribute string to its assembler name.  */
+
+static void
+version_assembler_name (const tree decl)
+{
+  tree version_attr;
+  const char *orig_name, *version_string, *attr_str;
+  char *assembler_name;
+  tree assembler_name_tree;
+  
+  if (TREE_CODE (decl) != FUNCTION_DECL)
+    return;
+
+  if (DECL_DECLARED_INLINE_P (decl)
+      && lookup_attribute ("gnu_inline",
+			   DECL_ATTRIBUTES (decl)))
+    error_at (DECL_SOURCE_LOCATION (decl),
+	      "function versions cannot be marked as gnu_inline,"
+	      " bodies have to be generated");
+
+  if (DECL_VIRTUAL_P (decl)
+      || DECL_VINDEX (decl))
+    error_at (DECL_SOURCE_LOCATION (decl),
+	      "virtual function versioning not supported");
+
+  version_attr = function_target_attribute (decl);
+  /* target attribute string is NULL for default functions.  */
+  if (version_attr == NULL_TREE)
+    return;
+
+  orig_name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
+  version_string
+    = TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (version_attr)));
+
+  attr_str = sorted_attr_string (version_string);
+  assembler_name = (char *) xmalloc (strlen (orig_name)
+				     + strlen (attr_str) + 2);
+
+  sprintf (assembler_name, "%s.%s", orig_name, attr_str);
+  if (dump_file)
+    fprintf (dump_file, "Assembler name set to %s for function version %s\n",
+	     assembler_name, IDENTIFIER_POINTER (DECL_NAME (decl)));
+
+  assembler_name_tree = get_identifier (assembler_name);
+
+  SET_DECL_ASSEMBLER_NAME (decl, assembler_name_tree);
+  SET_DECL_RTL (decl, NULL);
+}
+
+void
+mark_function_as_version (const tree decl)
+{
+  if (DECL_FUNCTION_VERSIONED (decl))
+    return;
+  DECL_FUNCTION_VERSIONED (decl) = 1;
+  version_assembler_name (decl);
+}
+
+/* Returns the "target" attribute of DECL if DECL is a FUNCTION_DECL that
+   has one; returns NULL otherwise.  */
+
+tree
+function_target_attribute (const tree decl)
+{
+  if (TREE_CODE (decl) == FUNCTION_DECL)
+    return lookup_attribute ("target", DECL_ATTRIBUTES (decl));
+  return NULL;
+}
+
+/* Returns true if DECL is multi-versioned and is the default version,
+   that is, it is not tagged with the "target" attribute.  */
+
+bool
+is_default_function_version (const tree decl)
+{
+  return (TREE_CODE (decl) == FUNCTION_DECL
+	  && DECL_FUNCTION_VERSIONED (decl)
+	  && (function_target_attribute (decl) == NULL_TREE));
+}
+
+/* Makes a function attribute of the form NAME(ARG_NAME) and chains
+   it to CHAIN.  */
+
+static tree
+make_attribute (const char *name, const char *arg_name, tree chain)
+{
+  tree attr_name;
+  tree attr_arg_name;
+  tree attr_args;
+  tree attr;
+
+  attr_name = get_identifier (name);
+  attr_arg_name = build_string (strlen (arg_name), arg_name);
+  attr_args = tree_cons (NULL_TREE, attr_arg_name, NULL_TREE);
+  attr = tree_cons (attr_name, attr_args, chain);
+  return attr;
+}
+
+/* Return a new name by appending SUFFIX to the assembler name of DECL.
+   If MAKE_UNIQUE is true, also insert a name unique to this file.  */
+
+static char *
+make_name (tree decl, const char *suffix, bool make_unique)
+{
+  char *global_var_name;
+  int name_len;
+  const char *name;
+  const char *unique_name = NULL;
+
+  name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
+
+  /* Get a unique name that can be used globally without any chances
+     of collision at link time.  */
+  if (make_unique)
+    unique_name = IDENTIFIER_POINTER (get_file_function_name ("\0"));
+
+  name_len = strlen (name) + strlen (suffix) + 2;
+
+  if (make_unique)
+    name_len += strlen (unique_name) + 1;
+  global_var_name = (char *) xmalloc (name_len);
+
+  /* Use '.' to concatenate names as it is demangler friendly.  */
+  if (make_unique)
+    snprintf (global_var_name, name_len, "%s.%s.%s", name,
+	      unique_name, suffix);
+  else
+    snprintf (global_var_name, name_len, "%s.%s", name, suffix);
+
+  return global_var_name;
+}
+
+
+/* Make the resolver function decl to dispatch the versions of a
+   multi-versioned function, DEFAULT_DECL.  Mark DISPATCH_DECL as an
+   "ifunc" resolved by it.  Create an empty basic block in the resolver
+   and store the pointer in EMPTY_BB.  Return the resolver's decl.  */
+
+static tree
+make_resolver_func (const tree default_decl,
+		    const tree dispatch_decl,
+		    basic_block *empty_bb)
+{
+  char *resolver_name;
+  tree decl, type, decl_name, t;
+  basic_block new_bb;
+  tree old_current_function_decl;
+  bool is_uniq = false;
+
+  /* IFUNC's have to be globally visible.  So, if the default_decl is
+     not, then the name of the IFUNC should be made unique.  */
+  if (TREE_PUBLIC (default_decl) == 0)
+    is_uniq = true;
+
+  /* Append the filename to the resolver function if the versions are
+     not externally visible.  This is because the resolver function has
+     to be externally visible for the loader to find it.  So, appending
+     the filename will prevent conflicts with a resolver function from
+     another module which is based on the same version name.  */
+  resolver_name = make_name (default_decl, "resolver", is_uniq);
+
+  /* The resolver function should return a (void *).  */
+  type = build_function_type_list (ptr_type_node, NULL_TREE);
+
+  decl = build_fn_decl (resolver_name, type);
+  decl_name = get_identifier (resolver_name);
+  SET_DECL_ASSEMBLER_NAME (decl, decl_name);
+
+  DECL_NAME (decl) = decl_name;
+  TREE_USED (decl) = 1;
+  DECL_ARTIFICIAL (decl) = 1;
+  DECL_IGNORED_P (decl) = 0;
+  /* IFUNC resolvers have to be externally visible.  */
+  TREE_PUBLIC (decl) = 1;
+  DECL_UNINLINABLE (decl) = 1;
+
+  DECL_EXTERNAL (decl) = 0;
+  DECL_EXTERNAL (dispatch_decl) = 0;
+
+  DECL_CONTEXT (decl) = NULL_TREE;
+  DECL_INITIAL (decl) = make_node (BLOCK);
+  DECL_STATIC_CONSTRUCTOR (decl) = 0;
+  TREE_READONLY (decl) = 0;
+  DECL_PURE_P (decl) = 0;
+
+  if (DECL_COMDAT_GROUP (default_decl))
+    {
+      DECL_COMDAT (decl) = DECL_COMDAT (default_decl);
+      make_decl_one_only (decl, DECL_COMDAT_GROUP (default_decl));
+    }
+  else if (TREE_PUBLIC (default_decl))
+    {
+      /* In this case, each translation unit with a call to this
+	 versioned function will put out a resolver.  Ensure it
+	 is comdat to keep just one copy.  */
+      DECL_COMDAT (decl) = 1;
+      make_decl_one_only (decl, DECL_ASSEMBLER_NAME (decl));
+    }
+  /* Build result decl and add to function_decl.  */
+  t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, ptr_type_node);
+  DECL_ARTIFICIAL (t) = 1;
+  DECL_IGNORED_P (t) = 1;
+  DECL_RESULT (decl) = t;
+
+  gimplify_function_tree (decl);
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (decl));
+  current_function_decl = decl;
+  gimple_register_cfg_hooks ();
+  init_empty_tree_cfg_for_function (DECL_STRUCT_FUNCTION (decl));
+  cfun->curr_properties |=
+    (PROP_gimple_lcf | PROP_gimple_leh | PROP_cfg | PROP_referenced_vars |
+     PROP_gimple_any);
+  new_bb = create_empty_bb (ENTRY_BLOCK_PTR);
+  make_edge (ENTRY_BLOCK_PTR, new_bb, EDGE_FALLTHRU);
+  make_edge (new_bb, EXIT_BLOCK_PTR, 0);
+  *empty_bb = new_bb;
+
+  cgraph_add_new_function (decl, true);
+  cgraph_call_function_insertion_hooks (cgraph_get_create_node (decl));
+
+  if (DECL_COMDAT_GROUP (default_decl))
+    {
+      gcc_assert (cgraph_get_node (default_decl));
+      symtab_add_to_same_comdat_group (
+	(symtab_node) cgraph_get_node (decl),
+	(symtab_node) cgraph_get_node (default_decl));
+    }
+
+  pop_cfun ();
+  current_function_decl = old_current_function_decl;
+
+  gcc_assert (dispatch_decl != NULL);
+  /* Mark dispatch_decl as "ifunc" with resolver as resolver_name.  */
+  DECL_ATTRIBUTES (dispatch_decl) 
+    = make_attribute ("ifunc", resolver_name, DECL_ATTRIBUTES (dispatch_decl));
+
+  /* Create the alias for dispatch to resolver here.  */
+  cgraph_create_function_alias (dispatch_decl, decl);
+  return decl;
+}
+
+/* Generate the dispatching code body to dispatch multi-versioned function
+   DECL.  The target hook is called to process the "target" attributes and
+   provide the code to dispatch the right function at run-time.  NODE points
+   to the dispatcher decl whose body will be created.  */
+
+tree 
+build_resolver_for_function_versions (struct cgraph_node *node)
+{
+  tree resolver_decl;
+  basic_block empty_bb;
+  VEC (tree, heap) *fn_ver_vec = NULL;
+  tree old_current_function_decl;
+  tree default_ver_decl;
+  struct cgraph_node *versn;
+
+  if (node->resolver_fndecl)
+    return node->resolver_fndecl;
+
+  default_ver_decl = node->next_function_version->symbol.decl;
+  resolver_decl = make_resolver_func (default_ver_decl,
+				      node->symbol.decl,
+				      &empty_bb);
+  node->resolver_fndecl = resolver_decl;
+
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (resolver_decl));
+  current_function_decl = resolver_decl;
+
+  fn_ver_vec = VEC_alloc (tree, heap, 2);
+
+  for (versn = node->next_function_version; versn;
+       versn = versn->next_function_version)
+    {
+      /* Check for virtual functions here again, as by this time it should
+	 have been determined if this function needs a vtable index or
+	 not.  This happens for methods in derived classes that override
+	 virtual methods in base classes but are not explicitly marked as
+	 virtual.  */
+      if (DECL_VINDEX (versn->symbol.decl))
+        error_at (DECL_SOURCE_LOCATION (versn->symbol.decl),
+		  "Virtual function multiversioning not supported");
+      VEC_safe_push (tree, heap, fn_ver_vec, versn->symbol.decl);
+    }
+
+  gcc_assert (targetm.dispatch_version);
+  targetm.dispatch_version (resolver_decl, fn_ver_vec, &empty_bb);
+
+  rebuild_cgraph_edges (); 
+  pop_cfun ();
+  current_function_decl = old_current_function_decl;
+  return resolver_decl;
+}
+
+/* Make a dispatcher declaration for the multi-versioned function DECL.
+   Calls to DECL function will be replaced with calls to the dispatcher.
+   Return the decl created.  */
+
+static tree
+make_dispatcher_decl (const tree decl)
+{
+  tree func_decl;
+  char *func_name, *resolver_name;
+  tree fn_type, func_type;
+  bool is_uniq = false;
+
+  if (TREE_PUBLIC (decl) == 0)
+    is_uniq = true;
+
+  func_name = make_name (decl, "ifunc", is_uniq);
+  resolver_name = make_name (decl, "resolver", is_uniq);
+  gcc_assert (resolver_name);
+
+  fn_type = TREE_TYPE (decl);
+  func_type = build_function_type (TREE_TYPE (fn_type),
+				    TYPE_ARG_TYPES (fn_type));
+  
+  func_decl = build_fn_decl (func_name, func_type);
+  TREE_USED (func_decl) = 1;
+  DECL_CONTEXT (func_decl) = NULL_TREE;
+  DECL_INITIAL (func_decl) = error_mark_node;
+  DECL_ARTIFICIAL (func_decl) = 1;
+  /* Mark this func as external, the resolver will flip it again if
+     it gets generated.  */
+  DECL_EXTERNAL (func_decl) = 1;
+  /* This will be of type IFUNC; IFUNCs have to be externally visible.  */
+  TREE_PUBLIC (func_decl) = 1;
+
+  return func_decl;  
+}
+
+tree
+build_dispatcher_for_function_versions (VEC (tree,heap) *fn_ver_vec)
+{
+  struct cgraph_node *node = NULL;
+  struct cgraph_node *default_node = NULL;
+  struct cgraph_node *curr_node = NULL;
+  int ix;
+  tree ele;
+  tree dispatch_decl = NULL;
+
+  gcc_assert (fn_ver_vec != NULL);
+
+  /* Find the default version.  */
+  for (ix = 0; VEC_iterate (tree, fn_ver_vec, ix, ele); ++ix)
+    {
+      if (is_default_function_version (ele))
+	{
+	  default_node = cgraph_get_create_node (ele);
+	  break;
+	}
+    }
+
+  /* If there is no default node, just return NULL.  */
+  if (!default_node)
+    return NULL;
+
+  if (default_node->dispatcher_fndecl)
+    return default_node->dispatcher_fndecl;
+
+#if defined (ASM_OUTPUT_TYPE_DIRECTIVE) && HAVE_GNU_INDIRECT_FUNCTION
+  /* Right now, the dispatching is done via ifunc.  */
+  dispatch_decl = make_dispatcher_decl (default_node->symbol.decl); 
+#else
+  error_at (DECL_SOURCE_LOCATION (default_node->symbol.decl),
+	    "Multiversioning needs ifunc which is not supported "
+	    "in this configuration");
+#endif
+  default_node->dispatcher_fndecl = dispatch_decl;
+  curr_node = cgraph_get_create_node (dispatch_decl);
+  gcc_assert (curr_node);
+  curr_node->dispatcher_function = 1;
+  cgraph_mark_address_taken_node (default_node);
+
+  for (ix = 0; VEC_iterate (tree, fn_ver_vec, ix, ele); ++ix)
+    {
+      node = cgraph_get_create_node (ele);
+      gcc_assert (node != NULL && DECL_FUNCTION_VERSIONED (ele));
+      if (node == default_node)
+	continue;
+      gcc_assert (function_target_attribute (ele) != NULL_TREE);
+      if (curr_node->next_function_version)
+	{
+	  node->next_function_version = curr_node->next_function_version;
+	  curr_node->next_function_version->prev_function_version = node;
+	}
+      curr_node->next_function_version = node;
+      node->prev_function_version = curr_node;
+      node->dispatcher_fndecl = dispatch_decl;
+    }
+
+  /* The default version should be the first node.  */
+  default_node->next_function_version = curr_node->next_function_version;
+  curr_node->next_function_version->prev_function_version = default_node;
+  curr_node->next_function_version = default_node;
+  
+  return dispatch_decl; 
+}
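
For readers following the list manipulation at the end of
build_dispatcher_for_function_versions, here is a minimal stand-alone sketch
of the same linking order.  It is not GCC code: struct sketch_node and
sketch_link_versions are hypothetical names, and plain next/prev pointers
stand in for next_function_version/prev_function_version.  Unlike the patch,
the sketch guards against an empty version list and also sets the default
node's prev pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a cgraph node in the version chain.  */
struct sketch_node
{
  const char *name;
  struct sketch_node *next;
  struct sketch_node *prev;
};

/* Chain the N non-default VERSIONS behind DISPATCHER, pushing each one
   at the front, then splice DEFAULT_NODE in as the first element.  */
static void
sketch_link_versions (struct sketch_node *dispatcher,
		      struct sketch_node *default_node,
		      struct sketch_node **versions, size_t n)
{
  size_t i;

  assert (dispatcher != NULL && default_node != NULL);

  for (i = 0; i < n; i++)
    {
      struct sketch_node *node = versions[i];

      if (dispatcher->next)
	{
	  node->next = dispatcher->next;
	  dispatcher->next->prev = node;
	}
      dispatcher->next = node;
      node->prev = dispatcher;
    }

  /* The default version should be the first node.  */
  default_node->next = dispatcher->next;
  if (dispatcher->next)
    dispatcher->next->prev = default_node;
  dispatcher->next = default_node;
  default_node->prev = dispatcher;
}
```

With versions {a, b} the resulting chain is dispatcher, default, b, a; this
is the order in which build_resolver_for_function_versions walks the list.
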
Index: gcc/cgraphunit.c
===================================================================
--- gcc/cgraphunit.c	(revision 188209)
+++ gcc/cgraphunit.c	(working copy)
@@ -939,7 +939,7 @@ cgraph_analyze_functions (void)
 
 	      for (edge = cnode->callees; edge; edge = edge->next_callee)
 		if (edge->callee->local.finalized)
-		  enqueue_node ((symtab_node)edge->callee);
+                 enqueue_node ((symtab_node)edge->callee);
 
 	      /* If decl is a clone of an abstract function, mark that abstract
 		 function so that we don't release its body. The DECL_INITIAL() of that
Index: gcc/testsuite/g++.dg/mv2.C
===================================================================
--- gcc/testsuite/g++.dg/mv2.C	(revision 0)
+++ gcc/testsuite/g++.dg/mv2.C	(revision 0)
@@ -0,0 +1,119 @@
+/* Test case to check if Multiversioning chooses the correct
+   dispatching order when versions are for various ISAs.  */
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-require-ifunc "" }  */
+/* { dg-options "-O2" } */
+
+#include <assert.h>
+
+/* Default version.  */
+int foo ();
+/* The dispatch checks should be in the exact reverse order of the
+   declarations below.  */
+int foo () __attribute__ ((target ("mmx")));
+int foo () __attribute__ ((target ("sse")));
+int foo () __attribute__ ((target ("sse2")));
+int foo () __attribute__ ((target ("sse3")));
+int foo () __attribute__ ((target ("ssse3")));
+int foo () __attribute__ ((target ("sse4.1")));
+int foo () __attribute__ ((target ("sse4.2")));
+int foo () __attribute__ ((target ("popcnt")));
+int foo () __attribute__ ((target ("avx")));
+int foo () __attribute__ ((target ("avx2")));
+
+int main ()
+{
+
+  int val = foo ();
+
+  if (__builtin_cpu_supports ("avx2"))
+    assert (val == 1);
+  else if (__builtin_cpu_supports ("avx"))
+    assert (val == 2);
+  else if (__builtin_cpu_supports ("popcnt"))
+    assert (val == 3);
+  else if (__builtin_cpu_supports ("sse4.2"))
+    assert (val == 4);
+  else if (__builtin_cpu_supports ("sse4.1"))
+    assert (val == 5);
+  else if (__builtin_cpu_supports ("ssse3"))
+    assert (val == 6);
+  else if (__builtin_cpu_supports ("sse3"))
+    assert (val == 7);
+  else if (__builtin_cpu_supports ("sse2"))
+    assert (val == 8);
+  else if (__builtin_cpu_supports ("sse"))
+    assert (val == 9);
+  else if (__builtin_cpu_supports ("mmx"))
+    assert (val == 10);
+  else
+    assert (val == 0);
+
+  return 0;
+}
+
+int
+foo ()
+{
+  return 0;
+}
+
+int __attribute__ ((target("mmx")))
+foo ()
+{
+  return 10;
+}
+
+int __attribute__ ((target("sse")))
+foo ()
+{
+  return 9;
+}
+
+int __attribute__ ((target("sse2")))
+foo ()
+{
+  return 8;
+}
+
+int __attribute__ ((target("sse3")))
+foo ()
+{
+  return 7;
+}
+
+int __attribute__ ((target("ssse3")))
+foo ()
+{
+  return 6;
+}
+
+int __attribute__ ((target("sse4.1")))
+foo ()
+{
+  return 5;
+}
+
+int __attribute__ ((target("sse4.2")))
+foo ()
+{
+  return 4;
+}
+
+int __attribute__ ((target("popcnt")))
+foo ()
+{
+  return 3;
+}
+
+int __attribute__ ((target("avx")))
+foo ()
+{
+  return 2;
+}
+
+int __attribute__ ((target("avx2")))
+foo ()
+{
+  return 1;
+}
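
The checks in main of mv2.C above imply a fixed priority order among the ISA
versions.  The following stand-alone sketch simulates that order in portable
C so it can be read without an x86 host.  It is only an illustration: the
real ordering comes from the target hooks (targetm.dispatch_version and
targetm.compare_versions), which are not part of this excerpt, and every
name below (sketch_feature, sketch_versions, sketch_dispatch) is
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One bit per ISA feature used in mv2.C.  */
enum sketch_feature
{
  SKETCH_MMX = 1 << 0,
  SKETCH_SSE = 1 << 1,
  SKETCH_SSE2 = 1 << 2,
  SKETCH_SSE3 = 1 << 3,
  SKETCH_SSSE3 = 1 << 4,
  SKETCH_SSE4_1 = 1 << 5,
  SKETCH_SSE4_2 = 1 << 6,
  SKETCH_POPCNT = 1 << 7,
  SKETCH_AVX = 1 << 8,
  SKETCH_AVX2 = 1 << 9,
  SKETCH_ALL = (1 << 10) - 1
};

struct sketch_version
{
  unsigned required;	/* Feature bits this version needs.  */
  int result;		/* Value the matching foo in mv2.C returns.  */
};

/* Highest priority first, mirroring the if/else chain in mv2.C.  */
static const struct sketch_version sketch_versions[] =
{
  { SKETCH_AVX2, 1 }, { SKETCH_AVX, 2 }, { SKETCH_POPCNT, 3 },
  { SKETCH_SSE4_2, 4 }, { SKETCH_SSE4_1, 5 }, { SKETCH_SSSE3, 6 },
  { SKETCH_SSE3, 7 }, { SKETCH_SSE2, 8 }, { SKETCH_SSE, 9 },
  { SKETCH_MMX, 10 }
};

/* Return the value of the first version whose requirements are all in
   SUPPORTED, or 0 (the default version) when none matches.  */
static int
sketch_dispatch (unsigned supported)
{
  size_t i;

  /* Reject feature bits the table does not know about.  */
  assert ((supported & ~(unsigned) SKETCH_ALL) == 0);

  for (i = 0; i < sizeof sketch_versions / sizeof sketch_versions[0]; i++)
    if ((supported & sketch_versions[i].required)
	== sketch_versions[i].required)
      return sketch_versions[i].result;
  return 0;
}
```

For example, a host reporting only mmx, sse and sse2 would get the sse2
version (8), and a host with none of the features falls back to the default.
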
Index: gcc/testsuite/g++.dg/mv4.C
===================================================================
--- gcc/testsuite/g++.dg/mv4.C	(revision 0)
+++ gcc/testsuite/g++.dg/mv4.C	(revision 0)
@@ -0,0 +1,23 @@
+/* Test case to check if the compiler generates an error message
+   when the default version of a multiversioned function is absent
+   and its pointer is taken.  */
+
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2" } */
+
+int __attribute__ ((target ("sse")))
+foo ()
+{
+  return 1;
+}
+int __attribute__ ((target ("popcnt")))
+foo ()
+{
+  return 0;
+}
+
+int main ()
+{
+  int (*p)() = &foo; /* { dg-error "Pointer to a multiversioned function without a default is not allowed" {} } */
+  return (p)();
+}
Index: gcc/testsuite/g++.dg/mv1.C
===================================================================
--- gcc/testsuite/g++.dg/mv1.C	(revision 0)
+++ gcc/testsuite/g++.dg/mv1.C	(revision 0)
@@ -0,0 +1,202 @@
+/* Test case to check if Multiversioning works.  */
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-require-ifunc "" }  */
+/* { dg-options "-O2 -fPIC" } */
+
+#include <assert.h>
+
+/* Default version.  */
+int foo ();
+/* The other versions of foo.  Mix up the ordering and
+   check if the dispatching happens in the order of priority.  */
+/* Check combination of target attributes.  */
+int foo () __attribute__ ((target("arch=corei7,popcnt")));
+/* The target operands in this declaration and the definition are re-ordered.
+   This should still work.  */
+int foo () __attribute__ ((target("ssse3,avx2")));
+
+/* Check for all target attributes for which dispatchers are available.  */
+/* Check arch= */
+int foo () __attribute__((target("arch=core2")));
+int foo () __attribute__((target("arch=corei7")));
+int foo () __attribute__((target("arch=atom")));
+/* Check ISAs  */
+int foo () __attribute__((target("sse3")));
+int foo () __attribute__((target("sse2")));
+int foo () __attribute__((target("sse")));
+int foo () __attribute__((target("avx")));
+int foo () __attribute__((target("sse4.2")));
+int foo () __attribute__((target("popcnt")));
+int foo () __attribute__((target("sse4.1")));
+int foo () __attribute__((target("ssse3")));
+int foo () __attribute__((target("mmx")));
+int foo () __attribute__((target("avx2")));
+/* Check more arch=.  */
+int foo () __attribute__((target("arch=amdfam10")));
+int foo () __attribute__((target("arch=bdver1")));
+int foo () __attribute__((target("arch=bdver2")));
+
+int (*p)() = &foo;
+int main ()
+{
+  int val = foo ();
+  assert (val ==  (*p)());
+
+  /* Check in the exact same order in which the dispatching
+     is expected to happen.  */
+  if (__builtin_cpu_is ("bdver1"))
+    assert (val == 1);
+  else if (__builtin_cpu_is ("bdver2"))
+    assert (val == 2);
+  else if (__builtin_cpu_supports ("avx2")
+	   && __builtin_cpu_supports ("ssse3"))
+    assert (val == 3);
+  else if (__builtin_cpu_supports ("avx2"))
+    assert (val == 4);
+  else if (__builtin_cpu_supports ("avx"))
+    assert (val == 5);
+  else if (__builtin_cpu_is ("corei7")
+	   && __builtin_cpu_supports ("popcnt"))
+    assert (val == 6);
+  else if (__builtin_cpu_supports ("popcnt"))
+    assert (val == 7);
+  else if (__builtin_cpu_is ("corei7"))
+    assert (val == 8);
+  else if (__builtin_cpu_supports ("sse4.2"))
+    assert (val == 9);
+  else if (__builtin_cpu_supports ("sse4.1"))
+    assert (val == 10);
+  else if (__builtin_cpu_is ("amdfam10h"))
+    assert (val == 11);
+  else if (__builtin_cpu_is ("core2"))
+    assert (val == 12);
+  else if (__builtin_cpu_is ("atom"))
+    assert (val == 13);
+  else if (__builtin_cpu_supports ("ssse3"))
+    assert (val == 14);
+  else if (__builtin_cpu_supports ("sse3"))
+    assert (val == 15);
+  else if (__builtin_cpu_supports ("sse2"))
+    assert (val == 16);
+  else if (__builtin_cpu_supports ("sse"))
+    assert (val == 17);
+  else if (__builtin_cpu_supports ("mmx"))
+    assert (val == 18);
+  else
+    assert (val == 0);
+  
+  return 0;
+}
+
+int foo ()
+{
+  return 0;
+}
+
+int __attribute__ ((target("arch=corei7,popcnt")))
+foo ()
+{
+  return 6;
+}
+
+int __attribute__ ((target("avx2,ssse3")))
+foo ()
+{
+  return 3;
+}
+
+int __attribute__ ((target("arch=core2")))
+foo ()
+{
+  return 12;
+}
+
+int __attribute__ ((target("arch=corei7")))
+foo ()
+{
+  return 8;
+}
+
+int __attribute__ ((target("arch=atom")))
+foo ()
+{
+  return 13;
+}
+
+int __attribute__ ((target("sse3")))
+foo ()
+{
+  return 15;
+}
+
+int __attribute__ ((target("sse2")))
+foo ()
+{
+  return 16;
+}
+
+int __attribute__ ((target("sse")))
+foo ()
+{
+  return 17;
+}
+
+int __attribute__ ((target("avx")))
+foo ()
+{
+  return 5;
+}
+
+int __attribute__ ((target("sse4.2")))
+foo ()
+{
+  return 9;
+}
+
+int __attribute__ ((target("popcnt")))
+foo ()
+{
+  return 7;
+}
+
+int __attribute__ ((target("sse4.1")))
+foo ()
+{
+  return 10;
+}
+
+int __attribute__ ((target("ssse3")))
+foo ()
+{
+  return 14;
+}
+
+int __attribute__ ((target("mmx")))
+foo ()
+{
+  return 18;
+}
+
+int __attribute__ ((target("avx2")))
+foo ()
+{
+  return 4;
+}
+
+int __attribute__ ((target("arch=amdfam10")))
+foo ()
+{
+  return 11;
+}
+
+int __attribute__ ((target("arch=bdver1")))
+foo ()
+{
+  return 1;
+}
+
+int __attribute__ ((target("arch=bdver2")))
+foo ()
+{
+  return 2;
+}
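
mv1.C declares one version as target("ssse3,avx2") and defines it as
target("avx2,ssse3").  This works because has_different_version_attributes
compares the strings returned by sorted_attr_string, and
version_assembler_name appends the same sorted string to the assembler name.
The definition of sorted_attr_string is outside this excerpt, so the sketch
below assumes it sorts the comma-separated fields and joins them with commas
again; the real separator and any character rewriting may differ.  The names
sketch_sorted_attrs and sketch_version_name are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator over an array of char pointers.  */
static int
sketch_cmp_str (const void *a, const void *b)
{
  return strcmp (*(const char *const *) a, *(const char *const *) b);
}

/* Assumed behaviour of sorted_attr_string: return a malloc'd copy of
   ATTRS with its comma-separated fields sorted.  Caller frees.  */
static char *
sketch_sorted_attrs (const char *attrs)
{
  size_t len = strlen (attrs);
  char *copy = malloc (len + 1);
  char *out = malloc (len + 1);
  char **fields = malloc ((len + 1) * sizeof *fields);
  size_t n = 0, i;
  char *tok;

  assert (copy && out && fields);
  strcpy (copy, attrs);
  for (tok = strtok (copy, ","); tok; tok = strtok (NULL, ","))
    fields[n++] = tok;
  qsort (fields, n, sizeof *fields, sketch_cmp_str);

  out[0] = '\0';
  for (i = 0; i < n; i++)
    {
      if (i > 0)
	strcat (out, ",");
      strcat (out, fields[i]);
    }
  free (fields);
  free (copy);
  return out;
}

/* Mirror of the "%s.%s" naming in version_assembler_name.  Caller frees.  */
static char *
sketch_version_name (const char *orig_name, const char *attrs)
{
  char *sorted = sketch_sorted_attrs (attrs);
  size_t size = strlen (orig_name) + strlen (sorted) + 2;
  char *name = malloc (size);

  assert (name);
  snprintf (name, size, "%s.%s", orig_name, sorted);
  free (sorted);
  return name;
}
```

Under these assumptions both spellings of the attribute yield one name, so
the declaration and the definition refer to the same version.
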
Index: gcc/testsuite/g++.dg/mv3.C
===================================================================
--- gcc/testsuite/g++.dg/mv3.C	(revision 0)
+++ gcc/testsuite/g++.dg/mv3.C	(revision 0)
@@ -0,0 +1,37 @@
+/* Test case to check if a call to a multiversioned function
+   is replaced with a direct call to the particular version when
+   the most specialized version's target attributes match the
+   caller.  
+  
+   In this program, foo is multiversioned but there is no default
+   function.  This is an error if the call has to go through a
+   dispatcher.  However, the call to foo in bar can be replaced
+   with a direct call to the popcnt version of foo.  Hence, this
+   test should pass.  */
+
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2" } */
+
+
+/* No default version: both versions of foo carry a target attribute.  */
+int __attribute__ ((target ("sse")))
+foo ()
+{
+  return 1;
+}
+int __attribute__ ((target ("popcnt")))
+foo ()
+{
+  return 0;
+}
+
+int __attribute__ ((target ("popcnt")))
+bar ()
+{
+  return foo ();
+}
+
+int main ()
+{
+  return bar ();
+}
Index: gcc/cp/class.c
===================================================================
--- gcc/cp/class.c	(revision 188209)
+++ gcc/cp/class.c	(working copy)
@@ -1092,7 +1092,20 @@ add_method (tree type, tree method, tree using_dec
 	      || same_type_p (TREE_TYPE (fn_type),
 			      TREE_TYPE (method_type))))
 	{
-	  if (using_decl)
+	  /* For function versions, their parms and types match
+	     but they are not duplicates.  Record function versions
+	     as and when they are found.  */
+	  if (TREE_CODE (fn) == FUNCTION_DECL
+	      && TREE_CODE (method) == FUNCTION_DECL
+	      && (function_target_attribute (fn)
+		  || function_target_attribute (method))
+	      && has_different_version_attributes (fn, method))
+	    {
+	      mark_function_as_version (fn);
+	      mark_function_as_version (method);
+	      continue;
+	    }
+	  else if (using_decl)
 	    {
 	      if (DECL_CONTEXT (fn) == type)
 		/* Defer to the local function.  */
@@ -6862,6 +6875,7 @@ resolve_address_of_overloaded_function (tree targe
   tree matches = NULL_TREE;
   tree fn;
   tree target_fn_type;
+  VEC (tree, heap) *fn_ver_vec = NULL;
 
   /* By the time we get here, we should be seeing only real
      pointer-to-member types, not the internal POINTER_TYPE to
@@ -6926,9 +6940,19 @@ resolve_address_of_overloaded_function (tree targe
 	  if (DECL_ANTICIPATED (fn))
 	    continue;
 
-	  /* See if there's a match.  */
+	  /* See if there's a match.  For functions that are multi-versioned,
+	     all the versions match.  */
 	  if (same_type_p (target_fn_type, static_fn_type (fn)))
-	    matches = tree_cons (fn, NULL_TREE, matches);
+	    {
+	      matches = tree_cons (fn, NULL_TREE, matches);
+	      /* If versioned, push all possible versions into a vector.  */
+	      if (DECL_FUNCTION_VERSIONED (fn))
+		{
+		  if (fn_ver_vec == NULL)
+		    fn_ver_vec = VEC_alloc (tree, heap, 2);
+		  VEC_safe_push (tree, heap, fn_ver_vec, fn);
+		}
+	    }
 	}
     }
 
@@ -7023,10 +7047,15 @@ resolve_address_of_overloaded_function (tree targe
       tree match;
 
       fn = TREE_PURPOSE (matches);
-      for (match = TREE_CHAIN (matches); match; match = TREE_CHAIN (match))
-	if (!decls_match (fn, TREE_PURPOSE (match)))
-	  break;
 
+      /* For multi-versioned functions, more than one match is just fine.  */
+      if (DECL_FUNCTION_VERSIONED (fn))
+	match = NULL_TREE;
+      else
+	for (match = TREE_CHAIN (matches); match; match = TREE_CHAIN (match))
+	  if (!decls_match (fn, TREE_PURPOSE (match)))
+	    break;
+
       if (match)
 	{
 	  if (flags & tf_error)
@@ -7089,6 +7118,28 @@ resolve_address_of_overloaded_function (tree targe
       perform_or_defer_access_check (access_path, fn, fn);
     }
 
+  /* If a pointer to a function that is multi-versioned is requested, the
+     pointer to the dispatcher function is returned instead.  This works
+     well because indirectly calling the function will dispatch the right
+     function version at run-time.  Also, the function address is kept
+     unique.  */
+  if (DECL_FUNCTION_VERSIONED (fn))
+    {
+      tree dispatcher_decl;
+      gcc_assert (fn_ver_vec != NULL);
+      dispatcher_decl = build_dispatcher_for_function_versions (fn_ver_vec);
+      if (!dispatcher_decl)
+	{
+	  error_at (input_location, "Pointer to a multiversioned function"
+		    " without a default is not allowed");
+	  return error_mark_node;
+	}
+      retrofit_lang_decl (dispatcher_decl);
+      mark_used (fn);
+      VEC_free (tree, heap, fn_ver_vec);
+      fn = dispatcher_decl;
+    }
+
   if (TYPE_PTRFN_P (target_type) || TYPE_PTRMEMFUNC_P (target_type))
     return cp_build_addr_expr (fn, flags);
   else
Index: gcc/cp/decl.c
===================================================================
--- gcc/cp/decl.c	(revision 188209)
+++ gcc/cp/decl.c	(working copy)
@@ -54,6 +54,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "pointer-set.h"
 #include "splay-tree.h"
 #include "plugin.h"
+#include "cgraph.h"
 
 /* Possible cases of bad specifiers type used by bad_specifiers. */
 enum bad_spec_place {
@@ -973,6 +974,19 @@ decls_match (tree newdecl, tree olddecl)
       if (t1 != t2)
 	return 0;
 
+      /* The decls don't match if they correspond to two different versions
+	 of the same function.  */
+      if (compparms (p1, p2)
+	  && same_type_p (TREE_TYPE (f1), TREE_TYPE (f2)) 
+	  && has_different_version_attributes (newdecl, olddecl))
+	{
+	  /* One of the decls could be the default without the "target"
+	     attribute.  Set it to be a versioned function here.  */
+	  mark_function_as_version (newdecl);
+	  mark_function_as_version (olddecl);
+	  return 0;
+	}
+
       if (CP_DECL_CONTEXT (newdecl) != CP_DECL_CONTEXT (olddecl)
 	  && ! (DECL_EXTERN_C_P (newdecl)
 		&& DECL_EXTERN_C_P (olddecl)))
@@ -1490,7 +1504,11 @@ duplicate_decls (tree newdecl, tree olddecl, bool
 	      error ("previous declaration %q+#D here", olddecl);
 	      return NULL_TREE;
 	    }
-	  else if (compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)),
+	  /* For function versions, params and types match, but they
+	     are not ambiguous.  */
+	  else if ((!DECL_FUNCTION_VERSIONED (newdecl)
+		    && !DECL_FUNCTION_VERSIONED (olddecl))
+		   && compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)),
 			      TYPE_ARG_TYPES (TREE_TYPE (olddecl))))
 	    {
 	      error ("new declaration %q#D", newdecl);
@@ -2262,6 +2280,11 @@ duplicate_decls (tree newdecl, tree olddecl, bool
   else if (DECL_PRESERVE_P (newdecl))
     DECL_PRESERVE_P (olddecl) = 1;
 
+  /* If the olddecl is a version, so is the newdecl.  */
+  if (TREE_CODE (newdecl) == FUNCTION_DECL
+      && DECL_FUNCTION_VERSIONED (olddecl))
+    DECL_FUNCTION_VERSIONED (newdecl) = 1;
+
   if (TREE_CODE (newdecl) == FUNCTION_DECL)
     {
       int function_size;
@@ -14044,7 +14067,11 @@ cxx_comdat_group (tree decl)
 	  else
 	    break;
 	}
-      name = DECL_ASSEMBLER_NAME (decl);
+      if (TREE_CODE (decl) == FUNCTION_DECL
+	  && DECL_FUNCTION_VERSIONED (decl))
+	name = DECL_NAME (decl);
+      else
+	name = DECL_ASSEMBLER_NAME (decl);
     }
 
   return name;
Index: gcc/cp/error.c
===================================================================
--- gcc/cp/error.c	(revision 188209)
+++ gcc/cp/error.c	(working copy)
@@ -1539,8 +1539,15 @@ dump_exception_spec (tree t, int flags)
 static void
 dump_function_name (tree t, int flags)
 {
-  tree name = DECL_NAME (t);
+  tree name;
 
+  /* For function versions, use the assembler name as the decl name is
+     the same for all versions.  */
+  if (DECL_FUNCTION_VERSIONED (t))
+    name = DECL_ASSEMBLER_NAME (t);
+  else
+    name = DECL_NAME (t);
+
   /* We can get here with a decl that was synthesized by language-
      independent machinery (e.g. coverage.c) in which case it won't
      have a lang_specific structure attached and DECL_CONSTRUCTOR_P
Index: gcc/cp/semantics.c
===================================================================
--- gcc/cp/semantics.c	(revision 188209)
+++ gcc/cp/semantics.c	(working copy)
@@ -3780,8 +3780,11 @@ expand_or_defer_fn_1 (tree fn)
       /* If the user wants us to keep all inline functions, then mark
 	 this function as needed so that finish_file will make sure to
 	 output it later.  Similarly, all dllexport'd functions must
-	 be emitted; there may be callers in other DLLs.  */
-      if ((flag_keep_inline_functions
+	 be emitted; there may be callers in other DLLs.
+	 Also, mark this function as needed if it is marked inline but
+	 is a multi-versioned function.  */
+      if (((flag_keep_inline_functions
+	    || DECL_FUNCTION_VERSIONED (fn))
 	   && DECL_DECLARED_INLINE_P (fn)
 	   && !DECL_REALLY_EXTERN (fn))
 	  || (flag_keep_inline_dllexport
Index: gcc/cp/decl2.c
===================================================================
--- gcc/cp/decl2.c	(revision 188209)
+++ gcc/cp/decl2.c	(working copy)
@@ -674,9 +674,13 @@ check_classfn (tree ctype, tree function, tree tem
 	  if (is_template != (TREE_CODE (fndecl) == TEMPLATE_DECL))
 	    continue;
 
+	  /* While finding a match, same types and params are not enough
+	     if the function is versioned.  Also check version ("target")
+	     attributes.  */
 	  if (same_type_p (TREE_TYPE (TREE_TYPE (function)),
 			   TREE_TYPE (TREE_TYPE (fndecl)))
 	      && compparms (p1, p2)
+	      && !has_different_version_attributes (function, fndecl)
 	      && (!is_template
 		  || comp_template_parms (template_parms,
 					  DECL_TEMPLATE_PARMS (fndecl)))
Index: gcc/cp/call.c
===================================================================
--- gcc/cp/call.c	(revision 188209)
+++ gcc/cp/call.c	(working copy)
@@ -40,6 +40,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "langhooks.h"
 #include "c-family/c-objc.h"
 #include "timevar.h"
+#include "cgraph.h"
 
 /* The various kinds of conversion.  */
 
@@ -3904,6 +3905,16 @@ build_new_function_call (tree fn, VEC(tree,gc) **a
     {
       if (complain & tf_error)
 	{
+	  /* If the call is to a multiversioned function without
+	     a default version, overload resolution will fail.  */
+	  if (candidates
+	      && TREE_CODE (candidates->fn) == FUNCTION_DECL
+	      && DECL_FUNCTION_VERSIONED (candidates->fn))
+	    error_at (location_of (DECL_NAME (OVL_CURRENT (fn))),
+		      "Call to multiversioned function %<%D(%A)%> with"
+		      " no default version", DECL_NAME (OVL_CURRENT (fn)),
+		      build_tree_list_vec (*args));
+
 	  if (!any_viable_p && candidates && ! candidates->next
 	      && (TREE_CODE (candidates->fn) == FUNCTION_DECL))
 	    return cp_build_function_call_vec (candidates->fn, args, complain);
@@ -6832,6 +6843,30 @@ build_over_call (struct z_candidate *cand, int fla
   if (!already_used)
     mark_used (fn);
 
+  /* For calls to a multi-versioned function, overload resolution
+     returns the function with the highest target priority, that is,
+     the version that will be checked for dispatching first.  If this
+     version is inlinable, a direct call can be made; otherwise the call
+     should go through the dispatcher.  */
+
+  if (DECL_FUNCTION_VERSIONED (fn)
+      && !targetm.target_option.can_inline_p (current_function_decl, fn))
+    {
+      tree dispatcher_decl = NULL;
+      struct cgraph_node *node = cgraph_get_node (fn);
+      if (node != NULL)
+        dispatcher_decl = node->dispatcher_fndecl;
+      if (dispatcher_decl == NULL)
+	{
+	  error_at (input_location, "Call to multiversioned function"
+		    " without a default is not allowed");
+	  return NULL;
+	}
+      gcc_assert (dispatcher_decl != NULL);
+      retrofit_lang_decl (dispatcher_decl);
+      fn = dispatcher_decl;
+    }
+
   if (DECL_VINDEX (fn) && (flags & LOOKUP_NONVIRTUAL) == 0)
     {
       tree t;
@@ -8089,6 +8124,29 @@ joust (struct z_candidate *cand1, struct z_candida
   size_t i;
   size_t len;
 
+  /* For candidates of a multi-versioned function, make the version with
+     the most specialized target attributes, i.e. the highest priority,
+     win.  This version will be checked for dispatching first.  If this
+     version can be inlined into the caller, the front-end will simply
+     make a direct call to this function.  */
+
+  if ((TREE_CODE (cand1->fn) == FUNCTION_DECL
+       && DECL_FUNCTION_VERSIONED (cand1->fn))
+      || (TREE_CODE (cand2->fn) == FUNCTION_DECL
+	  && DECL_FUNCTION_VERSIONED (cand2->fn)))
+    {
+      /* Both functions must be marked versioned.  */
+      gcc_assert (DECL_FUNCTION_VERSIONED (cand1->fn)
+		  && DECL_FUNCTION_VERSIONED (cand2->fn));
+
+      /* Always make the version with the higher priority, more
+	 specialized, win.  */
+      if (targetm.compare_versions (cand1->fn, cand2->fn) >= 0)
+	return 1;
+      else
+	return -1;
+    }
+
   /* Candidates that involve bad conversions are always worse than those
      that don't.  */
   if (cand1->viable > cand2->viable)
@@ -8434,6 +8492,21 @@ tourney (struct z_candidate *candidates, tsubst_fl
   int fate;
   int champ_compared_to_predecessor = 0;
 
+  /* For multiversioned functions, aggregate all the versions here for
+     generating the dispatcher body later if necessary.  */
+
+  if (TREE_CODE (candidates->fn) == FUNCTION_DECL
+      && DECL_FUNCTION_VERSIONED (candidates->fn))
+    {
+      VEC (tree, heap) *fn_ver_vec = NULL;
+      struct z_candidate *ver = candidates;
+      fn_ver_vec = VEC_alloc (tree, heap, 2);
+      for (;ver; ver = ver->next)
+        VEC_safe_push (tree, heap, fn_ver_vec, ver->fn);
+      build_dispatcher_for_function_versions (fn_ver_vec);
+      VEC_free (tree, heap, fn_ver_vec);
+    }
+
   /* Walk through the list once, comparing each current champ to the next
      candidate, knocking out a candidate or two with each comparison.  */
 
Index: gcc/cp/mangle.c
===================================================================
--- gcc/cp/mangle.c	(revision 188209)
+++ gcc/cp/mangle.c	(working copy)
@@ -1245,7 +1245,12 @@ write_unqualified_name (const tree decl)
     {
       MANGLE_TRACE_TREE ("local-source-name", decl);
       write_char ('L');
-      write_source_name (DECL_NAME (decl));
+      if (TREE_CODE (decl) == FUNCTION_DECL
+	  && DECL_FUNCTION_VERSIONED (decl)
+	  && DECL_ASSEMBLER_NAME_SET_P (decl))
+	write_source_name (DECL_ASSEMBLER_NAME (decl));
+      else
+	write_source_name (DECL_NAME (decl));
       /* The default discriminator is 1, and that's all we ever use,
 	 so there's no code to output one here.  */
     }
@@ -1260,7 +1265,14 @@ write_unqualified_name (const tree decl)
                && LAMBDA_TYPE_P (type))
         write_closure_type_name (type);
       else
-        write_source_name (DECL_NAME (decl));
+	{
+	  if (TREE_CODE (decl) == FUNCTION_DECL
+	      && DECL_FUNCTION_VERSIONED (decl)
+	      && DECL_ASSEMBLER_NAME_SET_P (decl))
+	    write_source_name (DECL_ASSEMBLER_NAME (decl));
+	  else
+	    write_source_name (DECL_NAME (decl));
+	}
     }
 }
 
Index: gcc/Makefile.in
===================================================================
--- gcc/Makefile.in	(revision 188209)
+++ gcc/Makefile.in	(working copy)
@@ -1312,6 +1312,7 @@ OBJS = \
 	mcf.o \
 	mode-switching.o \
 	modulo-sched.o \
+	multiversion.o \
 	omega.o \
 	omp-low.o \
 	optabs.o \
@@ -3044,6 +3045,11 @@ ree.o : ree.c $(CONFIG_H) $(SYSTEM_H) coretypes.h
    $(DF_H) $(TIMEVAR_H) $(TREE_PASS_H) $(RECOG_H) $(EXPR_H) \
    $(REGS_H) $(TREE_H) $(TM_P_H) insn-config.h $(INSN_ATTR_H) $(DIAGNOSTIC_CORE_H) \
    $(TARGET_H) $(OPTABS_H) insn-codes.h rtlhooks-def.h $(PARAMS_H) $(CGRAPH_H)
+multiversion.o : multiversion.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
+   $(TREE_H) langhooks.h $(TREE_INLINE_H) $(FLAGS_H) $(CGRAPH_H) intl.h \
+   $(DIAGNOSTIC_H) $(FIBHEAP_H) $(PARAMS_H) $(TIMEVAR_H) tree-pass.h \
+   $(HASHTAB_H) $(COVERAGE_H) $(GGC_H) $(TREE_FLOW_H) $(RTL_H) $(IPA_PROP_H) \
+   $(BASIC_BLOCK_H) $(TOPLEV_H) $(TREE_DUMP_H) ipa-inline.h
 cprop.o : cprop.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
    $(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h \
    $(RECOG_H) $(EXPR_H) $(BASIC_BLOCK_H) $(FUNCTION_H) toplev.h $(DIAGNOSTIC_CORE_H) \
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 188209)
+++ gcc/config/i386/i386.c	(working copy)
@@ -27732,6 +27732,473 @@ ix86_init_mmx_sse_builtins (void)
     }
 }
 
+
+/* This adds a condition to the basic_block NEW_BB in function FUNCTION_DECL
+   to return a pointer to VERSION_DECL if the outcome of the expression
+   formed by PREDICATE_CHAIN is true.  This function will be called during
+   version dispatch to decide which function version to execute.  It returns
+   the basic block at the end to which more conditions can be added.  */
+
+static basic_block
+add_condition_to_bb (tree function_decl, tree version_decl,
+		     tree predicate_chain, basic_block new_bb)
+{
+  gimple return_stmt;
+  tree convert_expr, result_var;
+  gimple convert_stmt;
+  gimple call_cond_stmt;
+  gimple if_else_stmt;
+
+  basic_block bb1, bb2, bb3;
+  edge e12, e23;
+
+  tree cond_var, and_expr_var = NULL_TREE;
+  gimple_seq gseq;
+
+  tree old_current_function_decl;
+  tree predicate_decl, predicate_arg;
+
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (function_decl));
+  current_function_decl = function_decl;
+
+  gcc_assert (new_bb != NULL);
+  gseq = bb_seq (new_bb);
+
+
+  convert_expr = build1 (CONVERT_EXPR, ptr_type_node,
+	     		 build_fold_addr_expr (version_decl));
+  result_var = create_tmp_var (ptr_type_node, NULL);
+  convert_stmt = gimple_build_assign (result_var, convert_expr); 
+  return_stmt = gimple_build_return (result_var);
+
+  if (predicate_chain == NULL_TREE)
+    {
+      gimple_seq_add_stmt (&gseq, convert_stmt);
+      gimple_seq_add_stmt (&gseq, return_stmt);
+      set_bb_seq (new_bb, gseq);
+      gimple_set_bb (convert_stmt, new_bb);
+      gimple_set_bb (return_stmt, new_bb);
+      pop_cfun ();
+      current_function_decl = old_current_function_decl;
+      return new_bb;
+    }
+
+  while (predicate_chain != NULL)
+    {
+      cond_var = create_tmp_var (integer_type_node, NULL);
+      predicate_decl = TREE_PURPOSE (predicate_chain);
+      predicate_arg = TREE_VALUE (predicate_chain);
+      call_cond_stmt = gimple_build_call (predicate_decl, 1, predicate_arg);
+      gimple_call_set_lhs (call_cond_stmt, cond_var);
+
+      gimple_set_block (call_cond_stmt, DECL_INITIAL (function_decl));
+      gimple_set_bb (call_cond_stmt, new_bb);
+      gimple_seq_add_stmt (&gseq, call_cond_stmt);
+
+      predicate_chain = TREE_CHAIN (predicate_chain);
+      
+      if (and_expr_var == NULL)
+        and_expr_var = cond_var;
+      else
+	{
+	  gimple assign_stmt;
+	  /* Use MIN_EXPR to check whether any integer is zero.
+	     and_expr_var = min_expr <cond_var, and_expr_var>  */
+	  assign_stmt = gimple_build_assign (and_expr_var,
+			  build2 (MIN_EXPR, integer_type_node,
+				  cond_var, and_expr_var));
+
+	  gimple_set_block (assign_stmt, DECL_INITIAL (function_decl));
+	  gimple_set_bb (assign_stmt, new_bb);
+	  gimple_seq_add_stmt (&gseq, assign_stmt);
+	}
+    }
+
+  if_else_stmt = gimple_build_cond (GT_EXPR, and_expr_var,
+	  		            integer_zero_node,
+				    NULL_TREE, NULL_TREE);
+  gimple_set_block (if_else_stmt, DECL_INITIAL (function_decl));
+  gimple_set_bb (if_else_stmt, new_bb);
+  gimple_seq_add_stmt (&gseq, if_else_stmt);
+
+  gimple_seq_add_stmt (&gseq, convert_stmt);
+  gimple_seq_add_stmt (&gseq, return_stmt);
+  set_bb_seq (new_bb, gseq);
+
+  bb1 = new_bb;
+  e12 = split_block (bb1, if_else_stmt);
+  bb2 = e12->dest;
+  e12->flags &= ~EDGE_FALLTHRU;
+  e12->flags |= EDGE_TRUE_VALUE;
+
+  e23 = split_block (bb2, return_stmt);
+
+  gimple_set_bb (convert_stmt, bb2);
+  gimple_set_bb (return_stmt, bb2);
+
+  bb3 = e23->dest;
+  make_edge (bb1, bb3, EDGE_FALSE_VALUE); 
+
+  remove_edge (e23);
+  make_edge (bb2, EXIT_BLOCK_PTR, 0);
+
+  pop_cfun ();
+  current_function_decl = old_current_function_decl;
+
+  return bb3;
+}
+
+/* This parses the attribute arguments to target in DECL and determines
+   the right builtin to use to match the platform specification.
+   For now, only one target argument ("arch=" or "<-m>xxx") is allowed.
+   It returns the priority value for this version decl.  If PREDICATE_LIST
+   is not NULL, it stores the list of cpu features that need to be checked
+   before dispatching this function.  */
+
+static unsigned int
+get_builtin_code_for_version (tree decl, tree *predicate_list)
+{
+  tree attrs;
+  struct cl_target_option cur_target;
+  tree target_node;
+  struct cl_target_option *new_target;
+  const char *arg_str = NULL;
+  const char *attrs_str = NULL;
+  char *tok_str = NULL;
+  char *token;
+  unsigned int priority = 0;
+
+  /* Priority of i386 features, greater value is higher priority.  This is
+     used to decide the order in which function dispatch must happen.  For
+     instance, a version specialized for SSE4.2 should be checked for dispatch
+     before a version for SSE3, as SSE4.2 implies SSE3.  */
+  enum feature_priority
+  {
+    P_ZERO = 0,
+    P_MMX,
+    P_SSE,
+    P_SSE2,
+    P_SSE3,
+    P_SSSE3,
+    P_PROC_SSSE3,
+    P_SSE4_a,
+    P_PROC_SSE4_a,
+    P_SSE4_1,
+    P_SSE4_2,
+    P_PROC_SSE4_2,
+    P_POPCNT,
+    P_AVX,
+    P_AVX2,
+    P_FMA,
+    P_PROC_FMA
+  };
+
+  /* These are the target attribute strings for which a dispatcher is
+     available, from fold_builtin_cpu.  */
+
+  static struct _feature_list
+    {
+      const char *const name;
+      const enum feature_priority priority;
+    }
+  const feature_list[] =
+    {
+      {"mmx", P_MMX},
+      {"sse", P_SSE},
+      {"sse2", P_SSE2},
+      {"sse3", P_SSE3},
+      {"ssse3", P_SSSE3},
+      {"sse4.1", P_SSE4_1},
+      {"sse4.2", P_SSE4_2},
+      {"popcnt", P_POPCNT},
+      {"avx", P_AVX},
+      {"avx2", P_AVX2}
+    };
+
+
+  static unsigned int NUM_FEATURES
+    = sizeof (feature_list) / sizeof (struct _feature_list);
+
+  unsigned int i;
+
+  tree predicate_chain = NULL_TREE;
+  tree predicate_decl, predicate_arg;
+
+  attrs = lookup_attribute ("target", DECL_ATTRIBUTES (decl));
+  gcc_assert (attrs != NULL);
+
+  attrs = TREE_VALUE (TREE_VALUE (attrs));
+
+  gcc_assert (TREE_CODE (attrs) == STRING_CST);
+  attrs_str = TREE_STRING_POINTER (attrs);
+
+
+  /* Handle arch= if specified.  For priority, set it to be 1 more than
+     the best instruction set the processor can handle.  For instance, if
+     there is a version for atom and a version for ssse3 (the highest ISA
+     priority for atom), the atom version must be checked for dispatch
+     before the ssse3 version.  */
+  if (strstr (attrs_str, "arch=") != NULL)
+    {
+      cl_target_option_save (&cur_target, &global_options);
+      target_node = ix86_valid_target_attribute_tree (attrs);
+    
+      gcc_assert (target_node);
+      new_target = TREE_TARGET_OPTION (target_node);
+      gcc_assert (new_target);
+      
+      if (new_target->arch_specified && new_target->arch > 0)
+	{
+	  switch (new_target->arch)
+	    {
+	    case PROCESSOR_CORE2_32:
+	    case PROCESSOR_CORE2_64:
+	      arg_str = "core2";
+	      priority = P_PROC_SSSE3;
+	      break;
+	    case PROCESSOR_COREI7_32:
+	    case PROCESSOR_COREI7_64:
+	      arg_str = "corei7";
+	      priority = P_PROC_SSE4_2;
+	      break;
+	    case PROCESSOR_ATOM:
+	      arg_str = "atom";
+	      priority = P_PROC_SSSE3;
+	      break;
+	    case PROCESSOR_AMDFAM10:
+	      arg_str = "amdfam10h";
+	      priority = P_PROC_SSE4_a;
+	      break;
+	    case PROCESSOR_BDVER1:
+	      arg_str = "bdver1";
+	      priority = P_PROC_FMA;
+	      break;
+	    case PROCESSOR_BDVER2:
+	      arg_str = "bdver2";
+	      priority = P_PROC_FMA;
+	      break;
+	    }  
+	}    
+    
+      cl_target_option_restore (&global_options, &cur_target);
+	
+      if (predicate_list && arg_str == NULL)
+	{
+	  error_at (DECL_SOURCE_LOCATION (decl),
+	    	"No dispatcher found for the versioning attributes");
+	  return 0;
+	}
+    
+      if (predicate_list)
+	{
+          predicate_decl = ix86_builtins [(int) IX86_BUILTIN_CPU_IS];
+          /* For a C string literal the length includes the trailing NUL.  */
+          predicate_arg = build_string_literal (strlen (arg_str) + 1, arg_str);
+          predicate_chain = tree_cons (predicate_decl, predicate_arg,
+				       predicate_chain);
+	}
+    }
+
+  /* Process feature name.  */
+  tok_str =  (char *) xmalloc (strlen (attrs_str) + 1);
+  strcpy (tok_str, attrs_str);
+  token = strtok (tok_str, ",");
+  predicate_decl = ix86_builtins [(int) IX86_BUILTIN_CPU_SUPPORTS];
+
+  while (token != NULL)
+    {
+      /* Do not process "arch="  */
+      if (strncmp (token, "arch=", 5) == 0)
+	{
+	  token = strtok (NULL, ",");
+	  continue;
+	}
+      for (i = 0; i < NUM_FEATURES; ++i)
+	{
+	  if (strcmp (token, feature_list[i].name) == 0)
+	    {
+	      if (predicate_list)
+		{
+		  predicate_arg = build_string_literal (
+				  strlen (feature_list[i].name) + 1,
+				  feature_list[i].name);
+		  predicate_chain = tree_cons (predicate_decl, predicate_arg,
+					       predicate_chain);
+		}
+	      /* Find the maximum priority feature.  */
+	      if (feature_list[i].priority > priority)
+		priority = feature_list[i].priority;
+
+	      break;
+	    }
+	}
+      if (predicate_list && i == NUM_FEATURES)
+	{
+	  error_at (DECL_SOURCE_LOCATION (decl),
+		    "No dispatcher found for %s", token);
+	  return 0;
+	}
+      token = strtok (NULL, ",");
+    }
+  free (tok_str);
+
+  if (predicate_list && predicate_chain == NULL_TREE)
+    {
+      error_at (DECL_SOURCE_LOCATION (decl),
+	        "No dispatcher found for the versioning attributes : %s",
+	        attrs_str);
+      return 0;
+    }
+  else if (predicate_list)
+    {
+      predicate_chain = nreverse (predicate_chain);
+      *predicate_list = predicate_chain;
+    }
+
+  return priority; 
+}
+
+/* This compares the priority of target features in function DECL1
+   and DECL2.  It returns positive value if DECL1 is higher priority,
+   negative value if DECL2 is higher priority and 0 if they are the
+   same.  */
+
+static int
+ix86_compare_versions (tree decl1, tree decl2)
+{
+  unsigned int priority1 = 0;
+  unsigned int priority2 = 0;
+
+  if (lookup_attribute ("target", DECL_ATTRIBUTES (decl1)) != NULL)
+    priority1 = get_builtin_code_for_version (decl1, NULL);
+
+  if (lookup_attribute ("target", DECL_ATTRIBUTES (decl2)) != NULL)
+    priority2 = get_builtin_code_for_version (decl2, NULL);
+
+  return (int)priority1 - (int)priority2;
+}
+ 
+static int
+feature_compare (const void *v1, const void *v2)
+{
+  typedef struct _function_version_info
+    {
+      tree version_decl;
+      tree predicate_chain;
+      unsigned int dispatch_priority;
+    } function_version_info;
+
+  const function_version_info c1 = *(const function_version_info *)v1;
+  const function_version_info c2 = *(const function_version_info *)v2;
+  return (c2.dispatch_priority - c1.dispatch_priority);
+}
+
+/* This is the target hook to generate the dispatch function for
+   multi-versioned functions.  DISPATCH_DECL is the function which will
+   contain the dispatch logic.  FNDECLS_P points to a vector of the function
+   choices for dispatch, default version first.  EMPTY_BB is the basic block
+   pointer in DISPATCH_DECL in which the dispatch code is generated.  */
+
+static int
+ix86_dispatch_version (tree dispatch_decl,
+		       void *fndecls_p,
+		       basic_block *empty_bb)
+{
+  tree default_decl;
+  gimple ifunc_cpu_init_stmt;
+  gimple_seq gseq;
+  tree old_current_function_decl;
+  int ix;
+  tree ele;
+  VEC (tree, heap) *fndecls;
+  unsigned int num_versions = 0;
+  unsigned int actual_versions = 0;
+  unsigned int i;
+
+  struct _function_version_info
+    {
+      tree version_decl;
+      tree predicate_chain;
+      unsigned int dispatch_priority;
+    }*function_version_info;
+
+  gcc_assert (dispatch_decl != NULL
+	      && fndecls_p != NULL
+	      && empty_bb != NULL);
+
+  /* fndecls_p is actually a vector.  */
+  fndecls = (VEC (tree, heap) *)fndecls_p;
+
+  /* At least one more version other than the default.  */
+  num_versions = VEC_length (tree, fndecls);
+  gcc_assert (num_versions >= 2);
+
+  function_version_info = (struct _function_version_info *)
+    xmalloc ((num_versions - 1) * sizeof (struct _function_version_info));
+
+  /* The first version in the vector is the default decl.  */
+  default_decl = VEC_index (tree, fndecls, 0);
+
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (dispatch_decl));
+  current_function_decl = dispatch_decl;
+
+  gseq = bb_seq (*empty_bb);
+  /* Function version dispatch is via IFUNC.  IFUNC resolvers fire before
+     constructors, so explicitly call __builtin_cpu_init here.  */
+  ifunc_cpu_init_stmt = gimple_build_call_vec (
+                     ix86_builtins [(int) IX86_BUILTIN_CPU_INIT], NULL);
+  gimple_seq_add_stmt (&gseq, ifunc_cpu_init_stmt);
+  gimple_set_bb (ifunc_cpu_init_stmt, *empty_bb);
+  set_bb_seq (*empty_bb, gseq);
+
+  pop_cfun ();
+  current_function_decl = old_current_function_decl;
+
+
+  for (ix = 1; VEC_iterate (tree, fndecls, ix, ele); ++ix)
+    {
+      tree version_decl = ele;
+      tree predicate_chain = NULL_TREE;
+      unsigned int priority;
+      /* Get attribute string, parse it and find the right predicate decl.
+         The predicate function could be a lengthy combination of many
+	 features, like arch-type and various isa-variants.  */
+      priority = get_builtin_code_for_version (version_decl,
+	 			               &predicate_chain);
+
+      if (predicate_chain == NULL_TREE)
+	continue;
+
+      function_version_info [actual_versions].version_decl = version_decl;
+      function_version_info [actual_versions].predicate_chain = predicate_chain;
+      function_version_info [actual_versions].dispatch_priority = priority;
+      actual_versions++;
+    }
+
+  /* Sort the versions in descending order of dispatch priority.  The
+     priority is based on the ISA.  This is not a perfect solution.  There
+     could still be ambiguity.  If more than one function version is suitable
+     to execute, which one should be dispatched?  In future, allow the user
+     to specify a dispatch priority next to the version.  */
+  qsort (function_version_info, actual_versions,
+         sizeof (struct _function_version_info), feature_compare);
+
+  for (i = 0; i < actual_versions; ++i)
+    *empty_bb = add_condition_to_bb (dispatch_decl,
+				     function_version_info[i].version_decl,
+				     function_version_info[i].predicate_chain,
+				     *empty_bb);
+
+  /* Dispatch the default version at the end.  */
+  *empty_bb = add_condition_to_bb (dispatch_decl, default_decl,
+				   NULL, *empty_bb);
+
+  free (function_version_info);
+  return 0;
+}
+
 /* This builds the processor_model struct type defined in
    libgcc/config/i386/cpuinfo.c  */
 
@@ -39673,6 +40140,12 @@ ix86_memmodel_check (unsigned HOST_WIDE_INT val)
 #undef TARGET_FOLD_BUILTIN
 #define TARGET_FOLD_BUILTIN ix86_fold_builtin
 
+#undef TARGET_DISPATCH_VERSION
+#define TARGET_DISPATCH_VERSION ix86_dispatch_version
+
+#undef TARGET_COMPARE_VERSIONS
+#define TARGET_COMPARE_VERSIONS ix86_compare_versions
+
 #undef TARGET_ENUM_VA_LIST_P
 #define TARGET_ENUM_VA_LIST_P ix86_enum_va_list
 

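As a reading aid for the dispatch logic in ix86_dispatch_version and
add_condition_to_bb above, here is a minimal, portable C sketch of the same
idea: sort the versions by descending priority, combine each version's
predicate results with a minimum (positive only if every predicate is
positive), and take the first version that matches. This is an illustration
under stated assumptions, not the patch's implementation. The names
version_info, compare_priority, all_predicates_hold and pick_version are
hypothetical and do not appear in the patch, and predicate results are plain
integers here rather than calls to the CPU-detection builtins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for one function version: a label, up to two
   predicate results (nonzero means the feature is present), and a
   dispatch priority.  */
struct version_info
{
  const char *name;
  int preds[2];
  size_t num_preds;
  unsigned int priority;
};

/* qsort callback: descending priority.  Comparing instead of
   subtracting avoids relying on unsigned wrap-around.  */
static int
compare_priority (const void *v1, const void *v2)
{
  const struct version_info *c1 = (const struct version_info *) v1;
  const struct version_info *c2 = (const struct version_info *) v2;
  return (c2->priority > c1->priority) - (c2->priority < c1->priority);
}

/* Combine the predicate results with a running minimum, as the patch
   does with MIN_EXPR: the minimum is positive only if all are.  */
static int
all_predicates_hold (const struct version_info *v)
{
  size_t i;
  int acc;

  assert (v->num_preds >= 1 && v->num_preds <= 2);
  acc = v->preds[0];
  for (i = 1; i < v->num_preds; ++i)
    acc = acc < v->preds[i] ? acc : v->preds[i];
  return acc > 0;
}

/* Return the first version, in priority order, whose predicates all
   hold, or NULL if the caller should fall back to the default.  */
static const struct version_info *
pick_version (struct version_info *versions, size_t n)
{
  size_t i;

  qsort (versions, n, sizeof (struct version_info), compare_priority);
  for (i = 0; i < n; ++i)
    if (all_predicates_hold (&versions[i]))
      return &versions[i];
  return NULL;
}
```

With entries for sse2 (priority 3), avx (priority 13) and corei7 plus sse4.2
(priority 11), on a machine where only the avx predicate fails, pick_version
returns the corei7 entry; if no entry matches, it returns NULL and the
default version is used.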

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-06-04 22:29                                                 ` Sriraman Tallam
@ 2012-06-05 13:56                                                   ` H.J. Lu
  0 siblings, 0 replies; 93+ messages in thread
From: H.J. Lu @ 2012-06-05 13:56 UTC (permalink / raw)
  To: Sriraman Tallam
  Cc: Richard Guenther, Jan Hubicka, Uros Bizjak, reply, gcc-patches, David Li

On Mon, Jun 4, 2012 at 3:29 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> Bug fixed and new patch attached.
>
> Patch also available for review at http://codereview.appspot.com/5752064
>

I think you should also export __cpu_indicator_init in libgcc_s.so.
Also, is this feature C++ only?  Can you make it work for C?


-- 
H.J.


* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-06-04 19:01                                             ` Sriraman Tallam
  2012-06-04 21:36                                               ` H.J. Lu
@ 2012-06-14 20:35                                               ` Sriraman Tallam
  2012-06-20  1:10                                                 ` Sriraman Tallam
                                                                   ` (2 more replies)
  1 sibling, 3 replies; 93+ messages in thread
From: Sriraman Tallam @ 2012-06-14 20:35 UTC (permalink / raw)
  To: jason, mark, nathan, H.J. Lu
  Cc: Richard Guenther, Jan Hubicka, Uros Bizjak, reply, gcc-patches, David Li

+cc c++ front-end maintainers

Hi,

   C++ Frontend maintainers, could you please take a look at the
front-end part when you find the time?

   Honza, your thoughts on the callgraph part?

   Richard, any further comments/feedback?

   Additionally, I am working on generating better mangled names for
function versions, along the lines of C++ thunks.

Thanks,
-Sri.

On Mon, Jun 4, 2012 at 11:59 AM, Sriraman Tallam <tmsriram@google.com> wrote:
> Hi,
>
>   Attaching updated patch for function multiversioning which brings
> in plenty of changes.
>
> * As suggested by Richard earlier, I have made cgraph aware of
> function versions. All nodes of function versions are chained and the
> dispatcher bodies are created on demand while building cgraph edges.
> The dispatcher body will be created if and only if there is a call or
> reference to a versioned function. Previously, I was maintaining the
> list of versions separately in a hash map, all that is gone now.
> * Now, the file multiversion.c has some helper routines that are used
> in the context of function versioning. There are no new passes and no
> new globals.
> * More tests, updated existing tests.
> * Fixed lots of bugs.
> * Updated patch description.
>
> Patch attached. Patch also available for review at
> http://codereview.appspot.com/5752064
>
> Please let me know what you think,
>
> Thanks,
> -Sri.
>
>
> On Mon, May 14, 2012 at 11:28 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>> Hi H.J,
>>
>>   Attaching new patch with 2 test cases, mv2.C checks ISAs only and
>> mv1.C checks ISAs and arches mixed. Right now, checking only arches is
>> not needed as they are mutually exclusive, any order should be fine.
>>
>> Patch also available for review here:  http://codereview.appspot.com/5752064
>>
>> Thanks,
>> -Sri.
>>
>> On Sat, May 12, 2012 at 6:37 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Fri, May 11, 2012 at 7:04 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>> Hi H.J.,
>>>>
>>>>   I have updated the patch to improve the dispatching method like we
>>>> discussed. Each feature gets a priority now, and the dispatching is
>>>> done in priority order. Please see i386.c for the changes.
>>>>
>>>> Patch also available for review here:  http://codereview.appspot.com/5752064
>>>>
>>>
>>> I think you need 3 tests:
>>>
>>> 1.  Only with ISA.
>>> 2.  Only with arch
>>> 3.  Mixed with ISA and arch
>>>
>>> since a test mixing ISA and arch may hide issues with ISA only or arch only.
>>>
>>> --
>>> H.J.


* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-06-14 20:35                                               ` Sriraman Tallam
@ 2012-06-20  1:10                                                 ` Sriraman Tallam
  2012-07-06  9:14                                                 ` Richard Guenther
  2012-07-07  6:06                                                 ` Jason Merrill
  2 siblings, 0 replies; 93+ messages in thread
From: Sriraman Tallam @ 2012-06-20  1:10 UTC (permalink / raw)
  To: jason, mark, nathan, H.J. Lu
  Cc: Richard Guenther, Jan Hubicka, Uros Bizjak, reply, gcc-patches, David Li

Ping.

On Thu, Jun 14, 2012 at 1:13 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> +cc c++ front-end maintainers
>
> Hi,
>
>   C++ Frontend maintainers, Could you please take a look at the
> front-end part when you find the time?
>
>   Honza, your thoughts on the callgraph part?
>
>   Richard, any further comments/feedback?
>
>   Additionally, I am working on generating better mangled names for
> function versions, along the lines of C++ thunks.
>
> Thanks,
> -Sri.
>
> On Mon, Jun 4, 2012 at 11:59 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>> Hi,
>>
>>   Attaching updated patch for function multiversioning which brings
>> in plenty of changes.
>>
>> * As suggested by Richard earlier, I have made cgraph aware of
>> function versions. All nodes of function versions are chained and the
>> dispatcher bodies are created on demand while building cgraph edges.
>> The dispatcher body will be created if and only if there is a call or
>> reference to a versioned function. Previously, I was maintaining the
>> list of versions separately in a hash map, all that is gone now.
>> * Now, the file multiversion.c has some helper routines that are used
>> in the context of function versioning. There are no new passes and no
>> new globals.
>> * More tests, updated existing tests.
>> * Fixed lots of bugs.
>> * Updated patch description.
>>
>> Patch attached. Patch also available for review at
>> http://codereview.appspot.com/5752064
>>
>> Please let me know what you think,
>>
>> Thanks,
>> -Sri.
>>
>>
>> On Mon, May 14, 2012 at 11:28 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>>> Hi H.J,
>>>
>>>   Attaching new patch with 2 test cases, mv2.C checks ISAs only and
>>> mv1.C checks ISAs and arches mixed. Right now, checking only arches is
>>> not needed as they are mutually exclusive, any order should be fine.
>>>
>>> Patch also available for review here:  http://codereview.appspot.com/5752064
>>>
>>> Thanks,
>>> -Sri.
>>>
>>> On Sat, May 12, 2012 at 6:37 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>> On Fri, May 11, 2012 at 7:04 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>>> Hi H.J.,
>>>>>
>>>>>   I have updated the patch to improve the dispatching method like we
>>>>> discussed. Each feature gets a priority now, and the dispatching is
>>>>> done in priority order. Please see i386.c for the changes.
>>>>>
>>>>> Patch also available for review here:  http://codereview.appspot.com/5752064
>>>>>
>>>>
>>>> I think you need 3 tests:
>>>>
>>>> 1.  Only with ISA.
>>>> 2.  Only with arch
>>>> 3.  Mixed with ISA and arch
>>>>
>>>> since test mixed ISA and arch may hide issues with ISA only or arch only.
>>>>
>>>> --
>>>> H.J.


* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-06-14 20:35                                               ` Sriraman Tallam
  2012-06-20  1:10                                                 ` Sriraman Tallam
@ 2012-07-06  9:14                                                 ` Richard Guenther
  2012-07-06 17:38                                                   ` Sriraman Tallam
  2012-07-07  6:06                                                 ` Jason Merrill
  2 siblings, 1 reply; 93+ messages in thread
From: Richard Guenther @ 2012-07-06  9:14 UTC (permalink / raw)
  To: Sriraman Tallam
  Cc: jason, mark, nathan, H.J. Lu, Jan Hubicka, Uros Bizjak, reply,
	gcc-patches, David Li

On Thu, Jun 14, 2012 at 10:13 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> +cc c++ front-end maintainers
>
> Hi,
>
>    C++ Frontend maintainers, Could you please take a look at the
> front-end part when you find the time?

So you have (for now?) omitted the C frontend change(s)?

>    Honza, your thoughts on the callgraph part?
>
>    Richard, any further comments/feedback?

Overall I like it - the cgraph portions need comments from Honza and the
C++ portions from a C++ maintainer though.

I would appreciate a C version, too.

As you are tackling the C++ frontend first you should add some C++
specific testcases - if only to verify you properly reject cases you do not
or cannot implement.  For example:

class Foo {
  virtual void bar() __attribute__((target("sse")));
  virtual void bar() __attribute__((target("sse2")));
};

or

template <class T>
void bar (T t) __attribute__((target("sse")));
template <class T>
void bar (T t) __attribute__((target("sse2")));
template <>
void bar (int t);

(how does regular C++ overload resolution / template specialization
interfere with the target overloads?)

Thanks,
Richard.

>    Additionally, I am working on generating better mangled names for
> function versions, along the lines of C++ thunks.
>
> Thanks,
> -Sri.
>
> On Mon, Jun 4, 2012 at 11:59 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>> Hi,
>>
>>   Attaching updated patch for function multiversioning which brings
>> in plenty of changes.
>>
>> * As suggested by Richard earlier, I have made cgraph aware of
>> function versions. All nodes of function versions are chained and the
>> dispatcher bodies are created on demand while building cgraph edges.
>> The dispatcher body will be created if and only if there is a call or
>> reference to a versioned function. Previously, I was maintaining the
>> list of versions separately in a hash map, all that is gone now.
>> * Now, the file multiversion.c has some helper routines that are used
>> in the context of function versioning. There are no new passes and no
>> new globals.
>> * More tests, updated existing tests.
>> * Fixed lots of bugs.
>> * Updated patch description.
>>
>> Patch attached. Patch also available for review at
>> http://codereview.appspot.com/5752064
>>
>> Please let me know what you think,
>>
>> Thanks,
>> -Sri.
>>
>>
>> On Mon, May 14, 2012 at 11:28 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>>> Hi H.J,
>>>
>>>   Attaching new patch with 2 test cases, mv2.C checks ISAs only and
>>> mv1.C checks ISAs and arches mixed. Right now, checking only arches is
>>> not needed as they are mutually exclusive, any order should be fine.
>>>
>>> Patch also available for review here:  http://codereview.appspot.com/5752064
>>>
>>> Thanks,
>>> -Sri.
>>>
>>> On Sat, May 12, 2012 at 6:37 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>> On Fri, May 11, 2012 at 7:04 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>>> Hi H.J.,
>>>>>
>>>>>   I have updated the patch to improve the dispatching method like we
>>>>> discussed. Each feature gets a priority now, and the dispatching is
>>>>> done in priority order. Please see i386.c for the changes.
>>>>>
>>>>> Patch also available for review here:  http://codereview.appspot.com/5752064
>>>>>
>>>>
>>>> I think you need 3 tests:
>>>>
>>>> 1.  Only with ISA.
>>>> 2.  Only with arch
>>>> 3.  Mixed with ISA and arch
>>>>
>>>> since test mixed ISA and arch may hide issues with ISA only or arch only.
>>>>
>>>> --
>>>> H.J.


* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-07-06  9:14                                                 ` Richard Guenther
@ 2012-07-06 17:38                                                   ` Sriraman Tallam
  0 siblings, 0 replies; 93+ messages in thread
From: Sriraman Tallam @ 2012-07-06 17:38 UTC (permalink / raw)
  To: Richard Guenther
  Cc: jason, mark, nathan, H.J. Lu, Jan Hubicka, Uros Bizjak, reply,
	gcc-patches, David Li

On Fri, Jul 6, 2012 at 2:14 AM, Richard Guenther
<richard.guenther@gmail.com> wrote:
>
> On Thu, Jun 14, 2012 at 10:13 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> > +cc c++ front-end maintainers
> >
> > Hi,
> >
> >    C++ Frontend maintainers, Could you please take a look at the
> > front-end part when you find the time?
>
> So you have (for now?) omitted the C frontend change(s)?

Yes, for now. I thought I would get the C++ changes and the associated
middle-end changes checked in first. The C changes should be easy to add; I
have to introduce a new attribute for this. So, the C front-end should
look like this:

int foo (); // default version.
// A version of foo.
int foo_sse4() __attribute__ ((version("foo"), target("sse4.2")));

and the call will be to foo.

The version attribute would be the new one, though there may be an existing
attribute that I could use for this purpose. I considered whether the
"alias" attribute along with the "target" attribute could be used,
but that makes things unnecessarily complicated. What do
you think?

>
> >    Honza, your thoughts on the callgraph part?
> >
> >    Richard, any further comments/feedback?
>
> Overall I like it - the cgraph portions need comments from Honza and the
> C++ portions from a C++ maintainer though.
>
> I would appreciate a C version, too.

Sure, I will get to it immediately after the current patch reaches a
stable point.

>
> As you are tackling the C++ frontend first you should add some C++
> specific testcases - if only to verify you properly reject cases you do not
> or can not implement.  Like eventually

Sure, I will add these test cases.

Thanks for reviewing,
-Sri.

>
> class Foo {
>   virtual void bar() __attribute__((target("sse")));
>   virtual void bar() __attribute__((target("sse2")));
> };
>
> or
>
> template <class T>
> void bar (T t) __attribute__((target("sse")));
> template <class T>
> void bar (T t) __attribute__((target("sse2")));
> template <>
> void bar (int t);
>
> (how does regular C++ overload resolution / template specialization
> interfere with the target overloads?)
>
> Thanks,
> Richard.
>
> >    Additionally, I am working on generating better mangled names for
> > function versions, along the lines of C++ thunks.
> >
> > Thanks,
> > -Sri.
> >
> > On Mon, Jun 4, 2012 at 11:59 AM, Sriraman Tallam <tmsriram@google.com> wrote:
> >> Hi,
> >>
> >>   Attaching updated patch for function multiversioning which brings
> >> in plenty of changes.
> >>
> >> * As suggested by Richard earlier, I have made cgraph aware of
> >> function versions. All nodes of function versions are chained and the
> >> dispatcher bodies are created on demand while building cgraph edges.
> >> The dispatcher body will be created if and only if there is a call or
> >> reference to a versioned function. Previously, I was maintaining the
> >> list of versions separately in a hash map, all that is gone now.
> >> * Now, the file multiversion.c has some helper routines that are used
> >> in the context of function versioning. There are no new passes and no
> >> new globals.
> >> * More tests, updated existing tests.
> >> * Fixed lots of bugs.
> >> * Updated patch description.
> >>
> >> Patch attached. Patch also available for review at
> >> http://codereview.appspot.com/5752064
> >>
> >> Please let me know what you think,
> >>
> >> Thanks,
> >> -Sri.
> >>
> >>
> >> On Mon, May 14, 2012 at 11:28 AM, Sriraman Tallam <tmsriram@google.com> wrote:
> >>> Hi H.J,
> >>>
> >>>   Attaching new patch with 2 test cases, mv2.C checks ISAs only and
> >>> mv1.C checks ISAs and arches mixed. Right now, checking only arches is
> >>> not needed as they are mutually exclusive, any order should be fine.
> >>>
> >>> Patch also available for review here:  http://codereview.appspot.com/5752064
> >>>
> >>> Thanks,
> >>> -Sri.
> >>>
> >>> On Sat, May 12, 2012 at 6:37 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> >>>> On Fri, May 11, 2012 at 7:04 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> >>>>> Hi H.J.,
> >>>>>
> >>>>>   I have updated the patch to improve the dispatching method like we
> >>>>> discussed. Each feature gets a priority now, and the dispatching is
> >>>>> done in priority order. Please see i386.c for the changes.
> >>>>>
> >>>>> Patch also available for review here:  http://codereview.appspot.com/5752064
> >>>>>
> >>>>
> >>>> I think you need 3 tests:
> >>>>
> >>>> 1.  Only with ISA.
> >>>> 2.  Only with arch
> >>>> 3.  Mixed with ISA and arch
> >>>>
> >>>> since test mixed ISA and arch may hide issues with ISA only or arch only.
> >>>>
> >>>> --
> >>>> H.J.

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-06-14 20:35                                               ` Sriraman Tallam
  2012-06-20  1:10                                                 ` Sriraman Tallam
  2012-07-06  9:14                                                 ` Richard Guenther
@ 2012-07-07  6:06                                                 ` Jason Merrill
  2012-07-07 18:38                                                   ` Xinliang David Li
  2 siblings, 1 reply; 93+ messages in thread
From: Jason Merrill @ 2012-07-07  6:06 UTC (permalink / raw)
  To: Sriraman Tallam
  Cc: mark, nathan, H.J. Lu, Richard Guenther, Jan Hubicka,
	Uros Bizjak, reply, gcc-patches, David Li

On 06/14/2012 04:13 PM, Sriraman Tallam wrote:
>     C++ Frontend maintainers, Could you please take a look at the
> front-end part when you find the time?

It seems to me that what you have here are target-specific attributes 
that affect the signature of a function such that they make two 
declarations different that would otherwise declare the same function. 
Stepping away from the specific notion of versioning, it seems that 
these are the questions that you want the front end to be able to ask 
about these attributes:

* Does this attribute affect a function signature?
* Do the attributes on these two declarations make them different?
* Do the attributes on these two declarations make one a better match?
* Given a call to function X, should I call another function instead?
* Return a string representation of the attributes on this function that 
affect its signature.

Does this seem like a worthwhile direction to other people, or do you 
like better the approach the patch takes, handling versioning directly?

Jason

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-07-07  6:06                                                 ` Jason Merrill
@ 2012-07-07 18:38                                                   ` Xinliang David Li
  2012-07-08 11:21                                                     ` Jason Merrill
  0 siblings, 1 reply; 93+ messages in thread
From: Xinliang David Li @ 2012-07-07 18:38 UTC (permalink / raw)
  To: Jason Merrill
  Cc: Sriraman Tallam, mark, nathan, H.J. Lu, Richard Guenther,
	Jan Hubicka, Uros Bizjak, reply, gcc-patches

On Fri, Jul 6, 2012 at 11:05 PM, Jason Merrill <jason@redhat.com> wrote:
> On 06/14/2012 04:13 PM, Sriraman Tallam wrote:
>>
>>     C++ Frontend maintainers, Could you please take a look at the
>> front-end part when you find the time?
>
>
> It seems to me that what you have here are target-specific attributes that
> affect the signature of a function such that they make two declarations
> different that would otherwise declare the same function. Stepping away from
> the specific notion of versioning, it seems that these are the questions
> that you want the front end to be able to ask about these attributes:
>
> * Does this attribute affect a function signature?

The question becomes if a caller 'bar' with target attribute 'x' can
make a call to a function 'foo' with an incompatible target attribute
'y'. If the answer is no, then the target attribute is part of 'foo's
signature.  I think the answer is yes -- the attribute affects a
function signature.

> * Do the attributes on these two declarations make them different?

yes.

> * Do the attributes on these two declarations make one a better match?

yes -- and there are rules defined for that.

> * Given a call to function X, should I call another function instead?

The binding can happen at compile time (given caller/callee attribute)
or at the runtime.

> * Return a string representation of the attributes on this function that
> affect its signature.

yes.

>
> Does this seem like a worthwhile direction to other people, or do you like
> better the approach the patch takes, handling versioning directly?

There are prior discussions about this. The direct way of handling it
is to use __builtin_dispatch, but we concluded that using function
overloading is much more user friendly. Note that Intel's icc has a
similar feature to the overloading approach implemented by Sri here.

thanks,

David
>
> Jason

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-07-07 18:38                                                   ` Xinliang David Li
@ 2012-07-08 11:21                                                     ` Jason Merrill
  2012-07-09 21:27                                                       ` Xinliang David Li
  0 siblings, 1 reply; 93+ messages in thread
From: Jason Merrill @ 2012-07-08 11:21 UTC (permalink / raw)
  To: Xinliang David Li
  Cc: Sriraman Tallam, mark, nathan, H.J. Lu, Richard Guenther,
	Jan Hubicka, Uros Bizjak, reply, gcc-patches

On 07/07/2012 08:38 PM, Xinliang David Li wrote:
>> It seems to me that what you have here are target-specific attributes that
>> affect the signature of a function such that they make two declarations
>> different that would otherwise declare the same function. Stepping away from
>> the specific notion of versioning, it seems that these are the questions
>> that you want the front end to be able to ask about these attributes:
>>
>> * Does this attribute affect a function signature?
>
> The question becomes if a caller 'bar' with target attribute 'x' can
> make a call to a function 'foo' with an incompatible target attribute
> 'y'. If the answer is no, then the target attribute is part of 'foo's
> signature.  I think the answer is yes -- the attribute affects a
> function signature.

Yes, clearly the answer is yes for the target attribute.  But I wasn't 
asking someone to answer those questions; I was saying that those are 
the questions the front end needs to be able to ask of the back end in 
order to implement this functionality in a more generic way.

Jason

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-07-08 11:21                                                     ` Jason Merrill
@ 2012-07-09 21:27                                                       ` Xinliang David Li
  2012-07-10  9:46                                                         ` Jason Merrill
  0 siblings, 1 reply; 93+ messages in thread
From: Xinliang David Li @ 2012-07-09 21:27 UTC (permalink / raw)
  To: Jason Merrill
  Cc: Sriraman Tallam, mark, nathan, H.J. Lu, Richard Guenther,
	Jan Hubicka, Uros Bizjak, reply, gcc-patches

Ok.  Do you have specific comments on the patch?

thanks,

David

On Sun, Jul 8, 2012 at 4:20 AM, Jason Merrill <jason@redhat.com> wrote:
> On 07/07/2012 08:38 PM, Xinliang David Li wrote:
>>>
>>> It seems to me that what you have here are target-specific attributes
>>> that
>>> affect the signature of a function such that they make two declarations
>>> different that would otherwise declare the same function. Stepping away
>>> from
>>> the specific notion of versioning, it seems that these are the questions
>>> that you want the front end to be able to ask about these attributes:
>>>
>>> * Does this attribute affect a function signature?
>>
>>
>> The question becomes if a caller 'bar' with target attribute 'x' can
>> make a call to a function 'foo' with an incompatible target attribute
>> 'y'. If the answer is no, then the target attribute is part of 'foo's
>> signature.  I think the answer is yes -- the attribute affects a
>> function signature.
>
>
> Yes, clearly the answer is yes for the target attribute.  But I wasn't
> asking someone to answer those questions; I was saying that those are the
> questions the front end needs to be able to ask of the back end in order to
> implement this functionality in a more generic way.
>
> Jason

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-07-09 21:27                                                       ` Xinliang David Li
@ 2012-07-10  9:46                                                         ` Jason Merrill
  2012-07-10 16:09                                                           ` Xinliang David Li
  0 siblings, 1 reply; 93+ messages in thread
From: Jason Merrill @ 2012-07-10  9:46 UTC (permalink / raw)
  To: Xinliang David Li
  Cc: Sriraman Tallam, mark, nathan, H.J. Lu, Richard Guenther,
	Jan Hubicka, Uros Bizjak, reply, gcc-patches

On 07/09/2012 11:27 PM, Xinliang David Li wrote:
> Ok.  Do you have specific comments on the patch?

My comment is "Perhaps we want to implement this using a more generic 
mechanism."  I was thinking to defer a detailed code review until that 
question is settled.

Jason

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-07-10  9:46                                                         ` Jason Merrill
@ 2012-07-10 16:09                                                           ` Xinliang David Li
       [not found]                                                             ` <CAAs8HmxHF38ktt6syjWp-MpjiX+6NcXh7_8Xn6iKnAiF2vRymQ@mail.gmail.com>
  0 siblings, 1 reply; 93+ messages in thread
From: Xinliang David Li @ 2012-07-10 16:09 UTC (permalink / raw)
  To: Jason Merrill
  Cc: Sriraman Tallam, mark, nathan, H.J. Lu, Richard Guenther,
	Jan Hubicka, Uros Bizjak, reply, gcc-patches

On Tue, Jul 10, 2012 at 2:46 AM, Jason Merrill <jason@redhat.com> wrote:
> On 07/09/2012 11:27 PM, Xinliang David Li wrote:
>>
>> Ok.  Do you have specific comments on the patch?
>
>
> My comment is "Perhaps we want to implement this using a more generic
> mechanism."  I was thinking to defer a detailed code review until that
> question is settled.

We all like more generic solutions :)


Sri, can you provide more descriptions on FE changes -- this will help
reviewers get started.

By the way, there are a couple of files with bad contents that need
re-upload -- e.g., cp/decl.c.

thanks,

David

>
> Jason

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
       [not found]                                                             ` <CAAs8HmxHF38ktt6syjWp-MpjiX+6NcXh7_8Xn6iKnAiF2vRymQ@mail.gmail.com>
@ 2012-07-19 20:40                                                               ` Jason Merrill
  2012-07-30 19:16                                                                 ` Sriraman Tallam
  0 siblings, 1 reply; 93+ messages in thread
From: Jason Merrill @ 2012-07-19 20:40 UTC (permalink / raw)
  To: Sriraman Tallam
  Cc: Xinliang David Li, mark, nathan, H.J. Lu, Richard Guenther,
	Jan Hubicka, Uros Bizjak, reply, gcc-patches

On 07/10/2012 03:14 PM, Sriraman Tallam wrote:
> I am using the questions you asked previously
> to explain how I solved each of them. When working on this patch, these
> are the exact questions I had and tried to address it.
>
> * Does this attribute affect a function signature?
>
> The function signature should be changed when there is more than one
> definition/declaration of foo distinguished by unique target attributes.
 >[...]

I agree.  I was trying to suggest that these questions are what the 
front end needs to care about, not about versioning specifically.  If 
these questions are turned into target hooks, all of the logic specific 
to versioning can be contained in the target.

My only question intended to be answered by humans is, do people think 
moving the versioning logic behind more generic target hooks is worthwhile?

Jason

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-07-19 20:40                                                               ` Jason Merrill
@ 2012-07-30 19:16                                                                 ` Sriraman Tallam
  2012-08-25  0:34                                                                   ` Sriraman Tallam
  0 siblings, 1 reply; 93+ messages in thread
From: Sriraman Tallam @ 2012-07-30 19:16 UTC (permalink / raw)
  To: Jason Merrill
  Cc: Xinliang David Li, mark, nathan, H.J. Lu, Richard Guenther,
	Jan Hubicka, Uros Bizjak, reply, gcc-patches

On Thu, Jul 19, 2012 at 1:39 PM, Jason Merrill <jason@redhat.com> wrote:
>
> On 07/10/2012 03:14 PM, Sriraman Tallam wrote:
>>
>> I am using the questions you asked previously
>> to explain how I solved each of them. When working on this patch, these
>> are the exact questions I had and tried to address it.
>>
>> * Does this attribute affect a function signature?
>>
>> The function signature should be changed when there is more than one
>> definition/declaration of foo distinguished by unique target attributes.
>
> >[...]
>
> I agree.  I was trying to suggest that these questions are what the front end needs to care about, not about versioning specifically.  If these questions are turned into target hooks, all of the logic specific to versioning can be contained in the target.
>
> My only question intended to be answered by humans is, do people think moving the versioning logic behind more generic target hooks is worthwhile?

I have some related comments.

For the example below,

// Default version.
int foo ()
{
  .....
}

// Version  XXX feature supported by Target ABC.
int __attribute__ ((target ("XXX"))) foo ()
{
   ....
}

How should the second version of foo be treated for targets where
feature XXX is not supported? Right now, I am working on having my
patch completely ignore such function versions when compiled for
targets that do not understand the attribute. I could move this check
into a generic target hook so that a function definition that does not
make sense for the current target is ignored.

Also, currently the patch uses target hooks to do the following:

- Find if a particular version can be called directly, rather than go
through the dispatcher.
- Determine what the dispatcher body should be.
- Determining the order in which function versions must be dispatched.

I do not have a strong opinion on whether the entire logic should be
based on target hooks.

Thanks,
-Sri.

>
>
>
> Jason

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-07-30 19:16                                                                 ` Sriraman Tallam
@ 2012-08-25  0:34                                                                   ` Sriraman Tallam
  2012-09-18 16:29                                                                     ` Sriraman Tallam
                                                                                       ` (2 more replies)
  0 siblings, 3 replies; 93+ messages in thread
From: Sriraman Tallam @ 2012-08-25  0:34 UTC (permalink / raw)
  To: Jason Merrill
  Cc: Xinliang David Li, mark, nathan, H.J. Lu, Richard Guenther,
	Jan Hubicka, Uros Bizjak, reply, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2932 bytes --]

Hi Jason,

   I have created a new patch to use target hooks for all the
functionality and make the front-end just call the target hooks at the
appropriate places. This is more like what you suggested in a previous
mail. In particular, target hooks address the following questions:

* Determine if two function decls with the same signature are versions.
* Determine the new assembler name of a function version.
* Generate the dispatcher function for a set of function versions.
* Compare versions to see if one has a higher priority over the other.

Patch attached and also available for review at:

http://codereview.appspot.com/5752064/

Hope this is more along the lines of what you had in mind, please let
me know what you think.

Thanks,
-Sri.


On Mon, Jul 30, 2012 at 12:01 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> On Thu, Jul 19, 2012 at 1:39 PM, Jason Merrill <jason@redhat.com> wrote:
>>
>> On 07/10/2012 03:14 PM, Sriraman Tallam wrote:
>>>
>>> I am using the questions you asked previously
>>> to explain how I solved each of them. When working on this patch, these
>>> are the exact questions I had and tried to address it.
>>>
>>> * Does this attribute affect a function signature?
>>>
>>> The function signature should be changed when there is more than one
>>> definition/declaration of foo distinguished by unique target attributes.
>>
>> >[...]
>>
>> I agree.  I was trying to suggest that these questions are what the front end needs to care about, not about versioning specifically.  If these questions are turned into target hooks, all of the logic specific to versioning can be contained in the target.
>>
>> My only question intended to be answered by humans is, do people think moving the versioning logic behind more generic target hooks is worthwhile?
>
> I have some related comments.
>
> For the example below,
>
> // Default version.
> int foo ()
> {
>   .....
> }
>
> // Version  XXX feature supported by Target ABC.
> int __attribute__ ((target ("XXX"))) foo ()
> {
>    ....
> }
>
> How should the second version of foo be treated for targets where
> feature XXX is not supported? Right now, I am working on having my
> patch completely ignore such function versions when compiled for
> targets that do not understand the attribute. I could move this check
> into a generic target hook so that a function definition that does not
> make sense for the current target is ignored.
>
> Also, currently the patch uses target hooks to do the following:
>
> - Find if a particular version can be called directly, rather than go
> through the dispatcher.
> - Determine what the dispatcher body should be.
> - Determining the order in which function versions must be dispatched.
>
> I do not have a strong opinion on whether the entire logic should be
> based on target hooks.
>
> Thanks,
> -Sri.
>
>>
>>
>>
>> Jason

[-- Attachment #2: mv_fe_patch_new.txt --]
[-- Type: text/plain, Size: 74859 bytes --]

Overview of the patch which adds support to specify function versions.  This is
only enabled for target i386.

Example:

int foo ();  /* Default version */
int foo () __attribute__ ((target("avx,popcnt")));/*Specialized for avx and popcnt */
int foo () __attribute__ ((target("arch=core2,ssse3")));/*Specialized for core2 and ssse3*/

int main ()
{
 int (*p)() = &foo;
 return foo () + (*p)();
}

int foo ()
{
 return 0;
}

int __attribute__ ((target("avx,popcnt")))
foo ()
{
 return 0;
}

int __attribute__ ((target("arch=core2,ssse3")))
foo ()
{
 return 0;
}

The above example has foo defined 3 times, but all 3 definitions of foo are
different versions of the same function. The call to foo in main, directly and
via a pointer, are calls to the multi-versioned function foo which is dispatched
to the right foo at run-time.

Front-end changes:

The front-end changes are calls at appropriate places to target hooks that
determine the following:

* Determine if two function decls with the same signature are versions.
* Determine the assembler name of a function version.
* Generate the dispatcher function for a set of function versions.
* Compare versions to see if one has a higher priority over the other.

All the implementation happens in the target-specific config/i386/i386.c.

What does the patch do?

* Tracking decls that correspond to function versions of function
name, say "foo":

When the front-end sees more than one decl for "foo", it calls a target hook to
determine if they are versions. To prevent duplicate definition errors with other
versions of "foo", the "decls_match" function in cp/decl.c is made to return false
when 2 decls are deemed versions by the target. This causes all function
versions of "foo" to be added to the overload list of "foo".
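
The interaction between decls_match and the target hook can be pictured with a
small stand-alone model. This is only a sketch under stated assumptions: Decl,
VersionHooks and i386_like_function_versions are hypothetical stand-ins for
GCC's tree nodes and target hooks, and the rule "same name and signature,
different target string" is a simplification of what the patch checks.

```cpp
#include <string>

// Hypothetical stand-in for a function decl; GCC uses tree nodes instead.
struct Decl {
  std::string name;       // e.g. "foo"
  std::string signature;  // e.g. "int()"
  std::string target;     // argument of the target attribute, "" for the default
};

// Hypothetical hook table, modeled loosely on a struct of function pointers.
struct VersionHooks {
  bool (*function_versions)(const Decl&, const Decl&);
};

// Assumed target rule: same name and signature, different target strings.
bool i386_like_function_versions(const Decl& a, const Decl& b) {
  return a.name == b.name && a.signature == b.signature && a.target != b.target;
}

// Simplified front-end check: decls that the target calls versions do not
// match, so both stay in the overload set instead of being merged or
// reported as a redefinition.
bool decls_match(const VersionHooks& hooks, const Decl& a, const Decl& b) {
  if (hooks.function_versions(a, b)) return false;
  return a.name == b.name && a.signature == b.signature;
}
```

In this model, two decls with identical target strings still match, so a
genuine redefinition of the same version would still be diagnosed.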

* Change the assembler names of the function versions.

For i386, the target changes the assembler names of the function versions by
suffixing the sorted list of args to "target" to the function name of "foo". For
example, the assembler name of "void foo () __attribute__ ((target ("sse4")))" will
become _Z3foov.sse4.
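
A rough model of this renaming, for illustration only. The helper name
version_assembler_name is hypothetical, and joining multiple sorted arguments
with "_" is an assumption; the description above only fixes the single-argument
form (_Z3foov.sse4).

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns "<mangled>.<sorted target args>".
// An empty argument string (the default version) leaves the name unchanged.
std::string version_assembler_name(const std::string& mangled,
                                   const std::string& target_args) {
  // Split the comma-separated argument of the target attribute.
  std::vector<std::string> args;
  std::istringstream in(target_args);
  std::string arg;
  while (std::getline(in, arg, ','))
    if (!arg.empty()) args.push_back(arg);

  // Sorting makes target("avx,popcnt") and target("popcnt,avx") agree.
  std::sort(args.begin(), args.end());

  std::string name = mangled;
  for (std::size_t i = 0; i < args.size(); ++i) {
    name += (i == 0 ? "." : "_");  // "_" as separator is an assumption
    name += args[i];
  }
  return name;
}
```

Sorting before suffixing is what lets two declarations that list the same
features in a different order resolve to the same symbol.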

* Overload resolution:

 Function "build_over_call" in cp/call.c sees a call to function
"foo", which is multi-versioned. The overload resolution happens in
function "joust" in "cp/call.c". Here, the call to "foo" has all
possible versions of "foo" as candidates. All the candidates of "foo" are
stored in the cgraph data structures. Each version of foo is chained in a 
doubly-linked list with the default function as the first element.  This allows
any pass to access all the semantically identical versions. A call to a
multi-versioned function will be replaced by a call to a dispatcher function,
determined by a target hook, to execute the right function version at run-time.
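
The chaining described above can be sketched as follows. VersionNode is a
hypothetical, much reduced stand-in for cgraph_node; only the two link fields
borrow their names from the patch, and the insertion policy (default at the
head, other versions appended) is an assumption made for the example.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical, reduced model of a cgraph node for one function version.
struct VersionNode {
  std::string name;
  bool is_default;
  VersionNode* prev_function_version = nullptr;
  VersionNode* next_function_version = nullptr;

  VersionNode(std::string n, bool d) : name(std::move(n)), is_default(d) {}
};

// Links `node` into the chain headed by `head` and returns the new head.
// The default version always ends up first; other versions are appended.
VersionNode* chain_version(VersionNode* head, VersionNode* node) {
  if (head == nullptr) return node;
  if (node->is_default) {
    node->next_function_version = head;
    head->prev_function_version = node;
    return node;
  }
  VersionNode* tail = head;
  while (tail->next_function_version) tail = tail->next_function_version;
  tail->next_function_version = node;
  node->prev_function_version = tail;
  return head;
}

// Starting from any version, rewinds to the head and lists every version.
std::vector<std::string> all_versions(const VersionNode* any) {
  while (any->prev_function_version) any = any->prev_function_version;
  std::vector<std::string> names;
  for (; any; any = any->next_function_version) names.push_back(any->name);
  return names;
}
```

Because the list is doubly linked, a pass holding any one version can reach
all the others, which is the property the description relies on.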

Optimization to directly call a version when possible:
Also, in joust, where overload resolution happens, a multiversioned function
resolution is made to return the most specialized version.  This is the version
that will be checked for dispatching first and is determined by the target.
Now, if the caller can inline this function version then a direct call is made
to this function version rather than go through the dispatcher. When a direct
call cannot be made, a call to the dispatcher function is created.
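
The choice between a direct call and a dispatcher call might be modeled like
this. It is a sketch, not the patch's logic: the feature bitmasks, the
function names, and the inlining rule (callee features must be a subset of the
caller's) are all assumptions made for illustration.

```cpp
#include <cstdint>
#include <string>

// Assumed inlining rule: the caller may inline the callee only if every ISA
// feature the callee needs is also enabled in the caller.
bool can_inline(std::uint32_t caller_features, std::uint32_t callee_features) {
  return (callee_features & ~caller_features) == 0;
}

// Returns the symbol a call should bind to: the most specialized version when
// it can be inlined into the caller, otherwise the dispatcher.
std::string bind_call(std::uint32_t caller_features,
                      const std::string& best_version,
                      std::uint32_t best_version_features,
                      const std::string& dispatcher) {
  return can_inline(caller_features, best_version_features) ? best_version
                                                            : dispatcher;
}
```

A caller that is itself compiled with the same target features binds directly
to the specialized version, while a plain caller goes through the dispatcher.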

* Creating the dispatcher body.

The dispatcher body, called the resolver, is made only when there is a call to a
multiversioned function dispatcher or the address of a function is taken. This
is generated during build_cgraph_edges for a call or cgraph_mark_address_taken_node
for a pointer reference. This is done by another target hook.
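
The on-demand behaviour can be pictured with a tiny model. Dispatcher,
ensure_body, note_call and note_address_taken are hypothetical names; in the
patch the corresponding work happens inside the cgraph code and a target hook.

```cpp
// Hypothetical model of a dispatcher whose body is generated lazily.
struct Dispatcher {
  bool has_body = false;
  int bodies_generated = 0;  // counts how often generation actually ran
};

// Stand-in for the target hook that emits the resolver; runs at most once.
void ensure_body(Dispatcher& d) {
  if (d.has_body) return;
  d.has_body = true;
  ++d.bodies_generated;
}

// Both events that need the dispatcher trigger generation on demand.
void note_call(Dispatcher& d) { ensure_body(d); }
void note_address_taken(Dispatcher& d) { ensure_body(d); }
```

If a versioned function is never called and never has its address taken,
neither function runs and no dispatcher body is produced.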

* Dispatch ordering.

The order in which the function versions are checked during dispatch is based
on a priority value assigned to the ISA that each version caters to. More
specialized versions are checked for dispatching first.  This is to mitigate the
ambiguity that can arise when more than one function version is valid for
execution on a particular platform.  This is not a perfect solution and, in
future, the user should be allowed to assign a dispatching priority value to
each version.
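
A minimal sketch of priority-ordered dispatch follows. The feature bits and
the numeric priorities are made up for the example, the "arch=core2" check is
omitted, and select_version is a hypothetical name; the real ordering is
decided by the target in i386.c.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical feature bits standing in for the CPU checks in the dispatcher.
constexpr std::uint32_t kPopcnt = 1u << 0;
constexpr std::uint32_t kSsse3 = 1u << 1;
constexpr std::uint32_t kAvx = 1u << 2;

struct Version {
  std::string target;      // target attribute string, "" for the default
  std::uint32_t required;  // features the version needs; 0 for the default
  int priority;            // assumed values; higher is checked first
};

// Checks versions from highest to lowest priority and returns the target
// string of the first one whose requirements the CPU satisfies.  The default
// version requires nothing, so it is the fallback.
std::string select_version(std::vector<Version> versions,
                           std::uint32_t cpu_features) {
  std::stable_sort(versions.begin(), versions.end(),
                   [](const Version& a, const Version& b) {
                     return a.priority > b.priority;
                   });
  for (const Version& v : versions)
    if ((v.required & ~cpu_features) == 0) return v.target;
  return "";
}
```

On a CPU that satisfies two versions at once, the higher priority decides,
which is the ambiguity the ordering is meant to resolve.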



	* doc/tm.texi.in (TARGET_OPTION_FUNCTION_VERSIONS): New hook description.
	(TARGET_COMPARE_VERSION_PRIORITY): New hook description.
	(TARGET_SET_VERSION_ASSEMBLER_NAME): New hook description.
	(TARGET_GET_FUNCTION_VERSIONS_DISPATCHER): New hook description.
	(TARGET_GENERATE_VERSION_DISPATCHER_BODY): New hook description.
	* doc/tm.texi: Regenerate.
	* cgraphbuild.c (build_cgraph_edges): Generate body of multiversion
	function dispatcher.
	* c-family/c-common.c (handle_target_attribute): Warn for invalid attributes.
	* target.def (compare_version_priority): New target hook.
	(set_version_assembler_name): New target hook.
	(generate_version_dispatcher_body): New target hook.
	(get_function_versions_dispatcher): New target hook.
	(function_versions): New target hook.
	* cgraph.c (cgraph_mark_address_taken_node): Generate body of multiversion
	function dispatcher.
	* cgraph.h (cgraph_node): New members version_dispatcher_decl,
	prev_function_version, next_function_version, dispatcher_function.
	* tree.h (DECL_FUNCTION_VERSIONED): New macro.
	(tree_function_decl): New bit-field versioned_function.
	* testsuite/g++.dg/mv1.C: New test.
	* testsuite/g++.dg/mv2.C: New test.
	* testsuite/g++.dg/mv3.C: New test.
	* testsuite/g++.dg/mv4.C: New test.
	* cp/class.c (add_method): Change assembler names of function versions.
	(resolve_address_of_overloaded_function): Save all function
	version candidates. Create dispatcher decl and return address of
	dispatcher instead.
	* cp/decl.c (decls_match): Make decls unmatched for versioned
	functions.
	(duplicate_decls): Remove ambiguity for versioned functions. 
	(cxx_comdat_group): Make comdat group of versioned functions be the
	same.
	* cp/error.c (dump_exception_spec): Dump assembler name for function
	versions.
	* cp/semantics.c (expand_or_defer_fn_1): Mark as needed versioned
	functions that are also marked inline.
	* cp/decl2.c (check_classfn): Check attributes of versioned functions
	for match.
	* cp/call.c (build_new_function_call): Check if versioned functions
	have a default version.
	(build_over_call): Make calls to multiversioned functions
	to call the dispatcher.
	(joust): For calls to multi-versioned functions, make the most
	specialized function version win.
	(tourney): Generate dispatcher decl for function versions.
	* cp/mangle.c (write_unqualified_name): Use assembler name for
	versioned functions.
	* Makefile.in: Add multiversion.o
	* config/i386/i386.c (add_condition_to_bb): New function.
	(get_builtin_code_for_version): New function.
	(ix86_compare_version_priority): New function.
	(feature_compare): New function.
	(dispatch_function_versions): New function.
	(ix86_function_versions): New function.
	(attr_strcmp): New function.
	(sorted_attr_string): New function.
	(ix86_set_version_assembler_name): New function.
	(make_name): New function.
	(make_dispatcher_decl): New function.
	(is_function_default_version): New function.
	(ix86_get_function_versions_dispatcher): New function.
	(make_attribute): New function.
	(make_resolver_func): New function.
	(ix86_generate_version_dispatcher_body): New function.
	(TARGET_COMPARE_VERSION_PRIORITY): New macro.
	(TARGET_SET_VERSION_ASSEMBLER_NAME): New macro.
	(TARGET_GET_FUNCTION_VERSIONS_DISPATCHER): New macro.
	(TARGET_GENERATE_VERSION_DISPATCHER_BODY): New macro.
	(TARGET_OPTION_FUNCTION_VERSIONS): New macro.

Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi	(revision 190493)
+++ gcc/doc/tm.texi	(working copy)
@@ -9894,6 +9894,11 @@ changed via the optimize attribute or pragma, see
 @code{TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE}
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_OPTION_FUNCTION_VERSIONS (tree @var{decl1}, tree @var{decl2})
+This target hook returns @code{true} if @var{decl1} and @var{decl2} are
+versions of the same function.
+@end deftypefn
+
 @deftypefn {Target Hook} bool TARGET_CAN_INLINE_P (tree @var{caller}, tree @var{callee})
 This target hook returns @code{false} if the @var{caller} function
 cannot inline @var{callee}, based on target specific information.  By
@@ -10925,6 +10930,32 @@ The result is another tree containing a simplified
 call's result.  If @var{ignore} is true the value will be ignored.
 @end deftypefn
 
+@deftypefn {Target Hook} int TARGET_COMPARE_VERSION_PRIORITY (tree @var{decl1}, tree @var{decl2})
+This hook is used to compare the target attributes in two functions to
+figure out which function's features get higher priority.  This is used
+during multi-versioning to figure out the order of dispatching. @var{decl1}
+and @var{decl2} are the two function decls that will be compared.
+@end deftypefn
+
+@deftypefn {Target Hook} void TARGET_SET_VERSION_ASSEMBLER_NAME (tree @var{decl})
+This hook is used to set the new assembler name of @var{decl}, a function
+that is a version.
+@end deftypefn
+
+@deftypefn {Target Hook} tree TARGET_GET_FUNCTION_VERSIONS_DISPATCHER (void *@var{arglist})
+This hook is used to get the dispatcher function for a set of function
+versions.  The dispatcher function is called to invoke the right function
+version at run-time.  @var{arglist} is the vector of function versions
+that should be considered for dispatch.
+@end deftypefn
+
+@deftypefn {Target Hook} tree TARGET_GENERATE_VERSION_DISPATCHER_BODY (void *@var{arg})
+This hook is used to generate the dispatcher logic to invoke the right
+function version at runtime for a given set of function versions.
+@var{arg} points to the callgraph node of the dispatcher function whose
+body must be generated.
+@end deftypefn
+
 @deftypefn {Target Hook} {const char *} TARGET_INVALID_WITHIN_DOLOOP (const_rtx @var{insn})
 
 Take an instruction in @var{insn} and return NULL if it is valid within a
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in	(revision 190493)
+++ gcc/doc/tm.texi.in	(working copy)
@@ -9763,6 +9763,11 @@ changed via the optimize attribute or pragma, see
 @code{TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE}
 @end deftypefn
 
+@hook TARGET_OPTION_FUNCTION_VERSIONS
+This target hook returns @code{true} if @var{decl1} and @var{decl2} are
+versions of the same function.
+@end deftypefn
+
 @hook TARGET_CAN_INLINE_P
 This target hook returns @code{false} if the @var{caller} function
 cannot inline @var{callee}, based on target specific information.  By
@@ -10783,6 +10788,32 @@ The result is another tree containing a simplified
 call's result.  If @var{ignore} is true the value will be ignored.
 @end deftypefn
 
+@hook TARGET_COMPARE_VERSION_PRIORITY
+This hook is used to compare the target attributes in two functions to
+figure out which function's features get higher priority.  This is used
+during multi-versioning to figure out the order of dispatching.  @var{decl1}
+and @var{decl2} are the two function decls that will be compared.
+@end deftypefn
+
+@hook TARGET_SET_VERSION_ASSEMBLER_NAME
+This hook is used to set the new assembler name of @var{decl}, a function
+that is a version.
+@end deftypefn
+
+@hook TARGET_GET_FUNCTION_VERSIONS_DISPATCHER
+This hook is used to get the dispatcher function for a set of function
+versions.  The dispatcher function is called to invoke the right function
+version at run-time.  @var{arglist} is the vector of function versions
+that should be considered for dispatch.
+@end deftypefn
+
+@hook TARGET_GENERATE_VERSION_DISPATCHER_BODY
+This hook is used to generate the dispatcher logic to invoke the right
+function version at runtime for a given set of function versions.
+@var{arg} points to the callgraph node of the dispatcher function whose
+body must be generated.
+@end deftypefn
+
 @hook TARGET_INVALID_WITHIN_DOLOOP
 
 Take an instruction in @var{insn} and return NULL if it is valid within a
Index: gcc/cgraphbuild.c
===================================================================
--- gcc/cgraphbuild.c	(revision 190493)
+++ gcc/cgraphbuild.c	(working copy)
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-utils.h"
 #include "except.h"
 #include "ipa-inline.h"
+#include "target.h"
 
 /* Context of record_reference.  */
 struct record_reference_ctx
@@ -288,7 +289,6 @@ mark_store (gimple stmt, tree t, void *data)
      }
   return false;
 }
-
 /* Create cgraph edges for function calls.
    Also look for functions and variables having addresses taken.  */
 
@@ -317,8 +317,21 @@ build_cgraph_edges (void)
 							 bb);
 	      decl = gimple_call_fndecl (stmt);
 	      if (decl)
-		cgraph_create_edge (node, cgraph_get_create_node (decl),
-				    stmt, bb->count, freq);
+		{
+		  struct cgraph_node *callee = cgraph_get_create_node (decl);
+		  /* If a call to a multiversioned function dispatcher is
+		     found, generate the body to dispatch the right function
+		     at run-time.  */
+		  if (callee->dispatcher_function)
+		    {
+		      tree resolver_decl;
+		      gcc_assert (callee->next_function_version);
+		      resolver_decl
+			 = targetm.generate_version_dispatcher_body (callee);
+		      gcc_assert (resolver_decl != NULL_TREE);
+		    }
+		  cgraph_create_edge (node, callee, stmt, bb->count, freq);
+		}
 	      else
 		cgraph_create_indirect_edge (node, stmt,
 					     gimple_call_flags (stmt),
Index: gcc/c-family/c-common.c
===================================================================
--- gcc/c-family/c-common.c	(revision 190493)
+++ gcc/c-family/c-common.c	(working copy)
@@ -8502,9 +8502,22 @@ handle_target_attribute (tree *node, tree name, tr
       warning (OPT_Wattributes, "%qE attribute ignored", name);
       *no_add_attrs = true;
     }
-  else if (! targetm.target_option.valid_attribute_p (*node, name, args,
-						      flags))
-    *no_add_attrs = true;
+  else
+    {
+      /* When a target attribute is invalid, it may also be because the
+	 target for the compilation unit and the attribute match.  For
+         instance, target attribute "xxx" is invalid when -mxxx is used.
+         When used with multiversioning, removing the attribute will lead
+         to duplicate definitions if a default version is provided.
+	 So, generate a warning here and remove the attribute.  */
+      if (!targetm.target_option.valid_attribute_p (*node, name, args, flags))
+	{
+	  warning (OPT_Wattributes,
+		   "Invalid target attribute in function %qE, ignored.",
+		   *node);
+	  *no_add_attrs = true;
+	}
+    }
 
   return NULL_TREE;
 }
Index: gcc/target.def
===================================================================
--- gcc/target.def	(revision 190493)
+++ gcc/target.def	(working copy)
@@ -1298,6 +1298,39 @@ DEFHOOK
  tree, (tree fndecl, int n_args, tree *argp, bool ignore),
  hook_tree_tree_int_treep_bool_null)
 
+/* Target hook to compare the priority of target features in function DECL1
+   and DECL2.  It returns positive value if DECL1 is higher priority,
+   negative value if DECL2 is higher priority and 0 if they are the
+   same.  */
+DEFHOOK
+(compare_version_priority,
+ "",
+ int, (tree decl1, tree decl2), NULL)
+
+/* Target hook to set the new assembler name of DECL, which is a function
+   version.  */
+
+DEFHOOK
+(set_version_assembler_name,
+ "",
+ void, (tree decl), NULL)
+
+/* Target hook to generate the dispatcher body for a function version
+   dispatcher ARG, which is a cgraph_node pointer.  */
+
+DEFHOOK
+(generate_version_dispatcher_body,
+ "",
+ tree, (void *arg), NULL) 
+
+/* Target hook to generate a function version dispatcher DECL for the list
+   of function versions in arglist, which is a vector of decls.  */
+
+DEFHOOK
+(get_function_versions_dispatcher,
+ "",
+ tree, (void *arglist), NULL)
+
 /* Returns a code for a target-specific builtin that implements
    reciprocal of the function, or NULL_TREE if not available.  */
 DEFHOOK
@@ -2705,6 +2738,14 @@ DEFHOOK
  void, (void),
  hook_void_void)
 
+/* Returns true if DECL1 and DECL2 are versions of the same function.  */
+
+DEFHOOK
+(function_versions,
+ "",
+ bool, (tree decl1, tree decl2),
+ hook_bool_tree_tree_false)
+
 /* Function to determine if one function can inline another function.  */
 #undef HOOK_PREFIX
 #define HOOK_PREFIX "TARGET_"
Index: gcc/cgraph.c
===================================================================
--- gcc/cgraph.c	(revision 190493)
+++ gcc/cgraph.c	(working copy)
@@ -1281,6 +1281,14 @@ cgraph_mark_address_taken_node (struct cgraph_node
   node->symbol.address_taken = 1;
   node = cgraph_function_or_thunk_node (node, NULL);
   node->symbol.address_taken = 1;
+  /* If the address of a multiversioned function dispatcher is taken,
+     generate the body to dispatch the right function at run-time.  This
+     is needed as the address can be used to do an indirect call.  */
+  if (node->dispatcher_function)
+    {
+      gcc_assert (node->next_function_version);
+      targetm.generate_version_dispatcher_body (node);
+    }
 }
 
 /* Return local info for the compiled function.  */
Index: gcc/cgraph.h
===================================================================
--- gcc/cgraph.h	(revision 190493)
+++ gcc/cgraph.h	(working copy)
@@ -220,6 +220,26 @@ struct GTY(()) cgraph_node {
   struct cgraph_node *prev_sibling_clone;
   struct cgraph_node *clones;
   struct cgraph_node *clone_of;
+
+  /* TODO: Put version_dispatcher_decl, prev_function_version,
+     next_function_version into a struct for readability.
+
+     If this node corresponds to a function version, this points
+     to the dispatcher function decl which is the function that must
+     be called to execute the right function version at run-time.
+
+     If this node is a dispatcher for function versions, this points
+     to the resolver function, which holds the function body for the
+     dispatcher.  */
+  tree version_dispatcher_decl;
+
+  /* Chains all the semantically identical function versions.  The
+     first function in this chain is the default function.  */
+  struct cgraph_node *prev_function_version;
+  /* If this node is a dispatcher for function versions, this also points
+     to the default function version.  */
+  struct cgraph_node *next_function_version;
+
   /* For functions with many calls sites it holds map from call expression
      to the edge to speed up cgraph_edge function.  */
   htab_t GTY((param_is (struct cgraph_edge))) call_site_hash;
@@ -271,6 +291,7 @@ struct GTY(()) cgraph_node {
   /* ?? We should be able to remove this.  We have enough bits in
      cgraph to calculate it.  */
   unsigned tm_clone : 1;
+  unsigned dispatcher_function : 1;
 };
 
 DEF_VEC_P(symtab_node);
@@ -648,6 +669,7 @@ void cgraph_rebuild_references (void);
 int compute_call_stmt_bb_frequency (tree, basic_block bb);
 void record_references_in_initializer (tree, bool);
 
+
 /* In ipa.c  */
 bool symtab_remove_unreachable_nodes (bool, FILE *);
 cgraph_node_set cgraph_node_set_new (void);
Index: gcc/tree.h
===================================================================
--- gcc/tree.h	(revision 190493)
+++ gcc/tree.h	(working copy)
@@ -3436,6 +3436,12 @@ extern VEC(tree, gc) **decl_debug_args_insert (tre
 #define DECL_FUNCTION_SPECIFIC_OPTIMIZATION(NODE) \
    (FUNCTION_DECL_CHECK (NODE)->function_decl.function_specific_optimization)
 
+/* In FUNCTION_DECL, this is set if this function has other versions generated
+   using "target" attributes.  The default version is the one which does not
+   have any "target" attribute set.  */
+#define DECL_FUNCTION_VERSIONED(NODE)\
+   (FUNCTION_DECL_CHECK (NODE)->function_decl.versioned_function)
+
 /* FUNCTION_DECL inherits from DECL_NON_COMMON because of the use of the
    arguments/result/saved_tree fields by front ends.   It was either inherit
    FUNCTION_DECL from non_common, or inherit non_common from FUNCTION_DECL,
@@ -3480,8 +3486,8 @@ struct GTY(()) tree_function_decl {
   unsigned looping_const_or_pure_flag : 1;
   unsigned has_debug_args_flag : 1;
   unsigned tm_clone_flag : 1;
-
-  /* 1 bit left */
+  unsigned versioned_function : 1;
+  /* No bits left.  */
 };
 
 /* The source language of the translation-unit.  */
Index: gcc/testsuite/g++.dg/mv2.C
===================================================================
--- gcc/testsuite/g++.dg/mv2.C	(revision 0)
+++ gcc/testsuite/g++.dg/mv2.C	(revision 0)
@@ -0,0 +1,119 @@
+/* Test case to check if Multiversioning chooses the correct
+   dispatching order when versions are for various ISAs.  */
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-require-ifunc "" }  */
+/* { dg-options "-O2 -mno-sse -mno-mmx -mno-popcnt -mno-avx" } */
+
+#include <assert.h>
+
+/* Default version.  */
+int foo ();
+/* The dispatch checks should be in the exact reverse order of the
+   declarations below.  */
+int foo () __attribute__ ((target ("mmx")));
+int foo () __attribute__ ((target ("sse")));
+int foo () __attribute__ ((target ("sse2")));
+int foo () __attribute__ ((target ("sse3")));
+int foo () __attribute__ ((target ("ssse3")));
+int foo () __attribute__ ((target ("sse4.1")));
+int foo () __attribute__ ((target ("sse4.2")));
+int foo () __attribute__ ((target ("popcnt")));
+int foo () __attribute__ ((target ("avx")));
+int foo () __attribute__ ((target ("avx2")));
+
+int main ()
+{
+
+  int val = foo ();
+
+  if (__builtin_cpu_supports ("avx2"))
+    assert (val == 1);
+  else if (__builtin_cpu_supports ("avx"))
+    assert (val == 2);
+  else if (__builtin_cpu_supports ("popcnt"))
+    assert (val == 3);
+  else if (__builtin_cpu_supports ("sse4.2"))
+    assert (val == 4);
+  else if (__builtin_cpu_supports ("sse4.1"))
+    assert (val == 5);
+  else if (__builtin_cpu_supports ("ssse3"))
+    assert (val == 6);
+  else if (__builtin_cpu_supports ("sse3"))
+    assert (val == 7);
+  else if (__builtin_cpu_supports ("sse2"))
+    assert (val == 8);
+  else if (__builtin_cpu_supports ("sse"))
+    assert (val == 9);
+  else if (__builtin_cpu_supports ("mmx"))
+    assert (val == 10);
+  else
+    assert (val == 0);
+
+  return 0;
+}
+
+int
+foo ()
+{
+  return 0;
+}
+
+int __attribute__ ((target("mmx")))
+foo ()
+{
+  return 10;
+}
+
+int __attribute__ ((target("sse")))
+foo ()
+{
+  return 9;
+}
+
+int __attribute__ ((target("sse2")))
+foo ()
+{
+  return 8;
+}
+
+int __attribute__ ((target("sse3")))
+foo ()
+{
+  return 7;
+}
+
+int __attribute__ ((target("ssse3")))
+foo ()
+{
+  return 6;
+}
+
+int __attribute__ ((target("sse4.1")))
+foo ()
+{
+  return 5;
+}
+
+int __attribute__ ((target("sse4.2")))
+foo ()
+{
+  return 4;
+}
+
+int __attribute__ ((target("popcnt")))
+foo ()
+{
+  return 3;
+}
+
+int __attribute__ ((target("avx")))
+foo ()
+{
+  return 2;
+}
+
+int __attribute__ ((target("avx2")))
+foo ()
+{
+  return 1;
+}
Index: gcc/testsuite/g++.dg/mv4.C
===================================================================
--- gcc/testsuite/g++.dg/mv4.C	(revision 0)
+++ gcc/testsuite/g++.dg/mv4.C	(revision 0)
@@ -0,0 +1,23 @@
+/* Test case to check if the compiler generates an error message
+   when the default version of a multiversioned function is absent
+   and its pointer is taken.  */
+
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 -mno-sse -mno-popcnt" } */
+
+int __attribute__ ((target ("sse")))
+foo ()
+{
+  return 1;
+}
+int __attribute__ ((target ("popcnt")))
+foo ()
+{
+  return 0;
+}
+
+int main ()
+{
+  int (*p)() = &foo; /* { dg-error "Pointer to a multiversioned function without a default is not allowed" {} } */
+  return (*p)();
+}
Index: gcc/testsuite/g++.dg/mv1.C
===================================================================
--- gcc/testsuite/g++.dg/mv1.C	(revision 0)
+++ gcc/testsuite/g++.dg/mv1.C	(revision 0)
@@ -0,0 +1,130 @@
+/* Test case to check if Multiversioning works.  */
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-require-ifunc "" }  */
+/* { dg-options "-O2 -fPIC -mno-avx" } */
+
+#include <assert.h>
+
+/* Default version.  */
+int foo ();
+/* The other versions of foo.  Mix up the ordering and
+   check that dispatching still follows the order of priority.  */
+/* Check combination of target attributes.  */
+int foo () __attribute__ ((target("arch=corei7,popcnt")));
+/* The target operands in this declaration and the definition are re-ordered.
+   This should still work.  */
+int foo () __attribute__ ((target("ssse3,avx2")));
+
+/* Check for all target attributes for which dispatchers are available.  */
+/* Check arch= */
+int foo () __attribute__((target("arch=core2")));
+int foo () __attribute__((target("arch=corei7")));
+int foo () __attribute__((target("arch=atom")));
+/* Check ISAs  */
+int foo () __attribute__((target("avx")));
+int foo () __attribute__ ((target("arch=core2,sse4.2")));
+/* Check more arch=.  */
+int foo () __attribute__((target("arch=amdfam10")));
+int foo () __attribute__((target("arch=bdver1")));
+int foo () __attribute__((target("arch=bdver2")));
+
+int (*p)() = &foo;
+int main ()
+{
+  int val = foo ();
+  assert (val ==  (*p)());
+
+  /* Check in the exact same order in which the dispatching
+     is expected to happen.  */
+  if (__builtin_cpu_is ("bdver1"))
+    assert (val == 1);
+  else if (__builtin_cpu_is ("bdver2"))
+    assert (val == 2);
+  else if (__builtin_cpu_supports ("avx2")
+	   && __builtin_cpu_supports ("ssse3"))
+    assert (val == 3);
+  else if (__builtin_cpu_supports ("avx"))
+    assert (val == 4);
+  else if (__builtin_cpu_is ("corei7")
+	   && __builtin_cpu_supports ("popcnt"))
+    assert (val == 5);
+  else if (__builtin_cpu_is ("corei7"))
+    assert (val == 6);
+  else if (__builtin_cpu_is ("amdfam10h"))
+    assert (val == 7);
+  else if (__builtin_cpu_is ("core2")
+	   && __builtin_cpu_supports ("sse4.2"))
+    assert (val == 8);
+  else if (__builtin_cpu_is ("core2"))
+    assert (val == 9);
+  else if (__builtin_cpu_is ("atom"))
+    assert (val == 10);
+  else
+    assert (val == 0);
+  
+  return 0;
+}
+
+int foo ()
+{
+  return 0;
+}
+
+int __attribute__ ((target("arch=corei7,popcnt")))
+foo ()
+{
+  return 5;
+}
+int __attribute__ ((target("avx2,ssse3")))
+foo ()
+{
+  return 3;
+}
+
+int __attribute__ ((target("arch=core2")))
+foo ()
+{
+  return 9;
+}
+
+int __attribute__ ((target("arch=corei7")))
+foo ()
+{
+  return 6;
+}
+
+int __attribute__ ((target("arch=atom")))
+foo ()
+{
+  return 10;
+}
+
+int __attribute__ ((target("avx")))
+foo ()
+{
+  return 4;
+}
+
+int __attribute__ ((target("arch=core2,sse4.2")))
+foo ()
+{
+  return 8;
+}
+
+int __attribute__ ((target("arch=amdfam10")))
+foo ()
+{
+  return 7;
+}
+
+int __attribute__ ((target("arch=bdver1")))
+foo ()
+{
+  return 1;
+}
+
+int __attribute__ ((target("arch=bdver2")))
+foo ()
+{
+  return 2;
+}
Index: gcc/testsuite/g++.dg/mv3.C
===================================================================
--- gcc/testsuite/g++.dg/mv3.C	(revision 0)
+++ gcc/testsuite/g++.dg/mv3.C	(revision 0)
@@ -0,0 +1,37 @@
+/* Test case to check if a call to a multiversioned function
+   is replaced with a direct call to the particular version when
+   the most specialized version's target attributes match the
+   caller.  
+  
+   In this program, foo is multiversioned but there is no default
+   function.  This is an error if the call has to go through a
+   dispatcher.  However, the call to foo in bar can be replaced
+   with a direct call to the popcnt version of foo.  Hence, this
+   test should pass.  */
+
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 -mno-sse -mno-popcnt" } */
+
+
+/* The sse version.  There is no default version of foo.  */
+int __attribute__ ((target ("sse")))
+foo ()
+{
+  return 1;
+}
+int __attribute__ ((target ("popcnt")))
+foo ()
+{
+  return 0;
+}
+
+int __attribute__ ((target ("popcnt")))
+bar ()
+{
+  return foo ();
+}
+
+int main ()
+{
+  return bar ();
+}
Index: gcc/cp/class.c
===================================================================
--- gcc/cp/class.c	(revision 190493)
+++ gcc/cp/class.c	(working copy)
@@ -1091,7 +1091,20 @@ add_method (tree type, tree method, tree using_dec
 	      || same_type_p (TREE_TYPE (fn_type),
 			      TREE_TYPE (method_type))))
 	{
-	  if (using_decl)
+	  /* For function versions, their parms and types match
+	     but they are not duplicates.  Record function versions
+	     as and when they are found.  */
+	  if (TREE_CODE (fn) == FUNCTION_DECL
+	      && TREE_CODE (method) == FUNCTION_DECL
+	      && (DECL_FUNCTION_SPECIFIC_TARGET (fn)
+		  || DECL_FUNCTION_SPECIFIC_TARGET (method))
+	      && targetm.target_option.function_versions (fn, method))
+ 	    {
+	      targetm.set_version_assembler_name (fn);
+	      targetm.set_version_assembler_name (method);
+	      continue;
+	    }
+	  else if (using_decl)
 	    {
 	      if (DECL_CONTEXT (fn) == type)
 		/* Defer to the local function.  */
@@ -6922,6 +6935,7 @@ resolve_address_of_overloaded_function (tree targe
   tree matches = NULL_TREE;
   tree fn;
   tree target_fn_type;
+  VEC (tree, heap) *fn_ver_vec = NULL;
 
   /* By the time we get here, we should be seeing only real
      pointer-to-member types, not the internal POINTER_TYPE to
@@ -6986,9 +7000,19 @@ resolve_address_of_overloaded_function (tree targe
 	  if (DECL_ANTICIPATED (fn))
 	    continue;
 
-	  /* See if there's a match.  */
+	  /* See if there's a match.  For functions that are multi-versioned,
+	     all the versions match.  */
 	  if (same_type_p (target_fn_type, static_fn_type (fn)))
-	    matches = tree_cons (fn, NULL_TREE, matches);
+	    {
+	      matches = tree_cons (fn, NULL_TREE, matches);
+	      /* If versioned, push all possible versions into a vector.  */
+	      if (DECL_FUNCTION_VERSIONED (fn))
+		{
+		  if (fn_ver_vec == NULL)
+		   fn_ver_vec = VEC_alloc (tree, heap, 2);
+		  VEC_safe_push (tree, heap, fn_ver_vec, fn); 
+		}
+	    }
 	}
     }
 
@@ -7080,13 +7104,26 @@ resolve_address_of_overloaded_function (tree targe
     {
       /* There were too many matches.  First check if they're all
 	 the same function.  */
-      tree match;
+      tree match = NULL_TREE;
 
       fn = TREE_PURPOSE (matches);
-      for (match = TREE_CHAIN (matches); match; match = TREE_CHAIN (match))
-	if (!decls_match (fn, TREE_PURPOSE (match)))
-	  break;
 
+      /* For multi-versioned functions, more than one match is just fine.
+	 Call decls_match to make sure they are different because they are
+	 versioned.  */
+      if (DECL_FUNCTION_VERSIONED (fn))
+	{
+          for (match = TREE_CHAIN (matches); match; match = TREE_CHAIN (match))
+  	    if (decls_match (fn, TREE_PURPOSE (match)))
+	      break;
+	}
+      else
+	{
+          for (match = TREE_CHAIN (matches); match; match = TREE_CHAIN (match))
+  	    if (!decls_match (fn, TREE_PURPOSE (match)))
+	      break;
+	}
+
       if (match)
 	{
 	  if (flags & tf_error)
@@ -7148,6 +7185,28 @@ resolve_address_of_overloaded_function (tree targe
       perform_or_defer_access_check (access_path, fn, fn, flags);
     }
 
+  /* If a pointer to a function that is multi-versioned is requested, the
+     pointer to the dispatcher function is returned instead.  This works
+     well because indirectly calling the function will dispatch the right
+     function version at run-time.  */
+  if (DECL_FUNCTION_VERSIONED (fn))
+    {
+      tree dispatcher_decl = NULL;
+      gcc_assert (fn_ver_vec != NULL);
+      gcc_assert (targetm.get_function_versions_dispatcher);
+      dispatcher_decl = targetm.get_function_versions_dispatcher (fn_ver_vec);
+      if (!dispatcher_decl)
+	{
+	  error_at (input_location, "Pointer to a multiversioned function"
+		    " without a default is not allowed");
+	  return error_mark_node;
+	}
+      retrofit_lang_decl (dispatcher_decl);
+      mark_used (fn);
+      VEC_free (tree, heap, fn_ver_vec);
+      fn = dispatcher_decl;
+    }
+
   if (TYPE_PTRFN_P (target_type) || TYPE_PTRMEMFUNC_P (target_type))
     return cp_build_addr_expr (fn, flags);
   else
Index: gcc/cp/decl.c
===================================================================
--- gcc/cp/decl.c	(revision 190493)
+++ gcc/cp/decl.c	(working copy)
@@ -53,6 +53,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "pointer-set.h"
 #include "splay-tree.h"
 #include "plugin.h"
+#include "cgraph.h"
 
 /* Possible cases of bad specifiers type used by bad_specifiers. */
 enum bad_spec_place {
@@ -972,6 +973,17 @@ decls_match (tree newdecl, tree olddecl)
       if (t1 != t2)
 	return 0;
 
+      /* The decls don't match if they correspond to two different versions
+	 of the same function.  */
+      if (compparms (p1, p2)
+	  && same_type_p (TREE_TYPE (f1), TREE_TYPE (f2)) 
+	  && targetm.target_option.function_versions (newdecl, olddecl))
+	{
+	  targetm.set_version_assembler_name (newdecl);
+	  targetm.set_version_assembler_name (olddecl);
+	  return 0;
+	}
+
       if (CP_DECL_CONTEXT (newdecl) != CP_DECL_CONTEXT (olddecl)
 	  && ! (DECL_EXTERN_C_P (newdecl)
 		&& DECL_EXTERN_C_P (olddecl)))
@@ -1490,7 +1502,11 @@ duplicate_decls (tree newdecl, tree olddecl, bool
 	      error ("previous declaration %q+#D here", olddecl);
 	      return NULL_TREE;
 	    }
-	  else if (compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)),
+	  /* For function versions, params and types match, but they
+	     are not ambiguous.  */
+	  else if ((!DECL_FUNCTION_VERSIONED (newdecl)
+		    && !DECL_FUNCTION_VERSIONED (olddecl))
+		   && compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)),
 			      TYPE_ARG_TYPES (TREE_TYPE (olddecl))))
 	    {
 	      error ("new declaration %q#D", newdecl);
@@ -2262,6 +2278,11 @@ duplicate_decls (tree newdecl, tree olddecl, bool
   else if (DECL_PRESERVE_P (newdecl))
     DECL_PRESERVE_P (olddecl) = 1;
 
+  /* If the olddecl is a version, so is the newdecl.  */
+  if (TREE_CODE (newdecl) == FUNCTION_DECL
+      && DECL_FUNCTION_VERSIONED (olddecl))
+    DECL_FUNCTION_VERSIONED (newdecl) = 1;
+
   if (TREE_CODE (newdecl) == FUNCTION_DECL)
     {
       int function_size;
@@ -14024,7 +14045,11 @@ cxx_comdat_group (tree decl)
 	  else
 	    break;
 	}
-      name = DECL_ASSEMBLER_NAME (decl);
+      if (TREE_CODE (decl) == FUNCTION_DECL
+	  && DECL_FUNCTION_VERSIONED (decl))
+	name = DECL_NAME (decl);
+      else
+        name = DECL_ASSEMBLER_NAME (decl);
     }
 
   return name;
Index: gcc/cp/error.c
===================================================================
--- gcc/cp/error.c	(revision 190493)
+++ gcc/cp/error.c	(working copy)
@@ -1539,8 +1539,15 @@ dump_exception_spec (tree t, int flags)
 static void
 dump_function_name (tree t, int flags)
 {
-  tree name = DECL_NAME (t);
+  tree name;
 
+  /* For function versions, use the assembler name as the decl name is
+     the same for all versions.  */
+  if (DECL_FUNCTION_VERSIONED (t))
+    name = DECL_ASSEMBLER_NAME (t);
+  else
+    name = DECL_NAME (t);
+
   /* We can get here with a decl that was synthesized by language-
      independent machinery (e.g. coverage.c) in which case it won't
      have a lang_specific structure attached and DECL_CONSTRUCTOR_P
Index: gcc/cp/semantics.c
===================================================================
--- gcc/cp/semantics.c	(revision 190493)
+++ gcc/cp/semantics.c	(working copy)
@@ -3775,8 +3775,11 @@ expand_or_defer_fn_1 (tree fn)
       /* If the user wants us to keep all inline functions, then mark
 	 this function as needed so that finish_file will make sure to
 	 output it later.  Similarly, all dllexport'd functions must
-	 be emitted; there may be callers in other DLLs.  */
-      if ((flag_keep_inline_functions
+	 be emitted; there may be callers in other DLLs.
+	 Also, mark this function as needed if it is marked inline but
+	 is a multi-versioned function.  */
+      if (((flag_keep_inline_functions
+	    || DECL_FUNCTION_VERSIONED (fn))
 	   && DECL_DECLARED_INLINE_P (fn)
 	   && !DECL_REALLY_EXTERN (fn))
 	  || (flag_keep_inline_dllexport
Index: gcc/cp/decl2.c
===================================================================
--- gcc/cp/decl2.c	(revision 190493)
+++ gcc/cp/decl2.c	(working copy)
@@ -674,9 +674,13 @@ check_classfn (tree ctype, tree function, tree tem
 	  if (is_template != (TREE_CODE (fndecl) == TEMPLATE_DECL))
 	    continue;
 
+	  /* While finding a match, same types and params are not enough
+	     if the function is versioned.  Also check version ("target")
+	     attributes.  */
 	  if (same_type_p (TREE_TYPE (TREE_TYPE (function)),
 			   TREE_TYPE (TREE_TYPE (fndecl)))
 	      && compparms (p1, p2)
+	      && !targetm.target_option.function_versions (function, fndecl)
 	      && (!is_template
 		  || comp_template_parms (template_parms,
 					  DECL_TEMPLATE_PARMS (fndecl)))
Index: gcc/cp/call.c
===================================================================
--- gcc/cp/call.c	(revision 190493)
+++ gcc/cp/call.c	(working copy)
@@ -40,6 +40,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "langhooks.h"
 #include "c-family/c-objc.h"
 #include "timevar.h"
+#include "cgraph.h"
 
 /* The various kinds of conversion.  */
 
@@ -3910,6 +3911,16 @@ build_new_function_call (tree fn, VEC(tree,gc) **a
     {
       if (complain & tf_error)
 	{
+	  /* If the call is to a multiversioned function without
+	     a default version, overload resolution will fail.  */
+	  if (candidates
+	      && TREE_CODE (candidates->fn) == FUNCTION_DECL
+	      && DECL_FUNCTION_VERSIONED (candidates->fn))
+	    error_at (location_of (DECL_NAME (OVL_CURRENT (fn))),
+		      "Call to multiversioned function %<%D(%A)%> with"
+		      " no default version", DECL_NAME (OVL_CURRENT (fn)),
+		      build_tree_list_vec (*args));
+
 	  if (!any_viable_p && candidates && ! candidates->next
 	      && (TREE_CODE (candidates->fn) == FUNCTION_DECL))
 	    return cp_build_function_call_vec (candidates->fn, args, complain);
@@ -6858,6 +6869,30 @@ build_over_call (struct z_candidate *cand, int fla
   if (!already_used)
     mark_used (fn);
 
+  /* For calls to a multi-versioned function, overload resolution
+     returns the function with the highest target priority, that is,
+     the version that will be checked for dispatching first.  If this
+     version is inlinable, a direct call can be made; otherwise the
+     call should go through the dispatcher.  */
+
+  if (DECL_FUNCTION_VERSIONED (fn)
+      && !targetm.target_option.can_inline_p (current_function_decl, fn))
+    {
+      tree dispatcher_decl = NULL;
+      struct cgraph_node *node = cgraph_get_node (fn);
+      if (node != NULL)
+        dispatcher_decl = node->version_dispatcher_decl;
+      if (dispatcher_decl == NULL)
+	{
+	  error_at (input_location, "Call to multiversioned function"
+		    " without a default is not allowed");
+	  return NULL;
+	}
+      retrofit_lang_decl (dispatcher_decl);
+      gcc_assert (dispatcher_decl != NULL);
+      fn = dispatcher_decl;
+    }
+
   if (DECL_VINDEX (fn) && (flags & LOOKUP_NONVIRTUAL) == 0)
     {
       tree t;
@@ -8123,6 +8158,29 @@ joust (struct z_candidate *cand1, struct z_candida
   size_t i;
   size_t len;
 
+  /* For candidates of a multi-versioned function, make the version with
+     the highest priority win.  This version will be checked for dispatching
+     first.  If this version can be inlined into the caller, the front-end
+     will simply make a direct call to this function.  */
+
+  if ((TREE_CODE (cand1->fn) == FUNCTION_DECL
+       && DECL_FUNCTION_VERSIONED (cand1->fn))
+      || (TREE_CODE (cand2->fn) == FUNCTION_DECL
+	 && DECL_FUNCTION_VERSIONED (cand2->fn)))
+    {
+      /* Both functions must be marked versioned.  */
+      gcc_assert (DECL_FUNCTION_VERSIONED (cand1->fn)
+		  && DECL_FUNCTION_VERSIONED (cand2->fn));
+
+      /* Always make the version with the higher priority, more
+	 specialized, win.  */
+      gcc_assert (targetm.compare_version_priority);
+      if (targetm.compare_version_priority (cand1->fn, cand2->fn) >= 0)
+	return 1;
+      else
+	return -1;
+    }
+
   /* Candidates that involve bad conversions are always worse than those
      that don't.  */
   if (cand1->viable > cand2->viable)
@@ -8468,6 +8526,22 @@ tourney (struct z_candidate *candidates, tsubst_fl
   int fate;
   int champ_compared_to_predecessor = 0;
 
+  /* For multiversioned functions, aggregate all the versions here for
+     generating the dispatcher body later if necessary.  */
+
+  if (TREE_CODE (candidates->fn) == FUNCTION_DECL
+      && DECL_FUNCTION_VERSIONED (candidates->fn))
+    {
+      VEC (tree, heap) *fn_ver_vec = NULL;
+      struct z_candidate *ver = candidates;
+      fn_ver_vec = VEC_alloc (tree, heap, 2);
+      for (;ver; ver = ver->next)
+        VEC_safe_push (tree, heap, fn_ver_vec, ver->fn);
+      gcc_assert (targetm.get_function_versions_dispatcher);
+      targetm.get_function_versions_dispatcher (fn_ver_vec);
+      VEC_free (tree, heap, fn_ver_vec);
+    }
+
   /* Walk through the list once, comparing each current champ to the next
      candidate, knocking out a candidate or two with each comparison.  */
 
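The ordering that joust and the resolver aim for (highest priority first,
default last) can be illustrated outside the compiler.  What follows is only a
minimal sketch under stated assumptions, not the code in this patch: the
struct version_sketch type and the functions compare_priority_desc and
pick_version are hypothetical names, and the cpu_matches field stands in for
the __builtin_cpu_is / __builtin_cpu_supports predicates that the real
resolver evaluates at run time.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a function version: a name, a dispatch
   priority, and whether the running CPU satisfies its predicates.  */
struct version_sketch
{
  const char *name;
  unsigned int priority;
  int cpu_matches;
};

/* qsort comparator: descending priority, written without unsigned
   subtraction so the sign of the result is always well defined.  */
static int
compare_priority_desc (const void *v1, const void *v2)
{
  const struct version_sketch *a = (const struct version_sketch *) v1;
  const struct version_sketch *b = (const struct version_sketch *) v2;
  return (b->priority > a->priority) - (b->priority < a->priority);
}

/* Sort VERSIONS by descending priority and return the name of the first
   one whose predicate holds, or DEFAULT_NAME if none does.  */
static const char *
pick_version (struct version_sketch *versions, size_t n,
	      const char *default_name)
{
  size_t i;

  qsort (versions, n, sizeof (versions[0]), compare_priority_desc);
  for (i = 0; i < n; ++i)
    if (versions[i].cpu_matches)
      return versions[i].name;
  return default_name;
}
```

With priorities taken from the feature_priority enum in the i386.c part of
this patch (sse3 = 4, sse4.2 = 10, arch=corei7 = 11), a machine that matches
sse3 and sse4.2 but not corei7 would be dispatched to the sse4.2 version.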
Index: gcc/cp/mangle.c
===================================================================
--- gcc/cp/mangle.c	(revision 190493)
+++ gcc/cp/mangle.c	(working copy)
@@ -1245,7 +1245,12 @@ write_unqualified_name (const tree decl)
     {
       MANGLE_TRACE_TREE ("local-source-name", decl);
       write_char ('L');
-      write_source_name (DECL_NAME (decl));
+      if (TREE_CODE (decl) == FUNCTION_DECL
+          && DECL_FUNCTION_VERSIONED (decl)
+	  && DECL_ASSEMBLER_NAME_SET_P (decl))
+	write_source_name (DECL_ASSEMBLER_NAME (decl));
+      else
+	write_source_name (DECL_NAME (decl));
       /* The default discriminator is 1, and that's all we ever use,
 	 so there's no code to output one here.  */
     }
@@ -1260,7 +1265,14 @@ write_unqualified_name (const tree decl)
                && LAMBDA_TYPE_P (type))
         write_closure_type_name (type);
       else
-        write_source_name (DECL_NAME (decl));
+	{
+	  if (TREE_CODE (decl) == FUNCTION_DECL
+	      && DECL_FUNCTION_VERSIONED (decl)
+	      && DECL_ASSEMBLER_NAME_SET_P (decl))
+	    write_source_name (DECL_ASSEMBLER_NAME (decl));
+	  else
+	    write_source_name (DECL_NAME (decl));
+	}
     }
 }
 
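For readers following the mangling change: versioned functions get an
assembler name built from the original name plus the sorted target string.
The sketch below mirrors what sorted_attr_string and
ix86_set_version_assembler_name in the i386.c part of this patch do, but it is
a standalone illustration under assumptions, not the GCC code itself.  The
names cmp_str and versioned_name_sketch are hypothetical, and the token array
is sized by an upper bound rather than by counting commas.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C strings.  */
static int
cmp_str (const void *a, const void *b)
{
  return strcmp (*(char *const *) a, *(char *const *) b);
}

/* Return a malloc'd string BASE.ARGS, where ARGS is the comma separated
   TARGET string with '=' and '-' replaced by '_', its tokens sorted and
   joined with '_'.  The caller frees the result.  */
static char *
versioned_name_sketch (const char *base, const char *target)
{
  size_t len = strlen (target);
  char *copy = (char *) malloc (len + 1);
  /* LEN + 1 pointers is a safe upper bound on the number of tokens.  */
  char **args = (char **) malloc ((len + 1) * sizeof (char *));
  /* Room for BASE, one '.', at most LEN suffix characters, and the NUL.  */
  char *result = (char *) malloc (strlen (base) + len + 2);
  size_t n = 0, i;
  char *tok;

  assert (copy != NULL && args != NULL && result != NULL);
  strcpy (copy, target);
  for (i = 0; i < len; i++)
    if (copy[i] == '=' || copy[i] == '-')
      copy[i] = '_';

  for (tok = strtok (copy, ","); tok != NULL; tok = strtok (NULL, ","))
    args[n++] = tok;
  qsort (args, n, sizeof (char *), cmp_str);

  strcpy (result, base);
  for (i = 0; i < n; i++)
    {
      strcat (result, i == 0 ? "." : "_");
      strcat (result, args[i]);
    }

  free (args);
  free (copy);
  return result;
}
```

Because the tokens are sorted, "sse4.2,arch=corei7" and "arch=corei7,sse4.2"
give the same suffix, so two declarations that differ only in the order of
their attribute arguments map to the same symbol.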
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 190493)
+++ gcc/config/i386/i386.c	(working copy)
@@ -62,6 +62,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "opts.h"
 #include "diagnostic.h"
 #include "dumpfile.h"
+#include "tree-pass.h"
+#include "tree-flow.h"
 
 enum upper_128bits_state
 {
@@ -28030,6 +28032,980 @@ ix86_init_mmx_sse_builtins (void)
     }
 }
 
+
+/* This adds a condition to the basic_block NEW_BB in function FUNCTION_DECL
+   to return a pointer to VERSION_DECL if the outcome of the expression
+   formed by PREDICATE_CHAIN is true.  This function will be called during
+   version dispatch to decide which function version to execute.  It returns
+   the basic block at the end, to which more conditions can be added.  */
+
+static basic_block
+add_condition_to_bb (tree function_decl, tree version_decl,
+		     tree predicate_chain, basic_block new_bb)
+{
+  gimple return_stmt;
+  tree convert_expr, result_var;
+  gimple convert_stmt;
+  gimple call_cond_stmt;
+  gimple if_else_stmt;
+
+  basic_block bb1, bb2, bb3;
+  edge e12, e23;
+
+  tree cond_var, and_expr_var = NULL_TREE;
+  gimple_seq gseq;
+
+  tree old_current_function_decl;
+  tree predicate_decl, predicate_arg;
+
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (function_decl));
+  current_function_decl = function_decl;
+
+  gcc_assert (new_bb != NULL);
+  gseq = bb_seq (new_bb);
+
+
+  convert_expr = build1 (CONVERT_EXPR, ptr_type_node,
+	     		 build_fold_addr_expr (version_decl));
+  result_var = create_tmp_var (ptr_type_node, NULL);
+  convert_stmt = gimple_build_assign (result_var, convert_expr); 
+  return_stmt = gimple_build_return (result_var);
+
+  if (predicate_chain == NULL_TREE)
+    {
+      gimple_seq_add_stmt (&gseq, convert_stmt);
+      gimple_seq_add_stmt (&gseq, return_stmt);
+      set_bb_seq (new_bb, gseq);
+      gimple_set_bb (convert_stmt, new_bb);
+      gimple_set_bb (return_stmt, new_bb);
+      pop_cfun ();
+      current_function_decl = old_current_function_decl;
+      return new_bb;
+    }
+
+  while (predicate_chain != NULL)
+    {
+      cond_var = create_tmp_var (integer_type_node, NULL);
+      predicate_decl = TREE_PURPOSE (predicate_chain);
+      predicate_arg = TREE_VALUE (predicate_chain);
+      call_cond_stmt = gimple_build_call (predicate_decl, 1, predicate_arg);
+      gimple_call_set_lhs (call_cond_stmt, cond_var);
+
+      gimple_set_block (call_cond_stmt, DECL_INITIAL (function_decl));
+      gimple_set_bb (call_cond_stmt, new_bb);
+      gimple_seq_add_stmt (&gseq, call_cond_stmt);
+
+      predicate_chain = TREE_CHAIN (predicate_chain);
+      
+      if (and_expr_var == NULL)
+        and_expr_var = cond_var;
+      else
+	{
+	  gimple assign_stmt;
+	  /* Use MIN_EXPR to check whether any of the integers is zero:
+	     and_expr_var = min_expr <cond_var, and_expr_var>  */
+	  assign_stmt = gimple_build_assign (and_expr_var,
+			  build2 (MIN_EXPR, integer_type_node,
+				  cond_var, and_expr_var));
+
+	  gimple_set_block (assign_stmt, DECL_INITIAL (function_decl));
+	  gimple_set_bb (assign_stmt, new_bb);
+	  gimple_seq_add_stmt (&gseq, assign_stmt);
+	}
+    }
+
+  if_else_stmt = gimple_build_cond (GT_EXPR, and_expr_var,
+	  		            integer_zero_node,
+				    NULL_TREE, NULL_TREE);
+  gimple_set_block (if_else_stmt, DECL_INITIAL (function_decl));
+  gimple_set_bb (if_else_stmt, new_bb);
+  gimple_seq_add_stmt (&gseq, if_else_stmt);
+
+  gimple_seq_add_stmt (&gseq, convert_stmt);
+  gimple_seq_add_stmt (&gseq, return_stmt);
+  set_bb_seq (new_bb, gseq);
+
+  bb1 = new_bb;
+  e12 = split_block (bb1, if_else_stmt);
+  bb2 = e12->dest;
+  e12->flags &= ~EDGE_FALLTHRU;
+  e12->flags |= EDGE_TRUE_VALUE;
+
+  e23 = split_block (bb2, return_stmt);
+
+  gimple_set_bb (convert_stmt, bb2);
+  gimple_set_bb (return_stmt, bb2);
+
+  bb3 = e23->dest;
+  make_edge (bb1, bb3, EDGE_FALSE_VALUE); 
+
+  remove_edge (e23);
+  make_edge (bb2, EXIT_BLOCK_PTR, 0);
+
+  pop_cfun ();
+  current_function_decl = old_current_function_decl;
+
+  return bb3;
+}
+
+/* This parses the attribute arguments to target in DECL and determines
+   the right builtin to use to match the platform specification.
+   For now, only one target argument ("arch=" or "<-m>xxx") is allowed.
+   It returns the priority value for this version decl.  If PREDICATE_LIST
+   is not NULL, it stores the list of cpu features that need to be checked
+   before dispatching this function.  */
+
+static unsigned int
+get_builtin_code_for_version (tree decl, tree *predicate_list)
+{
+  tree attrs;
+  struct cl_target_option cur_target;
+  tree target_node;
+  struct cl_target_option *new_target;
+  const char *arg_str = NULL;
+  const char *attrs_str = NULL;
+  char *tok_str = NULL;
+  char *token;
+  unsigned int priority = 0;
+
+  /* Priority of i386 features, greater value is higher priority.  This is
+     used to decide the order in which function dispatch must happen.  For
+     instance, a version specialized for SSE4.2 should be checked for dispatch
+     before a version for SSE3, as SSE4.2 implies SSE3.  */
+  enum feature_priority
+  {
+    P_ZERO = 0,
+    P_MMX,
+    P_SSE,
+    P_SSE2,
+    P_SSE3,
+    P_SSSE3,
+    P_PROC_SSSE3,
+    P_SSE4_a,
+    P_PROC_SSE4_a,
+    P_SSE4_1,
+    P_SSE4_2,
+    P_PROC_SSE4_2,
+    P_POPCNT,
+    P_AVX,
+    P_AVX2,
+    P_FMA,
+    P_PROC_FMA
+  };
+
+  /* These are the target attribute strings for which a dispatcher is
+     available, from fold_builtin_cpu.  */
+
+  static struct _feature_list
+    {
+      const char *const name;
+      const enum feature_priority priority;
+    }
+  const feature_list[] =
+    {
+      {"mmx", P_MMX},
+      {"sse", P_SSE},
+      {"sse2", P_SSE2},
+      {"sse3", P_SSE3},
+      {"ssse3", P_SSSE3},
+      {"sse4.1", P_SSE4_1},
+      {"sse4.2", P_SSE4_2},
+      {"popcnt", P_POPCNT},
+      {"avx", P_AVX},
+      {"avx2", P_AVX2}
+    };
+
+
+  static unsigned int NUM_FEATURES
+    = sizeof (feature_list) / sizeof (struct _feature_list);
+
+  unsigned int i;
+
+  tree predicate_chain = NULL_TREE;
+  tree predicate_decl, predicate_arg;
+
+  attrs = lookup_attribute ("target", DECL_ATTRIBUTES (decl));
+  gcc_assert (attrs != NULL);
+
+  attrs = TREE_VALUE (TREE_VALUE (attrs));
+
+  gcc_assert (TREE_CODE (attrs) == STRING_CST);
+  attrs_str = TREE_STRING_POINTER (attrs);
+
+
+  /* Handle arch= if specified.  For priority, set it to be 1 more than
+     the best instruction set the processor can handle.  For instance, if
+     there is a version for atom and a version for ssse3 (the highest ISA
+     priority for atom), the atom version must be checked for dispatch
+     before the ssse3 version.  */
+  if (strstr (attrs_str, "arch=") != NULL)
+    {
+      cl_target_option_save (&cur_target, &global_options);
+      target_node = ix86_valid_target_attribute_tree (attrs);
+    
+      gcc_assert (target_node);
+      new_target = TREE_TARGET_OPTION (target_node);
+      gcc_assert (new_target);
+      
+      if (new_target->arch_specified && new_target->arch > 0)
+	{
+	  switch (new_target->arch)
+	    {
+	    case PROCESSOR_CORE2_32:
+	    case PROCESSOR_CORE2_64:
+	      arg_str = "core2";
+	      priority = P_PROC_SSSE3;
+	      break;
+	    case PROCESSOR_COREI7_32:
+	    case PROCESSOR_COREI7_64:
+	      arg_str = "corei7";
+	      priority = P_PROC_SSE4_2;
+	      break;
+	    case PROCESSOR_ATOM:
+	      arg_str = "atom";
+	      priority = P_PROC_SSSE3;
+	      break;
+	    case PROCESSOR_AMDFAM10:
+	      arg_str = "amdfam10h";
+	      priority = P_PROC_SSE4_a;
+	      break;
+	    case PROCESSOR_BDVER1:
+	      arg_str = "bdver1";
+	      priority = P_PROC_FMA;
+	      break;
+	    case PROCESSOR_BDVER2:
+	      arg_str = "bdver2";
+	      priority = P_PROC_FMA;
+	      break;
+	    }  
+	}    
+    
+      cl_target_option_restore (&global_options, &cur_target);
+	
+      if (predicate_list && arg_str == NULL)
+	{
+	  error_at (DECL_SOURCE_LOCATION (decl),
+	    	"No dispatcher found for the versioning attributes");
+	  return 0;
+	}
+    
+      if (predicate_list)
+	{
+          predicate_decl = ix86_builtins [(int) IX86_BUILTIN_CPU_IS];
+          /* For a C string literal the length includes the trailing NUL.  */
+          predicate_arg = build_string_literal (strlen (arg_str) + 1, arg_str);
+          predicate_chain = tree_cons (predicate_decl, predicate_arg,
+				       predicate_chain);
+	}
+    }
+
+  /* Process feature name.  */
+  tok_str = (char *) xmalloc (strlen (attrs_str) + 1);
+  strcpy (tok_str, attrs_str);
+  token = strtok (tok_str, ",");
+  predicate_decl = ix86_builtins [(int) IX86_BUILTIN_CPU_SUPPORTS];
+
+  while (token != NULL)
+    {
+      /* Do not process "arch=".  */
+      if (strncmp (token, "arch=", 5) == 0)
+	{
+	  token = strtok (NULL, ",");
+	  continue;
+	}
+      for (i = 0; i < NUM_FEATURES; ++i)
+	{
+	  if (strcmp (token, feature_list[i].name) == 0)
+	    {
+	      if (predicate_list)
+		{
+		  predicate_arg = build_string_literal (
+				  strlen (feature_list[i].name) + 1,
+				  feature_list[i].name);
+		  predicate_chain = tree_cons (predicate_decl, predicate_arg,
+					       predicate_chain);
+		}
+	      /* Find the maximum priority feature.  */
+	      if (feature_list[i].priority > priority)
+		priority = feature_list[i].priority;
+
+	      break;
+	    }
+	}
+      if (predicate_list && i == NUM_FEATURES)
+	{
+	  error_at (DECL_SOURCE_LOCATION (decl),
+		    "No dispatcher found for %s", token);
+	  return 0;
+	}
+      token = strtok (NULL, ",");
+    }
+  free (tok_str);
+
+  if (predicate_list && predicate_chain == NULL_TREE)
+    {
+      error_at (DECL_SOURCE_LOCATION (decl),
+	        "No dispatcher found for the versioning attributes : %s",
+	        attrs_str);
+      return 0;
+    }
+  else if (predicate_list)
+    {
+      predicate_chain = nreverse (predicate_chain);
+      *predicate_list = predicate_chain;
+    }
+
+  return priority; 
+}
+
+/* This compares the priority of target features in function DECL1
+   and DECL2.  It returns positive value if DECL1 is higher priority,
+   negative value if DECL2 is higher priority and 0 if they are the
+   same.  */
+
+static int
+ix86_compare_version_priority (tree decl1, tree decl2)
+{
+  unsigned int priority1 = 0;
+  unsigned int priority2 = 0;
+
+  if (lookup_attribute ("target", DECL_ATTRIBUTES (decl1)) != NULL)
+    priority1 = get_builtin_code_for_version (decl1, NULL);
+
+  if (lookup_attribute ("target", DECL_ATTRIBUTES (decl2)) != NULL)
+    priority2 = get_builtin_code_for_version (decl2, NULL);
+
+  return (int) priority1 - (int) priority2;
+}
+
+/* V1 and V2 point to function versions with different priorities
+   based on the target ISA.  This function compares their priorities.  */
+ 
+static int
+feature_compare (const void *v1, const void *v2)
+{
+  typedef struct _function_version_info
+    {
+      tree version_decl;
+      tree predicate_chain;
+      unsigned int dispatch_priority;
+    } function_version_info;
+
+  const function_version_info c1 = *(const function_version_info *)v1;
+  const function_version_info c2 = *(const function_version_info *)v2;
+  return (int) c2.dispatch_priority - (int) c1.dispatch_priority;
+}
+
+/* This function generates the dispatch function for
+   multi-versioned functions.  DISPATCH_DECL is the function which will
+   contain the dispatch logic.  FNDECLS_P is a vector of the function
+   choices for dispatch.  EMPTY_BB is the basic block pointer
+   in DISPATCH_DECL in which the dispatch code is generated.  */
+
+static int
+dispatch_function_versions (tree dispatch_decl,
+			    void *fndecls_p,
+			    basic_block *empty_bb)
+{
+  tree default_decl;
+  gimple ifunc_cpu_init_stmt;
+  gimple_seq gseq;
+  tree old_current_function_decl;
+  int ix;
+  tree ele;
+  VEC (tree, heap) *fndecls;
+  unsigned int num_versions = 0;
+  unsigned int actual_versions = 0;
+  unsigned int i;
+
+  struct _function_version_info
+    {
+      tree version_decl;
+      tree predicate_chain;
+      unsigned int dispatch_priority;
+    }*function_version_info;
+
+  gcc_assert (dispatch_decl != NULL
+	      && fndecls_p != NULL
+	      && empty_bb != NULL);
+
+  /* fndecls_p is actually a vector.  */
+  fndecls = (VEC (tree, heap) *) fndecls_p;
+
+  /* At least one more version other than the default.  */
+  num_versions = VEC_length (tree, fndecls);
+  gcc_assert (num_versions >= 2);
+
+  function_version_info = (struct _function_version_info *)
+    xmalloc ((num_versions - 1) * sizeof (struct _function_version_info));
+
+  /* The first version in the vector is the default decl.  */
+  default_decl = VEC_index (tree, fndecls, 0);
+
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (dispatch_decl));
+  current_function_decl = dispatch_decl;
+
+  gseq = bb_seq (*empty_bb);
+  /* Function version dispatch is via IFUNC.  IFUNC resolvers fire before
+     constructors, so explicitly call __builtin_cpu_init here.  */
+  ifunc_cpu_init_stmt = gimple_build_call_vec (
+                     ix86_builtins [(int) IX86_BUILTIN_CPU_INIT], NULL);
+  gimple_seq_add_stmt (&gseq, ifunc_cpu_init_stmt);
+  gimple_set_bb (ifunc_cpu_init_stmt, *empty_bb);
+  set_bb_seq (*empty_bb, gseq);
+
+  pop_cfun ();
+  current_function_decl = old_current_function_decl;
+
+
+  for (ix = 1; VEC_iterate (tree, fndecls, ix, ele); ++ix)
+    {
+      tree version_decl = ele;
+      tree predicate_chain = NULL_TREE;
+      unsigned int priority;
+      /* Get attribute string, parse it and find the right predicate decl.
+         The predicate function could be a lengthy combination of many
+	 features, like arch-type and various isa-variants.  */
+      priority = get_builtin_code_for_version (version_decl,
+	 			               &predicate_chain);
+
+      if (predicate_chain == NULL_TREE)
+	continue;
+
+      function_version_info [actual_versions].version_decl = version_decl;
+      function_version_info [actual_versions].predicate_chain = predicate_chain;
+      function_version_info [actual_versions].dispatch_priority = priority;
+      actual_versions++;
+    }
+
+  /* Sort the versions according to descending order of dispatch priority.  The
+     priority is based on the ISA.  This is not a perfect solution.  There
+     could still be ambiguity.  If more than one function version is suitable
+     to execute, which one should be dispatched?  In future, allow the user
+     to specify a dispatch priority next to the version.  */
+  qsort (function_version_info, actual_versions,
+         sizeof (struct _function_version_info), feature_compare);
+
+  for (i = 0; i < actual_versions; ++i)
+    *empty_bb = add_condition_to_bb (dispatch_decl,
+				     function_version_info[i].version_decl,
+				     function_version_info[i].predicate_chain,
+				     *empty_bb);
+
+  /* Dispatch the default version at the end.  */
+  *empty_bb = add_condition_to_bb (dispatch_decl, default_decl,
+				   NULL, *empty_bb);
+
+  free (function_version_info);
+  return 0;
+}
+
+/* This function returns true if FN1 and FN2 are versions of the same function,
+   that is, if only one of the function decls has the target attribute set
+   or if the targets of the function decls are different.  This assumes
+   that FN1 and FN2 have the same signature.  */
+
+static bool
+ix86_function_versions (tree fn1, tree fn2)
+{
+  tree attr1, attr2;
+  struct cl_target_option *target1, *target2;
+
+  if (TREE_CODE (fn1) != FUNCTION_DECL
+      || TREE_CODE (fn2) != FUNCTION_DECL)
+    return false;
+
+  attr1 = DECL_FUNCTION_SPECIFIC_TARGET (fn1);
+  attr2 = DECL_FUNCTION_SPECIFIC_TARGET (fn2);
+
+  if (attr1 == NULL_TREE && attr2 == NULL_TREE)
+    return false;
+
+  if ((attr1 == NULL_TREE && attr2 != NULL_TREE)
+      || (attr1 != NULL_TREE && attr2 == NULL_TREE))
+    return true;
+
+  target1 = TREE_TARGET_OPTION (attr1);
+  target2 = TREE_TARGET_OPTION (attr2);
+
+  if (target1->x_ix86_isa_flags == target2->x_ix86_isa_flags
+      && target1->x_target_flags == target2->x_target_flags
+      && target1->arch == target2->arch
+      && target1->tune == target2->tune
+      && target1->x_ix86_fpmath == target2->x_ix86_fpmath
+      && target1->branch_cost == target2->branch_cost)
+    return false;
+
+  return true;
+}
+
+/* Comparator function to be used in qsort routine to sort attribute
+   specification strings to "target".  */
+
+static int
+attr_strcmp (const void *v1, const void *v2)
+{
+  const char *c1 = *(char *const*)v1;
+  const char *c2 = *(char *const*)v2;
+  return strcmp (c1, c2);
+}
+
+/* STR is the argument to target attribute.  This function tokenizes
+   the comma separated arguments, sorts them and returns a string which
+   is a unique identifier for the comma separated arguments.  */
+
+static char *
+sorted_attr_string (const char *str)
+{
+  char **args = NULL;
+  char *attr_str, *ret_str;
+  char *attr = NULL;
+  unsigned int argnum = 1;
+  unsigned int i;
+
+  for (i = 0; i < strlen (str); i++)
+    if (str[i] == ',')
+      argnum++;
+
+  attr_str = (char *)xmalloc (strlen (str) + 1);
+  strcpy (attr_str, str);
+
+  for (i = 0; i < strlen (attr_str); i++)
+    if (attr_str[i] == '=' || attr_str[i] == '-')
+      attr_str[i] = '_';
+
+  if (argnum == 1)
+    return attr_str;
+
+  args = (char **)xmalloc (argnum * sizeof (char *));
+
+  i = 0;
+  attr = strtok (attr_str, ",");
+  while (attr != NULL)
+    {
+      args[i] = attr;
+      i++;
+      attr = strtok (NULL, ",");
+    }
+
+  qsort (args, argnum, sizeof (char*), attr_strcmp);
+
+  ret_str = (char *)xmalloc (strlen (str) + 1);
+  strcpy (ret_str, args[0]);
+  for (i = 1; i < argnum; i++)
+    {
+      strcat (ret_str, "_");
+      strcat (ret_str, args[i]);
+    }
+
+  free (args);
+  free (attr_str);
+  return ret_str;
+}
+
+/* This function changes the assembler name for functions that are
+   versions.  If DECL is a function version and has a "target"
+   attribute, it appends the attribute string to its assembler name.  */
+
+static void
+ix86_set_version_assembler_name (tree decl)
+{
+  tree version_attr;
+  const char *orig_name, *version_string, *attr_str;
+  char *assembler_name;
+  tree assembler_name_tree;
+
+  if (TREE_CODE (decl) != FUNCTION_DECL)
+    return;
+
+  if (DECL_FUNCTION_VERSIONED (decl)
+      && DECL_ASSEMBLER_NAME_SET_P (decl))
+    return;
+
+  DECL_FUNCTION_VERSIONED (decl) = 1;
+
+  if (DECL_DECLARED_INLINE_P (decl)
+      && lookup_attribute ("gnu_inline",
+			   DECL_ATTRIBUTES (decl)))
+    error_at (DECL_SOURCE_LOCATION (decl),
+	      "Function versions cannot be marked as gnu_inline,"
+	      " bodies have to be generated");
+
+  if (DECL_VIRTUAL_P (decl)
+      || DECL_VINDEX (decl))
+    error_at (DECL_SOURCE_LOCATION (decl),
+	      "Virtual function versioning not supported");
+
+  version_attr = lookup_attribute ("target", DECL_ATTRIBUTES (decl));
+
+  /* target attribute string is NULL for default functions.  */
+  if (version_attr == NULL_TREE)
+    return;
+
+  orig_name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
+  version_string
+    = TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (version_attr)));
+
+  attr_str = sorted_attr_string (version_string);
+  assembler_name = (char *) xmalloc (strlen (orig_name)
+				     + strlen (attr_str) + 2);
+
+  sprintf (assembler_name, "%s.%s", orig_name, attr_str);
+  if (dump_file)
+    fprintf (dump_file, "Assembler name set to %s for function version %s\n",
+	     assembler_name, IDENTIFIER_POINTER (DECL_NAME (decl)));
+
+  assembler_name_tree = get_identifier (assembler_name);
+
+  SET_DECL_ASSEMBLER_NAME (decl, assembler_name_tree);
+  SET_DECL_RTL (decl, NULL);
+}
+
+/* Return a new name by appending SUFFIX to the DECL name.  If
+   make_unique is true, append the full path name.  */
+
+static char *
+make_name (tree decl, const char *suffix, bool make_unique)
+{
+  char *global_var_name;
+  int name_len;
+  const char *name;
+  const char *unique_name = NULL;
+
+  name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
+
+  /* Get a unique name that can be used globally without any chances
+     of collision at link time.  */
+  if (make_unique)
+    unique_name = IDENTIFIER_POINTER (get_file_function_name ("\0"));
+
+  name_len = strlen (name) + strlen (suffix) + 2;
+
+  if (make_unique)
+    name_len += strlen (unique_name) + 1;
+  global_var_name = (char *) xmalloc (name_len);
+
+  /* Use '.' to concatenate names as it is demangler friendly.  */
+  if (make_unique)
+    snprintf (global_var_name, name_len, "%s.%s.%s", name,
+	      unique_name, suffix);
+  else
+    snprintf (global_var_name, name_len, "%s.%s", name, suffix);
+
+  return global_var_name;
+}
+
+/* Make a dispatcher declaration for the multi-versioned function DECL.
+   Calls to DECL function will be replaced with calls to the dispatcher
+   by the front-end.  Return the decl created.  */
+
+static tree
+make_dispatcher_decl (const tree decl)
+{
+  tree func_decl;
+  char *func_name, *resolver_name;
+  tree fn_type, func_type;
+  bool is_uniq = false;
+
+  if (TREE_PUBLIC (decl) == 0)
+    is_uniq = true;
+
+  func_name = make_name (decl, "ifunc", is_uniq);
+  resolver_name = make_name (decl, "resolver", is_uniq);
+  gcc_assert (resolver_name);
+
+  fn_type = TREE_TYPE (decl);
+  func_type = build_function_type (TREE_TYPE (fn_type),
+				   TYPE_ARG_TYPES (fn_type));
+  
+  func_decl = build_fn_decl (func_name, func_type);
+  TREE_USED (func_decl) = 1;
+  DECL_CONTEXT (func_decl) = NULL_TREE;
+  DECL_INITIAL (func_decl) = error_mark_node;
+  DECL_ARTIFICIAL (func_decl) = 1;
+  /* Mark this func as external, the resolver will flip it again if
+     it gets generated.  */
+  DECL_EXTERNAL (func_decl) = 1;
+  /* This will be an IFUNC; IFUNCs have to be externally visible.  */
+  TREE_PUBLIC (func_decl) = 1;
+
+  return func_decl;  
+}
+
+/* Returns true if DECL is multi-versioned and is the default function,
+   that is, it is not tagged with a target-specific optimization.  */
+
+static bool
+is_function_default_version (const tree decl)
+{
+  return (TREE_CODE (decl) == FUNCTION_DECL
+	  && DECL_FUNCTION_VERSIONED (decl)
+	  && DECL_FUNCTION_SPECIFIC_TARGET (decl) == NULL_TREE);
+}
+
+/* Make a dispatcher declaration for the multi-versioned function DECL.
+   Calls to DECL function will be replaced with calls to the dispatcher
+   by the front-end.  It also chains the cgraph nodes of all the
+   semantically identical versions in vector FN_VER_VEC_P.  Returns the
+   decl of the dispatcher function.  */
+
+static tree
+ix86_get_function_versions_dispatcher (void *fn_ver_vec_p)
+{
+  struct cgraph_node *node = NULL;
+  struct cgraph_node *default_node = NULL;
+  struct cgraph_node *curr_node = NULL;
+  int ix;
+  tree ele;
+  tree dispatch_decl = NULL;
+  VEC (tree, heap) *fn_ver_vec = NULL;
+
+  fn_ver_vec = (VEC (tree,heap) *) fn_ver_vec_p;
+  gcc_assert (fn_ver_vec != NULL);
+
+  /* Find the default version.  */
+  for (ix = 0; VEC_iterate (tree, fn_ver_vec, ix, ele); ++ix)
+    {
+      if (is_function_default_version (ele))
+	{
+	  default_node = cgraph_get_create_node (ele);
+	  break;
+	}
+    }
+
+  /* If there is no default node, just return NULL.  */
+  if (!default_node)
+    return NULL;
+
+  if (default_node->version_dispatcher_decl)
+    return default_node->version_dispatcher_decl;
+
+#if defined (ASM_OUTPUT_TYPE_DIRECTIVE) && HAVE_GNU_INDIRECT_FUNCTION
+  /* Right now, the dispatching is done via ifunc.  */
+  dispatch_decl = make_dispatcher_decl (default_node->symbol.decl); 
+#else
+  error_at (DECL_SOURCE_LOCATION (default_node->symbol.decl),
+	    "Multiversioning needs ifunc which is not supported "
+	    "in this configuration");
+#endif
+  default_node->version_dispatcher_decl = dispatch_decl;
+  curr_node = cgraph_get_create_node (dispatch_decl);
+  gcc_assert (curr_node);
+  curr_node->dispatcher_function = 1;
+  cgraph_mark_address_taken_node (default_node);
+
+  for (ix = 0; VEC_iterate (tree, fn_ver_vec, ix, ele); ++ix)
+    {
+      node = cgraph_get_create_node (ele);
+      gcc_assert (node != NULL && DECL_FUNCTION_VERSIONED (ele));
+      if (node == default_node)
+	continue;
+      gcc_assert (DECL_FUNCTION_SPECIFIC_TARGET (ele) != NULL_TREE);
+
+      if (curr_node->next_function_version)
+ 	{
+	  node->next_function_version = curr_node->next_function_version;
+	  curr_node->next_function_version->prev_function_version = node;
+	}
+      curr_node->next_function_version = node;
+      node->prev_function_version = curr_node;
+      node->version_dispatcher_decl = dispatch_decl;
+    }
+
+  /* The default version should be the first node.  */
+  default_node->next_function_version = curr_node->next_function_version;
+  curr_node->next_function_version->prev_function_version = default_node;
+  curr_node->next_function_version = default_node;
+  
+  return dispatch_decl;
+}
+
+/* Makes a function attribute of the form NAME(ARG_NAME) and chains
+   it to CHAIN.  */
+
+static tree
+make_attribute (const char *name, const char *arg_name, tree chain)
+{
+  tree attr_name;
+  tree attr_arg_name;
+  tree attr_args;
+  tree attr;
+
+  attr_name = get_identifier (name);
+  attr_arg_name = build_string (strlen (arg_name), arg_name);
+  attr_args = tree_cons (NULL_TREE, attr_arg_name, NULL_TREE);
+  attr = tree_cons (attr_name, attr_args, chain);
+  return attr;
+}
+
+/* Make the resolver function decl to dispatch the versions of
+   a multi-versioned function, DEFAULT_DECL.  Create an
+   empty basic block in the resolver and store the pointer in
+   EMPTY_BB.  Return the decl of the resolver function.  */
+
+static tree
+make_resolver_func (const tree default_decl,
+		    const tree dispatch_decl,
+		    basic_block *empty_bb)
+{
+  char *resolver_name;
+  tree decl, type, decl_name, t;
+  basic_block new_bb;
+  tree old_current_function_decl;
+  bool is_uniq = false;
+
+  /* IFUNCs have to be globally visible.  So, if the default_decl is
+     not, then the name of the IFUNC should be made unique.  */
+  if (TREE_PUBLIC (default_decl) == 0)
+    is_uniq = true;
+
+  /* Append the filename to the resolver function if the versions are
+     not externally visible.  This is because the resolver function has
+     to be externally visible for the loader to find it.  So, appending
+     the filename will prevent conflicts with a resolver function from
+     another module which is based on the same version name.  */
+  resolver_name = make_name (default_decl, "resolver", is_uniq);
+
+  /* The resolver function should return a (void *).  */
+  type = build_function_type_list (ptr_type_node, NULL_TREE);
+
+  decl = build_fn_decl (resolver_name, type);
+  decl_name = get_identifier (resolver_name);
+  SET_DECL_ASSEMBLER_NAME (decl, decl_name);
+
+  DECL_NAME (decl) = decl_name;
+  TREE_USED (decl) = 1;
+  DECL_ARTIFICIAL (decl) = 1;
+  DECL_IGNORED_P (decl) = 0;
+  /* IFUNC resolvers have to be externally visible.  */
+  TREE_PUBLIC (decl) = 1;
+  DECL_UNINLINABLE (decl) = 1;
+
+  DECL_EXTERNAL (decl) = 0;
+  DECL_EXTERNAL (dispatch_decl) = 0;
+
+  DECL_CONTEXT (decl) = NULL_TREE;
+  DECL_INITIAL (decl) = make_node (BLOCK);
+  DECL_STATIC_CONSTRUCTOR (decl) = 0;
+  TREE_READONLY (decl) = 0;
+  DECL_PURE_P (decl) = 0;
+
+  if (DECL_COMDAT_GROUP (default_decl))
+    {
+      DECL_COMDAT (decl) = DECL_COMDAT (default_decl);
+      make_decl_one_only (decl, DECL_COMDAT_GROUP (default_decl));
+    }
+  else if (TREE_PUBLIC (default_decl))
+    {
+      /* In this case, each translation unit with a call to this
+	 versioned function will put out a resolver.  Ensure it
+	 is comdat to keep just one copy.  */
+      DECL_COMDAT (decl) = 1;
+      make_decl_one_only (decl, DECL_ASSEMBLER_NAME (decl));
+    }
+  /* Build result decl and add to function_decl.  */
+  t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, ptr_type_node);
+  DECL_ARTIFICIAL (t) = 1;
+  DECL_IGNORED_P (t) = 1;
+  DECL_RESULT (decl) = t;
+
+  gimplify_function_tree (decl);
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (decl));
+  current_function_decl = decl;
+  gimple_register_cfg_hooks ();
+  init_empty_tree_cfg_for_function (DECL_STRUCT_FUNCTION (decl));
+  /* The resolver is lowered GIMPLE with a CFG but is not in SSA form,
+     so PROP_ssa is not set.  */
+  cfun->curr_properties
+    = PROP_gimple_any | PROP_gimple_lcf | PROP_gimple_leh | PROP_cfg;
+  new_bb = create_empty_bb (ENTRY_BLOCK_PTR);
+  make_edge (ENTRY_BLOCK_PTR, new_bb, EDGE_FALLTHRU);
+  make_edge (new_bb, EXIT_BLOCK_PTR, 0);
+  *empty_bb = new_bb;
+
+  cgraph_add_new_function (decl, true);
+  cgraph_call_function_insertion_hooks (cgraph_get_create_node (decl));
+
+  if (DECL_COMDAT_GROUP (default_decl))
+    {
+      gcc_assert (cgraph_get_node (default_decl));
+      symtab_add_to_same_comdat_group (
+	(symtab_node) cgraph_get_node (decl),
+	(symtab_node) cgraph_get_node (default_decl));
+    }
+
+  pop_cfun ();
+  current_function_decl = old_current_function_decl;
+
+  gcc_assert (dispatch_decl != NULL);
+  /* Mark dispatch_decl as "ifunc" with resolver as resolver_name.  */
+  DECL_ATTRIBUTES (dispatch_decl) 
+    = make_attribute ("ifunc", resolver_name, DECL_ATTRIBUTES (dispatch_decl));
+
+  /* Create the alias for dispatch to resolver here.  */
+  cgraph_create_function_alias (dispatch_decl, decl);
+  return decl;
+}
+
+/* Generate the dispatching code body to dispatch multi-versioned function
+   DECL.  The target hook is called to process the "target" attributes and
+   provide the code to dispatch the right function at run-time.  NODE_P points
+   to the dispatcher decl whose body will be created.  */
+
+static tree 
+ix86_generate_version_dispatcher_body (void *node_p)
+{
+  tree resolver_decl;
+  basic_block empty_bb;
+  VEC (tree, heap) *fn_ver_vec = NULL;
+  tree old_current_function_decl;
+  tree default_ver_decl;
+  struct cgraph_node *versn;
+  struct cgraph_node *node;
+
+  node = (struct cgraph_node *) node_p;
+
+  gcc_assert (node->dispatcher_function);
+
+  if (node->version_dispatcher_decl)
+    return node->version_dispatcher_decl;
+
+  default_ver_decl = node->next_function_version->symbol.decl;
+  resolver_decl = make_resolver_func (default_ver_decl,
+				      node->symbol.decl, &empty_bb);
+  node->version_dispatcher_decl = resolver_decl;
+
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (resolver_decl));
+  current_function_decl = resolver_decl;
+
+  fn_ver_vec = VEC_alloc (tree, heap, 2);
+
+  for (versn = node->next_function_version; versn;
+       versn = versn->next_function_version)
+    {
+      /* Check for virtual functions here again, as by this time it should
+	 have been determined if this function needs a vtable index or
+	 not.  This happens for methods in derived classes that override
+	 virtual methods in base classes but are not explicitly marked as
+	 virtual.  */
+      if (DECL_VINDEX (versn->symbol.decl))
+        error_at (DECL_SOURCE_LOCATION (versn->symbol.decl),
+		  "Virtual function multiversioning not supported");
+      VEC_safe_push (tree, heap, fn_ver_vec, versn->symbol.decl);
+    }
+
+  dispatch_function_versions(resolver_decl, fn_ver_vec, &empty_bb);
+
+  rebuild_cgraph_edges (); 
+  pop_cfun ();
+  current_function_decl = old_current_function_decl;
+  return resolver_decl;
+}
+
 /* This builds the processor_model struct type defined in
    libgcc/config/i386/cpuinfo.c  */
 
@@ -40652,6 +41628,20 @@ ix86_memmodel_check (unsigned HOST_WIDE_INT val)
 #undef TARGET_FOLD_BUILTIN
 #define TARGET_FOLD_BUILTIN ix86_fold_builtin
 
+#undef TARGET_COMPARE_VERSION_PRIORITY
+#define TARGET_COMPARE_VERSION_PRIORITY ix86_compare_version_priority
+
+#undef TARGET_SET_VERSION_ASSEMBLER_NAME
+#define TARGET_SET_VERSION_ASSEMBLER_NAME ix86_set_version_assembler_name
+
+#undef TARGET_GET_FUNCTION_VERSIONS_DISPATCHER
+#define TARGET_GET_FUNCTION_VERSIONS_DISPATCHER \
+  ix86_get_function_versions_dispatcher
+
+#undef TARGET_GENERATE_VERSION_DISPATCHER_BODY
+#define TARGET_GENERATE_VERSION_DISPATCHER_BODY \
+  ix86_generate_version_dispatcher_body
+
 #undef TARGET_ENUM_VA_LIST_P
 #define TARGET_ENUM_VA_LIST_P ix86_enum_va_list
 
@@ -40789,6 +41779,9 @@ ix86_memmodel_check (unsigned HOST_WIDE_INT val)
 #undef TARGET_OPTION_PRINT
 #define TARGET_OPTION_PRINT ix86_function_specific_print
 
+#undef TARGET_OPTION_FUNCTION_VERSIONS
+#define TARGET_OPTION_FUNCTION_VERSIONS ix86_function_versions
+
 #undef TARGET_CAN_INLINE_P
 #define TARGET_CAN_INLINE_P ix86_can_inline_p
 

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-08-25  0:34                                                                   ` Sriraman Tallam
@ 2012-09-18 16:29                                                                     ` Sriraman Tallam
  2012-10-05 17:07                                                                       ` Xinliang David Li
  2012-10-05 17:44                                                                     ` Jason Merrill
  2012-10-05 18:32                                                                     ` Jason Merrill
  2 siblings, 1 reply; 93+ messages in thread
From: Sriraman Tallam @ 2012-09-18 16:29 UTC (permalink / raw)
  To: Jason Merrill
  Cc: Xinliang David Li, mark, nathan, H.J. Lu, Richard Guenther,
	Jan Hubicka, Uros Bizjak, reply, GCC Patches

Ping.

On Fri, Aug 24, 2012 at 5:34 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> Hi Jason,
>
>    I have created a new patch to use target hooks for all the
> functionality and make the front-end just call the target hooks at the
> appropriate places. This is more like what you suggested in a previous
> mail. In particular, target hooks address the following questions:
>
> * Determine if two function decls with the same signature are versions.
> * Determine the new assembler name of a function version.
> * Generate the dispatcher function for a set of function versions.
> * Compare versions to see if one has a higher priority over the other.
>
> Patch attached and also available for review at:
>
> http://codereview.appspot.com/5752064/
>
> Hope this is more along the lines of what you had in mind, please let
> me know what you think.
>
> Thanks,
> -Sri.
>
>
> On Mon, Jul 30, 2012 at 12:01 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>> On Thu, Jul 19, 2012 at 1:39 PM, Jason Merrill <jason@redhat.com> wrote:
>>>
>>> On 07/10/2012 03:14 PM, Sriraman Tallam wrote:
>>>>
>>>> I am using the questions you asked previously
>>>> to explain how I solved each of them. When working on this patch, these
>>>> are the exact questions I had and tried to address it.
>>>>
>>>> * Does this attribute affect a function signature?
>>>>
>>>> The function signature should be changed when there is more than one
>>>> definition/declaration of foo distinguished by unique target attributes.
>>>
>>> >[...]
>>>
>>> I agree.  I was trying to suggest that these questions are what the front end needs to care about, not about versioning specifically.  If these questions are turned into target hooks, all of the logic specific to versioning can be contained in the target.
>>>
>>> My only question intended to be answered by humans is, do people think moving the versioning logic behind more generic target hooks is worthwhile?
>>
>> I have  some comments related
>>
>> For the example below,
>>
>> // Default version.
>> int foo ()
>> {
>>   .....
>> }
>>
>> // Version  XXX feature supported by Target ABC.
>> int foo __attribute__ ((target ("XXX")))
>> {
>>    ....
>> }
>>
>> How should the second version of foo be treated for targets where
>> feature XXX is not supported? Right now, I am working on having my
>> patch completely ignore such function versions when compiled for
>> targets that do not understand the attribute. I could move this check
>> into a generic target hook so that a function definition that does not
>> make sense for the current target is ignored.
>>
>> Also, currently the patch uses target hooks to do the following:
>>
>> - Find if a particular version can be called directly, rather than go
>> through the dispatcher.
>> - Determine what the dispatcher body should be.
>> - Determining the order in which function versions must be dispatched.
>>
>> I do not have a strong opinion on whether the entire logic should be
>> based on target hooks.
>>
>> Thanks,
>> -Sri.
>>
>>>
>>>
>>>
>>> Jason

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-09-18 16:29                                                                     ` Sriraman Tallam
@ 2012-10-05 17:07                                                                       ` Xinliang David Li
  0 siblings, 0 replies; 93+ messages in thread
From: Xinliang David Li @ 2012-10-05 17:07 UTC (permalink / raw)
  To: Sriraman Tallam, Jason Merrill
  Cc: mark, nathan, H.J. Lu, Richard Guenther, Jan Hubicka,
	Uros Bizjak, reply, GCC Patches

Hi Jason, Sri has addressed the comments you had on the FE part. Can you
take a look and see if it is OK?   Stage-1 is going to close soon, and we
hope to get this major feature into 4.8.

thanks,

David



On Tue, Sep 18, 2012 at 9:29 AM, Sriraman Tallam <tmsriram@google.com> wrote:
> Ping.
>
> On Fri, Aug 24, 2012 at 5:34 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>> Hi Jason,
>>
>>    I have created a new patch to use target hooks for all the
>> functionality and make the front-end just call the target hooks at the
>> appropriate places. This is more like what you suggested in a previous
>> mail. In particular, target hooks address the following questions:
>>
>> * Determine if two function decls with the same signature are versions.
>> * Determine the new assembler name of a function version.
>> * Generate the dispatcher function for a set of function versions.
>> * Compare versions to see if one has a higher priority over the other.
>>
>> Patch attached and also available for review at:
>>
>> http://codereview.appspot.com/5752064/
>>
>> Hope this is more along the lines of what you had in mind, please let
>> me know what you think.
>>
>> Thanks,
>> -Sri.
>>
>>
>> On Mon, Jul 30, 2012 at 12:01 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>> On Thu, Jul 19, 2012 at 1:39 PM, Jason Merrill <jason@redhat.com> wrote:
>>>>
>>>> On 07/10/2012 03:14 PM, Sriraman Tallam wrote:
>>>>>
>>>>> I am using the questions you asked previously
>>>>> to explain how I solved each of them. When working on this patch, these
>>>>> are the exact questions I had and tried to address it.
>>>>>
>>>>> * Does this attribute affect a function signature?
>>>>>
>>>>> The function signature should be changed when there is more than one
>>>>> definition/declaration of foo distinguished by unique target attributes.
>>>>
>>>> >[...]
>>>>
>>>> I agree.  I was trying to suggest that these questions are what the front end needs to care about, not about versioning specifically.  If these questions are turned into target hooks, all of the logic specific to versioning can be contained in the target.
>>>>
>>>> My only question intended to be answered by humans is, do people think moving the versioning logic behind more generic target hooks is worthwhile?
>>>
>>> I have  some comments related
>>>
>>> For the example below,
>>>
>>> // Default version.
>>> int foo ()
>>> {
>>>   .....
>>> }
>>>
>>> // Version  XXX feature supported by Target ABC.
>>> int foo __attribute__ ((target ("XXX")))
>>> {
>>>    ....
>>> }
>>>
>>> How should the second version of foo be treated for targets where
>>> feature XXX is not supported? Right now, I am working on having my
>>> patch completely ignore such function versions when compiled for
>>> targets that do not understand the attribute. I could move this check
>>> into a generic target hook so that a function definition that does not
>>> make sense for the current target is ignored.
>>>
>>> Also, currently the patch uses target hooks to do the following:
>>>
>>> - Find if a particular version can be called directly, rather than go
>>> through the dispatcher.
>>> - Determine what the dispatcher body should be.
>>> - Determining the order in which function versions must be dispatched.
>>>
>>> I do not have a strong opinion on whether the entire logic should be
>>> based on target hooks.
>>>
>>> Thanks,
>>> -Sri.
>>>
>>>>
>>>>
>>>>
>>>> Jason

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-08-25  0:34                                                                   ` Sriraman Tallam
  2012-09-18 16:29                                                                     ` Sriraman Tallam
@ 2012-10-05 17:44                                                                     ` Jason Merrill
  2012-10-05 18:14                                                                       ` Jason Merrill
  2012-10-05 21:58                                                                       ` Sriraman Tallam
  2012-10-05 18:32                                                                     ` Jason Merrill
  2 siblings, 2 replies; 93+ messages in thread
From: Jason Merrill @ 2012-10-05 17:44 UTC (permalink / raw)
  To: Sriraman Tallam
  Cc: Xinliang David Li, mark, nathan, H.J. Lu, Richard Guenther,
	Jan Hubicka, Uros Bizjak, reply, gcc-patches

On 08/24/2012 08:34 PM, Sriraman Tallam wrote:
> +  /* If the address of a multiversioned function dispatcher is taken,
> +     generate the body to dispatch the right function at run-time.  This
> +     is needed as the address can be used to do an indirect call.  */

It seems to me that you don't need a dispatcher for doing indirect 
calls; you could just take the address of the version you would choose 
if you were doing a direct call.

The only reason for a dispatcher I can think of is if you want the 
address of a function to compare equal across translation units compiled 
with different target flags.  I'm not sure that's necessary; am I 
missing something?

Continuing to look at the patch.

Jason

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-10-05 17:44                                                                     ` Jason Merrill
@ 2012-10-05 18:14                                                                       ` Jason Merrill
  2012-10-05 21:58                                                                       ` Sriraman Tallam
  1 sibling, 0 replies; 93+ messages in thread
From: Jason Merrill @ 2012-10-05 18:14 UTC (permalink / raw)
  To: Sriraman Tallam
  Cc: Xinliang David Li, mark, nathan, H.J. Lu, Richard Guenther,
	Jan Hubicka, Uros Bizjak, reply, gcc-patches

On 10/05/2012 01:43 PM, Jason Merrill wrote:
> On 08/24/2012 08:34 PM, Sriraman Tallam wrote:
>> +  /* If the address of a multiversioned function dispatcher is taken,
>> +     generate the body to dispatch the right function at run-time.  This
>> +     is needed as the address can be used to do an indirect call.  */
>
> It seems to me that you don't need a dispatcher for doing indirect
> calls; you could just take the address of the version you would choose
> if you were doing a direct call.

Oh, I see you use the dispatcher for direct calls as well.  Why is that? 
  Why do you do direct calls when the function is inlineable, but not 
otherwise?

Jason


* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-08-25  0:34                                                                   ` Sriraman Tallam
  2012-09-18 16:29                                                                     ` Sriraman Tallam
  2012-10-05 17:44                                                                     ` Jason Merrill
@ 2012-10-05 18:32                                                                     ` Jason Merrill
  2012-10-11  0:13                                                                       ` Sriraman Tallam
  2 siblings, 1 reply; 93+ messages in thread
From: Jason Merrill @ 2012-10-05 18:32 UTC (permalink / raw)
  To: Sriraman Tallam
  Cc: Xinliang David Li, mark, nathan, H.J. Lu, Richard Guenther,
	Jan Hubicka, Uros Bizjak, reply, gcc-patches

On 08/24/2012 08:34 PM, Sriraman Tallam wrote:
> +	  /* For function versions, their parms and types match
> +	     but they are not duplicates.  Record function versions
> +	     as and when they are found.  */
> +	  if (TREE_CODE (fn) == FUNCTION_DECL
> +	      && TREE_CODE (method) == FUNCTION_DECL
> +	      && (DECL_FUNCTION_SPECIFIC_TARGET (fn)
> +		  || DECL_FUNCTION_SPECIFIC_TARGET (method))
> +	      && targetm.target_option.function_versions (fn, method))
> + 	    {
> +	      targetm.set_version_assembler_name (fn);
> +	      targetm.set_version_assembler_name (method);
> +	      continue;
> +	    }

This seems like an odd place to be setting assembler names; better to 
just have the existing mangle_decl_assembler_name hook add the 
appropriate suffix when it's called normally.

> +	 Also, mark this function as needed if it is marked inline but
> +	 is a multi-versioned function.  */

Why?  If it's used, it should be marked needed through the normal process.

> +	    error_at (location_of (DECL_NAME (OVL_CURRENT (fn))),
> +		      "Call to multiversioned function %<%D(%A)%> with"
> +		      " no default version", DECL_NAME (OVL_CURRENT (fn)),
> +		      build_tree_list_vec (*args));

location_of just returns input_location if you ask it for the location 
of an identifier, so you might as well use error with no explicit 
location.  And why not print candidates->fn instead of pasting the 
name/args?  Also, lowercase "call".

> +    {
> +      tree dispatcher_decl = NULL;
> +      struct cgraph_node *node = cgraph_get_node (fn);
> +      if (node != NULL)
> +        dispatcher_decl = cgraph_get_node (fn)->version_dispatcher_decl;
> +      if (dispatcher_decl == NULL)
> +	{
> +	  error_at (input_location, "Call to multiversioned function"
> +		    " without a default is not allowed");
> +	  return NULL;
> +	}
> +      retrofit_lang_decl (dispatcher_decl);
> +      gcc_assert (dispatcher_decl != NULL);
> +      fn = dispatcher_decl;
> +    }

Let's move this logic into a separate function that returns the 
dispatcher function.

> +      /* Both functions must be marked versioned.  */
> +      gcc_assert (DECL_FUNCTION_VERSIONED (cand1->fn)
> +		  && DECL_FUNCTION_VERSIONED (cand2->fn));

Why can't you compare a versioned function and a non-versioned one?

The code in joust should go further down in the function, before the 
handling of two declarations of the same function.

> +  /* For multiversioned functions, aggregate all the versions here for
> +     generating the dispatcher body later if necessary.  */
> +
> +  if (TREE_CODE (candidates->fn) == FUNCTION_DECL
> +      && DECL_FUNCTION_VERSIONED (candidates->fn))
> +    {
> +      VEC (tree, heap) *fn_ver_vec = NULL;
> +      struct z_candidate *ver = candidates;
> +      fn_ver_vec = VEC_alloc (tree, heap, 2);
> +      for (;ver; ver = ver->next)
> +        VEC_safe_push (tree, heap, fn_ver_vec, ver->fn);
> +      gcc_assert (targetm.get_function_versions_dispatcher);
> +      targetm.get_function_versions_dispatcher (fn_ver_vec);
> +      VEC_free (tree, heap, fn_ver_vec);
> +    }

This seems to assume that all the functions in the list of candidates 
are versioned, but there might be unrelated functions from different 
namespaces.  Also, doing this every time someone calls a versioned 
function seems like the wrong place; I would think it would be better to 
build up a list of versions as you see declarations, and then use that 
list to define the dispatcher at EOF if it's needed.

> +      if (TREE_CODE (decl) == FUNCTION_DECL
> +          && DECL_FUNCTION_VERSIONED (decl)
> +	  && DECL_ASSEMBLER_NAME_SET_P (decl))
> +	write_source_name (DECL_ASSEMBLER_NAME (decl));
> +      else
> +	write_source_name (DECL_NAME (decl));

Again, I think it's better to handle the suffix via 
mangle_decl_assembler_name.

Jason

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-10-05 17:44                                                                     ` Jason Merrill
  2012-10-05 18:14                                                                       ` Jason Merrill
@ 2012-10-05 21:58                                                                       ` Sriraman Tallam
  2012-10-05 22:50                                                                         ` Jason Merrill
  1 sibling, 1 reply; 93+ messages in thread
From: Sriraman Tallam @ 2012-10-05 21:58 UTC (permalink / raw)
  To: Jason Merrill
  Cc: Xinliang David Li, mark, nathan, H.J. Lu, Richard Guenther,
	Jan Hubicka, Uros Bizjak, reply, GCC Patches

On Fri, Oct 5, 2012 at 10:43 AM, Jason Merrill <jason@redhat.com> wrote:
> On 08/24/2012 08:34 PM, Sriraman Tallam wrote:
>>
>> +  /* If the address of a multiversioned function dispatcher is taken,
>> +     generate the body to dispatch the right function at run-time.  This
>>
>> +     is needed as the address can be used to do an indirect call.  */
>
>
> It seems to me that you don't need a dispatcher for doing indirect calls;
> you could just take the address of the version you would choose if you were
> doing a direct call.
>
> The only reason for a dispatcher I can think of is if you want the address
> of a function to compare equal across translation units compiled with
> different target flags.  I'm not sure that's necessary; am I missing
> something?

In general, the dispatcher is always necessary since it is not known
what function version will be called at compile time. This is true
whether it is a direct or an indirect call.

Example:
int __attribute__ ((target ("sse3")))
foo ()
{
  return 0;
}

int __attribute__ ((target ("sse4")))
foo ()
{
  return 0;
}

int main ()
{
   // The version of foo to be called is not known at compile time.
   // Needs dispatcher.
   foo ();
   int (*p)() = &foo; // What should be the value of p?
   return (*p)(); // This needs a dispatcher too.
}

Now, since a dispatcher is necessary when the address of the function
is taken, I thought I could as well make it the address of the
function.

Thanks,
-Sri.

>
> Continuing to look at the patch.
>
> Jason
>

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-10-05 21:58                                                                       ` Sriraman Tallam
@ 2012-10-05 22:50                                                                         ` Jason Merrill
  2012-10-05 23:45                                                                           ` Sriraman Tallam
  0 siblings, 1 reply; 93+ messages in thread
From: Jason Merrill @ 2012-10-05 22:50 UTC (permalink / raw)
  To: Sriraman Tallam
  Cc: Xinliang David Li, mark, nathan, H.J. Lu, Richard Guenther,
	Jan Hubicka, Uros Bizjak, reply, GCC Patches

On 10/05/2012 05:57 PM, Sriraman Tallam wrote:
> In general, the dispatcher is always necessary since it is not known
> what function version will be called at compile time. This is true
> whether it is a direct or an indirect call.

So you want to compile with lowest common denominator flags and then 
choose a faster version at runtime based on the running configuration? 
I see.

Jason

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-10-05 22:50                                                                         ` Jason Merrill
@ 2012-10-05 23:45                                                                           ` Sriraman Tallam
  0 siblings, 0 replies; 93+ messages in thread
From: Sriraman Tallam @ 2012-10-05 23:45 UTC (permalink / raw)
  To: Jason Merrill
  Cc: Xinliang David Li, mark, nathan, H.J. Lu, Richard Guenther,
	Jan Hubicka, Uros Bizjak, reply, GCC Patches

On Fri, Oct 5, 2012 at 3:50 PM, Jason Merrill <jason@redhat.com> wrote:
> On 10/05/2012 05:57 PM, Sriraman Tallam wrote:
>>
>> In general, the dispatcher is always necessary since it is not known
>> what function version will be called at compile time. This is true
>> whether it is a direct or an indirect call.
>
>
> So you want to compile with lowest common denominator flags and then choose
> a faster version at runtime based on the running configuration? I see.
>

Yes.

Thanks,
-Sri.

> Jason
>

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-10-05 18:32                                                                     ` Jason Merrill
@ 2012-10-11  0:13                                                                       ` Sriraman Tallam
  2012-10-12 22:41                                                                         ` Sriraman Tallam
  0 siblings, 1 reply; 93+ messages in thread
From: Sriraman Tallam @ 2012-10-11  0:13 UTC (permalink / raw)
  To: Jason Merrill
  Cc: Xinliang David Li, mark, nathan, H.J. Lu, Richard Guenther,
	Jan Hubicka, Uros Bizjak, reply, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 5222 bytes --]

Hi Jason,

   I have addressed all your comments and attached the new patch.

On Fri, Oct 5, 2012 at 11:32 AM, Jason Merrill <jason@redhat.com> wrote:
> On 08/24/2012 08:34 PM, Sriraman Tallam wrote:
>>
>> +         /* For function versions, their parms and types match
>> +            but they are not duplicates.  Record function versions
>> +            as and when they are found.  */
>> +         if (TREE_CODE (fn) == FUNCTION_DECL
>> +             && TREE_CODE (method) == FUNCTION_DECL
>> +             && (DECL_FUNCTION_SPECIFIC_TARGET (fn)
>> +                 || DECL_FUNCTION_SPECIFIC_TARGET (method))
>> +             && targetm.target_option.function_versions (fn, method))
>> +           {
>> +             targetm.set_version_assembler_name (fn);
>> +             targetm.set_version_assembler_name (method);
>> +             continue;
>> +           }
>
>
> This seems like an odd place to be setting assembler names; better to just
> have the existing mangle_decl_assembler_name hook add the appropriate suffix
> when it's called normally.

I moved this to mangle_decl_assembler_name. Still,  functions may go
from not being a version to then becoming versions after a new
definition is detected. In such cases, I explicitly call mangle_decl
to modify the assembler name.

>
>
>> +        Also, mark this function as needed if it is marked inline but
>> +        is a multi-versioned function.  */
>
>
> Why?  If it's used, it should be marked needed through the normal process.

How do I do this? If a versioned function is marked inline, I need to
keep it but it has no explicit callers. How do I mark that it is
needed?

>
>> +           error_at (location_of (DECL_NAME (OVL_CURRENT (fn))),
>> +                     "Call to multiversioned function %<%D(%A)%> with"
>> +                     " no default version", DECL_NAME (OVL_CURRENT (fn)),
>> +                     build_tree_list_vec (*args));
>
>
> location_of just returns input_location if you ask it for the location of an
> identifier, so you might as well use error with no explicit location.  And
> why not print candidates->fn instead of pasting the name/args?  Also,
> lowercase "call".

I removed this since the check already happens elsewhere.

>
>> +    {
>> +      tree dispatcher_decl = NULL;
>> +      struct cgraph_node *node = cgraph_get_node (fn);
>> +      if (node != NULL)
>> +        dispatcher_decl = cgraph_get_node (fn)->version_dispatcher_decl;
>> +      if (dispatcher_decl == NULL)
>> +       {
>> +         error_at (input_location, "Call to multiversioned function"
>> +                   " without a default is not allowed");
>> +         return NULL;
>> +       }
>> +      retrofit_lang_decl (dispatcher_decl);
>> +      gcc_assert (dispatcher_decl != NULL);
>> +      fn = dispatcher_decl;
>> +    }
>
>
> Let's move this logic into a separate function that returns the dispatcher
> function.

Done.

>
>> +      /* Both functions must be marked versioned.  */
>> +      gcc_assert (DECL_FUNCTION_VERSIONED (cand1->fn)
>> +                 && DECL_FUNCTION_VERSIONED (cand2->fn));
>
>
> Why can't you compare a versioned function and a non-versioned one?

Right, there was a big bug in my code. I have changed this now. This
should address your question.

>
> The code in joust should go further down in the function, before the
> handling of two declarations of the same function.

Done.

>
>> +  /* For multiversioned functions, aggregate all the versions here for
>> +     generating the dispatcher body later if necessary.  */
>> +
>> +  if (TREE_CODE (candidates->fn) == FUNCTION_DECL
>> +      && DECL_FUNCTION_VERSIONED (candidates->fn))
>> +    {
>>
>> +      VEC (tree, heap) *fn_ver_vec = NULL;
>> +      struct z_candidate *ver = candidates;
>>
>> +      fn_ver_vec = VEC_alloc (tree, heap, 2);
>> +      for (;ver; ver = ver->next)
>> +        VEC_safe_push (tree, heap, fn_ver_vec, ver->fn);
>> +      gcc_assert (targetm.get_function_versions_dispatcher);
>> +      targetm.get_function_versions_dispatcher (fn_ver_vec);
>> +      VEC_free (tree, heap, fn_ver_vec);
>> +    }
>
>
> This seems to assume that all the functions in the list of candidates are
> versioned, but there might be unrelated functions from different namespaces.
> Also, doing this every time someone calls a versioned function seems like
> the wrong place; I would think it would be better to build up a list of
> versions as you see declarations, and then use that list to define the
> dispatcher at EOF if it's needed.


This was the bug I was referring to earlier. I have moved this to a
separate function. I thought it is better to do this on demand. I have
changed the code so that the aggregation and dispatcher generation
happens exactly once.


>
>> +      if (TREE_CODE (decl) == FUNCTION_DECL
>> +          && DECL_FUNCTION_VERSIONED (decl)
>> +         && DECL_ASSEMBLER_NAME_SET_P (decl))
>> +       write_source_name (DECL_ASSEMBLER_NAME (decl));
>> +      else
>> +       write_source_name (DECL_NAME (decl));
>
>
> Again, I think it's better to handle the suffix via
> mangle_decl_assembler_name.

Removed.


Thanks for the comments. Please let me know what you think about the new patch.

-Sri.

>
> Jason
>

[-- Attachment #2: mv_fe_patch_101012.txt --]
[-- Type: text/plain, Size: 75986 bytes --]

Overview of the patch which adds support to specify function versions.  This is
only enabled for target i386.

Example:

int foo ();  /* Default version */
int foo () __attribute__ ((target("avx,popcnt")));/*Specialized for avx and popcnt */
int foo () __attribute__ ((target("arch=core2,ssse3")));/*Specialized for core2 and ssse3*/

int main ()
{
 int (*p)() = &foo;
 return foo () + (*p)();
}

int foo ()
{
 return 0;
}

int __attribute__ ((target("avx,popcnt")))
foo ()
{
 return 0;
}

int __attribute__ ((target("arch=core2,ssse3")))
foo ()
{
 return 0;
}

The above example has foo defined 3 times, but all 3 definitions of foo are
different versions of the same function. The call to foo in main, directly and
via a pointer, are calls to the multi-versioned function foo which is dispatched
to the right foo at run-time.

Front-end changes:

The front-end changes are calls at appropriate places to target hooks that
determine the following:

* Determine if two function decls with the same signature are versions.
* Determine the assembler name of a function version.
* Generate the dispatcher function for a set of function versions.
* Compare versions to see if one has a higher priority over the other.

All the implementation happens in the target-specific config/i386/i386.c.

What does the patch do?

* Tracking decls that correspond to function versions of function
name, say "foo":

When the front-end sees more than one decl for "foo", it calls a target hook to
determine if they are versions. To prevent duplicate definition errors with other
versions of "foo", the "decls_match" function in cp/decl.c is made to return false
when 2 decls are deemed versions by the target. This causes all function
versions of "foo" to be added to the overload list of "foo".

* Change the assembler names of the function versions.

For i386, the target changes the assembler names of the function versions by
suffixing the sorted list of args to "target" to the function name of "foo". For
example, the assembler name of "void foo () __attribute__ ((target ("sse4")))" will
become _Z3foov.sse4.  The target hook mangle_decl_assembler_name is used for this.
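
As a rough illustration of the naming scheme described above, here is a
standalone C sketch.  It is not the code in config/i386/i386.c: the helper
name versioned_name, the fixed buffer sizes, and the use of '_' to join
multiple args are all assumptions made for the example.  Characters such as
'=' are copied unchanged, which a real target would have to deal with.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C strings.  */
static int
compare_strings (const void *a, const void *b)
{
  return strcmp (*(const char *const *) a, *(const char *const *) b);
}

/* Hypothetical helper, not part of the patch: write BASE, then '.', then
   the sorted comma-separated entries of TARGET_ARGS joined by '_' into OUT.
   Returns 0 on success and -1 if a buffer or the entry table is too small.  */
static int
versioned_name (const char *base, const char *target_args,
                char *out, size_t out_size)
{
  char copy[256];
  char *args[32];
  size_t n = 0;
  size_t used;
  int len;

  assert (base != NULL && target_args != NULL && out != NULL);
  if (strlen (target_args) >= sizeof copy)
    return -1;
  strcpy (copy, target_args);

  /* Split the attribute string on commas.  */
  for (char *tok = strtok (copy, ","); tok; tok = strtok (NULL, ","))
    {
      if (n == sizeof args / sizeof args[0])
        return -1;
      args[n++] = tok;
    }

  /* Sorting makes "popcnt,avx" and "avx,popcnt" produce the same name.  */
  qsort (args, n, sizeof args[0], compare_strings);

  len = snprintf (out, out_size, "%s", base);
  if (len < 0 || (size_t) len >= out_size)
    return -1;
  used = (size_t) len;

  for (size_t i = 0; i < n; i++)
    {
      len = snprintf (out + used, out_size - used, "%c%s",
                      i == 0 ? '.' : '_', args[i]);
      if (len < 0 || (size_t) len >= out_size - used)
        return -1;
      used += (size_t) len;
    }
  return 0;
}
```

Calling versioned_name ("_Z3foov", "sse4", buf, sizeof buf) fills buf with
the name used in the example above.  Sorting before joining is what keeps
the name independent of the order in which the user wrote the args.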

* Overload resolution:

 Function "build_over_call" in cp/call.c sees a call to function
"foo", which is multi-versioned. The overload resolution happens in
function "joust" in "cp/call.c". Here, the call to "foo" has all
possible versions of "foo" as candidates. All the candidates of "foo" are
stored in the cgraph data structures. Each version of foo is chained in a 
doubly-linked list with the default function as the first element.  This allows
any pass to access all the semantically identical versions. A call to a
multi-versioned function will be replaced by a call to a dispatcher function,
determined by a target hook, to execute the right function version at run-time.
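
The version chain can be pictured with a small standalone C sketch.  The
struct below is only a stand-in for cgraph_node: it borrows the member names
prev_function_version and next_function_version from the ChangeLog, while
the struct name and the helpers chain_append, chain_default and chain_length
are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the versioning members of cgraph_node.  */
struct version_node
{
  const char *name;
  struct version_node *prev_function_version;
  struct version_node *next_function_version;
};

/* Append NODE at the tail of the chain that starts at HEAD.  HEAD is
   expected to be the default version.  */
static void
chain_append (struct version_node *head, struct version_node *node)
{
  struct version_node *tail = head;

  assert (head != NULL && node != NULL);
  while (tail->next_function_version)
    tail = tail->next_function_version;
  tail->next_function_version = node;
  node->prev_function_version = tail;
  node->next_function_version = NULL;
}

/* Starting from any member, walk back to the first element of the chain,
   which is the default version.  */
static struct version_node *
chain_default (struct version_node *node)
{
  assert (node != NULL);
  while (node->prev_function_version)
    node = node->prev_function_version;
  return node;
}

/* Count the versions reachable from HEAD.  */
static size_t
chain_length (const struct version_node *head)
{
  size_t n = 0;

  for (; head; head = head->next_function_version)
    n++;
  return n;
}
```

Because the list is doubly linked, a pass that holds any one version can
reach the default by following prev_function_version and then visit every
other version by following next_function_version.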

Optimization to directly call a version when possible:
Also, in joust, where overload resolution happens, a multiversioned function
resolution is made to return the most specialized version.  This is the version
that will be checked for dispatching first and is determined by the target.
Now, if the caller can inline this function version then a direct call is made
to this function version rather than go through the dispatcher. When a direct
call cannot be made, a call to the dispatcher function is created.

* Creating the dispatcher body.

The dispatcher body, called the resolver, is made only when there is a call to a
multiversioned function dispatcher or the address of a function is taken. This
is generated during build_cgraph_edges for a call or cgraph_mark_address_taken
for a pointer reference. This is done by another target hook.

* Dispatch ordering.

The order in which the function versions are checked during dispatch is based
on a priority value assigned to the ISA that each version targets. More
specialized versions are checked for dispatching first.  This is to mitigate the
ambiguity that can arise when more than one function version is valid for
execution on a particular platform.  This is not a perfect solution and in the
future, the user should be allowed to assign a dispatching priority value to
each version.
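
The ordering rule can be sketched without any compiler support. Everything in
this example is hypothetical: Version, select_version and the numeric priority
values are assumptions for illustration, and the set of supported features is
passed in explicitly instead of being queried from the CPU.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

using VersionFn = int (*)();

// Hypothetical description of one function version.
struct Version {
  std::string feature;  // e.g. "avx2"
  int priority;         // assumed: a larger value means more specialized
  VersionFn fn;         // the body of this version
};

// Sample versions; the return values follow the style of the mv2.C test.
int foo_default() { return 0; }
int foo_sse() { return 9; }
int foo_avx() { return 2; }
int foo_avx2() { return 1; }

// Checks the versions from highest to lowest priority and returns the
// first one whose feature is available; falls back to DEFAULT_FN.
VersionFn select_version(std::vector<Version> versions,
                         const std::set<std::string> &supported,
                         VersionFn default_fn)
{
  std::stable_sort(versions.begin(), versions.end(),
                   [](const Version &a, const Version &b) {
                     return a.priority > b.priority;
                   });
  for (const Version &v : versions)
    if (supported.count(v.feature))
      return v.fn;
  return default_fn;
}
```

On a platform that supports both "sse" and "avx", both versions are valid, and
the priority order is what makes the choice deterministic: the "avx" version is
selected because it is checked first.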

	* doc/tm.texi.in (TARGET_OPTION_FUNCTION_VERSIONS): New hook description.
	* (TARGET_COMPARE_VERSION_PRIORITY): New hook description.
	* (TARGET_GET_FUNCTION_VERSIONS_DISPATCHER): New hook description.
	* (TARGET_GENERATE_VERSION_DISPATCHER_BODY): New hook description.
	* doc/tm.texi: Regenerate.
	* cgraphbuild.c (build_cgraph_edges): Generate body of multiversion
	function dispatcher.
	* c-family/c-common.c (handle_target_attribute): Warn for invalid attributes.
	* target.def (compare_version_priority): New target hook.
	* (generate_version_dispatcher_body): New target hook.
	* (get_function_versions_dispatcher): New target hook.
	* (function_versions): New target hook.
	* cgraph.c (cgraph_mark_address_taken_node): Generate body of multiversion
	function dispatcher.
	* cgraph.h (cgraph_node): New members version_dispatcher_decl,
	prev_function_version, next_function_version, dispatcher_function.
	* tree.h (DECL_FUNCTION_VERSIONED): New macro.
	(tree_function_decl): New bit-field versioned_function.
	* testsuite/g++.dg/mv1.C: New test.
	* testsuite/g++.dg/mv2.C: New test.
	* testsuite/g++.dg/mv3.C: New test.
	* testsuite/g++.dg/mv4.C: New test.
	* cp/class.c (add_method): Change assembler names of function versions.
	(resolve_address_of_overloaded_function): Save all function
	version candidates. Create dispatcher decl and return address of
	dispatcher instead.
	* cp/decl.c (decls_match): Make decls unmatched for versioned
	functions.
	(duplicate_decls): Remove ambiguity for versioned functions. 
	(cxx_comdat_group): Make comdat group of versioned functions be the
	same.
	* cp/error.c (dump_function_name): Dump assembler name for function
	versions.
	* cp/semantics.c (expand_or_defer_fn_1): Mark as needed versioned
	functions that are also marked inline.
	* cp/decl2.c (check_classfn): Check attributes of versioned functions
	for match.
	* cp/call.c (build_new_function_call): Check if versioned functions
	have a default version.
	(build_over_call): Make calls to multiversioned functions
	to call the dispatcher.
	(joust): For calls to multi-versioned functions, make the most
	specialized function version win.
	(tourney): Generate dispatcher decl for function versions.
	(get_function_version_dispatcher): New function.
	(generate_function_versions_dispatcher): New function.
	* Makefile.in: Add multiversion.o
	* config/i386/i386.c (add_condition_to_bb): New function.
	(get_builtin_code_for_version): New function.
	(ix86_compare_version_priority): New function.
	(feature_compare): New function.
	(dispatch_function_versions): New function.
	* (ix86_function_versions): New function.
	* (attr_strcmp): New function.
	* (sorted_attr_string): New function.
	* (ix86_mangle_function_version_assembler_name): New function.
	* (ix86_mangle_decl_assembler_name): New function.
	* (make_name): New function.
	* (make_dispatcher_decl): New function.
	* (is_function_default_version): New function.
	* (ix86_get_function_versions_dispatcher): New function.
	* (make_attribute): New function.
	* (make_resolver_func): New function.
	* (ix86_generate_version_dispatcher_body): New function.
	* (TARGET_COMPARE_VERSION_PRIORITY): New macro.
	* (TARGET_GET_FUNCTION_VERSIONS_DISPATCHER): New macro.
	* (TARGET_GENERATE_VERSION_DISPATCHER_BODY): New macro.
	* (TARGET_OPTION_FUNCTION_VERSIONS): New macro.
	* (TARGET_MANGLE_DECL_ASSEMBLER_NAME): New macro.
	

Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi	(revision 190493)
+++ gcc/doc/tm.texi	(working copy)
@@ -9894,6 +9894,11 @@ changed via the optimize attribute or pragma, see
 @code{TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE}
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_OPTION_FUNCTION_VERSIONS (tree @var{decl1}, tree @var{decl2})
+This target hook returns @code{true} if @var{decl1} and @var{decl2} are
+versions of the same function.
+@end deftypefn
+
 @deftypefn {Target Hook} bool TARGET_CAN_INLINE_P (tree @var{caller}, tree @var{callee})
 This target hook returns @code{false} if the @var{caller} function
 cannot inline @var{callee}, based on target specific information.  By
@@ -10925,6 +10930,27 @@ The result is another tree containing a simplified
 call's result.  If @var{ignore} is true the value will be ignored.
 @end deftypefn
 
+@deftypefn {Target Hook} int TARGET_COMPARE_VERSION_PRIORITY (tree @var{decl1}, tree @var{decl2})
+This hook is used to compare the target attributes in two functions to
+figure out which function's features get higher priority.  This is used
+during multi-versioning to figure out the order of dispatching. @var{decl1}
+and @var{decl2} are the two function decls that will be compared.
+@end deftypefn
+
+@deftypefn {Target Hook} tree TARGET_GET_FUNCTION_VERSIONS_DISPATCHER (void *@var{arglist})
+This hook is used to get the dispatcher function for a set of function
+versions.  The dispatcher function is called to invoke the right function
+version at run-time. @var{arglist} is the vector of function versions
+that should be considered for dispatch.
+@end deftypefn
+
+@deftypefn {Target Hook} tree TARGET_GENERATE_VERSION_DISPATCHER_BODY (void *@var{arg})
+This hook is used to generate the dispatcher logic to invoke the right
+function version at runtime for a given set of function versions.
+@var{arg} points to the callgraph node of the dispatcher function whose
+body must be generated.
+@end deftypefn
+
 @deftypefn {Target Hook} {const char *} TARGET_INVALID_WITHIN_DOLOOP (const_rtx @var{insn})
 
 Take an instruction in @var{insn} and return NULL if it is valid within a
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in	(revision 190493)
+++ gcc/doc/tm.texi.in	(working copy)
@@ -9763,6 +9763,11 @@ changed via the optimize attribute or pragma, see
 @code{TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE}
 @end deftypefn
 
+@hook TARGET_OPTION_FUNCTION_VERSIONS
+This target hook returns @code{true} if @var{decl1} and @var{decl2} are
+versions of the same function.
+@end deftypefn
+
 @hook TARGET_CAN_INLINE_P
 This target hook returns @code{false} if the @var{caller} function
 cannot inline @var{callee}, based on target specific information.  By
@@ -10783,6 +10788,27 @@ The result is another tree containing a simplified
 call's result.  If @var{ignore} is true the value will be ignored.
 @end deftypefn
 
+@hook TARGET_COMPARE_VERSION_PRIORITY
+This hook is used to compare the target attributes in two functions to
+figure out which function's features get higher priority.  This is used
+during multi-versioning to figure out the order of dispatching. @var{decl1}
+and @var{decl2} are the two function decls that will be compared.
+@end deftypefn
+
+@hook TARGET_GET_FUNCTION_VERSIONS_DISPATCHER
+This hook is used to get the dispatcher function for a set of function
+versions.  The dispatcher function is called to invoke the right function
+version at run-time. @var{arglist} is the vector of function versions
+that should be considered for dispatch.
+@end deftypefn
+
+@hook TARGET_GENERATE_VERSION_DISPATCHER_BODY
+This hook is used to generate the dispatcher logic to invoke the right
+function version at runtime for a given set of function versions.
+@var{arg} points to the callgraph node of the dispatcher function whose
+body must be generated.
+@end deftypefn
+
 @hook TARGET_INVALID_WITHIN_DOLOOP
 
 Take an instruction in @var{insn} and return NULL if it is valid within a
Index: gcc/cgraphbuild.c
===================================================================
--- gcc/cgraphbuild.c	(revision 190493)
+++ gcc/cgraphbuild.c	(working copy)
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-utils.h"
 #include "except.h"
 #include "ipa-inline.h"
+#include "target.h"
 
 /* Context of record_reference.  */
 struct record_reference_ctx
@@ -288,7 +289,6 @@ mark_store (gimple stmt, tree t, void *data)
      }
   return false;
 }
-
 /* Create cgraph edges for function calls.
    Also look for functions and variables having addresses taken.  */
 
@@ -317,8 +317,22 @@ build_cgraph_edges (void)
 							 bb);
 	      decl = gimple_call_fndecl (stmt);
 	      if (decl)
-		cgraph_create_edge (node, cgraph_get_create_node (decl),
-				    stmt, bb->count, freq);
+		{
+		  struct cgraph_node *callee = cgraph_get_create_node (decl);
+	          /* If a call to a multiversioned function dispatcher is
+		     found, generate the body to dispatch the right function
+		     at run-time.  */
+		  if (callee->dispatcher_function)
+		    {
+		      tree resolver_decl;
+		      gcc_assert (callee->next_function_version);
+		      gcc_assert (targetm.generate_version_dispatcher_body);
+		      resolver_decl
+			 = targetm.generate_version_dispatcher_body (callee);
+		      gcc_assert (resolver_decl != NULL_TREE);
+		    }
+		  cgraph_create_edge (node, callee, stmt, bb->count, freq);
+	        }
 	      else
 		cgraph_create_indirect_edge (node, stmt,
 					     gimple_call_flags (stmt),
Index: gcc/c-family/c-common.c
===================================================================
--- gcc/c-family/c-common.c	(revision 190493)
+++ gcc/c-family/c-common.c	(working copy)
@@ -8502,9 +8502,22 @@ handle_target_attribute (tree *node, tree name, tr
       warning (OPT_Wattributes, "%qE attribute ignored", name);
       *no_add_attrs = true;
     }
-  else if (! targetm.target_option.valid_attribute_p (*node, name, args,
-						      flags))
-    *no_add_attrs = true;
+  else
+    {
+      /* When a target attribute is invalid, it may also be because the
+	 target for the compilation unit and the attribute match.  For
+         instance, target attribute "xxx" is invalid when -mxxx is used.
+         When used with multiversioning, removing the attribute will lead
+         to duplicate definitions if a default version is provided.
+	 So, generate a warning here and remove the attribute.  */
+      if (!targetm.target_option.valid_attribute_p (*node, name, args, flags))
+	{
+	  warning (OPT_Wattributes,
+		   "Invalid target attribute in function %qE, ignored.",
+		   *node);
+	  *no_add_attrs = true;
+	}
+    }
 
   return NULL_TREE;
 }
Index: gcc/target.def
===================================================================
--- gcc/target.def	(revision 190493)
+++ gcc/target.def	(working copy)
@@ -1298,6 +1298,31 @@ DEFHOOK
  tree, (tree fndecl, int n_args, tree *argp, bool ignore),
  hook_tree_tree_int_treep_bool_null)
 
+/* Target hook to compare the priority of target features in function DECL1
+   and DECL2.  It returns positive value if DECL1 is higher priority,
+   negative value if DECL2 is higher priority and 0 if they are the
+   same.  */
+DEFHOOK
+(compare_version_priority,
+ "",
+ int, (tree decl1, tree decl2), NULL)
+
+/* Target hook to generate the dispatcher body for a function version
+   dispatcher ARG, which is a cgraph_node pointer.  */
+
+DEFHOOK
+(generate_version_dispatcher_body,
+ "",
+ tree, (void *arg), NULL) 
+
+/* Target hook to generate a function version dispatcher DECL for the list
+   of function versions in arglist, which is a vector of decls.  */
+
+DEFHOOK
+(get_function_versions_dispatcher,
+ "",
+ tree, (void *arglist), NULL)
+
 /* Returns a code for a target-specific builtin that implements
    reciprocal of the function, or NULL_TREE if not available.  */
 DEFHOOK
@@ -2705,6 +2730,14 @@ DEFHOOK
  void, (void),
  hook_void_void)
 
+/* Returns true if DECL1 and DECL2 are versions of the same function.  */
+
+DEFHOOK
+(function_versions,
+ "",
+ bool, (tree decl1, tree decl2),
+ hook_bool_tree_tree_false)
+
 /* Function to determine if one function can inline another function.  */
 #undef HOOK_PREFIX
 #define HOOK_PREFIX "TARGET_"
Index: gcc/cgraph.c
===================================================================
--- gcc/cgraph.c	(revision 190493)
+++ gcc/cgraph.c	(working copy)
@@ -1281,6 +1281,14 @@ cgraph_mark_address_taken_node (struct cgraph_node
   node->symbol.address_taken = 1;
   node = cgraph_function_or_thunk_node (node, NULL);
   node->symbol.address_taken = 1;
+  /* If the address of a multiversioned function dispatcher is taken,
+     generate the body to dispatch the right function at run-time.  This
+     is needed as the address can be used to do an indirect call.  */
+  if (node->dispatcher_function)
+    {
+      gcc_assert (node->next_function_version);
+      targetm.generate_version_dispatcher_body (node);
+    }
 }
 
 /* Return local info for the compiled function.  */
Index: gcc/cgraph.h
===================================================================
--- gcc/cgraph.h	(revision 190493)
+++ gcc/cgraph.h	(working copy)
@@ -220,6 +220,26 @@ struct GTY(()) cgraph_node {
   struct cgraph_node *prev_sibling_clone;
   struct cgraph_node *clones;
   struct cgraph_node *clone_of;
+
+  /* TODO: Put version_dispatcher_decl, prev_function_version,
+     next_function_version into a struct for readability.
+
+     If this node corresponds to a function version, this points
+     to the dispatcher function decl which is the function that must
+     be called to execute the right function version at run-time.
+
+     If this node is a dispatcher for function versions, this points
+     to resolver function, which holds  the function body for the
+     dispatcher.  */
+  tree version_dispatcher_decl;
+
+  /* Chains all the semantically identical function versions.  The
+     first function in this chain is the default function.  */
+  struct cgraph_node *prev_function_version;
+  /* If this node is a dispatcher for function versions, this also points
+     to the default function version.  */
+  struct cgraph_node *next_function_version;
+
   /* For functions with many calls sites it holds map from call expression
      to the edge to speed up cgraph_edge function.  */
   htab_t GTY((param_is (struct cgraph_edge))) call_site_hash;
@@ -271,6 +291,7 @@ struct GTY(()) cgraph_node {
   /* ?? We should be able to remove this.  We have enough bits in
      cgraph to calculate it.  */
   unsigned tm_clone : 1;
+  unsigned dispatcher_function : 1;
 };
 
 DEF_VEC_P(symtab_node);
@@ -648,6 +669,7 @@ void cgraph_rebuild_references (void);
 int compute_call_stmt_bb_frequency (tree, basic_block bb);
 void record_references_in_initializer (tree, bool);
 
+
 /* In ipa.c  */
 bool symtab_remove_unreachable_nodes (bool, FILE *);
 cgraph_node_set cgraph_node_set_new (void);
Index: gcc/tree.h
===================================================================
--- gcc/tree.h	(revision 190493)
+++ gcc/tree.h	(working copy)
@@ -3436,6 +3436,12 @@ extern VEC(tree, gc) **decl_debug_args_insert (tre
 #define DECL_FUNCTION_SPECIFIC_OPTIMIZATION(NODE) \
    (FUNCTION_DECL_CHECK (NODE)->function_decl.function_specific_optimization)
 
+/* In FUNCTION_DECL, this is set if this function has other versions generated
+   using "target" attributes.  The default version is the one which does not
+   have any "target" attribute set. */
+#define DECL_FUNCTION_VERSIONED(NODE)\
+   (FUNCTION_DECL_CHECK (NODE)->function_decl.versioned_function)
+
 /* FUNCTION_DECL inherits from DECL_NON_COMMON because of the use of the
    arguments/result/saved_tree fields by front ends.   It was either inherit
    FUNCTION_DECL from non_common, or inherit non_common from FUNCTION_DECL,
@@ -3480,8 +3486,8 @@ struct GTY(()) tree_function_decl {
   unsigned looping_const_or_pure_flag : 1;
   unsigned has_debug_args_flag : 1;
   unsigned tm_clone_flag : 1;
-
-  /* 1 bit left */
+  unsigned versioned_function : 1;
+  /* No bits left.  */
 };
 
 /* The source language of the translation-unit.  */
Index: gcc/testsuite/g++.dg/mv2.C
===================================================================
--- gcc/testsuite/g++.dg/mv2.C	(revision 0)
+++ gcc/testsuite/g++.dg/mv2.C	(revision 0)
@@ -0,0 +1,119 @@
+/* Test case to check if Multiversioning chooses the correct
+   dispatching order when versions are for various ISAs.  */
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-require-ifunc "" }  */
+/* { dg-options "-O2 -mno-sse -mno-mmx -mno-popcnt -mno-avx" } */
+
+#include <assert.h>
+
+/* Default version.  */
+int foo ();
+/* The dispatch checks should be in the exact reverse order of the
+   declarations below.  */
+int foo () __attribute__ ((target ("mmx")));
+int foo () __attribute__ ((target ("sse")));
+int foo () __attribute__ ((target ("sse2")));
+int foo () __attribute__ ((target ("sse3")));
+int foo () __attribute__ ((target ("ssse3")));
+int foo () __attribute__ ((target ("sse4.1")));
+int foo () __attribute__ ((target ("sse4.2")));
+int foo () __attribute__ ((target ("popcnt")));
+int foo () __attribute__ ((target ("avx")));
+int foo () __attribute__ ((target ("avx2")));
+
+int main ()
+{
+
+  int val = foo ();
+
+  if (__builtin_cpu_supports ("avx2"))
+    assert (val == 1);
+  else if (__builtin_cpu_supports ("avx"))
+    assert (val == 2);
+  else if (__builtin_cpu_supports ("popcnt"))
+    assert (val == 3);
+  else if (__builtin_cpu_supports ("sse4.2"))
+    assert (val == 4);
+  else if (__builtin_cpu_supports ("sse4.1"))
+    assert (val == 5);
+  else if (__builtin_cpu_supports ("ssse3"))
+    assert (val == 6);
+  else if (__builtin_cpu_supports ("sse3"))
+    assert (val == 7);
+  else if (__builtin_cpu_supports ("sse2"))
+    assert (val == 8);
+  else if (__builtin_cpu_supports ("sse"))
+    assert (val == 9);
+  else if (__builtin_cpu_supports ("mmx"))
+    assert (val == 10);
+  else
+    assert (val == 0);
+
+  return 0;
+}
+
+int
+foo ()
+{
+  return 0;
+}
+
+int __attribute__ ((target("mmx")))
+foo ()
+{
+  return 10;
+}
+
+int __attribute__ ((target("sse")))
+foo ()
+{
+  return 9;
+}
+
+int __attribute__ ((target("sse2")))
+foo ()
+{
+  return 8;
+}
+
+int __attribute__ ((target("sse3")))
+foo ()
+{
+  return 7;
+}
+
+int __attribute__ ((target("ssse3")))
+foo ()
+{
+  return 6;
+}
+
+int __attribute__ ((target("sse4.1")))
+foo ()
+{
+  return 5;
+}
+
+int __attribute__ ((target("sse4.2")))
+foo ()
+{
+  return 4;
+}
+
+int __attribute__ ((target("popcnt")))
+foo ()
+{
+  return 3;
+}
+
+int __attribute__ ((target("avx")))
+foo ()
+{
+  return 2;
+}
+
+int __attribute__ ((target("avx2")))
+foo ()
+{
+  return 1;
+}
Index: gcc/testsuite/g++.dg/mv4.C
===================================================================
--- gcc/testsuite/g++.dg/mv4.C	(revision 0)
+++ gcc/testsuite/g++.dg/mv4.C	(revision 0)
@@ -0,0 +1,23 @@
+/* Test case to check if the compiler generates an error message
+   when the default version of a multiversioned function is absent
+   and its pointer is taken.  */
+
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 -mno-sse -mno-popcnt" } */
+
+int __attribute__ ((target ("sse")))
+foo ()
+{
+  return 1;
+}
+int __attribute__ ((target ("popcnt")))
+foo ()
+{
+  return 0;
+}
+
+int main ()
+{
+  int (*p)() = &foo; /* { dg-error "Pointer to a multiversioned function without a default is not allowed" {} } */
+  return (*p)();
+}
Index: gcc/testsuite/g++.dg/mv1.C
===================================================================
--- gcc/testsuite/g++.dg/mv1.C	(revision 0)
+++ gcc/testsuite/g++.dg/mv1.C	(revision 0)
@@ -0,0 +1,130 @@
+/* Test case to check if Multiversioning works.  */
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-require-ifunc "" }  */
+/* { dg-options "-O2 -fPIC -mno-avx -mno-popcnt" } */
+
+#include <assert.h>
+
+/* Default version.  */
+int foo ();
+/* The other versions of foo.  Mix up the ordering and 
+   check if the dispatching does it in the order of priority. */
+/* Check combination of target attributes.  */
+int foo () __attribute__ ((target("arch=corei7,popcnt")));
+/* The target operands in this declaration and the definition are re-ordered.
+   This should still work.  */
+int foo () __attribute__ ((target("ssse3,avx2")));
+
+/* Check for all target attributes for which dispatchers are available.  */
+/* Check arch= */
+int foo () __attribute__((target("arch=core2")));
+int foo () __attribute__((target("arch=corei7")));
+int foo () __attribute__((target("arch=atom")));
+/* Check ISAs  */
+int foo () __attribute__((target("avx")));
+int foo () __attribute__ ((target("arch=core2,sse4.2")));
+/* Check more arch=.  */
+int foo () __attribute__((target("arch=amdfam10")));
+int foo () __attribute__((target("arch=bdver1")));
+int foo () __attribute__((target("arch=bdver2")));
+
+int (*p)() = &foo;
+int main ()
+{
+  int val = foo ();
+  assert (val ==  (*p)());
+
+  /* Check in the exact same order in which the dispatching
+     is expected to happen.  */
+  if (__builtin_cpu_is ("bdver1"))
+    assert (val == 1);
+  else if (__builtin_cpu_is ("bdver2"))
+    assert (val == 2);
+  else if (__builtin_cpu_supports ("avx2")
+	   && __builtin_cpu_supports ("ssse3"))
+    assert (val == 3);
+  else if (__builtin_cpu_supports ("avx"))
+    assert (val == 4);
+  else if (__builtin_cpu_is ("corei7")
+	   && __builtin_cpu_supports ("popcnt"))
+    assert (val == 5);
+  else if (__builtin_cpu_is ("corei7"))
+    assert (val == 6);
+  else if (__builtin_cpu_is ("amdfam10h"))
+    assert (val == 7);
+  else if (__builtin_cpu_is ("core2")
+	   && __builtin_cpu_supports ("sse4.2"))
+    assert (val == 8);
+  else if (__builtin_cpu_is ("core2"))
+    assert (val == 9);
+  else if (__builtin_cpu_is ("atom"))
+    assert (val == 10);
+  else
+    assert (val == 0);
+  
+  return 0;
+}
+
+int foo ()
+{
+  return 0;
+}
+
+int __attribute__ ((target("arch=corei7,popcnt")))
+foo ()
+{
+  return 5;
+}
+int __attribute__ ((target("avx2,ssse3")))
+foo ()
+{
+  return 3;
+}
+
+int __attribute__ ((target("arch=core2")))
+foo ()
+{
+  return 9;
+}
+
+int __attribute__ ((target("arch=corei7")))
+foo ()
+{
+  return 6;
+}
+
+int __attribute__ ((target("arch=atom")))
+foo ()
+{
+  return 10;
+}
+
+int __attribute__ ((target("avx")))
+foo ()
+{
+  return 4;
+}
+
+int __attribute__ ((target("arch=core2,sse4.2")))
+foo ()
+{
+  return 8;
+}
+
+int __attribute__ ((target("arch=amdfam10")))
+foo ()
+{
+  return 7;
+}
+
+int __attribute__ ((target("arch=bdver1")))
+foo ()
+{
+  return 1;
+}
+
+int __attribute__ ((target("arch=bdver2")))
+foo ()
+{
+  return 2;
+}
Index: gcc/testsuite/g++.dg/mv3.C
===================================================================
--- gcc/testsuite/g++.dg/mv3.C	(revision 0)
+++ gcc/testsuite/g++.dg/mv3.C	(revision 0)
@@ -0,0 +1,37 @@
+/* Test case to check if a call to a multiversioned function
+   is replaced with a direct call to the particular version when
+   the most specialized version's target attributes match the
+   caller.  
+  
+   In this program, foo is multiversioned but there is no default
+   function.  This is an error if the call has to go through a
+   dispatcher.  However, the call to foo in bar can be replaced
+   with a direct call to the popcnt version of foo.  Hence, this
+   test should pass.  */
+
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 -mno-sse -mno-popcnt" } */
+
+
+/* Default version.  */
+int __attribute__ ((target ("sse")))
+foo ()
+{
+  return 1;
+}
+int __attribute__ ((target ("popcnt")))
+foo ()
+{
+  return 0;
+}
+
+int __attribute__ ((target ("popcnt")))
+bar ()
+{
+  return foo ();
+}
+
+int main ()
+{
+  return bar ();
+}
Index: gcc/cp/class.c
===================================================================
--- gcc/cp/class.c	(revision 190493)
+++ gcc/cp/class.c	(working copy)
@@ -1091,7 +1091,32 @@ add_method (tree type, tree method, tree using_dec
 	      || same_type_p (TREE_TYPE (fn_type),
 			      TREE_TYPE (method_type))))
 	{
-	  if (using_decl)
+	  /* For function versions, their parms and types match
+	     but they are not duplicates.  Record function versions
+	     as and when they are found.  */
+	  if (TREE_CODE (fn) == FUNCTION_DECL
+	      && TREE_CODE (method) == FUNCTION_DECL
+	      && (DECL_FUNCTION_SPECIFIC_TARGET (fn)
+		  || DECL_FUNCTION_SPECIFIC_TARGET (method))
+	      && targetm.target_option.function_versions (fn, method))
+ 	    {
+	      /* Mark functions as versions if necessary.  Modify the mangled
+		 decl name if necessary.  */
+	      if (!DECL_FUNCTION_VERSIONED (fn))
+		{
+		  DECL_FUNCTION_VERSIONED (fn) = 1;
+		  if (DECL_ASSEMBLER_NAME_SET_P (fn))
+		    mangle_decl (fn);
+		}
+	      if (!DECL_FUNCTION_VERSIONED (method))
+		{
+		  DECL_FUNCTION_VERSIONED (method) = 1;
+		  if (DECL_ASSEMBLER_NAME_SET_P (method))
+		    mangle_decl (method);
+		}
+	      continue;
+	    }
+	  else if (using_decl)
 	    {
 	      if (DECL_CONTEXT (fn) == type)
 		/* Defer to the local function.  */
@@ -6922,6 +6947,7 @@ resolve_address_of_overloaded_function (tree targe
   tree matches = NULL_TREE;
   tree fn;
   tree target_fn_type;
+  VEC (tree, heap) *fn_ver_vec = NULL;
 
   /* By the time we get here, we should be seeing only real
      pointer-to-member types, not the internal POINTER_TYPE to
@@ -6986,9 +7012,19 @@ resolve_address_of_overloaded_function (tree targe
 	  if (DECL_ANTICIPATED (fn))
 	    continue;
 
-	  /* See if there's a match.  */
+	  /* See if there's a match.   For functions that are multi-versioned,
+	     all the versions match.  */
 	  if (same_type_p (target_fn_type, static_fn_type (fn)))
-	    matches = tree_cons (fn, NULL_TREE, matches);
+	    {
+	      matches = tree_cons (fn, NULL_TREE, matches);
+	      /*If versioned, push all possible versions into a vector.  */
+	      if (DECL_FUNCTION_VERSIONED (fn))
+		{
+		  if (fn_ver_vec == NULL)
+		   fn_ver_vec = VEC_alloc (tree, heap, 2);
+		  VEC_safe_push (tree, heap, fn_ver_vec, fn); 
+		}
+	    }
 	}
     }
 
@@ -7080,13 +7116,26 @@ resolve_address_of_overloaded_function (tree targe
     {
       /* There were too many matches.  First check if they're all
 	 the same function.  */
-      tree match;
+      tree match = NULL_TREE;
 
       fn = TREE_PURPOSE (matches);
-      for (match = TREE_CHAIN (matches); match; match = TREE_CHAIN (match))
-	if (!decls_match (fn, TREE_PURPOSE (match)))
-	  break;
 
+      /* For multi-versioned functions, more than one match is just fine.
+	 Call decls_match to make sure they are different because they are
+	 versioned.  */
+      if (DECL_FUNCTION_VERSIONED (fn))
+	{
+          for (match = TREE_CHAIN (matches); match; match = TREE_CHAIN (match))
+  	    if (decls_match (fn, TREE_PURPOSE (match)))
+	      break;
+	}
+      else
+	{
+          for (match = TREE_CHAIN (matches); match; match = TREE_CHAIN (match))
+  	    if (!decls_match (fn, TREE_PURPOSE (match)))
+	      break;
+	}
+
       if (match)
 	{
 	  if (flags & tf_error)
@@ -7148,6 +7197,28 @@ resolve_address_of_overloaded_function (tree targe
       perform_or_defer_access_check (access_path, fn, fn, flags);
     }
 
+  /* If a pointer to a function that is multi-versioned is requested, the
+     pointer to the dispatcher function is returned instead.  This works
+     well because indirectly calling the function will dispatch the right
+     function version at run-time.  */
+  if (DECL_FUNCTION_VERSIONED (fn))
+    {
+      tree dispatcher_decl = NULL;
+      gcc_assert (fn_ver_vec != NULL);
+      gcc_assert (targetm.get_function_versions_dispatcher);
+      dispatcher_decl = targetm.get_function_versions_dispatcher (fn_ver_vec);
+      if (!dispatcher_decl)
+	{
+	  error_at (input_location, "Pointer to a multiversioned function"
+		    " without a default is not allowed");
+	  return error_mark_node;
+	}
+      retrofit_lang_decl (dispatcher_decl);
+      mark_used (fn);
+      VEC_free (tree, heap, fn_ver_vec);
+      fn = dispatcher_decl;
+    }
+
   if (TYPE_PTRFN_P (target_type) || TYPE_PTRMEMFUNC_P (target_type))
     return cp_build_addr_expr (fn, flags);
   else
Index: gcc/cp/decl.c
===================================================================
--- gcc/cp/decl.c	(revision 190493)
+++ gcc/cp/decl.c	(working copy)
@@ -53,6 +53,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "pointer-set.h"
 #include "splay-tree.h"
 #include "plugin.h"
+#include "cgraph.h"
 
 /* Possible cases of bad specifiers type used by bad_specifiers. */
 enum bad_spec_place {
@@ -972,6 +973,29 @@ decls_match (tree newdecl, tree olddecl)
       if (t1 != t2)
 	return 0;
 
+      /* The decls dont match if they correspond to two different versions
+	 of the same function.  */
+      if (compparms (p1, p2)
+	  && same_type_p (TREE_TYPE (f1), TREE_TYPE (f2)) 
+	  && targetm.target_option.function_versions (newdecl, olddecl))
+	{
+	  /* Mark functions as versions if necessary.  Modify the mangled decl
+	     name if necessary.  */
+	  if (!DECL_FUNCTION_VERSIONED (newdecl))
+	    {
+	      DECL_FUNCTION_VERSIONED (newdecl) = 1;
+	      if (DECL_ASSEMBLER_NAME_SET_P (newdecl))
+	        mangle_decl (newdecl);
+	    }
+	  if (!DECL_FUNCTION_VERSIONED (olddecl))
+	    {
+	      DECL_FUNCTION_VERSIONED (olddecl) = 1;
+	      if (DECL_ASSEMBLER_NAME_SET_P (olddecl))
+	       mangle_decl (olddecl);
+	    }
+	  return 0;
+	}
+
       if (CP_DECL_CONTEXT (newdecl) != CP_DECL_CONTEXT (olddecl)
 	  && ! (DECL_EXTERN_C_P (newdecl)
 		&& DECL_EXTERN_C_P (olddecl)))
@@ -1490,7 +1514,11 @@ duplicate_decls (tree newdecl, tree olddecl, bool
 	      error ("previous declaration %q+#D here", olddecl);
 	      return NULL_TREE;
 	    }
-	  else if (compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)),
+	  /* For function versions, params and types match, but they
+	     are not ambiguous.  */
+	  else if ((!DECL_FUNCTION_VERSIONED (newdecl)
+		    && !DECL_FUNCTION_VERSIONED (olddecl))
+		   && compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)),
 			      TYPE_ARG_TYPES (TREE_TYPE (olddecl))))
 	    {
 	      error ("new declaration %q#D", newdecl);
@@ -2262,6 +2290,11 @@ duplicate_decls (tree newdecl, tree olddecl, bool
   else if (DECL_PRESERVE_P (newdecl))
     DECL_PRESERVE_P (olddecl) = 1;
 
+  /* If the olddecl is a version, so is the newdecl.  */
+  if (TREE_CODE (newdecl) == FUNCTION_DECL
+      && DECL_FUNCTION_VERSIONED (olddecl))
+    DECL_FUNCTION_VERSIONED (newdecl) = 1;
+
   if (TREE_CODE (newdecl) == FUNCTION_DECL)
     {
       int function_size;
@@ -14024,7 +14057,11 @@ cxx_comdat_group (tree decl)
 	  else
 	    break;
 	}
-      name = DECL_ASSEMBLER_NAME (decl);
+      if (TREE_CODE (decl) == FUNCTION_DECL
+	  && DECL_FUNCTION_VERSIONED (decl))
+	name = DECL_NAME (decl);
+      else
+        name = DECL_ASSEMBLER_NAME (decl);
     }
 
   return name;
Index: gcc/cp/error.c
===================================================================
--- gcc/cp/error.c	(revision 190493)
+++ gcc/cp/error.c	(working copy)
@@ -1539,8 +1539,15 @@ dump_exception_spec (tree t, int flags)
 static void
 dump_function_name (tree t, int flags)
 {
-  tree name = DECL_NAME (t);
+  tree name;
 
+  /* For function versions, use the assembler name as the decl name is
+     the same for all versions.  */
+  if (DECL_FUNCTION_VERSIONED (t))
+    name = DECL_ASSEMBLER_NAME (t);
+  else
+    name = DECL_NAME (t);
+
   /* We can get here with a decl that was synthesized by language-
      independent machinery (e.g. coverage.c) in which case it won't
      have a lang_specific structure attached and DECL_CONSTRUCTOR_P
Index: gcc/cp/semantics.c
===================================================================
--- gcc/cp/semantics.c	(revision 190493)
+++ gcc/cp/semantics.c	(working copy)
@@ -3775,8 +3775,11 @@ expand_or_defer_fn_1 (tree fn)
       /* If the user wants us to keep all inline functions, then mark
 	 this function as needed so that finish_file will make sure to
 	 output it later.  Similarly, all dllexport'd functions must
-	 be emitted; there may be callers in other DLLs.  */
-      if ((flag_keep_inline_functions
+	 be emitted; there may be callers in other DLLs.
+	 Also, mark this function as needed if it is marked inline but
+	 is a multi-versioned function.  */
+      if (((flag_keep_inline_functions
+	    || DECL_FUNCTION_VERSIONED (fn))
 	   && DECL_DECLARED_INLINE_P (fn)
 	   && !DECL_REALLY_EXTERN (fn))
 	  || (flag_keep_inline_dllexport
Index: gcc/cp/decl2.c
===================================================================
--- gcc/cp/decl2.c	(revision 190493)
+++ gcc/cp/decl2.c	(working copy)
@@ -674,9 +674,13 @@ check_classfn (tree ctype, tree function, tree tem
 	  if (is_template != (TREE_CODE (fndecl) == TEMPLATE_DECL))
 	    continue;
 
+	  /* While finding a match, same types and params are not enough
+	     if the function is versioned.  Also check version ("target")
+	     attributes.  */
 	  if (same_type_p (TREE_TYPE (TREE_TYPE (function)),
 			   TREE_TYPE (TREE_TYPE (fndecl)))
 	      && compparms (p1, p2)
+	      && !targetm.target_option.function_versions (function, fndecl)
 	      && (!is_template
 		  || comp_template_parms (template_parms,
 					  DECL_TEMPLATE_PARMS (fndecl)))
Index: gcc/cp/call.c
===================================================================
--- gcc/cp/call.c	(revision 190493)
+++ gcc/cp/call.c	(working copy)
@@ -40,6 +40,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "langhooks.h"
 #include "c-family/c-objc.h"
 #include "timevar.h"
+#include "cgraph.h"
 
 /* The various kinds of conversion.  */
 
@@ -6407,6 +6408,35 @@ magic_varargs_p (tree fn)
   return false;
 }
 
+/* Returns the decl of the dispatcher function if FN is a function version.  */
+
+static tree
+get_function_version_dispatcher (tree fn)
+{
+  tree dispatcher_decl = NULL;
+  struct cgraph_node *node = NULL;
+
+  gcc_assert (TREE_CODE (fn) == FUNCTION_DECL
+	      && DECL_FUNCTION_VERSIONED (fn));
+
+  node = cgraph_get_node (fn);
+
+  if (node != NULL)
+    dispatcher_decl = node->version_dispatcher_decl;
+  else
+    return NULL;
+
+  if (dispatcher_decl == NULL)
+    {
+      error_at (input_location, "Call to multiversioned function"
+                " without a default is not allowed");
+      return NULL;
+    }
+  retrofit_lang_decl (dispatcher_decl);
+  gcc_assert (dispatcher_decl != NULL);
+  return dispatcher_decl;
+}
+
 /* Subroutine of the various build_*_call functions.  Overload resolution
    has chosen a winning candidate CAND; build up a CALL_EXPR accordingly.
    ARGS is a TREE_LIST of the unconverted arguments to the call.  FLAGS is a
@@ -6858,6 +6888,20 @@ build_over_call (struct z_candidate *cand, int fla
   if (!already_used)
     mark_used (fn);
 
+  /* For calls to a multi-versioned function, overload resolution
+     returns the function with the highest target priority, that is,
+     the version that will be checked for dispatching first.  If this
+     version is inlinable, a direct call to this version can be made;
+     otherwise the call should go through the dispatcher.  */
+
+  if (DECL_FUNCTION_VERSIONED (fn)
+      && !targetm.target_option.can_inline_p (current_function_decl, fn))
+    {
+      fn = get_function_version_dispatcher (fn);
+      if (fn == NULL)
+	return NULL;
+    }
+
   if (DECL_VINDEX (fn) && (flags & LOOKUP_NONVIRTUAL) == 0)
     {
       tree t;
@@ -8136,6 +8180,38 @@ joust (struct z_candidate *cand1, struct z_candida
       && (IS_TYPE_OR_DECL_P (cand1->fn)))
     return 1;
 
+  /* For candidates of a multi-versioned function, make the version with
+     the highest priority win.  This version will be checked for dispatching
+     first.  If this version can be inlined into the caller, the front-end
+     will simply make a direct call to this function.  */
+
+  if (TREE_CODE (cand1->fn) == FUNCTION_DECL
+      && DECL_FUNCTION_VERSIONED (cand1->fn)
+      && TREE_CODE (cand2->fn) == FUNCTION_DECL
+      && DECL_FUNCTION_VERSIONED (cand2->fn))
+    {
+      tree f1 = TREE_TYPE (cand1->fn);
+      tree f2 = TREE_TYPE (cand2->fn);
+      tree p1 = TYPE_ARG_TYPES (f1);
+      tree p2 = TYPE_ARG_TYPES (f2);
+     
+      /* Check if cand1->fn and cand2->fn are versions of the same function.  It
+         is possible that cand1->fn and cand2->fn are function versions but of
+         different functions.  Check types to see if they are versions of the same
+         function.  */
+      if (compparms (p1, p2)
+	  && same_type_p (TREE_TYPE (f1), TREE_TYPE (f2)))
+	{
+	  /* Always make the version with the higher priority, more
+	     specialized, win.  */
+	  gcc_assert (targetm.compare_version_priority);
+	  if (targetm.compare_version_priority (cand1->fn, cand2->fn) >= 0)
+	    return 1;
+	  else
+	    return -1;
+	}
+    }
+
   /* a viable function F1
      is defined to be a better function than another viable function F2  if
      for  all arguments i, ICSi(F1) is not a worse conversion sequence than
@@ -8456,6 +8532,37 @@ tweak:
   return 0;
 }
 
+/* Function FN is multi-versioned and CANDIDATES contains the list of all
+   overloaded candidates for FN.  This function extracts all functions from
+   CANDIDATES that are function versions of FN and generates a dispatcher
+   function for this multi-versioned function group.  */
+
+static void
+generate_function_versions_dispatcher (tree fn, struct z_candidate *candidates)
+{
+  tree f1 = TREE_TYPE (fn);
+  tree p1 = TYPE_ARG_TYPES (f1);
+  VEC (tree, heap) *fn_ver_vec = NULL;
+  struct z_candidate *ver = candidates;
+
+  fn_ver_vec = VEC_alloc (tree, heap, 2);
+
+  for (;ver; ver = ver->next)
+    {
+      tree f2 = TREE_TYPE (ver->fn);
+      tree p2 = TYPE_ARG_TYPES (f2);
+      /* If this candidate is a version of FN, types must match.  */
+      if (DECL_FUNCTION_VERSIONED (ver->fn)
+          && compparms (p1, p2)
+	  && same_type_p (TREE_TYPE (f1), TREE_TYPE (f2)))
+        VEC_safe_push (tree, heap, fn_ver_vec, ver->fn);
+    }
+
+  gcc_assert (targetm.get_function_versions_dispatcher);
+  targetm.get_function_versions_dispatcher (fn_ver_vec);
+  VEC_free (tree, heap, fn_ver_vec); 
+}
+
 /* Given a list of candidates for overloading, find the best one, if any.
    This algorithm has a worst case of O(2n) (winner is last), and a best
    case of O(n/2) (totally ambiguous); much better than a sorting
@@ -8508,6 +8615,17 @@ tourney (struct z_candidate *candidates, tsubst_fl
 	return NULL;
     }
 
+  /* For multiversioned functions, aggregate all the versions here for
+     generating the dispatcher body later if necessary.  Check to see if
+     the dispatcher is already generated to avoid doing this more than
+     once.  */
+
+  if (TREE_CODE (champ->fn) == FUNCTION_DECL
+      && DECL_FUNCTION_VERSIONED (champ->fn)
+      && (cgraph_get_node (champ->fn) == NULL
+	  || cgraph_get_node (champ->fn)->version_dispatcher_decl == NULL))
+    generate_function_versions_dispatcher (champ->fn, candidates);
+
   return champ;
 }
 
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 190493)
+++ gcc/config/i386/i386.c	(working copy)
@@ -62,6 +62,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "opts.h"
 #include "diagnostic.h"
 #include "dumpfile.h"
+#include "tree-pass.h"
+#include "tree-flow.h"
 
 enum upper_128bits_state
 {
@@ -28030,6 +28032,984 @@ ix86_init_mmx_sse_builtins (void)
     }
 }
 
+
+/* This adds a condition to the basic_block NEW_BB in function FUNCTION_DECL
+   to return a pointer to VERSION_DECL if the outcome of the expression
+   formed by PREDICATE_CHAIN is true.  This function will be called during
+   version dispatch to decide which function version to execute.  It returns
+   the basic block at the end, to which more conditions can be added.  */
+
+static basic_block
+add_condition_to_bb (tree function_decl, tree version_decl,
+		     tree predicate_chain, basic_block new_bb)
+{
+  gimple return_stmt;
+  tree convert_expr, result_var;
+  gimple convert_stmt;
+  gimple call_cond_stmt;
+  gimple if_else_stmt;
+
+  basic_block bb1, bb2, bb3;
+  edge e12, e23;
+
+  tree cond_var, and_expr_var = NULL_TREE;
+  gimple_seq gseq;
+
+  tree old_current_function_decl;
+  tree predicate_decl, predicate_arg;
+
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (function_decl));
+  current_function_decl = function_decl;
+
+  gcc_assert (new_bb != NULL);
+  gseq = bb_seq (new_bb);
+
+
+  convert_expr = build1 (CONVERT_EXPR, ptr_type_node,
+	     		 build_fold_addr_expr (version_decl));
+  result_var = create_tmp_var (ptr_type_node, NULL);
+  convert_stmt = gimple_build_assign (result_var, convert_expr); 
+  return_stmt = gimple_build_return (result_var);
+
+  if (predicate_chain == NULL_TREE)
+    {
+      gimple_seq_add_stmt (&gseq, convert_stmt);
+      gimple_seq_add_stmt (&gseq, return_stmt);
+      set_bb_seq (new_bb, gseq);
+      gimple_set_bb (convert_stmt, new_bb);
+      gimple_set_bb (return_stmt, new_bb);
+      pop_cfun ();
+      current_function_decl = old_current_function_decl;
+      return new_bb;
+    }
+
+  while (predicate_chain != NULL)
+    {
+      cond_var = create_tmp_var (integer_type_node, NULL);
+      predicate_decl = TREE_PURPOSE (predicate_chain);
+      predicate_arg = TREE_VALUE (predicate_chain);
+      call_cond_stmt = gimple_build_call (predicate_decl, 1, predicate_arg);
+      gimple_call_set_lhs (call_cond_stmt, cond_var);
+
+      gimple_set_block (call_cond_stmt, DECL_INITIAL (function_decl));
+      gimple_set_bb (call_cond_stmt, new_bb);
+      gimple_seq_add_stmt (&gseq, call_cond_stmt);
+
+      predicate_chain = TREE_CHAIN (predicate_chain);
+      
+      if (and_expr_var == NULL)
+        and_expr_var = cond_var;
+      else
+	{
+	  gimple assign_stmt;
+	  /* Use MIN_EXPR to check if any integer is zero.
+	     and_expr_var = min_expr <cond_var, and_expr_var>  */
+	  assign_stmt = gimple_build_assign (and_expr_var,
+			  build2 (MIN_EXPR, integer_type_node,
+				  cond_var, and_expr_var));
+
+	  gimple_set_block (assign_stmt, DECL_INITIAL (function_decl));
+	  gimple_set_bb (assign_stmt, new_bb);
+	  gimple_seq_add_stmt (&gseq, assign_stmt);
+	}
+    }
+
+  if_else_stmt = gimple_build_cond (GT_EXPR, and_expr_var,
+	  		            integer_zero_node,
+				    NULL_TREE, NULL_TREE);
+  gimple_set_block (if_else_stmt, DECL_INITIAL (function_decl));
+  gimple_set_bb (if_else_stmt, new_bb);
+  gimple_seq_add_stmt (&gseq, if_else_stmt);
+
+  gimple_seq_add_stmt (&gseq, convert_stmt);
+  gimple_seq_add_stmt (&gseq, return_stmt);
+  set_bb_seq (new_bb, gseq);
+
+  bb1 = new_bb;
+  e12 = split_block (bb1, if_else_stmt);
+  bb2 = e12->dest;
+  e12->flags &= ~EDGE_FALLTHRU;
+  e12->flags |= EDGE_TRUE_VALUE;
+
+  e23 = split_block (bb2, return_stmt);
+
+  gimple_set_bb (convert_stmt, bb2);
+  gimple_set_bb (return_stmt, bb2);
+
+  bb3 = e23->dest;
+  make_edge (bb1, bb3, EDGE_FALSE_VALUE); 
+
+  remove_edge (e23);
+  make_edge (bb2, EXIT_BLOCK_PTR, 0);
+
+  pop_cfun ();
+  current_function_decl = old_current_function_decl;
+
+  return bb3;
+}
+
+/* This parses the attribute arguments to target in DECL and determines
+   the right builtin to use to match the platform specification.
+   For now, only one target argument ("arch=" or "<-m>xxx") is allowed.
+   It returns the priority value for this version decl.  If PREDICATE_LIST
+   is not NULL, it stores the list of cpu features that need to be checked
+   before dispatching this function.  */
+
+static unsigned int
+get_builtin_code_for_version (tree decl, tree *predicate_list)
+{
+  tree attrs;
+  struct cl_target_option cur_target;
+  tree target_node;
+  struct cl_target_option *new_target;
+  const char *arg_str = NULL;
+  const char *attrs_str = NULL;
+  char *tok_str = NULL;
+  char *token;
+  unsigned int priority = 0;
+
+  /* Priority of i386 features; a greater value means higher priority.  This
+     is used to decide the order in which function dispatch must happen.  For
+     instance, a version specialized for SSE4.2 should be checked for dispatch
+     before a version for SSE3, as SSE4.2 implies SSE3.  */
+  enum feature_priority
+  {
+    P_ZERO = 0,
+    P_MMX,
+    P_SSE,
+    P_SSE2,
+    P_SSE3,
+    P_SSSE3,
+    P_PROC_SSSE3,
+    P_SSE4_a,
+    P_PROC_SSE4_a,
+    P_SSE4_1,
+    P_SSE4_2,
+    P_PROC_SSE4_2,
+    P_POPCNT,
+    P_AVX,
+    P_AVX2,
+    P_FMA,
+    P_PROC_FMA
+  };
+
+  /* These are the target attribute strings for which a dispatcher is
+     available, from fold_builtin_cpu.  */
+
+  static struct _feature_list
+    {
+      const char *const name;
+      const enum feature_priority priority;
+    }
+  const feature_list[] =
+    {
+      {"mmx", P_MMX},
+      {"sse", P_SSE},
+      {"sse2", P_SSE2},
+      {"sse3", P_SSE3},
+      {"ssse3", P_SSSE3},
+      {"sse4.1", P_SSE4_1},
+      {"sse4.2", P_SSE4_2},
+      {"popcnt", P_POPCNT},
+      {"avx", P_AVX},
+      {"avx2", P_AVX2}
+    };
+
+
+  static unsigned int NUM_FEATURES
+    = sizeof (feature_list) / sizeof (struct _feature_list);
+
+  unsigned int i;
+
+  tree predicate_chain = NULL_TREE;
+  tree predicate_decl, predicate_arg;
+
+  attrs = lookup_attribute ("target", DECL_ATTRIBUTES (decl));
+  gcc_assert (attrs != NULL);
+
+  attrs = TREE_VALUE (TREE_VALUE (attrs));
+
+  gcc_assert (TREE_CODE (attrs) == STRING_CST);
+  attrs_str = TREE_STRING_POINTER (attrs);
+
+
+  /* Handle arch= if specified.  For priority, set it to be 1 more than
+     the best instruction set the processor can handle.  For instance, if
+     there is a version for atom and a version for ssse3 (the highest ISA
+     priority for atom), the atom version must be checked for dispatch
+     before the ssse3 version.  */
+  if (strstr (attrs_str, "arch=") != NULL)
+    {
+      cl_target_option_save (&cur_target, &global_options);
+      target_node = ix86_valid_target_attribute_tree (attrs);
+    
+      gcc_assert (target_node);
+      new_target = TREE_TARGET_OPTION (target_node);
+      gcc_assert (new_target);
+      
+      if (new_target->arch_specified && new_target->arch > 0)
+	{
+	  switch (new_target->arch)
+	    {
+	    case PROCESSOR_CORE2_32:
+	    case PROCESSOR_CORE2_64:
+	      arg_str = "core2";
+	      priority = P_PROC_SSSE3;
+	      break;
+	    case PROCESSOR_COREI7_32:
+	    case PROCESSOR_COREI7_64:
+	      arg_str = "corei7";
+	      priority = P_PROC_SSE4_2;
+	      break;
+	    case PROCESSOR_ATOM:
+	      arg_str = "atom";
+	      priority = P_PROC_SSSE3;
+	      break;
+	    case PROCESSOR_AMDFAM10:
+	      arg_str = "amdfam10h";
+	      priority = P_PROC_SSE4_a;
+	      break;
+	    case PROCESSOR_BDVER1:
+	      arg_str = "bdver1";
+	      priority = P_PROC_FMA;
+	      break;
+	    case PROCESSOR_BDVER2:
+	      arg_str = "bdver2";
+	      priority = P_PROC_FMA;
+	      break;
+	    }  
+	}    
+    
+      cl_target_option_restore (&global_options, &cur_target);
+	
+      if (predicate_list && arg_str == NULL)
+	{
+	  error_at (DECL_SOURCE_LOCATION (decl),
+	    	"No dispatcher found for the versioning attributes");
+	  return 0;
+	}
+    
+      if (predicate_list)
+	{
+          predicate_decl = ix86_builtins [(int) IX86_BUILTIN_CPU_IS];
+          /* For a C string literal the length includes the trailing NUL.  */
+          predicate_arg = build_string_literal (strlen (arg_str) + 1, arg_str);
+          predicate_chain = tree_cons (predicate_decl, predicate_arg,
+				       predicate_chain);
+	}
+    }
+
+  /* Process feature name.  */
+  tok_str =  (char *) xmalloc (strlen (attrs_str) + 1);
+  strcpy (tok_str, attrs_str);
+  token = strtok (tok_str, ",");
+  predicate_decl = ix86_builtins [(int) IX86_BUILTIN_CPU_SUPPORTS];
+
+  while (token != NULL)
+    {
+      /* Do not process "arch=".  */
+      if (strncmp (token, "arch=", 5) == 0)
+	{
+	  token = strtok (NULL, ",");
+	  continue;
+	}
+      for (i = 0; i < NUM_FEATURES; ++i)
+	{
+	  if (strcmp (token, feature_list[i].name) == 0)
+	    {
+	      if (predicate_list)
+		{
+		  predicate_arg = build_string_literal (
+				  strlen (feature_list[i].name) + 1,
+				  feature_list[i].name);
+		  predicate_chain = tree_cons (predicate_decl, predicate_arg,
+					       predicate_chain);
+		}
+	      /* Find the maximum priority feature.  */
+	      if (feature_list[i].priority > priority)
+		priority = feature_list[i].priority;
+
+	      break;
+	    }
+	}
+      if (predicate_list && i == NUM_FEATURES)
+	{
+	  error_at (DECL_SOURCE_LOCATION (decl),
+		    "No dispatcher found for %s", token);
+	  return 0;
+	}
+      token = strtok (NULL, ",");
+    }
+  free (tok_str);
+
+  if (predicate_list && predicate_chain == NULL_TREE)
+    {
+      error_at (DECL_SOURCE_LOCATION (decl),
+	        "No dispatcher found for the versioning attributes : %s",
+	        attrs_str);
+      return 0;
+    }
+  else if (predicate_list)
+    {
+      predicate_chain = nreverse (predicate_chain);
+      *predicate_list = predicate_chain;
+    }
+
+  return priority; 
+}
+
+/* This compares the priority of target features in function DECL1
+   and DECL2.  It returns positive value if DECL1 is higher priority,
+   negative value if DECL2 is higher priority and 0 if they are the
+   same.  */
+
+static int
+ix86_compare_version_priority (tree decl1, tree decl2)
+{
+  unsigned int priority1 = 0;
+  unsigned int priority2 = 0;
+
+  if (lookup_attribute ("target", DECL_ATTRIBUTES (decl1)) != NULL)
+    priority1 = get_builtin_code_for_version (decl1, NULL);
+
+  if (lookup_attribute ("target", DECL_ATTRIBUTES (decl2)) != NULL)
+    priority2 = get_builtin_code_for_version (decl2, NULL);
+
+  return (int)priority1 - (int)priority2;
+}
+
+/* V1 and V2 point to function versions with different priorities
+   based on the target ISA.  This function compares their priorities.  */
+ 
+static int
+feature_compare (const void *v1, const void *v2)
+{
+  typedef struct _function_version_info
+    {
+      tree version_decl;
+      tree predicate_chain;
+      unsigned int dispatch_priority;
+    } function_version_info;
+
+  const function_version_info c1 = *(const function_version_info *)v1;
+  const function_version_info c2 = *(const function_version_info *)v2;
+  return (c2.dispatch_priority - c1.dispatch_priority);
+}
+
+/* This function generates the dispatch function for
+   multi-versioned functions.  DISPATCH_DECL is the function which will
+   contain the dispatch logic.  FNDECLS_P points to a vector of the function
+   choices for dispatch.  EMPTY_BB is the basic block pointer
+   in DISPATCH_DECL in which the dispatch code is generated.  */
+
+static int
+dispatch_function_versions (tree dispatch_decl,
+			    void *fndecls_p,
+			    basic_block *empty_bb)
+{
+  tree default_decl;
+  gimple ifunc_cpu_init_stmt;
+  gimple_seq gseq;
+  tree old_current_function_decl;
+  int ix;
+  tree ele;
+  VEC (tree, heap) *fndecls;
+  unsigned int num_versions = 0;
+  unsigned int actual_versions = 0;
+  unsigned int i;
+
+  struct _function_version_info
+    {
+      tree version_decl;
+      tree predicate_chain;
+      unsigned int dispatch_priority;
+    }*function_version_info;
+
+  gcc_assert (dispatch_decl != NULL
+	      && fndecls_p != NULL
+	      && empty_bb != NULL);
+
+  /* fndecls_p is actually a vector.  */
+  fndecls = (VEC (tree, heap) *)fndecls_p;
+
+  /* At least one more version other than the default.  */
+  num_versions = VEC_length (tree, fndecls);
+  gcc_assert (num_versions >= 2);
+
+  function_version_info = (struct _function_version_info *)
+    xmalloc ((num_versions - 1) * sizeof (struct _function_version_info));
+
+  /* The first version in the vector is the default decl.  */
+  default_decl = VEC_index (tree, fndecls, 0);
+
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (dispatch_decl));
+  current_function_decl = dispatch_decl;
+
+  gseq = bb_seq (*empty_bb);
+  /* Function version dispatch is via IFUNC.  IFUNC resolvers fire before
+     constructors, so explicitly call __builtin_cpu_init here.  */
+  ifunc_cpu_init_stmt = gimple_build_call_vec (
+                     ix86_builtins [(int) IX86_BUILTIN_CPU_INIT], NULL);
+  gimple_seq_add_stmt (&gseq, ifunc_cpu_init_stmt);
+  gimple_set_bb (ifunc_cpu_init_stmt, *empty_bb);
+  set_bb_seq (*empty_bb, gseq);
+
+  pop_cfun ();
+  current_function_decl = old_current_function_decl;
+
+
+  for (ix = 1; VEC_iterate (tree, fndecls, ix, ele); ++ix)
+    {
+      tree version_decl = ele;
+      tree predicate_chain = NULL_TREE;
+      unsigned int priority;
+      /* Get attribute string, parse it and find the right predicate decl.
+         The predicate function could be a lengthy combination of many
+	 features, like arch-type and various isa-variants.  */
+      priority = get_builtin_code_for_version (version_decl,
+	 			               &predicate_chain);
+
+      if (predicate_chain == NULL_TREE)
+	continue;
+
+      function_version_info [actual_versions].version_decl = version_decl;
+      function_version_info [actual_versions].predicate_chain = predicate_chain;
+      function_version_info [actual_versions].dispatch_priority = priority;
+      actual_versions++;
+    }
+
+  /* Sort the versions in descending order of dispatch priority.  The
+     priority is based on the ISA.  This is not a perfect solution.  There
+     could still be ambiguity.  If more than one function version is suitable
+     to execute, which one should be dispatched?  In future, allow the user
+     to specify a dispatch priority next to the version.  */
+  qsort (function_version_info, actual_versions,
+         sizeof (struct _function_version_info), feature_compare);
+
+  for  (i = 0; i < actual_versions; ++i)
+    *empty_bb = add_condition_to_bb (dispatch_decl,
+				     function_version_info[i].version_decl,
+				     function_version_info[i].predicate_chain,
+				     *empty_bb);
+
+  /* Dispatch the default version at the end.  */
+  *empty_bb = add_condition_to_bb (dispatch_decl, default_decl,
+				   NULL, *empty_bb);
+
+  free (function_version_info);
+  return 0;
+}
+
+/* This function returns true if FN1 and FN2 are versions of the same function,
+   that is, if only one of the function decls has the target attribute
+   set or if the targets of the function decls are different.  This assumes
+   that FN1 and FN2 have the same signature.  */
+
+static bool
+ix86_function_versions (tree fn1, tree fn2)
+{
+  tree attr1, attr2;
+  struct cl_target_option *target1, *target2;
+
+  if (TREE_CODE (fn1) != FUNCTION_DECL
+      || TREE_CODE (fn2) != FUNCTION_DECL)
+    return false;
+
+  attr1 = DECL_FUNCTION_SPECIFIC_TARGET (fn1);
+  attr2 = DECL_FUNCTION_SPECIFIC_TARGET (fn2);
+
+  if (attr1 == NULL_TREE && attr2 == NULL_TREE)
+    return false;
+
+  if ((attr1 == NULL_TREE && attr2 != NULL_TREE)
+      || (attr1 != NULL_TREE && attr2 == NULL_TREE))
+    return true;
+
+  target1 = TREE_TARGET_OPTION (attr1);
+  target2 = TREE_TARGET_OPTION (attr2);
+
+  if (target1->x_ix86_isa_flags == target2->x_ix86_isa_flags
+      && target1->x_target_flags == target2->x_target_flags
+      && target1->arch == target2->arch
+      && target1->tune == target2->tune
+      && target1->x_ix86_fpmath == target2->x_ix86_fpmath
+      && target1->branch_cost == target2->branch_cost)
+    return false;
+
+  return true;
+}
+
+/* Comparator function to be used in qsort routine to sort attribute
+   specification strings to "target".  */
+
+static int
+attr_strcmp (const void *v1, const void *v2)
+{
+  const char *c1 = *(char *const*)v1;
+  const char *c2 = *(char *const*)v2;
+  return strcmp (c1, c2);
+}
+
+/* STR is the argument to target attribute.  This function tokenizes
+   the comma separated arguments, sorts them and returns a string which
+   is a unique identifier for the comma separated arguments.   It also
+   replaces non-identifier characters "=,-" with "_".  */
+
+static char *
+sorted_attr_string (const char *str)
+{
+  char **args = NULL;
+  char *attr_str, *ret_str;
+  char *attr = NULL;
+  unsigned int argnum = 1;
+  unsigned int i;
+
+  for (i = 0; i < strlen (str); i++)
+    if (str[i] == ',')
+      argnum++;
+
+  attr_str = (char *)xmalloc (strlen (str) + 1);
+  strcpy (attr_str, str);
+
+  /* Replace "=,-" with "_".  */
+  for (i = 0; i < strlen (attr_str); i++)
+    if (attr_str[i] == '=' || attr_str[i]== '-')
+      attr_str[i] = '_';
+
+  if (argnum == 1)
+    return attr_str;
+
+  args = (char **)xmalloc (argnum * sizeof (char *));
+
+  i = 0;
+  attr = strtok (attr_str, ",");
+  while (attr != NULL)
+    {
+      args[i] = attr;
+      i++;
+      attr = strtok (NULL, ",");
+    }
+
+  qsort (args, argnum, sizeof (char*), attr_strcmp);
+
+  ret_str = (char *)xmalloc (strlen (str) + 1);
+  strcpy (ret_str, args[0]);
+  for (i = 1; i < argnum; i++)
+    {
+      strcat (ret_str, "_");
+      strcat (ret_str, args[i]);
+    }
+
+  free (args);
+  free (attr_str);
+  return ret_str;
+}
+
+/* This function changes the assembler name for functions that are
+   versions.  If DECL is a function version and has a "target"
+   attribute, it appends the attribute string to its assembler name.  */
+
+static tree
+ix86_mangle_function_version_assembler_name (tree decl, tree id)
+{
+  tree version_attr;
+  const char *orig_name, *version_string, *attr_str;
+  char *assembler_name;
+
+  if (DECL_DECLARED_INLINE_P (decl)
+      && lookup_attribute ("gnu_inline",
+			   DECL_ATTRIBUTES (decl)))
+    error_at (DECL_SOURCE_LOCATION (decl),
+	      "Function versions cannot be marked as gnu_inline,"
+	      " bodies have to be generated");
+
+  if (DECL_VIRTUAL_P (decl)
+      || DECL_VINDEX (decl))
+    error_at (DECL_SOURCE_LOCATION (decl),
+	      "Virtual function versioning not supported");
+
+  version_attr = lookup_attribute ("target", DECL_ATTRIBUTES (decl));
+
+  /* target attribute string is NULL for default functions.  */
+  if (version_attr == NULL_TREE)
+    return id;
+
+  orig_name = IDENTIFIER_POINTER (id);
+  version_string
+    = TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (version_attr)));
+
+  attr_str = sorted_attr_string (version_string);
+  assembler_name = (char *) xmalloc (strlen (orig_name)
+				     + strlen (attr_str) + 2);
+
+  sprintf (assembler_name, "%s.%s", orig_name, attr_str);
+  if (dump_file)
+    fprintf (dump_file, "Assembler name set to %s for function version %s\n",
+	     assembler_name, IDENTIFIER_POINTER (id));
+
+  /* Allow assembler name to be modified if already set.  */
+  if (DECL_ASSEMBLER_NAME_SET_P (decl))
+    SET_DECL_RTL (decl, NULL);
+
+  return get_identifier (assembler_name);
+}
+
+static tree 
+ix86_mangle_decl_assembler_name (tree decl, tree id)
+{
+  /* For function version, add the target suffix to the assembler name.  */
+  if (TREE_CODE (decl) == FUNCTION_DECL
+      && DECL_FUNCTION_VERSIONED (decl))
+    return ix86_mangle_function_version_assembler_name (decl, id);
+
+  return id;
+}
+
+/* Return a new name by appending SUFFIX to the DECL name.  If
+   make_unique is true, append the full path name.  */
+
+static char *
+make_name (tree decl, const char *suffix, bool make_unique)
+{
+  char *global_var_name;
+  int name_len;
+  const char *name;
+  const char *unique_name = NULL;
+
+  name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
+
+  /* Get a unique name that can be used globally without any chances
+     of collision at link time.  */
+  if (make_unique)
+    unique_name = IDENTIFIER_POINTER (get_file_function_name ("\0"));
+
+  name_len = strlen (name) + strlen (suffix) + 2;
+
+  if (make_unique)
+    name_len += strlen (unique_name) + 1;
+  global_var_name = (char *) xmalloc (name_len);
+
+  /* Use '.' to concatenate names as it is demangler friendly.  */
+  if (make_unique)
+      snprintf (global_var_name, name_len, "%s.%s.%s", name,
+		unique_name, suffix);
+  else
+      snprintf (global_var_name, name_len, "%s.%s", name, suffix);
+
+  return global_var_name;
+}
+
+/* Make a dispatcher declaration for the multi-versioned function DECL.
+   Calls to DECL function will be replaced with calls to the dispatcher
+   by the front-end.  Return the decl created.  */
+
+static tree
+make_dispatcher_decl (const tree decl)
+{
+  tree func_decl;
+  char *func_name, *resolver_name;
+  tree fn_type, func_type;
+  bool is_uniq = false;
+
+  if (TREE_PUBLIC (decl) == 0)
+    is_uniq = true;
+
+  func_name = make_name (decl, "ifunc", is_uniq);
+  resolver_name = make_name (decl, "resolver", is_uniq);
+  gcc_assert (resolver_name);
+
+  fn_type = TREE_TYPE (decl);
+  func_type = build_function_type (TREE_TYPE (fn_type),
+				   TYPE_ARG_TYPES (fn_type));
+  
+  func_decl = build_fn_decl (func_name, func_type);
+  TREE_USED (func_decl) = 1;
+  DECL_CONTEXT (func_decl) = NULL_TREE;
+  DECL_INITIAL (func_decl) = error_mark_node;
+  DECL_ARTIFICIAL (func_decl) = 1;
+  /* Mark this func as external, the resolver will flip it again if
+     it gets generated.  */
+  DECL_EXTERNAL (func_decl) = 1;
+  /* This will be an IFUNC, and IFUNCs have to be externally visible.  */
+  TREE_PUBLIC (func_decl) = 1;
+
+  return func_decl;  
+}
+
+/* Returns true if DECL is multi-versioned and is the default function,
+   that is, it is not tagged with a target-specific optimization.  */
+
+static bool
+is_function_default_version (const tree decl)
+{
+  return (TREE_CODE (decl) == FUNCTION_DECL
+	  && DECL_FUNCTION_VERSIONED (decl)
+	  && DECL_FUNCTION_SPECIFIC_TARGET (decl) == NULL_TREE);
+}
+
+/* Make a dispatcher declaration for the multi-versioned function DECL.
+   Calls to DECL function will be replaced with calls to the dispatcher
+   by the front-end.  It also chains the cgraph nodes of all the
+   semantically identical versions in vector FN_VER_VEC_P.  Returns the
+   decl of the dispatcher function.  */
+
+static tree
+ix86_get_function_versions_dispatcher (void *fn_ver_vec_p)
+{
+  struct cgraph_node *node = NULL;
+  struct cgraph_node *default_node = NULL;
+  struct cgraph_node *curr_node = NULL;
+  int ix;
+  tree ele;
+  tree dispatch_decl = NULL;
+  VEC (tree, heap) *fn_ver_vec = NULL;
+
+  fn_ver_vec = (VEC (tree,heap) *) fn_ver_vec_p;
+  gcc_assert (fn_ver_vec != NULL);
+
+  /* Find the default version.  */
+  for (ix = 0; VEC_iterate (tree, fn_ver_vec, ix, ele); ++ix)
+    {
+      if (is_function_default_version (ele))
+	{
+	  default_node = cgraph_get_create_node (ele);
+	  break;
+	}
+    }
+
+  /* If there is no default node, just return NULL.  */
+  if (!default_node)
+    return NULL;
+
+  if (default_node->version_dispatcher_decl)
+    return default_node->version_dispatcher_decl;
+
+#if defined (ASM_OUTPUT_TYPE_DIRECTIVE) && HAVE_GNU_INDIRECT_FUNCTION
+  /* Right now, the dispatching is done via ifunc.  */
+  dispatch_decl = make_dispatcher_decl (default_node->symbol.decl); 
+#else
+  error_at (DECL_SOURCE_LOCATION (default_node->symbol.decl),
+	    "Multiversioning needs ifunc which is not supported "
+	    "in this configuration");
+#endif
+  default_node->version_dispatcher_decl = dispatch_decl;
+  curr_node = cgraph_get_create_node (dispatch_decl);
+  gcc_assert (curr_node);
+  curr_node->dispatcher_function = 1;
+  cgraph_mark_address_taken_node (default_node);
+
+  for (ix = 0; VEC_iterate (tree, fn_ver_vec, ix, ele); ++ix)
+    {
+      node = cgraph_get_create_node (ele);
+      gcc_assert (node != NULL && DECL_FUNCTION_VERSIONED (ele));
+      if (node == default_node)
+	continue;
+      gcc_assert (DECL_FUNCTION_SPECIFIC_TARGET (ele) != NULL_TREE);
+
+      if (curr_node->next_function_version)
+ 	{
+	  node->next_function_version = curr_node->next_function_version;
+	  curr_node->next_function_version->prev_function_version = node;
+	}
+      curr_node->next_function_version = node;
+      node->prev_function_version = curr_node;
+      node->version_dispatcher_decl = dispatch_decl;
+    }
+
+  /* The default version should be the first node.  */
+  default_node->next_function_version = curr_node->next_function_version;
+  curr_node->next_function_version->prev_function_version = default_node;
+  curr_node->next_function_version = default_node;
+  
+  return dispatch_decl;
+}
+
+/* Makes a function attribute of the form NAME(ARG_NAME) and chains
+   it to CHAIN.  */
+
+static tree
+make_attribute (const char *name, const char *arg_name, tree chain)
+{
+  tree attr_name;
+  tree attr_arg_name;
+  tree attr_args;
+  tree attr;
+
+  attr_name = get_identifier (name);
+  attr_arg_name = build_string (strlen (arg_name), arg_name);
+  attr_args = tree_cons (NULL_TREE, attr_arg_name, NULL_TREE);
+  attr = tree_cons (attr_name, attr_args, chain);
+  return attr;
+}
+
+/* Make the resolver function decl to dispatch the versions of
+   a multi-versioned function,  DEFAULT_DECL.  Create an
+   empty basic block in the resolver and store the pointer in
+   EMPTY_BB.  Return the decl of the resolver function.  */
+
+static tree
+make_resolver_func (const tree default_decl,
+		    const tree dispatch_decl,
+		    basic_block *empty_bb)
+{
+  char *resolver_name;
+  tree decl, type, decl_name, t;
+  basic_block new_bb;
+  tree old_current_function_decl;
+  bool is_uniq = false;
+
+  /* IFUNC's have to be globally visible.  So, if the default_decl is
+     not, then the name of the IFUNC should be made unique.  */
+  if (TREE_PUBLIC (default_decl) == 0)
+    is_uniq = true;
+
+  /* Append the filename to the resolver function if the versions are
+     not externally visible.  This is because the resolver function has
+     to be externally visible for the loader to find it.  So, appending
+     the filename will prevent conflicts with a resolver function from
+     another module which is based on the same version name.  */
+  resolver_name = make_name (default_decl, "resolver", is_uniq);
+
+  /* The resolver function should return a (void *). */
+  type = build_function_type_list (ptr_type_node, NULL_TREE);
+
+  decl = build_fn_decl (resolver_name, type);
+  decl_name = get_identifier (resolver_name);
+  SET_DECL_ASSEMBLER_NAME (decl, decl_name);
+
+  DECL_NAME (decl) = decl_name;
+  TREE_USED (decl) = 1;
+  DECL_ARTIFICIAL (decl) = 1;
+  DECL_IGNORED_P (decl) = 0;
+  /* IFUNC resolvers have to be externally visible.  */
+  TREE_PUBLIC (decl) = 1;
+  DECL_UNINLINABLE (decl) = 1;
+
+  DECL_EXTERNAL (decl) = 0;
+  DECL_EXTERNAL (dispatch_decl) = 0;
+
+  DECL_CONTEXT (decl) = NULL_TREE;
+  DECL_INITIAL (decl) = make_node (BLOCK);
+  DECL_STATIC_CONSTRUCTOR (decl) = 0;
+  TREE_READONLY (decl) = 0;
+  DECL_PURE_P (decl) = 0;
+
+  if (DECL_COMDAT_GROUP (default_decl))
+    {
+      DECL_COMDAT (decl) = DECL_COMDAT (default_decl);
+      make_decl_one_only (decl, DECL_COMDAT_GROUP (default_decl));
+    }
+  else if (TREE_PUBLIC (default_decl))
+    {
+      /* In this case, each translation unit with a call to this
+	 versioned function will put out a resolver.  Ensure it
+	 is comdat to keep just one copy.  */
+      DECL_COMDAT (decl) = 1;
+      make_decl_one_only (decl, DECL_ASSEMBLER_NAME (decl));
+    }
+  /* Build result decl and add to function_decl. */
+  t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, ptr_type_node);
+  DECL_ARTIFICIAL (t) = 1;
+  DECL_IGNORED_P (t) = 1;
+  DECL_RESULT (decl) = t;
+
+  gimplify_function_tree (decl);
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (decl));
+  current_function_decl = decl;
+  gimple_register_cfg_hooks ();
+  init_empty_tree_cfg_for_function (DECL_STRUCT_FUNCTION (decl));
+  cfun->curr_properties |=
+    (PROP_gimple_lcf | PROP_gimple_leh | PROP_cfg | PROP_ssa
+     | PROP_gimple_any);
+  cfun->curr_properties = 15;
+  new_bb = create_empty_bb (ENTRY_BLOCK_PTR);
+  make_edge (ENTRY_BLOCK_PTR, new_bb, EDGE_FALLTHRU);
+  make_edge (new_bb, EXIT_BLOCK_PTR, 0);
+  *empty_bb = new_bb;
+
+  cgraph_add_new_function (decl, true);
+  cgraph_call_function_insertion_hooks (cgraph_get_create_node (decl));
+
+  if (DECL_COMDAT_GROUP (default_decl))
+    {
+      gcc_assert (cgraph_get_node (default_decl));
+      symtab_add_to_same_comdat_group (
+	(symtab_node) cgraph_get_node (decl),
+	(symtab_node) cgraph_get_node (default_decl));
+    }
+
+  pop_cfun ();
+  current_function_decl = old_current_function_decl;
+
+  gcc_assert (dispatch_decl != NULL);
+  /* Mark dispatch_decl as "ifunc" with resolver as resolver_name.  */
+  DECL_ATTRIBUTES (dispatch_decl) 
+    = make_attribute ("ifunc", resolver_name, DECL_ATTRIBUTES (dispatch_decl));
+
+  /* Create the alias for dispatch to resolver here.  */
+  cgraph_create_function_alias (dispatch_decl, decl);
+  return decl;
+}
+
+/* Generate the dispatching code body to dispatch multi-versioned function
+   DECL.  The target hook is called to process the "target" attributes and
+   provide the code to dispatch the right function at run-time.  NODE points
+   to the dispatcher decl whose body will be created.  */
+
+static tree 
+ix86_generate_version_dispatcher_body (void *node_p)
+{
+  tree resolver_decl;
+  basic_block empty_bb;
+  VEC (tree, heap) *fn_ver_vec = NULL;
+  tree old_current_function_decl;
+  tree default_ver_decl;
+  struct cgraph_node *versn;
+  struct cgraph_node *node;
+
+  node = (cgraph_node *)node_p;
+
+  gcc_assert (node->dispatcher_function);
+
+  if (node->version_dispatcher_decl)
+    return node->version_dispatcher_decl;
+
+  default_ver_decl = node->next_function_version->symbol.decl;
+  resolver_decl = make_resolver_func (default_ver_decl,
+				      node->symbol.decl, &empty_bb);
+  node->version_dispatcher_decl = resolver_decl;
+
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (resolver_decl));
+  current_function_decl = resolver_decl;
+
+  fn_ver_vec = VEC_alloc (tree, heap, 2);
+
+  for (versn = node->next_function_version; versn;
+       versn = versn->next_function_version)
+    {
+      /* Check for virtual functions here again, as by this time it should
+	 have been determined if this function needs a vtable index or
+	 not.  This happens for methods in derived classes that override
+	 virtual methods in base classes but are not explicitly marked as
+	 virtual.  */
+      if (DECL_VINDEX (versn->symbol.decl))
+        error_at (DECL_SOURCE_LOCATION (versn->symbol.decl),
+		  "Virtual function multiversioning not supported");
+      VEC_safe_push (tree, heap, fn_ver_vec, versn->symbol.decl);
+    }
+
+  dispatch_function_versions (resolver_decl, fn_ver_vec, &empty_bb);
+
+  rebuild_cgraph_edges (); 
+  pop_cfun ();
+  current_function_decl = old_current_function_decl;
+  return resolver_decl;
+}
+
 /* This builds the processor_model struct type defined in
    libgcc/config/i386/cpuinfo.c  */
 
@@ -40559,6 +41539,9 @@ ix86_memmodel_check (unsigned HOST_WIDE_INT val)
 #undef TARGET_PROFILE_BEFORE_PROLOGUE
 #define TARGET_PROFILE_BEFORE_PROLOGUE ix86_profile_before_prologue
 
+#undef TARGET_MANGLE_DECL_ASSEMBLER_NAME
+#define TARGET_MANGLE_DECL_ASSEMBLER_NAME ix86_mangle_decl_assembler_name
+
 #undef TARGET_ASM_UNALIGNED_HI_OP
 #define TARGET_ASM_UNALIGNED_HI_OP TARGET_ASM_ALIGNED_HI_OP
 #undef TARGET_ASM_UNALIGNED_SI_OP
@@ -40652,6 +41635,17 @@ ix86_memmodel_check (unsigned HOST_WIDE_INT val)
 #undef TARGET_FOLD_BUILTIN
 #define TARGET_FOLD_BUILTIN ix86_fold_builtin
 
+#undef TARGET_COMPARE_VERSION_PRIORITY
+#define TARGET_COMPARE_VERSION_PRIORITY ix86_compare_version_priority
+
+#undef TARGET_GENERATE_VERSION_DISPATCHER_BODY
+#define TARGET_GENERATE_VERSION_DISPATCHER_BODY \
+  ix86_generate_version_dispatcher_body
+
+#undef TARGET_GET_FUNCTION_VERSIONS_DISPATCHER
+#define TARGET_GET_FUNCTION_VERSIONS_DISPATCHER \
+  ix86_get_function_versions_dispatcher
+
 #undef TARGET_ENUM_VA_LIST_P
 #define TARGET_ENUM_VA_LIST_P ix86_enum_va_list
 
@@ -40789,6 +41783,9 @@ ix86_memmodel_check (unsigned HOST_WIDE_INT val)
 #undef TARGET_OPTION_PRINT
 #define TARGET_OPTION_PRINT ix86_function_specific_print
 
+#undef TARGET_OPTION_FUNCTION_VERSIONS
+#define TARGET_OPTION_FUNCTION_VERSIONS ix86_function_versions
+
 #undef TARGET_CAN_INLINE_P
 #define TARGET_CAN_INLINE_P ix86_can_inline_p
 


* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-10-11  0:13                                                                       ` Sriraman Tallam
@ 2012-10-12 22:41                                                                         ` Sriraman Tallam
  2012-10-19 15:23                                                                           ` Diego Novillo
  0 siblings, 1 reply; 93+ messages in thread
From: Sriraman Tallam @ 2012-10-12 22:41 UTC (permalink / raw)
  To: Jason Merrill, Jan Hubicka
  Cc: Xinliang David Li, mark, nathan, H.J. Lu, Richard Guenther,
	Uros Bizjak, reply, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 5643 bytes --]

Hi Jason,

   I have attached the latest patch with more cleanups. Please let me
know what you think.

   Honza, can you please review the cgraph part?

Thanks,
-Sri.

On Wed, Oct 10, 2012 at 4:45 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> Hi Jason,
>
>    I have addressed all your comments and attached the new patch.
>
> On Fri, Oct 5, 2012 at 11:32 AM, Jason Merrill <jason@redhat.com> wrote:
>> On 08/24/2012 08:34 PM, Sriraman Tallam wrote:
>>>
>>> +         /* For function versions, their parms and types match
>>> +            but they are not duplicates.  Record function versions
>>> +            as and when they are found.  */
>>> +         if (TREE_CODE (fn) == FUNCTION_DECL
>>> +             && TREE_CODE (method) == FUNCTION_DECL
>>> +             && (DECL_FUNCTION_SPECIFIC_TARGET (fn)
>>> +                 || DECL_FUNCTION_SPECIFIC_TARGET (method))
>>> +             && targetm.target_option.function_versions (fn, method))
>>> +           {
>>> +             targetm.set_version_assembler_name (fn);
>>> +             targetm.set_version_assembler_name (method);
>>> +             continue;
>>> +           }
>>
>>
>> This seems like an odd place to be setting assembler names; better to just
>> have the existing mangle_decl_assembler_name hook add the appropriate suffix
>> when it's called normally.
>
> I moved this to mangle_decl_assembler_name. Still,  functions may go
> from not being a version to then becoming versions after a new
> definition is detected. In such cases, I explicitly call mangle_decl
> to modify the assembler name.
>
>>
>>
>>> +        Also, mark this function as needed if it is marked inline but
>>> +        is a multi-versioned function.  */
>>
>>
>> Why?  If it's used, it should be marked needed though the normal process.
>
> How do I do this? If a versioned function is marked inline, I need to
> keep it but it has no explicit callers. How do I mark that it is
> needed?
>
>>
>>> +           error_at (location_of (DECL_NAME (OVL_CURRENT (fn))),
>>> +                     "Call to multiversioned function %<%D(%A)%> with"
>>> +                     " no default version", DECL_NAME (OVL_CURRENT (fn)),
>>> +                     build_tree_list_vec (*args));
>>
>>
>> location_of just returns input_location if you ask it for the location of an
>> identifier, so you might as well use error with no explicit location.  And
>> why not print candidates->fn instead of pasting the name/args?  Also,
>> lowercase "call".
>
> I removed this since the check already happens elsewhere.
>
>>
>>> +    {
>>> +      tree dispatcher_decl = NULL;
>>> +      struct cgraph_node *node = cgraph_get_node (fn);
>>> +      if (node != NULL)
>>> +        dispatcher_decl = cgraph_get_node (fn)->version_dispatcher_decl;
>>> +      if (dispatcher_decl == NULL)
>>> +       {
>>> +         error_at (input_location, "Call to multiversioned function"
>>> +                   " without a default is not allowed");
>>> +         return NULL;
>>> +       }
>>> +      retrofit_lang_decl (dispatcher_decl);
>>> +      gcc_assert (dispatcher_decl != NULL);
>>> +      fn = dispatcher_decl;
>>> +    }
>>
>>
>> Let's move this logic into a separate function that returns the dispatcher
>> function.
>
> Done.
>
>>
>>> +      /* Both functions must be marked versioned.  */
>>> +      gcc_assert (DECL_FUNCTION_VERSIONED (cand1->fn)
>>> +                 && DECL_FUNCTION_VERSIONED (cand2->fn));
>>
>>
>> Why can't you compare a versioned function and a non-versioned one?
>
> Right, there was a big bug in my code. I have changed this now. This
> should address your question.
>
>>
>> The code in joust should go further down in the function, before the
>> handling of two declarations of the same function.
>
> Done.
>
>>
>>> +  /* For multiversioned functions, aggregate all the versions here for
>>> +     generating the dispatcher body later if necessary.  */
>>> +
>>> +  if (TREE_CODE (candidates->fn) == FUNCTION_DECL
>>> +      && DECL_FUNCTION_VERSIONED (candidates->fn))
>>> +    {
>>>
>>> +      VEC (tree, heap) *fn_ver_vec = NULL;
>>> +      struct z_candidate *ver = candidates;
>>>
>>> +      fn_ver_vec = VEC_alloc (tree, heap, 2);
>>> +      for (;ver; ver = ver->next)
>>> +        VEC_safe_push (tree, heap, fn_ver_vec, ver->fn);
>>> +      gcc_assert (targetm.get_function_versions_dispatcher);
>>> +      targetm.get_function_versions_dispatcher (fn_ver_vec);
>>> +      VEC_free (tree, heap, fn_ver_vec);
>>> +    }
>>
>>
>> This seems to assume that all the functions in the list of candidates are
>> versioned, but there might be unrelated functions from different namespaces.
>> Also, doing this every time someone calls a versioned function seems like
>> the wrong place; I would think it would be better to build up a list of
>> versions as you seed declarations, and then use that list to define the
>> dispatcher at EOF if it's needed.
>
>
> This was the bug I was referring to earlier. I have moved this to a
> separate function. I thought it is better to do this on demand. I have
> changed the code so that the aggregation and dispatcher generation
> happens exactly once.
>
>
>>
>>> +      if (TREE_CODE (decl) == FUNCTION_DECL
>>> +          && DECL_FUNCTION_VERSIONED (decl)
>>> +         && DECL_ASSEMBLER_NAME_SET_P (decl))
>>> +       write_source_name (DECL_ASSEMBLER_NAME (decl));
>>> +      else
>>> +       write_source_name (DECL_NAME (decl));
>>
>>
>> Again, I think it's better to handle the suffix via
>> mangle_decl_assembler_name.
>
> Removed.
>
>
> Thanks for the comments. Please let me know what you think about the new patch.
>
> -Sri.
>
>>
>> Jason
>>

[-- Attachment #2: mv_fe_patch_10122012.txt --]
[-- Type: text/plain, Size: 76191 bytes --]

Overview of the patch which adds support to specify function versions.  This is
only enabled for target i386.

Example:

int foo ();  /* Default version */
int foo () __attribute__ ((target("avx,popcnt")));/*Specialized for avx and popcnt */
int foo () __attribute__ ((target("arch=core2,ssse3")));/*Specialized for core2 and ssse3*/

int main ()
{
 int (*p)() = &foo;
 return foo () + (*p)();
}

int foo ()
{
 return 0;
}

int __attribute__ ((target("avx,popcnt")))
foo ()
{
 return 0;
}

int __attribute__ ((target("arch=core2,ssse3")))
foo ()
{
 return 0;
}

The above example has foo defined 3 times, but all 3 definitions of foo are
different versions of the same function. The call to foo in main, directly and
via a pointer, are calls to the multi-versioned function foo which is dispatched
to the right foo at run-time.
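
As a rough illustration only (this is not code from the patch), the run-time
selection can be pictured as the hand-written dispatcher below. The names
FeatureSet, has_all and resolve_foo are hypothetical, the feature check is
simulated with a set of strings rather than a real CPU query, and the versions
return distinct values (unlike the example above) so that the choice is
observable.

```cpp
#include <initializer_list>
#include <set>
#include <string>

// Hypothetical stand-in for the CPU feature query done at run-time.
using FeatureSet = std::set<std::string>;
using FooPtr = int (*)();

// Distinct return values so the selected version can be observed.
int foo_default() { return 0; }
int foo_avx_popcnt() { return 1; }
int foo_core2_ssse3() { return 2; }

// True if every feature in NEEDED is present in CPU.
bool has_all(const FeatureSet& cpu, std::initializer_list<const char*> needed) {
  for (const char* f : needed)
    if (cpu.count(f) == 0) return false;
  return true;
}

// Picks the first version whose requirements are met and falls back to the
// default version.  The check order here is an assumption of this sketch.
FooPtr resolve_foo(const FeatureSet& cpu) {
  if (has_all(cpu, {"avx", "popcnt"})) return foo_avx_popcnt;
  if (has_all(cpu, {"arch=core2", "ssse3"})) return foo_core2_ssse3;
  return foo_default;
}
```

Both a direct call and a call through a pointer would go through the same
selection, which is why the pointer taken in main still reaches the right
version.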

Front-end changes:

The front-end changes are calls at appropriate places to target hooks that
determine the following:

* Determine if two function decls with the same signature are versions.
* Determine the assembler name of a function version.
* Generate the dispatcher function for a set of function versions.
* Compare versions to see if one has a higher priority over the other.

All the implementation happens in the target-specific config/i386/i386.c.

What does the patch do?

* Tracking decls that correspond to function versions of function
name, say "foo":

When the front-end sees more than one decl for "foo", it calls a target hook to
determine if they are versions. To prevent duplicate definition errors with other
versions of "foo", the "decls_match" function in cp/decl.c is made to return
false when two decls are deemed versions by the target. This causes all function
versions of "foo" to be added to the overload list of "foo".

* Change the assembler names of the function versions.

For i386, the target changes the assembler names of the function versions by
appending the sorted list of args to "target" as a suffix to the function name
of "foo". For example, the assembler name of
"void foo () __attribute__ ((target ("sse4")))" will become _Z3foov.sse4.  The
target hook mangle_decl_assembler_name is used for this.
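
A minimal sketch of the naming scheme, for illustration only: the helper name
version_assembler_name is hypothetical, and joining multiple sorted args with
"_" is an assumption of this sketch, since the text above only shows the
single-arg case.

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

// Builds "<base>.<sorted target args>" from a comma-separated arg string.
// The "_" separator between args is an assumption of this sketch.
std::string version_assembler_name(const std::string& base,
                                   const std::string& target_args) {
  std::vector<std::string> args;
  std::istringstream in(target_args);
  std::string item;
  while (std::getline(in, item, ','))
    if (!item.empty()) args.push_back(item);

  // No target args: treat it as the default version and keep the name.
  if (args.empty()) return base;

  // Sorting makes the suffix independent of the order the user wrote.
  std::sort(args.begin(), args.end());

  std::string name = base + ".";
  for (const std::string& a : args) {
    if (&a != &args.front()) name += "_";
    name += a;
  }
  return name;
}
```

Sorting first means target("avx,popcnt") and target("popcnt,avx") end up with
the same suffix in this sketch.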

* Overload resolution:

Function "build_over_call" in cp/call.c sees a call to function
"foo", which is multi-versioned. The overload resolution happens in
function "joust" in "cp/call.c". Here, the call to "foo" has all
possible versions of "foo" as candidates. All the candidates of "foo" are
stored in the cgraph data structures. Each version of foo is chained in a 
doubly-linked list with the default function as the first element.  This allows
any pass to access all the semantically identical versions. A call to a
multi-versioned function will be replaced by a call to a dispatcher function,
determined by a target hook, to execute the right function version at run-time.
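
The chaining can be sketched roughly as follows. This is a simplified stand-in
and not the cgraph code: Node, insert_after, chain_versions and walk are
hypothetical names, and the dispatcher node is used as the head of the list
with the default version placed directly after it.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for a callgraph node that takes part in a version chain.
struct Node {
  explicit Node(std::string n) : name(std::move(n)), prev(nullptr), next(nullptr) {}
  std::string name;
  Node* prev;
  Node* next;
};

// Links N into the list immediately after POS.
void insert_after(Node* pos, Node* n) {
  assert(pos != nullptr && n != nullptr);
  n->next = pos->next;
  n->prev = pos;
  if (pos->next) pos->next->prev = n;
  pos->next = n;
}

// Chains the versions behind the dispatcher; the default version is inserted
// last so that it ends up as the first element after the dispatcher.
void chain_versions(Node* dispatcher, Node* default_version,
                    const std::vector<Node*>& others) {
  for (Node* v : others) insert_after(dispatcher, v);
  insert_after(dispatcher, default_version);
}

// Collects the names of all versions reachable from the dispatcher.
std::vector<std::string> walk(const Node* dispatcher) {
  std::vector<std::string> names;
  for (const Node* n = dispatcher->next; n; n = n->next) names.push_back(n->name);
  return names;
}
```

Any pass that holds the dispatcher node can then reach every version by
following the next pointers.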

Optimization to directly call a version when possible:
Also, in joust, where overload resolution happens, a multiversioned function
resolution is made to return the most specialized version.  This is the version
that will be checked for dispatching first and is determined by the target.
Now, if the caller can inline this function version then a direct call is made
to this function version rather than go through the dispatcher. When a direct
call cannot be made, a call to the dispatcher function is created.

* Creating the dispatcher body.

The dispatcher body, called the resolver, is made only when there is a call to a
multiversioned function dispatcher or when the address of a function is taken.
This is generated during build_cgraph_edges for a call or
cgraph_mark_address_taken for a pointer reference. This is done by another
target hook.
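
The on-demand behaviour can be pictured with the sketch below, which is only
an illustration: DispatcherNode, on_call_edge and on_address_taken are
hypothetical names, and the ".resolver" suffix is a placeholder rather than
the name the target actually produces.

```cpp
#include <string>
#include <utility>

// Simplified stand-in for the dispatcher's callgraph node.
struct DispatcherNode {
  explicit DispatcherNode(std::string n) : name(std::move(n)), times_generated(0) {}
  std::string name;
  std::string resolver;  // empty until the body has been generated
  int times_generated;   // bookkeeping for this sketch only
};

// Generates the resolver on first use and returns the cached one afterwards.
const std::string& generate_dispatcher_body(DispatcherNode& node) {
  if (!node.resolver.empty()) return node.resolver;  // already generated
  node.resolver = node.name + ".resolver";           // placeholder name
  ++node.times_generated;
  return node.resolver;
}

// The two triggers described above: a call edge to the dispatcher, or the
// address of the dispatcher being taken.
void on_call_edge(DispatcherNode& callee) { generate_dispatcher_body(callee); }
void on_address_taken(DispatcherNode& node) { generate_dispatcher_body(node); }
```

Whichever trigger fires first does the work, and later triggers reuse the
result.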

* Dispatch ordering.

The order in which the function versions are checked during dispatch is based
on a priority value assigned to the ISA that each version targets. More
specialized versions are checked for dispatching first.  This is to mitigate the
ambiguity that can arise when more than one function version is valid for
execution on a particular platform.  This is not a perfect solution and, in the
future, the user should be allowed to assign a dispatching priority value to
each version.
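
A small sketch of the ordering idea, for illustration only: the priority
numbers, VersionInfo, feature_priority and dispatch_order are hypothetical and
do not reflect the values used by the target. The comparison follows the
convention of the compare_version_priority hook (positive when the first
argument has higher priority).

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

struct VersionInfo {
  std::string name;
  std::vector<std::string> features;  // empty for the default version
};

// Hypothetical priority table; unknown features get priority 0.
int feature_priority(const std::string& feature) {
  static const std::map<std::string, int> table = {
      {"sse2", 1}, {"ssse3", 2}, {"popcnt", 3}, {"avx", 4}};
  std::map<std::string, int>::const_iterator it = table.find(feature);
  return it == table.end() ? 0 : it->second;
}

// A version is ranked by its highest-priority feature; the default is 0.
int version_priority(const VersionInfo& v) {
  int best = 0;
  for (const std::string& f : v.features)
    best = std::max(best, feature_priority(f));
  return best;
}

// Positive if A has higher priority, negative if B does, 0 if equal.
int compare_version_priority(const VersionInfo& a, const VersionInfo& b) {
  return version_priority(a) - version_priority(b);
}

// Returns the versions in the order they would be checked at dispatch time.
std::vector<VersionInfo> dispatch_order(std::vector<VersionInfo> versions) {
  std::stable_sort(versions.begin(), versions.end(),
                   [](const VersionInfo& a, const VersionInfo& b) {
                     return compare_version_priority(a, b) > 0;
                   });
  return versions;
}
```

With this ranking the default version is always checked last, so it acts as
the fallback when no specialized version applies.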

	* doc/tm.texi.in (TARGET_OPTION_FUNCTION_VERSIONS): New hook description.
	* (TARGET_COMPARE_VERSION_PRIORITY): New hook description.
	* (TARGET_GET_FUNCTION_VERSIONS_DISPATCHER): New hook description.
	* (TARGET_GENERATE_VERSION_DISPATCHER_BODY): New hook description.
	* doc/tm.texi: Regenerate.
	* cgraphbuild.c (build_cgraph_edges): Generate body of multiversion
	function dispatcher.
	* c-family/c-common.c (handle_target_attribute): Warn for invalid attributes.
	* target.def (compare_version_priority): New target hook.
	* (generate_version_dispatcher_body): New target hook.
	* (get_function_versions_dispatcher): New target hook.
	* (function_versions): New target hook.
	* cgraph.c (cgraph_mark_address_taken_node): Generate body of multiversion
	function dispatcher.
	* cgraph.h (cgraph_node): New members function_version,
	dispatcher_function.
	* tree.h (DECL_FUNCTION_VERSIONED): New macro.
	(tree_function_decl): New bit-field versioned_function.
	* testsuite/g++.dg/mv1.C: New test.
	* testsuite/g++.dg/mv2.C: New test.
	* testsuite/g++.dg/mv3.C: New test.
	* testsuite/g++.dg/mv4.C: New test.
	* cp/class.c (add_method): Change assembler names of function versions.
	(resolve_address_of_overloaded_function): Save all function
	version candidates. Create dispatcher decl and return address of
	dispatcher instead.
	* cp/decl.c (decls_match): Make decls unmatched for versioned
	functions.
	(duplicate_decls): Remove ambiguity for versioned functions. 
	(cxx_comdat_group): Make comdat group of versioned functions be the
	same.
	* cp/error.c (dump_exception_spec): Dump assembler name for function
	versions.
	* cp/semantics.c (expand_or_defer_fn_1): Mark as needed versioned
	functions that are also marked inline.
	* cp/decl2.c (check_classfn): Check attributes of versioned functions
	for match.
	* cp/call.c (build_new_function_call): Check if versioned functions
	have a default version.
	(build_over_call): Make calls to multiversioned functions
	to call the dispatcher.
	(joust): For calls to multi-versioned functions, make the most
	specialized function version win.
	(tourney): Generate dispatcher decl for function versions.
	(get_function_version_dispatcher): New function.
	(generate_function_versions_dispatcher): New function.
	* Makefile.in: Add multiversion.o
	* config/i386/i386.c (add_condition_to_bb): New function.
	(get_builtin_code_for_version): New function.
	(ix86_compare_version_priority): New function.
	(feature_compare): New function.
	(dispatch_function_versions): New function.
	* (ix86_function_versions): New function.
	* (attr_strcmp): New function.
	* (sorted_attr_string): New function.
	* (ix86_mangle_function_version_assembler_name): New function.
	* (ix86_mangle_decl_assembler_name): New function.
	* (make_name): New function.
	* (make_dispatcher_decl): New function.
	* (is_function_default_version): New function.
	* (ix86_get_function_versions_dispatcher): New function.
	* (make_attribute): New function.
	* (make_resolver_func): New function.
	* (ix86_generate_version_dispatcher_body): New function.
	* (TARGET_COMPARE_VERSION_PRIORITY): New macro.
	* (TARGET_GET_FUNCTION_VERSIONS_DISPATCHER): New macro.
	* (TARGET_GENERATE_VERSION_DISPATCHER_BODY): New macro.
	* (TARGET_OPTION_FUNCTION_VERSIONS): New macro.
	* (TARGET_MANGLE_DECL_ASSEMBLER_NAME): New macro.

Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi	(revision 192378)
+++ gcc/doc/tm.texi	(working copy)
@@ -9913,6 +9913,11 @@ changed via the optimize attribute or pragma, see
 @code{TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE}
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_OPTION_FUNCTION_VERSIONS (tree @var{decl1}, tree @var{decl2})
+This target hook returns @code{true} if @var{decl1} and @var{decl2} are
+versions of the same function.
+@end deftypefn
+
 @deftypefn {Target Hook} bool TARGET_CAN_INLINE_P (tree @var{caller}, tree @var{callee})
 This target hook returns @code{false} if the @var{caller} function
 cannot inline @var{callee}, based on target specific information.  By
@@ -10930,6 +10935,27 @@ The result is another tree containing a simplified
 call's result.  If @var{ignore} is true the value will be ignored.
 @end deftypefn
 
+@deftypefn {Target Hook} int TARGET_COMPARE_VERSION_PRIORITY (tree @var{decl1}, tree @var{decl2})
+This hook is used to compare the target attributes in two functions to
+figure out which function's features get higher priority.  This is used
+during multi-versioning to figure out the order of dispatching. @var{decl1}
+and @var{decl2} are the two function decls that will be compared.
+@end deftypefn
+
+@deftypefn {Target Hook} tree TARGET_GET_FUNCTION_VERSIONS_DISPATCHER (void *@var{arglist})
+This hook is used to get the dispatcher function for a set of function
+versions.  The dispatcher function is called to invoke the right function
+version at run-time. @var{arglist} is the vector of function versions
+that should be considered for dispatch.
+@end deftypefn
+
+@deftypefn {Target Hook} tree TARGET_GENERATE_VERSION_DISPATCHER_BODY (void *@var{arg})
+This hook is used to generate the dispatcher logic to invoke the right
+function version at runtime for a given set of function versions.
+@var{arg} points to the callgraph node of the dispatcher function whose
+body must be generated.
+@end deftypefn
+
 @deftypefn {Target Hook} {const char *} TARGET_INVALID_WITHIN_DOLOOP (const_rtx @var{insn})
 
 Take an instruction in @var{insn} and return NULL if it is valid within a
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in	(revision 192378)
+++ gcc/doc/tm.texi.in	(working copy)
@@ -9782,6 +9782,11 @@ changed via the optimize attribute or pragma, see
 @code{TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE}
 @end deftypefn
 
+@hook TARGET_OPTION_FUNCTION_VERSIONS
+This target hook returns @code{true} if @var{decl1} and @var{decl2} are
+versions of the same function.
+@end deftypefn
+
 @hook TARGET_CAN_INLINE_P
 This target hook returns @code{false} if the @var{caller} function
 cannot inline @var{callee}, based on target specific information.  By
@@ -10788,6 +10793,27 @@ The result is another tree containing a simplified
 call's result.  If @var{ignore} is true the value will be ignored.
 @end deftypefn
 
+@hook TARGET_COMPARE_VERSION_PRIORITY
+This hook is used to compare the target attributes in two functions to
+figure out which function's features get higher priority.  This is used
+during multi-versioning to figure out the order of dispatching. @var{decl1}
+and @var{decl2} are the two function decls that will be compared.
+@end deftypefn
+
+@hook TARGET_GET_FUNCTION_VERSIONS_DISPATCHER
+This hook is used to get the dispatcher function for a set of function
+versions.  The dispatcher function is called to invoke the right function
+version at run-time. @var{arglist} is the vector of function versions
+that should be considered for dispatch.
+@end deftypefn
+
+@hook TARGET_GENERATE_VERSION_DISPATCHER_BODY
+This hook is used to generate the dispatcher logic to invoke the right
+function version at runtime for a given set of function versions.
+@var{arg} points to the callgraph node of the dispatcher function whose
+body must be generated.
+@end deftypefn
+
 @hook TARGET_INVALID_WITHIN_DOLOOP
 
 Take an instruction in @var{insn} and return NULL if it is valid within a
Index: gcc/cgraphbuild.c
===================================================================
--- gcc/cgraphbuild.c	(revision 192378)
+++ gcc/cgraphbuild.c	(working copy)
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-utils.h"
 #include "except.h"
 #include "ipa-inline.h"
+#include "target.h"
 
 /* Context of record_reference.  */
 struct record_reference_ctx
@@ -288,7 +289,6 @@ mark_store (gimple stmt, tree t, void *data)
      }
   return false;
 }
-
 /* Create cgraph edges for function calls.
    Also look for functions and variables having addresses taken.  */
 
@@ -317,8 +317,22 @@ build_cgraph_edges (void)
 							 bb);
 	      decl = gimple_call_fndecl (stmt);
 	      if (decl)
-		cgraph_create_edge (node, cgraph_get_create_node (decl),
-				    stmt, bb->count, freq);
+		{
+		  struct cgraph_node *callee = cgraph_get_create_node (decl);
+	          /* If a call to a multiversioned function dispatcher is
+		     found, generate the body to dispatch the right function
+		     at run-time.  */
+		  if (callee->dispatcher_function)
+		    {
+		      tree resolver_decl;
+		      gcc_assert (callee->function_version.next);
+		      gcc_assert (targetm.generate_version_dispatcher_body);
+		      resolver_decl
+			 = targetm.generate_version_dispatcher_body (callee);
+		      gcc_assert (resolver_decl != NULL_TREE);
+		    }
+		  cgraph_create_edge (node, callee, stmt, bb->count, freq);
+	        }
 	      else
 		cgraph_create_indirect_edge (node, stmt,
 					     gimple_call_flags (stmt),
Index: gcc/c-family/c-common.c
===================================================================
--- gcc/c-family/c-common.c	(revision 192378)
+++ gcc/c-family/c-common.c	(working copy)
@@ -8601,9 +8601,22 @@ handle_target_attribute (tree *node, tree name, tr
       warning (OPT_Wattributes, "%qE attribute ignored", name);
       *no_add_attrs = true;
     }
-  else if (! targetm.target_option.valid_attribute_p (*node, name, args,
-						      flags))
-    *no_add_attrs = true;
+  else
+    {
+      /* When a target attribute is invalid, it may also be because the
+	 target for the compilation unit and the attribute match.  For
+         instance, target attribute "xxx" is invalid when -mxxx is used.
+         When used with multiversioning, removing the attribute will lead
+         to duplicate definitions if a default version is provided.
+	 So, generate a warning here and remove the attribute.  */
+      if (!targetm.target_option.valid_attribute_p (*node, name, args, flags))
+	{
+	  warning (OPT_Wattributes,
+		   "Invalid target attribute in function %qE, ignored.",
+		   *node);
+	  *no_add_attrs = true;
+	}
+    }
 
   return NULL_TREE;
 }
Index: gcc/target.def
===================================================================
--- gcc/target.def	(revision 192378)
+++ gcc/target.def	(working copy)
@@ -1298,6 +1298,31 @@ DEFHOOK
  tree, (tree fndecl, int n_args, tree *argp, bool ignore),
  hook_tree_tree_int_treep_bool_null)
 
+/* Target hook to compare the priority of target features in function DECL1
+   and DECL2.  It returns positive value if DECL1 is higher priority,
+   negative value if DECL2 is higher priority and 0 if they are the
+   same.  */
+DEFHOOK
+(compare_version_priority,
+ "",
+ int, (tree decl1, tree decl2), NULL)
+
+/* Target hook to generate the dispatcher body for a function version
+   dispatcher ARG, which is a cgraph_node pointer.  */
+
+DEFHOOK
+(generate_version_dispatcher_body,
+ "",
+ tree, (void *arg), NULL) 
+
+/* Target hook to generate a function version dispatcher DECL for the list
+   of function versions in arglist, which is a vector of decls.  */
+
+DEFHOOK
+(get_function_versions_dispatcher,
+ "",
+ tree, (void *arglist), NULL)
+
 /* Returns a code for a target-specific builtin that implements
    reciprocal of the function, or NULL_TREE if not available.  */
 DEFHOOK
@@ -2725,6 +2750,14 @@ DEFHOOK
  void, (void),
  hook_void_void)
 
+/* Returns true if DECL1 and DECL2 are versions of the same function.  */
+
+DEFHOOK
+(function_versions,
+ "",
+ bool, (tree decl1, tree decl2),
+ hook_bool_tree_tree_false)
+
 /* Function to determine if one function can inline another function.  */
 #undef HOOK_PREFIX
 #define HOOK_PREFIX "TARGET_"
Index: gcc/cgraph.c
===================================================================
--- gcc/cgraph.c	(revision 192378)
+++ gcc/cgraph.c	(working copy)
@@ -1277,6 +1277,15 @@ cgraph_mark_address_taken_node (struct cgraph_node
   node->symbol.address_taken = 1;
   node = cgraph_function_or_thunk_node (node, NULL);
   node->symbol.address_taken = 1;
+  /* If the address of a multiversioned function dispatcher is taken,
+     generate the body to dispatch the right function at run-time.  This
+     is needed as the address can be used to do an indirect call.  */
+  if (node->dispatcher_function)
+    {
+      gcc_assert (node->function_version.next);
+      gcc_assert (targetm.generate_version_dispatcher_body);
+      targetm.generate_version_dispatcher_body (node);
+    }
 }
 
 /* Return local info for the compiled function.  */
Index: gcc/cgraph.h
===================================================================
--- gcc/cgraph.h	(revision 192378)
+++ gcc/cgraph.h	(working copy)
@@ -228,6 +228,26 @@ struct GTY(()) cgraph_node {
   struct cgraph_node *prev_sibling_clone;
   struct cgraph_node *clones;
   struct cgraph_node *clone_of;
+
+  /* Function Multiversioning info.  */
+  struct {
+    /* Chains all the semantically identical function versions.  The
+       first function in this chain is the default function.  */
+    struct cgraph_node *prev;
+    /* If this node is a dispatcher for function versions, this points
+       to the default function version, the first function in the chain.  */
+    struct cgraph_node *next;
+    /* If this node corresponds to a function version, this points
+       to the dispatcher function decl, which is the function that must
+       be called to execute the right function version at run-time.
+
+       If this node is a dispatcher (if dispatcher_function is true)
+       for function versions, this points to resolver function, which
+       holds the function body of the dispatcher.  The dispatcher decl
+       is an alias to the resolver function decl.  */
+    tree dispatcher_resolver;
+  } GTY((skip(""))) function_version;
+  
   /* For functions with many calls sites it holds map from call expression
      to the edge to speed up cgraph_edge function.  */
   htab_t GTY((param_is (struct cgraph_edge))) call_site_hash;
@@ -279,6 +299,8 @@ struct GTY(()) cgraph_node {
   /* ?? We should be able to remove this.  We have enough bits in
      cgraph to calculate it.  */
   unsigned tm_clone : 1;
+  /* True if this decl is a dispatcher for function versions.  */
+  unsigned dispatcher_function : 1;
 };
 
 DEF_VEC_P(symtab_node);
Index: gcc/tree.h
===================================================================
--- gcc/tree.h	(revision 192378)
+++ gcc/tree.h	(working copy)
@@ -3472,6 +3472,12 @@ extern VEC(tree, gc) **decl_debug_args_insert (tre
 #define DECL_FUNCTION_SPECIFIC_OPTIMIZATION(NODE) \
    (FUNCTION_DECL_CHECK (NODE)->function_decl.function_specific_optimization)
 
+/* In FUNCTION_DECL, this is set if this function has other versions generated
+   using "target" attributes.  The default version is the one which does not
+   have any "target" attribute set.  */
+#define DECL_FUNCTION_VERSIONED(NODE)\
+   (FUNCTION_DECL_CHECK (NODE)->function_decl.versioned_function)
+
 /* FUNCTION_DECL inherits from DECL_NON_COMMON because of the use of the
    arguments/result/saved_tree fields by front ends.   It was either inherit
    FUNCTION_DECL from non_common, or inherit non_common from FUNCTION_DECL,
@@ -3516,8 +3522,8 @@ struct GTY(()) tree_function_decl {
   unsigned looping_const_or_pure_flag : 1;
   unsigned has_debug_args_flag : 1;
   unsigned tm_clone_flag : 1;
-
-  /* 1 bit left */
+  unsigned versioned_function : 1;
+  /* No bits left.  */
 };
 
 /* The source language of the translation-unit.  */
Index: gcc/testsuite/g++.dg/mv2.C
===================================================================
--- gcc/testsuite/g++.dg/mv2.C	(revision 0)
+++ gcc/testsuite/g++.dg/mv2.C	(revision 0)
@@ -0,0 +1,119 @@
+/* Test case to check if Multiversioning chooses the correct
+   dispatching order when versions are for various ISAs.  */
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-require-ifunc "" }  */
+/* { dg-options "-O2 -mno-sse -mno-mmx -mno-popcnt -mno-avx" } */
+
+#include <assert.h>
+
+/* Default version.  */
+int foo ();
+/* The dispatch checks should be in the exact reverse order of the
+   declarations below.  */
+int foo () __attribute__ ((target ("mmx")));
+int foo () __attribute__ ((target ("sse")));
+int foo () __attribute__ ((target ("sse2")));
+int foo () __attribute__ ((target ("sse3")));
+int foo () __attribute__ ((target ("ssse3")));
+int foo () __attribute__ ((target ("sse4.1")));
+int foo () __attribute__ ((target ("sse4.2")));
+int foo () __attribute__ ((target ("popcnt")));
+int foo () __attribute__ ((target ("avx")));
+int foo () __attribute__ ((target ("avx2")));
+
+int main ()
+{
+
+  int val = foo ();
+
+  if (__builtin_cpu_supports ("avx2"))
+    assert (val == 1);
+  else if (__builtin_cpu_supports ("avx"))
+    assert (val == 2);
+  else if (__builtin_cpu_supports ("popcnt"))
+    assert (val == 3);
+  else if (__builtin_cpu_supports ("sse4.2"))
+    assert (val == 4);
+  else if (__builtin_cpu_supports ("sse4.1"))
+    assert (val == 5);
+  else if (__builtin_cpu_supports ("ssse3"))
+    assert (val == 6);
+  else if (__builtin_cpu_supports ("sse3"))
+    assert (val == 7);
+  else if (__builtin_cpu_supports ("sse2"))
+    assert (val == 8);
+  else if (__builtin_cpu_supports ("sse"))
+    assert (val == 9);
+  else if (__builtin_cpu_supports ("mmx"))
+    assert (val == 10);
+  else
+    assert (val == 0);
+
+  return 0;
+}
+
+int
+foo ()
+{
+  return 0;
+}
+
+int __attribute__ ((target("mmx")))
+foo ()
+{
+  return 10;
+}
+
+int __attribute__ ((target("sse")))
+foo ()
+{
+  return 9;
+}
+
+int __attribute__ ((target("sse2")))
+foo ()
+{
+  return 8;
+}
+
+int __attribute__ ((target("sse3")))
+foo ()
+{
+  return 7;
+}
+
+int __attribute__ ((target("ssse3")))
+foo ()
+{
+  return 6;
+}
+
+int __attribute__ ((target("sse4.1")))
+foo ()
+{
+  return 5;
+}
+
+int __attribute__ ((target("sse4.2")))
+foo ()
+{
+  return 4;
+}
+
+int __attribute__ ((target("popcnt")))
+foo ()
+{
+  return 3;
+}
+
+int __attribute__ ((target("avx")))
+foo ()
+{
+  return 2;
+}
+
+int __attribute__ ((target("avx2")))
+foo ()
+{
+  return 1;
+}
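
The dispatch order that mv2.C checks can be modelled outside the compiler. What follows is only a minimal sketch under stated assumptions, not the dispatcher the patch generates: the feature bits, `struct version`, and `select_version` are hypothetical names introduced for illustration, and the CPU's feature set is passed in as a plain bitmask rather than queried with `__builtin_cpu_supports`. It picks the highest-priority version whose required features are all present and falls back to the default version, which requires nothing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical feature bits, for illustration only.  */
enum
{
  F_MMX  = 1u << 0,
  F_SSE  = 1u << 1,
  F_SSE2 = 1u << 2,
  F_AVX  = 1u << 3,
  F_AVX2 = 1u << 4
};

/* One function version: the features it requires (0 for the default
   version), its dispatch priority, and an id standing in for the
   value the version would return.  */
struct version
{
  unsigned required;
  unsigned priority;
  int id;
};

/* Return the id of the highest-priority version in V[0..N-1] whose
   required features are all present in CPU_FEATURES, or -1 if no
   version is usable (that is, when there is no default version).  */
static int
select_version (const struct version *v, size_t n, unsigned cpu_features)
{
  const struct version *best = NULL;
  size_t i;

  assert (v != NULL || n == 0);

  for (i = 0; i < n; i++)
    {
      /* Skip versions that need a feature the CPU lacks.  */
      if ((v[i].required & cpu_features) != v[i].required)
        continue;
      if (best == NULL || v[i].priority > best->priority)
        best = &v[i];
    }

  return best ? best->id : -1;
}
```

Scanning for the maximum priority gives the same answer as testing the versions one by one in descending priority order, which is how the test above expects the dispatcher to behave.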
Index: gcc/testsuite/g++.dg/mv4.C
===================================================================
--- gcc/testsuite/g++.dg/mv4.C	(revision 0)
+++ gcc/testsuite/g++.dg/mv4.C	(revision 0)
@@ -0,0 +1,23 @@
+/* Test case to check if the compiler generates an error message
+   when the default version of a multiversioned function is absent
+   and its pointer is taken.  */
+
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 -mno-sse -mno-popcnt" } */
+
+int __attribute__ ((target ("sse")))
+foo ()
+{
+  return 1;
+}
+int __attribute__ ((target ("popcnt")))
+foo ()
+{
+  return 0;
+}
+
+int main ()
+{
+  int (*p)() = &foo; /* { dg-error "Pointer to a multiversioned function without a default is not allowed" {} } */
+  return (*p)();
+}
Index: gcc/testsuite/g++.dg/mv1.C
===================================================================
--- gcc/testsuite/g++.dg/mv1.C	(revision 0)
+++ gcc/testsuite/g++.dg/mv1.C	(revision 0)
@@ -0,0 +1,130 @@
+/* Test case to check if Multiversioning works.  */
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-require-ifunc "" }  */
+/* { dg-options "-O2 -fPIC -mno-avx -mno-popcnt" } */
+
+#include <assert.h>
+
+/* Default version.  */
+int foo ();
+/* The other versions of foo.  Mix up the ordering and
+   check that dispatching follows the order of priority.  */
+/* Check combination of target attributes.  */
+int foo () __attribute__ ((target("arch=corei7,popcnt")));
+/* The target operands in this declaration and the definition are re-ordered.
+   This should still work.  */
+int foo () __attribute__ ((target("ssse3,avx2")));
+
+/* Check for all target attributes for which dispatchers are available.  */
+/* Check arch= */
+int foo () __attribute__((target("arch=core2")));
+int foo () __attribute__((target("arch=corei7")));
+int foo () __attribute__((target("arch=atom")));
+/* Check ISAs  */
+int foo () __attribute__((target("avx")));
+int foo () __attribute__ ((target("arch=core2,sse4.2")));
+/* Check more arch=.  */
+int foo () __attribute__((target("arch=amdfam10")));
+int foo () __attribute__((target("arch=bdver1")));
+int foo () __attribute__((target("arch=bdver2")));
+
+int (*p)() = &foo;
+int main ()
+{
+  int val = foo ();
+  assert (val ==  (*p)());
+
+  /* Check in the exact same order in which the dispatching
+     is expected to happen.  */
+  if (__builtin_cpu_is ("bdver1"))
+    assert (val == 1);
+  else if (__builtin_cpu_is ("bdver2"))
+    assert (val == 2);
+  else if (__builtin_cpu_supports ("avx2")
+	   && __builtin_cpu_supports ("ssse3"))
+    assert (val == 3);
+  else if (__builtin_cpu_supports ("avx"))
+    assert (val == 4);
+  else if (__builtin_cpu_is ("corei7")
+	   && __builtin_cpu_supports ("popcnt"))
+    assert (val == 5);
+  else if (__builtin_cpu_is ("corei7"))
+    assert (val == 6);
+  else if (__builtin_cpu_is ("amdfam10h"))
+    assert (val == 7);
+  else if (__builtin_cpu_is ("core2")
+	   && __builtin_cpu_supports ("sse4.2"))
+    assert (val == 8);
+  else if (__builtin_cpu_is ("core2"))
+    assert (val == 9);
+  else if (__builtin_cpu_is ("atom"))
+    assert (val == 10);
+  else
+    assert (val == 0);
+  
+  return 0;
+}
+
+int foo ()
+{
+  return 0;
+}
+
+int __attribute__ ((target("arch=corei7,popcnt")))
+foo ()
+{
+  return 5;
+}
+int __attribute__ ((target("avx2,ssse3")))
+foo ()
+{
+  return 3;
+}
+
+int __attribute__ ((target("arch=core2")))
+foo ()
+{
+  return 9;
+}
+
+int __attribute__ ((target("arch=corei7")))
+foo ()
+{
+  return 6;
+}
+
+int __attribute__ ((target("arch=atom")))
+foo ()
+{
+  return 10;
+}
+
+int __attribute__ ((target("avx")))
+foo ()
+{
+  return 4;
+}
+
+int __attribute__ ((target("arch=core2,sse4.2")))
+foo ()
+{
+  return 8;
+}
+
+int __attribute__ ((target("arch=amdfam10")))
+foo ()
+{
+  return 7;
+}
+
+int __attribute__ ((target("arch=bdver1")))
+foo ()
+{
+  return 1;
+}
+
+int __attribute__ ((target("arch=bdver2")))
+foo ()
+{
+  return 2;
+}
Index: gcc/testsuite/g++.dg/mv3.C
===================================================================
--- gcc/testsuite/g++.dg/mv3.C	(revision 0)
+++ gcc/testsuite/g++.dg/mv3.C	(revision 0)
@@ -0,0 +1,37 @@
+/* Test case to check if a call to a multiversioned function
+   is replaced with a direct call to the particular version when
+   the most specialized version's target attributes match the
+   caller.  
+  
+   In this program, foo is multiversioned but there is no default
+   function.  This is an error if the call has to go through a
+   dispatcher.  However, the call to foo in bar can be replaced
+   with a direct call to the popcnt version of foo.  Hence, this
+   test should pass.  */
+
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 -mno-sse -mno-popcnt" } */
+
+
+/* No default version: every definition of foo has a target attribute.  */
+int __attribute__ ((target ("sse")))
+foo ()
+{
+  return 1;
+}
+int __attribute__ ((target ("popcnt")))
+foo ()
+{
+  return 0;
+}
+
+int __attribute__ ((target ("popcnt")))
+bar ()
+{
+  return foo ();
+}
+
+int main ()
+{
+  return bar ();
+}
Index: gcc/cp/class.c
===================================================================
--- gcc/cp/class.c	(revision 192378)
+++ gcc/cp/class.c	(working copy)
@@ -1087,7 +1087,32 @@ add_method (tree type, tree method, tree using_dec
 	      || same_type_p (TREE_TYPE (fn_type),
 			      TREE_TYPE (method_type))))
 	{
-	  if (using_decl)
+	  /* For function versions, their parms and types match
+	     but they are not duplicates.  Record function versions
+	     as and when they are found.  */
+	  if (TREE_CODE (fn) == FUNCTION_DECL
+	      && TREE_CODE (method) == FUNCTION_DECL
+	      && (DECL_FUNCTION_SPECIFIC_TARGET (fn)
+		  || DECL_FUNCTION_SPECIFIC_TARGET (method))
+	      && targetm.target_option.function_versions (fn, method))
+ 	    {
+	      /* Mark functions as versions if necessary.  Modify the mangled
+		 decl name if necessary.  */
+	      if (!DECL_FUNCTION_VERSIONED (fn))
+		{
+		  DECL_FUNCTION_VERSIONED (fn) = 1;
+		  if (DECL_ASSEMBLER_NAME_SET_P (fn))
+		    mangle_decl (fn);
+		}
+	      if (!DECL_FUNCTION_VERSIONED (method))
+		{
+		  DECL_FUNCTION_VERSIONED (method) = 1;
+		  if (DECL_ASSEMBLER_NAME_SET_P (method))
+		    mangle_decl (method);
+		}
+	      continue;
+	    }
+	  else if (using_decl)
 	    {
 	      if (DECL_CONTEXT (fn) == type)
 		/* Defer to the local function.  */
@@ -6915,6 +6940,7 @@ resolve_address_of_overloaded_function (tree targe
   tree matches = NULL_TREE;
   tree fn;
   tree target_fn_type;
+  VEC (tree, heap) *fn_ver_vec = NULL;
 
   /* By the time we get here, we should be seeing only real
      pointer-to-member types, not the internal POINTER_TYPE to
@@ -6979,9 +7005,19 @@ resolve_address_of_overloaded_function (tree targe
 	  if (DECL_ANTICIPATED (fn))
 	    continue;
 
-	  /* See if there's a match.  */
+	  /* See if there's a match.   For functions that are multi-versioned,
+	     all the versions match.  */
 	  if (same_type_p (target_fn_type, static_fn_type (fn)))
-	    matches = tree_cons (fn, NULL_TREE, matches);
+	    {
+	      matches = tree_cons (fn, NULL_TREE, matches);
+	      /* If versioned, push all possible versions into a vector.  */
+	      if (DECL_FUNCTION_VERSIONED (fn))
+		{
+		  if (fn_ver_vec == NULL)
+		   fn_ver_vec = VEC_alloc (tree, heap, 2);
+		  VEC_safe_push (tree, heap, fn_ver_vec, fn); 
+		}
+	    }
 	}
     }
 
@@ -7069,13 +7105,26 @@ resolve_address_of_overloaded_function (tree targe
     {
       /* There were too many matches.  First check if they're all
 	 the same function.  */
-      tree match;
+      tree match = NULL_TREE;
 
       fn = TREE_PURPOSE (matches);
-      for (match = TREE_CHAIN (matches); match; match = TREE_CHAIN (match))
-	if (!decls_match (fn, TREE_PURPOSE (match)))
-	  break;
 
+      /* For multi-versioned functions, more than one match is just fine.
+	 Call decls_match to make sure they are different because they are
+	 versioned.  */
+      if (DECL_FUNCTION_VERSIONED (fn))
+	{
+          for (match = TREE_CHAIN (matches); match; match = TREE_CHAIN (match))
+  	    if (decls_match (fn, TREE_PURPOSE (match)))
+	      break;
+	}
+      else
+	{
+          for (match = TREE_CHAIN (matches); match; match = TREE_CHAIN (match))
+  	    if (!decls_match (fn, TREE_PURPOSE (match)))
+	      break;
+	}
+
       if (match)
 	{
 	  if (flags & tf_error)
@@ -7137,6 +7186,28 @@ resolve_address_of_overloaded_function (tree targe
       perform_or_defer_access_check (access_path, fn, fn, flags);
     }
 
+  /* If a pointer to a function that is multi-versioned is requested, the
+     pointer to the dispatcher function is returned instead.  This works
+     well because indirectly calling the function will dispatch the right
+     function version at run-time.  */
+  if (DECL_FUNCTION_VERSIONED (fn))
+    {
+      tree dispatcher_decl = NULL;
+      gcc_assert (fn_ver_vec != NULL);
+      gcc_assert (targetm.get_function_versions_dispatcher);
+      dispatcher_decl = targetm.get_function_versions_dispatcher (fn_ver_vec);
+      if (!dispatcher_decl)
+	{
+	  error_at (input_location, "Pointer to a multiversioned function"
+		    " without a default is not allowed");
+	  return error_mark_node;
+	}
+      retrofit_lang_decl (dispatcher_decl);
+      mark_used (fn);
+      VEC_free (tree, heap, fn_ver_vec);
+      fn = dispatcher_decl;
+    }
+
   if (TYPE_PTRFN_P (target_type) || TYPE_PTRMEMFUNC_P (target_type))
     return cp_build_addr_expr (fn, flags);
   else
Index: gcc/cp/decl.c
===================================================================
--- gcc/cp/decl.c	(revision 192378)
+++ gcc/cp/decl.c	(working copy)
@@ -53,6 +53,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "pointer-set.h"
 #include "splay-tree.h"
 #include "plugin.h"
+#include "cgraph.h"
 
 /* Possible cases of bad specifiers type used by bad_specifiers. */
 enum bad_spec_place {
@@ -981,6 +982,29 @@ decls_match (tree newdecl, tree olddecl)
       if (t1 != t2)
 	return 0;
 
+      /* The decls don't match if they correspond to two different versions
+	 of the same function.  */
+      if (compparms (p1, p2)
+	  && same_type_p (TREE_TYPE (f1), TREE_TYPE (f2)) 
+	  && targetm.target_option.function_versions (newdecl, olddecl))
+	{
+	  /* Mark functions as versions if necessary.  Modify the mangled decl
+	     name if necessary.  */
+	  if (!DECL_FUNCTION_VERSIONED (newdecl))
+	    {
+	      DECL_FUNCTION_VERSIONED (newdecl) = 1;
+	      if (DECL_ASSEMBLER_NAME_SET_P (newdecl))
+	        mangle_decl (newdecl);
+	    }
+	  if (!DECL_FUNCTION_VERSIONED (olddecl))
+	    {
+	      DECL_FUNCTION_VERSIONED (olddecl) = 1;
+	      if (DECL_ASSEMBLER_NAME_SET_P (olddecl))
+	       mangle_decl (olddecl);
+	    }
+	  return 0;
+	}
+
       if (CP_DECL_CONTEXT (newdecl) != CP_DECL_CONTEXT (olddecl)
 	  && ! (DECL_EXTERN_C_P (newdecl)
 		&& DECL_EXTERN_C_P (olddecl)))
@@ -1499,7 +1523,11 @@ duplicate_decls (tree newdecl, tree olddecl, bool
 	      error ("previous declaration %q+#D here", olddecl);
 	      return NULL_TREE;
 	    }
-	  else if (compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)),
+	  /* For function versions, params and types match, but they
+	     are not ambiguous.  */
+	  else if ((!DECL_FUNCTION_VERSIONED (newdecl)
+		    && !DECL_FUNCTION_VERSIONED (olddecl))
+		   && compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)),
 			      TYPE_ARG_TYPES (TREE_TYPE (olddecl))))
 	    {
 	      error ("new declaration %q#D", newdecl);
@@ -2272,6 +2300,11 @@ duplicate_decls (tree newdecl, tree olddecl, bool
   else if (DECL_PRESERVE_P (newdecl))
     DECL_PRESERVE_P (olddecl) = 1;
 
+  /* If the olddecl is a version, so is the newdecl.  */
+  if (TREE_CODE (newdecl) == FUNCTION_DECL
+      && DECL_FUNCTION_VERSIONED (olddecl))
+    DECL_FUNCTION_VERSIONED (newdecl) = 1;
+
   if (TREE_CODE (newdecl) == FUNCTION_DECL)
     {
       int function_size;
@@ -14222,7 +14255,11 @@ cxx_comdat_group (tree decl)
 	  else
 	    break;
 	}
-      name = DECL_ASSEMBLER_NAME (decl);
+      if (TREE_CODE (decl) == FUNCTION_DECL
+	  && DECL_FUNCTION_VERSIONED (decl))
+	name = DECL_NAME (decl);
+      else
+        name = DECL_ASSEMBLER_NAME (decl);
     }
 
   return name;
Index: gcc/cp/error.c
===================================================================
--- gcc/cp/error.c	(revision 192378)
+++ gcc/cp/error.c	(working copy)
@@ -1541,8 +1541,15 @@ dump_exception_spec (tree t, int flags)
 static void
 dump_function_name (tree t, int flags)
 {
-  tree name = DECL_NAME (t);
+  tree name;
 
+  /* For function versions, use the assembler name as the decl name is
+     the same for all versions.  */
+  if (DECL_FUNCTION_VERSIONED (t))
+    name = DECL_ASSEMBLER_NAME (t);
+  else
+    name = DECL_NAME (t);
+
   /* We can get here with a decl that was synthesized by language-
      independent machinery (e.g. coverage.c) in which case it won't
      have a lang_specific structure attached and DECL_CONSTRUCTOR_P
Index: gcc/cp/semantics.c
===================================================================
--- gcc/cp/semantics.c	(revision 192378)
+++ gcc/cp/semantics.c	(working copy)
@@ -3799,8 +3799,11 @@ expand_or_defer_fn_1 (tree fn)
       /* If the user wants us to keep all inline functions, then mark
 	 this function as needed so that finish_file will make sure to
 	 output it later.  Similarly, all dllexport'd functions must
-	 be emitted; there may be callers in other DLLs.  */
-      if ((flag_keep_inline_functions
+	 be emitted; there may be callers in other DLLs.
+	 Also, mark this function as needed if it is marked inline but
+	 is a multi-versioned function.  */
+      if (((flag_keep_inline_functions
+	    || DECL_FUNCTION_VERSIONED (fn))
 	   && DECL_DECLARED_INLINE_P (fn)
 	   && !DECL_REALLY_EXTERN (fn))
 	  || (flag_keep_inline_dllexport
Index: gcc/cp/decl2.c
===================================================================
--- gcc/cp/decl2.c	(revision 192378)
+++ gcc/cp/decl2.c	(working copy)
@@ -674,9 +674,13 @@ check_classfn (tree ctype, tree function, tree tem
 	  if (is_template != (TREE_CODE (fndecl) == TEMPLATE_DECL))
 	    continue;
 
+	  /* While finding a match, same types and params are not enough
+	     if the function is versioned.  Also check version ("target")
+	     attributes.  */
 	  if (same_type_p (TREE_TYPE (TREE_TYPE (function)),
 			   TREE_TYPE (TREE_TYPE (fndecl)))
 	      && compparms (p1, p2)
+	      && !targetm.target_option.function_versions (function, fndecl)
 	      && (!is_template
 		  || comp_template_parms (template_parms,
 					  DECL_TEMPLATE_PARMS (fndecl)))
Index: gcc/cp/call.c
===================================================================
--- gcc/cp/call.c	(revision 192378)
+++ gcc/cp/call.c	(working copy)
@@ -40,6 +40,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "langhooks.h"
 #include "c-family/c-objc.h"
 #include "timevar.h"
+#include "cgraph.h"
 
 /* The various kinds of conversion.  */
 
@@ -6400,6 +6401,35 @@ magic_varargs_p (tree fn)
   return false;
 }
 
+/* Returns the decl of the dispatcher function if FN is a function version.  */
+
+static tree
+get_function_version_dispatcher (tree fn)
+{
+  tree dispatcher_decl = NULL;
+  struct cgraph_node *node = NULL;
+
+  gcc_assert (TREE_CODE (fn) == FUNCTION_DECL
+	      && DECL_FUNCTION_VERSIONED (fn));
+
+  node = cgraph_get_node (fn);
+
+  if (node != NULL)
+    dispatcher_decl = node->function_version.dispatcher_resolver;
+  else
+    return NULL;
+
+  if (dispatcher_decl == NULL)
+    {
+      error_at (input_location, "Call to multiversioned function"
+                " without a default is not allowed");
+      return NULL;
+    }
+  retrofit_lang_decl (dispatcher_decl);
+  gcc_assert (dispatcher_decl != NULL);
+  return dispatcher_decl;
+}
+
 /* Subroutine of the various build_*_call functions.  Overload resolution
    has chosen a winning candidate CAND; build up a CALL_EXPR accordingly.
    ARGS is a TREE_LIST of the unconverted arguments to the call.  FLAGS is a
@@ -6852,6 +6882,20 @@ build_over_call (struct z_candidate *cand, int fla
   if (!already_used)
     mark_used (fn);
 
+  /* For calls to a multi-versioned function, overload resolution
+     returns the function with the highest target priority, that is,
+     the version that will be checked for dispatching first.  If this
+     version is inlinable, a direct call to this version can be made;
+     otherwise the call should go through the dispatcher.  */
+
+  if (DECL_FUNCTION_VERSIONED (fn)
+      && !targetm.target_option.can_inline_p (current_function_decl, fn))
+    {
+      fn = get_function_version_dispatcher (fn);
+      if (fn == NULL)
+	return NULL;
+    }
+
   if (DECL_VINDEX (fn) && (flags & LOOKUP_NONVIRTUAL) == 0)
     {
       tree t;
@@ -8132,6 +8176,38 @@ joust (struct z_candidate *cand1, struct z_candida
       && (IS_TYPE_OR_DECL_P (cand1->fn)))
     return 1;
 
+  /* For candidates of a multi-versioned function, make the version with
+     the highest priority win.  This version will be checked for dispatching
+     first.  If this version can be inlined into the caller, the front-end
+     will simply make a direct call to this function.  */
+
+  if (TREE_CODE (cand1->fn) == FUNCTION_DECL
+      && DECL_FUNCTION_VERSIONED (cand1->fn)
+      && TREE_CODE (cand2->fn) == FUNCTION_DECL
+      && DECL_FUNCTION_VERSIONED (cand2->fn))
+    {
+      tree f1 = TREE_TYPE (cand1->fn);
+      tree f2 = TREE_TYPE (cand2->fn);
+      tree p1 = TYPE_ARG_TYPES (f1);
+      tree p2 = TYPE_ARG_TYPES (f2);
+     
+      /* Check if cand1->fn and cand2->fn are versions of the same function.  It
+         is possible that cand1->fn and cand2->fn are function versions but of
+         different functions.  Check types to see if they are versions of the same
+         function.  */
+      if (compparms (p1, p2)
+	  && same_type_p (TREE_TYPE (f1), TREE_TYPE (f2)))
+	{
+	  /* Always make the version with the higher priority, more
+	     specialized, win.  */
+	  gcc_assert (targetm.compare_version_priority);
+	  if (targetm.compare_version_priority (cand1->fn, cand2->fn) >= 0)
+	    return 1;
+	  else
+	    return -1;
+	}
+    }
+
   /* a viable function F1
      is defined to be a better function than another viable function F2  if
      for  all arguments i, ICSi(F1) is not a worse conversion sequence than
@@ -8452,6 +8528,37 @@ tweak:
   return 0;
 }
 
+/* Function FN is multi-versioned and CANDIDATES contains the list of all
+   overloaded candidates for FN.  This function extracts all functions from
+   CANDIDATES that are function versions of FN and generates a dispatcher
+   function for this multi-versioned function group.  */
+
+static void
+generate_function_versions_dispatcher (tree fn, struct z_candidate *candidates)
+{
+  tree f1 = TREE_TYPE (fn);
+  tree p1 = TYPE_ARG_TYPES (f1);
+  VEC (tree, heap) *fn_ver_vec = NULL;
+  struct z_candidate *ver = candidates;
+
+  fn_ver_vec = VEC_alloc (tree, heap, 2);
+
+  for (;ver; ver = ver->next)
+    {
+      tree f2 = TREE_TYPE (ver->fn);
+      tree p2 = TYPE_ARG_TYPES (f2);
+      /* If this candidate is a version of FN, types must match.  */
+      if (DECL_FUNCTION_VERSIONED (ver->fn)
+          && compparms (p1, p2)
+	  && same_type_p (TREE_TYPE (f1), TREE_TYPE (f2)))
+        VEC_safe_push (tree, heap, fn_ver_vec, ver->fn);
+    }
+
+  gcc_assert (targetm.get_function_versions_dispatcher);
+  targetm.get_function_versions_dispatcher (fn_ver_vec);
+  VEC_free (tree, heap, fn_ver_vec); 
+}
+
 /* Given a list of candidates for overloading, find the best one, if any.
    This algorithm has a worst case of O(2n) (winner is last), and a best
    case of O(n/2) (totally ambiguous); much better than a sorting
@@ -8504,6 +8611,18 @@ tourney (struct z_candidate *candidates, tsubst_fl
 	return NULL;
     }
 
+  /* For multiversioned functions, aggregate all the versions here for
+     generating the dispatcher body later if necessary.  Check to see if
+     the dispatcher is already generated to avoid doing this more than
+     once.  */
+
+  if (TREE_CODE (champ->fn) == FUNCTION_DECL
+      && DECL_FUNCTION_VERSIONED (champ->fn)
+      && (cgraph_get_node (champ->fn) == NULL
+	  || (cgraph_get_node (champ->fn)->function_version.dispatcher_resolver
+	      == NULL)))
+      generate_function_versions_dispatcher (champ->fn, candidates);
+
   return champ;
 }
 
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 192378)
+++ gcc/config/i386/i386.c	(working copy)
@@ -62,6 +62,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "opts.h"
 #include "diagnostic.h"
 #include "dumpfile.h"
+#include "tree-pass.h"
+#include "tree-flow.h"
 
 enum upper_128bits_state
 {
@@ -28399,6 +28401,988 @@ ix86_init_mmx_sse_builtins (void)
     }
 }
 
+
+/* This adds a condition to the basic_block NEW_BB in function FUNCTION_DECL
+   to return a pointer to VERSION_DECL if the outcome of the expression
+   formed by PREDICATE_CHAIN is true.  This function will be called during
+   version dispatch to decide which function version to execute.  It returns
+   the basic block at the end, to which more conditions can be added.  */
+
+static basic_block
+add_condition_to_bb (tree function_decl, tree version_decl,
+		     tree predicate_chain, basic_block new_bb)
+{
+  gimple return_stmt;
+  tree convert_expr, result_var;
+  gimple convert_stmt;
+  gimple call_cond_stmt;
+  gimple if_else_stmt;
+
+  basic_block bb1, bb2, bb3;
+  edge e12, e23;
+
+  tree cond_var, and_expr_var = NULL_TREE;
+  gimple_seq gseq;
+
+  tree old_current_function_decl;
+  tree predicate_decl, predicate_arg;
+
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (function_decl));
+  current_function_decl = function_decl;
+
+  gcc_assert (new_bb != NULL);
+  gseq = bb_seq (new_bb);
+
+
+  convert_expr = build1 (CONVERT_EXPR, ptr_type_node,
+	     		 build_fold_addr_expr (version_decl));
+  result_var = create_tmp_var (ptr_type_node, NULL);
+  convert_stmt = gimple_build_assign (result_var, convert_expr); 
+  return_stmt = gimple_build_return (result_var);
+
+  if (predicate_chain == NULL_TREE)
+    {
+      gimple_seq_add_stmt (&gseq, convert_stmt);
+      gimple_seq_add_stmt (&gseq, return_stmt);
+      set_bb_seq (new_bb, gseq);
+      gimple_set_bb (convert_stmt, new_bb);
+      gimple_set_bb (return_stmt, new_bb);
+      pop_cfun ();
+      current_function_decl = old_current_function_decl;
+      return new_bb;
+    }
+
+  while (predicate_chain != NULL)
+    {
+      cond_var = create_tmp_var (integer_type_node, NULL);
+      predicate_decl = TREE_PURPOSE (predicate_chain);
+      predicate_arg = TREE_VALUE (predicate_chain);
+      call_cond_stmt = gimple_build_call (predicate_decl, 1, predicate_arg);
+      gimple_call_set_lhs (call_cond_stmt, cond_var);
+
+      gimple_set_block (call_cond_stmt, DECL_INITIAL (function_decl));
+      gimple_set_bb (call_cond_stmt, new_bb);
+      gimple_seq_add_stmt (&gseq, call_cond_stmt);
+
+      predicate_chain = TREE_CHAIN (predicate_chain);
+      
+      if (and_expr_var == NULL)
+        and_expr_var = cond_var;
+      else
+	{
+	  gimple assign_stmt;
+	  /* Use MIN_EXPR to check if any integer is zero.
+	     and_expr_var = min_expr <cond_var, and_expr_var>  */
+	  assign_stmt = gimple_build_assign (and_expr_var,
+			  build2 (MIN_EXPR, integer_type_node,
+				  cond_var, and_expr_var));
+
+	  gimple_set_block (assign_stmt, DECL_INITIAL (function_decl));
+	  gimple_set_bb (assign_stmt, new_bb);
+	  gimple_seq_add_stmt (&gseq, assign_stmt);
+	}
+    }
+
+  if_else_stmt = gimple_build_cond (GT_EXPR, and_expr_var,
+	  		            integer_zero_node,
+				    NULL_TREE, NULL_TREE);
+  gimple_set_block (if_else_stmt, DECL_INITIAL (function_decl));
+  gimple_set_bb (if_else_stmt, new_bb);
+  gimple_seq_add_stmt (&gseq, if_else_stmt);
+
+  gimple_seq_add_stmt (&gseq, convert_stmt);
+  gimple_seq_add_stmt (&gseq, return_stmt);
+  set_bb_seq (new_bb, gseq);
+
+  bb1 = new_bb;
+  e12 = split_block (bb1, if_else_stmt);
+  bb2 = e12->dest;
+  e12->flags &= ~EDGE_FALLTHRU;
+  e12->flags |= EDGE_TRUE_VALUE;
+
+  e23 = split_block (bb2, return_stmt);
+
+  gimple_set_bb (convert_stmt, bb2);
+  gimple_set_bb (return_stmt, bb2);
+
+  bb3 = e23->dest;
+  make_edge (bb1, bb3, EDGE_FALSE_VALUE); 
+
+  remove_edge (e23);
+  make_edge (bb2, EXIT_BLOCK_PTR, 0);
+
+  pop_cfun ();
+  current_function_decl = old_current_function_decl;
+
+  return bb3;
+}
+
+/* This parses the attribute arguments to target in DECL and determines
+   the right builtin to use to match the platform specification.
+   For now, only one target argument ("arch=" or "<-m>xxx") is allowed.
+   It returns the priority value for this version decl.  If PREDICATE_LIST
+   is not NULL, it stores the list of cpu features that need to be checked
+   before dispatching this function.  */
+
+static unsigned int
+get_builtin_code_for_version (tree decl, tree *predicate_list)
+{
+  tree attrs;
+  struct cl_target_option cur_target;
+  tree target_node;
+  struct cl_target_option *new_target;
+  const char *arg_str = NULL;
+  const char *attrs_str = NULL;
+  char *tok_str = NULL;
+  char *token;
+  unsigned int priority = 0;
+
+  /* Priority of i386 features; a greater value is higher priority.  This is
+     used to decide the order in which function dispatch must happen.  For
+     instance, a version specialized for SSE4.2 should be checked for dispatch
+     before a version for SSE3, as SSE4.2 implies SSE3.  */
+  enum feature_priority
+  {
+    P_ZERO = 0,
+    P_MMX,
+    P_SSE,
+    P_SSE2,
+    P_SSE3,
+    P_SSSE3,
+    P_PROC_SSSE3,
+    P_SSE4_a,
+    P_PROC_SSE4_a,
+    P_SSE4_1,
+    P_SSE4_2,
+    P_PROC_SSE4_2,
+    P_POPCNT,
+    P_AVX,
+    P_AVX2,
+    P_FMA,
+    P_PROC_FMA
+  };
+
+  /* These are the target attribute strings for which a dispatcher is
+     available, from fold_builtin_cpu.  */
+
+  static struct _feature_list
+    {
+      const char *const name;
+      const enum feature_priority priority;
+    }
+  const feature_list[] =
+    {
+      {"mmx", P_MMX},
+      {"sse", P_SSE},
+      {"sse2", P_SSE2},
+      {"sse3", P_SSE3},
+      {"ssse3", P_SSSE3},
+      {"sse4.1", P_SSE4_1},
+      {"sse4.2", P_SSE4_2},
+      {"popcnt", P_POPCNT},
+      {"avx", P_AVX},
+      {"avx2", P_AVX2}
+    };
+
+
+  static unsigned int NUM_FEATURES
+    = sizeof (feature_list) / sizeof (struct _feature_list);
+
+  unsigned int i;
+
+  tree predicate_chain = NULL_TREE;
+  tree predicate_decl, predicate_arg;
+
+  attrs = lookup_attribute ("target", DECL_ATTRIBUTES (decl));
+  gcc_assert (attrs != NULL);
+
+  attrs = TREE_VALUE (TREE_VALUE (attrs));
+
+  gcc_assert (TREE_CODE (attrs) == STRING_CST);
+  attrs_str = TREE_STRING_POINTER (attrs);
+
+
+  /* Handle arch= if specified.  For priority, set it to be 1 more than
+     the best instruction set the processor can handle.  For instance, if
+     there is a version for atom and a version for ssse3 (the highest ISA
+     priority for atom), the atom version must be checked for dispatch
+     before the ssse3 version. */
+  if (strstr (attrs_str, "arch=") != NULL)
+    {
+      cl_target_option_save (&cur_target, &global_options);
+      target_node = ix86_valid_target_attribute_tree (attrs);
+    
+      gcc_assert (target_node);
+      new_target = TREE_TARGET_OPTION (target_node);
+      gcc_assert (new_target);
+      
+      if (new_target->arch_specified && new_target->arch > 0)
+	{
+	  switch (new_target->arch)
+	    {
+	    case PROCESSOR_CORE2_32:
+	    case PROCESSOR_CORE2_64:
+	      arg_str = "core2";
+	      priority = P_PROC_SSSE3;
+	      break;
+	    case PROCESSOR_COREI7_32:
+	    case PROCESSOR_COREI7_64:
+	      arg_str = "corei7";
+	      priority = P_PROC_SSE4_2;
+	      break;
+	    case PROCESSOR_ATOM:
+	      arg_str = "atom";
+	      priority = P_PROC_SSSE3;
+	      break;
+	    case PROCESSOR_AMDFAM10:
+	      arg_str = "amdfam10h";
+	      priority = P_PROC_SSE4_a;
+	      break;
+	    case PROCESSOR_BDVER1:
+	      arg_str = "bdver1";
+	      priority = P_PROC_FMA;
+	      break;
+	    case PROCESSOR_BDVER2:
+	      arg_str = "bdver2";
+	      priority = P_PROC_FMA;
+	      break;
+	    }  
+	}    
+    
+      cl_target_option_restore (&global_options, &cur_target);
+	
+      if (predicate_list && arg_str == NULL)
+	{
+	  error_at (DECL_SOURCE_LOCATION (decl),
+	    	"No dispatcher found for the versioning attributes");
+	  return 0;
+	}
+    
+      if (predicate_list)
+	{
+          predicate_decl = ix86_builtins [(int) IX86_BUILTIN_CPU_IS];
+          /* For a C string literal the length includes the trailing NUL.  */
+          predicate_arg = build_string_literal (strlen (arg_str) + 1, arg_str);
+          predicate_chain = tree_cons (predicate_decl, predicate_arg,
+				       predicate_chain);
+	}
+    }
+
+  /* Process feature name.  */
+  tok_str =  (char *) xmalloc (strlen (attrs_str) + 1);
+  strcpy (tok_str, attrs_str);
+  token = strtok (tok_str, ",");
+  predicate_decl = ix86_builtins [(int) IX86_BUILTIN_CPU_SUPPORTS];
+
+  while (token != NULL)
+    {
+      /* Do not process "arch="  */
+      if (strncmp (token, "arch=", 5) == 0)
+	{
+	  token = strtok (NULL, ",");
+	  continue;
+	}
+      for (i = 0; i < NUM_FEATURES; ++i)
+	{
+	  if (strcmp (token, feature_list[i].name) == 0)
+	    {
+	      if (predicate_list)
+		{
+		  predicate_arg = build_string_literal (
+				  strlen (feature_list[i].name) + 1,
+				  feature_list[i].name);
+		  predicate_chain = tree_cons (predicate_decl, predicate_arg,
+					       predicate_chain);
+		}
+	      /* Find the maximum priority feature.  */
+	      if (feature_list[i].priority > priority)
+		priority = feature_list[i].priority;
+
+	      break;
+	    }
+	}
+      if (predicate_list && i == NUM_FEATURES)
+	{
+	  error_at (DECL_SOURCE_LOCATION (decl),
+		    "No dispatcher found for %s", token);
+	  return 0;
+	}
+      token = strtok (NULL, ",");
+    }
+  free (tok_str);
+
+  if (predicate_list && predicate_chain == NULL_TREE)
+    {
+      error_at (DECL_SOURCE_LOCATION (decl),
+	        "No dispatcher found for the versioning attributes : %s",
+	        attrs_str);
+      return 0;
+    }
+  else if (predicate_list)
+    {
+      predicate_chain = nreverse (predicate_chain);
+      *predicate_list = predicate_chain;
+    }
+
+  return priority; 
+}
+
+/* This compares the priority of target features in function DECL1
+   and DECL2.  It returns positive value if DECL1 is higher priority,
+   negative value if DECL2 is higher priority and 0 if they are the
+   same.  */
+
+static int
+ix86_compare_version_priority (tree decl1, tree decl2)
+{
+  unsigned int priority1 = 0;
+  unsigned int priority2 = 0;
+
+  if (lookup_attribute ("target", DECL_ATTRIBUTES (decl1)) != NULL)
+    priority1 = get_builtin_code_for_version (decl1, NULL);
+
+  if (lookup_attribute ("target", DECL_ATTRIBUTES (decl2)) != NULL)
+    priority2 = get_builtin_code_for_version (decl2, NULL);
+
+  return (int)priority1 - (int)priority2;
+}
+
+/* V1 and V2 point to function versions with different priorities
+   based on the target ISA.  This function compares their priorities.  */
+ 
+static int
+feature_compare (const void *v1, const void *v2)
+{
+  typedef struct _function_version_info
+    {
+      tree version_decl;
+      tree predicate_chain;
+      unsigned int dispatch_priority;
+    } function_version_info;
+
+  const function_version_info c1 = *(const function_version_info *)v1;
+  const function_version_info c2 = *(const function_version_info *)v2;
+  return (c2.dispatch_priority - c1.dispatch_priority);
+}
+
+/* This function generates the dispatch function for
+   multi-versioned functions.  DISPATCH_DECL is the function which will
+   contain the dispatch logic.  FNDECLS are the function choices for
+   dispatch, and is a tree chain.  EMPTY_BB is the basic block pointer
+   in DISPATCH_DECL in which the dispatch code is generated.  */
+
+static int
+dispatch_function_versions (tree dispatch_decl,
+			    void *fndecls_p,
+			    basic_block *empty_bb)
+{
+  tree default_decl;
+  gimple ifunc_cpu_init_stmt;
+  gimple_seq gseq;
+  tree old_current_function_decl;
+  int ix;
+  tree ele;
+  VEC (tree, heap) *fndecls;
+  unsigned int num_versions = 0;
+  unsigned int actual_versions = 0;
+  unsigned int i;
+
+  struct _function_version_info
+    {
+      tree version_decl;
+      tree predicate_chain;
+      unsigned int dispatch_priority;
+    }*function_version_info;
+
+  gcc_assert (dispatch_decl != NULL
+	      && fndecls_p != NULL
+	      && empty_bb != NULL);
+
+  /* fndecls_p is actually a vector.  */
+  fndecls = (VEC (tree, heap) *)fndecls_p;
+
+  /* Atleast one more version other than the default.  */
+  num_versions = VEC_length (tree, fndecls);
+  gcc_assert (num_versions >= 2);
+
+  function_version_info = (struct _function_version_info *)
+    xmalloc ((num_versions - 1) * sizeof (struct _function_version_info));
+
+  /* The first version in the vector is the default decl.  */
+  default_decl = VEC_index (tree, fndecls, 0);
+
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (dispatch_decl));
+  current_function_decl = dispatch_decl;
+
+  gseq = bb_seq (*empty_bb);
+  /* Function version dispatch is via IFUNC.  IFUNC resolvers fire before
+     constructors, so explicitly call __builtin_cpu_init here.  */
+  ifunc_cpu_init_stmt = gimple_build_call_vec (
+                     ix86_builtins [(int) IX86_BUILTIN_CPU_INIT], NULL);
+  gimple_seq_add_stmt (&gseq, ifunc_cpu_init_stmt);
+  gimple_set_bb (ifunc_cpu_init_stmt, *empty_bb);
+  set_bb_seq (*empty_bb, gseq);
+
+  pop_cfun ();
+  current_function_decl = old_current_function_decl;
+
+
+  for (ix = 1; VEC_iterate (tree, fndecls, ix, ele); ++ix)
+    {
+      tree version_decl = ele;
+      tree predicate_chain = NULL_TREE;
+      unsigned int priority;
+      /* Get attribute string, parse it and find the right predicate decl.
+         The predicate function could be a lengthy combination of many
+	 features, like arch-type and various isa-variants.  */
+      priority = get_builtin_code_for_version (version_decl,
+	 			               &predicate_chain);
+
+      if (predicate_chain == NULL_TREE)
+	continue;
+
+      actual_versions++;
+      function_version_info [ix - 1].version_decl = version_decl;
+      function_version_info [ix - 1].predicate_chain = predicate_chain;
+      function_version_info [ix - 1].dispatch_priority = priority;
+    }
+
+  /* Sort the versions according to descending order of dispatch priority.  The
+     priority is based on the ISA.  This is not a perfect solution.  There
+     could still be ambiguity.  If more than one function version is suitable
+     to execute, which one should be dispatched?  In future, allow the user
+     to specify a dispatch priority next to the version.  */
+  qsort (function_version_info, actual_versions,
+         sizeof (struct _function_version_info), feature_compare);
+
+  for (i = 0; i < actual_versions; ++i)
+    *empty_bb = add_condition_to_bb (dispatch_decl,
+				     function_version_info[i].version_decl,
+				     function_version_info[i].predicate_chain,
+				     *empty_bb);
+
+  /* dispatch default version at the end.  */
+  *empty_bb = add_condition_to_bb (dispatch_decl, default_decl,
+				   NULL, *empty_bb);
+
+  free (function_version_info);
+  return 0;
+}
+
+/* This function returns true if fn1 and fn2 are versions of the same function.
+   Returns false if only one of the function decls has the target attribute
+   set or if the targets of the function decls are different.  This assumes
+   the fn1 and fn2 have the same signature.  */
+
+static bool
+ix86_function_versions (tree fn1, tree fn2)
+{
+  tree attr1, attr2;
+  struct cl_target_option *target1, *target2;
+
+  if (TREE_CODE (fn1) != FUNCTION_DECL
+      || TREE_CODE (fn2) != FUNCTION_DECL)
+    return false;
+
+  attr1 = DECL_FUNCTION_SPECIFIC_TARGET (fn1);
+  attr2 = DECL_FUNCTION_SPECIFIC_TARGET (fn2);
+
+  if (attr1 == NULL_TREE && attr2 == NULL_TREE)
+    return false;
+
+  if ((attr1 == NULL_TREE && attr2 != NULL_TREE)
+      || (attr1 != NULL_TREE && attr2 == NULL_TREE))
+    return true;
+
+  target1 = TREE_TARGET_OPTION (attr1);
+  target2 = TREE_TARGET_OPTION (attr2);
+
+  if (target1->x_ix86_isa_flags == target2->x_ix86_isa_flags
+      && target1->x_target_flags == target2->x_target_flags
+      && target1->arch == target2->arch
+      && target1->tune == target2->tune
+      && target1->x_ix86_fpmath == target2->x_ix86_fpmath
+      && target1->branch_cost == target2->branch_cost)
+    return false;
+
+  return true;
+}
+
+/* Comparator function to be used in qsort routine to sort attribute
+   specification strings to "target".  */
+
+static int
+attr_strcmp (const void *v1, const void *v2)
+{
+  const char *c1 = *(char *const*)v1;
+  const char *c2 = *(char *const*)v2;
+  return strcmp (c1, c2);
+}
+
+/* STR is the argument to target attribute.  This function tokenizes
+   the comma separated arguments, sorts them and returns a string which
+   is a unique identifier for the comma separated arguments.   It also
+   replaces non-identifier characters "=,-" with "_".  */
+
+static char *
+sorted_attr_string (const char *str)
+{
+  char **args = NULL;
+  char *attr_str, *ret_str;
+  char *attr = NULL;
+  unsigned int argnum = 1;
+  unsigned int i;
+
+  for (i = 0; i < strlen (str); i++)
+    if (str[i] == ',')
+      argnum++;
+
+  attr_str = (char *)xmalloc (strlen (str) + 1);
+  strcpy (attr_str, str);
+
+  /* Replace "=,-" with "_".  */
+  for (i = 0; i < strlen (attr_str); i++)
+    if (attr_str[i] == '=' || attr_str[i]== '-')
+      attr_str[i] = '_';
+
+  if (argnum == 1)
+    return attr_str;
+
+  args = (char **)xmalloc (argnum * sizeof (char *));
+
+  i = 0;
+  attr = strtok (attr_str, ",");
+  while (attr != NULL)
+    {
+      args[i] = attr;
+      i++;
+      attr = strtok (NULL, ",");
+    }
+
+  qsort (args, argnum, sizeof (char*), attr_strcmp);
+
+  ret_str = (char *)xmalloc (strlen (str) + 1);
+  strcpy (ret_str, args[0]);
+  for (i = 1; i < argnum; i++)
+    {
+      strcat (ret_str, "_");
+      strcat (ret_str, args[i]);
+    }
+
+  free (args);
+  free (attr_str);
+  return ret_str;
+}
+
+/* This function changes the assembler name for functions that are
+   versions.  If DECL is a function version and has a "target"
+   attribute, it appends the attribute string to its assembler name.  */
+
+static tree
+ix86_mangle_function_version_assembler_name (tree decl, tree id)
+{
+  tree version_attr;
+  const char *orig_name, *version_string, *attr_str;
+  char *assembler_name;
+
+  if (DECL_DECLARED_INLINE_P (decl)
+      && lookup_attribute ("gnu_inline",
+			   DECL_ATTRIBUTES (decl)))
+    error_at (DECL_SOURCE_LOCATION (decl),
+	      "Function versions cannot be marked as gnu_inline,"
+	      " bodies have to be generated\n");
+
+  if (DECL_VIRTUAL_P (decl)
+      || DECL_VINDEX (decl))
+    error_at (DECL_SOURCE_LOCATION (decl),
+	      "Virtual function versioning not supported\n");
+
+  version_attr = lookup_attribute ("target", DECL_ATTRIBUTES (decl));
+
+  /* target attribute string is NULL for default functions.  */
+  if (version_attr == NULL_TREE)
+    return id;
+
+  orig_name = IDENTIFIER_POINTER (id);
+  version_string
+    = TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (version_attr)));
+
+  attr_str = sorted_attr_string (version_string);
+  assembler_name = (char *) xmalloc (strlen (orig_name)
+				     + strlen (attr_str) + 2);
+
+  sprintf (assembler_name, "%s.%s", orig_name, attr_str);
+  if (dump_file)
+    fprintf (dump_file, "Assembler name set to %s for function version %s\n",
+	     assembler_name, IDENTIFIER_POINTER (id));
+
+  /* Allow assembler name to be modified if already set.  */
+  if (DECL_ASSEMBLER_NAME_SET_P (decl))
+    SET_DECL_RTL (decl, NULL);
+
+  return get_identifier (assembler_name);
+}
+
+static tree 
+ix86_mangle_decl_assembler_name (tree decl, tree id)
+{
+  /* For function version, add the target suffix to the assembler name.  */
+  if (TREE_CODE (decl) == FUNCTION_DECL
+      && DECL_FUNCTION_VERSIONED (decl))
+    return ix86_mangle_function_version_assembler_name (decl, id);
+
+  return id;
+}
+
+/* Return a new name by appending SUFFIX to the DECL name.  If
+   make_unique is true, append the full path name.  */
+
+static char *
+make_name (tree decl, const char *suffix, bool make_unique)
+{
+  char *global_var_name;
+  int name_len;
+  const char *name;
+  const char *unique_name = NULL;
+
+  name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
+
+  /* Get a unique name that can be used globally without any chances
+     of collision at link time.  */
+  if (make_unique)
+    unique_name = IDENTIFIER_POINTER (get_file_function_name ("\0"));
+
+  name_len = strlen (name) + strlen (suffix) + 2;
+
+  if (make_unique)
+    name_len += strlen (unique_name) + 1;
+  global_var_name = (char *) xmalloc (name_len);
+
+  /* Use '.' to concatenate names as it is demangler friendly.  */
+  if (make_unique)
+      snprintf (global_var_name, name_len, "%s.%s.%s", name,
+		unique_name, suffix);
+  else
+      snprintf (global_var_name, name_len, "%s.%s", name, suffix);
+
+  return global_var_name;
+}
+
+/* Make a dispatcher declaration for the multi-versioned function DECL.
+   Calls to DECL function will be replaced with calls to the dispatcher
+   by the front-end.  Return the decl created.  */
+
+static tree
+make_dispatcher_decl (const tree decl)
+{
+  tree func_decl;
+  char *func_name, *resolver_name;
+  tree fn_type, func_type;
+  bool is_uniq = false;
+
+  if (TREE_PUBLIC (decl) == 0)
+    is_uniq = true;
+
+  func_name = make_name (decl, "ifunc", is_uniq);
+  resolver_name = make_name (decl, "resolver", is_uniq);
+  gcc_assert (resolver_name);
+
+  fn_type = TREE_TYPE (decl);
+  func_type = build_function_type (TREE_TYPE (fn_type),
+				   TYPE_ARG_TYPES (fn_type));
+  
+  func_decl = build_fn_decl (func_name, func_type);
+  TREE_USED (func_decl) = 1;
+  DECL_CONTEXT (func_decl) = NULL_TREE;
+  DECL_INITIAL (func_decl) = error_mark_node;
+  DECL_ARTIFICIAL (func_decl) = 1;
+  /* Mark this func as external, the resolver will flip it again if
+     it gets generated.  */
+  DECL_EXTERNAL (func_decl) = 1;
+  /* This will be of type IFUNC.  IFUNCs have to be externally visible.  */
+  TREE_PUBLIC (func_decl) = 1;
+
+  return func_decl;  
+}
+
+/* Returns true if decl is multi-versioned and DECL is the default function,
+   that is it is not tagged with target specific optimization.  */
+
+static bool
+is_function_default_version (const tree decl)
+{
+  return (TREE_CODE (decl) == FUNCTION_DECL
+	  && DECL_FUNCTION_VERSIONED (decl)
+	  && DECL_FUNCTION_SPECIFIC_TARGET (decl) == NULL_TREE);
+}
+
+/* Make a dispatcher declaration for the multi-versioned function DECL.
+   Calls to DECL function will be replaced with calls to the dispatcher
+   by the front-end.  It also chains the cgraph nodes of all the
+   semantically identical versions in vector FN_VER_VEC_P.  Returns the
+   decl of the dispatcher function.  */
+
+static tree
+ix86_get_function_versions_dispatcher (void *fn_ver_vec_p)
+{
+  struct cgraph_node *node = NULL;
+  struct cgraph_node *default_node = NULL;
+  struct cgraph_node *dispatcher_node = NULL;
+  int ix;
+  tree ele;
+  tree dispatch_decl = NULL;
+  VEC (tree, heap) *fn_ver_vec = NULL;
+
+  fn_ver_vec = (VEC (tree,heap) *) fn_ver_vec_p;
+  gcc_assert (fn_ver_vec != NULL);
+
+  /* Find the default version.  */
+  for (ix = 0; VEC_iterate (tree, fn_ver_vec, ix, ele); ++ix)
+    {
+      if (is_function_default_version (ele))
+	{
+	  default_node = cgraph_get_create_node (ele);
+	  break;
+	}
+    }
+
+  /* If there is no default node, just return NULL.  */
+  if (!default_node)
+    return NULL;
+
+  if (default_node->function_version.dispatcher_resolver)
+    return default_node->function_version.dispatcher_resolver;
+
+#if defined (ASM_OUTPUT_TYPE_DIRECTIVE) && HAVE_GNU_INDIRECT_FUNCTION
+  /* Right now, the dispatching is done via ifunc.  */
+  dispatch_decl = make_dispatcher_decl (default_node->symbol.decl); 
+#else
+  error_at (DECL_SOURCE_LOCATION (default_node->symbol.decl),
+	    "Multiversioning needs ifunc which is not supported "
+	    "in this configuration");
+#endif
+  default_node->function_version.dispatcher_resolver = dispatch_decl;
+  dispatcher_node = cgraph_get_create_node (dispatch_decl);
+  gcc_assert (dispatcher_node);
+  dispatcher_node->dispatcher_function = 1;
+  cgraph_mark_address_taken_node (default_node);
+
+  for (ix = 0; VEC_iterate (tree, fn_ver_vec, ix, ele); ++ix)
+    {
+      node = cgraph_get_create_node (ele);
+      gcc_assert (node != NULL && DECL_FUNCTION_VERSIONED (ele));
+      if (node == default_node)
+	continue;
+      gcc_assert (DECL_FUNCTION_SPECIFIC_TARGET (ele) != NULL_TREE);
+
+      if (dispatcher_node->function_version.next)
+ 	{
+	  struct cgraph_node *dispatcher_node_next
+	    = dispatcher_node->function_version.next;
+	  node->function_version.next = dispatcher_node_next;
+	  dispatcher_node_next->function_version.prev = node;
+	}
+      dispatcher_node->function_version.next = node;
+      node->function_version.prev = dispatcher_node;
+      node->function_version.dispatcher_resolver = dispatch_decl;
+    }
+
+  /* The default version should be the first node.  */
+  default_node->function_version.next = dispatcher_node->function_version.next;
+  (dispatcher_node->function_version.next)->function_version.prev
+     = default_node;
+  /* The dispatcher node should directly point to the default node.  */
+  dispatcher_node->function_version.next = default_node;
+  
+  return dispatch_decl;
+}
+
+/* Makes a function attribute of the form NAME(ARG_NAME) and chains
+   it to CHAIN.  */
+
+static tree
+make_attribute (const char *name, const char *arg_name, tree chain)
+{
+  tree attr_name;
+  tree attr_arg_name;
+  tree attr_args;
+  tree attr;
+
+  attr_name = get_identifier (name);
+  attr_arg_name = build_string (strlen (arg_name), arg_name);
+  attr_args = tree_cons (NULL_TREE, attr_arg_name, NULL_TREE);
+  attr = tree_cons (attr_name, attr_args, chain);
+  return attr;
+}
+
+/* Make the resolver function decl to dispatch the versions of
+   a multi-versioned function,  DEFAULT_DECL.  Create an
+   empty basic block in the resolver and store the pointer in
+   EMPTY_BB.  Return the decl of the resolver function.  */
+
+static tree
+make_resolver_func (const tree default_decl,
+		    const tree dispatch_decl,
+		    basic_block *empty_bb)
+{
+  char *resolver_name;
+  tree decl, type, decl_name, t;
+  basic_block new_bb;
+  tree old_current_function_decl;
+  bool is_uniq = false;
+
+  /* IFUNC's have to be globally visible.  So, if the default_decl is
+     not, then the name of the IFUNC should be made unique.  */
+  if (TREE_PUBLIC (default_decl) == 0)
+    is_uniq = true;
+
+  /* Append the filename to the resolver function if the versions are
+     not externally visible.  This is because the resolver function has
+     to be externally visible for the loader to find it.  So, appending
+     the filename will prevent conflicts with a resolver function from
+     another module which is based on the same version name.  */
+  resolver_name = make_name (default_decl, "resolver", is_uniq);
+
+  /* The resolver function should return a (void *). */
+  type = build_function_type_list (ptr_type_node, NULL_TREE);
+
+  decl = build_fn_decl (resolver_name, type);
+  decl_name = get_identifier (resolver_name);
+  SET_DECL_ASSEMBLER_NAME (decl, decl_name);
+
+  DECL_NAME (decl) = decl_name;
+  TREE_USED (decl) = 1;
+  DECL_ARTIFICIAL (decl) = 1;
+  DECL_IGNORED_P (decl) = 0;
+  /* IFUNC resolvers have to be externally visible.  */
+  TREE_PUBLIC (decl) = 1;
+  DECL_UNINLINABLE (decl) = 1;
+
+  DECL_EXTERNAL (decl) = 0;
+  DECL_EXTERNAL (dispatch_decl) = 0;
+
+  DECL_CONTEXT (decl) = NULL_TREE;
+  DECL_INITIAL (decl) = make_node (BLOCK);
+  DECL_STATIC_CONSTRUCTOR (decl) = 0;
+  TREE_READONLY (decl) = 0;
+  DECL_PURE_P (decl) = 0;
+
+  if (DECL_COMDAT_GROUP (default_decl))
+    {
+      DECL_COMDAT (decl) = DECL_COMDAT (default_decl);
+      make_decl_one_only (decl, DECL_COMDAT_GROUP (default_decl));
+    }
+  else if (TREE_PUBLIC (default_decl))
+    {
+      /* In this case, each translation unit with a call to this
+	 versioned function will put out a resolver.  Ensure it
+	 is comdat to keep just one copy.  */
+      DECL_COMDAT (decl) = 1;
+      make_decl_one_only (decl, DECL_ASSEMBLER_NAME (decl));
+    }
+  /* Build result decl and add to function_decl. */
+  t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, ptr_type_node);
+  DECL_ARTIFICIAL (t) = 1;
+  DECL_IGNORED_P (t) = 1;
+  DECL_RESULT (decl) = t;
+
+  gimplify_function_tree (decl);
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (decl));
+  current_function_decl = decl;
+  gimple_register_cfg_hooks ();
+  init_empty_tree_cfg_for_function (DECL_STRUCT_FUNCTION (decl));
+  cfun->curr_properties |=
+    (PROP_gimple_lcf | PROP_gimple_leh | PROP_cfg | PROP_ssa
+     | PROP_gimple_any);
+  cfun->curr_properties = 15;
+  new_bb = create_empty_bb (ENTRY_BLOCK_PTR);
+  make_edge (ENTRY_BLOCK_PTR, new_bb, EDGE_FALLTHRU);
+  make_edge (new_bb, EXIT_BLOCK_PTR, 0);
+  *empty_bb = new_bb;
+
+  cgraph_add_new_function (decl, true);
+  cgraph_call_function_insertion_hooks (cgraph_get_create_node (decl));
+
+  if (DECL_COMDAT_GROUP (default_decl))
+    {
+      gcc_assert (cgraph_get_node (default_decl));
+      symtab_add_to_same_comdat_group (
+	(symtab_node) cgraph_get_node (decl),
+	(symtab_node) cgraph_get_node (default_decl));
+    }
+
+  pop_cfun ();
+  current_function_decl = old_current_function_decl;
+
+  gcc_assert (dispatch_decl != NULL);
+  /* Mark dispatch_decl as "ifunc" with resolver as resolver_name.  */
+  DECL_ATTRIBUTES (dispatch_decl) 
+    = make_attribute ("ifunc", resolver_name, DECL_ATTRIBUTES (dispatch_decl));
+
+  /* Create the alias for dispatch to resolver here.  */
+  cgraph_create_function_alias (dispatch_decl, decl);
+  return decl;
+}
+
+/* Generate the dispatching code body to dispatch multi-versioned function
+   DECL.  The target hook is called to process the "target" attributes and
+   provide the code to dispatch the right function at run-time.  NODE points
+   to the dispatcher decl whose body will be created.  */
+
+static tree 
+ix86_generate_version_dispatcher_body (void *node_p)
+{
+  tree resolver_decl;
+  basic_block empty_bb;
+  VEC (tree, heap) *fn_ver_vec = NULL;
+  tree old_current_function_decl;
+  tree default_ver_decl;
+  struct cgraph_node *versn;
+  struct cgraph_node *node;
+
+  node = (cgraph_node *)node_p;
+
+  gcc_assert (node->dispatcher_function);
+
+  if (node->function_version.dispatcher_resolver)
+    return node->function_version.dispatcher_resolver;
+
+  default_ver_decl = (node->function_version.next)->symbol.decl;
+  resolver_decl = make_resolver_func (default_ver_decl,
+				      node->symbol.decl, &empty_bb);
+  node->function_version.dispatcher_resolver = resolver_decl;
+
+  old_current_function_decl = current_function_decl;
+  push_cfun (DECL_STRUCT_FUNCTION (resolver_decl));
+  current_function_decl = resolver_decl;
+
+  fn_ver_vec = VEC_alloc (tree, heap, 2);
+
+  for (versn = node->function_version.next; versn;
+       versn = versn->function_version.next)
+    {
+      /* Check for virtual functions here again, as by this time it should
+	 have been determined if this function needs a vtable index or
+	 not.  This happens for methods in derived classes that override
+	 virtual methods in base classes but are not explicitly marked as
+	 virtual.  */
+      if (DECL_VINDEX (versn->symbol.decl))
+        error_at (DECL_SOURCE_LOCATION (versn->symbol.decl),
+		  "Virtual function multiversioning not supported");
+      VEC_safe_push (tree, heap, fn_ver_vec, versn->symbol.decl);
+    }
+
+  dispatch_function_versions (resolver_decl, fn_ver_vec, &empty_bb);
+
+  rebuild_cgraph_edges (); 
+  pop_cfun ();
+  current_function_decl = old_current_function_decl;
+  return resolver_decl;
+}
+
 /* This builds the processor_model struct type defined in
    libgcc/config/i386/cpuinfo.c  */
 
@@ -40932,6 +41916,9 @@ ix86_memmodel_check (unsigned HOST_WIDE_INT val)
 #undef TARGET_PROFILE_BEFORE_PROLOGUE
 #define TARGET_PROFILE_BEFORE_PROLOGUE ix86_profile_before_prologue
 
+#undef TARGET_MANGLE_DECL_ASSEMBLER_NAME
+#define TARGET_MANGLE_DECL_ASSEMBLER_NAME ix86_mangle_decl_assembler_name
+
 #undef TARGET_ASM_UNALIGNED_HI_OP
 #define TARGET_ASM_UNALIGNED_HI_OP TARGET_ASM_ALIGNED_HI_OP
 #undef TARGET_ASM_UNALIGNED_SI_OP
@@ -41025,6 +42012,17 @@ ix86_memmodel_check (unsigned HOST_WIDE_INT val)
 #undef TARGET_FOLD_BUILTIN
 #define TARGET_FOLD_BUILTIN ix86_fold_builtin
 
+#undef TARGET_COMPARE_VERSION_PRIORITY
+#define TARGET_COMPARE_VERSION_PRIORITY ix86_compare_version_priority
+
+#undef TARGET_GENERATE_VERSION_DISPATCHER_BODY
+#define TARGET_GENERATE_VERSION_DISPATCHER_BODY \
+  ix86_generate_version_dispatcher_body
+
+#undef TARGET_GET_FUNCTION_VERSIONS_DISPATCHER
+#define TARGET_GET_FUNCTION_VERSIONS_DISPATCHER \
+  ix86_get_function_versions_dispatcher
+
 #undef TARGET_ENUM_VA_LIST_P
 #define TARGET_ENUM_VA_LIST_P ix86_enum_va_list
 
@@ -41165,6 +42163,9 @@ ix86_memmodel_check (unsigned HOST_WIDE_INT val)
 #undef TARGET_OPTION_PRINT
 #define TARGET_OPTION_PRINT ix86_function_specific_print
 
+#undef TARGET_OPTION_FUNCTION_VERSIONS
+#define TARGET_OPTION_FUNCTION_VERSIONS ix86_function_versions
+
 #undef TARGET_CAN_INLINE_P
 #define TARGET_CAN_INLINE_P ix86_can_inline_p
 

^ permalink raw reply	[flat|nested] 93+ messages in thread
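
For readers following the naming scheme in the patch above: sorted_attr_string
and ix86_mangle_function_version_assembler_name together turn a "target"
attribute string into a suffix on the assembler name. The following is only a
minimal standalone sketch of that idea, not the patch's code. It uses plain
malloc instead of xmalloc, and the helper name versioned_name is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator over an array of C strings.  */
static int
cmp_str (const void *a, const void *b)
{
  return strcmp (*(char *const *) a, *(char *const *) b);
}

/* Hypothetical helper: return "<name>.<attrs>" where the comma separated
   attribute tokens are sorted and joined with '_', and '=' and '-' are
   replaced by '_'.  The caller frees the result.  */
static char *
versioned_name (const char *name, const char *attrs)
{
  size_t len = strlen (attrs);
  size_t argnum = 1, n = 0, i;
  char *copy = malloc (len + 1);
  /* name + '.' + at most LEN characters of joined tokens + NUL.  */
  char *out = malloc (strlen (name) + len + 2);
  char **args;
  char *tok;

  if (copy == NULL || out == NULL)
    abort ();
  strcpy (copy, attrs);

  /* Count the tokens and rewrite non-identifier characters.  */
  for (i = 0; i < len; i++)
    {
      if (copy[i] == ',')
        argnum++;
      else if (copy[i] == '=' || copy[i] == '-')
        copy[i] = '_';
    }

  args = malloc (argnum * sizeof *args);
  if (args == NULL)
    abort ();

  for (tok = strtok (copy, ","); tok != NULL; tok = strtok (NULL, ","))
    args[n++] = tok;
  assert (n <= argnum);

  /* Sorting makes "a,b" and "b,a" map to the same suffix.  */
  qsort (args, n, sizeof *args, cmp_str);

  strcpy (out, name);
  for (i = 0; i < n; i++)
    {
      strcat (out, i == 0 ? "." : "_");
      strcat (out, args[i]);
    }

  free (args);
  free (copy);
  return out;
}
```

Under these assumptions, versioned_name ("foo", "sse4.2,arch=corei7") and
versioned_name ("foo", "arch=corei7,sse4.2") both yield
"foo.arch_corei7_sse4.2", which is the property the sort is there to provide.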

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-10-12 22:41                                                                         ` Sriraman Tallam
@ 2012-10-19 15:23                                                                           ` Diego Novillo
  2012-10-20  4:29                                                                             ` Sriraman Tallam
  0 siblings, 1 reply; 93+ messages in thread
From: Diego Novillo @ 2012-10-19 15:23 UTC (permalink / raw)
  To: Sriraman Tallam
  Cc: Jason Merrill, Jan Hubicka, Xinliang David Li, mark, nathan,
	H.J. Lu, Richard Guenther, Uros Bizjak, reply, GCC Patches

On 2012-10-12 18:19 , Sriraman Tallam wrote:

> When the front-end sees more than one decl for "foo", it calls a target hook to
> determine if they are versions. To prevent duplicate definition errors with other
>  versions of "foo", "decls_match" function in cp/decl.c is made to return false
>  when 2 decls are deemed versions by the target. This will make all function
> versions of "foo" to be added to the overload list of "foo".

So, this means that this can only work for C++, right?  Or could the 
same trickery be done some other way in other FEs?

I see no handling of different FEs.  If the user tries to use these 
attributes from languages other than C++, we should emit a diagnostic.

> +@deftypefn {Target Hook} tree TARGET_GET_FUNCTION_VERSIONS_DISPATCHER (void *@var{arglist})
> +This hook is used to get the dispatcher function for a set of function
> +versions.  The dispatcher function is called to invoke the rignt function

s/rignt/right/

> +version at run-time. @var{arglist} is the vector of function versions
> +that should be considered for dispatch.
> +@end deftypefn
> +
> +@deftypefn {Target Hook} tree TARGET_GENERATE_VERSION_DISPATCHER_BODY (void *@var{arg})
> +This hook is used to generate the dispatcher logic to invoke the right
> +function version at runtime for a given set of function versions.

s/runtime/run-time/

> +@hook TARGET_GET_FUNCTION_VERSIONS_DISPATCHER
> +This hook is used to get the dispatcher function for a set of function
> +versions.  The dispatcher function is called to invoke the rignt function

s/rignt/right/

> +version at run-time. @var{arglist} is the vector of function versions
> +that should be considered for dispatch.
> +@end deftypefn
> +
> +@hook TARGET_GENERATE_VERSION_DISPATCHER_BODY
> +This hook is used to generate the dispatcher logic to invoke the right
> +function version at runtime for a given set of function versions.

s/runtime/run-time/

> @@ -288,7 +289,6 @@ mark_store (gimple stmt, tree t, void *data)
>       }
>    return false;
>  }
> -
>  /* Create cgraph edges for function calls.
>     Also look for functions and variables having addresses taken.  */

Don't remove vertical white space, please.

> +		{
> +		  struct cgraph_node *callee = cgraph_get_create_node (decl);
> +	          /* If a call to a multiversioned function dispatcher is
> +		     found, generate the body to dispatch the right function
> +		     at run-time.  */
> +		  if (callee->dispatcher_function)
> +		    {
> +		      tree resolver_decl;
> +		      gcc_assert (callee->function_version.next);

What if callee is the last version in the list?  Not sure what you are 
trying to check here.


> @@ -8601,9 +8601,22 @@ handle_target_attribute (tree *node, tree name, tr
>        warning (OPT_Wattributes, "%qE attribute ignored", name);
>        *no_add_attrs = true;
>      }
> -  else if (! targetm.target_option.valid_attribute_p (*node, name, args,
> -						      flags))
> -    *no_add_attrs = true;
> +  else
> +    {
> +      /* When a target attribute is invalid, it may also be because the
> +	 target for the compilation unit and the attribute match.  For
> +         instance, target attribute "xxx" is invalid when -mxxx is used.
> +         When used with multiversioning, removing the attribute will lead
> +         to duplicate definitions if a default version is provided.
> +	 So, generate a warning here and remove the attribute.  */
> +      if (!targetm.target_option.valid_attribute_p (*node, name, args, flags))
> +	{
> +	  warning (OPT_Wattributes,
> +		   "Invalid target attribute in function %qE, ignored.",
> +		   *node);
> +	  *no_add_attrs = true;

If you do this, isn't the compiler going to generate two warning 
messages?  One for the invalid target attribute, the second for the 
duplicate definition.

> @@ -228,6 +228,26 @@ struct GTY(()) cgraph_node {
>    struct cgraph_node *prev_sibling_clone;
>    struct cgraph_node *clones;
>    struct cgraph_node *clone_of;
> +
> +  /* Function Multiversioning info.  */
> +  struct {
> +    /* Chains all the semantically identical function versions.  The
> +       first function in this chain is the default function.  */
> +    struct cgraph_node *prev;
> +    /* If this node is a dispatcher for function versions, this points
> +       to the default function version, the first function in the chain.  */
> +    struct cgraph_node *next;

Why not a VEC of function decls?  Seems easier to manage and less size 
overhead.


> @@ -3516,8 +3522,8 @@ struct GTY(()) tree_function_decl {
>    unsigned looping_const_or_pure_flag : 1;
>    unsigned has_debug_args_flag : 1;
>    unsigned tm_clone_flag : 1;
> -
> -  /* 1 bit left */
> +  unsigned versioned_function : 1;
> +  /* No bits left.  */

You ate the last bit!  How rude ;)

> @@ -8132,6 +8176,38 @@ joust (struct z_candidate *cand1, struct z_candida
>        && (IS_TYPE_OR_DECL_P (cand1->fn)))
>      return 1;
>
> +  /* For Candidates of a multi-versioned function,  make the version with

s/Candidates/candidates/

> +  old_current_function_decl = current_function_decl;
> +  push_cfun (DECL_STRUCT_FUNCTION (function_decl));
> +  current_function_decl = function_decl;

push_cfun will set current_function_decl for you.  No need to keep track 
of old_current_function_decl.

> +  enum feature_priority
> +  {
> +    P_ZERO = 0,
> +    P_MMX,
> +    P_SSE,
> +    P_SSE2,
> +    P_SSE3,
> +    P_SSSE3,
> +    P_PROC_SSSE3,
> +    P_SSE4_a,
> +    P_PROC_SSE4_a,
> +    P_SSE4_1,
> +    P_SSE4_2,
> +    P_PROC_SSE4_2,
> +    P_POPCNT,
> +    P_AVX,
> +    P_AVX2,
> +    P_FMA,
> +    P_PROC_FMA
> +  };

There's no need to have this list dynamically defined, right?

> +	}
> +    }
> +
> +  /* Process feature name.  */
> +  tok_str =  (char *) xmalloc (strlen (attrs_str) + 1);

XNEWVEC(char, strlen (attrs_str) + 1);

> +  /* Atleast one more version other than the default.  */

s/Atleast/At least/

> +  num_versions = VEC_length (tree, fndecls);
> +  gcc_assert (num_versions >= 2);
> +
> +  function_version_info = (struct _function_version_info *)
> +    xmalloc ((num_versions - 1) * sizeof (struct _function_version_info));

Better use VEC() here.

> +
> +  /* The first version in the vector is the default decl.  */
> +  default_decl = VEC_index (tree, fndecls, 0);
> +
> +  old_current_function_decl = current_function_decl;
> +  push_cfun (DECL_STRUCT_FUNCTION (dispatch_decl));
> +  current_function_decl = dispatch_decl;

No need to set current_function_decl.

> +
> +  gseq = bb_seq (*empty_bb);
> +  /* Function version dispatch is via IFUNC.  IFUNC resolvers fire before
> +     constructors, so explicity call __builtin_cpu_init here.  */
> +  ifunc_cpu_init_stmt = gimple_build_call_vec (
> +                     ix86_builtins [(int) IX86_BUILTIN_CPU_INIT], NULL);
> +  gimple_seq_add_stmt (&gseq, ifunc_cpu_init_stmt);
> +  gimple_set_bb (ifunc_cpu_init_stmt, *empty_bb);
> +  set_bb_seq (*empty_bb, gseq);
> +
> +  pop_cfun ();
> +  current_function_decl = old_current_function_decl;

Likewise here.

> +/* This function returns true if fn1 and fn2 are versions of the same function.
> +   Returns false if only one of the function decls has the target attribute
> +   set or if the targets of the function decls are different.  This assumes
> +   the fn1 and fn2 have the same signature.  */

Mention the arguments in capitals.

> +  for (i = 0; i < strlen (str); i++)
> +    if (str[i] == ',')
> +      argnum++;
> +
> +  attr_str = (char *)xmalloc (strlen (str) + 1);

XNEWVEC()

> +  strcpy (attr_str, str);
> +
> +  /* Replace "=,-" with "_".  */
> +  for (i = 0; i < strlen (attr_str); i++)
> +    if (attr_str[i] == '=' || attr_str[i]== '-')
> +      attr_str[i] = '_';
> +
> +  if (argnum == 1)
> +    return attr_str;
> +
> +  args = (char **)xmalloc (argnum * sizeof (char *));

VEC()?

> +  if (DECL_DECLARED_INLINE_P (decl)
> +      && lookup_attribute ("gnu_inline",
> +			   DECL_ATTRIBUTES (decl)))
> +    error_at (DECL_SOURCE_LOCATION (decl),
> +	      "Function versions cannot be marked as gnu_inline,"
> +	      " bodies have to be generated\n");

No newline at the end of the error message.

> +  sprintf (assembler_name, "%s.%s", orig_name, attr_str);
> +  if (dump_file)
> +    fprintf (stderr, "Assembler name set to %s for function version %s\n",
> +	     assembler_name, IDENTIFIER_POINTER (id));

This dumps to stderr instead of dump_file.  Also, use the new dumping 
facility?

> +/* Return a new name by appending SUFFIX to the DECL name.  If
> +   make_unique is true, append the full path name.  */

Full path name of what?

> +
> +static char *
> +make_name (tree decl, const char *suffix, bool make_unique)
> +{
> +  char *global_var_name;
> +  int name_len;
> +  const char *name;
> +  const char *unique_name = NULL;
> +
> +  name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
> +
> +  /* Get a unique name that can be used globally without any chances
> +     of collision at link time.  */
> +  if (make_unique)
> +    unique_name = IDENTIFIER_POINTER (get_file_function_name ("\0"));
> +
> +  name_len = strlen (name) + strlen (suffix) + 2;
> +
> +  if (make_unique)
> +    name_len += strlen (unique_name) + 1;
> +  global_var_name = (char *) xmalloc (name_len);

XNEWVEC.



Diego.


* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-10-19 15:23                                                                           ` Diego Novillo
@ 2012-10-20  4:29                                                                             ` Sriraman Tallam
  2012-10-23 21:21                                                                               ` Sriraman Tallam
  2012-10-26 14:11                                                                               ` Diego Novillo
  0 siblings, 2 replies; 93+ messages in thread
From: Sriraman Tallam @ 2012-10-20  4:29 UTC (permalink / raw)
  To: Diego Novillo
  Cc: Jason Merrill, Jan Hubicka, Xinliang David Li, mark, nathan,
	H.J. Lu, Richard Guenther, Uros Bizjak, reply, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 12126 bytes --]

Hi Diego,

   Thanks for the review. I have addressed all your comments.  New
patch attached.

Thanks,
-Sri.

On Fri, Oct 19, 2012 at 8:10 AM, Diego Novillo <dnovillo@google.com> wrote:
> On 2012-10-12 18:19 , Sriraman Tallam wrote:
>
>> When the front-end sees more than one decl for "foo", it calls a target
>> hook to
>> determine if they are versions. To prevent duplicate definition errors
>> with other
>>  versions of "foo", "decls_match" function in cp/decl.c is made to return
>> false
>>  when 2 decls have are deemed versions by the target. This will make all
>> function
>>
>> versions of "foo" to be added to the overload list of "foo".
>
>
> So, this means that this can only work for C++, right?  Or could the same
> trickery be done some other way in other FEs?
>
> I see no handling of different FEs.  If the user tries to use these
> attributes from languages other than C++, we should emit a diagnostic.

Yes, the support is only for C++ for now. The "target" attribute is not
new, and if the user tries to use this with C then a duplicate
definition error would occur, just like now.
I have plans to implement this for C too.

>
>> +@deftypefn {Target Hook} tree TARGET_GET_FUNCTION_VERSIONS_DISPATCHER
>> (void *@var{arglist})
>> +This hook is used to get the dispatcher function for a set of function
>> +versions.  The dispatcher function is called to invoke the rignt function
>
>
> s/rignt/right/
>
>> +version at run-time. @var{arglist} is the vector of function versions
>> +that should be considered for dispatch.
>> +@end deftypefn
>> +
>> +@deftypefn {Target Hook} tree TARGET_GENERATE_VERSION_DISPATCHER_BODY
>> (void *@var{arg})
>> +This hook is used to generate the dispatcher logic to invoke the right
>> +function version at runtime for a given set of function versions.
>
>
> s/runtime/run-time/
>
>> +@hook TARGET_GET_FUNCTION_VERSIONS_DISPATCHER
>> +This hook is used to get the dispatcher function for a set of function
>> +versions.  The dispatcher function is called to invoke the rignt function
>
>
> s/rignt/right/
>
>> +version at run-time. @var{arglist} is the vector of function versions
>> +that should be considered for dispatch.
>> +@end deftypefn
>> +
>> +@hook TARGET_GENERATE_VERSION_DISPATCHER_BODY
>> +This hook is used to generate the dispatcher logic to invoke the right
>> +function version at runtime for a given set of function versions.
>
>
> s/runtime/run-time/
>
>> @@ -288,7 +289,6 @@ mark_store (gimple stmt, tree t, void *data)
>>       }
>>    return false;
>>  }
>> -
>>  /* Create cgraph edges for function calls.
>>     Also look for functions and variables having addresses taken.  */
>
>
> Don't remove vertical white space, please.
>
>> +               {
>> +                 struct cgraph_node *callee = cgraph_get_create_node
>> (decl);
>> +                 /* If a call to a multiversioned function dispatcher is
>> +                    found, generate the body to dispatch the right
>> function
>> +                    at run-time.  */
>> +                 if (callee->dispatcher_function)
>> +                   {
>> +                     tree resolver_decl;
>> +                     gcc_assert (callee->function_version.next);
>
>
> What if callee is the last version in the list?  Not sure what you are
> trying to check here.

So, callee here is the dispatcher function and it points to the set of
semantically identical function versions. At this point, the
dispatcher (callee) should have all the function versions chained in
function_version, which is what the assert is checking.

>
>
>> @@ -8601,9 +8601,22 @@ handle_target_attribute (tree *node, tree name, tr
>>        warning (OPT_Wattributes, "%qE attribute ignored", name);
>>        *no_add_attrs = true;
>>      }
>> -  else if (! targetm.target_option.valid_attribute_p (*node, name, args,
>> -                                                     flags))
>> -    *no_add_attrs = true;
>> +  else
>> +    {
>> +      /* When a target attribute is invalid, it may also be because the
>> +        target for the compilation unit and the attribute match.  For
>> +         instance, target attribute "xxx" is invalid when -mxxx is used.
>> +         When used with multiversioning, removing the attribute will lead
>> +         to duplicate definitions if a default version is provided.
>> +        So, generate a warning here and remove the attribute.  */
>> +      if (!targetm.target_option.valid_attribute_p (*node, name, args,
>> flags))
>> +       {
>> +         warning (OPT_Wattributes,
>> +                  "Invalid target attribute in function %qE, ignored.",
>> +                  *node);
>> +         *no_add_attrs = true;
>
>
> If you do this, isn't the compiler going to generate two warning messages?
> One for the invalid target attribute, the second for the duplicate
> definition.

This will be a warning and the duplicate definition would be an error.
The warning would help the user understand why this error occurred.
Example:

ver.cc
int __attribute__((target("popcnt")))
bar (bool a)
{
  return 0;
}

int
bar (bool a)
{
  return 1;
}

$ g++ -mpopcnt  ver.cc

ver.cc:2:12: warning: Invalid target attribute in function ‘bar’,
ignored. [-Wattributes]
 bar (bool a)
            ^
ver.cc: In function ‘int bar(bool)’:
ver.cc:7:1: error: redefinition of ‘int bar(bool)’
 bar (bool a)
 ^
ver.cc:2:1: error: ‘int bar(bool)’ previously defined here
 bar (bool a)

When compiled with -mpopcnt, the new version does not differ from the default.
Now, the warning makes it clear why the redefinition error occurred.



>
>> @@ -228,6 +228,26 @@ struct GTY(()) cgraph_node {
>>    struct cgraph_node *prev_sibling_clone;
>>    struct cgraph_node *clones;
>>    struct cgraph_node *clone_of;
>> +
>> +  /* Function Multiversioning info.  */
>> +  struct {
>>
>> +    /* Chains all the semantically identical function versions.  The
>> +       first function in this chain is the default function.  */
>> +    struct cgraph_node *prev;
>> +    /* If this node is a dispatcher for function versions, this points
>> +       to the default function version, the first function in the chain.
>> */
>> +    struct cgraph_node *next;
>
>
> Why not a VEC of function decls?  Seems easier to manage and less size
> overhead.

I have solved the size overhead by moving function_version_info
outside cgraph. I think it is better to chain the decls as it is very
easy to traverse the list of semantically identical versions from any
given function version.
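
To illustrate the traversal argument, here is a minimal standalone sketch.
The struct and helper names (version_node, first_version, count_versions)
are hypothetical and are not the names used in the patch; the only property
taken from the patch is a prev/next chain whose first element is the default
version.

```c
#include <stddef.h>

/* Hypothetical stand-in for the per-version record; not the patch's type.  */
struct version_node
{
  const char *name;
  struct version_node *prev;
  struct version_node *next;
};

/* Walk back from any version to the head of the chain (the default).  */
static struct version_node *
first_version (struct version_node *v)
{
  while (v->prev)
    v = v->prev;
  return v;
}

/* Count all versions reachable from V, regardless of where V sits.  */
static int
count_versions (struct version_node *v)
{
  int n = 0;
  for (v = first_version (v); v; v = v->next)
    n++;
  return n;
}
```

Starting from any node, the whole set is reachable without a separate
container, which is the convenience being argued for here.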

>
>
>> @@ -3516,8 +3522,8 @@ struct GTY(()) tree_function_decl {
>>
>>    unsigned looping_const_or_pure_flag : 1;
>>    unsigned has_debug_args_flag : 1;
>>    unsigned tm_clone_flag : 1;
>> -
>> -  /* 1 bit left */
>> +  unsigned versioned_function : 1;
>> +  /* No bits left.  */
>
>
> You ate the last bit!  How rude ;)

I should get the patch in before somebody else really eats it ;-)

>
>> @@ -8132,6 +8176,38 @@ joust (struct z_candidate *cand1, struct z_candida
>>        && (IS_TYPE_OR_DECL_P (cand1->fn)))
>>      return 1;
>>
>> +  /* For Candidates of a multi-versioned function,  make the version with
>
>
> s/Candidates/candidates/
>
>
>> +  old_current_function_decl = current_function_decl;
>> +  push_cfun (DECL_STRUCT_FUNCTION (function_decl));
>> +  current_function_decl = function_decl;
>
>
> push_cfun will set current_function_decl for you.  No need to keep track of
> old_current_function_decl.
>
>> +  enum feature_priority
>> +  {
>> +    P_ZERO = 0,
>> +    P_MMX,
>> +    P_SSE,
>> +    P_SSE2,
>> +    P_SSE3,
>> +    P_SSSE3,
>> +    P_PROC_SSSE3,
>> +    P_SSE4_a,
>> +    P_PROC_SSE4_a,
>> +    P_SSE4_1,
>> +    P_SSE4_2,
>> +    P_PROC_SSE4_2,
>> +    P_POPCNT,
>> +    P_AVX,
>> +    P_AVX2,
>> +    P_FMA,
>> +    P_PROC_FMA
>> +  };
>
>
> There's no need to have this list dynamically defined, right?

I don't understand. Why expose the enum outside the function?

>
>> +       }
>> +    }
>> +
>> +  /* Process feature name.  */
>> +  tok_str =  (char *) xmalloc (strlen (attrs_str) + 1);
>
>
> XNEWVEC(char, strlen (attrs_str) + 1);
>
>
>> +  /* Atleast one more version other than the default.  */
>
>
> s/Atleast/At least/
>
>> +  num_versions = VEC_length (tree, fndecls);
>> +  gcc_assert (num_versions >= 2);
>> +
>> +  function_version_info = (struct _function_version_info *)
>> +    xmalloc ((num_versions - 1) * sizeof (struct
>> _function_version_info));
>
>
> Better use VEC() here.
>
>
>> +
>> +  /* The first version in the vector is the default decl.  */
>> +  default_decl = VEC_index (tree, fndecls, 0);
>> +
>> +  old_current_function_decl = current_function_decl;
>> +  push_cfun (DECL_STRUCT_FUNCTION (dispatch_decl));
>> +  current_function_decl = dispatch_decl;
>
>
> No need to set current_function_decl.
>
>> +
>> +  gseq = bb_seq (*empty_bb);
>> +  /* Function version dispatch is via IFUNC.  IFUNC resolvers fire before
>> +     constructors, so explicity call __builtin_cpu_init here.  */
>>
>> +  ifunc_cpu_init_stmt = gimple_build_call_vec (
>> +                     ix86_builtins [(int) IX86_BUILTIN_CPU_INIT], NULL);
>> +  gimple_seq_add_stmt (&gseq, ifunc_cpu_init_stmt);
>> +  gimple_set_bb (ifunc_cpu_init_stmt, *empty_bb);
>> +  set_bb_seq (*empty_bb, gseq);
>> +
>> +  pop_cfun ();
>> +  current_function_decl = old_current_function_decl;
>
>
> Likewise here.
>
>> +/* This function returns true if fn1 and fn2 are versions of the same
>> function.
>> +   Returns false if only one of the function decls has the target
>> attribute
>> +   set or if the targets of the function decls are different.  This
>> assumes
>> +   the fn1 and fn2 have the same signature.  */
>
>
> Mention the arguments in capitals.
>
>
>> +  for (i = 0; i < strlen (str); i++)
>> +    if (str[i] == ',')
>> +      argnum++;
>> +
>> +  attr_str = (char *)xmalloc (strlen (str) + 1);
>
>
> XNEWVEC()
>
>> +  strcpy (attr_str, str);
>> +
>> +  /* Replace "=,-" with "_".  */
>>
>> +  for (i = 0; i < strlen (attr_str); i++)
>> +    if (attr_str[i] == '=' || attr_str[i]== '-')
>>
>> +      attr_str[i] = '_';
>> +
>> +  if (argnum == 1)
>> +    return attr_str;
>> +
>> +  args = (char **)xmalloc (argnum * sizeof (char *));
>
>
> VEC()?
>
>> +  if (DECL_DECLARED_INLINE_P (decl)
>> +      && lookup_attribute ("gnu_inline",
>>
>> +                          DECL_ATTRIBUTES (decl)))
>> +    error_at (DECL_SOURCE_LOCATION (decl),
>> +             "Function versions cannot be marked as gnu_inline,"
>> +             " bodies have to be generated\n");
>
>
> No newline at the end of the error message.
>
>> +  sprintf (assembler_name, "%s.%s", orig_name, attr_str);
>> +  if (dump_file)
>> +    fprintf (stderr, "Assembler name set to %s for function version
>> %s\n",
>> +            assembler_name, IDENTIFIER_POINTER (id));
>
>
> This dumps to stderr instead of dump_file.  Also, use the new dumping
> facility?
>
>
>> +/* Return a new name by appending SUFFIX to the DECL name.  If
>> +   make_unique is true, append the full path name.  */
>
>
> Full path name of what?
>
>
>> +
>> +static char *
>> +make_name (tree decl, const char *suffix, bool make_unique)
>> +{
>> +  char *global_var_name;
>> +  int name_len;
>> +  const char *name;
>> +  const char *unique_name = NULL;
>> +
>> +  name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
>> +
>> +  /* Get a unique name that can be used globally without any chances
>> +     of collision at link time.  */
>> +  if (make_unique)
>> +    unique_name = IDENTIFIER_POINTER (get_file_function_name ("\0"));
>> +
>> +  name_len = strlen (name) + strlen (suffix) + 2;
>> +
>> +  if (make_unique)
>> +    name_len += strlen (unique_name) + 1;
>> +  global_var_name = (char *) xmalloc (name_len);
>
>
> XNEWVEC.
>
>
>
> Diego.

[-- Attachment #2: mv_fe_patch_10192012.txt --]
[-- Type: text/plain, Size: 75828 bytes --]

Overview of the patch which adds support to specify function versions.  This is
only enabled for target i386.

Example:

int foo ();  /* Default version */
int foo () __attribute__ ((target("avx,popcnt")));/*Specialized for avx and popcnt */
int foo () __attribute__ ((target("arch=core2,ssse3")));/*Specialized for core2 and ssse3*/

int main ()
{
 int (*p)() = &foo;
 return foo () + (*p)();
}

int foo ()
{
 return 0;
}

int __attribute__ ((target("avx,popcnt")))
foo ()
{
 return 0;
}

int __attribute__ ((target("arch=core2,ssse3")))
foo ()
{
 return 0;
}

The above example has foo defined 3 times, but all 3 definitions of foo are
different versions of the same function. The call to foo in main, directly and
via a pointer, are calls to the multi-versioned function foo which is dispatched
to the right foo at run-time.

Front-end changes:

The front-end changes are calls at appropriate places to target hooks that
determine the following:

* Determine if two function decls with the same signature are versions.
* Determine the assembler name of a function version.
* Generate the dispatcher function for a set of function versions.
* Compare versions to see if one has a higher priority over the other.

All the implementation happens in the target-specific config/i386/i386.c.

What does the patch do?

* Tracking decls that correspond to function versions of function
name, say "foo":

When the front-end sees more than one decl for "foo", it calls a target hook to
determine if they are versions. To prevent duplicate definition errors with other
versions of "foo", the "decls_match" function in cp/decl.c is made to return false
when 2 decls are deemed versions by the target. This causes all function
versions of "foo" to be added to the overload list of "foo".

* Change the assembler names of the function versions.

For i386, the target changes the assembler names of the function versions by
suffixing the sorted list of args to "target" to the function name of "foo". For
example, the assembler name of "void foo () __attribute__ ((target ("sse4")))" will
become _Z3foov.sse4.  The target hook mangle_decl_assembler_name is used for this.
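
The naming scheme can be sketched outside the compiler as follows.  This is
an illustration only: version_assembler_name and cmp_str are hypothetical
names, the fixed buffer sizes are arbitrary, and joining the sorted
arguments with "_" is an assumption.  What comes from the patch is the
"<name>.<attrs>" format, the sorting, and the replacement of '=' and '-'
with '_'.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C strings.  */
static int
cmp_str (const void *a, const void *b)
{
  return strcmp (*(char *const *) a, *(char *const *) b);
}

/* Write "<base>.<sorted attrs>" into OUT (size OUT_LEN).  '=' and '-' in
   the attribute string become '_'; the sorted arguments are joined with
   '_' (the separator is an assumption).  Returns 0 on success, -1 if the
   input has too many arguments or OUT is too small.  */
static int
version_assembler_name (const char *base, const char *attrs,
                        char *out, size_t out_len)
{
  char buf[256];
  char *args[32];
  size_t n = 0, i;
  char *tok;

  if (strlen (attrs) >= sizeof buf)
    return -1;
  strcpy (buf, attrs);

  for (i = 0; buf[i]; i++)
    if (buf[i] == '=' || buf[i] == '-')
      buf[i] = '_';

  /* Split on commas, then sort so the name does not depend on the
     order in which the user wrote the arguments.  */
  for (tok = strtok (buf, ","); tok; tok = strtok (NULL, ","))
    {
      if (n == sizeof args / sizeof args[0])
        return -1;
      args[n++] = tok;
    }
  qsort (args, n, sizeof args[0], cmp_str);

  if (snprintf (out, out_len, "%s.", base) >= (int) out_len)
    return -1;
  for (i = 0; i < n; i++)
    {
      size_t used = strlen (out);
      if (snprintf (out + used, out_len - used, "%s%s",
                    i ? "_" : "", args[i]) >= (int) (out_len - used))
        return -1;
    }
  return 0;
}
```

Sorting first means that target("avx,popcnt") and target("popcnt,avx")
yield the same suffix, so two spellings of one version cannot produce two
different symbols.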

* Overload resolution:

 Function "build_over_call" in cp/call.c sees a call to function
"foo", which is multi-versioned. The overload resolution happens in
function "joust" in "cp/call.c". Here, the call to "foo" has all
possible versions of "foo" as candidates. All the candidates of "foo" are
stored in the cgraph data structures. Each version of foo is chained in a 
doubly-linked list with the default function as the first element.  This allows
any pass to access all the semantically identical versions. A call to a
multi-versioned function will be replaced by a call to a dispatcher function,
determined by a target hook, to execute the right function version at run-time.

Optimization to directly call a version when possible:
Also, in joust, where overload resolution happens, a multiversioned function
resolution is made to return the most specialized version.  This is the version
that will be checked for dispatching first and is determined by the target.
Now, if the caller can inline this function version then a direct call is made
to this function version rather than go through the dispatcher. When a direct
call cannot be made, a call to the dispatcher function is created.

* Creating the dispatcher body.

The dispatcher body, called the resolver, is made only when there is a call to a
multiversioned function dispatcher or the address of a function is taken. This
is generated during build_cgraph_edges for a call or
cgraph_mark_address_taken_node for a pointer reference. This is done by another
target hook.
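
As a rough picture of what a resolver computes, consider the portable
sketch below.  It is not the generated code: resolve_foo, the FEAT_* bits,
and the foo_* function names are hypothetical, the CPU features are passed
in as a mask instead of being queried from the hardware, and the versions
return distinct values only so that the selected one can be observed.

```c
#include <stddef.h>

typedef int (*foo_fn) (void);

/* Hypothetical feature bits; a real resolver would query the CPU.  */
enum { FEAT_SSSE3 = 1 << 0, FEAT_POPCNT = 1 << 1, FEAT_AVX = 1 << 2 };

static int foo_default (void) { return 0; }
static int foo_avx_popcnt (void) { return 1; }
static int foo_ssse3 (void) { return 2; }

/* Return the version to run for the given feature mask.  A version is
   eligible only if every feature it requires is present; the default
   version is the fallback.  */
static foo_fn
resolve_foo (unsigned cpu_features)
{
  const unsigned avx_popcnt = FEAT_AVX | FEAT_POPCNT;

  if ((cpu_features & avx_popcnt) == avx_popcnt)
    return foo_avx_popcnt;
  if (cpu_features & FEAT_SSSE3)
    return foo_ssse3;
  return foo_default;
}
```

A caller would invoke the result as resolve_foo (mask) ().  With IFUNC the
selection runs once, when the symbol is resolved, rather than on every
call.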

* Dispatch ordering.

The order in which the function versions are checked during dispatch is based
on a priority value assigned for the ISA that is catered to. More specialized
versions are checked for dispatching first.  This is to mitigate the ambiguity
that can arise when more than one function version is valid for execution on
a particular platform.  This is not a perfect solution and, in the future, the
user should be allowed to assign a dispatching priority value to each version.
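
The ordering step can be sketched as a sort by priority.  The names below
(struct version, compare_priority, order_for_dispatch) and the reduced
priority enum are hypothetical; the sign convention of the comparison
follows the description of the compare_version_priority hook (positive
when the first argument has higher priority).

```c
#include <stdlib.h>

/* Hypothetical priorities; larger means more specialized.  */
enum prio { P_DEFAULT = 0, P_SSSE3, P_POPCNT, P_AVX };

struct version
{
  const char *name;
  enum prio priority;
};

/* Positive if A has higher priority than B, negative if lower, and 0 if
   they are the same.  */
static int
compare_priority (const struct version *a, const struct version *b)
{
  return (int) a->priority - (int) b->priority;
}

/* qsort comparator: highest priority first.  */
static int
by_priority_desc (const void *a, const void *b)
{
  return compare_priority ((const struct version *) b,
                           (const struct version *) a);
}

/* Reorder V so that the dispatcher tests the most specialized version
   first and reaches the default last.  */
static void
order_for_dispatch (struct version *v, size_t n)
{
  qsort (v, n, sizeof *v, by_priority_desc);
}
```

Checking in this order resolves the case where several versions are valid
on the running CPU: the first match is the most specialized one.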

	* doc/tm.texi.in (TARGET_OPTION_FUNCTION_VERSIONS): New hook description.
	* (TARGET_COMPARE_VERSION_PRIORITY): New hook description.
	* (TARGET_GET_FUNCTION_VERSIONS_DISPATCHER): New hook description.
	* (TARGET_GENERATE_VERSION_DISPATCHER_BODY): New hook description.
	* doc/tm.texi: Regenerate.
	* cgraphbuild.c (build_cgraph_edges): Generate body of multiversion
	function dispatcher.
	* c-family/c-common.c (handle_target_attribute): Warn for invalid attributes.
	* target.def (compare_version_priority): New target hook.
	* (generate_version_dispatcher_body): New target hook.
	* (get_function_versions_dispatcher): New target hook.
	* (function_versions): New target hook.
	* cgraph.c (cgraph_mark_address_taken_node): Generate body of multiversion
	function dispatcher.
	* cgraph.h (cgraph_node): New members function_version,
	dispatcher_function.
	(cgraph_function_version_info): New struct.
	* tree.h (DECL_FUNCTION_VERSIONED): New macro.
	(tree_function_decl): New bit-field versioned_function.
	* testsuite/g++.dg/mv1.C: New test.
	* testsuite/g++.dg/mv2.C: New test.
	* testsuite/g++.dg/mv3.C: New test.
	* testsuite/g++.dg/mv4.C: New test.
	* cp/class.c (add_method): Change assembler names of function versions.
	(resolve_address_of_overloaded_function): Save all function
	version candidates. Create dispatcher decl and return address of
	dispatcher instead.
	* cp/decl.c (decls_match): Make decls unmatched for versioned
	functions.
	(duplicate_decls): Remove ambiguity for versioned functions. 
	(cxx_comdat_group): Make comdat group of versioned functions be the
	same.
	* cp/error.c (dump_exception_spec): Dump assembler name for function
	versions.
	* cp/semantics.c (expand_or_defer_fn_1): Mark as needed versioned
	functions that are also marked inline.
	* cp/decl2.c (check_classfn): Check attributes of versioned functions
	for match.
	* cp/call.c (build_new_function_call): Check if versioned functions
	have a default version.
	(build_over_call): Make calls to multiversioned functions
	to call the dispatcher.
	(joust): For calls to multi-versioned functions, make the most
	specialized function version win.
	(tourney): Generate dispatcher decl for function versions.
	(get_function_version_dispatcher): New function.
	(generate_function_versions_dispatcher): New function.
	* Makefile.in: Add multiversion.o
	* config/i386/i386.c (add_condition_to_bb): New function.
	(get_builtin_code_for_version): New function.
	(ix86_compare_version_priority): New function.
	(feature_compare): New function.
	(dispatch_function_versions): New function.
	* (ix86_function_versions): New function.
	* (attr_strcmp): New function.
	* (sorted_attr_string): New function.
	* (ix86_mangle_function_version_assembler_name): New function.
	* (ix86_mangle_decl_assembler_name): New function.
	* (make_name): New function.
	* (make_dispatcher_decl): New function.
	* (is_function_default_version): New function.
	* (ix86_get_function_versions_dispatcher): New function.
	* (make_attribute): New function.
	* (make_resolver_func): New function.
	* (ix86_generate_version_dispatcher_body): New function.
	* (TARGET_COMPARE_VERSION_PRIORITY): New macro.
	* (TARGET_GET_FUNCTION_VERSIONS_DISPATCHER): New macro.
	* (TARGET_GENERATE_VERSION_DISPATCHER_BODY): New macro.
	* (TARGET_OPTION_FUNCTION_VERSIONS): New macro.
	* (TARGET_MANGLE_DECL_ASSEMBLER_NAME): New macro.


Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi	(revision 192623)
+++ gcc/doc/tm.texi	(working copy)
@@ -9913,6 +9913,11 @@ changed via the optimize attribute or pragma, see
 @code{TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE}
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_OPTION_FUNCTION_VERSIONS (tree @var{decl1}, tree @var{decl2})
+This target hook returns @code{true} if @var{FN1} and @var{FN2} are
+versions of the same function.
+@end deftypefn
+
 @deftypefn {Target Hook} bool TARGET_CAN_INLINE_P (tree @var{caller}, tree @var{callee})
 This target hook returns @code{false} if the @var{caller} function
 cannot inline @var{callee}, based on target specific information.  By
@@ -10930,6 +10935,27 @@ The result is another tree containing a simplified
 call's result.  If @var{ignore} is true the value will be ignored.
 @end deftypefn
 
+@deftypefn {Target Hook} int TARGET_COMPARE_VERSION_PRIORITY (tree @var{decl1}, tree @var{decl2})
+This hook is used to compare the target attributes in two functions to
+figure out which function's features get higher priority.  This is used
+during multi-versioning to figure out the order of dispatching. @var{decl1}
+and @var{decl2} are the two function decls that will be compared.
+@end deftypefn
+
+@deftypefn {Target Hook} tree TARGET_GET_FUNCTION_VERSIONS_DISPATCHER (void *@var{arglist})
+This hook is used to get the dispatcher function for a set of function
+versions.  The dispatcher function is called to invoke the right function
+version at run-time. @var{arglist} is the vector of function versions
+that should be considered for dispatch.
+@end deftypefn
+
+@deftypefn {Target Hook} tree TARGET_GENERATE_VERSION_DISPATCHER_BODY (void *@var{arg})
+This hook is used to generate the dispatcher logic to invoke the right
+function version at run-time for a given set of function versions.
+@var{arg} points to the callgraph node of the dispatcher function whose
+body must be generated.
+@end deftypefn
+
 @deftypefn {Target Hook} {const char *} TARGET_INVALID_WITHIN_DOLOOP (const_rtx @var{insn})
 
 Take an instruction in @var{insn} and return NULL if it is valid within a
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in	(revision 192623)
+++ gcc/doc/tm.texi.in	(working copy)
@@ -9782,6 +9782,11 @@ changed via the optimize attribute or pragma, see
 @code{TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE}
 @end deftypefn
 
+@hook TARGET_OPTION_FUNCTION_VERSIONS
+This target hook returns @code{true} if @var{FN1} and @var{FN2} are
+versions of the same function.
+@end deftypefn
+
 @hook TARGET_CAN_INLINE_P
 This target hook returns @code{false} if the @var{caller} function
 cannot inline @var{callee}, based on target specific information.  By
@@ -10788,6 +10793,27 @@ The result is another tree containing a simplified
 call's result.  If @var{ignore} is true the value will be ignored.
 @end deftypefn
 
+@hook TARGET_COMPARE_VERSION_PRIORITY
+This hook is used to compare the target attributes in two functions to
+figure out which function's features get higher priority.  This is used
+during multi-versioning to figure out the order of dispatching. @var{decl1}
+and @var{decl2} are the two function decls that will be compared.
+@end deftypefn
+
+@hook TARGET_GET_FUNCTION_VERSIONS_DISPATCHER
+This hook is used to get the dispatcher function for a set of function
+versions.  The dispatcher function is called to invoke the right function
+version at run-time. @var{arglist} is the vector of function versions
+that should be considered for dispatch.
+@end deftypefn
+
+@hook TARGET_GENERATE_VERSION_DISPATCHER_BODY
+This hook is used to generate the dispatcher logic to invoke the right
+function version at run-time for a given set of function versions.
+@var{arg} points to the callgraph node of the dispatcher function whose
+body must be generated.
+@end deftypefn
+
 @hook TARGET_INVALID_WITHIN_DOLOOP
 
 Take an instruction in @var{insn} and return NULL if it is valid within a
Index: gcc/cgraphbuild.c
===================================================================
--- gcc/cgraphbuild.c	(revision 192623)
+++ gcc/cgraphbuild.c	(working copy)
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-utils.h"
 #include "except.h"
 #include "ipa-inline.h"
+#include "target.h"
 
 /* Context of record_reference.  */
 struct record_reference_ctx
@@ -317,8 +318,23 @@ build_cgraph_edges (void)
 							 bb);
 	      decl = gimple_call_fndecl (stmt);
 	      if (decl)
-		cgraph_create_edge (node, cgraph_get_create_node (decl),
-				    stmt, bb->count, freq);
+		{
+		  struct cgraph_node *callee = cgraph_get_create_node (decl);
+	          /* If a call to a multiversioned function dispatcher is
+		     found, generate the body to dispatch the right function
+		     at run-time.  */
+		  if (callee->dispatcher_function)
+		    {
+		      tree resolver_decl;
+		      gcc_assert (callee->function_version
+				  && callee->function_version->next);
+		      gcc_assert (targetm.generate_version_dispatcher_body);
+		      resolver_decl
+			 = targetm.generate_version_dispatcher_body (callee);
+		      gcc_assert (resolver_decl != NULL_TREE);
+		    }
+		  cgraph_create_edge (node, callee, stmt, bb->count, freq);
+	        }
 	      else
 		cgraph_create_indirect_edge (node, stmt,
 					     gimple_call_flags (stmt),
Index: gcc/c-family/c-common.c
===================================================================
--- gcc/c-family/c-common.c	(revision 192623)
+++ gcc/c-family/c-common.c	(working copy)
@@ -8737,9 +8737,22 @@ handle_target_attribute (tree *node, tree name, tr
       warning (OPT_Wattributes, "%qE attribute ignored", name);
       *no_add_attrs = true;
     }
-  else if (! targetm.target_option.valid_attribute_p (*node, name, args,
-						      flags))
-    *no_add_attrs = true;
+  else
+    {
+      /* When a target attribute is invalid, it may also be because the
+	 target for the compilation unit and the attribute match.  For
+         instance, target attribute "xxx" is invalid when -mxxx is used.
+         When used with multiversioning, removing the attribute will lead
+         to duplicate definitions if a default version is provided.
+	 So, generate a warning here and remove the attribute.  */
+      if (!targetm.target_option.valid_attribute_p (*node, name, args, flags))
+	{
+	  warning (OPT_Wattributes,
+		   "Invalid target attribute in function %qE, ignored.",
+		   *node);
+	  *no_add_attrs = true;
+	}
+    }
 
   return NULL_TREE;
 }
Index: gcc/target.def
===================================================================
--- gcc/target.def	(revision 192623)
+++ gcc/target.def	(working copy)
@@ -1298,6 +1298,31 @@ DEFHOOK
  tree, (tree fndecl, int n_args, tree *argp, bool ignore),
  hook_tree_tree_int_treep_bool_null)
 
+/* Target hook to compare the priority of target features in function DECL1
+   and DECL2.  It returns positive value if DECL1 is higher priority,
+   negative value if DECL2 is higher priority and 0 if they are the
+   same.  */
+DEFHOOK
+(compare_version_priority,
+ "",
+ int, (tree decl1, tree decl2), NULL)
+
+/* Target hook to generate the dispatcher body for a function version
+   dispatcher ARG, which is a cgraph_node pointer.  */
+
+DEFHOOK
+(generate_version_dispatcher_body,
+ "",
+ tree, (void *arg), NULL) 
+
+/* Target hook to generate a function version dispatcher DECL for the list
+   of function versions in arglist, which is a vector of decls.  */
+
+DEFHOOK
+(get_function_versions_dispatcher,
+ "",
+ tree, (void *arglist), NULL)
+
 /* Returns a code for a target-specific builtin that implements
    reciprocal of the function, or NULL_TREE if not available.  */
 DEFHOOK
@@ -2725,6 +2750,14 @@ DEFHOOK
  void, (void),
  hook_void_void)
 
+/* Returns true if DECL1 and DECL2 are versions of the same function.  */
+
+DEFHOOK
+(function_versions,
+ "",
+ bool, (tree decl1, tree decl2),
+ hook_bool_tree_tree_false)
+
 /* Function to determine if one function can inline another function.  */
 #undef HOOK_PREFIX
 #define HOOK_PREFIX "TARGET_"
Index: gcc/cgraph.c
===================================================================
--- gcc/cgraph.c	(revision 192623)
+++ gcc/cgraph.c	(working copy)
@@ -1277,6 +1277,16 @@ cgraph_mark_address_taken_node (struct cgraph_node
   node->symbol.address_taken = 1;
   node = cgraph_function_or_thunk_node (node, NULL);
   node->symbol.address_taken = 1;
+  /* If the address of a multiversioned function dispatcher is taken,
+     generate the body to dispatch the right function at run-time.  This
+     is needed as the address can be used to do an indirect call.  */
+  if (node->dispatcher_function)
+    {
+      gcc_assert (node->function_version
+		  && node->function_version->next);
+      gcc_assert (targetm.generate_version_dispatcher_body);
+      targetm.generate_version_dispatcher_body (node);
+    }
 }
 
 /* Return local info for the compiled function.  */
Index: gcc/cgraph.h
===================================================================
--- gcc/cgraph.h	(revision 192623)
+++ gcc/cgraph.h	(working copy)
@@ -200,6 +200,12 @@ struct GTY(()) cgraph_clone_info
   bitmap combined_args_to_skip;
 };
 
+struct GTY(()) cgraph_function_version_info
+{
+  struct cgraph_node *next;
+  struct cgraph_node *prev;
+  tree dispatcher_resolver;
+};
 
 /* The cgraph data structure.
    Each function decl has assigned cgraph_node listing callees and callers.  */
@@ -228,6 +234,10 @@ struct GTY(()) cgraph_node {
   struct cgraph_node *prev_sibling_clone;
   struct cgraph_node *clones;
   struct cgraph_node *clone_of;
+
+  /* Function Multiversioning Info.  */
+  struct cgraph_function_version_info *function_version;
+  
   /* For functions with many calls sites it holds map from call expression
      to the edge to speed up cgraph_edge function.  */
   htab_t GTY((param_is (struct cgraph_edge))) call_site_hash;
@@ -279,6 +289,8 @@ struct GTY(()) cgraph_node {
   /* ?? We should be able to remove this.  We have enough bits in
      cgraph to calculate it.  */
   unsigned tm_clone : 1;
+  /* True if this decl is a dispatcher for function versions.  */
+  unsigned dispatcher_function : 1;
 };
 
 DEF_VEC_P(symtab_node);
@@ -656,6 +668,7 @@ void cgraph_rebuild_references (void);
 int compute_call_stmt_bb_frequency (tree, basic_block bb);
 void record_references_in_initializer (tree, bool);
 
+
 /* In ipa.c  */
 bool symtab_remove_unreachable_nodes (bool, FILE *);
 cgraph_node_set cgraph_node_set_new (void);
Index: gcc/tree.h
===================================================================
--- gcc/tree.h	(revision 192623)
+++ gcc/tree.h	(working copy)
@@ -3476,6 +3476,12 @@ extern VEC(tree, gc) **decl_debug_args_insert (tre
 #define DECL_FUNCTION_SPECIFIC_OPTIMIZATION(NODE) \
    (FUNCTION_DECL_CHECK (NODE)->function_decl.function_specific_optimization)
 
+/* In FUNCTION_DECL, this is set if this function has other versions generated
+   using "target" attributes.  The default version is the one which does not
+   have any "target" attribute set.  */
+#define DECL_FUNCTION_VERSIONED(NODE)\
+   (FUNCTION_DECL_CHECK (NODE)->function_decl.versioned_function)
+
 /* FUNCTION_DECL inherits from DECL_NON_COMMON because of the use of the
    arguments/result/saved_tree fields by front ends.   It was either inherit
    FUNCTION_DECL from non_common, or inherit non_common from FUNCTION_DECL,
@@ -3520,8 +3526,8 @@ struct GTY(()) tree_function_decl {
   unsigned looping_const_or_pure_flag : 1;
   unsigned has_debug_args_flag : 1;
   unsigned tm_clone_flag : 1;
-
-  /* 1 bit left */
+  unsigned versioned_function : 1;
+  /* No bits left.  */
 };
 
 /* The source language of the translation-unit.  */
Index: gcc/testsuite/g++.dg/mv2.C
===================================================================
--- gcc/testsuite/g++.dg/mv2.C	(revision 0)
+++ gcc/testsuite/g++.dg/mv2.C	(revision 0)
@@ -0,0 +1,119 @@
+/* Test case to check if Multiversioning chooses the correct
+   dispatching order when versions are for various ISAs.  */
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-require-ifunc "" }  */
+/* { dg-options "-O2 -mno-sse -mno-mmx -mno-popcnt -mno-avx" } */
+
+#include <assert.h>
+
+/* Default version.  */
+int foo ();
+/* The dispatch checks should be in the exact reverse order of the
+   declarations below.  */
+int foo () __attribute__ ((target ("mmx")));
+int foo () __attribute__ ((target ("sse")));
+int foo () __attribute__ ((target ("sse2")));
+int foo () __attribute__ ((target ("sse3")));
+int foo () __attribute__ ((target ("ssse3")));
+int foo () __attribute__ ((target ("sse4.1")));
+int foo () __attribute__ ((target ("sse4.2")));
+int foo () __attribute__ ((target ("popcnt")));
+int foo () __attribute__ ((target ("avx")));
+int foo () __attribute__ ((target ("avx2")));
+
+int main ()
+{
+
+  int val = foo ();
+
+  if (__builtin_cpu_supports ("avx2"))
+    assert (val == 1);
+  else if (__builtin_cpu_supports ("avx"))
+    assert (val == 2);
+  else if (__builtin_cpu_supports ("popcnt"))
+    assert (val == 3);
+  else if (__builtin_cpu_supports ("sse4.2"))
+    assert (val == 4);
+  else if (__builtin_cpu_supports ("sse4.1"))
+    assert (val == 5);
+  else if (__builtin_cpu_supports ("ssse3"))
+    assert (val == 6);
+  else if (__builtin_cpu_supports ("sse3"))
+    assert (val == 7);
+  else if (__builtin_cpu_supports ("sse2"))
+    assert (val == 8);
+  else if (__builtin_cpu_supports ("sse"))
+    assert (val == 9);
+  else if (__builtin_cpu_supports ("mmx"))
+    assert (val == 10);
+  else
+    assert (val == 0);
+
+  return 0;
+}
+
+int
+foo ()
+{
+  return 0;
+}
+
+int __attribute__ ((target("mmx")))
+foo ()
+{
+  return 10;
+}
+
+int __attribute__ ((target("sse")))
+foo ()
+{
+  return 9;
+}
+
+int __attribute__ ((target("sse2")))
+foo ()
+{
+  return 8;
+}
+
+int __attribute__ ((target("sse3")))
+foo ()
+{
+  return 7;
+}
+
+int __attribute__ ((target("ssse3")))
+foo ()
+{
+  return 6;
+}
+
+int __attribute__ ((target("sse4.1")))
+foo ()
+{
+  return 5;
+}
+
+int __attribute__ ((target("sse4.2")))
+foo ()
+{
+  return 4;
+}
+
+int __attribute__ ((target("popcnt")))
+foo ()
+{
+  return 3;
+}
+
+int __attribute__ ((target("avx")))
+foo ()
+{
+  return 2;
+}
+
+int __attribute__ ((target("avx2")))
+foo ()
+{
+  return 1;
+}
Index: gcc/testsuite/g++.dg/mv4.C
===================================================================
--- gcc/testsuite/g++.dg/mv4.C	(revision 0)
+++ gcc/testsuite/g++.dg/mv4.C	(revision 0)
@@ -0,0 +1,23 @@
+/* Test case to check if the compiler generates an error message
+   when the default version of a multiversioned function is absent
+   and its pointer is taken.  */
+
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 -mno-sse -mno-popcnt" } */
+
+int __attribute__ ((target ("sse")))
+foo ()
+{
+  return 1;
+}
+int __attribute__ ((target ("popcnt")))
+foo ()
+{
+  return 0;
+}
+
+int main ()
+{
+  int (*p)() = &foo; /* { dg-error "Pointer to a multiversioned function without a default is not allowed" {} } */
+  return (*p)();
+}
Index: gcc/testsuite/g++.dg/mv1.C
===================================================================
--- gcc/testsuite/g++.dg/mv1.C	(revision 0)
+++ gcc/testsuite/g++.dg/mv1.C	(revision 0)
@@ -0,0 +1,130 @@
+/* Test case to check if Multiversioning works.  */
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-require-ifunc "" }  */
+/* { dg-options "-O2 -fPIC -mno-avx -mno-popcnt" } */
+
+#include <assert.h>
+
+/* Default version.  */
+int foo ();
+/* The other versions of foo.  Mix up the ordering and
+   check that the dispatching follows the order of priority.  */
+/* Check combination of target attributes.  */
+int foo () __attribute__ ((target("arch=corei7,popcnt")));
+/* The target operands in this declaration and the definition are re-ordered.
+   This should still work.  */
+int foo () __attribute__ ((target("ssse3,avx2")));
+
+/* Check for all target attributes for which dispatchers are available.  */
+/* Check arch= */
+int foo () __attribute__((target("arch=core2")));
+int foo () __attribute__((target("arch=corei7")));
+int foo () __attribute__((target("arch=atom")));
+/* Check ISAs  */
+int foo () __attribute__((target("avx")));
+int foo () __attribute__ ((target("arch=core2,sse4.2")));
+/* Check more arch=.  */
+int foo () __attribute__((target("arch=amdfam10")));
+int foo () __attribute__((target("arch=bdver1")));
+int foo () __attribute__((target("arch=bdver2")));
+
+int (*p)() = &foo;
+int main ()
+{
+  int val = foo ();
+  assert (val ==  (*p)());
+
+  /* Check in the exact same order in which the dispatching
+     is expected to happen.  */
+  if (__builtin_cpu_is ("bdver1"))
+    assert (val == 1);
+  else if (__builtin_cpu_is ("bdver2"))
+    assert (val == 2);
+  else if (__builtin_cpu_supports ("avx2")
+	   && __builtin_cpu_supports ("ssse3"))
+    assert (val == 3);
+  else if (__builtin_cpu_supports ("avx"))
+    assert (val == 4);
+  else if (__builtin_cpu_is ("corei7")
+	   && __builtin_cpu_supports ("popcnt"))
+    assert (val == 5);
+  else if (__builtin_cpu_is ("corei7"))
+    assert (val == 6);
+  else if (__builtin_cpu_is ("amdfam10h"))
+    assert (val == 7);
+  else if (__builtin_cpu_is ("core2")
+	   && __builtin_cpu_supports ("sse4.2"))
+    assert (val == 8);
+  else if (__builtin_cpu_is ("core2"))
+    assert (val == 9);
+  else if (__builtin_cpu_is ("atom"))
+    assert (val == 10);
+  else
+    assert (val == 0);
+  
+  return 0;
+}
+
+int foo ()
+{
+  return 0;
+}
+
+int __attribute__ ((target("arch=corei7,popcnt")))
+foo ()
+{
+  return 5;
+}
+int __attribute__ ((target("avx2,ssse3")))
+foo ()
+{
+  return 3;
+}
+
+int __attribute__ ((target("arch=core2")))
+foo ()
+{
+  return 9;
+}
+
+int __attribute__ ((target("arch=corei7")))
+foo ()
+{
+  return 6;
+}
+
+int __attribute__ ((target("arch=atom")))
+foo ()
+{
+  return 10;
+}
+
+int __attribute__ ((target("avx")))
+foo ()
+{
+  return 4;
+}
+
+int __attribute__ ((target("arch=core2,sse4.2")))
+foo ()
+{
+  return 8;
+}
+
+int __attribute__ ((target("arch=amdfam10")))
+foo ()
+{
+  return 7;
+}
+
+int __attribute__ ((target("arch=bdver1")))
+foo ()
+{
+  return 1;
+}
+
+int __attribute__ ((target("arch=bdver2")))
+foo ()
+{
+  return 2;
+}
Index: gcc/testsuite/g++.dg/mv3.C
===================================================================
--- gcc/testsuite/g++.dg/mv3.C	(revision 0)
+++ gcc/testsuite/g++.dg/mv3.C	(revision 0)
@@ -0,0 +1,37 @@
+/* Test case to check if a call to a multiversioned function
+   is replaced with a direct call to the particular version when
+   the most specialized version's target attributes match the
+   caller.  
+  
+   In this program, foo is multiversioned but there is no default
+   function.  This is an error if the call has to go through a
+   dispatcher.  However, the call to foo in bar can be replaced
+   with a direct call to the popcnt version of foo.  Hence, this
+   test should pass.  */
+
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 -mno-sse -mno-popcnt" } */
+
+
+/* Two versions of foo; there is intentionally no default version.  */
+int __attribute__ ((target ("sse")))
+foo ()
+{
+  return 1;
+}
+int __attribute__ ((target ("popcnt")))
+foo ()
+{
+  return 0;
+}
+
+int __attribute__ ((target ("popcnt")))
+bar ()
+{
+  return foo ();
+}
+
+int main ()
+{
+  return bar ();
+}
Index: gcc/cp/class.c
===================================================================
--- gcc/cp/class.c	(revision 192623)
+++ gcc/cp/class.c	(working copy)
@@ -1087,6 +1087,31 @@ add_method (tree type, tree method, tree using_dec
 	      || same_type_p (TREE_TYPE (fn_type),
 			      TREE_TYPE (method_type))))
 	{
+	  /* For function versions, their parms and types match
+	     but they are not duplicates.  Record function versions
+	     as and when they are found.  */
+	  if (TREE_CODE (fn) == FUNCTION_DECL
+	      && TREE_CODE (method) == FUNCTION_DECL
+	      && (DECL_FUNCTION_SPECIFIC_TARGET (fn)
+		  || DECL_FUNCTION_SPECIFIC_TARGET (method))
+	      && targetm.target_option.function_versions (fn, method))
+ 	    {
+	      /* Mark functions as versions if necessary.  Modify the mangled
+		 decl name if necessary.  */
+	      if (!DECL_FUNCTION_VERSIONED (fn))
+		{
+		  DECL_FUNCTION_VERSIONED (fn) = 1;
+		  if (DECL_ASSEMBLER_NAME_SET_P (fn))
+		    mangle_decl (fn);
+		}
+	      if (!DECL_FUNCTION_VERSIONED (method))
+		{
+		  DECL_FUNCTION_VERSIONED (method) = 1;
+		  if (DECL_ASSEMBLER_NAME_SET_P (method))
+		    mangle_decl (method);
+		}
+	      continue;
+	    }
 	  if (DECL_INHERITED_CTOR_BASE (method))
 	    {
 	      if (DECL_INHERITED_CTOR_BASE (fn))
@@ -6995,6 +7020,7 @@ resolve_address_of_overloaded_function (tree targe
   tree matches = NULL_TREE;
   tree fn;
   tree target_fn_type;
+  VEC (tree, heap) *fn_ver_vec = NULL;
 
   /* By the time we get here, we should be seeing only real
      pointer-to-member types, not the internal POINTER_TYPE to
@@ -7059,9 +7085,19 @@ resolve_address_of_overloaded_function (tree targe
 	  if (DECL_ANTICIPATED (fn))
 	    continue;
 
-	  /* See if there's a match.  */
+	  /* See if there's a match.  For functions that are multi-versioned,
+	     all the versions match.  */
 	  if (same_type_p (target_fn_type, static_fn_type (fn)))
-	    matches = tree_cons (fn, NULL_TREE, matches);
+	    {
+	      matches = tree_cons (fn, NULL_TREE, matches);
+	      /* If versioned, push all possible versions into a vector.  */
+	      if (DECL_FUNCTION_VERSIONED (fn))
+		{
+		  if (fn_ver_vec == NULL)
+		   fn_ver_vec = VEC_alloc (tree, heap, 2);
+		  VEC_safe_push (tree, heap, fn_ver_vec, fn); 
+		}
+	    }
 	}
     }
 
@@ -7149,13 +7185,26 @@ resolve_address_of_overloaded_function (tree targe
     {
       /* There were too many matches.  First check if they're all
 	 the same function.  */
-      tree match;
+      tree match = NULL_TREE;
 
       fn = TREE_PURPOSE (matches);
-      for (match = TREE_CHAIN (matches); match; match = TREE_CHAIN (match))
-	if (!decls_match (fn, TREE_PURPOSE (match)))
-	  break;
 
+      /* For multi-versioned functions, more than one match is just fine.
+	 Call decls_match to make sure they are different because they are
+	 versioned.  */
+      if (DECL_FUNCTION_VERSIONED (fn))
+	{
+          for (match = TREE_CHAIN (matches); match; match = TREE_CHAIN (match))
+  	    if (decls_match (fn, TREE_PURPOSE (match)))
+	      break;
+	}
+      else
+	{
+          for (match = TREE_CHAIN (matches); match; match = TREE_CHAIN (match))
+  	    if (!decls_match (fn, TREE_PURPOSE (match)))
+	      break;
+	}
+
       if (match)
 	{
 	  if (flags & tf_error)
@@ -7217,6 +7266,28 @@ resolve_address_of_overloaded_function (tree targe
       perform_or_defer_access_check (access_path, fn, fn, flags);
     }
 
+  /* If a pointer to a function that is multi-versioned is requested, the
+     pointer to the dispatcher function is returned instead.  This works
+     well because indirectly calling the function will dispatch the right
+     function version at run-time.  */
+  if (DECL_FUNCTION_VERSIONED (fn))
+    {
+      tree dispatcher_decl = NULL;
+      gcc_assert (fn_ver_vec != NULL);
+      gcc_assert (targetm.get_function_versions_dispatcher);
+      dispatcher_decl = targetm.get_function_versions_dispatcher (fn_ver_vec);
+      if (!dispatcher_decl)
+	{
+	  error_at (input_location, "Pointer to a multiversioned function"
+		    " without a default is not allowed");
+	  return error_mark_node;
+	}
+      retrofit_lang_decl (dispatcher_decl);
+      mark_used (fn);
+      VEC_free (tree, heap, fn_ver_vec);
+      fn = dispatcher_decl;
+    }
+
   if (TYPE_PTRFN_P (target_type) || TYPE_PTRMEMFUNC_P (target_type))
     return cp_build_addr_expr (fn, flags);
   else
Index: gcc/cp/decl.c
===================================================================
--- gcc/cp/decl.c	(revision 192623)
+++ gcc/cp/decl.c	(working copy)
@@ -53,6 +53,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "pointer-set.h"
 #include "splay-tree.h"
 #include "plugin.h"
+#include "cgraph.h"
 
 /* Possible cases of bad specifiers type used by bad_specifiers. */
 enum bad_spec_place {
@@ -981,6 +982,29 @@ decls_match (tree newdecl, tree olddecl)
       if (t1 != t2)
 	return 0;
 
+      /* The decls don't match if they correspond to two different versions
+	 of the same function.  */
+      if (compparms (p1, p2)
+	  && same_type_p (TREE_TYPE (f1), TREE_TYPE (f2)) 
+	  && targetm.target_option.function_versions (newdecl, olddecl))
+	{
+	  /* Mark functions as versions if necessary.  Modify the mangled decl
+	     name if necessary.  */
+	  if (!DECL_FUNCTION_VERSIONED (newdecl))
+	    {
+	      DECL_FUNCTION_VERSIONED (newdecl) = 1;
+	      if (DECL_ASSEMBLER_NAME_SET_P (newdecl))
+	        mangle_decl (newdecl);
+	    }
+	  if (!DECL_FUNCTION_VERSIONED (olddecl))
+	    {
+	      DECL_FUNCTION_VERSIONED (olddecl) = 1;
+	      if (DECL_ASSEMBLER_NAME_SET_P (olddecl))
+	       mangle_decl (olddecl);
+	    }
+	  return 0;
+	}
+
       if (CP_DECL_CONTEXT (newdecl) != CP_DECL_CONTEXT (olddecl)
 	  && ! (DECL_EXTERN_C_P (newdecl)
 		&& DECL_EXTERN_C_P (olddecl)))
@@ -1499,7 +1523,11 @@ duplicate_decls (tree newdecl, tree olddecl, bool
 	      error ("previous declaration %q+#D here", olddecl);
 	      return NULL_TREE;
 	    }
-	  else if (compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)),
+	  /* For function versions, params and types match, but they
+	     are not ambiguous.  */
+	  else if ((!DECL_FUNCTION_VERSIONED (newdecl)
+		    && !DECL_FUNCTION_VERSIONED (olddecl))
+		   && compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)),
 			      TYPE_ARG_TYPES (TREE_TYPE (olddecl))))
 	    {
 	      error ("new declaration %q#D", newdecl);
@@ -2272,6 +2300,11 @@ duplicate_decls (tree newdecl, tree olddecl, bool
   else if (DECL_PRESERVE_P (newdecl))
     DECL_PRESERVE_P (olddecl) = 1;
 
+  /* If the olddecl is a version, so is the newdecl.  */
+  if (TREE_CODE (newdecl) == FUNCTION_DECL
+      && DECL_FUNCTION_VERSIONED (olddecl))
+    DECL_FUNCTION_VERSIONED (newdecl) = 1;
+
   if (TREE_CODE (newdecl) == FUNCTION_DECL)
     {
       int function_size;
@@ -14227,7 +14260,11 @@ cxx_comdat_group (tree decl)
 	  else
 	    break;
 	}
-      name = DECL_ASSEMBLER_NAME (decl);
+      if (TREE_CODE (decl) == FUNCTION_DECL
+	  && DECL_FUNCTION_VERSIONED (decl))
+	name = DECL_NAME (decl);
+      else
+        name = DECL_ASSEMBLER_NAME (decl);
     }
 
   return name;
Index: gcc/cp/error.c
===================================================================
--- gcc/cp/error.c	(revision 192623)
+++ gcc/cp/error.c	(working copy)
@@ -1541,8 +1541,16 @@ dump_exception_spec (tree t, int flags)
 static void
 dump_function_name (tree t, int flags)
 {
-  tree name = DECL_NAME (t);
+  tree name;
 
+  /* For function versions, use the assembler name as the decl name is
+     the same for all versions.  */
+  if (TREE_CODE (t) == FUNCTION_DECL
+      && DECL_FUNCTION_VERSIONED (t))
+    name = DECL_ASSEMBLER_NAME (t);
+  else
+    name = DECL_NAME (t);
+
   /* We can get here with a decl that was synthesized by language-
      independent machinery (e.g. coverage.c) in which case it won't
      have a lang_specific structure attached and DECL_CONSTRUCTOR_P
Index: gcc/cp/semantics.c
===================================================================
--- gcc/cp/semantics.c	(revision 192623)
+++ gcc/cp/semantics.c	(working copy)
@@ -3813,8 +3813,11 @@ expand_or_defer_fn_1 (tree fn)
       /* If the user wants us to keep all inline functions, then mark
 	 this function as needed so that finish_file will make sure to
 	 output it later.  Similarly, all dllexport'd functions must
-	 be emitted; there may be callers in other DLLs.  */
-      if ((flag_keep_inline_functions
+	 be emitted; there may be callers in other DLLs.
+	 Also, mark this function as needed if it is marked inline but
+	 is a multi-versioned function.  */
+      if (((flag_keep_inline_functions
+	    || DECL_FUNCTION_VERSIONED (fn))
 	   && DECL_DECLARED_INLINE_P (fn)
 	   && !DECL_REALLY_EXTERN (fn))
 	  || (flag_keep_inline_dllexport
Index: gcc/cp/decl2.c
===================================================================
--- gcc/cp/decl2.c	(revision 192623)
+++ gcc/cp/decl2.c	(working copy)
@@ -674,9 +674,13 @@ check_classfn (tree ctype, tree function, tree tem
 	  if (is_template != (TREE_CODE (fndecl) == TEMPLATE_DECL))
 	    continue;
 
+	  /* While finding a match, same types and params are not enough
+	     if the function is versioned.  Also check version ("target")
+	     attributes.  */
 	  if (same_type_p (TREE_TYPE (TREE_TYPE (function)),
 			   TREE_TYPE (TREE_TYPE (fndecl)))
 	      && compparms (p1, p2)
+	      && !targetm.target_option.function_versions (function, fndecl)
 	      && (!is_template
 		  || comp_template_parms (template_parms,
 					  DECL_TEMPLATE_PARMS (fndecl)))
Index: gcc/cp/call.c
===================================================================
--- gcc/cp/call.c	(revision 192623)
+++ gcc/cp/call.c	(working copy)
@@ -40,6 +40,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "langhooks.h"
 #include "c-family/c-objc.h"
 #include "timevar.h"
+#include "cgraph.h"
 
 /* The various kinds of conversion.  */
 
@@ -6444,6 +6445,35 @@ magic_varargs_p (tree fn)
   return false;
 }
 
+/* Returns the decl of the dispatcher function if FN is a function version.  */
+
+static tree
+get_function_version_dispatcher (tree fn)
+{
+  tree dispatcher_decl = NULL;
+  struct cgraph_node *node = NULL;
+
+  gcc_assert (TREE_CODE (fn) == FUNCTION_DECL
+	      && DECL_FUNCTION_VERSIONED (fn));
+
+  node = cgraph_get_node (fn);
+
+  if (node != NULL && node->function_version != NULL)
+    dispatcher_decl = node->function_version->dispatcher_resolver;
+  else
+    return NULL;
+
+  if (dispatcher_decl == NULL)
+    {
+      error_at (input_location, "Call to multiversioned function"
+                " without a default is not allowed");
+      return NULL;
+    }
+  retrofit_lang_decl (dispatcher_decl);
+  gcc_assert (dispatcher_decl != NULL);
+  return dispatcher_decl;
+}
+
 /* Subroutine of the various build_*_call functions.  Overload resolution
    has chosen a winning candidate CAND; build up a CALL_EXPR accordingly.
    ARGS is a TREE_LIST of the unconverted arguments to the call.  FLAGS is a
@@ -6896,6 +6926,20 @@ build_over_call (struct z_candidate *cand, int fla
   if (!already_used)
     mark_used (fn);
 
+  /* For calls to a multi-versioned function, overload resolution
+     returns the function with the highest target priority, that is,
+     the version that will be checked for dispatching first.  If this
+     version is inlinable, a direct call to this version can be made;
+     otherwise the call should go through the dispatcher.  */
+
+  if (DECL_FUNCTION_VERSIONED (fn)
+      && !targetm.target_option.can_inline_p (current_function_decl, fn))
+    {
+      fn = get_function_version_dispatcher (fn);
+      if (fn == NULL)
+	return NULL;
+    }
+
   if (DECL_VINDEX (fn) && (flags & LOOKUP_NONVIRTUAL) == 0)
     {
       tree t;
@@ -8176,6 +8220,38 @@ joust (struct z_candidate *cand1, struct z_candida
       && (IS_TYPE_OR_DECL_P (cand1->fn)))
     return 1;
 
+  /* For candidates of a multi-versioned function, make the version with
+     the highest priority win.  This version will be checked for dispatching
+     first.  If this version can be inlined into the caller, the front-end
+     will simply make a direct call to this function.  */
+
+  if (TREE_CODE (cand1->fn) == FUNCTION_DECL
+      && DECL_FUNCTION_VERSIONED (cand1->fn)
+      && TREE_CODE (cand2->fn) == FUNCTION_DECL
+      && DECL_FUNCTION_VERSIONED (cand2->fn))
+    {
+      tree f1 = TREE_TYPE (cand1->fn);
+      tree f2 = TREE_TYPE (cand2->fn);
+      tree p1 = TYPE_ARG_TYPES (f1);
+      tree p2 = TYPE_ARG_TYPES (f2);
+     
+      /* Check if cand1->fn and cand2->fn are versions of the same function.  It
+         is possible that cand1->fn and cand2->fn are function versions but of
+         different functions.  Check types to see if they are versions of the same
+         function.  */
+      if (compparms (p1, p2)
+	  && same_type_p (TREE_TYPE (f1), TREE_TYPE (f2)))
+	{
+	  /* Always make the version with the higher priority, more
+	     specialized, win.  */
+	  gcc_assert (targetm.compare_version_priority);
+	  if (targetm.compare_version_priority (cand1->fn, cand2->fn) >= 0)
+	    return 1;
+	  else
+	    return -1;
+	}
+    }
+
   /* a viable function F1
      is defined to be a better function than another viable function F2  if
      for  all arguments i, ICSi(F1) is not a worse conversion sequence than
@@ -8496,6 +8572,37 @@ tweak:
   return 0;
 }
 
+/* Function FN is multi-versioned and CANDIDATES contains the list of all
+   overloaded candidates for FN.  This function extracts all functions from
+   CANDIDATES that are function versions of FN and generates a dispatcher
+   function for this multi-versioned function group.  */
+
+static void
+generate_function_versions_dispatcher (tree fn, struct z_candidate *candidates)
+{
+  tree f1 = TREE_TYPE (fn);
+  tree p1 = TYPE_ARG_TYPES (f1);
+  VEC (tree, heap) *fn_ver_vec = NULL;
+  struct z_candidate *ver = candidates;
+
+  fn_ver_vec = VEC_alloc (tree, heap, 2);
+
+  for (;ver; ver = ver->next)
+    {
+      tree f2 = TREE_TYPE (ver->fn);
+      tree p2 = TYPE_ARG_TYPES (f2);
+      /* If this candidate is a version of FN, types must match.  */
+      if (DECL_FUNCTION_VERSIONED (ver->fn)
+          && compparms (p1, p2)
+	  && same_type_p (TREE_TYPE (f1), TREE_TYPE (f2)))
+        VEC_safe_push (tree, heap, fn_ver_vec, ver->fn);
+    }
+
+  gcc_assert (targetm.get_function_versions_dispatcher);
+  targetm.get_function_versions_dispatcher (fn_ver_vec);
+  VEC_free (tree, heap, fn_ver_vec); 
+}
+
 /* Given a list of candidates for overloading, find the best one, if any.
    This algorithm has a worst case of O(2n) (winner is last), and a best
    case of O(n/2) (totally ambiguous); much better than a sorting
@@ -8548,6 +8655,22 @@ tourney (struct z_candidate *candidates, tsubst_fl
 	return NULL;
     }
 
+  /* For multiversioned functions, aggregate all the versions here for
+     generating the dispatcher body later if necessary.  Check to see if
+     the dispatcher is already generated to avoid doing this more than
+     once.  */
+
+  if (TREE_CODE (champ->fn) == FUNCTION_DECL
+      && DECL_FUNCTION_VERSIONED (champ->fn))
+    {
+      struct cgraph_node *champ_node = cgraph_get_node (champ->fn);
+      if (champ_node == NULL
+	  || champ_node->function_version == NULL
+	  || champ_node->function_version->dispatcher_resolver == NULL)
+        generate_function_versions_dispatcher (champ->fn, candidates);
+
+    }
+
   return champ;
 }
 
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 192623)
+++ gcc/config/i386/i386.c	(working copy)
@@ -62,6 +62,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "opts.h"
 #include "diagnostic.h"
 #include "dumpfile.h"
+#include "tree-pass.h"
+#include "tree-flow.h"
 
 enum upper_128bits_state
 {
@@ -28413,6 +28415,990 @@ ix86_init_mmx_sse_builtins (void)
     }
 }
 
+
+/* This adds a condition to the basic_block NEW_BB in function FUNCTION_DECL
+   to return a pointer to VERSION_DECL if the outcome of the expression
+   formed by PREDICATE_CHAIN is true.  This function will be called during
+   version dispatch to decide which function version to execute.  It returns
+   the basic block at the end, to which more conditions can be added.  */
+
+static basic_block
+add_condition_to_bb (tree function_decl, tree version_decl,
+		     tree predicate_chain, basic_block new_bb)
+{
+  gimple return_stmt;
+  tree convert_expr, result_var;
+  gimple convert_stmt;
+  gimple call_cond_stmt;
+  gimple if_else_stmt;
+
+  basic_block bb1, bb2, bb3;
+  edge e12, e23;
+
+  tree cond_var, and_expr_var = NULL_TREE;
+  gimple_seq gseq;
+
+  tree predicate_decl, predicate_arg;
+
+  push_cfun (DECL_STRUCT_FUNCTION (function_decl));
+
+  gcc_assert (new_bb != NULL);
+  gseq = bb_seq (new_bb);
+
+
+  convert_expr = build1 (CONVERT_EXPR, ptr_type_node,
+	     		 build_fold_addr_expr (version_decl));
+  result_var = create_tmp_var (ptr_type_node, NULL);
+  convert_stmt = gimple_build_assign (result_var, convert_expr); 
+  return_stmt = gimple_build_return (result_var);
+
+  if (predicate_chain == NULL_TREE)
+    {
+      gimple_seq_add_stmt (&gseq, convert_stmt);
+      gimple_seq_add_stmt (&gseq, return_stmt);
+      set_bb_seq (new_bb, gseq);
+      gimple_set_bb (convert_stmt, new_bb);
+      gimple_set_bb (return_stmt, new_bb);
+      pop_cfun ();
+      return new_bb;
+    }
+
+  while (predicate_chain != NULL)
+    {
+      cond_var = create_tmp_var (integer_type_node, NULL);
+      predicate_decl = TREE_PURPOSE (predicate_chain);
+      predicate_arg = TREE_VALUE (predicate_chain);
+      call_cond_stmt = gimple_build_call (predicate_decl, 1, predicate_arg);
+      gimple_call_set_lhs (call_cond_stmt, cond_var);
+
+      gimple_set_block (call_cond_stmt, DECL_INITIAL (function_decl));
+      gimple_set_bb (call_cond_stmt, new_bb);
+      gimple_seq_add_stmt (&gseq, call_cond_stmt);
+
+      predicate_chain = TREE_CHAIN (predicate_chain);
+      
+      if (and_expr_var == NULL)
+        and_expr_var = cond_var;
+      else
+	{
+	  gimple assign_stmt;
+	  /* Use MIN_EXPR to check whether any of the integers is zero:
+	     and_expr_var = min_expr <cond_var, and_expr_var>  */
+	  assign_stmt = gimple_build_assign (and_expr_var,
+			  build2 (MIN_EXPR, integer_type_node,
+				  cond_var, and_expr_var));
+
+	  gimple_set_block (assign_stmt, DECL_INITIAL (function_decl));
+	  gimple_set_bb (assign_stmt, new_bb);
+	  gimple_seq_add_stmt (&gseq, assign_stmt);
+	}
+    }
+
+  if_else_stmt = gimple_build_cond (GT_EXPR, and_expr_var,
+	  		            integer_zero_node,
+				    NULL_TREE, NULL_TREE);
+  gimple_set_block (if_else_stmt, DECL_INITIAL (function_decl));
+  gimple_set_bb (if_else_stmt, new_bb);
+  gimple_seq_add_stmt (&gseq, if_else_stmt);
+
+  gimple_seq_add_stmt (&gseq, convert_stmt);
+  gimple_seq_add_stmt (&gseq, return_stmt);
+  set_bb_seq (new_bb, gseq);
+
+  bb1 = new_bb;
+  e12 = split_block (bb1, if_else_stmt);
+  bb2 = e12->dest;
+  e12->flags &= ~EDGE_FALLTHRU;
+  e12->flags |= EDGE_TRUE_VALUE;
+
+  e23 = split_block (bb2, return_stmt);
+
+  gimple_set_bb (convert_stmt, bb2);
+  gimple_set_bb (return_stmt, bb2);
+
+  bb3 = e23->dest;
+  make_edge (bb1, bb3, EDGE_FALSE_VALUE); 
+
+  remove_edge (e23);
+  make_edge (bb2, EXIT_BLOCK_PTR, 0);
+
+  pop_cfun ();
+
+  return bb3;
+}
+
+/* This parses the attribute arguments to target in DECL and determines
+   the right builtin to use to match the platform specification.
+   For now, only one target argument ("arch=" or "<-m>xxx") is allowed.
+   It returns the priority value for this version decl.  If PREDICATE_LIST
+   is not NULL, it stores the list of cpu features that need to be checked
+   before dispatching this function.  */
+
+static unsigned int
+get_builtin_code_for_version (tree decl, tree *predicate_list)
+{
+  tree attrs;
+  struct cl_target_option cur_target;
+  tree target_node;
+  struct cl_target_option *new_target;
+  const char *arg_str = NULL;
+  const char *attrs_str = NULL;
+  char *tok_str = NULL;
+  char *token;
+
+  /* Priority of i386 features, greater value is higher priority.   This is
+     used to decide the order in which function dispatch must happen.  For
+     instance, a version specialized for SSE4.2 should be checked for dispatch
+     before a version for SSE3, as SSE4.2 implies SSE3.  */
+  enum feature_priority
+  {
+    P_ZERO = 0,
+    P_MMX,
+    P_SSE,
+    P_SSE2,
+    P_SSE3,
+    P_SSSE3,
+    P_PROC_SSSE3,
+    P_SSE4_a,
+    P_PROC_SSE4_a,
+    P_SSE4_1,
+    P_SSE4_2,
+    P_PROC_SSE4_2,
+    P_POPCNT,
+    P_AVX,
+    P_AVX2,
+    P_FMA,
+    P_PROC_FMA
+  };
+
+  enum feature_priority priority = P_ZERO;
+
+  /* These are the target attribute strings for which a dispatcher is
+     available, from fold_builtin_cpu.  */
+
+  static struct _feature_list
+    {
+      const char *const name;
+      const enum feature_priority priority;
+    }
+  const feature_list[] =
+    {
+      {"mmx", P_MMX},
+      {"sse", P_SSE},
+      {"sse2", P_SSE2},
+      {"sse3", P_SSE3},
+      {"ssse3", P_SSSE3},
+      {"sse4.1", P_SSE4_1},
+      {"sse4.2", P_SSE4_2},
+      {"popcnt", P_POPCNT},
+      {"avx", P_AVX},
+      {"avx2", P_AVX2}
+    };
+
+
+  static unsigned int NUM_FEATURES
+    = sizeof (feature_list) / sizeof (struct _feature_list);
+
+  unsigned int i;
+
+  tree predicate_chain = NULL_TREE;
+  tree predicate_decl, predicate_arg;
+
+  attrs = lookup_attribute ("target", DECL_ATTRIBUTES (decl));
+  gcc_assert (attrs != NULL);
+
+  attrs = TREE_VALUE (TREE_VALUE (attrs));
+
+  gcc_assert (TREE_CODE (attrs) == STRING_CST);
+  attrs_str = TREE_STRING_POINTER (attrs);
+
+
+  /* Handle arch= if specified.  For priority, set it to be 1 more than
+     the best instruction set the processor can handle.  For instance, if
+     there is a version for atom and a version for ssse3 (the highest ISA
+     priority for atom), the atom version must be checked for dispatch
+     before the ssse3 version. */
+  if (strstr (attrs_str, "arch=") != NULL)
+    {
+      cl_target_option_save (&cur_target, &global_options);
+      target_node = ix86_valid_target_attribute_tree (attrs);
+    
+      gcc_assert (target_node);
+      new_target = TREE_TARGET_OPTION (target_node);
+      gcc_assert (new_target);
+      
+      if (new_target->arch_specified && new_target->arch > 0)
+	{
+	  switch (new_target->arch)
+	    {
+	    case PROCESSOR_CORE2_32:
+	    case PROCESSOR_CORE2_64:
+	      arg_str = "core2";
+	      priority = P_PROC_SSSE3;
+	      break;
+	    case PROCESSOR_COREI7_32:
+	    case PROCESSOR_COREI7_64:
+	      arg_str = "corei7";
+	      priority = P_PROC_SSE4_2;
+	      break;
+	    case PROCESSOR_ATOM:
+	      arg_str = "atom";
+	      priority = P_PROC_SSSE3;
+	      break;
+	    case PROCESSOR_AMDFAM10:
+	      arg_str = "amdfam10h";
+	      priority = P_PROC_SSE4_a;
+	      break;
+	    case PROCESSOR_BDVER1:
+	      arg_str = "bdver1";
+	      priority = P_PROC_FMA;
+	      break;
+	    case PROCESSOR_BDVER2:
+	      arg_str = "bdver2";
+	      priority = P_PROC_FMA;
+	      break;
+	    }  
+	}    
+    
+      cl_target_option_restore (&global_options, &cur_target);
+	
+      if (predicate_list && arg_str == NULL)
+	{
+	  error_at (DECL_SOURCE_LOCATION (decl),
+	    	"No dispatcher found for the versioning attributes");
+	  return 0;
+	}
+    
+      if (predicate_list)
+	{
+          predicate_decl = ix86_builtins [(int) IX86_BUILTIN_CPU_IS];
+          /* For a C string literal the length includes the trailing NUL.  */
+          predicate_arg = build_string_literal (strlen (arg_str) + 1, arg_str);
+          predicate_chain = tree_cons (predicate_decl, predicate_arg,
+				       predicate_chain);
+	}
+    }
+
+  /* Process feature name.  */
+  tok_str =  (char *) xmalloc (strlen (attrs_str) + 1);
+  strcpy (tok_str, attrs_str);
+  token = strtok (tok_str, ",");
+  predicate_decl = ix86_builtins [(int) IX86_BUILTIN_CPU_SUPPORTS];
+
+  while (token != NULL)
+    {
+      /* Do not process "arch="  */
+      if (strncmp (token, "arch=", 5) == 0)
+	{
+	  token = strtok (NULL, ",");
+	  continue;
+	}
+      for (i = 0; i < NUM_FEATURES; ++i)
+	{
+	  if (strcmp (token, feature_list[i].name) == 0)
+	    {
+	      if (predicate_list)
+		{
+		  predicate_arg = build_string_literal (
+				  strlen (feature_list[i].name) + 1,
+				  feature_list[i].name);
+		  predicate_chain = tree_cons (predicate_decl, predicate_arg,
+					       predicate_chain);
+		}
+	      /* Find the maximum priority feature.  */
+	      if (feature_list[i].priority > priority)
+		priority = feature_list[i].priority;
+
+	      break;
+	    }
+	}
+      if (predicate_list && i == NUM_FEATURES)
+	{
+	  error_at (DECL_SOURCE_LOCATION (decl),
+		    "No dispatcher found for %s", token);
+	  return 0;
+	}
+      token = strtok (NULL, ",");
+    }
+  free (tok_str);
+
+  if (predicate_list && predicate_chain == NULL_TREE)
+    {
+      error_at (DECL_SOURCE_LOCATION (decl),
+	        "No dispatcher found for the versioning attributes : %s",
+	        attrs_str);
+      return 0;
+    }
+  else if (predicate_list)
+    {
+      predicate_chain = nreverse (predicate_chain);
+      *predicate_list = predicate_chain;
+    }
+
+  return priority; 
+}
+
+/* This compares the priority of target features in function DECL1
+   and DECL2.  It returns positive value if DECL1 is higher priority,
+   negative value if DECL2 is higher priority and 0 if they are the
+   same.  */
+
+static int
+ix86_compare_version_priority (tree decl1, tree decl2)
+{
+  unsigned int priority1 = 0;
+  unsigned int priority2 = 0;
+
+  if (lookup_attribute ("target", DECL_ATTRIBUTES (decl1)) != NULL)
+    priority1 = get_builtin_code_for_version (decl1, NULL);
+
+  if (lookup_attribute ("target", DECL_ATTRIBUTES (decl2)) != NULL)
+    priority2 = get_builtin_code_for_version (decl2, NULL);
+
+  return (int)priority1 - (int)priority2;
+}
+
+/* V1 and V2 point to function versions with different priorities
+   based on the target ISA.  This function compares their priorities.  */
+ 
+static int
+feature_compare (const void *v1, const void *v2)
+{
+  typedef struct _function_version_info
+    {
+      tree version_decl;
+      tree predicate_chain;
+      unsigned int dispatch_priority;
+    } function_version_info;
+
+  const function_version_info c1 = *(const function_version_info *)v1;
+  const function_version_info c2 = *(const function_version_info *)v2;
+  return (c2.dispatch_priority - c1.dispatch_priority);
+}
+
+/* This function generates the dispatch function for
+   multi-versioned functions.  DISPATCH_DECL is the function which will
+   contain the dispatch logic.  FNDECLS are the function choices for
+   dispatch, and is a tree chain.  EMPTY_BB is the basic block pointer
+   in DISPATCH_DECL in which the dispatch code is generated.  */
+
+static int
+dispatch_function_versions (tree dispatch_decl,
+			    void *fndecls_p,
+			    basic_block *empty_bb)
+{
+  tree default_decl;
+  gimple ifunc_cpu_init_stmt;
+  gimple_seq gseq;
+  int ix;
+  tree ele;
+  VEC (tree, heap) *fndecls;
+  unsigned int num_versions = 0;
+  unsigned int actual_versions = 0;
+  unsigned int i;
+
+  struct _function_version_info
+    {
+      tree version_decl;
+      tree predicate_chain;
+      unsigned int dispatch_priority;
+    }*function_version_info;
+
+  gcc_assert (dispatch_decl != NULL
+	      && fndecls_p != NULL
+	      && empty_bb != NULL);
+
+  /* fndecls_p is actually a vector.  */
+  fndecls = (VEC (tree, heap) *)fndecls_p;
+
+  /* At least one more version other than the default.  */
+  num_versions = VEC_length (tree, fndecls);
+  gcc_assert (num_versions >= 2);
+
+  function_version_info = (struct _function_version_info *)
+    XNEWVEC (struct _function_version_info, (num_versions - 1));
+
+  /* The first version in the vector is the default decl.  */
+  default_decl = VEC_index (tree, fndecls, 0);
+
+  push_cfun (DECL_STRUCT_FUNCTION (dispatch_decl));
+
+  gseq = bb_seq (*empty_bb);
+  /* Function version dispatch is via IFUNC.  IFUNC resolvers fire before
+     constructors, so explicitly call __builtin_cpu_init here.  */
+  ifunc_cpu_init_stmt = gimple_build_call_vec (
+                     ix86_builtins [(int) IX86_BUILTIN_CPU_INIT], NULL);
+  gimple_seq_add_stmt (&gseq, ifunc_cpu_init_stmt);
+  gimple_set_bb (ifunc_cpu_init_stmt, *empty_bb);
+  set_bb_seq (*empty_bb, gseq);
+
+  pop_cfun ();
+
+
+  for (ix = 1; VEC_iterate (tree, fndecls, ix, ele); ++ix)
+    {
+      tree version_decl = ele;
+      tree predicate_chain = NULL_TREE;
+      unsigned int priority;
+      /* Get attribute string, parse it and find the right predicate decl.
+         The predicate function could be a lengthy combination of many
+	 features, like arch-type and various isa-variants.  */
+      priority = get_builtin_code_for_version (version_decl,
+	 			               &predicate_chain);
+
+      if (predicate_chain == NULL_TREE)
+	continue;
+
+      actual_versions++;
+      function_version_info [ix - 1].version_decl = version_decl;
+      function_version_info [ix - 1].predicate_chain = predicate_chain;
+      function_version_info [ix - 1].dispatch_priority = priority;
+    }
+
+  /* Sort the versions according to descending order of dispatch priority.  The
+     priority is based on the ISA.  This is not a perfect solution.  There
+     could still be ambiguity.  If more than one function version is suitable
+     to execute,  which one should be dispatched?  In future, allow the user
+     to specify a dispatch  priority next to the version.  */
+  qsort (function_version_info, actual_versions,
+         sizeof (struct _function_version_info), feature_compare);
+
+  for  (i = 0; i < actual_versions; ++i)
+    *empty_bb = add_condition_to_bb (dispatch_decl,
+				     function_version_info[i].version_decl,
+				     function_version_info[i].predicate_chain,
+				     *empty_bb);
+
+  /* Dispatch the default version at the end.  */
+  *empty_bb = add_condition_to_bb (dispatch_decl, default_decl,
+				   NULL, *empty_bb);
+
+  free (function_version_info);
+  return 0;
+}
+
+/* This function returns true if FN1 and FN2 are versions of the same function.
+   Returns false if only one of the function decls has the target attribute
+   set or if the targets of the function decls are different.  This assumes
+   the FN1 and FN2 have the same signature.  */
+
+static bool
+ix86_function_versions (tree fn1, tree fn2)
+{
+  tree attr1, attr2;
+  struct cl_target_option *target1, *target2;
+
+  if (TREE_CODE (fn1) != FUNCTION_DECL
+      || TREE_CODE (fn2) != FUNCTION_DECL)
+    return false;
+
+  attr1 = DECL_FUNCTION_SPECIFIC_TARGET (fn1);
+  attr2 = DECL_FUNCTION_SPECIFIC_TARGET (fn2);
+
+  if (attr1 == NULL_TREE && attr2 == NULL_TREE)
+    return false;
+
+  if ((attr1 == NULL_TREE && attr2 != NULL_TREE)
+      || (attr1 != NULL_TREE && attr2 == NULL_TREE))
+    return true;
+
+  target1 = TREE_TARGET_OPTION (attr1);
+  target2 = TREE_TARGET_OPTION (attr2);
+
+  if (target1->x_ix86_isa_flags == target2->x_ix86_isa_flags
+      && target1->x_target_flags == target2->x_target_flags
+      && target1->arch == target2->arch
+      && target1->tune == target2->tune
+      && target1->x_ix86_fpmath == target2->x_ix86_fpmath
+      && target1->branch_cost == target2->branch_cost)
+    return false;
+
+  return true;
+}
+
+/* Comparator function to be used in qsort routine to sort attribute
+   specification strings to "target".  */
+
+static int
+attr_strcmp (const void *v1, const void *v2)
+{
+  const char *c1 = *(char *const*)v1;
+  const char *c2 = *(char *const*)v2;
+  return strcmp (c1, c2);
+}
+
+/* STR is the argument to target attribute.  This function tokenizes
+   the comma separated arguments, sorts them and returns a string which
+   is a unique identifier for the comma separated arguments.   It also
+   replaces non-identifier characters "=,-" with "_".  */
+
+static char *
+sorted_attr_string (const char *str)
+{
+  char **args = NULL;
+  char *attr_str, *ret_str;
+  char *attr = NULL;
+  unsigned int argnum = 1;
+  unsigned int i;
+
+  for (i = 0; i < strlen (str); i++)
+    if (str[i] == ',')
+      argnum++;
+
+  attr_str = (char *)xmalloc (strlen (str) + 1);
+  strcpy (attr_str, str);
+
+  /* Replace "=,-" with "_".  */
+  for (i = 0; i < strlen (attr_str); i++)
+    if (attr_str[i] == '=' || attr_str[i]== '-')
+      attr_str[i] = '_';
+
+  if (argnum == 1)
+    return attr_str;
+
+  args = XNEWVEC (char *, argnum);
+
+  i = 0;
+  attr = strtok (attr_str, ",");
+  while (attr != NULL)
+    {
+      args[i] = attr;
+      i++;
+      attr = strtok (NULL, ",");
+    }
+
+  qsort (args, argnum, sizeof (char*), attr_strcmp);
+
+  ret_str = (char *)xmalloc (strlen (str) + 1);
+  strcpy (ret_str, args[0]);
+  for (i = 1; i < argnum; i++)
+    {
+      strcat (ret_str, "_");
+      strcat (ret_str, args[i]);
+    }
+
+  free (args);
+  free (attr_str);
+  return ret_str;
+}
+
+/* This function changes the assembler name for functions that are
+   versions.  If DECL is a function version and has a "target"
+   attribute, it appends the attribute string to its assembler name.  */
+
+static tree
+ix86_mangle_function_version_assembler_name (tree decl, tree id)
+{
+  tree version_attr;
+  const char *orig_name, *version_string, *attr_str;
+  char *assembler_name;
+
+  if (DECL_DECLARED_INLINE_P (decl)
+      && lookup_attribute ("gnu_inline",
+			   DECL_ATTRIBUTES (decl)))
+    error_at (DECL_SOURCE_LOCATION (decl),
+	      "Function versions cannot be marked as gnu_inline,"
+	      " bodies have to be generated");
+
+  if (DECL_VIRTUAL_P (decl)
+      || DECL_VINDEX (decl))
+    error_at (DECL_SOURCE_LOCATION (decl),
+	      "Virtual function versioning not supported");
+
+  version_attr = lookup_attribute ("target", DECL_ATTRIBUTES (decl));
+
+  /* target attribute string is NULL for default functions.  */
+  if (version_attr == NULL_TREE)
+    return id;
+
+  orig_name = IDENTIFIER_POINTER (id);
+  version_string
+    = TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (version_attr)));
+
+  attr_str = sorted_attr_string (version_string);
+  assembler_name = (char *) xmalloc (strlen (orig_name)
+				     + strlen (attr_str) + 2);
+
+  sprintf (assembler_name, "%s.%s", orig_name, attr_str);
+
+  if (dump_kind_p (MSG_NOTE))
+    dump_printf (MSG_NOTE, "Assembler name set to %s for function version %s",
+	         assembler_name, IDENTIFIER_POINTER (id));
+
+  /* Allow assembler name to be modified if already set.  */
+  if (DECL_ASSEMBLER_NAME_SET_P (decl))
+    SET_DECL_RTL (decl, NULL);
+
+  return get_identifier (assembler_name);
+}
+
+static tree 
+ix86_mangle_decl_assembler_name (tree decl, tree id)
+{
+  /* For function version, add the target suffix to the assembler name.  */
+  if (TREE_CODE (decl) == FUNCTION_DECL
+      && DECL_FUNCTION_VERSIONED (decl))
+    return ix86_mangle_function_version_assembler_name (decl, id);
+
+  return id;
+}
+
+/* Return a new name by appending SUFFIX to the DECL name.  If make_unique
+   is true, append the full path name of the source file.  */
+
+static char *
+make_name (tree decl, const char *suffix, bool make_unique)
+{
+  char *global_var_name;
+  int name_len;
+  const char *name;
+  const char *unique_name = NULL;
+
+  name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
+
+  /* Get a unique name that can be used globally without any chances
+     of collision at link time.  */
+  if (make_unique)
+    unique_name = IDENTIFIER_POINTER (get_file_function_name ("\0"));
+
+  name_len = strlen (name) + strlen (suffix) + 2;
+
+  if (make_unique)
+    name_len += strlen (unique_name) + 1;
+  global_var_name = XNEWVEC (char, name_len);
+
+  /* Use '.' to concatenate names as it is demangler friendly.  */
+  if (make_unique)
+      snprintf (global_var_name, name_len, "%s.%s.%s", name,
+		unique_name, suffix);
+  else
+      snprintf (global_var_name, name_len, "%s.%s", name, suffix);
+
+  return global_var_name;
+}
+
+/* Make a dispatcher declaration for the multi-versioned function DECL.
+   Calls to DECL function will be replaced with calls to the dispatcher
+   by the front-end.  Return the decl created.  */
+
+static tree
+make_dispatcher_decl (const tree decl)
+{
+  tree func_decl;
+  char *func_name, *resolver_name;
+  tree fn_type, func_type;
+  bool is_uniq = false;
+
+  if (TREE_PUBLIC (decl) == 0)
+    is_uniq = true;
+
+  func_name = make_name (decl, "ifunc", is_uniq);
+  resolver_name = make_name (decl, "resolver", is_uniq);
+  gcc_assert (resolver_name);
+
+  fn_type = TREE_TYPE (decl);
+  func_type = build_function_type (TREE_TYPE (fn_type),
+				   TYPE_ARG_TYPES (fn_type));
+  
+  func_decl = build_fn_decl (func_name, func_type);
+  TREE_USED (func_decl) = 1;
+  DECL_CONTEXT (func_decl) = NULL_TREE;
+  DECL_INITIAL (func_decl) = error_mark_node;
+  DECL_ARTIFICIAL (func_decl) = 1;
+  /* Mark this func as external, the resolver will flip it again if
+     it gets generated.  */
+  DECL_EXTERNAL (func_decl) = 1;
+  /* This will be an IFUNC; IFUNCs have to be externally visible.  */
+  TREE_PUBLIC (func_decl) = 1;
+
+  return func_decl;  
+}
+
+/* Returns true if DECL is multi-versioned and is the default version,
+   that is, it is not tagged with a target-specific optimization.  */
+
+static bool
+is_function_default_version (const tree decl)
+{
+  return (TREE_CODE (decl) == FUNCTION_DECL
+	  && DECL_FUNCTION_VERSIONED (decl)
+	  && DECL_FUNCTION_SPECIFIC_TARGET (decl) == NULL_TREE);
+}
+
+/* Make a dispatcher declaration for the multi-versioned function DECL.
+   Calls to DECL function will be replaced with calls to the dispatcher
+   by the front-end.  It also chains the cgraph nodes of all the
+   semantically identical versions in vector FN_VER_VEC_P.  Returns the
+   decl of the dispatcher function.  */
+
+static tree
+ix86_get_function_versions_dispatcher (void *fn_ver_vec_p)
+{
+  struct cgraph_node *node = NULL;
+  struct cgraph_node *default_node = NULL;
+  struct cgraph_node *dispatcher_node = NULL;
+  int ix;
+  tree ele;
+  tree dispatch_decl = NULL;
+  VEC (tree, heap) *fn_ver_vec = NULL;
+
+  fn_ver_vec = (VEC (tree,heap) *) fn_ver_vec_p;
+  gcc_assert (fn_ver_vec != NULL);
+
+  /* Find the default version.  */
+  for (ix = 0; VEC_iterate (tree, fn_ver_vec, ix, ele); ++ix)
+    {
+      if (is_function_default_version (ele))
+	{
+	  default_node = cgraph_get_create_node (ele);
+	  break;
+	}
+    }
+
+  /* If there is no default node, just return NULL.  */
+  if (!default_node)
+    return NULL;
+
+  /* If the dispatcher is already there, return it.  */
+  if (default_node->function_version
+      && default_node->function_version->dispatcher_resolver)
+    return default_node->function_version->dispatcher_resolver;
+
+  if (default_node->function_version == NULL)
+    default_node->function_version
+      = ggc_alloc_cleared_cgraph_function_version_info ();
+
+#if defined (ASM_OUTPUT_TYPE_DIRECTIVE) && HAVE_GNU_INDIRECT_FUNCTION
+  /* Right now, the dispatching is done via ifunc.  */
+  dispatch_decl = make_dispatcher_decl (default_node->symbol.decl); 
+#else
+  error_at (DECL_SOURCE_LOCATION (default_node->symbol.decl),
+	    "Multiversioning needs ifunc which is not supported "
+	    "in this configuration");
+#endif
+  default_node->function_version->dispatcher_resolver = dispatch_decl;
+  dispatcher_node = cgraph_get_create_node (dispatch_decl);
+  gcc_assert (dispatcher_node);
+  dispatcher_node->dispatcher_function = 1;
+  dispatcher_node->function_version
+    = ggc_alloc_cleared_cgraph_function_version_info ();
+  cgraph_mark_address_taken_node (default_node);
+
+  for (ix = 0; VEC_iterate (tree, fn_ver_vec, ix, ele); ++ix)
+    {
+      node = cgraph_get_create_node (ele);
+      gcc_assert (node != NULL && DECL_FUNCTION_VERSIONED (ele));
+
+      if (node == default_node)
+	continue;
+
+      if (node->function_version == NULL)
+	node->function_version
+	  = ggc_alloc_cleared_cgraph_function_version_info ();
+
+      gcc_assert (DECL_FUNCTION_SPECIFIC_TARGET (ele) != NULL_TREE);
+
+      if (dispatcher_node->function_version->next)
+ 	{
+	  struct cgraph_node *dispatcher_node_next
+	    = dispatcher_node->function_version->next;
+	  node->function_version->next = dispatcher_node_next;
+	  dispatcher_node_next->function_version->prev = node;
+	}
+
+      dispatcher_node->function_version->next = node;
+      node->function_version->prev = dispatcher_node;
+      node->function_version->dispatcher_resolver = dispatch_decl;
+    }
+
+  /* The default version should be the first node.  */
+  default_node->function_version->next = dispatcher_node->function_version->next;
+  (dispatcher_node->function_version->next)->function_version->prev
+     = default_node;
+  /* The dispatcher node should directly point to the default node.  */
+  dispatcher_node->function_version->next = default_node;
+  
+  return dispatch_decl;
+}
+
+/* Makes a function attribute of the form NAME(ARG_NAME) and chains
+   it to CHAIN.  */
+
+static tree
+make_attribute (const char *name, const char *arg_name, tree chain)
+{
+  tree attr_name;
+  tree attr_arg_name;
+  tree attr_args;
+  tree attr;
+
+  attr_name = get_identifier (name);
+  attr_arg_name = build_string (strlen (arg_name), arg_name);
+  attr_args = tree_cons (NULL_TREE, attr_arg_name, NULL_TREE);
+  attr = tree_cons (attr_name, attr_args, chain);
+  return attr;
+}
+
+/* Make the resolver function decl to dispatch the versions of
+   a multi-versioned function,  DEFAULT_DECL.  Create an
+   empty basic block in the resolver and store the pointer in
+   EMPTY_BB.  Return the decl of the resolver function.  */
+
+static tree
+make_resolver_func (const tree default_decl,
+		    const tree dispatch_decl,
+		    basic_block *empty_bb)
+{
+  char *resolver_name;
+  tree decl, type, decl_name, t;
+  basic_block new_bb;
+  bool is_uniq = false;
+
+  /* IFUNCs have to be globally visible.  So, if the default_decl is
+     not, then the name of the IFUNC should be made unique.  */
+  if (TREE_PUBLIC (default_decl) == 0)
+    is_uniq = true;
+
+  /* Append the filename to the resolver function if the versions are
+     not externally visible.  This is because the resolver function has
+     to be externally visible for the loader to find it.  So, appending
+     the filename will prevent conflicts with a resolver function from
+     another module which is based on the same version name.  */
+  resolver_name = make_name (default_decl, "resolver", is_uniq);
+
+  /* The resolver function should return a (void *). */
+  type = build_function_type_list (ptr_type_node, NULL_TREE);
+
+  decl = build_fn_decl (resolver_name, type);
+  decl_name = get_identifier (resolver_name);
+  SET_DECL_ASSEMBLER_NAME (decl, decl_name);
+
+  DECL_NAME (decl) = decl_name;
+  TREE_USED (decl) = 1;
+  DECL_ARTIFICIAL (decl) = 1;
+  DECL_IGNORED_P (decl) = 0;
+  /* IFUNC resolvers have to be externally visible.  */
+  TREE_PUBLIC (decl) = 1;
+  DECL_UNINLINABLE (decl) = 1;
+
+  DECL_EXTERNAL (decl) = 0;
+  DECL_EXTERNAL (dispatch_decl) = 0;
+
+  DECL_CONTEXT (decl) = NULL_TREE;
+  DECL_INITIAL (decl) = make_node (BLOCK);
+  DECL_STATIC_CONSTRUCTOR (decl) = 0;
+  TREE_READONLY (decl) = 0;
+  DECL_PURE_P (decl) = 0;
+
+  if (DECL_COMDAT_GROUP (default_decl))
+    {
+      DECL_COMDAT (decl) = DECL_COMDAT (default_decl);
+      make_decl_one_only (decl, DECL_COMDAT_GROUP (default_decl));
+    }
+  else if (TREE_PUBLIC (default_decl))
+    {
+      /* In this case, each translation unit with a call to this
+	 versioned function will put out a resolver.  Ensure it
+	 is comdat to keep just one copy.  */
+      DECL_COMDAT (decl) = 1;
+      make_decl_one_only (decl, DECL_ASSEMBLER_NAME (decl));
+    }
+  /* Build result decl and add to function_decl. */
+  t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, ptr_type_node);
+  DECL_ARTIFICIAL (t) = 1;
+  DECL_IGNORED_P (t) = 1;
+  DECL_RESULT (decl) = t;
+
+  gimplify_function_tree (decl);
+  push_cfun (DECL_STRUCT_FUNCTION (decl));
+  gimple_register_cfg_hooks ();
+  init_empty_tree_cfg_for_function (DECL_STRUCT_FUNCTION (decl));
+  cfun->curr_properties |=
+    (PROP_gimple_lcf | PROP_gimple_leh | PROP_cfg | PROP_ssa
+     | PROP_gimple_any);
+  cfun->curr_properties = 15;
+  new_bb = create_empty_bb (ENTRY_BLOCK_PTR);
+  make_edge (ENTRY_BLOCK_PTR, new_bb, EDGE_FALLTHRU);
+  make_edge (new_bb, EXIT_BLOCK_PTR, 0);
+  *empty_bb = new_bb;
+
+  cgraph_add_new_function (decl, true);
+  cgraph_call_function_insertion_hooks (cgraph_get_create_node (decl));
+
+  if (DECL_COMDAT_GROUP (default_decl))
+    {
+      gcc_assert (cgraph_get_node (default_decl));
+      symtab_add_to_same_comdat_group (
+	(symtab_node) cgraph_get_node (decl),
+	(symtab_node) cgraph_get_node (default_decl));
+    }
+
+  pop_cfun ();
+
+  gcc_assert (dispatch_decl != NULL);
+  /* Mark dispatch_decl as "ifunc" with resolver as resolver_name.  */
+  DECL_ATTRIBUTES (dispatch_decl) 
+    = make_attribute ("ifunc", resolver_name, DECL_ATTRIBUTES (dispatch_decl));
+
+  /* Create the alias for dispatch to resolver here.  */
+  cgraph_create_function_alias (dispatch_decl, decl);
+  return decl;
+}
+
+/* Generate the dispatching code body to dispatch multi-versioned function
+   DECL.  The target hook is called to process the "target" attributes and
+   provide the code to dispatch the right function at run-time.  NODE points
+   to the dispatcher decl whose body will be created.  */
+
+static tree 
+ix86_generate_version_dispatcher_body (void *node_p)
+{
+  tree resolver_decl;
+  basic_block empty_bb;
+  VEC (tree, heap) *fn_ver_vec = NULL;
+  tree default_ver_decl;
+  struct cgraph_node *versn;
+  struct cgraph_node *node;
+
+  node = (cgraph_node *)node_p;
+
+  gcc_assert (node->dispatcher_function
+	      && (node->function_version != NULL));
+
+  if (node->function_version->dispatcher_resolver)
+    return node->function_version->dispatcher_resolver;
+
+  default_ver_decl = (node->function_version->next)->symbol.decl;
+  resolver_decl = make_resolver_func (default_ver_decl,
+				      node->symbol.decl, &empty_bb);
+  node->function_version->dispatcher_resolver = resolver_decl;
+
+  push_cfun (DECL_STRUCT_FUNCTION (resolver_decl));
+
+  fn_ver_vec = VEC_alloc (tree, heap, 2);
+
+  for (versn = node->function_version->next; versn;
+       versn = versn->function_version->next)
+    {
+      /* Check for virtual functions here again, as by this time it should
+	 have been determined if this function needs a vtable index or
+	 not.  This happens for methods in derived classes that override
+	 virtual methods in base classes but are not explicitly marked as
+	 virtual.  */
+      if (DECL_VINDEX (versn->symbol.decl))
+        error_at (DECL_SOURCE_LOCATION (versn->symbol.decl),
+		  "Virtual function multiversioning not supported");
+      VEC_safe_push (tree, heap, fn_ver_vec, versn->symbol.decl);
+      gcc_assert (versn->function_version);
+    }
+
+  dispatch_function_versions(resolver_decl, fn_ver_vec, &empty_bb);
+
+  rebuild_cgraph_edges (); 
+  pop_cfun ();
+  return resolver_decl;
+}
+
 /* This builds the processor_model struct type defined in
    libgcc/config/i386/cpuinfo.c  */
 
@@ -41005,6 +41991,9 @@ ix86_memmodel_check (unsigned HOST_WIDE_INT val)
 #undef TARGET_PROFILE_BEFORE_PROLOGUE
 #define TARGET_PROFILE_BEFORE_PROLOGUE ix86_profile_before_prologue
 
+#undef TARGET_MANGLE_DECL_ASSEMBLER_NAME
+#define TARGET_MANGLE_DECL_ASSEMBLER_NAME ix86_mangle_decl_assembler_name
+
 #undef TARGET_ASM_UNALIGNED_HI_OP
 #define TARGET_ASM_UNALIGNED_HI_OP TARGET_ASM_ALIGNED_HI_OP
 #undef TARGET_ASM_UNALIGNED_SI_OP
@@ -41098,6 +42087,17 @@ ix86_memmodel_check (unsigned HOST_WIDE_INT val)
 #undef TARGET_FOLD_BUILTIN
 #define TARGET_FOLD_BUILTIN ix86_fold_builtin
 
+#undef TARGET_COMPARE_VERSION_PRIORITY
+#define TARGET_COMPARE_VERSION_PRIORITY ix86_compare_version_priority
+
+#undef TARGET_GENERATE_VERSION_DISPATCHER_BODY
+#define TARGET_GENERATE_VERSION_DISPATCHER_BODY \
+  ix86_generate_version_dispatcher_body
+
+#undef TARGET_GET_FUNCTION_VERSIONS_DISPATCHER
+#define TARGET_GET_FUNCTION_VERSIONS_DISPATCHER \
+  ix86_get_function_versions_dispatcher
+
 #undef TARGET_ENUM_VA_LIST_P
 #define TARGET_ENUM_VA_LIST_P ix86_enum_va_list
 
@@ -41238,6 +42238,9 @@ ix86_memmodel_check (unsigned HOST_WIDE_INT val)
 #undef TARGET_OPTION_PRINT
 #define TARGET_OPTION_PRINT ix86_function_specific_print
 
+#undef TARGET_OPTION_FUNCTION_VERSIONS
+#define TARGET_OPTION_FUNCTION_VERSIONS ix86_function_versions
+
 #undef TARGET_CAN_INLINE_P
 #define TARGET_CAN_INLINE_P ix86_can_inline_p
 

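For readers following the assembler-name mangling in the patch above
(sorted_attr_string plus ix86_mangle_function_version_assembler_name),
here is a minimal standalone sketch of the same idea: replace '=' and '-'
with '_', sort the comma-separated arguments, join them with '_', and
append the result to the original name after a '.'.  The names
mangle_version_name, compare_strings and xalloc are hypothetical and use
plain malloc rather than GCC's allocators; unlike the patch, the sketch
sorts only the tokens actually found.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: malloc that aborts on failure.  */
void *
xalloc (size_t size)
{
  void *p = malloc (size);
  if (p == NULL)
    abort ();
  return p;
}

/* qsort comparator for an array of C strings.  */
int
compare_strings (const void *a, const void *b)
{
  return strcmp (*(char *const *) a, *(char *const *) b);
}

/* Return a newly allocated "NAME.ARG1_ARG2..." string built from the
   comma-separated TARGET attribute string.  The caller frees it.  */
char *
mangle_version_name (const char *name, const char *target)
{
  size_t len, max_args = 1, n = 0, i;
  char *copy, *out, *tok;
  char **args;

  assert (name != NULL && target != NULL);
  len = strlen (target);
  copy = xalloc (len + 1);
  out = xalloc (strlen (name) + len + 2);
  memcpy (copy, target, len + 1);

  /* Count arguments and replace non-identifier characters.  */
  for (i = 0; i < len; i++)
    {
      if (copy[i] == ',')
        max_args++;
      else if (copy[i] == '=' || copy[i] == '-')
        copy[i] = '_';
    }

  /* Split on commas, then sort so that argument order does not matter.  */
  args = xalloc (max_args * sizeof (char *));
  for (tok = strtok (copy, ","); tok != NULL; tok = strtok (NULL, ","))
    args[n++] = tok;
  qsort (args, n, sizeof (char *), compare_strings);

  strcpy (out, name);
  for (i = 0; i < n; i++)
    {
      strcat (out, i == 0 ? "." : "_");
      strcat (out, args[i]);
    }

  free (args);
  free (copy);
  return out;
}
```

With this sketch, mangle_version_name ("foo", "sse4.2,arch=corei7") and
mangle_version_name ("foo", "arch=corei7,sse4.2") both yield
"foo.arch_corei7_sse4.2", which is the point of sorting the arguments.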

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-10-20  4:29                                                                             ` Sriraman Tallam
@ 2012-10-23 21:21                                                                               ` Sriraman Tallam
  2012-10-26 16:53                                                                                 ` Jan Hubicka
  2012-10-26 14:11                                                                               ` Diego Novillo
  1 sibling, 1 reply; 93+ messages in thread
From: Sriraman Tallam @ 2012-10-23 21:21 UTC (permalink / raw)
  To: Diego Novillo
  Cc: Jason Merrill, Jan Hubicka, Xinliang David Li, mark, nathan,
	H.J. Lu, Richard Guenther, Uros Bizjak, reply, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 12856 bytes --]

Hi,

   I have attached the latest patch with bug fixes and comments. I have
also added a description of the function multiversioning syntax
supported by the Intel compiler.

Thanks,
-Sri.

On Fri, Oct 19, 2012 at 7:33 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> Hi Diego,
>
>    Thanks for the review. I have addressed all your comments.  New
> patch attached.
>
> Thanks,
> -Sri.
>
> On Fri, Oct 19, 2012 at 8:10 AM, Diego Novillo <dnovillo@google.com> wrote:
>> On 2012-10-12 18:19 , Sriraman Tallam wrote:
>>
>>> When the front-end sees more than one decl for "foo", it calls a target
>>> hook to determine if they are versions. To prevent duplicate definition
>>> errors with other versions of "foo", the "decls_match" function in
>>> cp/decl.c is made to return false when 2 decls are deemed versions by
>>> the target. This will make all function versions of "foo" be added to
>>> the overload list of "foo".
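A minimal sketch of the version check described above, assuming a
simplified model in which a decl is just a name plus an optional
"target" string.  The struct fn_decl and are_function_versions names
are hypothetical; the real target hook (ix86_function_versions in the
patch) compares the parsed target option structs rather than raw
strings, and the front-end has already established that the name and
signature match.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a function decl.  TARGET is NULL for the
   default version, which carries no target attribute.  */
struct fn_decl
{
  const char *name;
  const char *target;
};

/* Return true if A and B should be treated as versions of one function
   (and so go on the overload list) rather than as duplicate
   definitions.  Signatures are assumed to be identical.  */
bool
are_function_versions (const struct fn_decl *a, const struct fn_decl *b)
{
  assert (a != NULL && b != NULL);

  if (strcmp (a->name, b->name) != 0)
    return false;

  /* Neither has a target attribute: an ordinary redeclaration.  */
  if (a->target == NULL && b->target == NULL)
    return false;

  /* Exactly one has a target attribute: default plus a specialization.  */
  if (a->target == NULL || b->target == NULL)
    return true;

  /* Both have one: they are versions only if the targets differ.  */
  return strcmp (a->target, b->target) != 0;
}
```

When this returns true, a "decls_match"-style check would report the two
decls as not matching, so both stay visible as overload candidates.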
>>
>>
>> So, this means that this can only work for C++, right?  Or could the same
>> trickery be done some other way in other FEs?
>>
>> I see no handling of different FEs.  If the user tries to use these
>> attributes from languages other than C++, we should emit a diagnostic.
>
> Yes, the support is only for C++ for now. The "target" attribute is not
> new, and if the user tries to use this with C then a duplicate
> definition error would occur, just as it does now.
> I have plans to implement this for C too.
>
>>
>>> +@deftypefn {Target Hook} tree TARGET_GET_FUNCTION_VERSIONS_DISPATCHER (void *@var{arglist})
>>> +This hook is used to get the dispatcher function for a set of function
>>> +versions.  The dispatcher function is called to invoke the rignt function
>>
>>
>> s/rignt/right/
>>
>>> +version at run-time. @var{arglist} is the vector of function versions
>>> +that should be considered for dispatch.
>>> +@end deftypefn
>>> +
>>> +@deftypefn {Target Hook} tree TARGET_GENERATE_VERSION_DISPATCHER_BODY
>>> (void *@var{arg})
>>> +This hook is used to generate the dispatcher logic to invoke the right
>>> +function version at runtime for a given set of function versions.
>>
>>
>> s/runtime/run-time/
>>
>>> +@hook TARGET_GET_FUNCTION_VERSIONS_DISPATCHER
>>> +This hook is used to get the dispatcher function for a set of function
>>> +versions.  The dispatcher function is called to invoke the rignt function
>>
>>
>> s/rignt/right/
>>
>>> +version at run-time. @var{arglist} is the vector of function versions
>>> +that should be considered for dispatch.
>>> +@end deftypefn
>>> +
>>> +@hook TARGET_GENERATE_VERSION_DISPATCHER_BODY
>>> +This hook is used to generate the dispatcher logic to invoke the right
>>> +function version at runtime for a given set of function versions.
>>
>>
>> s/runtime/run-time/
>>
>>> @@ -288,7 +289,6 @@ mark_store (gimple stmt, tree t, void *data)
>>>       }
>>>    return false;
>>>  }
>>> -
>>>  /* Create cgraph edges for function calls.
>>>     Also look for functions and variables having addresses taken.  */
>>
>>
>> Don't remove vertical white space, please.
>>
>>> +               {
>>> +                 struct cgraph_node *callee = cgraph_get_create_node
>>> (decl);
>>> +                 /* If a call to a multiversioned function dispatcher is
>>> +                    found, generate the body to dispatch the right
>>> function
>>> +                    at run-time.  */
>>> +                 if (callee->dispatcher_function)
>>> +                   {
>>> +                     tree resolver_decl;
>>> +                     gcc_assert (callee->function_version.next);
>>
>>
>> What if callee is the last version in the list?  Not sure what you are
>> trying to check here.
>
> So, callee here is the dispatcher function and it points to the set of
> semantically identical function versions. At this point, the
> dispatcher (callee) should have all the function versions chained in
> function_version, which is what the assert is checking.
>
>>
>>
>>> @@ -8601,9 +8601,22 @@ handle_target_attribute (tree *node, tree name, tr
>>>        warning (OPT_Wattributes, "%qE attribute ignored", name);
>>>        *no_add_attrs = true;
>>>      }
>>> -  else if (! targetm.target_option.valid_attribute_p (*node, name, args,
>>> -                                                     flags))
>>> -    *no_add_attrs = true;
>>> +  else
>>> +    {
>>> +      /* When a target attribute is invalid, it may also be because the
>>> +        target for the compilation unit and the attribute match.  For
>>> +         instance, target attribute "xxx" is invalid when -mxxx is used.
>>> +         When used with multiversioning, removing the attribute will lead
>>> +         to duplicate definitions if a default version is provided.
>>> +        So, generate a warning here and remove the attribute.  */
>>> +      if (!targetm.target_option.valid_attribute_p (*node, name, args,
>>> flags))
>>> +       {
>>> +         warning (OPT_Wattributes,
>>> +                  "Invalid target attribute in function %qE, ignored.",
>>> +                  *node);
>>> +         *no_add_attrs = true;
>>
>>
>> If you do this, isn't the compiler going to generate two warning messages?
>> One for the invalid target attribute, the second for the duplicate
>> definition.
>
> This will be a warning and the duplicate definition would be an error.
> The warning would help the user understand why this error occurred.
> Example:
>
> ver.cc
> int __attribute__((target("popcnt")))
> bar (bool a)
> {
>   return 0;
> }
>
> int
> bar (bool a)
> {
>   return 1;
> }
>
> $ g++ -mpopcnt  ver.cc
>
> ver.cc:2:12: warning: Invalid target attribute in function ‘bar’,
> ignored. [-Wattributes]
>  bar (bool a)
>             ^
> ver.cc: In function ‘int bar(bool)’:
> ver.cc:7:1: error: redefinition of ‘int bar(bool)’
>  bar (bool a)
>  ^
> ver.cc:2:1: error: ‘int bar(bool)’ previously defined here
>  bar (bool a)
>
> When compiled with -mpopcnt, the new version does not differ from the default.
> Now, the warning makes it  clear why the redefinition error occurred.
>
>
>
>>
>>> @@ -228,6 +228,26 @@ struct GTY(()) cgraph_node {
>>>    struct cgraph_node *prev_sibling_clone;
>>>    struct cgraph_node *clones;
>>>    struct cgraph_node *clone_of;
>>> +
>>> +  /* Function Multiversioning info.  */
>>> +  struct {
>>>
>>> +    /* Chains all the semantically identical function versions.  The
>>> +       first function in this chain is the default function.  */
>>> +    struct cgraph_node *prev;
>>> +    /* If this node is a dispatcher for function versions, this points
>>> +       to the default function version, the first function in the chain.
>>> */
>>> +    struct cgraph_node *next;
>>
>>
>> Why not a VEC of function decls?  Seems easier to manage and less size
>> overhead.
>
> I have solved the size overhead by moving function_version_info
> outside cgraph. I think it is better to chain the decls as it is very
> easy to traverse the list of semantically identical versions from any
> given function version.
>
>>
>>
>>> @@ -3516,8 +3522,8 @@ struct GTY(()) tree_function_decl {
>>>
>>>    unsigned looping_const_or_pure_flag : 1;
>>>    unsigned has_debug_args_flag : 1;
>>>    unsigned tm_clone_flag : 1;
>>> -
>>> -  /* 1 bit left */
>>> +  unsigned versioned_function : 1;
>>> +  /* No bits left.  */
>>
>>
>> You ate the last bit!  How rude ;)
>
> I should get the patch in before somebody else really eats it ;-)
>
>>
>>> @@ -8132,6 +8176,38 @@ joust (struct z_candidate *cand1, struct z_candida
>>>        && (IS_TYPE_OR_DECL_P (cand1->fn)))
>>>      return 1;
>>>
>>> +  /* For Candidates of a multi-versioned function,  make the version with
>>
>>
>> s/Candidates/candidates/
>>
>>
>>> +  old_current_function_decl = current_function_decl;
>>> +  push_cfun (DECL_STRUCT_FUNCTION (function_decl));
>>> +  current_function_decl = function_decl;
>>
>>
>> push_cfun will set current_function_decl for you.  No need to keep track of
>> old_current_function_decl.
>>
>>> +  enum feature_priority
>>> +  {
>>> +    P_ZERO = 0,
>>> +    P_MMX,
>>> +    P_SSE,
>>> +    P_SSE2,
>>> +    P_SSE3,
>>> +    P_SSSE3,
>>> +    P_PROC_SSSE3,
>>> +    P_SSE4_a,
>>> +    P_PROC_SSE4_a,
>>> +    P_SSE4_1,
>>> +    P_SSE4_2,
>>> +    P_PROC_SSE4_2,
>>> +    P_POPCNT,
>>> +    P_AVX,
>>> +    P_AVX2,
>>> +    P_FMA,
>>> +    P_PROC_FMA
>>> +  };
>>
>>
>> There's no need to have this list dynamically defined, right?
>
> I dont understand, why expose the enum outside the function?
>
>>
>>> +       }
>>> +    }
>>> +
>>> +  /* Process feature name.  */
>>> +  tok_str =  (char *) xmalloc (strlen (attrs_str) + 1);
>>
>>
>> XNEWVEC(char, strlen (attrs_str) + 1);
>>
>>
>>> +  /* Atleast one more version other than the default.  */
>>
>>
>> s/Atleast/At least/
>>
>>> +  num_versions = VEC_length (tree, fndecls);
>>> +  gcc_assert (num_versions >= 2);
>>> +
>>> +  function_version_info = (struct _function_version_info *)
>>> +    xmalloc ((num_versions - 1) * sizeof (struct
>>> _function_version_info));
>>
>>
>> Better use VEC() here.
>>
>>
>>> +
>>> +  /* The first version in the vector is the default decl.  */
>>> +  default_decl = VEC_index (tree, fndecls, 0);
>>> +
>>> +  old_current_function_decl = current_function_decl;
>>> +  push_cfun (DECL_STRUCT_FUNCTION (dispatch_decl));
>>> +  current_function_decl = dispatch_decl;
>>
>>
>> No need to set current_function_decl.
>>
>>> +
>>> +  gseq = bb_seq (*empty_bb);
>>> +  /* Function version dispatch is via IFUNC.  IFUNC resolvers fire before
>>> +     constructors, so explicity call __builtin_cpu_init here.  */
>>>
>>> +  ifunc_cpu_init_stmt = gimple_build_call_vec (
>>> +                     ix86_builtins [(int) IX86_BUILTIN_CPU_INIT], NULL);
>>> +  gimple_seq_add_stmt (&gseq, ifunc_cpu_init_stmt);
>>> +  gimple_set_bb (ifunc_cpu_init_stmt, *empty_bb);
>>> +  set_bb_seq (*empty_bb, gseq);
>>> +
>>> +  pop_cfun ();
>>> +  current_function_decl = old_current_function_decl;
>>
>>
>> Likewise here.
>>
>>> +/* This function returns true if fn1 and fn2 are versions of the same
>>> function.
>>> +   Returns false if only one of the function decls has the target
>>> attribute
>>> +   set or if the targets of the function decls are different.  This
>>> assumes
>>> +   the fn1 and fn2 have the same signature.  */
>>
>>
>> Mention the arguments in capitals.
>>
>>
>>> +  for (i = 0; i < strlen (str); i++)
>>> +    if (str[i] == ',')
>>> +      argnum++;
>>> +
>>> +  attr_str = (char *)xmalloc (strlen (str) + 1);
>>
>>
>> XNEWVEC()
>>
>>> +  strcpy (attr_str, str);
>>> +
>>> +  /* Replace "=,-" with "_".  */
>>>
>>> +  for (i = 0; i < strlen (attr_str); i++)
>>> +    if (attr_str[i] == '=' || attr_str[i]== '-')
>>>
>>> +      attr_str[i] = '_';
>>> +
>>> +  if (argnum == 1)
>>> +    return attr_str;
>>> +
>>> +  args = (char **)xmalloc (argnum * sizeof (char *));
>>
>>
>> VEC()?
>>
>>> +  if (DECL_DECLARED_INLINE_P (decl)
>>> +      && lookup_attribute ("gnu_inline",
>>>
>>> +                          DECL_ATTRIBUTES (decl)))
>>> +    error_at (DECL_SOURCE_LOCATION (decl),
>>> +             "Function versions cannot be marked as gnu_inline,"
>>> +             " bodies have to be generated\n");
>>
>>
>> No newline at the end of the error message.
>>
>>> +  sprintf (assembler_name, "%s.%s", orig_name, attr_str);
>>> +  if (dump_file)
>>> +    fprintf (stderr, "Assembler name set to %s for function version
>>> %s\n",
>>> +            assembler_name, IDENTIFIER_POINTER (id));
>>
>>
>> This dumps to stderr instead of dump_file.  Also, use the new dumping
>> facility?
>>
>>
>>> +/* Return a new name by appending SUFFIX to the DECL name.  If
>>> +   make_unique is true, append the full path name.  */
>>
>>
>> Full path name of what?
>>
>>
>>> +
>>> +static char *
>>> +make_name (tree decl, const char *suffix, bool make_unique)
>>> +{
>>> +  char *global_var_name;
>>> +  int name_len;
>>> +  const char *name;
>>> +  const char *unique_name = NULL;
>>> +
>>> +  name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
>>> +
>>> +  /* Get a unique name that can be used globally without any chances
>>> +     of collision at link time.  */
>>> +  if (make_unique)
>>> +    unique_name = IDENTIFIER_POINTER (get_file_function_name ("\0"));
>>> +
>>> +  name_len = strlen (name) + strlen (suffix) + 2;
>>> +
>>> +  if (make_unique)
>>> +    name_len += strlen (unique_name) + 1;
>>> +  global_var_name = (char *) xmalloc (name_len);
>>
>>
>> XNEWVEC.
>>
>>
>>
>> Diego.

[-- Attachment #2: mv_fe_patch_10232012.txt --]
[-- Type: text/plain, Size: 80477 bytes --]

Overview of the patch which adds support to specify function versions.  This is
only enabled for target i386.

Example:

int foo ();  /* Default version */
int foo () __attribute__ ((target("avx,popcnt")));/*Specialized for avx and popcnt */
int foo () __attribute__ ((target("arch=core2,ssse3")));/*Specialized for core2 and ssse3*/

int main ()
{
 int (*p)() = &foo;
 return foo () + (*p)();
}

int foo ()
{
 return 0;
}

int __attribute__ ((target("avx,popcnt")))
foo ()
{
 return 0;
}

int __attribute__ ((target("arch=core2,ssse3")))
foo ()
{
 return 0;
}

The above example has foo defined 3 times, but all 3 definitions of foo are
different versions of the same function. The calls to foo in main, both direct
and via a pointer, are calls to the multi-versioned function foo, which is
dispatched to the right foo at run-time.

Front-end changes:

The front-end changes are calls, at appropriate places, to target hooks that
do the following:

* Determine if two function decls with the same signature are versions.
* Determine the assembler name of a function version.
* Generate the dispatcher function for a set of function versions.
* Compare versions to see if one has a higher priority over the other.

All the implementation happens in the target-specific config/i386/i386.c.

What does the patch do?

* Tracking decls that correspond to function versions of a function
name, say "foo":

When the front-end sees more than one decl for "foo", it calls a target hook to
determine if they are versions. To prevent duplicate definition errors with other
versions of "foo", the "decls_match" function in cp/decl.c is made to return false
when 2 decls are deemed versions by the target. This causes all function
versions of "foo" to be added to the overload list of "foo".

* Change the assembler names of the function versions.

For i386, the target changes the assembler names of the function versions by
suffixing the sorted list of args to "target" to the function name of "foo". For
example, the assembler name of "void foo () __attribute__ ((target ("sse4")))" will
become _Z3foov.sse4.  The target hook mangle_decl_assembler_name is used for this.
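
As a rough illustration of the name change described above, here is a minimal
standalone sketch in C.  It is not the code in i386.c: the helper name
mangle_version_name, the fixed buffer sizes, and the use of "_" to join the
sorted args are assumptions made for this example only.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C strings.  */
static int
cmp_str (const void *a, const void *b)
{
  return strcmp (*(char *const *) a, *(char *const *) b);
}

/* Hypothetical helper, not part of the patch.  Writes BASE, a '.', and the
   sorted args of ATTRS joined by '_' into OUT.  Returns 0 on success and -1
   if a buffer is too small.  */
static int
mangle_version_name (const char *base, const char *attrs,
                     char *out, size_t out_size)
{
  char buf[256];
  char *args[32];
  char *tok;
  size_t nargs = 0, used, i;
  int n;

  assert (base != NULL && attrs != NULL && out != NULL);
  if (strlen (attrs) >= sizeof buf)
    return -1;
  strcpy (buf, attrs);

  /* Replace '=' and '-' with '_' so each arg is a plain symbol fragment.  */
  for (i = 0; buf[i] != '\0'; i++)
    if (buf[i] == '=' || buf[i] == '-')
      buf[i] = '_';

  /* Split on commas (and spaces), then sort the pieces.  */
  for (tok = strtok (buf, ", "); tok != NULL; tok = strtok (NULL, ", "))
    {
      if (nargs == sizeof args / sizeof args[0])
        return -1;
      args[nargs++] = tok;
    }
  qsort (args, nargs, sizeof args[0], cmp_str);

  /* Start with the original assembler name.  */
  n = snprintf (out, out_size, "%s", base);
  if (n < 0 || (size_t) n >= out_size)
    return -1;
  used = (size_t) n;

  /* Append ".arg1_arg2_..." in sorted order.  */
  for (i = 0; i < nargs; i++)
    {
      n = snprintf (out + used, out_size - used, "%s%s",
                    i == 0 ? "." : "_", args[i]);
      if (n < 0 || (size_t) n >= out_size - used)
        return -1;
      used += (size_t) n;
    }
  return 0;
}
```

Because the args are sorted before they are joined, "avx,popcnt" and
"popcnt,avx" give the same suffix in this sketch, so the order in which the
user writes the attribute string does not change the resulting symbol name.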

* Overload resolution:

Function "build_over_call" in cp/call.c sees a call to function
"foo", which is multi-versioned. The overload resolution happens in
function "joust" in cp/call.c. Here, the call to "foo" has all
possible versions of "foo" as candidates. All the candidates of "foo" are
stored in the cgraph data structures. Each version of foo is chained in a
doubly-linked list with the default function as the first element.  This allows
any pass to access all the semantically identical versions. A call to a
multi-versioned function will be replaced by a call to a dispatcher function,
determined by a target hook, to execute the right function version at run-time.
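
The chaining described above can be pictured with a small standalone sketch.
This is only an illustration: struct version_node and the helper names are
hypothetical stand-ins, not the cgraph structures in the patch.  It shows why
a doubly-linked chain with the default version at the head lets code reach
every version starting from any one of them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one function version in the chain.  */
struct version_node
{
  const char *target;          /* e.g. "avx,popcnt"; NULL for the default.  */
  struct version_node *prev;   /* NULL for the default (first) version.  */
  struct version_node *next;   /* NULL for the last version.  */
};

/* Walk back from any version to the head of the chain, which is the
   default version.  */
static struct version_node *
default_version (struct version_node *v)
{
  assert (v != NULL);
  while (v->prev != NULL)
    v = v->prev;
  return v;
}

/* Append V at the end of the chain that ANY belongs to.  */
static void
append_version (struct version_node *any, struct version_node *v)
{
  assert (any != NULL && v != NULL);
  while (any->next != NULL)
    any = any->next;
  any->next = v;
  v->prev = any;
  v->next = NULL;
}

/* Count all versions in the chain, starting from any member.  */
static size_t
count_versions (struct version_node *v)
{
  size_t n = 0;
  for (v = default_version (v); v != NULL; v = v->next)
    n++;
  return n;
}
```

Starting from any node, default_version walks the prev links to the head and
count_versions then follows the next links, so no separate table of versions
is needed in this sketch.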

Optimization to directly call a version when possible:
Also, in joust, where overload resolution happens, resolution of a
multiversioned function is made to return the most specialized version.  This is
the version that will be checked for dispatching first and is determined by the
target.  Now, if the caller can inline this function version, then a direct call
is made to this function version rather than going through the dispatcher. When
a direct call cannot be made, a call to the dispatcher function is created.

* Creating the dispatcher body.

The dispatcher body, called the resolver, is made only when there is a call to a
multiversioned function dispatcher or the address of a function is taken. This
is generated during build_cgraph_edges for a call or
cgraph_mark_address_taken_node for a pointer reference. This is done by another
target hook.

* Dispatch ordering.

The order in which the function versions are checked during dispatch is based
on a priority value assigned to the ISA that each version caters to. More
specialized versions are checked for dispatching first.  This is to mitigate the
ambiguity that can arise when more than one function version is valid for
execution on a particular platform.  This is not a perfect solution, and in the
future the user should be allowed to assign a dispatching priority value to each
version.
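
To make the ordering concrete, here is a minimal standalone sketch of
priority-ordered dispatch.  It is a simulation and not the generated resolver:
the feature bits, the priority numbers, struct fn_version and select_version
are all assumptions for this example, and the CPU features are passed in as a
mask instead of being queried from the hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical feature bits standing in for real CPU feature checks.  */
#define FEAT_POPCNT (1u << 0)
#define FEAT_SSE4_2 (1u << 1)
#define FEAT_AVX    (1u << 2)

/* Hypothetical description of one function version.  */
struct fn_version
{
  const char *name;     /* Label for the version.  */
  unsigned required;    /* Feature bits the version needs; 0 for default.  */
  int priority;         /* Larger means more specialized.  */
};

/* qsort comparator: higher priority first.  */
static int
by_priority_desc (const void *a, const void *b)
{
  const struct fn_version *x = (const struct fn_version *) a;
  const struct fn_version *y = (const struct fn_version *) b;
  return (y->priority > x->priority) - (y->priority < x->priority);
}

/* Sort VERSIONS by descending priority and return the first one whose
   required features are all present in CPU_FEATURES, or NULL if none
   matches.  A default version with required == 0 always matches.  */
static const struct fn_version *
select_version (struct fn_version *versions, size_t n, unsigned cpu_features)
{
  size_t i;

  assert (versions != NULL);
  qsort (versions, n, sizeof versions[0], by_priority_desc);
  for (i = 0; i < n; i++)
    if ((versions[i].required & cpu_features) == versions[i].required)
      return &versions[i];
  return NULL;
}
```

On a simulated CPU that has both SSE4.2 and AVX, two versions are valid; the
sketch picks the AVX one only because it was given the larger priority, which
is the ambiguity the paragraph above refers to.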

Function MV in the Intel compiler:

The Intel compiler supports function multiversioning and the syntax is
similar to the patch proposed here.  Here is an example of how to
generate multiple function versions with the Intel compiler.

/* Create a stub function to specify the various versions of function that
   will be created, using declspec attribute cpu_dispatch.  */
__declspec (cpu_dispatch (core_i7_sse4_2, atom, generic))
void foo () {};

/* Bodies of each function version.  */

/* Intel Corei7 processor + SSE4.2 version.  */
__declspec (cpu_specific(core_i7_sse4_2))
void foo ()
{
  printf ("corei7 + sse4.2");
}

/* Atom processor.  */
__declspec (cpu_specific(atom))
void foo ()
{
  printf ("atom");
}

/* The generic or the default version.  */
__declspec (cpu_specific(generic))
void foo ()
{
  printf ("This is generic");
}

A new function version is generated by defining a new function with the same
signature but with a different cpu_specific declspec attribute string.  The
set of cpu_specific strings that are allowed is the following:

"core_2nd_gen_avx"
"core_aes_pclmulqdq"
"core_i7_sse4_2"
"core_2_duo_sse4_1"
"core_2_duo_ssse3"
"atom"
"pentium_4_sse3"
"pentium_4"
"pentium_m"
"pentium_iii"
"generic"

Comparison with the GCC MV implementation in this patch:

* Version creation syntax:

The implementation in this patch has a similar syntax to specify function
versions. The stub function is not needed.  Here is the code to generate
the function versions with this patch:

/* Intel Corei7 processor + SSE4.2 version.  */
__attribute__ ((target ("arch=corei7, sse4.2")))
void foo ()
{
  printf ("corei7 + sse4.2");
}

/* Atom processor.  */
__attribute__ ((target ("arch=atom")))
void foo ()
{
  printf ("atom");
}

void foo ()
{
}

The target attribute can have one of the following arch names:

"amd"
"intel"
"atom"
"core2"
"corei7"
"nehalem"
"westmere"
"sandybridge"
"amdfam10h"
"barcelona"
"shanghai"
"istanbul"
"amdfam15h"
"bdver1"
"bdver2"

and any number of the following ISA names:

"cmov"
"mmx"
"popcnt"
"sse"
"sse2"
"sse3"
"ssse3"
"sse4.1"
"sse4.2"
"avx"
"avx2"


	* doc/tm.texi.in (TARGET_OPTION_FUNCTION_VERSIONS): New hook description.
	* (TARGET_COMPARE_VERSION_PRIORITY): New hook description.
	* (TARGET_GET_FUNCTION_VERSIONS_DISPATCHER): New hook description.
	* (TARGET_GENERATE_VERSION_DISPATCHER_BODY): New hook description.
	* doc/tm.texi: Regenerate.
	* cgraphbuild.c (build_cgraph_edges): Generate body of multiversion
	function dispatcher.
	* c-family/c-common.c (handle_target_attribute): Warn for invalid attributes.
	* target.def (compare_version_priority): New target hook.
	* (generate_version_dispatcher_body): New target hook.
	* (get_function_versions_dispatcher): New target hook.
	* (function_versions): New target hook.
	* cgraph.c (cgraph_mark_address_taken_node): Generate body of multiversion
	function dispatcher.
	* cgraph.h (cgraph_node): New members function_version,
	dispatcher_function.
	(cgraph_function_version_info): New struct.
	* tree.h (DECL_FUNCTION_VERSIONED): New macro.
	(tree_function_decl): New bit-field versioned_function.
	* testsuite/g++.dg/mv1.C: New test.
	* testsuite/g++.dg/mv2.C: New test.
	* testsuite/g++.dg/mv3.C: New test.
	* testsuite/g++.dg/mv4.C: New test.
	* cp/class.c (add_method): Change assembler names of function versions.
	(resolve_address_of_overloaded_function): Save all function
	version candidates. Create dispatcher decl and return address of
	dispatcher instead.
	* cp/decl.c (decls_match): Make decls unmatched for versioned
	functions.
	(duplicate_decls): Remove ambiguity for versioned functions. 
	(cxx_comdat_group): Make comdat group of versioned functions be the
	same.
	* cp/error.c (dump_exception_spec): Dump assembler name for function
	versions.
	* cp/semantics.c (expand_or_defer_fn_1): Mark as needed versioned
	functions that are also marked inline.
	* cp/decl2.c (check_classfn): Check attributes of versioned functions
	for match.
	* cp/call.c (build_new_function_call): Check if versioned functions
	have a default version.
	(build_over_call): Make calls to multiversioned functions
	to call the dispatcher.
	(joust): For calls to multi-versioned functions, make the most
	specialized function version win.
	(tourney): Generate dispatcher decl for function versions.
	(get_function_version_dispatcher): New function.
	(generate_function_versions_dispatcher): New function.
	* Makefile.in: Add multiversion.o
	* config/i386/i386.c (add_condition_to_bb): New function.
	(get_builtin_code_for_version): New function.
	(ix86_compare_version_priority): New function.
	(feature_compare): New function.
	(dispatch_function_versions): New function.
	* (ix86_function_versions): New function.
	* (attr_strcmp): New function.
	* (sorted_attr_string): New function.
	* (ix86_mangle_function_version_assembler_name): New function.
	* (ix86_mangle_decl_assembler_name): New function.
	* (make_name): New function.
	* (make_dispatcher_decl): New function.
	* (is_function_default_version): New function.
	* (ix86_get_function_versions_dispatcher): New function.
	* (make_attribute): New function.
	* (make_resolver_func): New function.
	* (ix86_generate_version_dispatcher_body): New function.
	* (TARGET_COMPARE_VERSION_PRIORITY): New macro.
	* (TARGET_GET_FUNCTION_VERSIONS_DISPATCHER): New macro.
	* (TARGET_GENERATE_VERSION_DISPATCHER_BODY): New macro.
	* (TARGET_OPTION_FUNCTION_VERSIONS): New macro.
	* (TARGET_MANGLE_DECL_ASSEMBLER_NAME): New macro.

Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi	(revision 192623)
+++ gcc/doc/tm.texi	(working copy)
@@ -9913,6 +9913,14 @@ changed via the optimize attribute or pragma, see
 @code{TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE}
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_OPTION_FUNCTION_VERSIONS (tree @var{decl1}, tree @var{decl2})
+This target hook returns @code{true} if @var{DECL1} and @var{DECL2} are
+versions of the same function.  @var{DECL1} and @var{DECL2} are function
+versions if and only if they have the same function signature and
+different target specific attributes, that is, they are compiled for
+different target machines.
+@end deftypefn
+
 @deftypefn {Target Hook} bool TARGET_CAN_INLINE_P (tree @var{caller}, tree @var{callee})
 This target hook returns @code{false} if the @var{caller} function
 cannot inline @var{callee}, based on target specific information.  By
@@ -10930,6 +10938,29 @@ The result is another tree containing a simplified
 call's result.  If @var{ignore} is true the value will be ignored.
 @end deftypefn
 
+@deftypefn {Target Hook} int TARGET_COMPARE_VERSION_PRIORITY (tree @var{decl1}, tree @var{decl2})
+This hook is used to compare the target attributes in two functions to
+determine which function's features get higher priority.  This is used
+during function multi-versioning to figure out the order in which two
+versions must be dispatched.  A function version with a higher priority
+is checked for dispatching earlier.  @var{decl1} and @var{decl2} are
+ the two function decls that will be compared.
+@end deftypefn
+
+@deftypefn {Target Hook} tree TARGET_GET_FUNCTION_VERSIONS_DISPATCHER (void *@var{arglist})
+This hook is used to get the dispatcher function for a set of function
+versions.  The dispatcher function is called to invoke the right function
+version at run-time. @var{arglist} is the vector of function versions
+that should be considered for dispatch.
+@end deftypefn
+
+@deftypefn {Target Hook} tree TARGET_GENERATE_VERSION_DISPATCHER_BODY (void *@var{arg})
+This hook is used to generate the dispatcher logic to invoke the right
+function version at run-time for a given set of function versions.
+@var{arg} points to the callgraph node of the dispatcher function whose
+body must be generated.
+@end deftypefn
+
 @deftypefn {Target Hook} {const char *} TARGET_INVALID_WITHIN_DOLOOP (const_rtx @var{insn})
 
 Take an instruction in @var{insn} and return NULL if it is valid within a
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in	(revision 192623)
+++ gcc/doc/tm.texi.in	(working copy)
@@ -9782,6 +9782,14 @@ changed via the optimize attribute or pragma, see
 @code{TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE}
 @end deftypefn
 
+@hook TARGET_OPTION_FUNCTION_VERSIONS
+This target hook returns @code{true} if @var{DECL1} and @var{DECL2} are
+versions of the same function.  @var{DECL1} and @var{DECL2} are function
+versions if and only if they have the same function signature and
+different target specific attributes, that is, they are compiled for
+different target machines.
+@end deftypefn
+
 @hook TARGET_CAN_INLINE_P
 This target hook returns @code{false} if the @var{caller} function
 cannot inline @var{callee}, based on target specific information.  By
@@ -10788,6 +10796,29 @@ The result is another tree containing a simplified
 call's result.  If @var{ignore} is true the value will be ignored.
 @end deftypefn
 
+@hook TARGET_COMPARE_VERSION_PRIORITY
+This hook is used to compare the target attributes in two functions to
+determine which function's features get higher priority.  This is used
+during function multi-versioning to figure out the order in which two
+versions must be dispatched.  A function version with a higher priority
+is checked for dispatching earlier.  @var{decl1} and @var{decl2} are
+ the two function decls that will be compared.
+@end deftypefn
+
+@hook TARGET_GET_FUNCTION_VERSIONS_DISPATCHER
+This hook is used to get the dispatcher function for a set of function
+versions.  The dispatcher function is called to invoke the right function
+version at run-time. @var{arglist} is the vector of function versions
+that should be considered for dispatch.
+@end deftypefn
+
+@hook TARGET_GENERATE_VERSION_DISPATCHER_BODY
+This hook is used to generate the dispatcher logic to invoke the right
+function version at run-time for a given set of function versions.
+@var{arg} points to the callgraph node of the dispatcher function whose
+body must be generated.
+@end deftypefn
+
 @hook TARGET_INVALID_WITHIN_DOLOOP
 
 Take an instruction in @var{insn} and return NULL if it is valid within a
Index: gcc/cgraphbuild.c
===================================================================
--- gcc/cgraphbuild.c	(revision 192623)
+++ gcc/cgraphbuild.c	(working copy)
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-utils.h"
 #include "except.h"
 #include "ipa-inline.h"
+#include "target.h"
 
 /* Context of record_reference.  */
 struct record_reference_ctx
@@ -317,8 +318,23 @@ build_cgraph_edges (void)
 							 bb);
 	      decl = gimple_call_fndecl (stmt);
 	      if (decl)
-		cgraph_create_edge (node, cgraph_get_create_node (decl),
-				    stmt, bb->count, freq);
+		{
+		  struct cgraph_node *callee = cgraph_get_create_node (decl);
+	          /* If a call to a multiversioned function dispatcher is
+		     found, generate the body to dispatch the right function
+		     at run-time.  */
+		  if (callee->dispatcher_function)
+		    {
+		      tree resolver_decl;
+		      gcc_assert (callee->function_version
+				  && callee->function_version->next);
+		      gcc_assert (targetm.generate_version_dispatcher_body);
+		      resolver_decl
+			 = targetm.generate_version_dispatcher_body (callee);
+		      gcc_assert (resolver_decl != NULL_TREE);
+		    }
+		  cgraph_create_edge (node, callee, stmt, bb->count, freq);
+	        }
 	      else
 		cgraph_create_indirect_edge (node, stmt,
 					     gimple_call_flags (stmt),
Index: gcc/c-family/c-common.c
===================================================================
--- gcc/c-family/c-common.c	(revision 192623)
+++ gcc/c-family/c-common.c	(working copy)
@@ -8737,9 +8737,20 @@ handle_target_attribute (tree *node, tree name, tr
       warning (OPT_Wattributes, "%qE attribute ignored", name);
       *no_add_attrs = true;
     }
-  else if (! targetm.target_option.valid_attribute_p (*node, name, args,
-						      flags))
-    *no_add_attrs = true;
+  else
+    {
+      /* When a target attribute is invalid, it may also be because the
+	 target for the compilation unit and the attribute match.  For
+         instance, target attribute "xxx" is invalid when -mxxx is used.
+         When used with multiversioning, removing the attribute will lead
+         to duplicate definitions if a default version is provided.
+         So, generate a warning here and remove the attribute.  */
+      if (!targetm.target_option.valid_attribute_p (*node, name, args, flags))
+	{
+	  warning (OPT_Wattributes, "%qE attribute invalid, ignored", name);
+	  *no_add_attrs = true;
+	}
+    }
 
   return NULL_TREE;
 }
Index: gcc/target.def
===================================================================
--- gcc/target.def	(revision 192623)
+++ gcc/target.def	(working copy)
@@ -1298,6 +1298,37 @@ DEFHOOK
  tree, (tree fndecl, int n_args, tree *argp, bool ignore),
  hook_tree_tree_int_treep_bool_null)
 
+/* Target hook is used to compare the target attributes in two functions to
+   determine which function's features get higher priority.  This is used
+   during function multi-versioning to figure out the order in which two
+   versions must be dispatched.  A function version with a higher priority
+   is checked for dispatching earlier.  DECL1 and DECL2 are
+   the two function decls that will be compared. It returns positive value
+   if DECL1 is higher priority,  negative value if DECL2 is higher priority
+   and 0 if they are the same. */
+DEFHOOK
+(compare_version_priority,
+ "",
+ int, (tree decl1, tree decl2), NULL)
+
+/*  Target hook is used to generate the dispatcher logic to invoke the right
+    function version at run-time for a given set of function versions.
+    ARG points to the callgraph node of the dispatcher function whose body
+    must be generated.  */
+DEFHOOK
+(generate_version_dispatcher_body,
+ "",
+ tree, (void *arg), NULL) 
+
+/* Target hook is used to get the dispatcher function for a set of function
+   versions.  The dispatcher function is called to invoke the right function
+   version at run-time.  ARGLIST is the vector of function versions that
+   should be considered for dispatch.  */
+DEFHOOK
+(get_function_versions_dispatcher,
+ "",
+ tree, (void *arglist), NULL)
+
 /* Returns a code for a target-specific builtin that implements
    reciprocal of the function, or NULL_TREE if not available.  */
 DEFHOOK
@@ -2725,6 +2756,16 @@ DEFHOOK
  void, (void),
  hook_void_void)
 
+/* This function returns true if DECL1 and DECL2 are versions of the same
+   function.  DECL1 and DECL2 are function versions if and only if they
+   have the same function signature and different target specific attributes,
+   that is, they are compiled for different target machines.  */
+DEFHOOK
+(function_versions,
+ "",
+ bool, (tree decl1, tree decl2),
+ hook_bool_tree_tree_false)
+
 /* Function to determine if one function can inline another function.  */
 #undef HOOK_PREFIX
 #define HOOK_PREFIX "TARGET_"
Index: gcc/cgraph.c
===================================================================
--- gcc/cgraph.c	(revision 192623)
+++ gcc/cgraph.c	(working copy)
@@ -1277,6 +1277,16 @@ cgraph_mark_address_taken_node (struct cgraph_node
   node->symbol.address_taken = 1;
   node = cgraph_function_or_thunk_node (node, NULL);
   node->symbol.address_taken = 1;
+  /* If the address of a multiversioned function dispatcher is taken,
+     generate the body to dispatch the right function at run-time.  This
+     is needed as the address can be used to do an indirect call.  */
+  if (node->dispatcher_function)
+    {
+      gcc_assert (node->function_version
+		  && node->function_version->next);
+      gcc_assert (targetm.generate_version_dispatcher_body);
+      targetm.generate_version_dispatcher_body (node);
+    }
 }
 
 /* Return local info for the compiled function.  */
Index: gcc/cgraph.h
===================================================================
--- gcc/cgraph.h	(revision 192623)
+++ gcc/cgraph.h	(working copy)
@@ -200,7 +200,26 @@ struct GTY(()) cgraph_clone_info
   bitmap combined_args_to_skip;
 };
 
+/* Function Multiversioning info.  */
+struct GTY(()) cgraph_function_version_info
+{
+  /* Chains all the semantically identical function versions.  The
+     first function in this chain is the default function.  */
+  struct cgraph_node *prev;
+  /* If this node is a dispatcher for function versions, this points
+     to the default function version, the first function in the chain.  */
+  struct cgraph_node *next;
+  /* If this node corresponds to a function version, this points
+     to the dispatcher function decl, which is the function that must
+     be called to execute the right function version at run-time.
 
+     If this cgraph node is a dispatcher for function versions (that is,
+     dispatcher_function is set in its cgraph_node), this points to the
+     resolver function, which holds the function body of the dispatcher.
+     The dispatcher decl is an alias to the resolver function decl.  */
+  tree dispatcher_resolver;
+};
+
 /* The cgraph data structure.
    Each function decl has assigned cgraph_node listing callees and callers.  */
 
@@ -228,6 +247,10 @@ struct GTY(()) cgraph_node {
   struct cgraph_node *prev_sibling_clone;
   struct cgraph_node *clones;
   struct cgraph_node *clone_of;
+
+  /* Function Multiversioning Info.  */
+  struct cgraph_function_version_info *function_version;
+  
   /* For functions with many calls sites it holds map from call expression
      to the edge to speed up cgraph_edge function.  */
   htab_t GTY((param_is (struct cgraph_edge))) call_site_hash;
@@ -279,6 +302,8 @@ struct GTY(()) cgraph_node {
   /* ?? We should be able to remove this.  We have enough bits in
      cgraph to calculate it.  */
   unsigned tm_clone : 1;
+  /* True if this decl is a dispatcher for function versions.  */
+  unsigned dispatcher_function : 1;
 };
 
 DEF_VEC_P(symtab_node);
@@ -656,6 +681,7 @@ void cgraph_rebuild_references (void);
 int compute_call_stmt_bb_frequency (tree, basic_block bb);
 void record_references_in_initializer (tree, bool);
 
+
 /* In ipa.c  */
 bool symtab_remove_unreachable_nodes (bool, FILE *);
 cgraph_node_set cgraph_node_set_new (void);
Index: gcc/tree.h
===================================================================
--- gcc/tree.h	(revision 192623)
+++ gcc/tree.h	(working copy)
@@ -3476,6 +3476,12 @@ extern VEC(tree, gc) **decl_debug_args_insert (tre
 #define DECL_FUNCTION_SPECIFIC_OPTIMIZATION(NODE) \
    (FUNCTION_DECL_CHECK (NODE)->function_decl.function_specific_optimization)
 
+/* In FUNCTION_DECL, this is set if this function has other versions generated
+   using "target" attributes.  The default version is the one which does not
+   have any "target" attribute set.  */
+#define DECL_FUNCTION_VERSIONED(NODE)\
+   (FUNCTION_DECL_CHECK (NODE)->function_decl.versioned_function)
+
 /* FUNCTION_DECL inherits from DECL_NON_COMMON because of the use of the
    arguments/result/saved_tree fields by front ends.   It was either inherit
    FUNCTION_DECL from non_common, or inherit non_common from FUNCTION_DECL,
@@ -3520,8 +3526,8 @@ struct GTY(()) tree_function_decl {
   unsigned looping_const_or_pure_flag : 1;
   unsigned has_debug_args_flag : 1;
   unsigned tm_clone_flag : 1;
-
-  /* 1 bit left */
+  unsigned versioned_function : 1;
+  /* No bits left.  */
 };
 
 /* The source language of the translation-unit.  */
Index: gcc/testsuite/g++.dg/mv2.C
===================================================================
--- gcc/testsuite/g++.dg/mv2.C	(revision 0)
+++ gcc/testsuite/g++.dg/mv2.C	(revision 0)
@@ -0,0 +1,119 @@
+/* Test case to check if Multiversioning chooses the correct
+   dispatching order when versions are for various ISAs.  */
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-require-ifunc "" }  */
+/* { dg-options "-O2 -mno-sse -mno-mmx -mno-popcnt -mno-avx" } */
+
+#include <assert.h>
+
+/* Default version.  */
+int foo ();
+/* The dispatch checks should be in the exact reverse order of the
+   declarations below.  */
+int foo () __attribute__ ((target ("mmx")));
+int foo () __attribute__ ((target ("sse")));
+int foo () __attribute__ ((target ("sse2")));
+int foo () __attribute__ ((target ("sse3")));
+int foo () __attribute__ ((target ("ssse3")));
+int foo () __attribute__ ((target ("sse4.1")));
+int foo () __attribute__ ((target ("sse4.2")));
+int foo () __attribute__ ((target ("popcnt")));
+int foo () __attribute__ ((target ("avx")));
+int foo () __attribute__ ((target ("avx2")));
+
+int main ()
+{
+
+  int val = foo ();
+
+  if (__builtin_cpu_supports ("avx2"))
+    assert (val == 1);
+  else if (__builtin_cpu_supports ("avx"))
+    assert (val == 2);
+  else if (__builtin_cpu_supports ("popcnt"))
+    assert (val == 3);
+  else if (__builtin_cpu_supports ("sse4.2"))
+    assert (val == 4);
+  else if (__builtin_cpu_supports ("sse4.1"))
+    assert (val == 5);
+  else if (__builtin_cpu_supports ("ssse3"))
+    assert (val == 6);
+  else if (__builtin_cpu_supports ("sse3"))
+    assert (val == 7);
+  else if (__builtin_cpu_supports ("sse2"))
+    assert (val == 8);
+  else if (__builtin_cpu_supports ("sse"))
+    assert (val == 9);
+  else if (__builtin_cpu_supports ("mmx"))
+    assert (val == 10);
+  else
+    assert (val == 0);
+
+  return 0;
+}
+
+int
+foo ()
+{
+  return 0;
+}
+
+int __attribute__ ((target("mmx")))
+foo ()
+{
+  return 10;
+}
+
+int __attribute__ ((target("sse")))
+foo ()
+{
+  return 9;
+}
+
+int __attribute__ ((target("sse2")))
+foo ()
+{
+  return 8;
+}
+
+int __attribute__ ((target("sse3")))
+foo ()
+{
+  return 7;
+}
+
+int __attribute__ ((target("ssse3")))
+foo ()
+{
+  return 6;
+}
+
+int __attribute__ ((target("sse4.1")))
+foo ()
+{
+  return 5;
+}
+
+int __attribute__ ((target("sse4.2")))
+foo ()
+{
+  return 4;
+}
+
+int __attribute__ ((target("popcnt")))
+foo ()
+{
+  return 3;
+}
+
+int __attribute__ ((target("avx")))
+foo ()
+{
+  return 2;
+}
+
+int __attribute__ ((target("avx2")))
+foo ()
+{
+  return 1;
+}
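
The dispatch order that mv2.C expects can be modeled outside the compiler.
The sketch below is a hypothetical, portable model and not part of the
patch: struct version, pick_version and the F_* feature bits are invented
for illustration, and a plain bitmask stands in for the
__builtin_cpu_supports checks. It sorts the versions by descending
priority and picks the first one whose required features are all present,
falling back to the default otherwise.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical feature bits; the real dispatcher queries the CPU.  */
enum { F_MMX = 1 << 0, F_SSE = 1 << 1, F_SSE2 = 1 << 2, F_AVX = 1 << 3 };

struct version
{
  const char *name;
  unsigned required;	/* Features that must all be present.  */
  int priority;		/* Higher priority is checked first.  */
  int retval;		/* Value this version of foo returns.  */
};

/* qsort comparator: descending priority.  */
static int
by_priority_desc (const void *a, const void *b)
{
  const struct version *va = a;
  const struct version *vb = b;
  return (vb->priority > va->priority) - (vb->priority < va->priority);
}

/* Return the value of the highest-priority version whose features are
   all available, or DEFAULT_RET when no version matches.  */
static int
pick_version (struct version *v, size_t n, unsigned cpu_features,
	      int default_ret)
{
  size_t i;

  qsort (v, n, sizeof *v, by_priority_desc);
  for (i = 0; i < n; i++)
    if ((cpu_features & v[i].required) == v[i].required)
      return v[i].retval;
  return default_ret;
}
```

With versions declared for mmx, sse, sse2 and avx in that order, a machine
modeled as having everything except avx selects the sse2 version, which
mirrors the chain of else-if checks in main above.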
Index: gcc/testsuite/g++.dg/mv4.C
===================================================================
--- gcc/testsuite/g++.dg/mv4.C	(revision 0)
+++ gcc/testsuite/g++.dg/mv4.C	(revision 0)
@@ -0,0 +1,23 @@
+/* Test case to check if the compiler generates an error message
+   when the default version of a multiversioned function is absent
+   and its pointer is taken.  */
+
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 -mno-sse -mno-popcnt" } */
+
+int __attribute__ ((target ("sse")))
+foo ()
+{
+  return 1;
+}
+int __attribute__ ((target ("popcnt")))
+foo ()
+{
+  return 0;
+}
+
+int main ()
+{
+  int (*p)() = &foo; /* { dg-error "Pointer to a multiversioned function without a default is not allowed" {} } */
+  return (*p)();
+}
Index: gcc/testsuite/g++.dg/mv1.C
===================================================================
--- gcc/testsuite/g++.dg/mv1.C	(revision 0)
+++ gcc/testsuite/g++.dg/mv1.C	(revision 0)
@@ -0,0 +1,130 @@
+/* Test case to check if Multiversioning works.  */
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-require-ifunc "" }  */
+/* { dg-options "-O2 -fPIC -mno-avx -mno-popcnt" } */
+
+#include <assert.h>
+
+/* Default version.  */
+int foo ();
+/* The other versions of foo.  Mix up the ordering and
+   check that dispatching still happens in order of priority.  */
+/* Check combination of target attributes.  */
+int foo () __attribute__ ((target("arch=corei7,popcnt")));
+/* The target operands in this declaration and the definition are re-ordered.
+   This should still work.  */
+int foo () __attribute__ ((target("ssse3,avx2")));
+
+/* Check for all target attributes for which dispatchers are available.  */
+/* Check arch= */
+int foo () __attribute__((target("arch=core2")));
+int foo () __attribute__((target("arch=corei7")));
+int foo () __attribute__((target("arch=atom")));
+/* Check ISAs  */
+int foo () __attribute__((target("avx")));
+int foo () __attribute__ ((target("arch=core2,sse4.2")));
+/* Check more arch=.  */
+int foo () __attribute__((target("arch=amdfam10")));
+int foo () __attribute__((target("arch=bdver1")));
+int foo () __attribute__((target("arch=bdver2")));
+
+int (*p)() = &foo;
+int main ()
+{
+  int val = foo ();
+  assert (val ==  (*p)());
+
+  /* Check in the exact same order in which the dispatching
+     is expected to happen.  */
+  if (__builtin_cpu_is ("bdver1"))
+    assert (val == 1);
+  else if (__builtin_cpu_is ("bdver2"))
+    assert (val == 2);
+  else if (__builtin_cpu_supports ("avx2")
+	   && __builtin_cpu_supports ("ssse3"))
+    assert (val == 3);
+  else if (__builtin_cpu_supports ("avx"))
+    assert (val == 4);
+  else if (__builtin_cpu_is ("corei7")
+	   && __builtin_cpu_supports ("popcnt"))
+    assert (val == 5);
+  else if (__builtin_cpu_is ("corei7"))
+    assert (val == 6);
+  else if (__builtin_cpu_is ("amdfam10h"))
+    assert (val == 7);
+  else if (__builtin_cpu_is ("core2")
+	   && __builtin_cpu_supports ("sse4.2"))
+    assert (val == 8);
+  else if (__builtin_cpu_is ("core2"))
+    assert (val == 9);
+  else if (__builtin_cpu_is ("atom"))
+    assert (val == 10);
+  else
+    assert (val == 0);
+  
+  return 0;
+}
+
+int foo ()
+{
+  return 0;
+}
+
+int __attribute__ ((target("arch=corei7,popcnt")))
+foo ()
+{
+  return 5;
+}
+int __attribute__ ((target("avx2,ssse3")))
+foo ()
+{
+  return 3;
+}
+
+int __attribute__ ((target("arch=core2")))
+foo ()
+{
+  return 9;
+}
+
+int __attribute__ ((target("arch=corei7")))
+foo ()
+{
+  return 6;
+}
+
+int __attribute__ ((target("arch=atom")))
+foo ()
+{
+  return 10;
+}
+
+int __attribute__ ((target("avx")))
+foo ()
+{
+  return 4;
+}
+
+int __attribute__ ((target("arch=core2,sse4.2")))
+foo ()
+{
+  return 8;
+}
+
+int __attribute__ ((target("arch=amdfam10")))
+foo ()
+{
+  return 7;
+}
+
+int __attribute__ ((target("arch=bdver1")))
+foo ()
+{
+  return 1;
+}
+
+int __attribute__ ((target("arch=bdver2")))
+foo ()
+{
+  return 2;
+}
Index: gcc/testsuite/g++.dg/mv3.C
===================================================================
--- gcc/testsuite/g++.dg/mv3.C	(revision 0)
+++ gcc/testsuite/g++.dg/mv3.C	(revision 0)
@@ -0,0 +1,37 @@
+/* Test case to check if a call to a multiversioned function
+   is replaced with a direct call to the particular version when
+   the most specialized version's target attributes match the
+   caller.  
+  
+   In this program, foo is multiversioned but there is no default
+   function.  This is an error if the call has to go through a
+   dispatcher.  However, the call to foo in bar can be replaced
+   with a direct call to the popcnt version of foo.  Hence, this
+   test should pass.  */
+
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 -mno-sse -mno-popcnt" } */
+
+
+/* SSE version.  There is no default version in this test.  */
+int __attribute__ ((target ("sse")))
+foo ()
+{
+  return 1;
+}
+int __attribute__ ((target ("popcnt")))
+foo ()
+{
+  return 0;
+}
+
+int __attribute__ ((target ("popcnt")))
+bar ()
+{
+  return foo ();
+}
+
+int main ()
+{
+  return bar ();
+}
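
The decision that mv3.C and mv4.C exercise can be summarized as a small
model. This is only a sketch under stated assumptions: classify_call, the
ISA_* bits and enum call_kind are hypothetical names, and the subset test
is a simplified stand-in for the target's can_inline_p hook, which may
check more than ISA flags.

```c
/* Hypothetical ISA bitmask; not GCC's real representation.  */
enum { ISA_SSE = 1 << 0, ISA_POPCNT = 1 << 1 };

enum call_kind { CALL_DIRECT, CALL_VIA_DISPATCHER, CALL_ERROR };

/* Classify a call to the highest-priority version of a multiversioned
   function.  CALLER_ISA and CALLEE_ISA are the feature sets enabled for
   the caller and for that version.  */
static enum call_kind
classify_call (unsigned caller_isa, unsigned callee_isa, int has_default)
{
  /* Assumed stand-in for can_inline_p: the callee needs no feature
     that the caller lacks, so a direct call is safe.  */
  if ((callee_isa & ~caller_isa) == 0)
    return CALL_DIRECT;

  /* Otherwise the call goes through the dispatcher, which exists only
     when a default version is present.  */
  return has_default ? CALL_VIA_DISPATCHER : CALL_ERROR;
}
```

In mv3.C, bar is compiled for popcnt and the winning foo is the popcnt
version, so the model yields CALL_DIRECT even though no default exists.
Calling foo from main in the same file would yield CALL_ERROR.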
Index: gcc/cp/class.c
===================================================================
--- gcc/cp/class.c	(revision 192623)
+++ gcc/cp/class.c	(working copy)
@@ -1087,6 +1087,31 @@ add_method (tree type, tree method, tree using_dec
 	      || same_type_p (TREE_TYPE (fn_type),
 			      TREE_TYPE (method_type))))
 	{
+	  /* For function versions, their parms and types match
+	     but they are not duplicates.  Record function versions
+	     as and when they are found.  */
+	  if (TREE_CODE (fn) == FUNCTION_DECL
+	      && TREE_CODE (method) == FUNCTION_DECL
+	      && (DECL_FUNCTION_SPECIFIC_TARGET (fn)
+		  || DECL_FUNCTION_SPECIFIC_TARGET (method))
+	      && targetm.target_option.function_versions (fn, method))
+ 	    {
+	      /* Mark functions as versions if necessary.  Modify the mangled
+		 decl name if necessary.  */
+	      if (!DECL_FUNCTION_VERSIONED (fn))
+		{
+		  DECL_FUNCTION_VERSIONED (fn) = 1;
+		  if (DECL_ASSEMBLER_NAME_SET_P (fn))
+		    mangle_decl (fn);
+		}
+	      if (!DECL_FUNCTION_VERSIONED (method))
+		{
+		  DECL_FUNCTION_VERSIONED (method) = 1;
+		  if (DECL_ASSEMBLER_NAME_SET_P (method))
+		    mangle_decl (method);
+		}
+	      continue;
+	    }
 	  if (DECL_INHERITED_CTOR_BASE (method))
 	    {
 	      if (DECL_INHERITED_CTOR_BASE (fn))
@@ -6995,6 +7020,7 @@ resolve_address_of_overloaded_function (tree targe
   tree matches = NULL_TREE;
   tree fn;
   tree target_fn_type;
+  VEC (tree, heap) *fn_ver_vec = NULL;
 
   /* By the time we get here, we should be seeing only real
      pointer-to-member types, not the internal POINTER_TYPE to
@@ -7059,9 +7085,19 @@ resolve_address_of_overloaded_function (tree targe
 	  if (DECL_ANTICIPATED (fn))
 	    continue;
 
-	  /* See if there's a match.  */
+	  /* See if there's a match.   For functions that are multi-versioned,
+	     all the versions match.  */
 	  if (same_type_p (target_fn_type, static_fn_type (fn)))
-	    matches = tree_cons (fn, NULL_TREE, matches);
+	    {
+	      matches = tree_cons (fn, NULL_TREE, matches);
+	      /* If versioned, push all possible versions into a vector.  */
+	      if (DECL_FUNCTION_VERSIONED (fn))
+		{
+		  if (fn_ver_vec == NULL)
+		    fn_ver_vec = VEC_alloc (tree, heap, 2);
+		  VEC_safe_push (tree, heap, fn_ver_vec, fn);
+		}
+	    }
 	}
     }
 
@@ -7149,13 +7185,26 @@ resolve_address_of_overloaded_function (tree targe
     {
       /* There were too many matches.  First check if they're all
 	 the same function.  */
-      tree match;
+      tree match = NULL_TREE;
 
       fn = TREE_PURPOSE (matches);
-      for (match = TREE_CHAIN (matches); match; match = TREE_CHAIN (match))
-	if (!decls_match (fn, TREE_PURPOSE (match)))
-	  break;
 
+      /* For multi-versioned functions, more than one match is just fine.
+	 Call decls_match to make sure they are different because they are
+	 versioned.  */
+      if (DECL_FUNCTION_VERSIONED (fn))
+	{
+          for (match = TREE_CHAIN (matches); match; match = TREE_CHAIN (match))
+  	    if (decls_match (fn, TREE_PURPOSE (match)))
+	      break;
+	}
+      else
+	{
+          for (match = TREE_CHAIN (matches); match; match = TREE_CHAIN (match))
+  	    if (!decls_match (fn, TREE_PURPOSE (match)))
+	      break;
+	}
+
       if (match)
 	{
 	  if (flags & tf_error)
@@ -7217,6 +7266,28 @@ resolve_address_of_overloaded_function (tree targe
       perform_or_defer_access_check (access_path, fn, fn, flags);
     }
 
+  /* If a pointer to a function that is multi-versioned is requested, the
+     pointer to the dispatcher function is returned instead.  This works
+     well because indirectly calling the function will dispatch the right
+     function version at run-time.  */
+  if (DECL_FUNCTION_VERSIONED (fn))
+    {
+      tree dispatcher_decl = NULL;
+      gcc_assert (fn_ver_vec != NULL);
+      gcc_assert (targetm.get_function_versions_dispatcher);
+      dispatcher_decl = targetm.get_function_versions_dispatcher (fn_ver_vec);
+      if (!dispatcher_decl)
+	{
+	  error_at (input_location, "Pointer to a multiversioned function"
+		    " without a default is not allowed");
+	  return error_mark_node;
+	}
+      retrofit_lang_decl (dispatcher_decl);
+      mark_used (fn);
+      VEC_free (tree, heap, fn_ver_vec);
+      fn = dispatcher_decl;
+    }
+
   if (TYPE_PTRFN_P (target_type) || TYPE_PTRMEMFUNC_P (target_type))
     return cp_build_addr_expr (fn, flags);
   else
Index: gcc/cp/decl.c
===================================================================
--- gcc/cp/decl.c	(revision 192623)
+++ gcc/cp/decl.c	(working copy)
@@ -53,6 +53,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "pointer-set.h"
 #include "splay-tree.h"
 #include "plugin.h"
+#include "cgraph.h"
 
 /* Possible cases of bad specifiers type used by bad_specifiers. */
 enum bad_spec_place {
@@ -981,6 +982,29 @@ decls_match (tree newdecl, tree olddecl)
       if (t1 != t2)
 	return 0;
 
+      /* The decls don't match if they correspond to two different versions
+	 of the same function.  */
+      if (compparms (p1, p2)
+	  && same_type_p (TREE_TYPE (f1), TREE_TYPE (f2)) 
+	  && targetm.target_option.function_versions (newdecl, olddecl))
+	{
+	  /* Mark functions as versions if necessary.  Modify the mangled decl
+	     name if necessary.  */
+	  if (!DECL_FUNCTION_VERSIONED (newdecl))
+	    {
+	      DECL_FUNCTION_VERSIONED (newdecl) = 1;
+	      if (DECL_ASSEMBLER_NAME_SET_P (newdecl))
+	        mangle_decl (newdecl);
+	    }
+	  if (!DECL_FUNCTION_VERSIONED (olddecl))
+	    {
+	      DECL_FUNCTION_VERSIONED (olddecl) = 1;
+	      if (DECL_ASSEMBLER_NAME_SET_P (olddecl))
+	       mangle_decl (olddecl);
+	    }
+	  return 0;
+	}
+
       if (CP_DECL_CONTEXT (newdecl) != CP_DECL_CONTEXT (olddecl)
 	  && ! (DECL_EXTERN_C_P (newdecl)
 		&& DECL_EXTERN_C_P (olddecl)))
@@ -1499,7 +1523,11 @@ duplicate_decls (tree newdecl, tree olddecl, bool
 	      error ("previous declaration %q+#D here", olddecl);
 	      return NULL_TREE;
 	    }
-	  else if (compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)),
+	  /* For function versions, params and types match, but they
+	     are not ambiguous.  */
+	  else if ((!DECL_FUNCTION_VERSIONED (newdecl)
+		    && !DECL_FUNCTION_VERSIONED (olddecl))
+		   && compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)),
 			      TYPE_ARG_TYPES (TREE_TYPE (olddecl))))
 	    {
 	      error ("new declaration %q#D", newdecl);
@@ -2272,6 +2300,11 @@ duplicate_decls (tree newdecl, tree olddecl, bool
   else if (DECL_PRESERVE_P (newdecl))
     DECL_PRESERVE_P (olddecl) = 1;
 
+  /* If the olddecl is a version, so is the newdecl.  */
+  if (TREE_CODE (newdecl) == FUNCTION_DECL
+      && DECL_FUNCTION_VERSIONED (olddecl))
+    DECL_FUNCTION_VERSIONED (newdecl) = 1;
+
   if (TREE_CODE (newdecl) == FUNCTION_DECL)
     {
       int function_size;
@@ -14227,7 +14260,11 @@ cxx_comdat_group (tree decl)
 	  else
 	    break;
 	}
-      name = DECL_ASSEMBLER_NAME (decl);
+      if (TREE_CODE (decl) == FUNCTION_DECL
+	  && DECL_FUNCTION_VERSIONED (decl))
+	name = DECL_NAME (decl);
+      else
+        name = DECL_ASSEMBLER_NAME (decl);
     }
 
   return name;
Index: gcc/cp/error.c
===================================================================
--- gcc/cp/error.c	(revision 192623)
+++ gcc/cp/error.c	(working copy)
@@ -1541,8 +1541,16 @@ dump_exception_spec (tree t, int flags)
 static void
 dump_function_name (tree t, int flags)
 {
-  tree name = DECL_NAME (t);
+  tree name;
 
+  /* For function versions, use the assembler name as the decl name is
+     the same for all versions.  */
+  if (TREE_CODE (t) == FUNCTION_DECL
+      && DECL_FUNCTION_VERSIONED (t))
+    name = DECL_ASSEMBLER_NAME (t);
+  else
+    name = DECL_NAME (t);
+
   /* We can get here with a decl that was synthesized by language-
      independent machinery (e.g. coverage.c) in which case it won't
      have a lang_specific structure attached and DECL_CONSTRUCTOR_P
Index: gcc/cp/semantics.c
===================================================================
--- gcc/cp/semantics.c	(revision 192623)
+++ gcc/cp/semantics.c	(working copy)
@@ -3813,8 +3813,11 @@ expand_or_defer_fn_1 (tree fn)
       /* If the user wants us to keep all inline functions, then mark
 	 this function as needed so that finish_file will make sure to
 	 output it later.  Similarly, all dllexport'd functions must
-	 be emitted; there may be callers in other DLLs.  */
-      if ((flag_keep_inline_functions
+	 be emitted; there may be callers in other DLLs.
+	 Also, mark this function as needed if it is marked inline but
+	 is a multi-versioned function.  */
+      if (((flag_keep_inline_functions
+	    || DECL_FUNCTION_VERSIONED (fn))
 	   && DECL_DECLARED_INLINE_P (fn)
 	   && !DECL_REALLY_EXTERN (fn))
 	  || (flag_keep_inline_dllexport
Index: gcc/cp/decl2.c
===================================================================
--- gcc/cp/decl2.c	(revision 192623)
+++ gcc/cp/decl2.c	(working copy)
@@ -674,9 +674,13 @@ check_classfn (tree ctype, tree function, tree tem
 	  if (is_template != (TREE_CODE (fndecl) == TEMPLATE_DECL))
 	    continue;
 
+	  /* While finding a match, same types and params are not enough
+	     if the function is versioned.  Also check version ("target")
+	     attributes.  */
 	  if (same_type_p (TREE_TYPE (TREE_TYPE (function)),
 			   TREE_TYPE (TREE_TYPE (fndecl)))
 	      && compparms (p1, p2)
+	      && !targetm.target_option.function_versions (function, fndecl)
 	      && (!is_template
 		  || comp_template_parms (template_parms,
 					  DECL_TEMPLATE_PARMS (fndecl)))
Index: gcc/cp/call.c
===================================================================
--- gcc/cp/call.c	(revision 192623)
+++ gcc/cp/call.c	(working copy)
@@ -40,6 +40,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "langhooks.h"
 #include "c-family/c-objc.h"
 #include "timevar.h"
+#include "cgraph.h"
 
 /* The various kinds of conversion.  */
 
@@ -6444,6 +6445,35 @@ magic_varargs_p (tree fn)
   return false;
 }
 
+/* Returns the decl of the dispatcher function if FN is a function version.  */
+
+static tree
+get_function_version_dispatcher (tree fn)
+{
+  tree dispatcher_decl = NULL;
+  struct cgraph_node *node = NULL;
+
+  gcc_assert (TREE_CODE (fn) == FUNCTION_DECL
+	      && DECL_FUNCTION_VERSIONED (fn));
+
+  node = cgraph_get_node (fn);
+
+  if (node != NULL && node->function_version != NULL)
+    dispatcher_decl = node->function_version->dispatcher_resolver;
+  else
+    return NULL;
+
+  if (dispatcher_decl == NULL)
+    {
+      error_at (input_location, "Call to multiversioned function"
+                " without a default is not allowed");
+      return NULL;
+    }
+  retrofit_lang_decl (dispatcher_decl);
+  gcc_assert (dispatcher_decl != NULL);
+  return dispatcher_decl;
+}
+
 /* Subroutine of the various build_*_call functions.  Overload resolution
    has chosen a winning candidate CAND; build up a CALL_EXPR accordingly.
    ARGS is a TREE_LIST of the unconverted arguments to the call.  FLAGS is a
@@ -6896,6 +6926,20 @@ build_over_call (struct z_candidate *cand, int fla
   if (!already_used)
     mark_used (fn);
 
+  /* For calls to a multi-versioned function, overload resolution
+     returns the function with the highest target priority, that is,
+     the version that will be checked for dispatching first.  If this
+     version is inlinable, a direct call to this version can be made;
+     otherwise the call should go through the dispatcher.  */
+
+  if (DECL_FUNCTION_VERSIONED (fn)
+      && !targetm.target_option.can_inline_p (current_function_decl, fn))
+    {
+      fn = get_function_version_dispatcher (fn);
+      if (fn == NULL)
+	return NULL;
+    }
+
   if (DECL_VINDEX (fn) && (flags & LOOKUP_NONVIRTUAL) == 0)
     {
       tree t;
@@ -8176,6 +8220,38 @@ joust (struct z_candidate *cand1, struct z_candida
       && (IS_TYPE_OR_DECL_P (cand1->fn)))
     return 1;
 
+  /* For candidates of a multi-versioned function, make the version with
+     the highest priority win.  This version will be checked for dispatching
+     first.  If this version can be inlined into the caller, the front-end
+     will simply make a direct call to this function.  */
+
+  if (TREE_CODE (cand1->fn) == FUNCTION_DECL
+      && DECL_FUNCTION_VERSIONED (cand1->fn)
+      && TREE_CODE (cand2->fn) == FUNCTION_DECL
+      && DECL_FUNCTION_VERSIONED (cand2->fn))
+    {
+      tree f1 = TREE_TYPE (cand1->fn);
+      tree f2 = TREE_TYPE (cand2->fn);
+      tree p1 = TYPE_ARG_TYPES (f1);
+      tree p2 = TYPE_ARG_TYPES (f2);
+     
+      /* Check if cand1->fn and cand2->fn are versions of the same function.  It
+         is possible that cand1->fn and cand2->fn are function versions but of
+         different functions.  Check types to see if they are versions of the same
+         function.  */
+      if (compparms (p1, p2)
+	  && same_type_p (TREE_TYPE (f1), TREE_TYPE (f2)))
+	{
+	  /* Always make the version with the higher priority, more
+	     specialized, win.  */
+	  gcc_assert (targetm.compare_version_priority);
+	  if (targetm.compare_version_priority (cand1->fn, cand2->fn) >= 0)
+	    return 1;
+	  else
+	    return -1;
+	}
+    }
+
   /* a viable function F1
      is defined to be a better function than another viable function F2  if
      for  all arguments i, ICSi(F1) is not a worse conversion sequence than
@@ -8496,6 +8572,37 @@ tweak:
   return 0;
 }
 
+/* Function FN is multi-versioned and CANDIDATES contains the list of all
+   overloaded candidates for FN.  This function extracts all functions from
+   CANDIDATES that are function versions of FN and generates a dispatcher
+   function for this multi-versioned function group.  */
+
+static void
+generate_function_versions_dispatcher (tree fn, struct z_candidate *candidates)
+{
+  tree f1 = TREE_TYPE (fn);
+  tree p1 = TYPE_ARG_TYPES (f1);
+  VEC (tree, heap) *fn_ver_vec = NULL;
+  struct z_candidate *ver = candidates;
+
+  fn_ver_vec = VEC_alloc (tree, heap, 2);
+
+  for (;ver; ver = ver->next)
+    {
+      tree f2 = TREE_TYPE (ver->fn);
+      tree p2 = TYPE_ARG_TYPES (f2);
+      /* If this candidate is a version of FN, types must match.  */
+      if (DECL_FUNCTION_VERSIONED (ver->fn)
+          && compparms (p1, p2)
+	  && same_type_p (TREE_TYPE (f1), TREE_TYPE (f2)))
+        VEC_safe_push (tree, heap, fn_ver_vec, ver->fn);
+    }
+
+  gcc_assert (targetm.get_function_versions_dispatcher);
+  targetm.get_function_versions_dispatcher (fn_ver_vec);
+  VEC_free (tree, heap, fn_ver_vec); 
+}
+
 /* Given a list of candidates for overloading, find the best one, if any.
    This algorithm has a worst case of O(2n) (winner is last), and a best
    case of O(n/2) (totally ambiguous); much better than a sorting
@@ -8548,6 +8655,22 @@ tourney (struct z_candidate *candidates, tsubst_fl
 	return NULL;
     }
 
+  /* For multiversioned functions, aggregate all the versions here for
+     generating the dispatcher body later if necessary.  Check to see if
+     the dispatcher is already generated to avoid doing this more than
+     once.  */
+
+  if (TREE_CODE (champ->fn) == FUNCTION_DECL
+      && DECL_FUNCTION_VERSIONED (champ->fn))
+    {
+      struct cgraph_node *champ_node = cgraph_get_node (champ->fn);
+      if (champ_node == NULL
+	  || champ_node->function_version == NULL
+	  || champ_node->function_version->dispatcher_resolver == NULL)
+        generate_function_versions_dispatcher (champ->fn, candidates);
+
+    }
+
   return champ;
 }
 
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 192623)
+++ gcc/config/i386/i386.c	(working copy)
@@ -62,6 +62,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "opts.h"
 #include "diagnostic.h"
 #include "dumpfile.h"
+#include "tree-pass.h"
+#include "tree-flow.h"
 
 enum upper_128bits_state
 {
@@ -28413,6 +28415,987 @@ ix86_init_mmx_sse_builtins (void)
     }
 }
 
+
+/* This adds a condition to the basic_block NEW_BB in function FUNCTION_DECL
+   to return a pointer to VERSION_DECL if the outcome of the expression
+   formed by PREDICATE_CHAIN is true.  This function will be called during
+   version dispatch to decide which function version to execute.  It returns
+   the basic block at the end, to which more conditions can be added.  */
+
+static basic_block
+add_condition_to_bb (tree function_decl, tree version_decl,
+		     tree predicate_chain, basic_block new_bb)
+{
+  gimple return_stmt;
+  tree convert_expr, result_var;
+  gimple convert_stmt;
+  gimple call_cond_stmt;
+  gimple if_else_stmt;
+
+  basic_block bb1, bb2, bb3;
+  edge e12, e23;
+
+  tree cond_var, and_expr_var = NULL_TREE;
+  gimple_seq gseq;
+
+  tree predicate_decl, predicate_arg;
+
+  push_cfun (DECL_STRUCT_FUNCTION (function_decl));
+
+  gcc_assert (new_bb != NULL);
+  gseq = bb_seq (new_bb);
+
+
+  convert_expr = build1 (CONVERT_EXPR, ptr_type_node,
+	     		 build_fold_addr_expr (version_decl));
+  result_var = create_tmp_var (ptr_type_node, NULL);
+  convert_stmt = gimple_build_assign (result_var, convert_expr); 
+  return_stmt = gimple_build_return (result_var);
+
+  if (predicate_chain == NULL_TREE)
+    {
+      gimple_seq_add_stmt (&gseq, convert_stmt);
+      gimple_seq_add_stmt (&gseq, return_stmt);
+      set_bb_seq (new_bb, gseq);
+      gimple_set_bb (convert_stmt, new_bb);
+      gimple_set_bb (return_stmt, new_bb);
+      pop_cfun ();
+      return new_bb;
+    }
+
+  while (predicate_chain != NULL)
+    {
+      cond_var = create_tmp_var (integer_type_node, NULL);
+      predicate_decl = TREE_PURPOSE (predicate_chain);
+      predicate_arg = TREE_VALUE (predicate_chain);
+      call_cond_stmt = gimple_build_call (predicate_decl, 1, predicate_arg);
+      gimple_call_set_lhs (call_cond_stmt, cond_var);
+
+      gimple_set_block (call_cond_stmt, DECL_INITIAL (function_decl));
+      gimple_set_bb (call_cond_stmt, new_bb);
+      gimple_seq_add_stmt (&gseq, call_cond_stmt);
+
+      predicate_chain = TREE_CHAIN (predicate_chain);
+      
+      if (and_expr_var == NULL)
+        and_expr_var = cond_var;
+      else
+	{
+	  gimple assign_stmt;
+	  /* Use MIN_EXPR to check if any integer is zero.
+	     and_expr_var = min_expr <cond_var, and_expr_var>  */
+	  assign_stmt = gimple_build_assign (and_expr_var,
+			  build2 (MIN_EXPR, integer_type_node,
+				  cond_var, and_expr_var));
+
+	  gimple_set_block (assign_stmt, DECL_INITIAL (function_decl));
+	  gimple_set_bb (assign_stmt, new_bb);
+	  gimple_seq_add_stmt (&gseq, assign_stmt);
+	}
+    }
+
+  if_else_stmt = gimple_build_cond (GT_EXPR, and_expr_var,
+	  		            integer_zero_node,
+				    NULL_TREE, NULL_TREE);
+  gimple_set_block (if_else_stmt, DECL_INITIAL (function_decl));
+  gimple_set_bb (if_else_stmt, new_bb);
+  gimple_seq_add_stmt (&gseq, if_else_stmt);
+
+  gimple_seq_add_stmt (&gseq, convert_stmt);
+  gimple_seq_add_stmt (&gseq, return_stmt);
+  set_bb_seq (new_bb, gseq);
+
+  bb1 = new_bb;
+  e12 = split_block (bb1, if_else_stmt);
+  bb2 = e12->dest;
+  e12->flags &= ~EDGE_FALLTHRU;
+  e12->flags |= EDGE_TRUE_VALUE;
+
+  e23 = split_block (bb2, return_stmt);
+
+  gimple_set_bb (convert_stmt, bb2);
+  gimple_set_bb (return_stmt, bb2);
+
+  bb3 = e23->dest;
+  make_edge (bb1, bb3, EDGE_FALSE_VALUE); 
+
+  remove_edge (e23);
+  make_edge (bb2, EXIT_BLOCK_PTR, 0);
+
+  pop_cfun ();
+
+  return bb3;
+}
+
+/* This parses the attribute arguments to target in DECL and determines
+   the right builtin to use to match the platform specification.
+   It returns the priority value for this version decl.  If PREDICATE_LIST
+   is not NULL, it stores the list of cpu features that need to be checked
+   before dispatching this function.  */
+
+static unsigned int
+get_builtin_code_for_version (tree decl, tree *predicate_list)
+{
+  tree attrs;
+  struct cl_target_option cur_target;
+  tree target_node;
+  struct cl_target_option *new_target;
+  const char *arg_str = NULL;
+  const char *attrs_str = NULL;
+  char *tok_str = NULL;
+  char *token;
+
+  /* Priority of i386 features; a greater value is a higher priority.  This is
+     used to decide the order in which function dispatch must happen.  For
+     instance, a version specialized for SSE4.2 should be checked for dispatch
+     before a version for SSE3, as SSE4.2 implies SSE3.  */
+  enum feature_priority
+  {
+    P_ZERO = 0,
+    P_MMX,
+    P_SSE,
+    P_SSE2,
+    P_SSE3,
+    P_SSSE3,
+    P_PROC_SSSE3,
+    P_SSE4_a,
+    P_PROC_SSE4_a,
+    P_SSE4_1,
+    P_SSE4_2,
+    P_PROC_SSE4_2,
+    P_POPCNT,
+    P_AVX,
+    P_AVX2,
+    P_FMA,
+    P_PROC_FMA
+  };
+
+ enum feature_priority priority = P_ZERO;
+
+  /* These are the target attribute strings for which a dispatcher is
+     available, from fold_builtin_cpu.  */
+
+  static struct _feature_list
+    {
+      const char *const name;
+      const enum feature_priority priority;
+    }
+  const feature_list[] =
+    {
+      {"mmx", P_MMX},
+      {"sse", P_SSE},
+      {"sse2", P_SSE2},
+      {"sse3", P_SSE3},
+      {"ssse3", P_SSSE3},
+      {"sse4.1", P_SSE4_1},
+      {"sse4.2", P_SSE4_2},
+      {"popcnt", P_POPCNT},
+      {"avx", P_AVX},
+      {"avx2", P_AVX2}
+    };
+
+
+  static unsigned int NUM_FEATURES
+    = sizeof (feature_list) / sizeof (struct _feature_list);
+
+  unsigned int i;
+
+  tree predicate_chain = NULL_TREE;
+  tree predicate_decl, predicate_arg;
+
+  attrs = lookup_attribute ("target", DECL_ATTRIBUTES (decl));
+  gcc_assert (attrs != NULL);
+
+  attrs = TREE_VALUE (TREE_VALUE (attrs));
+
+  gcc_assert (TREE_CODE (attrs) == STRING_CST);
+  attrs_str = TREE_STRING_POINTER (attrs);
+
+
+  /* Handle arch= if specified.  For priority, set it to be 1 more than
+     the best instruction set the processor can handle.  For instance, if
+     there is a version for atom and a version for ssse3 (the highest ISA
+     priority for atom), the atom version must be checked for dispatch
+     before the ssse3 version. */
+  if (strstr (attrs_str, "arch=") != NULL)
+    {
+      cl_target_option_save (&cur_target, &global_options);
+      target_node = ix86_valid_target_attribute_tree (attrs);
+    
+      gcc_assert (target_node);
+      new_target = TREE_TARGET_OPTION (target_node);
+      gcc_assert (new_target);
+      
+      if (new_target->arch_specified && new_target->arch > 0)
+	{
+	  switch (new_target->arch)
+	    {
+	    case PROCESSOR_CORE2_32:
+	    case PROCESSOR_CORE2_64:
+	      arg_str = "core2";
+	      priority = P_PROC_SSSE3;
+	      break;
+	    case PROCESSOR_COREI7_32:
+	    case PROCESSOR_COREI7_64:
+	      arg_str = "corei7";
+	      priority = P_PROC_SSE4_2;
+	      break;
+	    case PROCESSOR_ATOM:
+	      arg_str = "atom";
+	      priority = P_PROC_SSSE3;
+	      break;
+	    case PROCESSOR_AMDFAM10:
+	      arg_str = "amdfam10h";
+	      priority = P_PROC_SSE4_a;
+	      break;
+	    case PROCESSOR_BDVER1:
+	      arg_str = "bdver1";
+	      priority = P_PROC_FMA;
+	      break;
+	    case PROCESSOR_BDVER2:
+	      arg_str = "bdver2";
+	      priority = P_PROC_FMA;
+	      break;
+	    }  
+	}    
+    
+      cl_target_option_restore (&global_options, &cur_target);
+	
+      if (predicate_list && arg_str == NULL)
+	{
+	  error_at (DECL_SOURCE_LOCATION (decl),
+	    	"No dispatcher found for the versioning attributes");
+	  return 0;
+	}
+    
+      if (predicate_list)
+	{
+          predicate_decl = ix86_builtins [(int) IX86_BUILTIN_CPU_IS];
+          /* For a C string literal the length includes the trailing NUL.  */
+          predicate_arg = build_string_literal (strlen (arg_str) + 1, arg_str);
+          predicate_chain = tree_cons (predicate_decl, predicate_arg,
+				       predicate_chain);
+	}
+    }
+
+  /* Process feature name.  */
+  tok_str =  (char *) xmalloc (strlen (attrs_str) + 1);
+  strcpy (tok_str, attrs_str);
+  token = strtok (tok_str, ",");
+  predicate_decl = ix86_builtins [(int) IX86_BUILTIN_CPU_SUPPORTS];
+
+  while (token != NULL)
+    {
+      /* Do not process "arch="  */
+      if (strncmp (token, "arch=", 5) == 0)
+	{
+	  token = strtok (NULL, ",");
+	  continue;
+	}
+      for (i = 0; i < NUM_FEATURES; ++i)
+	{
+	  if (strcmp (token, feature_list[i].name) == 0)
+	    {
+	      if (predicate_list)
+		{
+		  predicate_arg = build_string_literal (
+				  strlen (feature_list[i].name) + 1,
+				  feature_list[i].name);
+		  predicate_chain = tree_cons (predicate_decl, predicate_arg,
+					       predicate_chain);
+		}
+	      /* Find the maximum priority feature.  */
+	      if (feature_list[i].priority > priority)
+		priority = feature_list[i].priority;
+
+	      break;
+	    }
+	}
+      if (predicate_list && i == NUM_FEATURES)
+	{
+	  error_at (DECL_SOURCE_LOCATION (decl),
+		    "No dispatcher found for %s", token);
+	  return 0;
+	}
+      token = strtok (NULL, ",");
+    }
+  free (tok_str);
+
+  if (predicate_list && predicate_chain == NULL_TREE)
+    {
+      error_at (DECL_SOURCE_LOCATION (decl),
+	        "No dispatcher found for the versioning attributes : %s",
+	        attrs_str);
+      return 0;
+    }
+  else if (predicate_list)
+    {
+      predicate_chain = nreverse (predicate_chain);
+      *predicate_list = predicate_chain;
+    }
+
+  return priority; 
+}
+
+/* This compares the priority of target features in function DECL1
+   and DECL2.  It returns positive value if DECL1 is higher priority,
+   negative value if DECL2 is higher priority and 0 if they are the
+   same.  */
+
+static int
+ix86_compare_version_priority (tree decl1, tree decl2)
+{
+  unsigned int priority1 = 0;
+  unsigned int priority2 = 0;
+
+  if (lookup_attribute ("target", DECL_ATTRIBUTES (decl1)) != NULL)
+    priority1 = get_builtin_code_for_version (decl1, NULL);
+
+  if (lookup_attribute ("target", DECL_ATTRIBUTES (decl2)) != NULL)
+    priority2 = get_builtin_code_for_version (decl2, NULL);
+
+  return (int)priority1 - (int)priority2;
+}
+
+/* V1 and V2 point to function versions with different priorities
+   based on the target ISA.  This function compares their priorities.  */
+ 
+static int
+feature_compare (const void *v1, const void *v2)
+{
+  typedef struct _function_version_info
+    {
+      tree version_decl;
+      tree predicate_chain;
+      unsigned int dispatch_priority;
+    } function_version_info;
+
+  const function_version_info c1 = *(const function_version_info *)v1;
+  const function_version_info c2 = *(const function_version_info *)v2;
+  return (c2.dispatch_priority - c1.dispatch_priority);
+}
+
+/* This function generates the dispatch function for
+   multi-versioned functions.  DISPATCH_DECL is the function which will
+   contain the dispatch logic.  FNDECLS are the function choices for
+   dispatch, and is a tree chain.  EMPTY_BB is the basic block pointer
+   in DISPATCH_DECL in which the dispatch code is generated.  */
+
+static int
+dispatch_function_versions (tree dispatch_decl,
+			    void *fndecls_p,
+			    basic_block *empty_bb)
+{
+  tree default_decl;
+  gimple ifunc_cpu_init_stmt;
+  gimple_seq gseq;
+  int ix;
+  tree ele;
+  VEC (tree, heap) *fndecls;
+  unsigned int num_versions = 0;
+  unsigned int actual_versions = 0;
+  unsigned int i;
+
+  struct _function_version_info
+    {
+      tree version_decl;
+      tree predicate_chain;
+      unsigned int dispatch_priority;
+    }*function_version_info;
+
+  gcc_assert (dispatch_decl != NULL
+	      && fndecls_p != NULL
+	      && empty_bb != NULL);
+
+  /*fndecls_p is actually a vector.  */
+  fndecls = (VEC (tree, heap) *)fndecls_p;
+
+  /* At least one more version other than the default.  */
+  num_versions = VEC_length (tree, fndecls);
+  gcc_assert (num_versions >= 2);
+
+  function_version_info = (struct _function_version_info *)
+    XNEWVEC (struct _function_version_info, (num_versions - 1));
+
+  /* The first version in the vector is the default decl.  */
+  default_decl = VEC_index (tree, fndecls, 0);
+
+  push_cfun (DECL_STRUCT_FUNCTION (dispatch_decl));
+
+  gseq = bb_seq (*empty_bb);
+  /* Function version dispatch is via IFUNC.  IFUNC resolvers fire before
+     constructors, so explicitly call __builtin_cpu_init here.  */
+  ifunc_cpu_init_stmt = gimple_build_call_vec (
+                     ix86_builtins [(int) IX86_BUILTIN_CPU_INIT], NULL);
+  gimple_seq_add_stmt (&gseq, ifunc_cpu_init_stmt);
+  gimple_set_bb (ifunc_cpu_init_stmt, *empty_bb);
+  set_bb_seq (*empty_bb, gseq);
+
+  pop_cfun ();
+
+
+  for (ix = 1; VEC_iterate (tree, fndecls, ix, ele); ++ix)
+    {
+      tree version_decl = ele;
+      tree predicate_chain = NULL_TREE;
+      unsigned int priority;
+      /* Get attribute string, parse it and find the right predicate decl.
+         The predicate function could be a lengthy combination of many
+	 features, like arch-type and various isa-variants.  */
+      priority = get_builtin_code_for_version (version_decl,
+	 			               &predicate_chain);
+
+      if (predicate_chain == NULL_TREE)
+	continue;
+
+      actual_versions++;
+      function_version_info [ix - 1].version_decl = version_decl;
+      function_version_info [ix - 1].predicate_chain = predicate_chain;
+      function_version_info [ix - 1].dispatch_priority = priority;
+    }
+
+  /* Sort the versions according to descending order of dispatch priority.  The
+     priority is based on the ISA.  This is not a perfect solution.  There
+     could still be ambiguity.  If more than one function version is suitable
+     to execute,  which one should be dispatched?  In future, allow the user
+     to specify a dispatch  priority next to the version.  */
+  qsort (function_version_info, actual_versions,
+         sizeof (struct _function_version_info), feature_compare);
+
+  for  (i = 0; i < actual_versions; ++i)
+    *empty_bb = add_condition_to_bb (dispatch_decl,
+				     function_version_info[i].version_decl,
+				     function_version_info[i].predicate_chain,
+				     *empty_bb);
+
+  /* dispatch default version at the end.  */
+  *empty_bb = add_condition_to_bb (dispatch_decl, default_decl,
+				   NULL, *empty_bb);
+
+  free (function_version_info);
+  return 0;
+}
+
+/* This function returns true if FN1 and FN2 are versions of the same function,
+   that is, the targets of the function decls are different.  This assumes
+   that FN1 and FN2 have the same signature.  */
+
+static bool
+ix86_function_versions (tree fn1, tree fn2)
+{
+  tree attr1, attr2;
+  struct cl_target_option *target1, *target2;
+
+  if (TREE_CODE (fn1) != FUNCTION_DECL
+      || TREE_CODE (fn2) != FUNCTION_DECL)
+    return false;
+
+  attr1 = DECL_FUNCTION_SPECIFIC_TARGET (fn1);
+  attr2 = DECL_FUNCTION_SPECIFIC_TARGET (fn2);
+
+  /* At least one function decl should have target attribute specified.  */
+  if (attr1 == NULL_TREE && attr2 == NULL_TREE)
+    return false;
+
+  if (attr1 == NULL_TREE)
+    attr1 = target_option_default_node;
+  else if (attr2 == NULL_TREE)
+    attr2 = target_option_default_node;
+
+  target1 = TREE_TARGET_OPTION (attr1);
+  target2 = TREE_TARGET_OPTION (attr2);
+
+  /* target1 and target2 must be different in some way.  */
+  if (target1->x_ix86_isa_flags == target2->x_ix86_isa_flags
+      && target1->x_target_flags == target2->x_target_flags
+      && target1->arch == target2->arch
+      && target1->tune == target2->tune
+      && target1->x_ix86_fpmath == target2->x_ix86_fpmath
+      && target1->branch_cost == target2->branch_cost)
+    return false;
+
+  return true;
+}
+
+/* Comparator function to be used in qsort routine to sort attribute
+   specification strings to "target".  */
+
+static int
+attr_strcmp (const void *v1, const void *v2)
+{
+  const char *c1 = *(char *const*)v1;
+  const char *c2 = *(char *const*)v2;
+  return strcmp (c1, c2);
+}
+
+/* STR is the argument to target attribute.  This function tokenizes
+   the comma separated arguments, sorts them and returns a string which
+   is a unique identifier for the comma separated arguments.   It also
+   replaces non-identifier characters "=,-" with "_".  */
+
+static char *
+sorted_attr_string (const char *str)
+{
+  char **args = NULL;
+  char *attr_str, *ret_str;
+  char *attr = NULL;
+  unsigned int argnum = 1;
+  unsigned int i;
+
+  for (i = 0; i < strlen (str); i++)
+    if (str[i] == ',')
+      argnum++;
+
+  attr_str = (char *)xmalloc (strlen (str) + 1);
+  strcpy (attr_str, str);
+
+  /* Replace "=,-" with "_".  */
+  for (i = 0; i < strlen (attr_str); i++)
+    if (attr_str[i] == '=' || attr_str[i]== '-')
+      attr_str[i] = '_';
+
+  if (argnum == 1)
+    return attr_str;
+
+  args = XNEWVEC (char *, argnum);
+
+  i = 0;
+  attr = strtok (attr_str, ",");
+  while (attr != NULL)
+    {
+      args[i] = attr;
+      i++;
+      attr = strtok (NULL, ",");
+    }
+
+  qsort (args, argnum, sizeof (char*), attr_strcmp);
+
+  ret_str = (char *)xmalloc (strlen (str) + 1);
+  strcpy (ret_str, args[0]);
+  for (i = 1; i < argnum; i++)
+    {
+      strcat (ret_str, "_");
+      strcat (ret_str, args[i]);
+    }
+
+  free (args);
+  free (attr_str);
+  return ret_str;
+}
+
+/* This function changes the assembler name for functions that are
+   versions.  If DECL is a function version and has a "target"
+   attribute, it appends the attribute string to its assembler name.  */
+
+static tree
+ix86_mangle_function_version_assembler_name (tree decl, tree id)
+{
+  tree version_attr;
+  const char *orig_name, *version_string, *attr_str;
+  char *assembler_name;
+
+  if (DECL_DECLARED_INLINE_P (decl)
+      && lookup_attribute ("gnu_inline",
+			   DECL_ATTRIBUTES (decl)))
+    error_at (DECL_SOURCE_LOCATION (decl),
+	      "Function versions cannot be marked as gnu_inline,"
+	      " bodies have to be generated");
+
+  if (DECL_VIRTUAL_P (decl)
+      || DECL_VINDEX (decl))
+    error_at (DECL_SOURCE_LOCATION (decl),
+	      "Virtual function versioning not supported\n");
+
+  version_attr = lookup_attribute ("target", DECL_ATTRIBUTES (decl));
+
+  /* target attribute string is NULL for default functions.  */
+  if (version_attr == NULL_TREE)
+    return id;
+
+  orig_name = IDENTIFIER_POINTER (id);
+  version_string
+    = TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (version_attr)));
+
+  attr_str = sorted_attr_string (version_string);
+  assembler_name = (char *) xmalloc (strlen (orig_name)
+				     + strlen (attr_str) + 2);
+
+  sprintf (assembler_name, "%s.%s", orig_name, attr_str);
+
+  /* Allow assembler name to be modified if already set.  */
+  if (DECL_ASSEMBLER_NAME_SET_P (decl))
+    SET_DECL_RTL (decl, NULL);
+
+  return get_identifier (assembler_name);
+}
+
+static tree 
+ix86_mangle_decl_assembler_name (tree decl, tree id)
+{
+  /* For function version, add the target suffix to the assembler name.  */
+  if (TREE_CODE (decl) == FUNCTION_DECL
+      && DECL_FUNCTION_VERSIONED (decl))
+    return ix86_mangle_function_version_assembler_name (decl, id);
+
+  return id;
+}
+
+/* Return a new name by appending SUFFIX to the DECL name.  If make_unique
+   is true, append the full path name of the source file.  */
+
+static char *
+make_name (tree decl, const char *suffix, bool make_unique)
+{
+  char *global_var_name;
+  int name_len;
+  const char *name;
+  const char *unique_name = NULL;
+
+  name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
+
+  /* Get a unique name that can be used globally without any chances
+     of collision at link time.  */
+  if (make_unique)
+    unique_name = IDENTIFIER_POINTER (get_file_function_name ("\0"));
+
+  name_len = strlen (name) + strlen (suffix) + 2;
+
+  if (make_unique)
+    name_len += strlen (unique_name) + 1;
+  global_var_name = XNEWVEC (char, name_len);
+
+  /* Use '.' to concatenate names as it is demangler friendly.  */
+  if (make_unique)
+      snprintf (global_var_name, name_len, "%s.%s.%s", name,
+		unique_name, suffix);
+  else
+      snprintf (global_var_name, name_len, "%s.%s", name, suffix);
+
+  return global_var_name;
+}
+
+/* Make a dispatcher declaration for the multi-versioned function DECL.
+   Calls to DECL function will be replaced with calls to the dispatcher
+   by the front-end.  Return the decl created.  */
+
+static tree
+make_dispatcher_decl (const tree decl)
+{
+  tree func_decl;
+  char *func_name, *resolver_name;
+  tree fn_type, func_type;
+  bool is_uniq = false;
+
+  if (TREE_PUBLIC (decl) == 0)
+    is_uniq = true;
+
+  func_name = make_name (decl, "ifunc", is_uniq);
+  resolver_name = make_name (decl, "resolver", is_uniq);
+  gcc_assert (resolver_name);
+
+  fn_type = TREE_TYPE (decl);
+  func_type = build_function_type (TREE_TYPE (fn_type),
+				   TYPE_ARG_TYPES (fn_type));
+  
+  func_decl = build_fn_decl (func_name, func_type);
+  TREE_USED (func_decl) = 1;
+  DECL_CONTEXT (func_decl) = NULL_TREE;
+  DECL_INITIAL (func_decl) = error_mark_node;
+  DECL_ARTIFICIAL (func_decl) = 1;
+  /* Mark this func as external, the resolver will flip it again if
+     it gets generated.  */
+  DECL_EXTERNAL (func_decl) = 1;
+  /* This will be of type IFUNC; IFUNCs have to be externally visible.  */
+  TREE_PUBLIC (func_decl) = 1;
+
+  return func_decl;  
+}
+
+/* Returns true if decl is multi-versioned and DECL is the default function,
+   that is it is not tagged with target specific optimization.  */
+
+static bool
+is_function_default_version (const tree decl)
+{
+  return (TREE_CODE (decl) == FUNCTION_DECL
+	  && DECL_FUNCTION_VERSIONED (decl)
+	  && DECL_FUNCTION_SPECIFIC_TARGET (decl) == NULL_TREE);
+}
+
+/* Make a dispatcher declaration for the multi-versioned function DECL.
+   Calls to DECL function will be replaced with calls to the dispatcher
+   by the front-end.  It also chains the cgraph nodes of all the
+   semantically identical versions in vector FN_VER_VEC_P.  Returns the
+   decl of the dispatcher function.  */
+
+static tree
+ix86_get_function_versions_dispatcher (void *fn_ver_vec_p)
+{
+  struct cgraph_node *node = NULL;
+  struct cgraph_node *default_node = NULL;
+  struct cgraph_node *dispatcher_node = NULL;
+  int ix;
+  tree ele;
+  tree dispatch_decl = NULL;
+  VEC (tree, heap) *fn_ver_vec = NULL;
+
+  fn_ver_vec = (VEC (tree,heap) *) fn_ver_vec_p;
+  gcc_assert (fn_ver_vec != NULL);
+
+  /* Find the default version.  */
+  for (ix = 0; VEC_iterate (tree, fn_ver_vec, ix, ele); ++ix)
+    {
+      if (is_function_default_version (ele))
+	{
+	  default_node = cgraph_get_create_node (ele);
+	  break;
+	}
+    }
+
+  /* If there is no default node, just return NULL.  */
+  if (!default_node)
+    return NULL;
+
+  /* If the dispatcher is already there, return it.  */
+  if (default_node->function_version
+      && default_node->function_version->dispatcher_resolver)
+    return default_node->function_version->dispatcher_resolver;
+
+  if (default_node->function_version == NULL)
+    default_node->function_version
+      = ggc_alloc_cleared_cgraph_function_version_info ();
+
+#if defined (ASM_OUTPUT_TYPE_DIRECTIVE) && HAVE_GNU_INDIRECT_FUNCTION
+  /* Right now, the dispatching is done via ifunc.  */
+  dispatch_decl = make_dispatcher_decl (default_node->symbol.decl); 
+#else
+  error_at (DECL_SOURCE_LOCATION (default_node->symbol.decl),
+	    "Multiversioning needs ifunc which is not supported "
+	    "in this configuration");
+#endif
+  default_node->function_version->dispatcher_resolver = dispatch_decl;
+  dispatcher_node = cgraph_get_create_node (dispatch_decl);
+  gcc_assert (dispatcher_node);
+  dispatcher_node->dispatcher_function = 1;
+  dispatcher_node->function_version
+    = ggc_alloc_cleared_cgraph_function_version_info ();
+  cgraph_mark_address_taken_node (default_node);
+
+  for (ix = 0; VEC_iterate (tree, fn_ver_vec, ix, ele); ++ix)
+    {
+      node = cgraph_get_create_node (ele);
+      gcc_assert (node != NULL && DECL_FUNCTION_VERSIONED (ele));
+
+      if (node == default_node)
+	continue;
+
+      if (node->function_version == NULL)
+	node->function_version
+	  = ggc_alloc_cleared_cgraph_function_version_info ();
+
+      gcc_assert (DECL_FUNCTION_SPECIFIC_TARGET (ele) != NULL_TREE);
+
+      if (dispatcher_node->function_version->next)
+ 	{
+	  struct cgraph_node *dispatcher_node_next
+	    = dispatcher_node->function_version->next;
+	  node->function_version->next = dispatcher_node_next;
+	  dispatcher_node_next->function_version->prev = node;
+	}
+
+      dispatcher_node->function_version->next = node;
+      node->function_version->prev = dispatcher_node;
+      node->function_version->dispatcher_resolver = dispatch_decl;
+    }
+
+  /* The default version should be the first node.  */
+  default_node->function_version->next = dispatcher_node->function_version->next;
+  (dispatcher_node->function_version->next)->function_version->prev
+     = default_node;
+  /* The dispatcher node should directly point to the default node.  */
+  dispatcher_node->function_version->next = default_node;
+  
+  return dispatch_decl;
+}
+
+/* Makes a function attribute of the form NAME(ARG_NAME) and chains
+   it to CHAIN.  */
+
+static tree
+make_attribute (const char *name, const char *arg_name, tree chain)
+{
+  tree attr_name;
+  tree attr_arg_name;
+  tree attr_args;
+  tree attr;
+
+  attr_name = get_identifier (name);
+  attr_arg_name = build_string (strlen (arg_name), arg_name);
+  attr_args = tree_cons (NULL_TREE, attr_arg_name, NULL_TREE);
+  attr = tree_cons (attr_name, attr_args, chain);
+  return attr;
+}
+
+/* Make the resolver function decl to dispatch the versions of
+   a multi-versioned function,  DEFAULT_DECL.  Create an
+   empty basic block in the resolver and store the pointer in
+   EMPTY_BB.  Return the decl of the resolver function.  */
+
+static tree
+make_resolver_func (const tree default_decl,
+		    const tree dispatch_decl,
+		    basic_block *empty_bb)
+{
+  char *resolver_name;
+  tree decl, type, decl_name, t;
+  basic_block new_bb;
+  bool is_uniq = false;
+
+  /* IFUNC's have to be globally visible.  So, if the default_decl is
+     not, then the name of the IFUNC should be made unique.  */
+  if (TREE_PUBLIC (default_decl) == 0)
+    is_uniq = true;
+
+  /* Append the filename to the resolver function if the versions are
+     not externally visible.  This is because the resolver function has
+     to be externally visible for the loader to find it.  So, appending
+     the filename will prevent conflicts with a resolver function from
+     another module which is based on the same version name.  */
+  resolver_name = make_name (default_decl, "resolver", is_uniq);
+
+  /* The resolver function should return a (void *). */
+  type = build_function_type_list (ptr_type_node, NULL_TREE);
+
+  decl = build_fn_decl (resolver_name, type);
+  decl_name = get_identifier (resolver_name);
+  SET_DECL_ASSEMBLER_NAME (decl, decl_name);
+
+  DECL_NAME (decl) = decl_name;
+  TREE_USED (decl) = 1;
+  DECL_ARTIFICIAL (decl) = 1;
+  DECL_IGNORED_P (decl) = 0;
+  /* IFUNC resolvers have to be externally visible.  */
+  TREE_PUBLIC (decl) = 1;
+  DECL_UNINLINABLE (decl) = 1;
+
+  DECL_EXTERNAL (decl) = 0;
+  DECL_EXTERNAL (dispatch_decl) = 0;
+
+  DECL_CONTEXT (decl) = NULL_TREE;
+  DECL_INITIAL (decl) = make_node (BLOCK);
+  DECL_STATIC_CONSTRUCTOR (decl) = 0;
+  TREE_READONLY (decl) = 0;
+  DECL_PURE_P (decl) = 0;
+
+  if (DECL_COMDAT_GROUP (default_decl))
+    {
+      DECL_COMDAT (decl) = DECL_COMDAT (default_decl);
+      make_decl_one_only (decl, DECL_COMDAT_GROUP (default_decl));
+    }
+  else if (TREE_PUBLIC (default_decl))
+    {
+      /* In this case, each translation unit with a call to this
+	 versioned function will put out a resolver.  Ensure it
+	 is comdat to keep just one copy.  */
+      DECL_COMDAT (decl) = 1;
+      make_decl_one_only (decl, DECL_ASSEMBLER_NAME (decl));
+    }
+  /* Build result decl and add to function_decl. */
+  t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, ptr_type_node);
+  DECL_ARTIFICIAL (t) = 1;
+  DECL_IGNORED_P (t) = 1;
+  DECL_RESULT (decl) = t;
+
+  gimplify_function_tree (decl);
+  push_cfun (DECL_STRUCT_FUNCTION (decl));
+  gimple_register_cfg_hooks ();
+  init_empty_tree_cfg_for_function (DECL_STRUCT_FUNCTION (decl));
+  cfun->curr_properties |=
+    (PROP_gimple_lcf | PROP_gimple_leh | PROP_cfg | PROP_ssa
+     | PROP_gimple_any);
+  cfun->curr_properties = 15;
+  new_bb = create_empty_bb (ENTRY_BLOCK_PTR);
+  make_edge (ENTRY_BLOCK_PTR, new_bb, EDGE_FALLTHRU);
+  make_edge (new_bb, EXIT_BLOCK_PTR, 0);
+  *empty_bb = new_bb;
+
+  cgraph_add_new_function (decl, true);
+  cgraph_call_function_insertion_hooks (cgraph_get_create_node (decl));
+
+  if (DECL_COMDAT_GROUP (default_decl))
+    {
+      gcc_assert (cgraph_get_node (default_decl));
+      symtab_add_to_same_comdat_group (
+	(symtab_node) cgraph_get_node (decl),
+	(symtab_node) cgraph_get_node (default_decl));
+    }
+
+  pop_cfun ();
+
+  gcc_assert (dispatch_decl != NULL);
+  /* Mark dispatch_decl as "ifunc" with resolver as resolver_name.  */
+  DECL_ATTRIBUTES (dispatch_decl) 
+    = make_attribute ("ifunc", resolver_name, DECL_ATTRIBUTES (dispatch_decl));
+
+  /* Create the alias for dispatch to resolver here.  */
+  cgraph_create_function_alias (dispatch_decl, decl);
+  return decl;
+}
+
+/* Generate the dispatching code body to dispatch multi-versioned function
+   DECL.  The target hook is called to process the "target" attributes and
+   provide the code to dispatch the right function at run-time.  NODE points
+   to the dispatcher decl whose body will be created.  */
+
+static tree 
+ix86_generate_version_dispatcher_body (void *node_p)
+{
+  tree resolver_decl;
+  basic_block empty_bb;
+  VEC (tree, heap) *fn_ver_vec = NULL;
+  tree default_ver_decl;
+  struct cgraph_node *versn;
+  struct cgraph_node *node;
+
+  node = (cgraph_node *)node_p;
+
+  gcc_assert (node->dispatcher_function
+	      && (node->function_version != NULL));
+
+  if (node->function_version->dispatcher_resolver)
+    return node->function_version->dispatcher_resolver;
+
+  default_ver_decl = (node->function_version->next)->symbol.decl;
+  resolver_decl = make_resolver_func (default_ver_decl,
+				      node->symbol.decl, &empty_bb);
+  node->function_version->dispatcher_resolver = resolver_decl;
+
+  push_cfun (DECL_STRUCT_FUNCTION (resolver_decl));
+
+  fn_ver_vec = VEC_alloc (tree, heap, 2);
+
+  for (versn = node->function_version->next; versn;
+       versn = versn->function_version->next)
+    {
+      /* Check for virtual functions here again, as by this time it should
+	 have been determined if this function needs a vtable index or
+	 not.  This happens for methods in derived classes that override
+	 virtual methods in base classes but are not explicitly marked as
+	 virtual.  */
+      if (DECL_VINDEX (versn->symbol.decl))
+        error_at (DECL_SOURCE_LOCATION (versn->symbol.decl),
+		  "Virtual function multiversioning not supported");
+      VEC_safe_push (tree, heap, fn_ver_vec, versn->symbol.decl);
+      gcc_assert (versn->function_version);
+    }
+
+  dispatch_function_versions(resolver_decl, fn_ver_vec, &empty_bb);
+
+  rebuild_cgraph_edges (); 
+  pop_cfun ();
+  return resolver_decl;
+}
+
 /* This builds the processor_model struct type defined in
    libgcc/config/i386/cpuinfo.c  */
 
@@ -41005,6 +41988,9 @@ ix86_memmodel_check (unsigned HOST_WIDE_INT val)
 #undef TARGET_PROFILE_BEFORE_PROLOGUE
 #define TARGET_PROFILE_BEFORE_PROLOGUE ix86_profile_before_prologue
 
+#undef TARGET_MANGLE_DECL_ASSEMBLER_NAME
+#define TARGET_MANGLE_DECL_ASSEMBLER_NAME ix86_mangle_decl_assembler_name
+
 #undef TARGET_ASM_UNALIGNED_HI_OP
 #define TARGET_ASM_UNALIGNED_HI_OP TARGET_ASM_ALIGNED_HI_OP
 #undef TARGET_ASM_UNALIGNED_SI_OP
@@ -41098,6 +42084,17 @@ ix86_memmodel_check (unsigned HOST_WIDE_INT val)
 #undef TARGET_FOLD_BUILTIN
 #define TARGET_FOLD_BUILTIN ix86_fold_builtin
 
+#undef TARGET_COMPARE_VERSION_PRIORITY
+#define TARGET_COMPARE_VERSION_PRIORITY ix86_compare_version_priority
+
+#undef TARGET_GENERATE_VERSION_DISPATCHER_BODY
+#define TARGET_GENERATE_VERSION_DISPATCHER_BODY \
+  ix86_generate_version_dispatcher_body
+
+#undef TARGET_GET_FUNCTION_VERSIONS_DISPATCHER
+#define TARGET_GET_FUNCTION_VERSIONS_DISPATCHER \
+  ix86_get_function_versions_dispatcher
+
 #undef TARGET_ENUM_VA_LIST_P
 #define TARGET_ENUM_VA_LIST_P ix86_enum_va_list
 
@@ -41238,6 +42235,9 @@ ix86_memmodel_check (unsigned HOST_WIDE_INT val)
 #undef TARGET_OPTION_PRINT
 #define TARGET_OPTION_PRINT ix86_function_specific_print
 
+#undef TARGET_OPTION_FUNCTION_VERSIONS
+#define TARGET_OPTION_FUNCTION_VERSIONS ix86_function_versions
+
 #undef TARGET_CAN_INLINE_P
 #define TARGET_CAN_INLINE_P ix86_can_inline_p
 
Index: gcc/testsuite/gcc.target/i386/funcspec-4.c
===================================================================
--- gcc/testsuite/gcc.target/i386/funcspec-4.c	(revision 192623)
+++ gcc/testsuite/gcc.target/i386/funcspec-4.c	(working copy)
@@ -12,3 +12,4 @@ extern void error3 (void) __attribute__((__target_
 
 /* option on a variable */
 extern int error4 __attribute__((__target__("sse2"))); /* { dg-warning "ignored" } */
+/* { dg-excess-errors "ignored" } */

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-10-20  4:29                                                                             ` Sriraman Tallam
  2012-10-23 21:21                                                                               ` Sriraman Tallam
@ 2012-10-26 14:11                                                                               ` Diego Novillo
  1 sibling, 0 replies; 93+ messages in thread
From: Diego Novillo @ 2012-10-26 14:11 UTC (permalink / raw)
  To: Sriraman Tallam
  Cc: Jason Merrill, Jan Hubicka, Xinliang David Li, mark, nathan,
	H.J. Lu, Richard Guenther, Uros Bizjak, reply, GCC Patches

On Fri, Oct 19, 2012 at 10:33 PM, Sriraman Tallam <tmsriram@google.com> 
wrote:

> Yes, the support is only for C++ for now. "target" attribute is not
> new and if the user tries to use this with 'C' then a duplicate
> definition error would occur just like now.
> I have plans to implement this for C too.

Would it be hard to emit a diagnostic that specifically states that 
"target" is not a valid attribute in C?

> So, callee here is the dispatcher function and it points to the set of
> semantically identical function versions. At this point, the
> dispatcher (callee) should have all the function versions chained in
> function_version, which is what the assert is checking.

Great, could you add this explanation as a comment?  It wasn't at all 
clear to me what was going on.

>>> +  enum feature_priority
>>> +  {
>>> +    P_ZERO = 0,
>>> +    P_MMX,
>>> +    P_SSE,
>>> +    P_SSE2,
>>> +    P_SSE3,
>>> +    P_SSSE3,
>>> +    P_PROC_SSSE3,
>>> +    P_SSE4_a,
>>> +    P_PROC_SSE4_a,
>>> +    P_SSE4_1,
>>> +    P_SSE4_2,
>>> +    P_PROC_SSE4_2,
>>> +    P_POPCNT,
>>> +    P_AVX,
>>> +    P_AVX2,
>>> +    P_FMA,
>>> +    P_PROC_FMA
>>> +  };
>>
>>
>> There's no need to have this list dynamically defined, right?
>
> I don't understand, why expose the enum outside the function?

To allow altering the list of priorities.  But if it doesn't make sense, 
ignore me.
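
For reference, the role the priorities play can be sketched outside of GCC.
The following is only an illustration, assuming an abbreviated enum and a
hypothetical `version_info` struct and `sort_versions` helper (none of these
names are from the patch).  It sorts versions in descending priority so the
most specific one is tested first, and it compares explicitly rather than
subtracting unsigned values:

```c
#include <assert.h>
#include <stdlib.h>

/* Abbreviated, hypothetical stand-in for the patch's feature_priority
   enum; a greater value means the version is checked earlier.  */
enum feature_priority
{
  P_ZERO = 0,
  P_SSE2,
  P_SSSE3,
  P_PROC_SSSE3,
  P_SSE4_2,
  P_AVX
};

/* Hypothetical record for one function version.  */
struct version_info
{
  const char *name;
  unsigned int dispatch_priority;
};

/* qsort comparator giving descending priority order.  The explicit
   comparisons avoid relying on unsigned subtraction wrapping.  */
static int
compare_desc (const void *v1, const void *v2)
{
  const struct version_info *a = (const struct version_info *) v1;
  const struct version_info *b = (const struct version_info *) v2;

  if (a->dispatch_priority < b->dispatch_priority)
    return 1;
  if (a->dispatch_priority > b->dispatch_priority)
    return -1;
  return 0;
}

/* Sort N versions in V so the highest priority comes first.  */
static void
sort_versions (struct version_info *v, size_t n)
{
  assert (v != NULL || n == 0);
  qsort (v, n, sizeof *v, compare_desc);
}
```

With such an ordering, a processor-specific version (here P_PROC_SSSE3) is
tested before the plain ISA version it implies (P_SSSE3).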

The patch is OK with the changes above addressed.  Thanks.

Please be on the lookout for failures.


Diego.

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-10-23 21:21                                                                               ` Sriraman Tallam
@ 2012-10-26 16:53                                                                                 ` Jan Hubicka
  2012-10-28  4:31                                                                                   ` Sriraman Tallam
  0 siblings, 1 reply; 93+ messages in thread
From: Jan Hubicka @ 2012-10-26 16:53 UTC (permalink / raw)
  To: Sriraman Tallam
  Cc: Diego Novillo, Jason Merrill, Jan Hubicka, Xinliang David Li,
	mark, nathan, H.J. Lu, Richard Guenther, Uros Bizjak, reply,
	GCC Patches

Hi,
sorry for jumping in late; for too long I did not have a chance to look at my TODO list.
I have two comments...
> Index: gcc/cgraphbuild.c
> ===================================================================
> --- gcc/cgraphbuild.c	(revision 192623)
> +++ gcc/cgraphbuild.c	(working copy)
> @@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "ipa-utils.h"
>  #include "except.h"
>  #include "ipa-inline.h"
> +#include "target.h"
>  
>  /* Context of record_reference.  */
>  struct record_reference_ctx
> @@ -317,8 +318,23 @@ build_cgraph_edges (void)
>  							 bb);
>  	      decl = gimple_call_fndecl (stmt);
>  	      if (decl)
> -		cgraph_create_edge (node, cgraph_get_create_node (decl),
> -				    stmt, bb->count, freq);
> +		{
> +		  struct cgraph_node *callee = cgraph_get_create_node (decl);
> +	          /* If a call to a multiversioned function dispatcher is
> +		     found, generate the body to dispatch the right function
> +		     at run-time.  */
> +		  if (callee->dispatcher_function)
> +		    {
> +		      tree resolver_decl;
> +		      gcc_assert (callee->function_version
> +				  && callee->function_version->next);
> +		      gcc_assert (targetm.generate_version_dispatcher_body);
> +		      resolver_decl
> +			 = targetm.generate_version_dispatcher_body (callee);
> +		      gcc_assert (resolver_decl != NULL_TREE);
> +		    }
> +		  cgraph_create_edge (node, callee, stmt, bb->count, freq);
> +	        }
I do not really think resolver generation belongs here, and I would prefer
build_cgraph_edges to really just build the edges.
> Index: gcc/cgraph.c
> ===================================================================
> --- gcc/cgraph.c	(revision 192623)
> +++ gcc/cgraph.c	(working copy)
> @@ -1277,6 +1277,16 @@ cgraph_mark_address_taken_node (struct cgraph_node
>    node->symbol.address_taken = 1;
>    node = cgraph_function_or_thunk_node (node, NULL);
>    node->symbol.address_taken = 1;
> +  /* If the address of a multiversioned function dispatcher is taken,
> +     generate the body to dispatch the right function at run-time.  This
> +     is needed as the address can be used to do an indirect call.  */
> +  if (node->dispatcher_function)
> +    {
> +      gcc_assert (node->function_version
> +		  && node->function_version->next);
> +      gcc_assert (targetm.generate_version_dispatcher_body);
> +      targetm.generate_version_dispatcher_body (node);
> +    }

Similarly here.  I also think this way you will miss aliases of the multiversioned
functions.

I am not sure why the multiversioning is tied to the cgraph build and why the
data structure is put into cgraph_node itself.  It seems to me that your
dispatchers are in a way related to thunks, i.e. they are inserted into the
callgraph and once they become reachable their body needs to be produced.  I
think generate_version_dispatcher_body should thus probably be called from
cgraph_analyze_function.  (For the function to be seen by analyze_function,
you will need to finalize it at the time you set the dispatcher_function
flag.)

I would also put the dispatcher data structure into an on-side hash keyed by
node->uid (i.e. these are rare and thus the data structure should be small).
The symbol table is critical for WPA stage memory use and I plan to remove as
much as possible from the nodes in the near future.  For this reason I would
prefer not to add too much stuff that is not going to be used by the majority
of nodes.

Honza

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-10-26 16:53                                                                                 ` Jan Hubicka
@ 2012-10-28  4:31                                                                                   ` Sriraman Tallam
  2012-10-29 13:05                                                                                     ` Jan Hubicka
  2012-10-30 19:18                                                                                     ` Jason Merrill
  0 siblings, 2 replies; 93+ messages in thread
From: Sriraman Tallam @ 2012-10-28  4:31 UTC (permalink / raw)
  To: Jan Hubicka
  Cc: Diego Novillo, Jason Merrill, Jan Hubicka, Xinliang David Li,
	Mark Mitchell, Nathan Sidwell, H.J. Lu, Richard Guenther,
	Uros Bizjak, reply, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 4030 bytes --]

Hi Diego and Honza,

   I have made all the changes mentioned and attached the new patch.

Thanks,
-Sri.

On Fri, Oct 26, 2012 at 8:54 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
> Hi,
> sorry for jumping in late; for too long I did not have a chance to look at my TODO.
> I have two comments...
>> Index: gcc/cgraphbuild.c
>> ===================================================================
>> --- gcc/cgraphbuild.c (revision 192623)
>> +++ gcc/cgraphbuild.c (working copy)
>> @@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "ipa-utils.h"
>>  #include "except.h"
>>  #include "ipa-inline.h"
>> +#include "target.h"
>>
>>  /* Context of record_reference.  */
>>  struct record_reference_ctx
>> @@ -317,8 +318,23 @@ build_cgraph_edges (void)
>>                                                        bb);
>>             decl = gimple_call_fndecl (stmt);
>>             if (decl)
>> -             cgraph_create_edge (node, cgraph_get_create_node (decl),
>> -                                 stmt, bb->count, freq);
>> +             {
>> +               struct cgraph_node *callee = cgraph_get_create_node (decl);
>> +               /* If a call to a multiversioned function dispatcher is
>> +                  found, generate the body to dispatch the right function
>> +                  at run-time.  */
>> +               if (callee->dispatcher_function)
>> +                 {
>> +                   tree resolver_decl;
>> +                   gcc_assert (callee->function_version
>> +                               && callee->function_version->next);
>> +                   gcc_assert (targetm.generate_version_dispatcher_body);
>> +                   resolver_decl
>> +                      = targetm.generate_version_dispatcher_body (callee);
>> +                   gcc_assert (resolver_decl != NULL_TREE);
>> +                 }
>> +               cgraph_create_edge (node, callee, stmt, bb->count, freq);
>> +             }
> I do not really think resolver generation belongs here, and I would prefer
> build_cgraph_edges to really just build the edges.
>> Index: gcc/cgraph.c
>> ===================================================================
>> --- gcc/cgraph.c      (revision 192623)
>> +++ gcc/cgraph.c      (working copy)
>> @@ -1277,6 +1277,16 @@ cgraph_mark_address_taken_node (struct cgraph_node
>>    node->symbol.address_taken = 1;
>>    node = cgraph_function_or_thunk_node (node, NULL);
>>    node->symbol.address_taken = 1;
>> +  /* If the address of a multiversioned function dispatcher is taken,
>> +     generate the body to dispatch the right function at run-time.  This
>> +     is needed as the address can be used to do an indirect call.  */
>> +  if (node->dispatcher_function)
>> +    {
>> +      gcc_assert (node->function_version
>> +               && node->function_version->next);
>> +      gcc_assert (targetm.generate_version_dispatcher_body);
>> +      targetm.generate_version_dispatcher_body (node);
>> +    }
>
> Similarly here.  I also think this way you will miss aliases of the multiversioned
> functions.
>
> I am not sure why the multiversioning is tied to the cgraph build and why the
> data structure is put into cgraph_node itself.  It seems to me that your
> dispatchers are in a way related to thunks, i.e. they are inserted into the
> callgraph and once they become reachable their body needs to be produced.  I
> think generate_version_dispatcher_body should thus probably be called from
> cgraph_analyze_function.  (For the function to be seen by analyze_function,
> you will need to finalize it at the time you set the dispatcher_function
> flag.)
>
> I would also put the dispatcher data structure into an on-side hash keyed by
> node->uid (i.e. these are rare and thus the data structure should be small).
> The symbol table is critical for WPA stage memory use and I plan to remove as
> much as possible from the nodes in the near future.  For this reason I would
> prefer not to add too much stuff that is not going to be used by the majority
> of nodes.
>
> Honza

[-- Attachment #2: mv_fe_patch_10272012.txt --]
[-- Type: text/plain, Size: 81930 bytes --]

Overview of the patch, which adds support for specifying function versions.
This is only enabled for the i386 target.

Example:

int foo ();  /* Default version */
int foo () __attribute__ ((target("avx,popcnt")));/*Specialized for avx and popcnt */
int foo () __attribute__ ((target("arch=core2,ssse3")));/*Specialized for core2 and ssse3*/

int main ()
{
 int (*p)() = &foo;
 return foo () + (*p)();
}

int foo ()
{
 return 0;
}

int __attribute__ ((target("avx,popcnt")))
foo ()
{
 return 0;
}

int __attribute__ ((target("arch=core2,ssse3")))
foo ()
{
 return 0;
}

The above example has foo defined 3 times, but all 3 definitions of foo are
different versions of the same function. The calls to foo in main, both direct
and via a pointer, are calls to the multi-versioned function foo, which is
dispatched to the right version of foo at run-time.
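
For readers who want to see what this dispatch amounts to, here is a rough
hand-written equivalent in plain C.  It is only a sketch under assumptions, not
what the patch generates: the feature bits, the cpu_features variable and the
foo_* names are hypothetical, the versions return distinct values only so they
can be told apart, the order of the checks is illustrative, and a cached
function pointer stands in for the target's real dispatch mechanism.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical feature bits; a real dispatcher queries the CPU instead.  */
enum { FEAT_AVX = 1, FEAT_POPCNT = 2, FEAT_CORE2 = 4, FEAT_SSSE3 = 8 };

/* Stand-in for CPU detection (assumption: set by the caller).  */
static unsigned cpu_features;

/* The three versions; distinct return values are for illustration only.  */
static int foo_default (void) { return 0; }
static int foo_avx_popcnt (void) { return 1; }
static int foo_core2_ssse3 (void) { return 2; }

typedef int (*foo_fn) (void);

/* Resolver: pick a version based on the detected features.  */
static foo_fn
foo_resolver (void)
{
  unsigned avx_popcnt = FEAT_AVX | FEAT_POPCNT;
  unsigned core2_ssse3 = FEAT_CORE2 | FEAT_SSSE3;

  if ((cpu_features & avx_popcnt) == avx_popcnt)
    return foo_avx_popcnt;
  if ((cpu_features & core2_ssse3) == core2_ssse3)
    return foo_core2_ssse3;
  return foo_default;
}

/* Dispatcher: the single entry point that both a direct call and a
   call through a pointer end up using.  The choice is made once.  */
static int
foo (void)
{
  static foo_fn resolved;

  if (resolved == NULL)
    resolved = foo_resolver ();
  assert (resolved != NULL);
  return resolved ();
}
```

With this shape, "foo ()" and "(*p) ()" with "p = &foo" both go through the
same dispatcher, which is the behaviour described for main above.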

Front-end changes:

The front-end changes are calls, at appropriate places, to target hooks that
do the following:

* Determine if two function decls with the same signature are versions.
* Determine the assembler name of a function version.
* Generate the dispatcher function for a set of function versions.
* Compare versions to see if one has a higher priority than the other.

All the implementation happens in the target-specific config/i386/i386.c.

What does the patch do?

* Tracking decls that correspond to function versions of function
name, say "foo":

When the front-end sees more than one decl for "foo", it calls a target hook to
determine if they are versions. To prevent duplicate definition errors with other
versions of "foo", the "decls_match" function in cp/decl.c is made to return false
when 2 decls are deemed versions by the target. This causes all function
versions of "foo" to be added to the overload list of "foo".

* Change the assembler names of the function versions.

For i386, the target changes the assembler names of the function versions by
suffixing the sorted list of args to "target" to the function name of "foo". For
example, the assembler name of "void foo () __attribute__ ((target ("sse4")))" will
become _Z3foov.sse4.  The target hook mangle_decl_assembler_name is used for this.
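
The naming scheme can be sketched in a few lines of standalone C.  This is an
illustration under assumptions, not the code in i386.c: the function name
version_assembler_name is hypothetical, and joining multiple arguments with
'_' is an assumption, since the description above only shows the
single-argument case.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C strings.  */
static int
cmp_str (const void *a, const void *b)
{
  return strcmp (*(const char *const *) a, *(const char *const *) b);
}

/* Write "<mangled>.<sorted args>" into OUT (of size N).  Returns 0 on
   success, -1 if the input is too long or the result does not fit.
   Multiple arguments are joined with '_' (an assumption).  */
static int
version_assembler_name (const char *mangled, const char *target_args,
                        char *out, size_t n)
{
  char buf[256];
  char *args[32];
  size_t nargs = 0, len, i;
  char *tok;

  assert (mangled != NULL && target_args != NULL && out != NULL);
  if (strlen (target_args) >= sizeof buf)
    return -1;
  strcpy (buf, target_args);

  /* Split the attribute string at commas, skipping leading blanks.  */
  for (tok = strtok (buf, ","); tok != NULL; tok = strtok (NULL, ","))
    {
      while (*tok == ' ')
        tok++;
      if (nargs == 32)
        return -1;
      args[nargs++] = tok;
    }
  qsort (args, nargs, sizeof args[0], cmp_str);

  len = strlen (mangled);
  if (len >= n)
    return -1;
  strcpy (out, mangled);
  for (i = 0; i < nargs; i++)
    {
      size_t need = strlen (args[i]) + 1;   /* separator + argument */
      if (len + need >= n)
        return -1;
      out[len] = (i == 0) ? '.' : '_';
      strcpy (out + len + 1, args[i]);
      len += need;
    }
  return 0;
}
```

Sorting first means that target ("avx,popcnt") and target ("popcnt,avx") map to
the same assembler name, so the order in which the user lists the arguments
does not create distinct symbols.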

* Overload resolution:

Function "build_over_call" in cp/call.c sees a call to function
"foo", which is multi-versioned. The overload resolution happens in
function "joust" in cp/call.c. Here, the call to "foo" has all
possible versions of "foo" as candidates. All the candidates of "foo" are
stored in the cgraph side data structure. Each version of foo is chained in a
doubly-linked list with the default function as the first element.  This allows
any pass to access all the semantically identical versions. A call to a
multi-versioned function will be replaced by a call to a dispatcher function,
determined by a target hook, to execute the right function version at run-time.

Optimization to directly call a version when possible:
Also, in joust, where overload resolution happens, resolution of a
multiversioned function is made to return the most specialized version.  This
is the version that will be checked for dispatching first and is determined by
the target.  Now, if the caller can inline this function version, then a direct
call is made to this function version rather than going through the dispatcher.
When a direct call cannot be made, a call to the dispatcher function is created.
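
The decision can be pictured with the small sketch below.  It assumes, for
illustration only, that "the caller can inline this version" means that every
ISA feature the version requires is also enabled in the caller; the real answer
comes from the target, and the bit masks and names here are hypothetical.

```c
#include <assert.h>

enum call_kind { CALL_DIRECT, CALL_DISPATCHER };

/* CALLER_ISA and VERSION_ISA are hypothetical bit masks of enabled ISA
   features.  A direct call is chosen only when the most specialized
   version needs nothing beyond what the caller already has.  */
static enum call_kind
choose_call (unsigned caller_isa, unsigned version_isa)
{
  if ((version_isa & ~caller_isa) == 0)
    return CALL_DIRECT;
  return CALL_DISPATCHER;
}
```

Under this assumption, a caller compiled with the same target attribute as the
most specialized version calls it directly, while a caller built for the
default target always goes through the dispatcher.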

* Creating the dispatcher body.

The dispatcher body, called the resolver, is made only when there is a call to a
multiversioned function dispatcher or the address of a function is taken. The
body is generated during cgraph_analyze_function by another target hook.

* Dispatch ordering.

The order in which the function versions are checked during dispatch is based
on a priority value assigned to the ISA that each version caters to. More
specialized versions are checked for dispatching first.  This is to mitigate the
ambiguity that can arise when more than one function version is valid for
execution on a particular platform.  This is not a perfect solution, and in
future the user should be allowed to assign a dispatching priority value to each
version.
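
As a sketch of this ordering, assuming hypothetical feature masks and priority
numbers (the real priorities are assigned by the target), the versions can be
sorted by descending priority and the first one whose requirements are met
wins.  The default version requires nothing, so it is always the fallback.

```c
#include <assert.h>
#include <stdlib.h>

/* One function version: the features it needs and its dispatch priority.
   Both fields are illustrative stand-ins for the target's own data.  */
struct fn_version
{
  const char *name;
  unsigned required;
  int priority;
};

/* qsort comparator: higher priority first.  */
static int
by_priority_desc (const void *a, const void *b)
{
  const struct fn_version *x = a;
  const struct fn_version *y = b;
  return (y->priority > x->priority) - (y->priority < x->priority);
}

/* Sort V by descending priority and return the first version whose
   required features are all present in CPU_FEATURES, or NULL.  */
static const struct fn_version *
select_version (struct fn_version *v, size_t n, unsigned cpu_features)
{
  size_t i;

  assert (v != NULL);
  qsort (v, n, sizeof *v, by_priority_desc);
  for (i = 0; i < n; i++)
    if ((v[i].required & ~cpu_features) == 0)
      return &v[i];
  return NULL;
}
```

When a platform supports several versions at once, the one with the highest
priority is picked, which is how the ambiguity mentioned above is resolved.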

Function MV in the Intel compiler:

The Intel compiler supports function multiversioning and the syntax is
similar to the patch proposed here.  Here is an example of how to
generate multiple function versions with the Intel compiler.

/* Create a stub function to specify the various versions of function that
   will be created, using declspec attribute cpu_dispatch.  */
__declspec (cpu_dispatch (core_i7_sse4_2, atom, generic))
void foo () {};

/* Bodies of each function version.  */

/* Intel Corei7 processor + SSE4.2 version.  */
__declspec (cpu_specific(core_i7_sse4_2))
void foo ()
{
  printf ("corei7 + sse4.2");
}

/* Atom processor.  */
__declspec (cpu_specific(atom))
void foo ()
{
  printf ("atom");
}

/* The generic or the default version.  */
__declspec (cpu_specific(generic))
void foo ()
{
  printf ("This is generic");
}

A new function version is generated by defining a new function with the same
signature but with a different cpu_specific declspec attribute string.  The
set of cpu_specific strings that are allowed is the following:

"core_2nd_gen_avx"
"core_aes_pclmulqdq"
"core_i7_sse4_2"
"core_2_duo_sse4_1"
"core_2_duo_ssse3"
"atom"
"pentium_4_sse3"
"pentium_4"
"pentium_m"
"pentium_iii"
"generic"

Comparison with the GCC MV implementation in this patch:

* Version creation syntax:

The implementation in this patch has a similar syntax to specify function
versions. The initial stub function used by the Intel compiler is not needed.
Here is the code to generate the function versions with this patch:

/* Intel Corei7 processor + SSE4.2 version.  */
__attribute__ ((target ("arch=corei7, sse4.2")))
void foo ()
{
  printf ("corei7 + sse4.2");
}

/* Atom processor.  */
__attribute__ ((target ("arch=atom")))
void foo ()
{
  printf ("atom");
}

/* The default version, which has no target attribute.  */
void foo ()
{
}

The target attribute can have one of the following arch names:

"amd"
"intel"
"atom"
"core2"
"corei7"
"nehalem"
"westmere"
"sandybridge"
"amdfam10h"
"barcelona"
"shanghai"
"istanbul"
"amdfam15h"
"bdver1"
"bdver2"

and any number of the following ISA names:

"cmov"
"mmx"
"popcnt"
"sse"
"sse2"
"sse3"
"ssse3"
"sse4.1"
"sse4.2"
"avx"
"avx2"


	* doc/tm.texi.in (TARGET_OPTION_FUNCTION_VERSIONS): New hook description.
	* (TARGET_COMPARE_VERSION_PRIORITY): New hook description.
	* (TARGET_GET_FUNCTION_VERSIONS_DISPATCHER): New hook description.
	* (TARGET_GENERATE_VERSION_DISPATCHER_BODY): New hook description.
	* doc/tm.texi: Regenerate.
	* cgraphunit.c (cgraph_analyze_function): Generate body of multiversion
	function dispatcher.
	* target.def (compare_version_priority): New target hook.
	* (generate_version_dispatcher_body): New target hook.
	* (get_function_versions_dispatcher): New target hook.
	* (function_versions): New target hook.
	* cgraph.c (cgraph_fnver_htab): New htab.
	(cgraph_fn_ver_htab_hash): New function.
	(cgraph_fn_ver_htab_eq): New function.
	(version_info_node): New pointer.
	(insert_new_cgraph_node_version): New function.
	(get_cgraph_node_version): New function.
	* cgraph.h (cgraph_function_version_info): New struct.
	(insert_new_cgraph_node_version): New function.
	(get_cgraph_node_version): New function.
	(cgraph_node): New bitfield dispatcher_function.
	* tree.h (DECL_FUNCTION_VERSIONED): New macro.
	(tree_function_decl): New bit-field versioned_function.
	* testsuite/g++.dg/mv1.C: New test.
	* testsuite/g++.dg/mv2.C: New test.
	* testsuite/g++.dg/mv3.C: New test.
	* testsuite/g++.dg/mv4.C: New test.
	* cp/class.c:
	(add_method): Change assembler names of function versions.
	(resolve_address_of_overloaded_function): Save all function
	version candidates. Create dispatcher decl and return address of
	dispatcher instead.
	* cp/decl.c (decls_match): Make decls unmatched for versioned
	functions.
	(duplicate_decls): Remove ambiguity for versioned functions. 
	(cxx_comdat_group): Make comdat group of versioned functions be the
	same.
	* cp/error.c (dump_exception_spec): Dump assembler name for function
	versions.
	* cp/semantics.c (expand_or_defer_fn_1): Mark as needed versioned
	functions that are also marked inline.
	* cp/decl2.c:(check_classfn): Check attributes of versioned functions
	for match.
	* cp/call.c: (build_new_function_call): Check if versioned functions
	have a default version.
	(build_over_call): Make calls to multiversioned functions
	to call the dispatcher.
	(joust): For calls to multi-versioned functions, make the most
	specialized function version win.
	(tourney): Generate dispatcher decl for function versions.
	(get_function_version_dispatcher): New function.
	(generate_function_versions_dispatcher): New function.
	* Makefile.in: Add multiversion.o
	* config/i386/i386.c (add_condition_to_bb): New function.
	(get_builtin_code_for_version): New function.
	(ix86_compare_version_priority): New function.
	(feature_compare): New function.
	(dispatch_function_versions): New function.
	* (ix86_function_versions): New function.
	* (attr_strcmp): New function.
	* (sorted_attr_string): New function.
	* (ix86_mangle_function_version_assembler_name): New function.
	* (ix86_mangle_decl_assembler_name): New function.
	* (make_name): New function.
	* (make_dispatcher_decl): New function.
	* (is_function_default_version): New function.
	* (ix86_get_function_versions_dispatcher): New function.
	* (make_attribute): New function.
	* (make_resolver_func): New function.
	* (ix86_generate_version_dispatcher_body): New function.
	* (TARGET_COMPARE_VERSION_PRIORITY): New macro.
	* (TARGET_GET_FUNCTION_VERSIONS_DISPATCHER): New macro.
	* (TARGET_GENERATE_VERSION_DISPATCHER_BODY): New macro.
	* (TARGET_OPTION_FUNCTION_VERSIONS): New macro.
	* (TARGET_MANGLE_DECL_ASSEMBLER_NAME): New macro.

Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi	(revision 192623)
+++ gcc/doc/tm.texi	(working copy)
@@ -9913,6 +9913,14 @@ changed via the optimize attribute or pragma, see
 @code{TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE}
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_OPTION_FUNCTION_VERSIONS (tree @var{decl1}, tree @var{decl2})
+This target hook returns @code{true} if @var{DECL1} and @var{DECL2} are
+versions of the same function.  @var{DECL1} and @var{DECL2} are function
+versions if and only if they have the same function signature and
+different target specific attributes, that is, they are compiled for
+different target machines.
+@end deftypefn
+
 @deftypefn {Target Hook} bool TARGET_CAN_INLINE_P (tree @var{caller}, tree @var{callee})
 This target hook returns @code{false} if the @var{caller} function
 cannot inline @var{callee}, based on target specific information.  By
@@ -10930,6 +10938,29 @@ The result is another tree containing a simplified
 call's result.  If @var{ignore} is true the value will be ignored.
 @end deftypefn
 
+@deftypefn {Target Hook} int TARGET_COMPARE_VERSION_PRIORITY (tree @var{decl1}, tree @var{decl2})
+This hook is used to compare the target attributes in two functions to
+determine which function's features get higher priority.  This is used
+during function multi-versioning to figure out the order in which two
+versions must be dispatched.  A function version with a higher priority
+is checked for dispatching earlier.  @var{decl1} and @var{decl2} are
+ the two function decls that will be compared.
+@end deftypefn
+
+@deftypefn {Target Hook} tree TARGET_GET_FUNCTION_VERSIONS_DISPATCHER (void *@var{arglist})
+This hook is used to get the dispatcher function for a set of function
+versions.  The dispatcher function is called to invoke the right function
+version at run-time. @var{arglist} is the vector of function versions
+that should be considered for dispatch.
+@end deftypefn
+
+@deftypefn {Target Hook} tree TARGET_GENERATE_VERSION_DISPATCHER_BODY (void *@var{arg})
+This hook is used to generate the dispatcher logic to invoke the right
+function version at run-time for a given set of function versions.
+@var{arg} points to the callgraph node of the dispatcher function whose
+body must be generated.
+@end deftypefn
+
 @deftypefn {Target Hook} {const char *} TARGET_INVALID_WITHIN_DOLOOP (const_rtx @var{insn})
 
 Take an instruction in @var{insn} and return NULL if it is valid within a
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in	(revision 192623)
+++ gcc/doc/tm.texi.in	(working copy)
@@ -9782,6 +9782,14 @@ changed via the optimize attribute or pragma, see
 @code{TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE}
 @end deftypefn
 
+@hook TARGET_OPTION_FUNCTION_VERSIONS
+This target hook returns @code{true} if @var{DECL1} and @var{DECL2} are
+versions of the same function.  @var{DECL1} and @var{DECL2} are function
+versions if and only if they have the same function signature and
+different target specific attributes, that is, they are compiled for
+different target machines.
+@end deftypefn
+
 @hook TARGET_CAN_INLINE_P
 This target hook returns @code{false} if the @var{caller} function
 cannot inline @var{callee}, based on target specific information.  By
@@ -10788,6 +10796,29 @@ The result is another tree containing a simplified
 call's result.  If @var{ignore} is true the value will be ignored.
 @end deftypefn
 
+@hook TARGET_COMPARE_VERSION_PRIORITY
+This hook is used to compare the target attributes in two functions to
+determine which function's features get higher priority.  This is used
+during function multi-versioning to figure out the order in which two
+versions must be dispatched.  A function version with a higher priority
+is checked for dispatching earlier.  @var{decl1} and @var{decl2} are
+ the two function decls that will be compared.
+@end deftypefn
+
+@hook TARGET_GET_FUNCTION_VERSIONS_DISPATCHER
+This hook is used to get the dispatcher function for a set of function
+versions.  The dispatcher function is called to invoke the right function
+version at run-time. @var{arglist} is the vector of function versions
+that should be considered for dispatch.
+@end deftypefn
+
+@hook TARGET_GENERATE_VERSION_DISPATCHER_BODY
+This hook is used to generate the dispatcher logic to invoke the right
+function version at run-time for a given set of function versions.
+@var{arg} points to the callgraph node of the dispatcher function whose
+body must be generated.
+@end deftypefn
+
 @hook TARGET_INVALID_WITHIN_DOLOOP
 
 Take an instruction in @var{insn} and return NULL if it is valid within a
Index: gcc/cgraphunit.c
===================================================================
--- gcc/cgraphunit.c	(revision 192623)
+++ gcc/cgraphunit.c	(working copy)
@@ -633,6 +633,34 @@ cgraph_analyze_function (struct cgraph_node *node)
     {
       push_cfun (DECL_STRUCT_FUNCTION (decl));
 
+      /* If this decl is one version of a set of multi-versioned functions,
+	 check if its dispatcher body needs to be generated.  */
+      if (DECL_FUNCTION_VERSIONED (decl)
+	  && get_cgraph_node_version (node) != NULL)
+	{
+	  struct cgraph_function_version_info *node_version_info
+	    = get_cgraph_node_version (node);
+	  if (node_version_info->dispatcher_resolver)
+	    {
+	      tree dispatcher_decl = node_version_info->dispatcher_resolver;
+	      struct cgraph_node *dispatcher_node
+		= cgraph_get_create_node (dispatcher_decl);
+	      struct cgraph_function_version_info *dispatcher_version_info
+		= get_cgraph_node_version (dispatcher_node);
+	      if (dispatcher_node->local.finalized
+		  && dispatcher_version_info != NULL
+	          && (dispatcher_version_info->dispatcher_resolver
+		      == NULL_TREE))
+		{
+		  tree resolver = NULL_TREE;
+		  gcc_assert (targetm.generate_version_dispatcher_body);
+		  resolver
+		    = targetm.generate_version_dispatcher_body (dispatcher_node);
+		  gcc_assert (resolver != NULL_TREE);
+		}
+	    }
+	}
+
       assign_assembler_name_if_neeeded (node->symbol.decl);
 
       /* Make sure to gimplify bodies only once.  During analyzing a
Index: gcc/target.def
===================================================================
--- gcc/target.def	(revision 192623)
+++ gcc/target.def	(working copy)
@@ -1298,6 +1298,37 @@ DEFHOOK
  tree, (tree fndecl, int n_args, tree *argp, bool ignore),
  hook_tree_tree_int_treep_bool_null)
 
+/* Target hook is used to compare the target attributes in two functions to
+   determine which function's features get higher priority.  This is used
+   during function multi-versioning to figure out the order in which two
+   versions must be dispatched.  A function version with a higher priority
+   is checked for dispatching earlier.  DECL1 and DECL2 are
+   the two function decls that will be compared. It returns positive value
+   if DECL1 is higher priority,  negative value if DECL2 is higher priority
+   and 0 if they are the same. */
+DEFHOOK
+(compare_version_priority,
+ "",
+ int, (tree decl1, tree decl2), NULL)
+
+/*  Target hook is used to generate the dispatcher logic to invoke the right
+    function version at run-time for a given set of function versions.
+    ARG points to the callgraph node of the dispatcher function whose body
+    must be generated.  */
+DEFHOOK
+(generate_version_dispatcher_body,
+ "",
+ tree, (void *arg), NULL) 
+
+/* Target hook is used to get the dispatcher function for a set of function
+   versions.  The dispatcher function is called to invoke the right function
+   version at run-time.  ARGLIST is the vector of function versions that
+   should be considered for dispatch.  */
+DEFHOOK
+(get_function_versions_dispatcher,
+ "",
+ tree, (void *arglist), NULL)
+
 /* Returns a code for a target-specific builtin that implements
    reciprocal of the function, or NULL_TREE if not available.  */
 DEFHOOK
@@ -2725,6 +2756,16 @@ DEFHOOK
  void, (void),
  hook_void_void)
 
+/* This function returns true if DECL1 and DECL2 are versions of the same
+   function.  DECL1 and DECL2 are function versions if and only if they
+   have the same function signature and different target specific attributes,
+   that is, they are compiled for different target machines.  */
+DEFHOOK
+(function_versions,
+ "",
+ bool, (tree decl1, tree decl2),
+ hook_bool_tree_tree_false)
+
 /* Function to determine if one function can inline another function.  */
 #undef HOOK_PREFIX
 #define HOOK_PREFIX "TARGET_"
Index: gcc/cgraph.c
===================================================================
--- gcc/cgraph.c	(revision 192623)
+++ gcc/cgraph.c	(working copy)
@@ -132,6 +132,74 @@ static GTY(()) struct cgraph_edge *free_edges;
 /* Did procss_same_body_aliases run?  */
 bool same_body_aliases_done;
 
+/* Map a cgraph_node to cgraph_function_version_info using this htab.
+   The cgraph_function_version_info has a THIS_NODE field that is the
+   corresponding cgraph_node..  */
+htab_t GTY((param_is (struct cgraph_function_version_info *)))
+  cgraph_fnver_htab = NULL;
+
+/* Hash function for cgraph_fnver_htab.  */
+static hashval_t
+cgraph_fnver_htab_hash (const void *ptr)
+{
+  int uid = ((const struct cgraph_function_version_info *)ptr)->this_node->uid;
+  return (hashval_t)(uid);
+}
+
+/* eq function for cgraph_fnver_htab.  */
+static int
+cgraph_fnver_htab_eq (const void *p1, const void *p2)
+{
+  const struct cgraph_function_version_info *n1
+    = (const struct cgraph_function_version_info *)p1;
+  const struct cgraph_function_version_info *n2
+    = (const struct cgraph_function_version_info *)p2;
+
+  return n1->this_node->uid == n2->this_node->uid;
+}
+
+/* Mark as GC root all allocated nodes.  */
+static GTY(()) struct cgraph_function_version_info *
+  version_info_node = NULL;
+
+/* Insert a new cgraph_function_version_info node into cgraph_fnver_htab
+   corresponding to cgraph_node NODE.  */
+struct cgraph_function_version_info *
+insert_new_cgraph_node_version (struct cgraph_node *node)
+{
+  void **slot;
+  
+  version_info_node = NULL;
+  version_info_node = ggc_alloc_cleared_cgraph_function_version_info ();
+  version_info_node->this_node = node;
+
+  if (cgraph_fnver_htab == NULL)
+    cgraph_fnver_htab = htab_create_ggc (2, cgraph_fnver_htab_hash,
+				         cgraph_fnver_htab_eq, NULL);
+
+  slot = htab_find_slot (cgraph_fnver_htab, version_info_node, INSERT);
+  gcc_assert (slot != NULL);
+  *slot = version_info_node;
+  return version_info_node;
+}
+
+/* Get the cgraph_function_version_info node corresponding to node.  */
+struct cgraph_function_version_info *
+get_cgraph_node_version (struct cgraph_node *node)
+{
+  struct cgraph_function_version_info *ret;
+  struct cgraph_function_version_info key;
+  key.this_node = node;
+
+  if (cgraph_fnver_htab == NULL)
+    return NULL;
+
+  ret = (struct cgraph_function_version_info *)
+    htab_find (cgraph_fnver_htab, &key);
+
+  return ret;
+}
+
 /* Macros to access the next item in the list of free cgraph nodes and
    edges. */
 #define NEXT_FREE_NODE(NODE) cgraph ((NODE)->symbol.next)
Index: gcc/cgraph.h
===================================================================
--- gcc/cgraph.h	(revision 192623)
+++ gcc/cgraph.h	(working copy)
@@ -200,7 +200,38 @@ struct GTY(()) cgraph_clone_info
   bitmap combined_args_to_skip;
 };
 
+/* Function Multiversioning info.  */
+struct GTY(()) cgraph_function_version_info {
+  /* The cgraph_node for which the function version info is stored.  */
+  struct cgraph_node *this_node;
+  /* Chains all the semantically identical function versions.  The
+     first function in this chain is the version_info node of the
+     default function.  */
+  struct cgraph_function_version_info *prev;
+  /* If this version node corresponds to a dispatcher for function
+     versions, this points to the version info node of the default
+     function, the first node in the chain.  */
+  struct cgraph_function_version_info *next;
+  /* If this node corresponds to a function version, this points
+     to the dispatcher function decl, which is the function that must
+     be called to execute the right function version at run-time.
 
+     If this cgraph node is a dispatcher (if dispatcher_function is
+     true, in the cgraph_node struct) for function versions, this
+     points to resolver function, which holds the function body of the
+     dispatcher. The dispatcher decl is an alias to the resolver
+     function decl.  */
+  tree dispatcher_resolver;
+};
+
+/* Defined in cgraph.c  */
+/* Get the cgraph_function_version_info node for NODE.  */
+struct cgraph_function_version_info *
+  get_cgraph_node_version (struct cgraph_node *node);
+/* Map a new  cgraph_function_version_info node for NODE.  */
+struct cgraph_function_version_info *
+  insert_new_cgraph_node_version (struct cgraph_node *node);
+
 /* The cgraph data structure.
    Each function decl has assigned cgraph_node listing callees and callers.  */
 
@@ -279,6 +310,8 @@ struct GTY(()) cgraph_node {
   /* ?? We should be able to remove this.  We have enough bits in
      cgraph to calculate it.  */
   unsigned tm_clone : 1;
+  /* True if this decl is a dispatcher for function versions.  */
+  unsigned dispatcher_function : 1;
 };
 
 DEF_VEC_P(symtab_node);
Index: gcc/tree.h
===================================================================
--- gcc/tree.h	(revision 192623)
+++ gcc/tree.h	(working copy)
@@ -3476,6 +3476,12 @@ extern VEC(tree, gc) **decl_debug_args_insert (tre
 #define DECL_FUNCTION_SPECIFIC_OPTIMIZATION(NODE) \
    (FUNCTION_DECL_CHECK (NODE)->function_decl.function_specific_optimization)
 
+/* In FUNCTION_DECL, this is set if this function has other versions generated
+   using "target" attributes.  The default version is the one which does not
+   have any "target" attribute set. */
+#define DECL_FUNCTION_VERSIONED(NODE)\
+   (FUNCTION_DECL_CHECK (NODE)->function_decl.versioned_function)
+
 /* FUNCTION_DECL inherits from DECL_NON_COMMON because of the use of the
    arguments/result/saved_tree fields by front ends.   It was either inherit
    FUNCTION_DECL from non_common, or inherit non_common from FUNCTION_DECL,
@@ -3520,8 +3526,8 @@ struct GTY(()) tree_function_decl {
   unsigned looping_const_or_pure_flag : 1;
   unsigned has_debug_args_flag : 1;
   unsigned tm_clone_flag : 1;
-
-  /* 1 bit left */
+  unsigned versioned_function : 1;
+  /* No bits left.  */
 };
 
 /* The source language of the translation-unit.  */
Index: gcc/cp/class.c
===================================================================
--- gcc/cp/class.c	(revision 192623)
+++ gcc/cp/class.c	(working copy)
@@ -1087,6 +1087,31 @@ add_method (tree type, tree method, tree using_dec
 	      || same_type_p (TREE_TYPE (fn_type),
 			      TREE_TYPE (method_type))))
 	{
+	  /* For function versions, their parms and types match
+	     but they are not duplicates.  Record function versions
+	     as and when they are found.  */
+	  if (TREE_CODE (fn) == FUNCTION_DECL
+	      && TREE_CODE (method) == FUNCTION_DECL
+	      && (DECL_FUNCTION_SPECIFIC_TARGET (fn)
+		  || DECL_FUNCTION_SPECIFIC_TARGET (method))
+	      && targetm.target_option.function_versions (fn, method))
+ 	    {
+	      /* Mark functions as versions if necessary.  Modify the mangled
+		 decl name if necessary.  */
+	      if (!DECL_FUNCTION_VERSIONED (fn))
+		{
+		  DECL_FUNCTION_VERSIONED (fn) = 1;
+		  if (DECL_ASSEMBLER_NAME_SET_P (fn))
+		    mangle_decl (fn);
+		}
+	      if (!DECL_FUNCTION_VERSIONED (method))
+		{
+		  DECL_FUNCTION_VERSIONED (method) = 1;
+		  if (DECL_ASSEMBLER_NAME_SET_P (method))
+		    mangle_decl (method);
+		}
+	      continue;
+	    }
 	  if (DECL_INHERITED_CTOR_BASE (method))
 	    {
 	      if (DECL_INHERITED_CTOR_BASE (fn))
@@ -6995,6 +7020,7 @@ resolve_address_of_overloaded_function (tree targe
   tree matches = NULL_TREE;
   tree fn;
   tree target_fn_type;
+  VEC (tree, heap) *fn_ver_vec = NULL;
 
   /* By the time we get here, we should be seeing only real
      pointer-to-member types, not the internal POINTER_TYPE to
@@ -7059,9 +7085,19 @@ resolve_address_of_overloaded_function (tree targe
 	  if (DECL_ANTICIPATED (fn))
 	    continue;
 
-	  /* See if there's a match.  */
+	  /* See if there's a match.   For functions that are multi-versioned,
+	     all the versions match.  */
 	  if (same_type_p (target_fn_type, static_fn_type (fn)))
-	    matches = tree_cons (fn, NULL_TREE, matches);
+	    {
+	      matches = tree_cons (fn, NULL_TREE, matches);
+	      /* If versioned, push all possible versions into a vector.  */
+	      if (DECL_FUNCTION_VERSIONED (fn))
+		{
+		  if (fn_ver_vec == NULL)
+		    fn_ver_vec = VEC_alloc (tree, heap, 2);
+		  VEC_safe_push (tree, heap, fn_ver_vec, fn);
+		}
+	    }
 	}
     }
 
@@ -7149,13 +7185,26 @@ resolve_address_of_overloaded_function (tree targe
     {
       /* There were too many matches.  First check if they're all
 	 the same function.  */
-      tree match;
+      tree match = NULL_TREE;
 
       fn = TREE_PURPOSE (matches);
-      for (match = TREE_CHAIN (matches); match; match = TREE_CHAIN (match))
-	if (!decls_match (fn, TREE_PURPOSE (match)))
-	  break;
 
+      /* For multi-versioned functions, more than one match is just fine.
+	 Call decls_match to make sure they are different because they are
+	 versioned.  */
+      if (DECL_FUNCTION_VERSIONED (fn))
+	{
+          for (match = TREE_CHAIN (matches); match; match = TREE_CHAIN (match))
+  	    if (decls_match (fn, TREE_PURPOSE (match)))
+	      break;
+	}
+      else
+	{
+          for (match = TREE_CHAIN (matches); match; match = TREE_CHAIN (match))
+  	    if (!decls_match (fn, TREE_PURPOSE (match)))
+	      break;
+	}
+
       if (match)
 	{
 	  if (flags & tf_error)
@@ -7217,6 +7266,33 @@ resolve_address_of_overloaded_function (tree targe
       perform_or_defer_access_check (access_path, fn, fn, flags);
     }
 
+  /* If a pointer to a function that is multi-versioned is requested, the
+     pointer to the dispatcher function is returned instead.  This works
+     well because indirectly calling the function will dispatch the right
+     function version at run-time.  */
+  if (DECL_FUNCTION_VERSIONED (fn))
+    {
+      struct cgraph_node *node = NULL;
+      tree dispatcher_decl = NULL;
+      gcc_assert (fn_ver_vec != NULL);
+      gcc_assert (targetm.get_function_versions_dispatcher);
+      dispatcher_decl = targetm.get_function_versions_dispatcher (fn_ver_vec);
+      if (!dispatcher_decl)
+	{
+	  error_at (input_location, "pointer to a multiversioned function"
+		    " without a default is not allowed");
+	  return error_mark_node;
+	}
+      retrofit_lang_decl (dispatcher_decl);
+      mark_used (fn);
+      VEC_free (tree, heap, fn_ver_vec);
+      node = cgraph_get_create_node (dispatcher_decl);
+      gcc_assert (node != NULL);
+      /* Mark this function to be output.  */
+      node->local.finalized = 1;
+      fn = dispatcher_decl;
+    }
+
   if (TYPE_PTRFN_P (target_type) || TYPE_PTRMEMFUNC_P (target_type))
     return cp_build_addr_expr (fn, flags);
   else
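Note on the hunk above: taking the address of a versioned function yields the dispatcher, so an indirect call still reaches the right version at run time. A minimal stand-alone sketch of that idea follows. It is not GCC code and does not use IFUNC; `foo_default`, `foo_corei7`, `resolve_foo`, and `call_foo_indirect` are hypothetical names, and the `cpu_is_corei7` flag stands in for the real CPU check.

```c
/* Illustrative sketch only: every name here is hypothetical.  */
typedef int (*foo_fn) (void);

static int foo_default (void) { return 0; }
static int foo_corei7 (void) { return 7; }

/* Stand-in for the resolver: returns the version to run.  The real
   patch checks the CPU with builtins; here the caller passes a flag.  */
static foo_fn
resolve_foo (int cpu_is_corei7)
{
  return cpu_is_corei7 ? foo_corei7 : foo_default;
}

/* An indirect call goes through whichever version the resolver chose.  */
static int
call_foo_indirect (int cpu_is_corei7)
{
  foo_fn p = resolve_foo (cpu_is_corei7);
  return (*p) ();
}
```

In this sketch the choice is made on every call; with IFUNC, as the patch description states, the resolver is what the symbol binds to, which keeps the dispatch overhead low.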
Index: gcc/cp/decl.c
===================================================================
--- gcc/cp/decl.c	(revision 192623)
+++ gcc/cp/decl.c	(working copy)
@@ -53,6 +53,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "pointer-set.h"
 #include "splay-tree.h"
 #include "plugin.h"
+#include "cgraph.h"
 
 /* Possible cases of bad specifiers type used by bad_specifiers. */
 enum bad_spec_place {
@@ -981,6 +982,29 @@ decls_match (tree newdecl, tree olddecl)
       if (t1 != t2)
 	return 0;
 
+      /* The decls don't match if they correspond to two different versions
+	 of the same function.  */
+      if (compparms (p1, p2)
+	  && same_type_p (TREE_TYPE (f1), TREE_TYPE (f2)) 
+	  && targetm.target_option.function_versions (newdecl, olddecl))
+	{
+	  /* Mark functions as versions if necessary.  Modify the mangled decl
+	     name if necessary.  */
+	  if (!DECL_FUNCTION_VERSIONED (newdecl))
+	    {
+	      DECL_FUNCTION_VERSIONED (newdecl) = 1;
+	      if (DECL_ASSEMBLER_NAME_SET_P (newdecl))
+	        mangle_decl (newdecl);
+	    }
+	  if (!DECL_FUNCTION_VERSIONED (olddecl))
+	    {
+	      DECL_FUNCTION_VERSIONED (olddecl) = 1;
+	      if (DECL_ASSEMBLER_NAME_SET_P (olddecl))
+	       mangle_decl (olddecl);
+	    }
+	  return 0;
+	}
+
       if (CP_DECL_CONTEXT (newdecl) != CP_DECL_CONTEXT (olddecl)
 	  && ! (DECL_EXTERN_C_P (newdecl)
 		&& DECL_EXTERN_C_P (olddecl)))
@@ -1499,7 +1523,11 @@ duplicate_decls (tree newdecl, tree olddecl, bool
 	      error ("previous declaration %q+#D here", olddecl);
 	      return NULL_TREE;
 	    }
-	  else if (compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)),
+	  /* For function versions, params and types match, but they
+	     are not ambiguous.  */
+	  else if ((!DECL_FUNCTION_VERSIONED (newdecl)
+		    && !DECL_FUNCTION_VERSIONED (olddecl))
+		   && compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)),
 			      TYPE_ARG_TYPES (TREE_TYPE (olddecl))))
 	    {
 	      error ("new declaration %q#D", newdecl);
@@ -2272,6 +2300,11 @@ duplicate_decls (tree newdecl, tree olddecl, bool
   else if (DECL_PRESERVE_P (newdecl))
     DECL_PRESERVE_P (olddecl) = 1;
 
+  /* If the olddecl is a version, so is the newdecl.  */
+  if (TREE_CODE (newdecl) == FUNCTION_DECL
+      && DECL_FUNCTION_VERSIONED (olddecl))
+    DECL_FUNCTION_VERSIONED (newdecl) = 1;
+
   if (TREE_CODE (newdecl) == FUNCTION_DECL)
     {
       int function_size;
@@ -14227,7 +14260,11 @@ cxx_comdat_group (tree decl)
 	  else
 	    break;
 	}
-      name = DECL_ASSEMBLER_NAME (decl);
+      if (TREE_CODE (decl) == FUNCTION_DECL
+	  && DECL_FUNCTION_VERSIONED (decl))
+	name = DECL_NAME (decl);
+      else
+        name = DECL_ASSEMBLER_NAME (decl);
     }
 
   return name;
Index: gcc/cp/error.c
===================================================================
--- gcc/cp/error.c	(revision 192623)
+++ gcc/cp/error.c	(working copy)
@@ -1541,8 +1541,16 @@ dump_exception_spec (tree t, int flags)
 static void
 dump_function_name (tree t, int flags)
 {
-  tree name = DECL_NAME (t);
+  tree name;
 
+  /* For function versions, use the assembler name as the decl name is
+     the same for all versions.  */
+  if (TREE_CODE (t) == FUNCTION_DECL
+      && DECL_FUNCTION_VERSIONED (t))
+    name = DECL_ASSEMBLER_NAME (t);
+  else
+    name = DECL_NAME (t);
+
   /* We can get here with a decl that was synthesized by language-
      independent machinery (e.g. coverage.c) in which case it won't
      have a lang_specific structure attached and DECL_CONSTRUCTOR_P
Index: gcc/cp/semantics.c
===================================================================
--- gcc/cp/semantics.c	(revision 192623)
+++ gcc/cp/semantics.c	(working copy)
@@ -3813,8 +3813,11 @@ expand_or_defer_fn_1 (tree fn)
       /* If the user wants us to keep all inline functions, then mark
 	 this function as needed so that finish_file will make sure to
 	 output it later.  Similarly, all dllexport'd functions must
-	 be emitted; there may be callers in other DLLs.  */
-      if ((flag_keep_inline_functions
+	 be emitted; there may be callers in other DLLs.
+	 Also, mark this function as needed if it is marked inline but
+	 is a multi-versioned function.  */
+      if (((flag_keep_inline_functions
+	    || DECL_FUNCTION_VERSIONED (fn))
 	   && DECL_DECLARED_INLINE_P (fn)
 	   && !DECL_REALLY_EXTERN (fn))
 	  || (flag_keep_inline_dllexport
Index: gcc/cp/decl2.c
===================================================================
--- gcc/cp/decl2.c	(revision 192623)
+++ gcc/cp/decl2.c	(working copy)
@@ -674,9 +674,13 @@ check_classfn (tree ctype, tree function, tree tem
 	  if (is_template != (TREE_CODE (fndecl) == TEMPLATE_DECL))
 	    continue;
 
+	  /* While finding a match, same types and params are not enough
+	     if the function is versioned.  Also check version ("target")
+	     attributes.  */
 	  if (same_type_p (TREE_TYPE (TREE_TYPE (function)),
 			   TREE_TYPE (TREE_TYPE (fndecl)))
 	      && compparms (p1, p2)
+	      && !targetm.target_option.function_versions (function, fndecl)
 	      && (!is_template
 		  || comp_template_parms (template_parms,
 					  DECL_TEMPLATE_PARMS (fndecl)))
Index: gcc/cp/call.c
===================================================================
--- gcc/cp/call.c	(revision 192623)
+++ gcc/cp/call.c	(working copy)
@@ -40,6 +40,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "langhooks.h"
 #include "c-family/c-objc.h"
 #include "timevar.h"
+#include "cgraph.h"
 
 /* The various kinds of conversion.  */
 
@@ -6444,6 +6445,42 @@ magic_varargs_p (tree fn)
   return false;
 }
 
+/* Returns the decl of the dispatcher function if FN is a function version.  */
+
+static tree
+get_function_version_dispatcher (tree fn)
+{
+  tree dispatcher_decl = NULL;
+  struct cgraph_node *node = NULL;
+  struct cgraph_function_version_info *node_version_info = NULL;
+
+  gcc_assert (TREE_CODE (fn) == FUNCTION_DECL
+	      && DECL_FUNCTION_VERSIONED (fn));
+
+  node = cgraph_get_node (fn);
+
+  if (node == NULL)
+    return NULL;
+
+  node_version_info = get_cgraph_node_version (node);
+
+  if (node_version_info != NULL)
+    dispatcher_decl = node_version_info->dispatcher_resolver;
+  else
+    return NULL;
+
+  if (dispatcher_decl == NULL)
+    {
+      error_at (input_location, "call to multiversioned function"
+                " without a default is not allowed");
+      return NULL;
+    }
+
+  retrofit_lang_decl (dispatcher_decl);
+  gcc_assert (dispatcher_decl != NULL);
+  return dispatcher_decl;
+}
+
 /* Subroutine of the various build_*_call functions.  Overload resolution
    has chosen a winning candidate CAND; build up a CALL_EXPR accordingly.
    ARGS is a TREE_LIST of the unconverted arguments to the call.  FLAGS is a
@@ -6896,6 +6933,25 @@ build_over_call (struct z_candidate *cand, int fla
   if (!already_used)
     mark_used (fn);
 
+  /* For calls to a multi-versioned function, overload resolution
+     returns the function with the highest target priority, that is,
+     the version that will be checked for dispatching first.  If this
+     version is inlinable, a direct call to this version can be made
+     otherwise the call should go through the dispatcher.  */
+
+  if (DECL_FUNCTION_VERSIONED (fn)
+      && !targetm.target_option.can_inline_p (current_function_decl, fn))
+    {
+      struct cgraph_node *dispatcher_node = NULL;
+      fn = get_function_version_dispatcher (fn);
+      if (fn == NULL)
+	return NULL;
+      dispatcher_node = cgraph_get_create_node (fn);
+      gcc_assert (dispatcher_node != NULL);
+      /* Mark this function to be output.  */
+      dispatcher_node->local.finalized = 1;
+    }
+
   if (DECL_VINDEX (fn) && (flags & LOOKUP_NONVIRTUAL) == 0)
     {
       tree t;
@@ -8176,6 +8232,38 @@ joust (struct z_candidate *cand1, struct z_candida
       && (IS_TYPE_OR_DECL_P (cand1->fn)))
     return 1;
 
+  /* For candidates of a multi-versioned function, make the version with
+     the highest priority win.  This version will be checked for dispatching
+     first.  If this version can be inlined into the caller, the front-end
+     will simply make a direct call to this function.  */
+
+  if (TREE_CODE (cand1->fn) == FUNCTION_DECL
+      && DECL_FUNCTION_VERSIONED (cand1->fn)
+      && TREE_CODE (cand2->fn) == FUNCTION_DECL
+      && DECL_FUNCTION_VERSIONED (cand2->fn))
+    {
+      tree f1 = TREE_TYPE (cand1->fn);
+      tree f2 = TREE_TYPE (cand2->fn);
+      tree p1 = TYPE_ARG_TYPES (f1);
+      tree p2 = TYPE_ARG_TYPES (f2);
+     
+      /* Check if cand1->fn and cand2->fn are versions of the same function.  It
+         is possible that cand1->fn and cand2->fn are function versions but of
+         different functions.  Check types to see if they are versions of the same
+         function.  */
+      if (compparms (p1, p2)
+	  && same_type_p (TREE_TYPE (f1), TREE_TYPE (f2)))
+	{
+	  /* Always make the version with the higher priority, more
+	     specialized, win.  */
+	  gcc_assert (targetm.compare_version_priority);
+	  if (targetm.compare_version_priority (cand1->fn, cand2->fn) >= 0)
+	    return 1;
+	  else
+	    return -1;
+	}
+    }
+
   /* a viable function F1
      is defined to be a better function than another viable function F2  if
      for  all arguments i, ICSi(F1) is not a worse conversion sequence than
@@ -8496,6 +8584,37 @@ tweak:
   return 0;
 }
 
+/* Function FN is multi-versioned and CANDIDATES contains the list of all
+   overloaded candidates for FN.  This function extracts all functions from
+   CANDIDATES that are function versions of FN and generates a dispatcher
+   function for this multi-versioned function group.  */
+
+static void
+generate_function_versions_dispatcher (tree fn, struct z_candidate *candidates)
+{
+  tree f1 = TREE_TYPE (fn);
+  tree p1 = TYPE_ARG_TYPES (f1);
+  VEC (tree, heap) *fn_ver_vec = NULL;
+  struct z_candidate *ver = candidates;
+
+  fn_ver_vec = VEC_alloc (tree, heap, 2);
+
+  for (;ver; ver = ver->next)
+    {
+      tree f2 = TREE_TYPE (ver->fn);
+      tree p2 = TYPE_ARG_TYPES (f2);
+      /* If this candidate is a version of FN, types must match.  */
+      if (DECL_FUNCTION_VERSIONED (ver->fn)
+          && compparms (p1, p2)
+	  && same_type_p (TREE_TYPE (f1), TREE_TYPE (f2)))
+        VEC_safe_push (tree, heap, fn_ver_vec, ver->fn);
+    }
+
+  gcc_assert (targetm.get_function_versions_dispatcher);
+  targetm.get_function_versions_dispatcher (fn_ver_vec);
+  VEC_free (tree, heap, fn_ver_vec); 
+}
+
 /* Given a list of candidates for overloading, find the best one, if any.
    This algorithm has a worst case of O(2n) (winner is last), and a best
    case of O(n/2) (totally ambiguous); much better than a sorting
@@ -8548,6 +8667,23 @@ tourney (struct z_candidate *candidates, tsubst_fl
 	return NULL;
     }
 
+  /* For multiversioned functions, aggregate all the versions here for
+     generating the dispatcher body later if necessary.  Check to see if
+     the dispatcher is already generated to avoid doing this more than
+     once.  */
+
+  if (TREE_CODE (champ->fn) == FUNCTION_DECL
+      && DECL_FUNCTION_VERSIONED (champ->fn))
+    {
+      struct cgraph_node *champ_node = cgraph_get_node (champ->fn);
+      struct cgraph_function_version_info *champ_version_info = NULL;
+      if (champ_node != NULL)
+        champ_version_info = get_cgraph_node_version (champ_node);
+      if (champ_node == NULL
+	  || champ_version_info == NULL
+	  || champ_version_info->dispatcher_resolver == NULL)
+        generate_function_versions_dispatcher (champ->fn, candidates);
+    }
   return champ;
 }
 
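Note on the call.c changes above: the rule that the highest-priority version wins, and is the first one checked by the dispatcher, can be sketched outside the compiler. The following is only an illustration under stated assumptions, not GCC code: `struct version`, its `supported` flag (a stand-in for the run-time CPU check), `version_compare`, and `pick_version` are hypothetical names, and the priority numbers used with it are made up.

```c
#include <stdlib.h>

/* Hypothetical stand-in for one function version.  */
struct version
{
  const char *name;
  unsigned int priority;	/* Greater value is checked first.  */
  int supported;		/* Stand-in for the run-time CPU check.  */
};

/* qsort comparator: descending order of priority.  */
static int
version_compare (const void *v1, const void *v2)
{
  const struct version *a = (const struct version *) v1;
  const struct version *b = (const struct version *) v2;

  if (a->priority == b->priority)
    return 0;
  return a->priority < b->priority ? 1 : -1;
}

/* Sort VERSIONS by priority and return the name of the first supported
   one; fall back to DEFAULT_NAME when none is supported.  */
static const char *
pick_version (struct version *versions, size_t n, const char *default_name)
{
  size_t i;

  qsort (versions, n, sizeof (struct version), version_compare);
  for (i = 0; i < n; i++)
    if (versions[i].supported)
      return versions[i].name;
  return default_name;
}
```

With three versions of priority 4, 14 and 10 where only the priority-14 one is unsupported, `pick_version` returns the priority-10 version; when none is supported it returns the default, mirroring the requirement that a default version exists.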
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 192623)
+++ gcc/config/i386/i386.c	(working copy)
@@ -62,6 +62,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "opts.h"
 #include "diagnostic.h"
 #include "dumpfile.h"
+#include "tree-pass.h"
+#include "tree-flow.h"
 
 enum upper_128bits_state
 {
@@ -28413,6 +28415,1001 @@ ix86_init_mmx_sse_builtins (void)
     }
 }
 
+
+/* This adds a condition to the basic_block NEW_BB in function FUNCTION_DECL
+   to return a pointer to VERSION_DECL if the outcome of the expression
+   formed by PREDICATE_CHAIN is true.  This function will be called during
+   version dispatch to decide which function version to execute.  It returns
+   the basic block at the end, to which more conditions can be added.  */
+
+static basic_block
+add_condition_to_bb (tree function_decl, tree version_decl,
+		     tree predicate_chain, basic_block new_bb)
+{
+  gimple return_stmt;
+  tree convert_expr, result_var;
+  gimple convert_stmt;
+  gimple call_cond_stmt;
+  gimple if_else_stmt;
+
+  basic_block bb1, bb2, bb3;
+  edge e12, e23;
+
+  tree cond_var, and_expr_var = NULL_TREE;
+  gimple_seq gseq;
+
+  tree predicate_decl, predicate_arg;
+
+  push_cfun (DECL_STRUCT_FUNCTION (function_decl));
+
+  gcc_assert (new_bb != NULL);
+  gseq = bb_seq (new_bb);
+
+
+  convert_expr = build1 (CONVERT_EXPR, ptr_type_node,
+	     		 build_fold_addr_expr (version_decl));
+  result_var = create_tmp_var (ptr_type_node, NULL);
+  convert_stmt = gimple_build_assign (result_var, convert_expr); 
+  return_stmt = gimple_build_return (result_var);
+
+  if (predicate_chain == NULL_TREE)
+    {
+      gimple_seq_add_stmt (&gseq, convert_stmt);
+      gimple_seq_add_stmt (&gseq, return_stmt);
+      set_bb_seq (new_bb, gseq);
+      gimple_set_bb (convert_stmt, new_bb);
+      gimple_set_bb (return_stmt, new_bb);
+      pop_cfun ();
+      return new_bb;
+    }
+
+  while (predicate_chain != NULL)
+    {
+      cond_var = create_tmp_var (integer_type_node, NULL);
+      predicate_decl = TREE_PURPOSE (predicate_chain);
+      predicate_arg = TREE_VALUE (predicate_chain);
+      call_cond_stmt = gimple_build_call (predicate_decl, 1, predicate_arg);
+      gimple_call_set_lhs (call_cond_stmt, cond_var);
+
+      gimple_set_block (call_cond_stmt, DECL_INITIAL (function_decl));
+      gimple_set_bb (call_cond_stmt, new_bb);
+      gimple_seq_add_stmt (&gseq, call_cond_stmt);
+
+      predicate_chain = TREE_CHAIN (predicate_chain);
+      
+      if (and_expr_var == NULL)
+        and_expr_var = cond_var;
+      else
+	{
+	  gimple assign_stmt;
+	  /* Use MIN_EXPR to check if any integer is zero.
+	     and_expr_var = min_expr <cond_var, and_expr_var>  */
+	  assign_stmt = gimple_build_assign (and_expr_var,
+			  build2 (MIN_EXPR, integer_type_node,
+				  cond_var, and_expr_var));
+
+	  gimple_set_block (assign_stmt, DECL_INITIAL (function_decl));
+	  gimple_set_bb (assign_stmt, new_bb);
+	  gimple_seq_add_stmt (&gseq, assign_stmt);
+	}
+    }
+
+  if_else_stmt = gimple_build_cond (GT_EXPR, and_expr_var,
+	  		            integer_zero_node,
+				    NULL_TREE, NULL_TREE);
+  gimple_set_block (if_else_stmt, DECL_INITIAL (function_decl));
+  gimple_set_bb (if_else_stmt, new_bb);
+  gimple_seq_add_stmt (&gseq, if_else_stmt);
+
+  gimple_seq_add_stmt (&gseq, convert_stmt);
+  gimple_seq_add_stmt (&gseq, return_stmt);
+  set_bb_seq (new_bb, gseq);
+
+  bb1 = new_bb;
+  e12 = split_block (bb1, if_else_stmt);
+  bb2 = e12->dest;
+  e12->flags &= ~EDGE_FALLTHRU;
+  e12->flags |= EDGE_TRUE_VALUE;
+
+  e23 = split_block (bb2, return_stmt);
+
+  gimple_set_bb (convert_stmt, bb2);
+  gimple_set_bb (return_stmt, bb2);
+
+  bb3 = e23->dest;
+  make_edge (bb1, bb3, EDGE_FALSE_VALUE); 
+
+  remove_edge (e23);
+  make_edge (bb2, EXIT_BLOCK_PTR, 0);
+
+  pop_cfun ();
+
+  return bb3;
+}
+
+/* This parses the attribute arguments to target in DECL and determines
+   the right builtin to use to match the platform specification.
+   It returns the priority value for this version decl.  If PREDICATE_LIST
+   is not NULL, it stores the list of cpu features that need to be checked
+   before dispatching this function.  */
+
+static unsigned int
+get_builtin_code_for_version (tree decl, tree *predicate_list)
+{
+  tree attrs;
+  struct cl_target_option cur_target;
+  tree target_node;
+  struct cl_target_option *new_target;
+  const char *arg_str = NULL;
+  const char *attrs_str = NULL;
+  char *tok_str = NULL;
+  char *token;
+
+  /* Priority of i386 features, greater value is higher priority.   This is
+     used to decide the order in which function dispatch must happen.  For
+     instance, a version specialized for SSE4.2 should be checked for dispatch
+     before a version for SSE3, as SSE4.2 implies SSE3.  */
+  enum feature_priority
+  {
+    P_ZERO = 0,
+    P_MMX,
+    P_SSE,
+    P_SSE2,
+    P_SSE3,
+    P_SSSE3,
+    P_PROC_SSSE3,
+    P_SSE4_a,
+    P_PROC_SSE4_a,
+    P_SSE4_1,
+    P_SSE4_2,
+    P_PROC_SSE4_2,
+    P_POPCNT,
+    P_AVX,
+    P_AVX2,
+    P_FMA,
+    P_PROC_FMA
+  };
+
+  enum feature_priority priority = P_ZERO;
+
+  /* These are the target attribute strings for which a dispatcher is
+     available, from fold_builtin_cpu.  */
+
+  static struct _feature_list
+    {
+      const char *const name;
+      const enum feature_priority priority;
+    }
+  const feature_list[] =
+    {
+      {"mmx", P_MMX},
+      {"sse", P_SSE},
+      {"sse2", P_SSE2},
+      {"sse3", P_SSE3},
+      {"ssse3", P_SSSE3},
+      {"sse4.1", P_SSE4_1},
+      {"sse4.2", P_SSE4_2},
+      {"popcnt", P_POPCNT},
+      {"avx", P_AVX},
+      {"avx2", P_AVX2}
+    };
+
+
+  static unsigned int NUM_FEATURES
+    = sizeof (feature_list) / sizeof (struct _feature_list);
+
+  unsigned int i;
+
+  tree predicate_chain = NULL_TREE;
+  tree predicate_decl, predicate_arg;
+
+  attrs = lookup_attribute ("target", DECL_ATTRIBUTES (decl));
+  gcc_assert (attrs != NULL);
+
+  attrs = TREE_VALUE (TREE_VALUE (attrs));
+
+  gcc_assert (TREE_CODE (attrs) == STRING_CST);
+  attrs_str = TREE_STRING_POINTER (attrs);
+
+
+  /* Handle arch= if specified.  For priority, set it to be 1 more than
+     the best instruction set the processor can handle.  For instance, if
+     there is a version for atom and a version for ssse3 (the highest ISA
+     priority for atom), the atom version must be checked for dispatch
+     before the ssse3 version.  */
+  if (strstr (attrs_str, "arch=") != NULL)
+    {
+      cl_target_option_save (&cur_target, &global_options);
+      target_node = ix86_valid_target_attribute_tree (attrs);
+    
+      gcc_assert (target_node);
+      new_target = TREE_TARGET_OPTION (target_node);
+      gcc_assert (new_target);
+      
+      if (new_target->arch_specified && new_target->arch > 0)
+	{
+	  switch (new_target->arch)
+	    {
+	    case PROCESSOR_CORE2_32:
+	    case PROCESSOR_CORE2_64:
+	      arg_str = "core2";
+	      priority = P_PROC_SSSE3;
+	      break;
+	    case PROCESSOR_COREI7_32:
+	    case PROCESSOR_COREI7_64:
+	      arg_str = "corei7";
+	      priority = P_PROC_SSE4_2;
+	      break;
+	    case PROCESSOR_ATOM:
+	      arg_str = "atom";
+	      priority = P_PROC_SSSE3;
+	      break;
+	    case PROCESSOR_AMDFAM10:
+	      arg_str = "amdfam10h";
+	      priority = P_PROC_SSE4_a;
+	      break;
+	    case PROCESSOR_BDVER1:
+	      arg_str = "bdver1";
+	      priority = P_PROC_FMA;
+	      break;
+	    case PROCESSOR_BDVER2:
+	      arg_str = "bdver2";
+	      priority = P_PROC_FMA;
+	      break;
+	    }  
+	}    
+    
+      cl_target_option_restore (&global_options, &cur_target);
+	
+      if (predicate_list && arg_str == NULL)
+	{
+	  error_at (DECL_SOURCE_LOCATION (decl),
+		    "no dispatcher found for the versioning attributes");
+	  return 0;
+	}
+    
+      if (predicate_list)
+	{
+          predicate_decl = ix86_builtins [(int) IX86_BUILTIN_CPU_IS];
+          /* For a C string literal the length includes the trailing NUL.  */
+          predicate_arg = build_string_literal (strlen (arg_str) + 1, arg_str);
+          predicate_chain = tree_cons (predicate_decl, predicate_arg,
+				       predicate_chain);
+	}
+    }
+
+  /* Process feature name.  */
+  tok_str =  (char *) xmalloc (strlen (attrs_str) + 1);
+  strcpy (tok_str, attrs_str);
+  token = strtok (tok_str, ",");
+  predicate_decl = ix86_builtins [(int) IX86_BUILTIN_CPU_SUPPORTS];
+
+  while (token != NULL)
+    {
+      /* Do not process "arch=".  */
+      if (strncmp (token, "arch=", 5) == 0)
+	{
+	  token = strtok (NULL, ",");
+	  continue;
+	}
+      for (i = 0; i < NUM_FEATURES; ++i)
+	{
+	  if (strcmp (token, feature_list[i].name) == 0)
+	    {
+	      if (predicate_list)
+		{
+		  predicate_arg = build_string_literal (
+				  strlen (feature_list[i].name) + 1,
+				  feature_list[i].name);
+		  predicate_chain = tree_cons (predicate_decl, predicate_arg,
+					       predicate_chain);
+		}
+	      /* Find the maximum priority feature.  */
+	      if (feature_list[i].priority > priority)
+		priority = feature_list[i].priority;
+
+	      break;
+	    }
+	}
+      if (predicate_list && i == NUM_FEATURES)
+	{
+	  error_at (DECL_SOURCE_LOCATION (decl),
+		    "no dispatcher found for %s", token);
+	  return 0;
+	}
+      token = strtok (NULL, ",");
+    }
+  free (tok_str);
+
+  if (predicate_list && predicate_chain == NULL_TREE)
+    {
+      error_at (DECL_SOURCE_LOCATION (decl),
+	        "no dispatcher found for the versioning attributes: %s",
+	        attrs_str);
+      return 0;
+    }
+  else if (predicate_list)
+    {
+      predicate_chain = nreverse (predicate_chain);
+      *predicate_list = predicate_chain;
+    }
+
+  return priority; 
+}
+
+/* This compares the priority of target features in function DECL1
+   and DECL2.  It returns positive value if DECL1 is higher priority,
+   negative value if DECL2 is higher priority and 0 if they are the
+   same.  */
+
+static int
+ix86_compare_version_priority (tree decl1, tree decl2)
+{
+  unsigned int priority1 = 0;
+  unsigned int priority2 = 0;
+
+  if (lookup_attribute ("target", DECL_ATTRIBUTES (decl1)) != NULL)
+    priority1 = get_builtin_code_for_version (decl1, NULL);
+
+  if (lookup_attribute ("target", DECL_ATTRIBUTES (decl2)) != NULL)
+    priority2 = get_builtin_code_for_version (decl2, NULL);
+
+  return (int)priority1 - (int)priority2;
+}
+
+/* V1 and V2 point to function versions with different priorities
+   based on the target ISA.  This function compares their priorities.  */
+ 
+static int
+feature_compare (const void *v1, const void *v2)
+{
+  typedef struct _function_version_info
+    {
+      tree version_decl;
+      tree predicate_chain;
+      unsigned int dispatch_priority;
+    } function_version_info;
+
+  const function_version_info c1 = *(const function_version_info *)v1;
+  const function_version_info c2 = *(const function_version_info *)v2;
+  return (c2.dispatch_priority - c1.dispatch_priority);
+}
+
+/* This function generates the dispatch function for
+   multi-versioned functions.  DISPATCH_DECL is the function which will
+   contain the dispatch logic.  FNDECLS_P points to a vector holding the
+   function choices for dispatch.  EMPTY_BB is the basic block pointer
+   in DISPATCH_DECL in which the dispatch code is generated.  */
+
+static int
+dispatch_function_versions (tree dispatch_decl,
+			    void *fndecls_p,
+			    basic_block *empty_bb)
+{
+  tree default_decl;
+  gimple ifunc_cpu_init_stmt;
+  gimple_seq gseq;
+  int ix;
+  tree ele;
+  VEC (tree, heap) *fndecls;
+  unsigned int num_versions = 0;
+  unsigned int actual_versions = 0;
+  unsigned int i;
+
+  struct _function_version_info
+    {
+      tree version_decl;
+      tree predicate_chain;
+      unsigned int dispatch_priority;
+    }*function_version_info;
+
+  gcc_assert (dispatch_decl != NULL
+	      && fndecls_p != NULL
+	      && empty_bb != NULL);
+
+  /* fndecls_p is actually a vector.  */
+  fndecls = (VEC (tree, heap) *)fndecls_p;
+
+  /* At least one more version other than the default.  */
+  num_versions = VEC_length (tree, fndecls);
+  gcc_assert (num_versions >= 2);
+
+  function_version_info = (struct _function_version_info *)
+    XNEWVEC (struct _function_version_info, (num_versions - 1));
+
+  /* The first version in the vector is the default decl.  */
+  default_decl = VEC_index (tree, fndecls, 0);
+
+  push_cfun (DECL_STRUCT_FUNCTION (dispatch_decl));
+
+  gseq = bb_seq (*empty_bb);
+  /* Function version dispatch is via IFUNC.  IFUNC resolvers fire before
+     constructors, so explicitly call __builtin_cpu_init here.  */
+  ifunc_cpu_init_stmt = gimple_build_call_vec (
+                     ix86_builtins [(int) IX86_BUILTIN_CPU_INIT], NULL);
+  gimple_seq_add_stmt (&gseq, ifunc_cpu_init_stmt);
+  gimple_set_bb (ifunc_cpu_init_stmt, *empty_bb);
+  set_bb_seq (*empty_bb, gseq);
+
+  pop_cfun ();
+
+
+  for (ix = 1; VEC_iterate (tree, fndecls, ix, ele); ++ix)
+    {
+      tree version_decl = ele;
+      tree predicate_chain = NULL_TREE;
+      unsigned int priority;
+      /* Get attribute string, parse it and find the right predicate decl.
+         The predicate function could be a lengthy combination of many
+	 features, like arch-type and various isa-variants.  */
+      priority = get_builtin_code_for_version (version_decl,
+	 			               &predicate_chain);
+
+      if (predicate_chain == NULL_TREE)
+	continue;
+
+      function_version_info [actual_versions].version_decl = version_decl;
+      function_version_info [actual_versions].predicate_chain = predicate_chain;
+      function_version_info [actual_versions].dispatch_priority = priority;
+      actual_versions++;
+    }
+
+  /* Sort the versions according to descending order of dispatch priority.  The
+     priority is based on the ISA.  This is not a perfect solution.  There
+     could still be ambiguity.  If more than one function version is suitable
+     to execute, which one should be dispatched?  In future, allow the user
+     to specify a dispatch priority next to the version.  */
+  qsort (function_version_info, actual_versions,
+         sizeof (struct _function_version_info), feature_compare);
+
+  for  (i = 0; i < actual_versions; ++i)
+    *empty_bb = add_condition_to_bb (dispatch_decl,
+				     function_version_info[i].version_decl,
+				     function_version_info[i].predicate_chain,
+				     *empty_bb);
+
+  /* Dispatch the default version at the end.  */
+  *empty_bb = add_condition_to_bb (dispatch_decl, default_decl,
+				   NULL, *empty_bb);
+
+  free (function_version_info);
+  return 0;
+}
+
+/* This function returns true if FN1 and FN2 are versions of the same function,
+   that is, the targets of the function decls are different.  This assumes
+   that FN1 and FN2 have the same signature.  */
+
+static bool
+ix86_function_versions (tree fn1, tree fn2)
+{
+  tree attr1, attr2;
+  struct cl_target_option *target1, *target2;
+
+  if (TREE_CODE (fn1) != FUNCTION_DECL
+      || TREE_CODE (fn2) != FUNCTION_DECL)
+    return false;
+
+  attr1 = DECL_FUNCTION_SPECIFIC_TARGET (fn1);
+  attr2 = DECL_FUNCTION_SPECIFIC_TARGET (fn2);
+
+  /* At least one function decl should have target attribute specified.  */
+  if (attr1 == NULL_TREE && attr2 == NULL_TREE)
+    return false;
+
+  if (attr1 == NULL_TREE)
+    attr1 = target_option_default_node;
+  else if (attr2 == NULL_TREE)
+    attr2 = target_option_default_node;
+
+  target1 = TREE_TARGET_OPTION (attr1);
+  target2 = TREE_TARGET_OPTION (attr2);
+
+  /* target1 and target2 must be different in some way.  */
+  if (target1->x_ix86_isa_flags == target2->x_ix86_isa_flags
+      && target1->x_target_flags == target2->x_target_flags
+      && target1->arch == target2->arch
+      && target1->tune == target2->tune
+      && target1->x_ix86_fpmath == target2->x_ix86_fpmath
+      && target1->branch_cost == target2->branch_cost)
+    return false;
+
+  return true;
+}
+
+/* Comparator function to be used in qsort routine to sort attribute
+   specification strings to "target".  */
+
+static int
+attr_strcmp (const void *v1, const void *v2)
+{
+  const char *c1 = *(char *const*)v1;
+  const char *c2 = *(char *const*)v2;
+  return strcmp (c1, c2);
+}
+
+/* STR is the argument to target attribute.  This function tokenizes
+   the comma separated arguments, sorts them and returns a string which
+   is a unique identifier for the comma separated arguments.   It also
+   replaces non-identifier characters "=,-" with "_".  */
+
+static char *
+sorted_attr_string (const char *str)
+{
+  char **args = NULL;
+  char *attr_str, *ret_str;
+  char *attr = NULL;
+  unsigned int argnum = 1;
+  unsigned int i;
+
+  for (i = 0; i < strlen (str); i++)
+    if (str[i] == ',')
+      argnum++;
+
+  attr_str = (char *)xmalloc (strlen (str) + 1);
+  strcpy (attr_str, str);
+
+  /* Replace '=' and '-' with "_"; ',' is replaced when joining below.  */
+  for (i = 0; i < strlen (attr_str); i++)
+    if (attr_str[i] == '=' || attr_str[i] == '-')
+      attr_str[i] = '_';
+
+  if (argnum == 1)
+    return attr_str;
+
+  args = XNEWVEC (char *, argnum);
+
+  i = 0;
+  attr = strtok (attr_str, ",");
+  while (attr != NULL)
+    {
+      args[i] = attr;
+      i++;
+      attr = strtok (NULL, ",");
+    }
+
+  qsort (args, argnum, sizeof (char*), attr_strcmp);
+
+  ret_str = (char *)xmalloc (strlen (str) + 1);
+  strcpy (ret_str, args[0]);
+  for (i = 1; i < argnum; i++)
+    {
+      strcat (ret_str, "_");
+      strcat (ret_str, args[i]);
+    }
+
+  free (args);
+  free (attr_str);
+  return ret_str;
+}
+
+/* This function changes the assembler name for functions that are
+   versions.  If DECL is a function version and has a "target"
+   attribute, it appends the attribute string to its assembler name.  */
+
+static tree
+ix86_mangle_function_version_assembler_name (tree decl, tree id)
+{
+  tree version_attr;
+  const char *orig_name, *version_string, *attr_str;
+  char *assembler_name;
+
+  if (DECL_DECLARED_INLINE_P (decl)
+      && lookup_attribute ("gnu_inline",
+			   DECL_ATTRIBUTES (decl)))
+    error_at (DECL_SOURCE_LOCATION (decl),
+	      "Function versions cannot be marked as gnu_inline,"
+	      " bodies have to be generated");
+
+  if (DECL_VIRTUAL_P (decl)
+      || DECL_VINDEX (decl))
+    error_at (DECL_SOURCE_LOCATION (decl),
+	      "Virtual function versioning not supported");
+
+  version_attr = lookup_attribute ("target", DECL_ATTRIBUTES (decl));
+
+  /* target attribute string is NULL for default functions.  */
+  if (version_attr == NULL_TREE)
+    return id;
+
+  orig_name = IDENTIFIER_POINTER (id);
+  version_string
+    = TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (version_attr)));
+
+  attr_str = sorted_attr_string (version_string);
+  assembler_name = (char *) xmalloc (strlen (orig_name)
+				     + strlen (attr_str) + 2);
+
+  sprintf (assembler_name, "%s.%s", orig_name, attr_str);
+
+  /* Allow assembler name to be modified if already set.  */
+  if (DECL_ASSEMBLER_NAME_SET_P (decl))
+    SET_DECL_RTL (decl, NULL);
+
+  return get_identifier (assembler_name);
+}
+
+static tree 
+ix86_mangle_decl_assembler_name (tree decl, tree id)
+{
+  /* For function version, add the target suffix to the assembler name.  */
+  if (TREE_CODE (decl) == FUNCTION_DECL
+      && DECL_FUNCTION_VERSIONED (decl))
+    return ix86_mangle_function_version_assembler_name (decl, id);
+
+  return id;
+}
+
+/* Return a new name by appending SUFFIX to the DECL name.  If make_unique
+   is true, append the full path name of the source file.  */
+
+static char *
+make_name (tree decl, const char *suffix, bool make_unique)
+{
+  char *global_var_name;
+  int name_len;
+  const char *name;
+  const char *unique_name = NULL;
+
+  name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
+
+  /* Get a unique name that can be used globally without any chance
+     of collision at link time.  */
+  if (make_unique)
+    unique_name = IDENTIFIER_POINTER (get_file_function_name ("\0"));
+
+  name_len = strlen (name) + strlen (suffix) + 2;
+
+  if (make_unique)
+    name_len += strlen (unique_name) + 1;
+  global_var_name = XNEWVEC (char, name_len);
+
+  /* Use '.' to concatenate names as it is demangler friendly.  */
+  if (make_unique)
+      snprintf (global_var_name, name_len, "%s.%s.%s", name,
+		unique_name, suffix);
+  else
+      snprintf (global_var_name, name_len, "%s.%s", name, suffix);
+
+  return global_var_name;
+}
+
+/* Make a dispatcher declaration for the multi-versioned function DECL.
+   Calls to DECL function will be replaced with calls to the dispatcher
+   by the front-end.  Return the decl created.  */
+
+static tree
+make_dispatcher_decl (const tree decl)
+{
+  tree func_decl;
+  char *func_name, *resolver_name;
+  tree fn_type, func_type;
+  bool is_uniq = false;
+
+  if (TREE_PUBLIC (decl) == 0)
+    is_uniq = true;
+
+  func_name = make_name (decl, "ifunc", is_uniq);
+  resolver_name = make_name (decl, "resolver", is_uniq);
+  gcc_assert (resolver_name);
+
+  fn_type = TREE_TYPE (decl);
+  func_type = build_function_type (TREE_TYPE (fn_type),
+				   TYPE_ARG_TYPES (fn_type));
+  
+  func_decl = build_fn_decl (func_name, func_type);
+  TREE_USED (func_decl) = 1;
+  DECL_CONTEXT (func_decl) = NULL_TREE;
+  DECL_INITIAL (func_decl) = error_mark_node;
+  DECL_ARTIFICIAL (func_decl) = 1;
+  /* Mark this func as external, the resolver will flip it again if
+     it gets generated.  */
+  DECL_EXTERNAL (func_decl) = 1;
+  /* This will be of type IFUNC; IFUNCs have to be externally visible.  */
+  TREE_PUBLIC (func_decl) = 1;
+
+  return func_decl;  
+}
+
+/* Returns true if DECL is multi-versioned and is the default function,
+   that is, it is not tagged with a target-specific optimization.  */
+
+static bool
+is_function_default_version (const tree decl)
+{
+  return (TREE_CODE (decl) == FUNCTION_DECL
+	  && DECL_FUNCTION_VERSIONED (decl)
+	  && DECL_FUNCTION_SPECIFIC_TARGET (decl) == NULL_TREE);
+}
+
+/* Make a dispatcher declaration for the multi-versioned function DECL.
+   Calls to DECL function will be replaced with calls to the dispatcher
+   by the front-end.  It also chains the cgraph nodes of all the
+   semantically identical versions in vector FN_VER_VEC_P.  Returns the
+   decl of the dispatcher function.  */
+
+static tree
+ix86_get_function_versions_dispatcher (void *fn_ver_vec_p)
+{
+  struct cgraph_node *node = NULL;
+  struct cgraph_node *default_node = NULL;
+  struct cgraph_node *dispatcher_node = NULL;
+
+  struct cgraph_function_version_info *default_version_info = NULL;
+  struct cgraph_function_version_info *dispatcher_version_info = NULL;
+  struct cgraph_function_version_info *node_version_info = NULL;
+
+  int ix;
+  tree ele;
+  tree dispatch_decl = NULL;
+  VEC (tree, heap) *fn_ver_vec = NULL;
+
+  fn_ver_vec = (VEC (tree,heap) *) fn_ver_vec_p;
+  gcc_assert (fn_ver_vec != NULL);
+
+  /* Find the default version.  */
+  for (ix = 0; VEC_iterate (tree, fn_ver_vec, ix, ele); ++ix)
+    {
+      if (is_function_default_version (ele))
+	{
+	  default_node = cgraph_get_create_node (ele);
+	  break;
+	}
+    }
+
+  /* If there is no default node, just return NULL.  */
+  if (!default_node)
+    return NULL;
+
+  default_version_info = get_cgraph_node_version (default_node);
+
+  /* If the dispatcher is already there, return it.  */
+  if (default_version_info && default_version_info->dispatcher_resolver)
+    return default_version_info->dispatcher_resolver;
+
+  if (default_version_info == NULL)
+    default_version_info = insert_new_cgraph_node_version (default_node);
+
+#if defined (ASM_OUTPUT_TYPE_DIRECTIVE) && HAVE_GNU_INDIRECT_FUNCTION
+  /* Right now, the dispatching is done via ifunc.  */
+  dispatch_decl = make_dispatcher_decl (default_node->symbol.decl); 
+#else
+  error_at (DECL_SOURCE_LOCATION (default_node->symbol.decl),
+	    "Multiversioning needs ifunc which is not supported "
+	    "in this configuration");
+#endif
+
+  default_version_info->dispatcher_resolver = dispatch_decl;
+  dispatcher_node = cgraph_get_create_node (dispatch_decl);
+  gcc_assert (dispatcher_node);
+  dispatcher_node->dispatcher_function = 1;
+  dispatcher_version_info = insert_new_cgraph_node_version (dispatcher_node);
+  cgraph_mark_address_taken_node (default_node);
+
+  for (ix = 0; VEC_iterate (tree, fn_ver_vec, ix, ele); ++ix)
+    {
+      node = cgraph_get_create_node (ele);
+      gcc_assert (node != NULL && DECL_FUNCTION_VERSIONED (ele));
+
+      if (node == default_node)
+	continue;
+
+      node_version_info = get_cgraph_node_version (node);
+      if (node_version_info == NULL)
+	node_version_info = insert_new_cgraph_node_version (node);
+
+      gcc_assert (DECL_FUNCTION_SPECIFIC_TARGET (ele) != NULL_TREE);
+
+      /* Chain all the cgraph_function_version_info nodes that are
+	 semantically identical.  */
+      if (dispatcher_version_info->next)
+	{
+	  node_version_info->next = dispatcher_version_info->next;
+	  dispatcher_version_info->next->prev = node_version_info;
+	}
+
+      dispatcher_version_info->next = node_version_info;
+      node_version_info->prev = dispatcher_version_info;
+      node_version_info->dispatcher_resolver = dispatch_decl;
+    }
+
+  /* The default version should be the first node.  */
+  default_version_info->next = dispatcher_version_info->next;
+  dispatcher_version_info->next->prev = default_version_info;
+
+  /* The dispatcher node should directly point to the default node.  */
+  dispatcher_version_info->next = default_version_info;
+  
+  return dispatch_decl;
+}
+
+/* Makes a function attribute of the form NAME(ARG_NAME) and chains
+   it to CHAIN.  */
+
+static tree
+make_attribute (const char *name, const char *arg_name, tree chain)
+{
+  tree attr_name;
+  tree attr_arg_name;
+  tree attr_args;
+  tree attr;
+
+  attr_name = get_identifier (name);
+  attr_arg_name = build_string (strlen (arg_name), arg_name);
+  attr_args = tree_cons (NULL_TREE, attr_arg_name, NULL_TREE);
+  attr = tree_cons (attr_name, attr_args, chain);
+  return attr;
+}
+
+/* Make the resolver function decl to dispatch the versions of
+   a multi-versioned function, DEFAULT_DECL.  Create an
+   empty basic block in the resolver and store the pointer in
+   EMPTY_BB.  Return the decl of the resolver function.  */
+
+static tree
+make_resolver_func (const tree default_decl,
+		    const tree dispatch_decl,
+		    basic_block *empty_bb)
+{
+  char *resolver_name;
+  tree decl, type, decl_name, t;
+  basic_block new_bb;
+  bool is_uniq = false;
+
+  /* IFUNC's have to be globally visible.  So, if the default_decl is
+     not, then the name of the IFUNC should be made unique.  */
+  if (TREE_PUBLIC (default_decl) == 0)
+    is_uniq = true;
+
+  /* Append the filename to the resolver function if the versions are
+     not externally visible.  This is because the resolver function has
+     to be externally visible for the loader to find it.  So, appending
+     the filename will prevent conflicts with a resolver function from
+     another module which is based on the same version name.  */
+  resolver_name = make_name (default_decl, "resolver", is_uniq);
+
+  /* The resolver function should return a (void *). */
+  type = build_function_type_list (ptr_type_node, NULL_TREE);
+
+  decl = build_fn_decl (resolver_name, type);
+  decl_name = get_identifier (resolver_name);
+  SET_DECL_ASSEMBLER_NAME (decl, decl_name);
+
+  DECL_NAME (decl) = decl_name;
+  TREE_USED (decl) = 1;
+  DECL_ARTIFICIAL (decl) = 1;
+  DECL_IGNORED_P (decl) = 0;
+  /* IFUNC resolvers have to be externally visible.  */
+  TREE_PUBLIC (decl) = 1;
+  DECL_UNINLINABLE (decl) = 1;
+
+  DECL_EXTERNAL (decl) = 0;
+  DECL_EXTERNAL (dispatch_decl) = 0;
+
+  DECL_CONTEXT (decl) = NULL_TREE;
+  DECL_INITIAL (decl) = make_node (BLOCK);
+  DECL_STATIC_CONSTRUCTOR (decl) = 0;
+  TREE_READONLY (decl) = 0;
+  DECL_PURE_P (decl) = 0;
+
+  if (DECL_COMDAT_GROUP (default_decl))
+    {
+      DECL_COMDAT (decl) = DECL_COMDAT (default_decl);
+      make_decl_one_only (decl, DECL_COMDAT_GROUP (default_decl));
+    }
+  else if (TREE_PUBLIC (default_decl))
+    {
+      /* In this case, each translation unit with a call to this
+	 versioned function will put out a resolver.  Ensure it
+	 is comdat to keep just one copy.  */
+      DECL_COMDAT (decl) = 1;
+      make_decl_one_only (decl, DECL_ASSEMBLER_NAME (decl));
+    }
+  /* Build result decl and add to function_decl. */
+  t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, ptr_type_node);
+  DECL_ARTIFICIAL (t) = 1;
+  DECL_IGNORED_P (t) = 1;
+  DECL_RESULT (decl) = t;
+
+  gimplify_function_tree (decl);
+  push_cfun (DECL_STRUCT_FUNCTION (decl));
+  gimple_register_cfg_hooks ();
+  init_empty_tree_cfg_for_function (DECL_STRUCT_FUNCTION (decl));
+  cfun->curr_properties |=
+    (PROP_gimple_lcf | PROP_gimple_leh | PROP_cfg | PROP_ssa
+     | PROP_gimple_any);
+  cfun->curr_properties = 15;
+  new_bb = create_empty_bb (ENTRY_BLOCK_PTR);
+  make_edge (ENTRY_BLOCK_PTR, new_bb, EDGE_FALLTHRU);
+  make_edge (new_bb, EXIT_BLOCK_PTR, 0);
+  *empty_bb = new_bb;
+
+  cgraph_add_new_function (decl, true);
+  cgraph_call_function_insertion_hooks (cgraph_get_create_node (decl));
+
+  if (DECL_COMDAT_GROUP (default_decl))
+    {
+      gcc_assert (cgraph_get_node (default_decl));
+      symtab_add_to_same_comdat_group (
+	(symtab_node) cgraph_get_node (decl),
+	(symtab_node) cgraph_get_node (default_decl));
+    }
+
+  pop_cfun ();
+
+  gcc_assert (dispatch_decl != NULL);
+  /* Mark dispatch_decl as "ifunc" with resolver as resolver_name.  */
+  DECL_ATTRIBUTES (dispatch_decl) 
+    = make_attribute ("ifunc", resolver_name, DECL_ATTRIBUTES (dispatch_decl));
+
+  /* Create the alias for dispatch to resolver here.  */
+  cgraph_create_function_alias (dispatch_decl, decl);
+  return decl;
+}
+
+/* Generate the dispatching code body to dispatch multi-versioned function
+   DECL.  The target hook is called to process the "target" attributes and
+   provide the code to dispatch the right function at run-time.  NODE_P points
+   to the cgraph node of the dispatcher decl whose body will be created.  */
+
+static tree 
+ix86_generate_version_dispatcher_body (void *node_p)
+{
+  tree resolver_decl;
+  basic_block empty_bb;
+  VEC (tree, heap) *fn_ver_vec = NULL;
+  tree default_ver_decl;
+  struct cgraph_node *versn;
+  struct cgraph_node *node;
+
+  struct cgraph_function_version_info *node_version_info = NULL;
+  struct cgraph_function_version_info *versn_info = NULL;
+
+  node = (cgraph_node *)node_p;
+
+  node_version_info = get_cgraph_node_version (node);
+  gcc_assert (node->dispatcher_function
+	      && node_version_info != NULL);
+
+  if (node_version_info->dispatcher_resolver)
+    return node_version_info->dispatcher_resolver;
+
+  /* The first version in the chain corresponds to the default version.  */
+  default_ver_decl = node_version_info->next->this_node->symbol.decl;
+
+  /* node is going to be an alias, so remove the finalized bit.  */
+  node->local.finalized = 0;
+
+  resolver_decl = make_resolver_func (default_ver_decl,
+				      node->symbol.decl, &empty_bb);
+  node_version_info->dispatcher_resolver = resolver_decl;
+
+  push_cfun (DECL_STRUCT_FUNCTION (resolver_decl));
+
+  fn_ver_vec = VEC_alloc (tree, heap, 2);
+
+  for (versn_info = node_version_info->next; versn_info;
+       versn_info = versn_info->next)
+    {
+      versn = versn_info->this_node;
+      /* Check for virtual functions here again, as by this time it should
+	 have been determined if this function needs a vtable index or
+	 not.  This happens for methods in derived classes that override
+	 virtual methods in base classes but are not explicitly marked as
+	 virtual.  */
+      if (DECL_VINDEX (versn->symbol.decl))
+        error_at (DECL_SOURCE_LOCATION (versn->symbol.decl),
+		  "Virtual function multiversioning not supported");
+      VEC_safe_push (tree, heap, fn_ver_vec, versn->symbol.decl);
+    }
+
+  dispatch_function_versions (resolver_decl, fn_ver_vec, &empty_bb);
+
+  rebuild_cgraph_edges (); 
+  pop_cfun ();
+  return resolver_decl;
+}
+
 /* This builds the processor_model struct type defined in
    libgcc/config/i386/cpuinfo.c  */
 
@@ -41005,6 +42002,9 @@ ix86_memmodel_check (unsigned HOST_WIDE_INT val)
 #undef TARGET_PROFILE_BEFORE_PROLOGUE
 #define TARGET_PROFILE_BEFORE_PROLOGUE ix86_profile_before_prologue
 
+#undef TARGET_MANGLE_DECL_ASSEMBLER_NAME
+#define TARGET_MANGLE_DECL_ASSEMBLER_NAME ix86_mangle_decl_assembler_name
+
 #undef TARGET_ASM_UNALIGNED_HI_OP
 #define TARGET_ASM_UNALIGNED_HI_OP TARGET_ASM_ALIGNED_HI_OP
 #undef TARGET_ASM_UNALIGNED_SI_OP
@@ -41098,6 +42098,17 @@ ix86_memmodel_check (unsigned HOST_WIDE_INT val)
 #undef TARGET_FOLD_BUILTIN
 #define TARGET_FOLD_BUILTIN ix86_fold_builtin
 
+#undef TARGET_COMPARE_VERSION_PRIORITY
+#define TARGET_COMPARE_VERSION_PRIORITY ix86_compare_version_priority
+
+#undef TARGET_GENERATE_VERSION_DISPATCHER_BODY
+#define TARGET_GENERATE_VERSION_DISPATCHER_BODY \
+  ix86_generate_version_dispatcher_body
+
+#undef TARGET_GET_FUNCTION_VERSIONS_DISPATCHER
+#define TARGET_GET_FUNCTION_VERSIONS_DISPATCHER \
+  ix86_get_function_versions_dispatcher
+
 #undef TARGET_ENUM_VA_LIST_P
 #define TARGET_ENUM_VA_LIST_P ix86_enum_va_list
 
@@ -41238,6 +42249,9 @@ ix86_memmodel_check (unsigned HOST_WIDE_INT val)
 #undef TARGET_OPTION_PRINT
 #define TARGET_OPTION_PRINT ix86_function_specific_print
 
+#undef TARGET_OPTION_FUNCTION_VERSIONS
+#define TARGET_OPTION_FUNCTION_VERSIONS ix86_function_versions
+
 #undef TARGET_CAN_INLINE_P
 #define TARGET_CAN_INLINE_P ix86_can_inline_p
 
Index: gcc/testsuite/g++.dg/mv1.C
===================================================================
--- gcc/testsuite/g++.dg/mv1.C	(revision 0)
+++ gcc/testsuite/g++.dg/mv1.C	(revision 0)
@@ -0,0 +1,130 @@
+/* Test case to check if Multiversioning works.  */
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-require-ifunc "" }  */
+/* { dg-options "-O2 -fPIC -mno-avx -mno-popcnt" } */
+
+#include <assert.h>
+
+/* Default version.  */
+int foo ();
+/* The other versions of foo.  Mix up the ordering and 
+   check if the dispatching does it in the order of priority. */
+/* Check combination of target attributes.  */
+int foo () __attribute__ ((target("arch=corei7,popcnt")));
+/* The target operands in this declaration and the definition are re-ordered.
+   This should still work.  */
+int foo () __attribute__ ((target("ssse3,avx2")));
+
+/* Check for all target attributes for which dispatchers are available.  */
+/* Check arch= */
+int foo () __attribute__((target("arch=core2")));
+int foo () __attribute__((target("arch=corei7")));
+int foo () __attribute__((target("arch=atom")));
+/* Check ISAs  */
+int foo () __attribute__((target("avx")));
+int foo () __attribute__ ((target("arch=core2,sse4.2")));
+/* Check more arch=.  */
+int foo () __attribute__((target("arch=amdfam10")));
+int foo () __attribute__((target("arch=bdver1")));
+int foo () __attribute__((target("arch=bdver2")));
+
+int (*p)() = &foo;
+int main ()
+{
+  int val = foo ();
+  assert (val ==  (*p)());
+
+  /* Check in the exact same order in which the dispatching
+     is expected to happen.  */
+  if (__builtin_cpu_is ("bdver1"))
+    assert (val == 1);
+  else if (__builtin_cpu_is ("bdver2"))
+    assert (val == 2);
+  else if (__builtin_cpu_supports ("avx2")
+	   && __builtin_cpu_supports ("ssse3"))
+    assert (val == 3);
+  else if (__builtin_cpu_supports ("avx"))
+    assert (val == 4);
+  else if (__builtin_cpu_is ("corei7")
+	   && __builtin_cpu_supports ("popcnt"))
+    assert (val == 5);
+  else if (__builtin_cpu_is ("corei7"))
+    assert (val == 6);
+  else if (__builtin_cpu_is ("amdfam10h"))
+    assert (val == 7);
+  else if (__builtin_cpu_is ("core2")
+	   && __builtin_cpu_supports ("sse4.2"))
+    assert (val == 8);
+  else if (__builtin_cpu_is ("core2"))
+    assert (val == 9);
+  else if (__builtin_cpu_is ("atom"))
+    assert (val == 10);
+  else
+    assert (val == 0);
+  
+  return 0;
+}
+
+int foo ()
+{
+  return 0;
+}
+
+int __attribute__ ((target("arch=corei7,popcnt")))
+foo ()
+{
+  return 5;
+}
+int __attribute__ ((target("avx2,ssse3")))
+foo ()
+{
+  return 3;
+}
+
+int __attribute__ ((target("arch=core2")))
+foo ()
+{
+  return 9;
+}
+
+int __attribute__ ((target("arch=corei7")))
+foo ()
+{
+  return 6;
+}
+
+int __attribute__ ((target("arch=atom")))
+foo ()
+{
+  return 10;
+}
+
+int __attribute__ ((target("avx")))
+foo ()
+{
+  return 4;
+}
+
+int __attribute__ ((target("arch=core2,sse4.2")))
+foo ()
+{
+  return 8;
+}
+
+int __attribute__ ((target("arch=amdfam10")))
+foo ()
+{
+  return 7;
+}
+
+int __attribute__ ((target("arch=bdver1")))
+foo ()
+{
+  return 1;
+}
+
+int __attribute__ ((target("arch=bdver2")))
+foo ()
+{
+  return 2;
+}
Index: gcc/testsuite/g++.dg/mv2.C
===================================================================
--- gcc/testsuite/g++.dg/mv2.C	(revision 0)
+++ gcc/testsuite/g++.dg/mv2.C	(revision 0)
@@ -0,0 +1,121 @@
+/* Test case to check if Multiversioning chooses the correct
+   dispatching order when versions are for various ISAs.  */
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-require-ifunc "" }  */
+/* { dg-options "-O2 -mno-sse -mno-mmx -mno-popcnt -mno-avx" } */
+
+#include <assert.h>
+#include <stdio.h>
+
+/* Default version.  */
+int foo ();
+/* The dispatch checks should be in the exact reverse order of the
+   declarations below.  */
+int foo () __attribute__ ((target ("mmx")));
+int foo () __attribute__ ((target ("sse")));
+int foo () __attribute__ ((target ("sse2")));
+int foo () __attribute__ ((target ("sse3")));
+int foo () __attribute__ ((target ("ssse3")));
+int foo () __attribute__ ((target ("sse4.1")));
+int foo () __attribute__ ((target ("sse4.2")));
+int foo () __attribute__ ((target ("popcnt")));
+int foo () __attribute__ ((target ("avx")));
+int foo () __attribute__ ((target ("avx2")));
+
+int main ()
+{
+
+  int val = foo ();
+  printf ("val = %d\n", val);
+
+  if (__builtin_cpu_supports ("avx2"))
+    assert (val == 1);
+  else if (__builtin_cpu_supports ("avx"))
+    assert (val == 2);
+  else if (__builtin_cpu_supports ("popcnt"))
+    assert (val == 3);
+  else if (__builtin_cpu_supports ("sse4.2"))
+    assert (val == 4);
+  else if (__builtin_cpu_supports ("sse4.1"))
+    assert (val == 5);
+  else if (__builtin_cpu_supports ("ssse3"))
+    assert (val == 6);
+  else if (__builtin_cpu_supports ("sse3"))
+    assert (val == 7);
+  else if (__builtin_cpu_supports ("sse2"))
+    assert (val == 8);
+  else if (__builtin_cpu_supports ("sse"))
+    assert (val == 9);
+  else if (__builtin_cpu_supports ("mmx"))
+    assert (val == 10);
+  else
+    assert (val == 0);
+
+  return 0;
+}
+
+int
+foo ()
+{
+  return 0;
+}
+
+int __attribute__ ((target("mmx")))
+foo ()
+{
+  return 10;
+}
+
+int __attribute__ ((target("sse")))
+foo ()
+{
+  return 9;
+}
+
+int __attribute__ ((target("sse2")))
+foo ()
+{
+  return 8;
+}
+
+int __attribute__ ((target("sse3")))
+foo ()
+{
+  return 7;
+}
+
+int __attribute__ ((target("ssse3")))
+foo ()
+{
+  return 6;
+}
+
+int __attribute__ ((target("sse4.1")))
+foo ()
+{
+  return 5;
+}
+
+int __attribute__ ((target("sse4.2")))
+foo ()
+{
+  return 4;
+}
+
+int __attribute__ ((target("popcnt")))
+foo ()
+{
+  return 3;
+}
+
+int __attribute__ ((target("avx")))
+foo ()
+{
+  return 2;
+}
+
+int __attribute__ ((target("avx2")))
+foo ()
+{
+  return 1;
+}
Index: gcc/testsuite/g++.dg/mv3.C
===================================================================
--- gcc/testsuite/g++.dg/mv3.C	(revision 0)
+++ gcc/testsuite/g++.dg/mv3.C	(revision 0)
@@ -0,0 +1,37 @@
+/* Test case to check if a call to a multiversioned function
+   is replaced with a direct call to the particular version when
+   the most specialized version's target attributes match the
+   caller.  
+  
+   In this program, foo is multiversioned but there is no default
+   function.  This is an error if the call has to go through a
+   dispatcher.  However, the call to foo in bar can be replaced
+   with a direct call to the popcnt version of foo.  Hence, this
+   test should pass.  */
+
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 -mno-sse -mno-popcnt" } */
+
+
+/* There is no default version; both versions of foo are specialized.  */
+int __attribute__ ((target ("sse")))
+foo ()
+{
+  return 1;
+}
+int __attribute__ ((target ("popcnt")))
+foo ()
+{
+  return 0;
+}
+
+int __attribute__ ((target ("popcnt")))
+bar ()
+{
+  return foo ();
+}
+
+int main ()
+{
+  return bar ();
+}
Index: gcc/testsuite/g++.dg/mv4.C
===================================================================
--- gcc/testsuite/g++.dg/mv4.C	(revision 0)
+++ gcc/testsuite/g++.dg/mv4.C	(revision 0)
@@ -0,0 +1,23 @@
+/* Test case to check if the compiler generates an error message
+   when the default version of a multiversioned function is absent
+   and its pointer is taken.  */
+
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 -mno-sse -mno-popcnt" } */
+
+int __attribute__ ((target ("sse")))
+foo ()
+{
+  return 1;
+}
+int __attribute__ ((target ("popcnt")))
+foo ()
+{
+  return 0;
+}
+
+int main ()
+{
+  int (*p)() = &foo; /* { dg-error "Pointer to a multiversioned function without a default is not allowed" {} } */
+  return (*p)();
+}
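
As an illustration of why mv1.C can declare one version with
target("ssse3,avx2") and define it with target("avx2,ssse3"): the suffix
appended to the assembler name is built from the sorted, normalized
attribute string.  The following is a minimal standalone sketch of that
normalization, not the code from the patch.  The names
normalize_target_string and cmp_str are hypothetical, and plain malloc is
used in place of GCC's xmalloc and XNEWVEC.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator over an array of C strings.  */
static int
cmp_str (const void *a, const void *b)
{
  return strcmp (*(char *const *) a, *(char *const *) b);
}

/* Hypothetical stand-in for sorted_attr_string: map '=' and '-' to '_',
   split on ',', sort the pieces and join them with '_'.  The caller
   frees the result.  */
char *
normalize_target_string (const char *str)
{
  size_t len = strlen (str);
  size_t count = 0, i;
  char *copy = (char *) malloc (len + 1);
  char *out = (char *) malloc (len + 1);
  char **args = (char **) malloc ((len + 1) * sizeof (char *));
  char *tok;

  if (copy == NULL || out == NULL || args == NULL)
    abort ();

  strcpy (copy, str);
  for (i = 0; i < len; i++)
    if (copy[i] == '=' || copy[i] == '-')
      copy[i] = '_';

  /* strtok skips empty fields, so count the tokens actually returned
     instead of counting commas.  */
  for (tok = strtok (copy, ","); tok != NULL; tok = strtok (NULL, ","))
    args[count++] = tok;

  qsort (args, count, sizeof (char *), cmp_str);

  /* The joined result is never longer than the input.  */
  out[0] = '\0';
  for (i = 0; i < count; i++)
    {
      if (i > 0)
        strcat (out, "_");
      strcat (out, args[i]);
    }

  free (args);
  free (copy);
  return out;
}
```

With this sketch, both "ssse3,avx2" and "avx2,ssse3" normalize to
"avx2_ssse3", so either spelling yields the same suffix for a function
(for example foo.avx2_ssse3, ignoring any C++ mangling of foo itself).
Unlike the patch, the sketch counts tokens as strtok returns them, so an
input with an empty field such as "a,,b" does not leave an unset array
slot for qsort to read.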

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-10-28  4:31                                                                                   ` Sriraman Tallam
@ 2012-10-29 13:05                                                                                     ` Jan Hubicka
  2012-10-29 17:56                                                                                       ` Sriraman Tallam
  2012-10-30 19:18                                                                                     ` Jason Merrill
  1 sibling, 1 reply; 93+ messages in thread
From: Jan Hubicka @ 2012-10-29 13:05 UTC (permalink / raw)
  To: Sriraman Tallam
  Cc: Jan Hubicka, Diego Novillo, Jason Merrill, Jan Hubicka,
	Xinliang David Li, Mark Mitchell, Nathan Sidwell, H.J. Lu,
	Richard Guenther, Uros Bizjak, reply, GCC Patches

> Index: gcc/cgraph.c
> ===================================================================
> --- gcc/cgraph.c	(revision 192623)
> +++ gcc/cgraph.c	(working copy)
> @@ -132,6 +132,74 @@ static GTY(()) struct cgraph_edge *free_edges;
>  /* Did procss_same_body_aliases run?  */
>  bool same_body_aliases_done;
>  
> +/* Map a cgraph_node to cgraph_function_version_info using this htab.
> +   The cgraph_function_version_info has a THIS_NODE field that is the
> +   corresponding cgraph_node..  */
> +htab_t GTY((param_is (struct cgraph_function_version_info *)))
> +  cgraph_fnver_htab = NULL;

I think you want to declare the htab static and arrange for it to be freed
after cgraph construction, so you don't need to take care of nodes being
removed via the hooks.

OK with this change.

I have a few other comments:
> +  /* IFUNC resolvers have to be externally visible.  */
> +  TREE_PUBLIC (decl) = 1;
> +  DECL_UNINLINABLE (decl) = 1;

Why can the resolvers not be inlined?
> +
> +  DECL_EXTERNAL (decl) = 0;
> +  DECL_EXTERNAL (dispatch_decl) = 0;
> +
> +  DECL_CONTEXT (decl) = NULL_TREE;
> +  DECL_INITIAL (decl) = make_node (BLOCK);
> +  DECL_STATIC_CONSTRUCTOR (decl) = 0;
> +  TREE_READONLY (decl) = 0;
> +  DECL_PURE_P (decl) = 0;

I think those can be copied from the functions you are resolving (as well
as many attributes and properties).
> +
> +  if (DECL_COMDAT_GROUP (default_decl))
> +    {
> +      DECL_COMDAT (decl) = DECL_COMDAT (default_decl);
> +      make_decl_one_only (decl, DECL_COMDAT_GROUP (default_decl));
> +    }
> +  else if (TREE_PUBLIC (default_decl))
> +    {
> +      /* In this case, each translation unit with a call to this
> +	 versioned function will put out a resolver.  Ensure it
> +	 is comdat to keep just one copy.  */
> +      DECL_COMDAT (decl) = 1;
> +      make_decl_one_only (decl, DECL_ASSEMBLER_NAME (decl));
> +    }
> +  /* Build result decl and add to function_decl. */
> +  t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, ptr_type_node);
> +  DECL_ARTIFICIAL (t) = 1;
> +  DECL_IGNORED_P (t) = 1;
> +  DECL_RESULT (decl) = t;
> +
> +  gimplify_function_tree (decl);
> +  push_cfun (DECL_STRUCT_FUNCTION (decl));
> +  gimple_register_cfg_hooks ();
> +  init_empty_tree_cfg_for_function (DECL_STRUCT_FUNCTION (decl));
> +  cfun->curr_properties |=
> +    (PROP_gimple_lcf | PROP_gimple_leh | PROP_cfg | PROP_ssa
> +     | PROP_gimple_any);
> +  cfun->curr_properties = 15;
> +  new_bb = create_empty_bb (ENTRY_BLOCK_PTR);
> +  make_edge (ENTRY_BLOCK_PTR, new_bb, EDGE_FALLTHRU);
> +  make_edge (new_bb, EXIT_BLOCK_PTR, 0);
> +  *empty_bb = new_bb;

You can simplify this by using init_lowered_empty_function.

Honza

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-10-29 13:05                                                                                     ` Jan Hubicka
@ 2012-10-29 17:56                                                                                       ` Sriraman Tallam
  0 siblings, 0 replies; 93+ messages in thread
From: Sriraman Tallam @ 2012-10-29 17:56 UTC (permalink / raw)
  To: Jan Hubicka
  Cc: Diego Novillo, Jason Merrill, Jan Hubicka, Xinliang David Li,
	Mark Mitchell, Nathan Sidwell, H.J. Lu, Richard Guenther,
	Uros Bizjak, reply, GCC Patches

On Mon, Oct 29, 2012 at 5:55 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> Index: gcc/cgraph.c
>> ===================================================================
>> --- gcc/cgraph.c      (revision 192623)
>> +++ gcc/cgraph.c      (working copy)
>> @@ -132,6 +132,74 @@ static GTY(()) struct cgraph_edge *free_edges;
>>  /* Did procss_same_body_aliases run?  */
>>  bool same_body_aliases_done;
>>
>> +/* Map a cgraph_node to cgraph_function_version_info using this htab.
>> +   The cgraph_function_version_info has a THIS_NODE field that is the
>> +   corresponding cgraph_node..  */
>> +htab_t GTY((param_is (struct cgraph_function_version_info *)))
>> +  cgraph_fnver_htab = NULL;
>
> I think you want declare the htab static and arrange it to be freed after
> cgraph construction, so you don't need to take care of nodes being removed
> via the hooks.

I will declare the htab static, but I want to keep this htab around for
later optimizations such as dispatch hoisting. Please see
http://gcc.gnu.org/ml/gcc-patches/2011-04/msg02285.html for a
description of the optimization.  IFUNC-based dispatch blocks inlining
of multi-versioned functions, and dispatch hoisting will help with
this.
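
A minimal sketch of the idea, assuming dispatch hoisting means moving the
version check out of hot code so that each path makes direct, inlinable
calls.  Nothing here is GCC-generated code: all function names are
hypothetical, cpu_has_popcnt is a stub rather than a real CPUID test, and
the function pointer only emulates how an IFUNC symbol behaves after it
has been resolved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a CPUID-based feature test.  */
static int
cpu_has_popcnt (void)
{
  return 0;
}

/* Default version: count set bits one at a time.  */
static int
bits_default (unsigned x)
{
  int n = 0;
  for (; x != 0; x >>= 1)
    n += (int) (x & 1u);
  return n;
}

/* Stand-in for a popcnt-specialized version; same result by design.  */
static int
bits_popcnt (unsigned x)
{
  int n = 0;
  for (; x != 0; x &= x - 1)
    n++;
  return n;
}

typedef int (*bits_fn) (unsigned);

/* Emulates an IFUNC-style symbol: resolved once, then always called
   indirectly, which is what keeps the callee from being inlined.  */
static bits_fn resolved_bits;

static bits_fn
resolve_bits (void)
{
  return cpu_has_popcnt () ? bits_popcnt : bits_default;
}

/* Indirect dispatch: every iteration calls through the pointer.  */
long
total_bits_indirect (const unsigned *v, size_t n)
{
  long total = 0;
  size_t i;
  if (resolved_bits == NULL)
    resolved_bits = resolve_bits ();
  for (i = 0; i < n; i++)
    total += resolved_bits (v[i]);
  return total;
}

/* Hoisted dispatch: one check up front, then a cloned loop per version
   with a direct call that the compiler is free to inline.  */
long
total_bits_hoisted (const unsigned *v, size_t n)
{
  long total = 0;
  size_t i;
  if (cpu_has_popcnt ())
    for (i = 0; i < n; i++)
      total += bits_popcnt (v[i]);
  else
    for (i = 0; i < n; i++)
      total += bits_default (v[i]);
  return total;
}
```

Both functions compute the same sum; the difference is only in where the
version decision is made and whether the call inside the loop is direct.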

I will make the other changes asap.

Thanks,
-Sri.


>
> OK with this change.
>
> I have few other comments:
>> +  /* IFUNC resolvers have to be externally visible.  */
>> +  TREE_PUBLIC (decl) = 1;
>> +  DECL_UNINLINABLE (decl) = 1;
>
> Why the resolvers can not be inlined?
>> +
>> +  DECL_EXTERNAL (decl) = 0;
>> +  DECL_EXTERNAL (dispatch_decl) = 0;
>> +
>> +  DECL_CONTEXT (decl) = NULL_TREE;
>> +  DECL_INITIAL (decl) = make_node (BLOCK);
>> +  DECL_STATIC_CONSTRUCTOR (decl) = 0;
>> +  TREE_READONLY (decl) = 0;
>> +  DECL_PURE_P (decl) = 0;
>
> I think those can be copied from the functions you are resolving. (well as well
> as many attributes and properties)
>> +
>> +  if (DECL_COMDAT_GROUP (default_decl))
>> +    {
>> +      DECL_COMDAT (decl) = DECL_COMDAT (default_decl);
>> +      make_decl_one_only (decl, DECL_COMDAT_GROUP (default_decl));
>> +    }
>> +  else if (TREE_PUBLIC (default_decl))
>> +    {
>> +      /* In this case, each translation unit with a call to this
>> +      versioned function will put out a resolver.  Ensure it
>> +      is comdat to keep just one copy.  */
>> +      DECL_COMDAT (decl) = 1;
>> +      make_decl_one_only (decl, DECL_ASSEMBLER_NAME (decl));
>> +    }
>> +  /* Build result decl and add to function_decl. */
>> +  t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, ptr_type_node);
>> +  DECL_ARTIFICIAL (t) = 1;
>> +  DECL_IGNORED_P (t) = 1;
>> +  DECL_RESULT (decl) = t;
>> +
>> +  gimplify_function_tree (decl);
>> +  push_cfun (DECL_STRUCT_FUNCTION (decl));
>> +  gimple_register_cfg_hooks ();
>> +  init_empty_tree_cfg_for_function (DECL_STRUCT_FUNCTION (decl));
>> +  cfun->curr_properties |=
>> +    (PROP_gimple_lcf | PROP_gimple_leh | PROP_cfg | PROP_ssa
>> +     | PROP_gimple_any);
>> +  cfun->curr_properties = 15;
>> +  new_bb = create_empty_bb (ENTRY_BLOCK_PTR);
>> +  make_edge (ENTRY_BLOCK_PTR, new_bb, EDGE_FALLTHRU);
>> +  make_edge (new_bb, EXIT_BLOCK_PTR, 0);
>> +  *empty_bb = new_bb;
>
> You can simplify this by init_lowered_empty_function.
>
> Honza

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-10-28  4:31                                                                                   ` Sriraman Tallam
  2012-10-29 13:05                                                                                     ` Jan Hubicka
@ 2012-10-30 19:18                                                                                     ` Jason Merrill
  2012-10-31  0:58                                                                                       ` Sriraman Tallam
       [not found]                                                                                       ` <CAAs8Hmw09giv-5_v0irhByTjTJV=kD58rCAD2SAz7M8zrwjBOA@mail.gmail.com>
  1 sibling, 2 replies; 93+ messages in thread
From: Jason Merrill @ 2012-10-30 19:18 UTC (permalink / raw)
  To: Sriraman Tallam
  Cc: Jan Hubicka, Diego Novillo, Jan Hubicka, Xinliang David Li,
	Mark Mitchell, Nathan Sidwell, H.J. Lu, Richard Guenther,
	Uros Bizjak, reply, GCC Patches

On 10/27/2012 09:16 PM, Sriraman Tallam wrote:
> +	  /* See if there's a match.   For functions that are multi-versioned,
> +	     all the versions match.  */
>   	  if (same_type_p (target_fn_type, static_fn_type (fn)))
> -	    matches = tree_cons (fn, NULL_TREE, matches);
> +	    {
> +	      matches = tree_cons (fn, NULL_TREE, matches);
> +	      /*If versioned, push all possible versions into a vector.  */
> +	      if (DECL_FUNCTION_VERSIONED (fn))
> +		{
> +		  if (fn_ver_vec == NULL)
> +		   fn_ver_vec = VEC_alloc (tree, heap, 2);
> +		  VEC_safe_push (tree, heap, fn_ver_vec, fn);
> +		}
> +	    }

Why do we need to keep both a list and vector of the matches?

> +	 Call decls_match to make sure they are different because they are
> +	 versioned.  */
> +      if (DECL_FUNCTION_VERSIONED (fn))
> +	{
> +          for (match = TREE_CHAIN (matches); match; match = TREE_CHAIN (match))
> +  	    if (decls_match (fn, TREE_PURPOSE (match)))
> +	      break;
> +	}

What if you have multiple matches that aren't all versions of the same 
function?

Why would it be a problem to have two separate declarations of the same 
function?

> +      dispatcher_decl = targetm.get_function_versions_dispatcher (fn_ver_vec);

Is the idea here that if you have some versions declared, then a call, 
then more versions declared, then another call, you will call two 
different dispatchers, where the first one will only dispatch to the 
versions declared before the first call?  If not, why do we care about 
the set of declarations at this point?

> +      /* Mark this functio to be output.  */
> +      node->local.finalized = 1;

Missing 'n' in "function".

> @@ -14227,7 +14260,11 @@ cxx_comdat_group (tree decl)
>  	  else
>  	    break;
>  	}
> -      name = DECL_ASSEMBLER_NAME (decl);
> +      if (TREE_CODE (decl) == FUNCTION_DECL
> +	     && DECL_FUNCTION_VERSIONED (decl))
> +	   name = DECL_NAME (decl);

This would mean that f in the global namespace and f in namespace foo 
would end up in the same comdat group.  Why do we need special handling 
here at all?

>  dump_function_name (tree t, int flags)
>  {
> -  tree name = DECL_NAME (t);
> +  tree name;
>
> +  /* For function versions, use the assembler name as the decl name is
> +     the same for all versions.  */
> +  if (TREE_CODE (t) == FUNCTION_DECL
> +      && DECL_FUNCTION_VERSIONED (t))
> +    name = DECL_ASSEMBLER_NAME (t);

This shouldn't be necessary; we should print the target attribute when 
printing the function declaration.

> +	 Also, mark this function as needed if it is marked inline but
> +	 is a multi-versioned function.  */
> +      if (((flag_keep_inline_functions
> +	    || DECL_FUNCTION_VERSIONED (fn))

This should be marked as needed by the code that builds the dispatcher.

> +  /* For calls to a multi-versioned function, overload resolution
> +     returns the function with the highest target priority, that is,
> +     the version that will checked for dispatching first.  If this
> +     version is inlinable, a direct call to this version can be made
> +     otherwise the call should go through the dispatcher.  */

I'm a bit confused why people would want both dispatched calls and 
non-dispatched inlining; I would expect that if a function can be 
compiled differently enough on newer hardware to make versioning 
worthwhile, that would be a larger difference than the call overhead.

> +  if (DECL_FUNCTION_VERSIONED (fn)
> +      && !targetm.target_option.can_inline_p (current_function_decl, fn))
> +    {
> +      struct cgraph_node *dispatcher_node = NULL;
> +      fn = get_function_version_dispatcher (fn);
> +      if (fn == NULL)
> +	return NULL;
> +      dispatcher_node = cgraph_get_create_node (fn);
> +      gcc_assert (dispatcher_node != NULL);
> +      /* Mark this function to be output.  */
> +      dispatcher_node->local.finalized = 1;
> +    }

Why do you need to mark this here?  If you generate a call to the 
dispatcher, cgraph should mark it to be output automatically.

> +  /* For candidates of a multi-versioned function,  make the version with
> +     the highest priority win.  This version will be checked for dispatching
> +     first.  If this version can be inlined into the caller, the front-end
> +     will simply make a direct call to this function.  */

This is still too high in joust.  I believe I said before that this code 
should come just above

    /* If the two function declarations represent the same function (this can
       happen with declarations in multiple scopes and arg-dependent lookup),
       arbitrarily choose one.  But first make sure the default args we're
       using match.  */

> +  /* For multiversioned functions, aggregate all the versions here for
> +     generating the dispatcher body later if necessary.  Check to see if
> +     the dispatcher is already generated to avoid doing this more than
> +     once.  */

This caching seems to assume that you'll always be considering the same 
group of declarations, which goes back to my earlier question.

Jason

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-10-30 19:18                                                                                     ` Jason Merrill
@ 2012-10-31  0:58                                                                                       ` Sriraman Tallam
       [not found]                                                                                       ` <CAAs8Hmw09giv-5_v0irhByTjTJV=kD58rCAD2SAz7M8zrwjBOA@mail.gmail.com>
  1 sibling, 0 replies; 93+ messages in thread
From: Sriraman Tallam @ 2012-10-31  0:58 UTC (permalink / raw)
  To: Jason Merrill
  Cc: Jan Hubicka, Diego Novillo, Jan Hubicka, Xinliang David Li,
	Mark Mitchell, Nathan Sidwell, H.J. Lu, Richard Guenther,
	Uros Bizjak, reply, GCC Patches

On Tue, Oct 30, 2012 at 12:10 PM, Jason Merrill <jason@redhat.com> wrote:
> On 10/27/2012 09:16 PM, Sriraman Tallam wrote:
>>
>> +         /* See if there's a match.   For functions that are
>> multi-versioned,
>> +            all the versions match.  */
>>           if (same_type_p (target_fn_type, static_fn_type (fn)))
>> -           matches = tree_cons (fn, NULL_TREE, matches);
>> +           {
>> +             matches = tree_cons (fn, NULL_TREE, matches);
>> +             /*If versioned, push all possible versions into a vector.
>> */
>> +             if (DECL_FUNCTION_VERSIONED (fn))
>> +               {
>> +                 if (fn_ver_vec == NULL)
>> +                  fn_ver_vec = VEC_alloc (tree, heap, 2);
>> +                 VEC_safe_push (tree, heap, fn_ver_vec, fn);
>> +               }
>> +           }
>
>
> Why do we need to keep both a list and vector of the matches?

Right, but we later call the target hook
get_function_versions_dispatcher which takes a vector. I could change
that to accept a list instead if that is preferable?

>
>> +        Call decls_match to make sure they are different because they are
>> +        versioned.  */
>> +      if (DECL_FUNCTION_VERSIONED (fn))
>> +       {
>> +          for (match = TREE_CHAIN (matches); match; match = TREE_CHAIN
>> (match))
>> +           if (decls_match (fn, TREE_PURPOSE (match)))
>> +             break;
>> +       }
>
>
> What if you have multiple matches that aren't all versions of the same
> function?

Right, I should really check whether they are versions by comparing params
too. I fixed this in joust but missed it here. I will change this so that
any match with a function that does not belong to the semantically
identical group of function versions is caught and the ambiguity is
flagged.

>
> Why would it be a problem to have two separate declarations of the same
> function?

AFAIU, this should not be a problem. For duplicate declarations,
duplicate_decls should merge them and they should never be seen here.
Did I miss something?

>
>> +      dispatcher_decl = targetm.get_function_versions_dispatcher
>> (fn_ver_vec);
>
>
> Is the idea here that if you have some versions declared, then a call, then
> more versions declared, then another call, you will call two different
> dispatchers,

No, I thought about this but I did not want to handle this case in
this iteration. The dispatcher is created only once and if more
functions are declared later, they will not be dispatched, at least in
this iteration.

> where the first one will only dispatch to the versions declared
> before the first call?  If not, why do we care about the set of declarations
> at this point?

I am taking the address of a multi-versioned function here. The
front-end is returning the address of the dispatcher decl instead.
Since I am building the dispatcher here, why not construct the cgraph
data structures for these versions too?  That is why I aggregate all
the declarations here.

>
>> +      /* Mark this functio to be output.  */
>> +      node->local.finalized = 1;
>
>
> Missing 'n' in "function".
>
>> @@ -14227,7 +14260,11 @@ cxx_comdat_group (tree decl)
>>           else
>>             break;
>>         }
>> -      name = DECL_ASSEMBLER_NAME (decl);
>> +      if (TREE_CODE (decl) == FUNCTION_DECL
>> +            && DECL_FUNCTION_VERSIONED (decl))
>> +          name = DECL_NAME (decl);
>
>
> This would mean that f in the global namespace and f in namespace foo would
> end up in the same comdat group.  Why do we need special handling here at
> all?


Right, we do not need special handling. It is ok for each function
version to be in its own comdat group, I will remove this.

>
>>  dump_function_name (tree t, int flags)
>>  {
>> -  tree name = DECL_NAME (t);
>> +  tree name;
>>
>> +  /* For function versions, use the assembler name as the decl name is
>> +     the same for all versions.  */
>> +  if (TREE_CODE (t) == FUNCTION_DECL
>> +      && DECL_FUNCTION_VERSIONED (t))
>> +    name = DECL_ASSEMBLER_NAME (t);
>
>
> This shouldn't be necessary; we should print the target attribute when
> printing the function declaration.

Ok.

>
>> +        Also, mark this function as needed if it is marked inline but
>> +        is a multi-versioned function.  */
>> +      if (((flag_keep_inline_functions
>> +           || DECL_FUNCTION_VERSIONED (fn))
>
>
> This should be marked as needed by the code that builds the dispatcher.

I had some trouble previously figuring out where to mark this as
needed. I will fix it.

>
>> +  /* For calls to a multi-versioned function, overload resolution
>> +     returns the function with the highest target priority, that is,
>> +     the version that will checked for dispatching first.  If this
>> +     version is inlinable, a direct call to this version can be made
>> +     otherwise the call should go through the dispatcher.  */
>
>
> I'm a bit confused why people would want both dispatched calls and
> non-dispatched inlining; I would expect that if a function can be compiled
> differently enough on newer hardware to make versioning worthwhile, that
> would be a larger difference than the call overhead.

Simple example:

int
foo ()
{
  return 1;
}
int __attribute__ ((target ("popcnt")))
foo ()
{
  return 0;
}

int __attribute__ ((target ("popcnt")))
bar ()
{
  return foo ();
}

Here, the call to foo () from bar () will be turned into a direct call
to the popcnt version.

If bar is executed, then popcnt is known to be supported, so a call to foo
from bar would reach the popcnt version even if it went through the
dispatcher, and this is known at compile time. So, why not make a direct
call?  I am only making direct calls to versions when I am sure the
dispatcher would do the same.

>
>> +  if (DECL_FUNCTION_VERSIONED (fn)
>> +      && !targetm.target_option.can_inline_p (current_function_decl, fn))
>> +    {
>> +      struct cgraph_node *dispatcher_node = NULL;
>> +      fn = get_function_version_dispatcher (fn);
>> +      if (fn == NULL)
>> +       return NULL;
>> +      dispatcher_node = cgraph_get_create_node (fn);
>> +      gcc_assert (dispatcher_node != NULL);
>> +      /* Mark this function to be output.  */
>> +      dispatcher_node->local.finalized = 1;
>> +    }
>
>
> Why do you need to mark this here?  If you generate a call to the
> dispatcher, cgraph should mark it to be output automatically.

dispatcher_node does not have a body until it is generated in
cgraphunit.c, so cgraph does not mark this field before the node is
processed in cgraph_analyze_function.

>
>> +  /* For candidates of a multi-versioned function,  make the version with
>> +     the highest priority win.  This version will be checked for
>> dispatching
>> +     first.  If this version can be inlined into the caller, the
>> front-end
>> +     will simply make a direct call to this function.  */
>
>
> This is still too high in joust.  I believe I said before that this code
> should come just above
>
>    /* If the two function declarations represent the same function (this can
>       happen with declarations in multiple scopes and arg-dependent lookup),
>       arbitrarily choose one.  But first make sure the default args we're
>       using match.  */

Yes, I missed this the last time around. Will fix it this time.

>
>> +  /* For multiversioned functions, aggregate all the versions here for
>> +     generating the dispatcher body later if necessary.  Check to see if
>> +     the dispatcher is already generated to avoid doing this more than
>> +     once.  */
>
>
> This caching seems to assume that you'll always be considering the same
> group of declarations, which goes back to my earlier question.


Yes, for now I want to be only considering the same group of
declarations.  I am assuming that all declarations/definitions of all
versions of foo are seen before the first call to foo. I do not want
multiple dispatcher support complexity in this iteration. Is it ok to
delay this to the next patch iteration?

Your earlier question on this was:

 "This seems to assume that all the functions in the list of
candidates are versioned, but there might be unrelated functions from
different namespaces.  Also, doing this every time someone calls a
versioned function seems like the wrong place; I would think it would
be better to build up a list of versions as you see declarations, and
then use that list to define the dispatcher at EOF if it's needed."

I have fixed the problem of unrelated functions by always checking the
type (same_type_p) and params (comp_params) in
get_function_version_dispatcher. You talked about doing the dispatcher
building later, but I did it here since I am doing it only once.


Thanks,
-Sri.


>
> Jason
>

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
       [not found]                                                                                       ` <CAAs8Hmw09giv-5_v0irhByTjTJV=kD58rCAD2SAz7M8zrwjBOA@mail.gmail.com>
@ 2012-10-31 14:27                                                                                         ` Jason Merrill
  2012-11-02  2:53                                                                                           ` Sriraman Tallam
  0 siblings, 1 reply; 93+ messages in thread
From: Jason Merrill @ 2012-10-31 14:27 UTC (permalink / raw)
  To: Sriraman Tallam; +Cc: gcc-patches List, Jan Hubicka, Diego Novillo

On 10/30/2012 05:49 PM, Sriraman Tallam wrote:
> AFAIU, this should not be a problem. For duplicate declarations,
> duplicate_decls should merge them and they should never be seen here.
> Did I miss something?

With extern "C" functions you can have multiple declarations of the same 
function in different namespaces that are not duplicates, but still 
match.  And I can't think what that test is supposed to be catching, anyway.

> No, I thought about this but I did not want to handle this case in
> this iteration. The dispatcher is created only once and if more
> functions are declared later, they will not be dispatched, at least in
> this iteration.

I still think that instead of collecting the set of functions in 
overload resolution, they should be collected at declaration time and 
added to a vector in the cgraph information for use when generating the 
body of the dispatcher.

> You talked about doing the dispatcher
> building later, but I did it here since I am doing it only once.

I still don't think this is the right place for it.

> dispatcher_node does not have a body  until it is generated in
> cgraphunit.c, so cgraph does not mark this field before this is
> processed in cgraph_analyze_function.

That seems like something to address in your cgraph changes.

Jason

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-10-31 14:27                                                                                         ` Jason Merrill
@ 2012-11-02  2:53                                                                                           ` Sriraman Tallam
  2012-11-06  2:38                                                                                             ` Sriraman Tallam
  0 siblings, 1 reply; 93+ messages in thread
From: Sriraman Tallam @ 2012-11-02  2:53 UTC (permalink / raw)
  To: Jason Merrill; +Cc: gcc-patches List, Jan Hubicka, Diego Novillo

[-- Attachment #1: Type: text/plain, Size: 1905 bytes --]

Hi Jason,

   I have made all the changes you mentioned and attached the new
patch.  Summary of the important things changed:

* The versions are collected at declaration time itself now.
* extern "C" functions are disallowed from being versions for now.
extern "C" functions have to be handled exactly as the C front-end
would handle versioned functions. I will do this when I get to the C
front-end.
* Finalizing cgraph nodes is removed from front-end code.


Thanks,
-Sri.


On Wed, Oct 31, 2012 at 7:02 AM, Jason Merrill <jason@redhat.com> wrote:
> On 10/30/2012 05:49 PM, Sriraman Tallam wrote:
>>
>> AFAIU, this should not be a problem. For duplicate declarations,
>> duplicate_decls should merge them and they should never be seen here.
>> Did I miss something?
>
>
> With extern "C" functions you can have multiple declarations of the same
> function in different namespaces that are not duplicates, but still match.
> And I can't think what that test is supposed to be catching, anyway.
>
>
>> No, I thought about this but I did not want to handle this case in
>> this iteration. The dispatcher is created only once and if more
>> functions are declared later, they will not be dispatched, at least in
>> this iteration.
>
>
> I still think that instead of collecting the set of functions in overload
> resolution, they should be collected at declaration time and added to a
> vector in the cgraph information for use when generating the body of the
> dispatcher.
>
>
>> You talked about doing the dispatcher
>> building later, but I did it here since I am doing it only once.
>
>
> I still don't think this is the right place for it.
>
>
>> dispatcher_node does not have a body  until it is generated in
>> cgraphunit.c, so cgraph does not mark this field before this is
>> processed in cgraph_analyze_function.
>
>
> That seems like something to address in your cgraph changes.
>
> Jason
>

[-- Attachment #2: mv_fe_patch_11012012.txt --]
[-- Type: text/plain, Size: 81690 bytes --]

Overview of the patch which adds support to specify function versions.  This is
only enabled for target i386.

Example:

int foo ();  /* Default version */
int foo () __attribute__ ((target("avx,popcnt")));/*Specialized for avx and popcnt */
int foo () __attribute__ ((target("arch=core2,ssse3")));/*Specialized for core2 and ssse3*/

int main ()
{
 int (*p)() = &foo;
 return foo () + (*p)();
}

int foo ()
{
 return 0;
}

int __attribute__ ((target("avx,popcnt")))
foo ()
{
 return 0;
}

int __attribute__ ((target("arch=core2,ssse3")))
foo ()
{
 return 0;
}

The above example has foo defined 3 times, but all 3 definitions of foo are
different versions of the same function. The call to foo in main, directly and
via a pointer, are calls to the multi-versioned function foo which is dispatched
to the right foo at run-time.

Front-end changes:

The front-end changes are calls, at appropriate places, to target hooks that
do the following:

* Determine if two function decls with the same signature are versions.
* Determine the assembler name of a function version.
* Generate the dispatcher function for a set of function versions.
* Compare versions to see if one has a higher priority over the other.

All the implementation happens in the target-specific config/i386/i386.c.

What does the patch do?

* Tracking decls that correspond to function versions of a function
name, say "foo":

When the front-end sees more than one decl for "foo", it calls a target hook to
determine if they are versions. To prevent duplicate definition errors with other
versions of "foo", the "decls_match" function in cp/decl.c is made to return false
when two decls are deemed versions by the target. This causes all function
versions of "foo" to be added to the overload list of "foo".

* Change the assembler names of the function versions.

For i386, the target changes the assembler names of the function versions by
appending the sorted list of "target" arguments to the function name of "foo". For
example, the assembler name of "void foo () __attribute__ ((target ("sse4")))" will
become _Z3foov.sse4.  The target hook mangle_decl_assembler_name is used for this.
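
As a rough illustration of the naming step described above, here is a minimal
stand-alone sketch.  It is not the code in config/i386/i386.c: the helper names
are hypothetical, and joining several arguments with "_" is an assumption made
only for this example.  The "." before the suffix follows the _Z3foov.sse4
example.

```cpp
#include <algorithm>
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split a target attribute string such as
// "popcnt, avx" on commas and strip surrounding blanks.
static std::vector<std::string> split_target_args(const std::string &attr)
{
  std::vector<std::string> args;
  std::stringstream in(attr);
  std::string item;
  while (std::getline(in, item, ','))
    {
      const auto first = item.find_first_not_of(" \t");
      if (first == std::string::npos)
        continue;  // Skip empty entries.
      const auto last = item.find_last_not_of(" \t");
      args.push_back(item.substr(first, last - first + 1));
    }
  return args;
}

// Append the sorted target arguments to the base assembler name.
// Sorting makes the result independent of the order in which the
// arguments were written.  The "_" separator is an assumption.
std::string version_assembler_name(const std::string &base,
                                   const std::string &attr)
{
  assert(!base.empty());
  std::vector<std::string> args = split_target_args(attr);
  std::sort(args.begin(), args.end());
  std::string name = base;
  for (std::size_t i = 0; i < args.size(); ++i)
    name += (i == 0 ? "." : "_") + args[i];
  return name;
}
```

With this sketch, "popcnt, avx" and "avx,popcnt" give the same suffix, which is
the point of sorting the arguments before appending them.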

* Overload resolution:

 Function "build_over_call" in cp/call.c sees a call to function
"foo", which is multi-versioned. The overload resolution happens in
function "joust" in "cp/call.c". Here, the call to "foo" has all
possible versions of "foo" as candidates. All the candidates of "foo" are
stored in the cgraph side data structure. Each version of foo is chained in a 
doubly-linked list with the default function as the first element.  This allows
any pass to access all the semantically identical versions. A call to a
multi-versioned function will be replaced by a call to a dispatcher function,
determined by a target hook, to execute the right function version at run-time.

Optimization to directly call a version when possible:
Also, in joust, where overload resolution happens, resolution of a multiversioned
function is made to return the most specialized version.  This is the version
that will be checked for dispatching first and is determined by the target.
Now, if the caller can inline this function version, then a direct call is made
to this function version rather than going through the dispatcher. When a direct
call cannot be made, a call to the dispatcher function is created.
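
The direct-call decision above can be sketched roughly as follows.  This is
only a model: the real check is the target's can_inline_p hook, and treating it
as "every feature the callee needs is enabled in the caller" is an assumption of
this sketch.  FeatureSet, can_inline and choose_call are hypothetical names.

```cpp
#include <algorithm>
#include <set>
#include <string>

using FeatureSet = std::set<std::string>;

// Stand-in for the target's can_inline_p check: assume the callee can be
// inlined only when every feature it needs is enabled in the caller.
bool can_inline(const FeatureSet &caller, const FeatureSet &callee)
{
  return std::includes(caller.begin(), caller.end(),
                       callee.begin(), callee.end());
}

enum class CallKind { Direct, Dispatcher };

// Decide how a call to the most specialized version is emitted:
// directly when the caller could inline it, through the dispatcher
// otherwise.
CallKind choose_call(const FeatureSet &caller, const FeatureSet &best_version)
{
  return can_inline(caller, best_version) ? CallKind::Direct
                                          : CallKind::Dispatcher;
}
```

In the bar/foo case, a caller compiled with popcnt calling the popcnt version
gives CallKind::Direct, while a caller with no extra features goes through the
dispatcher.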

* Creating the dispatcher body.

The dispatcher body, called the resolver, is generated only when there is a call
to a multiversioned function dispatcher or the address of a multiversioned
function is taken. It is generated during cgraph_analyze_function by another
target hook.

* Dispatch ordering.

The order in which the function versions are checked during dispatch is based
on a priority value assigned to the ISA that each version targets. More
specialized versions are checked for dispatching first.  This is to mitigate the
ambiguity that can arise when more than one function version is valid for
execution on a particular platform.  This is not a perfect solution, and in the
future the user should be allowed to assign a dispatching priority value to each
version.
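
The ordering rule above can be modelled with a small sketch.  The priority
numbers and the supported_on_host flag are placeholders invented for this
example; the patch derives priorities in the target and checks the platform at
run-time in the generated resolver.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct Version
{
  std::string name;        // e.g. the version's assembler name
  int priority;            // higher means more specialized (made-up values)
  bool supported_on_host;  // result of the run-time platform check
};

// Return the name of the version a dispatcher would pick: check versions
// from highest to lowest priority and fall back to the default version.
std::string dispatch(std::vector<Version> versions,
                     const std::string &default_name)
{
  std::stable_sort(versions.begin(), versions.end(),
                   [](const Version &a, const Version &b)
                   { return a.priority > b.priority; });
  for (const Version &v : versions)
    if (v.supported_on_host)
      return v.name;
  return default_name;
}
```

When two versions are both valid on the host, the one with the higher priority
wins regardless of declaration order, which is how the ambiguity mentioned
above is resolved in this model.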

Function MV in the Intel compiler:

The Intel compiler supports function multiversioning and the syntax is
similar to the patch proposed here.  Here is an example of how to
generate multiple function versions with the intel compiler.

/* Create a stub function to specify the various versions of function that
   will be created, using declspec attribute cpu_dispatch.  */
__declspec (cpu_dispatch (core_i7_sse4_2, atom, generic))
void foo () {};

/* Bodies of each function version.  */

/* Intel Corei7 processor + SSE4.2 version.  */
__declspec (cpu_specific(core_i7_sse4_2))
void foo ()
{
  printf ("corei7 + sse4.2");
}

/* Atom processor.  */
__declspec (cpu_specific(atom))
void foo ()
{
  printf ("atom");
}

/* The generic or the default version.  */
__declspec (cpu_specific(generic))
void foo ()
{
  printf ("This is generic");
}

A new function version is generated by defining a new function with the same
signature but with a different cpu_specific declspec attribute string.  The
set of cpu_specific strings that are allowed is the following:

"core_2nd_gen_avx"
"core_aes_pclmulqdq"
"core_i7_sse4_2"
"core_2_duo_sse4_1"
"core_2_duo_ssse3"
"atom"
"pentium_4_sse3"
"pentium_4"
"pentium_m"
"pentium_iii"
"generic"

Comparison with the GCC MV implementation in this patch:

* Version creation syntax:

The implementation in this patch also has a similar syntax to specify function
versions. The first stub function is not needed.  Here is the code to generate
the function versions with this patch:

/* Intel Corei7 processor + SSE4.2 version.  */
__attribute__ ((target ("arch=corei7, sse4.2")))
void foo ()
{
  printf ("corei7 + sse4.2");
}

/* Atom processor.  */
__attribute__ ((target ("arch=atom")))
void foo ()
{
  printf ("atom");
}

void foo ()
{
}

The target attribute can have one of the following arch names:

"amd"
"intel"
"atom"
"core2"
"corei7"
"nehalem"
"westmere"
"sandybridge"
"amdfam10h"
"barcelona"
"shanghai"
"istanbul"
"amdfam15h"
"bdver1"
"bdver2"

and any number of the following ISA names:

"cmov"
"mmx"
"popcnt"
"sse"
"sse2"
"sse3"
"ssse3"
"sse4.1"
"sse4.2"
"avx"
"avx2"


	* doc/tm.texi.in (TARGET_OPTION_FUNCTION_VERSIONS): New hook description.
	* (TARGET_COMPARE_VERSION_PRIORITY): New hook description.
	* (TARGET_GET_FUNCTION_VERSIONS_DISPATCHER): New hook description.
	* (TARGET_GENERATE_VERSION_DISPATCHER_BODY): New hook description.
	* doc/tm.texi: Regenerate.
	* target.def (compare_version_priority): New target hook.
	* (generate_version_dispatcher_body): New target hook.
	* (get_function_versions_dispatcher): New target hook.
	* (function_versions): New target hook.
	* cgraph.c (cgraph_fnver_htab): New htab.
	(cgraph_fn_ver_htab_hash): New function.
	(cgraph_fn_ver_htab_eq): New function.
	(version_info_node): New pointer.
	(insert_new_cgraph_node_version): New function.
	(get_cgraph_node_version): New function.
	(delete_function_version): New function.
	(record_function_versions): New function.
	* cgraph.h (cgraph_function_version_info): New struct.
	(insert_new_cgraph_node_version): New function.
	(get_cgraph_node_version): New function.
	(delete_function_version): New function.
	(record_function_versions): New function.
	(cgraph_node): New bitfield dispatcher_function.
	(init_lowered_empty_function): Expose function.
	* tree.h (DECL_FUNCTION_VERSIONED): New macro.
	(tree_function_decl): New bit-field versioned_function.
	* cgraphunit.c (cgraph_analyze_function): Generate body of multiversion
	function dispatcher.
	(cgraph_analyze_functions): Analyze dispatcher function.
	(init_lowered_empty_function): Make non-static. New parameter in_ssa.
	Change edge flag to EDGE_FALLTHRU.
	(assemble_thunk): Add parameter to call to init_lowered_empty_function.
	* cp/class.c (add_method): Change assembler names of function versions.
	(resolve_address_of_overloaded_function): Create dispatcher decl and
	return address of dispatcher instead.
	* cp/decl.c (decls_match): Make decls unmatched for versioned
	functions.
	(duplicate_decls): Remove ambiguity for versioned functions. 
	* cp/decl2.c (check_classfn): Check attributes of versioned functions
	for match.
	* cp/call.c (build_over_call): Make calls to multiversioned functions
	to call the dispatcher.
	(joust): For calls to multi-versioned functions, make the most
	specialized function version win.
	(get_function_version_dispatcher): New function.
	* Makefile.in: Add multiversion.o
	* config/i386/i386.c (add_condition_to_bb): New function.
	(get_builtin_code_for_version): New function.
	(ix86_compare_version_priority): New function.
	(feature_compare): New function.
	(dispatch_function_versions): New function.
	(ix86_function_versions): New function.
	(attr_strcmp): New function.
	(sorted_attr_string): New function.
	(ix86_mangle_function_version_assembler_name): New function.
	(ix86_mangle_decl_assembler_name): New function.
	(make_name): New function.
	(make_dispatcher_decl): New function.
	(is_function_default_version): New function.
	(ix86_get_function_versions_dispatcher): New function.
	(make_attribute): New function.
	(make_resolver_func): New function.
	(ix86_generate_version_dispatcher_body): New function.
	(TARGET_COMPARE_VERSION_PRIORITY): New macro.
	(TARGET_GET_FUNCTION_VERSIONS_DISPATCHER): New macro.
	(TARGET_GENERATE_VERSION_DISPATCHER_BODY): New macro.
	(TARGET_OPTION_FUNCTION_VERSIONS): New macro.
	(TARGET_MANGLE_DECL_ASSEMBLER_NAME): New macro.
	* testsuite/g++.dg/mv1.C: New test.
	* testsuite/g++.dg/mv2.C: New test.
	* testsuite/g++.dg/mv3.C: New test.
	* testsuite/g++.dg/mv4.C: New test.

Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi	(revision 192968)
+++ gcc/doc/tm.texi	(working copy)
@@ -9929,6 +9929,14 @@ changed via the optimize attribute or pragma, see
 @code{TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE}
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_OPTION_FUNCTION_VERSIONS (tree @var{decl1}, tree @var{decl2})
+This target hook returns @code{true} if @var{DECL1} and @var{DECL2} are
+versions of the same function.  @var{DECL1} and @var{DECL2} are function
+versions if and only if they have the same function signature and
+different target specific attributes, that is, they are compiled for
+different target machines.
+@end deftypefn
+
 @deftypefn {Target Hook} bool TARGET_CAN_INLINE_P (tree @var{caller}, tree @var{callee})
 This target hook returns @code{false} if the @var{caller} function
 cannot inline @var{callee}, based on target specific information.  By
@@ -10952,6 +10960,29 @@ The result is another tree containing a simplified
 call's result.  If @var{ignore} is true the value will be ignored.
 @end deftypefn
 
+@deftypefn {Target Hook} int TARGET_COMPARE_VERSION_PRIORITY (tree @var{decl1}, tree @var{decl2})
+This hook is used to compare the target attributes in two functions to
+determine which function's features get higher priority.  This is used
+during function multi-versioning to figure out the order in which two
+versions must be dispatched.  A function version with a higher priority
+is checked for dispatching earlier.  @var{decl1} and @var{decl2} are
+the two function decls that will be compared.
+@end deftypefn
+
+@deftypefn {Target Hook} tree TARGET_GET_FUNCTION_VERSIONS_DISPATCHER (void *@var{decl})
+This hook is used to get the dispatcher function for a set of function
+versions.  The dispatcher function is called to invoke the right function
+version at run-time. @var{decl} is one version from a set of semantically
+identical versions.
+@end deftypefn
+
+@deftypefn {Target Hook} tree TARGET_GENERATE_VERSION_DISPATCHER_BODY (void *@var{arg})
+This hook is used to generate the dispatcher logic to invoke the right
+function version at run-time for a given set of function versions.
+@var{arg} points to the callgraph node of the dispatcher function whose
+body must be generated.
+@end deftypefn
+
 @deftypefn {Target Hook} {const char *} TARGET_INVALID_WITHIN_DOLOOP (const_rtx @var{insn})
 
 Take an instruction in @var{insn} and return NULL if it is valid within a
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in	(revision 192968)
+++ gcc/doc/tm.texi.in	(working copy)
@@ -9790,6 +9790,14 @@ changed via the optimize attribute or pragma, see
 @code{TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE}
 @end deftypefn
 
+@hook TARGET_OPTION_FUNCTION_VERSIONS
+This target hook returns @code{true} if @var{decl1} and @var{decl2} are
+versions of the same function.  @var{decl1} and @var{decl2} are function
+versions if and only if they have the same function signature and
+different target specific attributes, that is, they are compiled for
+different target machines.
+@end deftypefn
+
 @hook TARGET_CAN_INLINE_P
 This target hook returns @code{false} if the @var{caller} function
 cannot inline @var{callee}, based on target specific information.  By
@@ -10798,6 +10806,29 @@ The result is another tree containing a simplified
 call's result.  If @var{ignore} is true the value will be ignored.
 @end deftypefn
 
+@hook TARGET_COMPARE_VERSION_PRIORITY
+This hook is used to compare the target attributes in two functions to
+determine which function's features get higher priority.  This is used
+during function multi-versioning to figure out the order in which two
+versions must be dispatched.  A function version with a higher priority
+is checked for dispatching earlier.  @var{decl1} and @var{decl2} are
+the two function decls that will be compared.
+@end deftypefn
+
+@hook TARGET_GET_FUNCTION_VERSIONS_DISPATCHER
+This hook is used to get the dispatcher function for a set of function
+versions.  The dispatcher function is called to invoke the right function
+version at run-time. @var{decl} is one version from a set of semantically
+identical versions.
+@end deftypefn
+
+@hook TARGET_GENERATE_VERSION_DISPATCHER_BODY
+This hook is used to generate the dispatcher logic to invoke the right
+function version at run-time for a given set of function versions.
+@var{arg} points to the callgraph node of the dispatcher function whose
+body must be generated.
+@end deftypefn
+
 @hook TARGET_INVALID_WITHIN_DOLOOP
 
 Take an instruction in @var{insn} and return NULL if it is valid within a
Index: gcc/target.def
===================================================================
--- gcc/target.def	(revision 192968)
+++ gcc/target.def	(working copy)
@@ -1298,6 +1298,37 @@ DEFHOOK
  tree, (tree fndecl, int n_args, tree *argp, bool ignore),
  hook_tree_tree_int_treep_bool_null)
 
+/* Target hook is used to compare the target attributes in two functions to
+   determine which function's features get higher priority.  This is used
+   during function multi-versioning to figure out the order in which two
+   versions must be dispatched.  A function version with a higher priority
+   is checked for dispatching earlier.  DECL1 and DECL2 are
+   the two function decls that will be compared.  It returns a positive
+   value if DECL1 has higher priority, a negative value if DECL2 has higher
+   priority, and 0 if they have the same priority.  */
+DEFHOOK
+(compare_version_priority,
+ "",
+ int, (tree decl1, tree decl2), NULL)
+
+/*  Target hook is used to generate the dispatcher logic to invoke the right
+    function version at run-time for a given set of function versions.
+    ARG points to the callgraph node of the dispatcher function whose body
+    must be generated.  */
+DEFHOOK
+(generate_version_dispatcher_body,
+ "",
+ tree, (void *arg), NULL) 
+
+/* Target hook is used to get the dispatcher function for a set of function
+   versions.  The dispatcher function is called to invoke the right function
+   version at run-time.  DECL is one version from a set of semantically
+   identical versions.  */
+DEFHOOK
+(get_function_versions_dispatcher,
+ "",
+ tree, (void *decl), NULL)
+
 /* Returns a code for a target-specific builtin that implements
    reciprocal of the function, or NULL_TREE if not available.  */
 DEFHOOK
@@ -2774,6 +2805,16 @@ DEFHOOK
  void, (void),
  hook_void_void)
 
+/* This function returns true if DECL1 and DECL2 are versions of the same
+   function.  DECL1 and DECL2 are function versions if and only if they
+   have the same function signature and different target specific attributes,
+   that is, they are compiled for different target machines.  */
+DEFHOOK
+(function_versions,
+ "",
+ bool, (tree decl1, tree decl2),
+ hook_bool_tree_tree_false)
+
 /* Function to determine if one function can inline another function.  */
 #undef HOOK_PREFIX
 #define HOOK_PREFIX "TARGET_"
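To make the sign convention of compare_version_priority concrete, here is a
minimal standalone sketch.  It is not code from this patch: struct version,
compare_priority and dispatch_order are hypothetical names, and the unsigned
priority field only stands in for whatever ranking a target derives from the
target attribute.  Assuming the convention documented in target.def (positive
if the first decl has higher priority, negative if the second does, 0 if they
are equal), sorting with the negated comparator yields the order in which a
dispatcher would check the versions:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a function version; not a GCC type.  */
struct version
{
  const char *name;
  unsigned int priority;    /* Greater value means higher priority.  */
};

/* Positive if V1 has higher priority, negative if V2 does, 0 if equal.  */
int
compare_priority (const struct version *v1, const struct version *v2)
{
  assert (v1 != NULL && v2 != NULL);
  if (v1->priority > v2->priority)
    return 1;
  if (v1->priority < v2->priority)
    return -1;
  return 0;
}

/* qsort callback: higher priority sorts first, so it is checked first.  */
int
dispatch_order (const void *p1, const void *p2)
{
  return -compare_priority ((const struct version *) p1,
                            (const struct version *) p2);
}
```

Calling qsort (v, n, sizeof v[0], dispatch_order) on an array of such versions
leaves the highest-priority version at index 0 and a version with priority 0
(the default, in this model) at the end.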
Index: gcc/cgraph.c
===================================================================
--- gcc/cgraph.c	(revision 192968)
+++ gcc/cgraph.c	(working copy)
@@ -132,6 +132,144 @@ static GTY(()) struct cgraph_edge *free_edges;
 /* Did procss_same_body_aliases run?  */
 bool same_body_aliases_done;
 
+/* Map a cgraph_node to cgraph_function_version_info using this htab.
+   The cgraph_function_version_info has a THIS_NODE field that is the
+   corresponding cgraph_node.  */
+
+static htab_t GTY((param_is (struct cgraph_function_version_info *)))
+  cgraph_fnver_htab = NULL;
+
+/* Hash function for cgraph_fnver_htab.  */
+static hashval_t
+cgraph_fnver_htab_hash (const void *ptr)
+{
+  int uid = ((const struct cgraph_function_version_info *)ptr)->this_node->uid;
+  return (hashval_t)(uid);
+}
+
+/* eq function for cgraph_fnver_htab.  */
+static int
+cgraph_fnver_htab_eq (const void *p1, const void *p2)
+{
+  const struct cgraph_function_version_info *n1
+    = (const struct cgraph_function_version_info *)p1;
+  const struct cgraph_function_version_info *n2
+    = (const struct cgraph_function_version_info *)p2;
+
+  return n1->this_node->uid == n2->this_node->uid;
+}
+
+/* Mark as GC root all allocated nodes.  */
+static GTY(()) struct cgraph_function_version_info *
+  version_info_node = NULL;
+
+/* Get the cgraph_function_version_info node corresponding to node.  */
+struct cgraph_function_version_info *
+get_cgraph_node_version (struct cgraph_node *node)
+{
+  struct cgraph_function_version_info *ret;
+  struct cgraph_function_version_info key;
+  key.this_node = node;
+
+  if (cgraph_fnver_htab == NULL)
+    return NULL;
+
+  ret = (struct cgraph_function_version_info *)
+    htab_find (cgraph_fnver_htab, &key);
+
+  return ret;
+}
+
+/* Insert a new cgraph_function_version_info node into cgraph_fnver_htab
+   corresponding to cgraph_node NODE.  */
+struct cgraph_function_version_info *
+insert_new_cgraph_node_version (struct cgraph_node *node)
+{
+  void **slot;
+  
+  version_info_node = NULL;
+  version_info_node = ggc_alloc_cleared_cgraph_function_version_info ();
+  version_info_node->this_node = node;
+
+  if (cgraph_fnver_htab == NULL)
+    cgraph_fnver_htab = htab_create_ggc (2, cgraph_fnver_htab_hash,
+				         cgraph_fnver_htab_eq, NULL);
+
+  slot = htab_find_slot (cgraph_fnver_htab, version_info_node, INSERT);
+  gcc_assert (slot != NULL);
+  *slot = version_info_node;
+  return version_info_node;
+}
+
+/* Remove the cgraph_function_version_info and cgraph_node for DECL.  This
+   DECL is a duplicate declaration.  */
+void
+delete_function_version (tree decl)
+{
+  struct cgraph_node *decl_node = cgraph_get_create_node (decl);
+  struct cgraph_function_version_info *decl_v = NULL;
+
+  if (decl_node == NULL)
+    return;
+
+  decl_v = get_cgraph_node_version (decl_node);
+
+  if (decl_v == NULL)
+    return;
+
+  if (decl_v->prev != NULL)
+    decl_v->prev->next = decl_v->next;
+
+  if (decl_v->next != NULL)
+    decl_v->next->prev = decl_v->prev;
+
+  if (cgraph_fnver_htab != NULL)
+    htab_remove_elt (cgraph_fnver_htab, decl_v);
+
+  cgraph_remove_node (decl_node);
+}
+
+/* Record that DECL1 and DECL2 are semantically identical function
+   versions.  */
+void
+record_function_versions (tree decl1, tree decl2)
+{
+  struct cgraph_node *decl1_node = cgraph_get_create_node (decl1);
+  struct cgraph_node *decl2_node = cgraph_get_create_node (decl2);
+  struct cgraph_function_version_info *decl1_v = NULL;
+  struct cgraph_function_version_info *decl2_v = NULL;
+  struct cgraph_function_version_info *before;
+  struct cgraph_function_version_info *after;
+
+  gcc_assert (decl1_node != NULL && decl2_node != NULL);
+  decl1_v = get_cgraph_node_version (decl1_node);
+  decl2_v = get_cgraph_node_version (decl2_node);
+
+  if (decl1_v != NULL && decl2_v != NULL)
+    return;
+
+  if (decl1_v == NULL)
+    decl1_v = insert_new_cgraph_node_version (decl1_node);
+
+  if (decl2_v == NULL)
+    decl2_v = insert_new_cgraph_node_version (decl2_node);
+
+  /* Chain decl2_v and decl1_v.  All semantically identical versions
+     will be chained together.  */
+
+  before = decl1_v;
+  after = decl2_v;
+
+  while (before->next != NULL)
+    before = before->next;
+
+  while (after->prev != NULL)
+    after = after->prev;
+
+  before->next = after;
+  after->prev = before;
+}
+
 /* Macros to access the next item in the list of free cgraph nodes and
    edges. */
 #define NEXT_FREE_NODE(NODE) cgraph ((NODE)->symbol.next)
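The chaining done by record_function_versions and the unlinking done by
delete_function_version can be pictured with a small standalone model.  This
is only a sketch under simplifying assumptions: struct version_info,
chain_versions, unlink_version and chain_length are hypothetical names, there
is no hash table or cgraph node, and the caller must not pass two nodes that
are already in the same chain (the patch avoids that case by returning early
when both decls already have version info).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of cgraph_function_version_info.  */
struct version_info
{
  const char *name;
  struct version_info *prev;
  struct version_info *next;
};

/* Append the chain containing V2 to the end of the chain containing V1.  */
void
chain_versions (struct version_info *v1, struct version_info *v2)
{
  struct version_info *before = v1;
  struct version_info *after = v2;

  assert (v1 != NULL && v2 != NULL);
  while (before->next != NULL)
    before = before->next;      /* Tail of the first chain.  */
  while (after->prev != NULL)
    after = after->prev;        /* Head of the second chain.  */
  before->next = after;
  after->prev = before;
}

/* Remove V from its chain, joining its neighbours.  */
void
unlink_version (struct version_info *v)
{
  assert (v != NULL);
  if (v->prev != NULL)
    v->prev->next = v->next;
  if (v->next != NULL)
    v->next->prev = v->prev;
  v->prev = v->next = NULL;
}

/* Count all nodes in the chain that contains V.  */
size_t
chain_length (const struct version_info *v)
{
  size_t n = 0;

  assert (v != NULL);
  while (v->prev != NULL)
    v = v->prev;
  for (; v != NULL; v = v->next)
    n++;
  return n;
}
```

Chaining a default version with two specialized versions gives a list of
length three; unlinking the middle node leaves the other two linked to each
other.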
Index: gcc/cgraph.h
===================================================================
--- gcc/cgraph.h	(revision 192968)
+++ gcc/cgraph.h	(working copy)
@@ -279,6 +279,8 @@ struct GTY(()) cgraph_node {
   /* ?? We should be able to remove this.  We have enough bits in
      cgraph to calculate it.  */
   unsigned tm_clone : 1;
+  /* True if this decl is a dispatcher for function versions.  */
+  unsigned dispatcher_function : 1;
 };
 
 DEF_VEC_P(symtab_node);
@@ -291,6 +293,47 @@ DEF_VEC_P(cgraph_node_ptr);
 DEF_VEC_ALLOC_P(cgraph_node_ptr,heap);
 DEF_VEC_ALLOC_P(cgraph_node_ptr,gc);
 
+/* Function Multiversioning info.  */
+struct GTY(()) cgraph_function_version_info {
+  /* The cgraph_node for which the function version info is stored.  */
+  struct cgraph_node *this_node;
+  /* Chains all the semantically identical function versions.  The
+     first function in this chain is the version_info node of the
+     default function.  */
+  struct cgraph_function_version_info *prev;
+  /* If this version node corresponds to a dispatcher for function
+     versions, this points to the version info node of the default
+     function, the first node in the chain.  */
+  struct cgraph_function_version_info *next;
+  /* If this node corresponds to a function version, this points
+     to the dispatcher function decl, which is the function that must
+     be called to execute the right function version at run-time.
+
+     If this cgraph node is a dispatcher (if dispatcher_function is
+     true, in the cgraph_node struct) for function versions, this
+     points to resolver function, which holds the function body of the
+     dispatcher. The dispatcher decl is an alias to the resolver
+     function decl.  */
+  tree dispatcher_resolver;
+};
+
+/* Get the cgraph_function_version_info node corresponding to node.  */
+struct cgraph_function_version_info *
+  get_cgraph_node_version (struct cgraph_node *node);
+
+/* Insert a new cgraph_function_version_info node into cgraph_fnver_htab
+   corresponding to cgraph_node NODE.  */
+struct cgraph_function_version_info *
+  insert_new_cgraph_node_version (struct cgraph_node *node);
+
+/* Record that DECL1 and DECL2 are semantically identical function
+   versions.  */
+void record_function_versions (tree decl1, tree decl2);
+
+/* Remove the cgraph_function_version_info and cgraph_node for DECL.  This
+   DECL is a duplicate declaration.  */
+void delete_function_version (tree decl);
+
 /* A cgraph node set is a collection of cgraph nodes.  A cgraph node
    can appear in multiple sets.  */
 struct cgraph_node_set_def
@@ -617,6 +660,9 @@ void init_cgraph (void);
 bool cgraph_process_new_functions (void);
 void cgraph_process_same_body_aliases (void);
 void fixup_same_cpp_alias_visibility (symtab_node node, symtab_node target, tree alias);
+/*  Initialize datastructures so DECL is a function in lowered gimple form.
+    IN_SSA is true if the gimple is in SSA.  */
+basic_block init_lowered_empty_function (tree decl, bool in_ssa);
 
 /* In cgraphclones.c  */
 
Index: gcc/tree.h
===================================================================
--- gcc/tree.h	(revision 192968)
+++ gcc/tree.h	(working copy)
@@ -3480,6 +3480,12 @@ extern VEC(tree, gc) **decl_debug_args_insert (tre
 #define DECL_FUNCTION_SPECIFIC_OPTIMIZATION(NODE) \
    (FUNCTION_DECL_CHECK (NODE)->function_decl.function_specific_optimization)
 
+/* In FUNCTION_DECL, this is set if this function has other versions generated
+   using "target" attributes.  The default version is the one which does not
+   have any "target" attribute set. */
+#define DECL_FUNCTION_VERSIONED(NODE)\
+   (FUNCTION_DECL_CHECK (NODE)->function_decl.versioned_function)
+
 /* FUNCTION_DECL inherits from DECL_NON_COMMON because of the use of the
    arguments/result/saved_tree fields by front ends.   It was either inherit
    FUNCTION_DECL from non_common, or inherit non_common from FUNCTION_DECL,
@@ -3524,8 +3530,8 @@ struct GTY(()) tree_function_decl {
   unsigned looping_const_or_pure_flag : 1;
   unsigned has_debug_args_flag : 1;
   unsigned tm_clone_flag : 1;
-
-  /* 1 bit left */
+  unsigned versioned_function : 1;
+  /* No bits left.  */
 };
 
 /* The source language of the translation-unit.  */
Index: gcc/cgraphunit.c
===================================================================
--- gcc/cgraphunit.c	(revision 192968)
+++ gcc/cgraphunit.c	(working copy)
@@ -629,6 +629,21 @@ cgraph_analyze_function (struct cgraph_node *node)
       cgraph_create_edge (node, cgraph_get_node (node->thunk.alias),
 			  NULL, 0, CGRAPH_FREQ_BASE);
     }
+  else if (node->dispatcher_function)
+    {
+      /* Generate the dispatcher body of multi-versioned functions.  */
+      struct cgraph_function_version_info *dispatcher_version_info
+	= get_cgraph_node_version (node);
+      if (dispatcher_version_info != NULL
+          && (dispatcher_version_info->dispatcher_resolver
+	      == NULL_TREE))
+	{
+	  tree resolver = NULL_TREE;
+	  gcc_assert (targetm.generate_version_dispatcher_body);
+	  resolver = targetm.generate_version_dispatcher_body (node);
+	  gcc_assert (resolver != NULL_TREE);
+	}
+    }
   else
     {
       push_cfun (DECL_STRUCT_FUNCTION (decl));
@@ -917,7 +932,8 @@ cgraph_analyze_functions (void)
 		 gcc.c-torture/compile/20011119-1.c  */
 	      if (!DECL_STRUCT_FUNCTION (decl)
 		  && (!cnode->alias || !cnode->thunk.alias)
-		  && !cnode->thunk.thunk_p)
+		  && !cnode->thunk.thunk_p
+		  && !cnode->dispatcher_function)
 		{
 		  cgraph_reset_node (cnode);
 		  cnode->local.redefined_extern_inline = true;
@@ -1198,13 +1214,13 @@ mark_functions_to_output (void)
 }
 
 /* DECL is FUNCTION_DECL.  Initialize datastructures so DECL is a function
-   in lowered gimple form.
+   in lowered gimple form.  IN_SSA is true if the gimple is in SSA.
    
    Set current_function_decl and cfun to newly constructed empty function body.
    return basic block in the function body.  */
 
-static basic_block
-init_lowered_empty_function (tree decl)
+basic_block
+init_lowered_empty_function (tree decl, bool in_ssa)
 {
   basic_block bb;
 
@@ -1212,9 +1228,14 @@ mark_functions_to_output (void)
   allocate_struct_function (decl, false);
   gimple_register_cfg_hooks ();
   init_empty_tree_cfg ();
-  init_tree_ssa (cfun);
-  init_ssa_operands (cfun);
-  cfun->gimple_df->in_ssa_p = true;
+
+  if (in_ssa)
+    {
+      init_tree_ssa (cfun);
+      init_ssa_operands (cfun);
+      cfun->gimple_df->in_ssa_p = true;
+    }
+
   DECL_INITIAL (decl) = make_node (BLOCK);
 
   DECL_SAVED_TREE (decl) = error_mark_node;
@@ -1223,7 +1244,7 @@ mark_functions_to_output (void)
 
   /* Create BB for body of the function and connect it properly.  */
   bb = create_basic_block (NULL, (void *) 0, ENTRY_BLOCK_PTR);
-  make_edge (ENTRY_BLOCK_PTR, bb, 0);
+  make_edge (ENTRY_BLOCK_PTR, bb, EDGE_FALLTHRU);
   make_edge (bb, EXIT_BLOCK_PTR, 0);
 
   return bb;
@@ -1421,7 +1442,7 @@ assemble_thunk (struct cgraph_node *node)
       else
 	resdecl = DECL_RESULT (thunk_fndecl);
 
-      bb = then_bb = else_bb = return_bb = init_lowered_empty_function (thunk_fndecl);
+      bb = then_bb = else_bb = return_bb = init_lowered_empty_function (thunk_fndecl, true);
 
       bsi = gsi_start_bb (bb);
 
Index: gcc/cp/class.c
===================================================================
--- gcc/cp/class.c	(revision 192968)
+++ gcc/cp/class.c	(working copy)
@@ -1087,6 +1087,35 @@ add_method (tree type, tree method, tree using_dec
 	      || same_type_p (TREE_TYPE (fn_type),
 			      TREE_TYPE (method_type))))
 	{
+	  /* For function versions, their parms and types match
+	     but they are not duplicates.  Record function versions
+	     as and when they are found.  extern "C" functions are
+	     not treated as versions.  */
+	  if (TREE_CODE (fn) == FUNCTION_DECL
+	      && TREE_CODE (method) == FUNCTION_DECL
+	      && !DECL_EXTERN_C_P (fn)
+	      && !DECL_EXTERN_C_P (method)
+	      && (DECL_FUNCTION_SPECIFIC_TARGET (fn)
+		  || DECL_FUNCTION_SPECIFIC_TARGET (method))
+	      && targetm.target_option.function_versions (fn, method))
+ 	    {
+	      /* Mark functions as versions if necessary.  Modify the mangled
+		 decl name if necessary.  */
+	      if (!DECL_FUNCTION_VERSIONED (fn))
+		{
+		  DECL_FUNCTION_VERSIONED (fn) = 1;
+		  if (DECL_ASSEMBLER_NAME_SET_P (fn))
+		    mangle_decl (fn);
+		}
+	      if (!DECL_FUNCTION_VERSIONED (method))
+		{
+		  DECL_FUNCTION_VERSIONED (method) = 1;
+		  if (DECL_ASSEMBLER_NAME_SET_P (method))
+		    mangle_decl (method);
+		}
+	      record_function_versions (fn, method);
+	      continue;
+	    }
 	  if (DECL_INHERITED_CTOR_BASE (method))
 	    {
 	      if (DECL_INHERITED_CTOR_BASE (fn))
@@ -7162,13 +7191,27 @@ resolve_address_of_overloaded_function (tree targe
     {
       /* There were too many matches.  First check if they're all
 	 the same function.  */
-      tree match;
+      tree match = NULL_TREE;
 
       fn = TREE_PURPOSE (matches);
-      for (match = TREE_CHAIN (matches); match; match = TREE_CHAIN (match))
-	if (!decls_match (fn, TREE_PURPOSE (match)))
-	  break;
 
+      /* For multi-versioned functions, more than one match is just fine.
+	 Call decls_match to make sure they are different because they are
+	 versioned.  */
+      if (DECL_FUNCTION_VERSIONED (fn))
+	{
+          for (match = TREE_CHAIN (matches); match; match = TREE_CHAIN (match))
+  	    if (!DECL_FUNCTION_VERSIONED (TREE_PURPOSE (match))
+	        || decls_match (fn, TREE_PURPOSE (match)))
+	      break;
+	}
+      else
+	{
+          for (match = TREE_CHAIN (matches); match; match = TREE_CHAIN (match))
+  	    if (!decls_match (fn, TREE_PURPOSE (match)))
+	      break;
+	}
+
       if (match)
 	{
 	  if (flags & tf_error)
@@ -7230,6 +7273,29 @@ resolve_address_of_overloaded_function (tree targe
       perform_or_defer_access_check (access_path, fn, fn, flags);
     }
 
+  /* If a pointer to a function that is multi-versioned is requested, the
+     pointer to the dispatcher function is returned instead.  This works
+     well because indirectly calling the function will dispatch the right
+     function version at run-time.  */
+  if (DECL_FUNCTION_VERSIONED (fn))
+    {
+      struct cgraph_node *node = NULL;
+      tree dispatcher_decl = NULL;
+      gcc_assert (targetm.get_function_versions_dispatcher);
+      dispatcher_decl = targetm.get_function_versions_dispatcher (fn);
+      if (!dispatcher_decl)
+	{
+	  error_at (input_location, "Pointer to a multiversioned function"
+		    " without a default is not allowed");
+	  return error_mark_node;
+	}
+      retrofit_lang_decl (dispatcher_decl);
+      mark_used (fn);
+      node = cgraph_get_create_node (dispatcher_decl);
+      gcc_assert (node != NULL);
+      fn = dispatcher_decl;
+    }
+
   if (TYPE_PTRFN_P (target_type) || TYPE_PTRMEMFUNC_P (target_type))
     return cp_build_addr_expr (fn, flags);
   else
Index: gcc/cp/decl.c
===================================================================
--- gcc/cp/decl.c	(revision 192968)
+++ gcc/cp/decl.c	(working copy)
@@ -53,6 +53,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "pointer-set.h"
 #include "splay-tree.h"
 #include "plugin.h"
+#include "cgraph.h"
 
 /* Possible cases of bad specifiers type used by bad_specifiers. */
 enum bad_spec_place {
@@ -981,6 +982,36 @@ decls_match (tree newdecl, tree olddecl)
       if (t1 != t2)
 	return 0;
 
+      /* The decls don't match if they correspond to two different versions
+	 of the same function.   Disallow extern "C" functions to be
+	 versions for now.  */
+      if (compparms (p1, p2)
+	  && same_type_p (TREE_TYPE (f1), TREE_TYPE (f2))
+	  && !DECL_EXTERN_C_P (newdecl)
+	  && !DECL_EXTERN_C_P (olddecl)
+	  && targetm.target_option.function_versions (newdecl, olddecl))
+	{
+	  /* Mark functions as versions if necessary.  Modify the mangled decl
+	     name if necessary.  */
+	  if (DECL_FUNCTION_VERSIONED (newdecl)
+	      && DECL_FUNCTION_VERSIONED (olddecl))
+	    return 0;
+	  if (!DECL_FUNCTION_VERSIONED (newdecl))
+	    {
+	      DECL_FUNCTION_VERSIONED (newdecl) = 1;
+	      if (DECL_ASSEMBLER_NAME_SET_P (newdecl))
+	        mangle_decl (newdecl);
+	    }
+	  if (!DECL_FUNCTION_VERSIONED (olddecl))
+	    {
+	      DECL_FUNCTION_VERSIONED (olddecl) = 1;
+	      if (DECL_ASSEMBLER_NAME_SET_P (olddecl))
+	       mangle_decl (olddecl);
+	    }
+	  record_function_versions (olddecl, newdecl);
+	  return 0;
+	}
+
       if (CP_DECL_CONTEXT (newdecl) != CP_DECL_CONTEXT (olddecl)
 	  && ! (DECL_EXTERN_C_P (newdecl)
 		&& DECL_EXTERN_C_P (olddecl)))
@@ -1499,7 +1530,11 @@ duplicate_decls (tree newdecl, tree olddecl, bool
 	      error ("previous declaration %q+#D here", olddecl);
 	      return NULL_TREE;
 	    }
-	  else if (compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)),
+	  /* For function versions, params and types match, but they
+	     are not ambiguous.  */
+	  else if ((!DECL_FUNCTION_VERSIONED (newdecl)
+		    && !DECL_FUNCTION_VERSIONED (olddecl))
+		   && compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)),
 			      TYPE_ARG_TYPES (TREE_TYPE (olddecl))))
 	    {
 	      error ("new declaration %q#D", newdecl);
@@ -2272,6 +2307,15 @@ duplicate_decls (tree newdecl, tree olddecl, bool
   else if (DECL_PRESERVE_P (newdecl))
     DECL_PRESERVE_P (olddecl) = 1;
 
+  /* If the olddecl is a version, so is the newdecl.  */
+  if (TREE_CODE (newdecl) == FUNCTION_DECL
+      && DECL_FUNCTION_VERSIONED (olddecl))
+    {
+      DECL_FUNCTION_VERSIONED (newdecl) = 1;
+      /* newdecl will be purged and is no longer a version.  */
+      delete_function_version (newdecl);
+    }
+
   if (TREE_CODE (newdecl) == FUNCTION_DECL)
     {
       int function_size;
Index: gcc/cp/decl2.c
===================================================================
--- gcc/cp/decl2.c	(revision 192968)
+++ gcc/cp/decl2.c	(working copy)
@@ -674,9 +674,13 @@ check_classfn (tree ctype, tree function, tree tem
 	  if (is_template != (TREE_CODE (fndecl) == TEMPLATE_DECL))
 	    continue;
 
+	  /* While finding a match, same types and params are not enough
+	     if the function is versioned.  Also check version ("target")
+	     attributes.  */
 	  if (same_type_p (TREE_TYPE (TREE_TYPE (function)),
 			   TREE_TYPE (TREE_TYPE (fndecl)))
 	      && compparms (p1, p2)
+	      && !targetm.target_option.function_versions (function, fndecl)
 	      && (!is_template
 		  || comp_template_parms (template_parms,
 					  DECL_TEMPLATE_PARMS (fndecl)))
Index: gcc/cp/call.c
===================================================================
--- gcc/cp/call.c	(revision 192968)
+++ gcc/cp/call.c	(working copy)
@@ -40,6 +40,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "langhooks.h"
 #include "c-family/c-objc.h"
 #include "timevar.h"
+#include "cgraph.h"
 
 /* The various kinds of conversion.  */
 
@@ -6514,6 +6515,31 @@ magic_varargs_p (tree fn)
   return false;
 }
 
+/* Returns the decl of the dispatcher function if FN is a function version.  */
+
+static tree
+get_function_version_dispatcher (tree fn)
+{
+  tree dispatcher_decl = NULL;
+
+  gcc_assert (TREE_CODE (fn) == FUNCTION_DECL
+	      && DECL_FUNCTION_VERSIONED (fn));
+
+  gcc_assert (targetm.get_function_versions_dispatcher);
+  dispatcher_decl = targetm.get_function_versions_dispatcher (fn);
+
+  if (dispatcher_decl == NULL)
+    {
+      error_at (input_location, "Call to multiversioned function"
+                " without a default is not allowed");
+      return NULL;
+    }
+
+  retrofit_lang_decl (dispatcher_decl);
+  gcc_assert (dispatcher_decl != NULL);
+  return dispatcher_decl;
+}
+
 /* Subroutine of the various build_*_call functions.  Overload resolution
    has chosen a winning candidate CAND; build up a CALL_EXPR accordingly.
    ARGS is a TREE_LIST of the unconverted arguments to the call.  FLAGS is a
@@ -6966,6 +6992,23 @@ build_over_call (struct z_candidate *cand, int fla
   if (!already_used)
     mark_used (fn);
 
+  /* For calls to a multi-versioned function, overload resolution
+     returns the function with the highest target priority, that is,
+     the version that will be checked for dispatching first.  If this
+     version is inlinable, a direct call to this version can be made;
+     otherwise the call should go through the dispatcher.  */
+
+  if (DECL_FUNCTION_VERSIONED (fn)
+      && !targetm.target_option.can_inline_p (current_function_decl, fn))
+    {
+      struct cgraph_node *dispatcher_node = NULL;
+      fn = get_function_version_dispatcher (fn);
+      if (fn == NULL)
+	return NULL;
+      dispatcher_node = cgraph_get_create_node (fn);
+      gcc_assert (dispatcher_node != NULL);
+    }
+
   if (DECL_VINDEX (fn) && (flags & LOOKUP_NONVIRTUAL) == 0)
     {
       tree t;
@@ -8481,6 +8524,38 @@ joust (struct z_candidate *cand1, struct z_candida
 	}
     }
 
+  /* For candidates of a multi-versioned function, make the version with
+     the highest priority win.  This version will be checked for dispatching
+     first.  If this version can be inlined into the caller, the front-end
+     will simply make a direct call to this function.  */
+
+  if (TREE_CODE (cand1->fn) == FUNCTION_DECL
+      && DECL_FUNCTION_VERSIONED (cand1->fn)
+      && TREE_CODE (cand2->fn) == FUNCTION_DECL
+      && DECL_FUNCTION_VERSIONED (cand2->fn))
+    {
+      tree f1 = TREE_TYPE (cand1->fn);
+      tree f2 = TREE_TYPE (cand2->fn);
+      tree p1 = TYPE_ARG_TYPES (f1);
+      tree p2 = TYPE_ARG_TYPES (f2);
+     
+      /* Check if cand1->fn and cand2->fn are versions of the same function.  It
+         is possible that cand1->fn and cand2->fn are function versions but of
+         different functions.  Check types to see if they are versions of the same
+         function.  */
+      if (compparms (p1, p2)
+	  && same_type_p (TREE_TYPE (f1), TREE_TYPE (f2)))
+	{
+	  /* Always make the version with the higher priority, more
+	     specialized, win.  */
+	  gcc_assert (targetm.compare_version_priority);
+	  if (targetm.compare_version_priority (cand1->fn, cand2->fn) >= 0)
+	    return 1;
+	  else
+	    return -1;
+	}
+    }
+
   /* If the two function declarations represent the same function (this can
      happen with declarations in multiple scopes and arg-dependent lookup),
      arbitrarily choose one.  But first make sure the default args we're
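The phrase "checked for dispatching first" can be illustrated with a small
standalone model of the run-time selection.  This is a sketch only, not the
dispatcher the patch generates: struct candidate, resolve, and the predicate
and implementation functions below are hypothetical, and the candidates are
assumed to be already ordered from highest to lowest priority.  The first
candidate whose predicate returns a positive value wins; if none does, the
default version is used.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*impl_fn) (void);
typedef int (*predicate_fn) (void);

/* Hypothetical pairing of a platform check with a function version.  */
struct candidate
{
  predicate_fn supported;   /* Returns > 0 if this version may run.  */
  impl_fn impl;
};

/* Placeholder predicates standing in for real CPU checks.  */
int feature_absent (void) { return 0; }
int feature_present (void) { return 1; }

/* Placeholder versions of one function.  */
int foo_default (void) { return 0; }
int foo_version_a (void) { return 1; }
int foo_version_b (void) { return 2; }

/* Return the first supported candidate in priority order, else the
   default version.  */
impl_fn
resolve (const struct candidate *cands, size_t n, impl_fn default_impl)
{
  size_t i;

  assert (default_impl != NULL);
  for (i = 0; i < n; i++)
    if (cands[i].supported () > 0)
      return cands[i].impl;
  return default_impl;
}
```

With candidates { feature_absent, foo_version_a } followed by
{ feature_present, foo_version_b }, resolve picks foo_version_b; with only the
first candidate it falls back to foo_default.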
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 192968)
+++ gcc/config/i386/i386.c	(working copy)
@@ -62,6 +62,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "opts.h"
 #include "diagnostic.h"
 #include "dumpfile.h"
+#include "tree-pass.h"
+#include "tree-flow.h"
 
 enum upper_128bits_state
 {
@@ -28470,6 +28472,981 @@ ix86_init_mmx_sse_builtins (void)
     }
 }
 
+/* This adds a condition to the basic_block NEW_BB in function FUNCTION_DECL
+   to return a pointer to VERSION_DECL if the outcome of the expression
+   formed by PREDICATE_CHAIN is true.  This function will be called during
+   version dispatch to decide which function version to execute.  It returns
+   the basic block at the end, to which more conditions can be added.  */
+
+static basic_block
+add_condition_to_bb (tree function_decl, tree version_decl,
+		     tree predicate_chain, basic_block new_bb)
+{
+  gimple return_stmt;
+  tree convert_expr, result_var;
+  gimple convert_stmt;
+  gimple call_cond_stmt;
+  gimple if_else_stmt;
+
+  basic_block bb1, bb2, bb3;
+  edge e12, e23;
+
+  tree cond_var, and_expr_var = NULL_TREE;
+  gimple_seq gseq;
+
+  tree predicate_decl, predicate_arg;
+
+  push_cfun (DECL_STRUCT_FUNCTION (function_decl));
+
+  gcc_assert (new_bb != NULL);
+  gseq = bb_seq (new_bb);
+
+
+  convert_expr = build1 (CONVERT_EXPR, ptr_type_node,
+	     		 build_fold_addr_expr (version_decl));
+  result_var = create_tmp_var (ptr_type_node, NULL);
+  convert_stmt = gimple_build_assign (result_var, convert_expr); 
+  return_stmt = gimple_build_return (result_var);
+
+  if (predicate_chain == NULL_TREE)
+    {
+      gimple_seq_add_stmt (&gseq, convert_stmt);
+      gimple_seq_add_stmt (&gseq, return_stmt);
+      set_bb_seq (new_bb, gseq);
+      gimple_set_bb (convert_stmt, new_bb);
+      gimple_set_bb (return_stmt, new_bb);
+      pop_cfun ();
+      return new_bb;
+    }
+
+  while (predicate_chain != NULL)
+    {
+      cond_var = create_tmp_var (integer_type_node, NULL);
+      predicate_decl = TREE_PURPOSE (predicate_chain);
+      predicate_arg = TREE_VALUE (predicate_chain);
+      call_cond_stmt = gimple_build_call (predicate_decl, 1, predicate_arg);
+      gimple_call_set_lhs (call_cond_stmt, cond_var);
+
+      gimple_set_block (call_cond_stmt, DECL_INITIAL (function_decl));
+      gimple_set_bb (call_cond_stmt, new_bb);
+      gimple_seq_add_stmt (&gseq, call_cond_stmt);
+
+      predicate_chain = TREE_CHAIN (predicate_chain);
+      
+      if (and_expr_var == NULL)
+        and_expr_var = cond_var;
+      else
+	{
+	  gimple assign_stmt;
+	  /* Use MIN_EXPR to check whether any integer is zero:
+	     and_expr_var = min_expr <cond_var, and_expr_var>.  */
+	  assign_stmt = gimple_build_assign (and_expr_var,
+			  build2 (MIN_EXPR, integer_type_node,
+				  cond_var, and_expr_var));
+
+	  gimple_set_block (assign_stmt, DECL_INITIAL (function_decl));
+	  gimple_set_bb (assign_stmt, new_bb);
+	  gimple_seq_add_stmt (&gseq, assign_stmt);
+	}
+    }
+
+  if_else_stmt = gimple_build_cond (GT_EXPR, and_expr_var,
+	  		            integer_zero_node,
+				    NULL_TREE, NULL_TREE);
+  gimple_set_block (if_else_stmt, DECL_INITIAL (function_decl));
+  gimple_set_bb (if_else_stmt, new_bb);
+  gimple_seq_add_stmt (&gseq, if_else_stmt);
+
+  gimple_seq_add_stmt (&gseq, convert_stmt);
+  gimple_seq_add_stmt (&gseq, return_stmt);
+  set_bb_seq (new_bb, gseq);
+
+  bb1 = new_bb;
+  e12 = split_block (bb1, if_else_stmt);
+  bb2 = e12->dest;
+  e12->flags &= ~EDGE_FALLTHRU;
+  e12->flags |= EDGE_TRUE_VALUE;
+
+  e23 = split_block (bb2, return_stmt);
+
+  gimple_set_bb (convert_stmt, bb2);
+  gimple_set_bb (return_stmt, bb2);
+
+  bb3 = e23->dest;
+  make_edge (bb1, bb3, EDGE_FALSE_VALUE); 
+
+  remove_edge (e23);
+  make_edge (bb2, EXIT_BLOCK_PTR, 0);
+
+  pop_cfun ();
+
+  return bb3;
+}
+
+/* This parses the attribute arguments to target in DECL and determines
+   the right builtin to use to match the platform specification.
+   It returns the priority value for this version decl.  If PREDICATE_LIST
+   is not NULL, it stores the list of cpu features that need to be checked
+   before dispatching this function.  */
+
+static unsigned int
+get_builtin_code_for_version (tree decl, tree *predicate_list)
+{
+  tree attrs;
+  struct cl_target_option cur_target;
+  tree target_node;
+  struct cl_target_option *new_target;
+  const char *arg_str = NULL;
+  const char *attrs_str = NULL;
+  char *tok_str = NULL;
+  char *token;
+
+  /* Priority of i386 features; a greater value means higher priority.  This is
+     used to decide the order in which function dispatch must happen.  For
+     instance, a version specialized for SSE4.2 should be checked for dispatch
+     before a version for SSE3, as SSE4.2 implies SSE3.  */
+  enum feature_priority
+  {
+    P_ZERO = 0,
+    P_MMX,
+    P_SSE,
+    P_SSE2,
+    P_SSE3,
+    P_SSSE3,
+    P_PROC_SSSE3,
+    P_SSE4_a,
+    P_PROC_SSE4_a,
+    P_SSE4_1,
+    P_SSE4_2,
+    P_PROC_SSE4_2,
+    P_POPCNT,
+    P_AVX,
+    P_AVX2,
+    P_FMA,
+    P_PROC_FMA
+  };
+
+  enum feature_priority priority = P_ZERO;
+
+  /* These are the target attribute strings for which a dispatcher is
+     available, from fold_builtin_cpu.  */
+
+  static struct _feature_list
+    {
+      const char *const name;
+      const enum feature_priority priority;
+    }
+  const feature_list[] =
+    {
+      {"mmx", P_MMX},
+      {"sse", P_SSE},
+      {"sse2", P_SSE2},
+      {"sse3", P_SSE3},
+      {"ssse3", P_SSSE3},
+      {"sse4.1", P_SSE4_1},
+      {"sse4.2", P_SSE4_2},
+      {"popcnt", P_POPCNT},
+      {"avx", P_AVX},
+      {"avx2", P_AVX2}
+    };
+
+
+  static unsigned int NUM_FEATURES
+    = sizeof (feature_list) / sizeof (struct _feature_list);
+
+  unsigned int i;
+
+  tree predicate_chain = NULL_TREE;
+  tree predicate_decl, predicate_arg;
+
+  attrs = lookup_attribute ("target", DECL_ATTRIBUTES (decl));
+  gcc_assert (attrs != NULL);
+
+  attrs = TREE_VALUE (TREE_VALUE (attrs));
+
+  gcc_assert (TREE_CODE (attrs) == STRING_CST);
+  attrs_str = TREE_STRING_POINTER (attrs);
+
+
+  /* Handle arch= if specified.  For priority, set it to be 1 more than
+     the best instruction set the processor can handle.  For instance, if
+     there is a version for atom and a version for ssse3 (the highest ISA
+     priority for atom), the atom version must be checked for dispatch
+     before the ssse3 version. */
+  if (strstr (attrs_str, "arch=") != NULL)
+    {
+      cl_target_option_save (&cur_target, &global_options);
+      target_node = ix86_valid_target_attribute_tree (attrs);
+    
+      gcc_assert (target_node);
+      new_target = TREE_TARGET_OPTION (target_node);
+      gcc_assert (new_target);
+      
+      if (new_target->arch_specified && new_target->arch > 0)
+	{
+	  switch (new_target->arch)
+	    {
+	    case PROCESSOR_CORE2_32:
+	    case PROCESSOR_CORE2_64:
+	      arg_str = "core2";
+	      priority = P_PROC_SSSE3;
+	      break;
+	    case PROCESSOR_COREI7_32:
+	    case PROCESSOR_COREI7_64:
+	      arg_str = "corei7";
+	      priority = P_PROC_SSE4_2;
+	      break;
+	    case PROCESSOR_ATOM:
+	      arg_str = "atom";
+	      priority = P_PROC_SSSE3;
+	      break;
+	    case PROCESSOR_AMDFAM10:
+	      arg_str = "amdfam10h";
+	      priority = P_PROC_SSE4_a;
+	      break;
+	    case PROCESSOR_BDVER1:
+	      arg_str = "bdver1";
+	      priority = P_PROC_FMA;
+	      break;
+	    case PROCESSOR_BDVER2:
+	      arg_str = "bdver2";
+	      priority = P_PROC_FMA;
+	      break;
+	    }  
+	}    
+    
+      cl_target_option_restore (&global_options, &cur_target);
+	
+      if (predicate_list && arg_str == NULL)
+	{
+	  error_at (DECL_SOURCE_LOCATION (decl),
+	    	"No dispatcher found for the versioning attributes");
+	  return 0;
+	}
+    
+      if (predicate_list)
+	{
+          predicate_decl = ix86_builtins [(int) IX86_BUILTIN_CPU_IS];
+          /* For a C string literal the length includes the trailing NUL.  */
+          predicate_arg = build_string_literal (strlen (arg_str) + 1, arg_str);
+          predicate_chain = tree_cons (predicate_decl, predicate_arg,
+				       predicate_chain);
+	}
+    }
+
+  /* Process feature name.  */
+  tok_str = (char *) xmalloc (strlen (attrs_str) + 1);
+  strcpy (tok_str, attrs_str);
+  token = strtok (tok_str, ",");
+  predicate_decl = ix86_builtins [(int) IX86_BUILTIN_CPU_SUPPORTS];
+
+  while (token != NULL)
+    {
+      /* Do not process "arch="  */
+      if (strncmp (token, "arch=", 5) == 0)
+	{
+	  token = strtok (NULL, ",");
+	  continue;
+	}
+      for (i = 0; i < NUM_FEATURES; ++i)
+	{
+	  if (strcmp (token, feature_list[i].name) == 0)
+	    {
+	      if (predicate_list)
+		{
+		  predicate_arg = build_string_literal (
+				  strlen (feature_list[i].name) + 1,
+				  feature_list[i].name);
+		  predicate_chain = tree_cons (predicate_decl, predicate_arg,
+					       predicate_chain);
+		}
+	      /* Find the maximum priority feature.  */
+	      if (feature_list[i].priority > priority)
+		priority = feature_list[i].priority;
+
+	      break;
+	    }
+	}
+      if (predicate_list && i == NUM_FEATURES)
+	{
+	  error_at (DECL_SOURCE_LOCATION (decl),
+		    "No dispatcher found for %s", token);
+	  return 0;
+	}
+      token = strtok (NULL, ",");
+    }
+  free (tok_str);
+
+  if (predicate_list && predicate_chain == NULL_TREE)
+    {
+      error_at (DECL_SOURCE_LOCATION (decl),
+	        "No dispatcher found for the versioning attributes : %s",
+	        attrs_str);
+      return 0;
+    }
+  else if (predicate_list)
+    {
+      predicate_chain = nreverse (predicate_chain);
+      *predicate_list = predicate_chain;
+    }
+
+  return priority; 
+}
+
+/* This compares the priority of target features in function DECL1
+   and DECL2.  It returns positive value if DECL1 is higher priority,
+   negative value if DECL2 is higher priority and 0 if they are the
+   same.  */
+
+static int
+ix86_compare_version_priority (tree decl1, tree decl2)
+{
+  unsigned int priority1 = 0;
+  unsigned int priority2 = 0;
+
+  if (lookup_attribute ("target", DECL_ATTRIBUTES (decl1)) != NULL)
+    priority1 = get_builtin_code_for_version (decl1, NULL);
+
+  if (lookup_attribute ("target", DECL_ATTRIBUTES (decl2)) != NULL)
+    priority2 = get_builtin_code_for_version (decl2, NULL);
+
+  return (int)priority1 - (int)priority2;
+}
+
+/* V1 and V2 point to function versions with different priorities
+   based on the target ISA.  This function compares their priorities.  */
+ 
+static int
+feature_compare (const void *v1, const void *v2)
+{
+  typedef struct _function_version_info
+    {
+      tree version_decl;
+      tree predicate_chain;
+      unsigned int dispatch_priority;
+    } function_version_info;
+
+  const function_version_info c1 = *(const function_version_info *)v1;
+  const function_version_info c2 = *(const function_version_info *)v2;
+  return (c2.dispatch_priority - c1.dispatch_priority);
+}
+
+/* This function generates the dispatch function for
+   multi-versioned functions.  DISPATCH_DECL is the function which will
+   contain the dispatch logic.  FNDECLS_P is a vector of the function choices
+   for dispatch, with the default version first.  EMPTY_BB is the basic block
+   pointer in DISPATCH_DECL in which the dispatch code is generated.  */
+
+static int
+dispatch_function_versions (tree dispatch_decl,
+			    void *fndecls_p,
+			    basic_block *empty_bb)
+{
+  tree default_decl;
+  gimple ifunc_cpu_init_stmt;
+  gimple_seq gseq;
+  int ix;
+  tree ele;
+  VEC (tree, heap) *fndecls;
+  unsigned int num_versions = 0;
+  unsigned int actual_versions = 0;
+  unsigned int i;
+
+  struct _function_version_info
+    {
+      tree version_decl;
+      tree predicate_chain;
+      unsigned int dispatch_priority;
+    }*function_version_info;
+
+  gcc_assert (dispatch_decl != NULL
+	      && fndecls_p != NULL
+	      && empty_bb != NULL);
+
+  /* fndecls_p is actually a vector.  */
+  fndecls = (VEC (tree, heap) *)fndecls_p;
+
+  /* At least one more version other than the default.  */
+  num_versions = VEC_length (tree, fndecls);
+  gcc_assert (num_versions >= 2);
+
+  function_version_info = (struct _function_version_info *)
+    XNEWVEC (struct _function_version_info, (num_versions - 1));
+
+  /* The first version in the vector is the default decl.  */
+  default_decl = VEC_index (tree, fndecls, 0);
+
+  push_cfun (DECL_STRUCT_FUNCTION (dispatch_decl));
+
+  gseq = bb_seq (*empty_bb);
+  /* Function version dispatch is via IFUNC.  IFUNC resolvers fire before
+     constructors, so explicitly call __builtin_cpu_init here.  */
+  ifunc_cpu_init_stmt = gimple_build_call_vec (
+                     ix86_builtins [(int) IX86_BUILTIN_CPU_INIT], NULL);
+  gimple_seq_add_stmt (&gseq, ifunc_cpu_init_stmt);
+  gimple_set_bb (ifunc_cpu_init_stmt, *empty_bb);
+  set_bb_seq (*empty_bb, gseq);
+
+  pop_cfun ();
+
+
+  for (ix = 1; VEC_iterate (tree, fndecls, ix, ele); ++ix)
+    {
+      tree version_decl = ele;
+      tree predicate_chain = NULL_TREE;
+      unsigned int priority;
+      /* Get attribute string, parse it and find the right predicate decl.
+         The predicate function could be a lengthy combination of many
+	 features, like arch-type and various isa-variants.  */
+      priority = get_builtin_code_for_version (version_decl,
+	 			               &predicate_chain);
+
+      if (predicate_chain == NULL_TREE)
+	continue;
+
+      function_version_info [actual_versions].version_decl = version_decl;
+      function_version_info [actual_versions].predicate_chain = predicate_chain;
+      function_version_info [actual_versions].dispatch_priority = priority;
+      actual_versions++;
+    }
+
+  /* Sort the versions according to descending order of dispatch priority.  The
+     priority is based on the ISA.  This is not a perfect solution.  There
+     could still be ambiguity.  If more than one function version is suitable
+     to execute, which one should be dispatched?  In future, allow the user
+     to specify a dispatch priority next to the version.  */
+  qsort (function_version_info, actual_versions,
+         sizeof (struct _function_version_info), feature_compare);
+
+  for (i = 0; i < actual_versions; ++i)
+    *empty_bb = add_condition_to_bb (dispatch_decl,
+				     function_version_info[i].version_decl,
+				     function_version_info[i].predicate_chain,
+				     *empty_bb);
+
+  /* Dispatch default version at the end.  */
+  *empty_bb = add_condition_to_bb (dispatch_decl, default_decl,
+				   NULL, *empty_bb);
+
+  free (function_version_info);
+  return 0;
+}
+
+/* This function returns true if FN1 and FN2 are versions of the same function,
+   that is, the targets of the function decls are different.  This assumes
+   that FN1 and FN2 have the same signature.  */
+
+static bool
+ix86_function_versions (tree fn1, tree fn2)
+{
+  tree attr1, attr2;
+  struct cl_target_option *target1, *target2;
+
+  if (TREE_CODE (fn1) != FUNCTION_DECL
+      || TREE_CODE (fn2) != FUNCTION_DECL)
+    return false;
+
+  attr1 = DECL_FUNCTION_SPECIFIC_TARGET (fn1);
+  attr2 = DECL_FUNCTION_SPECIFIC_TARGET (fn2);
+
+  /* At least one function decl should have target attribute specified.  */
+  if (attr1 == NULL_TREE && attr2 == NULL_TREE)
+    return false;
+
+  if (attr1 == NULL_TREE)
+    attr1 = target_option_default_node;
+  else if (attr2 == NULL_TREE)
+    attr2 = target_option_default_node;
+
+  target1 = TREE_TARGET_OPTION (attr1);
+  target2 = TREE_TARGET_OPTION (attr2);
+
+  /* target1 and target2 must be different in some way.  */
+  if (target1->x_ix86_isa_flags == target2->x_ix86_isa_flags
+      && target1->x_target_flags == target2->x_target_flags
+      && target1->arch == target2->arch
+      && target1->tune == target2->tune
+      && target1->x_ix86_fpmath == target2->x_ix86_fpmath
+      && target1->branch_cost == target2->branch_cost)
+    return false;
+
+  return true;
+}
+
+/* Comparator function to be used in qsort routine to sort attribute
+   specification strings to "target".  */
+
+static int
+attr_strcmp (const void *v1, const void *v2)
+{
+  const char *c1 = *(char *const*)v1;
+  const char *c2 = *(char *const*)v2;
+  return strcmp (c1, c2);
+}
+
+/* STR is the argument to target attribute.  This function tokenizes
+   the comma separated arguments, sorts them and returns a string which
+   is a unique identifier for the comma separated arguments.   It also
+   replaces non-identifier characters "=,-" with "_".  */
+
+static char *
+sorted_attr_string (const char *str)
+{
+  char **args = NULL;
+  char *attr_str, *ret_str;
+  char *attr = NULL;
+  unsigned int argnum = 1;
+  unsigned int i;
+
+  for (i = 0; i < strlen (str); i++)
+    if (str[i] == ',')
+      argnum++;
+
+  attr_str = (char *)xmalloc (strlen (str) + 1);
+  strcpy (attr_str, str);
+
+  /* Replace "=,-" with "_".  */
+  for (i = 0; i < strlen (attr_str); i++)
+    if (attr_str[i] == '=' || attr_str[i] == '-')
+      attr_str[i] = '_';
+
+  if (argnum == 1)
+    return attr_str;
+
+  args = XNEWVEC (char *, argnum);
+
+  i = 0;
+  attr = strtok (attr_str, ",");
+  while (attr != NULL)
+    {
+      args[i] = attr;
+      i++;
+      attr = strtok (NULL, ",");
+    }
+
+  qsort (args, argnum, sizeof (char*), attr_strcmp);
+
+  ret_str = (char *)xmalloc (strlen (str) + 1);
+  strcpy (ret_str, args[0]);
+  for (i = 1; i < argnum; i++)
+    {
+      strcat (ret_str, "_");
+      strcat (ret_str, args[i]);
+    }
+
+  free (args);
+  free (attr_str);
+  return ret_str;
+}
+
+/* This function changes the assembler name for functions that are
+   versions.  If DECL is a function version and has a "target"
+   attribute, it appends the attribute string to its assembler name.  */
+
+static tree
+ix86_mangle_function_version_assembler_name (tree decl, tree id)
+{
+  tree version_attr;
+  const char *orig_name, *version_string, *attr_str;
+  char *assembler_name;
+
+  if (DECL_DECLARED_INLINE_P (decl)
+      && lookup_attribute ("gnu_inline",
+			   DECL_ATTRIBUTES (decl)))
+    error_at (DECL_SOURCE_LOCATION (decl),
+	      "Function versions cannot be marked as gnu_inline,"
+	      " bodies have to be generated");
+
+  if (DECL_VIRTUAL_P (decl)
+      || DECL_VINDEX (decl))
+    error_at (DECL_SOURCE_LOCATION (decl),
+	      "Virtual function versioning not supported");
+
+  version_attr = lookup_attribute ("target", DECL_ATTRIBUTES (decl));
+
+  /* target attribute string is NULL for default functions.  */
+  if (version_attr == NULL_TREE)
+    return id;
+
+  orig_name = IDENTIFIER_POINTER (id);
+  version_string
+    = TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (version_attr)));
+
+  attr_str = sorted_attr_string (version_string);
+  assembler_name = (char *) xmalloc (strlen (orig_name)
+				     + strlen (attr_str) + 2);
+
+  sprintf (assembler_name, "%s.%s", orig_name, attr_str);
+
+  /* Allow assembler name to be modified if already set.  */
+  if (DECL_ASSEMBLER_NAME_SET_P (decl))
+    SET_DECL_RTL (decl, NULL);
+
+  return get_identifier (assembler_name);
+}
+
+static tree 
+ix86_mangle_decl_assembler_name (tree decl, tree id)
+{
+  /* For function version, add the target suffix to the assembler name.  */
+  if (TREE_CODE (decl) == FUNCTION_DECL
+      && DECL_FUNCTION_VERSIONED (decl))
+    return ix86_mangle_function_version_assembler_name (decl, id);
+
+  return id;
+}
+
+/* Return a new name by appending SUFFIX to the DECL name.  If make_unique
+   is true, append the full path name of the source file.  */
+
+static char *
+make_name (tree decl, const char *suffix, bool make_unique)
+{
+  char *global_var_name;
+  int name_len;
+  const char *name;
+  const char *unique_name = NULL;
+
+  name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
+
+  /* Get a unique name that can be used globally without any chances
+     of collision at link time.  */
+  if (make_unique)
+    unique_name = IDENTIFIER_POINTER (get_file_function_name ("\0"));
+
+  name_len = strlen (name) + strlen (suffix) + 2;
+
+  if (make_unique)
+    name_len += strlen (unique_name) + 1;
+  global_var_name = XNEWVEC (char, name_len);
+
+  /* Use '.' to concatenate names as it is demangler friendly.  */
+  if (make_unique)
+    snprintf (global_var_name, name_len, "%s.%s.%s", name,
+	      unique_name, suffix);
+  else
+    snprintf (global_var_name, name_len, "%s.%s", name, suffix);
+
+  return global_var_name;
+}
+
+/* Make a dispatcher declaration for the multi-versioned function DECL.
+   Calls to DECL function will be replaced with calls to the dispatcher
+   by the front-end.  Return the decl created.  */
+
+static tree
+make_dispatcher_decl (const tree decl)
+{
+  tree func_decl;
+  char *func_name, *resolver_name;
+  tree fn_type, func_type;
+  bool is_uniq = false;
+
+  if (TREE_PUBLIC (decl) == 0)
+    is_uniq = true;
+
+  func_name = make_name (decl, "ifunc", is_uniq);
+  resolver_name = make_name (decl, "resolver", is_uniq);
+  gcc_assert (resolver_name);
+
+  fn_type = TREE_TYPE (decl);
+  func_type = build_function_type (TREE_TYPE (fn_type),
+				   TYPE_ARG_TYPES (fn_type));
+  
+  func_decl = build_fn_decl (func_name, func_type);
+  TREE_USED (func_decl) = 1;
+  DECL_CONTEXT (func_decl) = NULL_TREE;
+  DECL_INITIAL (func_decl) = error_mark_node;
+  DECL_ARTIFICIAL (func_decl) = 1;
+  /* Mark this func as external, the resolver will flip it again if
+     it gets generated.  */
+  DECL_EXTERNAL (func_decl) = 1;
+  /* This will be an IFUNC; IFUNCs have to be externally visible.  */
+  TREE_PUBLIC (func_decl) = 1;
+
+  return func_decl;  
+}
+
+/* Returns true if DECL is multi-versioned and is the default function,
+   that is, it is not tagged with a target-specific optimization.  */
+
+static bool
+is_function_default_version (const tree decl)
+{
+  return (TREE_CODE (decl) == FUNCTION_DECL
+	  && DECL_FUNCTION_VERSIONED (decl)
+	  && DECL_FUNCTION_SPECIFIC_TARGET (decl) == NULL_TREE);
+}
+
+/* Make a dispatcher declaration for the multi-versioned function DECL.
+   Calls to DECL function will be replaced with calls to the dispatcher
+   by the front-end.  Returns the decl of the dispatcher function.  */
+
+static tree
+ix86_get_function_versions_dispatcher (void *decl)
+{
+  tree fn = (tree) decl;
+  struct cgraph_node *node = NULL;
+  struct cgraph_node *default_node = NULL;
+  struct cgraph_function_version_info *node_v = NULL;
+  struct cgraph_function_version_info *it_v = NULL;
+  struct cgraph_function_version_info *first_v = NULL;
+
+  tree dispatch_decl = NULL;
+  struct cgraph_node *dispatcher_node = NULL;
+  struct cgraph_function_version_info *dispatcher_version_info = NULL;
+
+  struct cgraph_function_version_info *default_version_info = NULL;
+ 
+  gcc_assert (fn != NULL && DECL_FUNCTION_VERSIONED (fn));
+
+  node = cgraph_get_node (fn);
+  gcc_assert (node != NULL);
+
+  node_v = get_cgraph_node_version (node);
+  gcc_assert (node_v != NULL);
+ 
+  if (node_v->dispatcher_resolver != NULL)
+    return node_v->dispatcher_resolver;
+
+  /* Find the default version and make it the first node.  */
+  first_v = node_v;
+  /* Go to the beginning of the chain.  */
+  while (first_v->prev != NULL)
+    first_v = first_v->prev;
+  default_version_info = first_v;
+  while (default_version_info != NULL)
+    {
+      if (is_function_default_version
+	    (default_version_info->this_node->symbol.decl))
+        break;
+      default_version_info = default_version_info->next;
+    }
+
+  /* If there is no default node, just return NULL.  */
+  if (default_version_info == NULL)
+    return NULL;
+
+  /* Make default info the first node.  */
+  if (first_v != default_version_info)
+    {
+      default_version_info->prev->next = default_version_info->next;
+      if (default_version_info->next)
+        default_version_info->next->prev = default_version_info->prev;
+      first_v->prev = default_version_info;
+      default_version_info->next = first_v;
+      default_version_info->prev = NULL;
+    }
+
+  default_node = default_version_info->this_node;
+
+#if defined (ASM_OUTPUT_TYPE_DIRECTIVE) && HAVE_GNU_INDIRECT_FUNCTION
+  /* Right now, the dispatching is done via ifunc.  */
+  dispatch_decl = make_dispatcher_decl (default_node->symbol.decl);
+
+  dispatcher_node = cgraph_get_create_node (dispatch_decl);
+  gcc_assert (dispatcher_node != NULL);
+  dispatcher_node->dispatcher_function = 1;
+  dispatcher_version_info
+    = insert_new_cgraph_node_version (dispatcher_node);
+  dispatcher_version_info->next = default_version_info;
+  dispatcher_node->local.finalized = 1;
+  cgraph_mark_address_taken_node (default_node);
+
+  /* Set the dispatcher for all the versions, including the last one.  */
+  it_v = default_version_info;
+  while (it_v != NULL)
+    {
+      it_v->dispatcher_resolver = dispatch_decl;
+      it_v = it_v->next;
+    }
+#else
+  error_at (DECL_SOURCE_LOCATION (default_node->symbol.decl),
+	    "Multiversioning needs ifunc which is not supported "
+	    "in this configuration");
+#endif
+
+  return dispatch_decl;
+}
+
+/* Makes a function attribute of the form NAME(ARG_NAME) and chains
+   it to CHAIN.  */
+
+static tree
+make_attribute (const char *name, const char *arg_name, tree chain)
+{
+  tree attr_name;
+  tree attr_arg_name;
+  tree attr_args;
+  tree attr;
+
+  attr_name = get_identifier (name);
+  attr_arg_name = build_string (strlen (arg_name), arg_name);
+  attr_args = tree_cons (NULL_TREE, attr_arg_name, NULL_TREE);
+  attr = tree_cons (attr_name, attr_args, chain);
+  return attr;
+}
+
+/* Make the resolver function decl to dispatch the versions of
+   a multi-versioned function, DEFAULT_DECL.  Create an
+   empty basic block in the resolver and store the pointer in
+   EMPTY_BB.  Return the decl of the resolver function.  */
+
+static tree
+make_resolver_func (const tree default_decl,
+		    const tree dispatch_decl,
+		    basic_block *empty_bb)
+{
+  char *resolver_name;
+  tree decl, type, decl_name, t;
+  bool is_uniq = false;
+
+  /* IFUNCs have to be globally visible.  So, if the default_decl is
+     not, then the name of the IFUNC should be made unique.  */
+  if (TREE_PUBLIC (default_decl) == 0)
+    is_uniq = true;
+
+  /* Append the filename to the resolver function if the versions are
+     not externally visible.  This is because the resolver function has
+     to be externally visible for the loader to find it.  So, appending
+     the filename will prevent conflicts with a resolver function from
+     another module which is based on the same version name.  */
+  resolver_name = make_name (default_decl, "resolver", is_uniq);
+
+  /* The resolver function should return a (void *). */
+  type = build_function_type_list (ptr_type_node, NULL_TREE);
+
+  decl = build_fn_decl (resolver_name, type);
+  decl_name = get_identifier (resolver_name);
+  SET_DECL_ASSEMBLER_NAME (decl, decl_name);
+
+  DECL_NAME (decl) = decl_name;
+  TREE_USED (decl) = 1;
+  DECL_ARTIFICIAL (decl) = 1;
+  DECL_IGNORED_P (decl) = 0;
+  /* IFUNC resolvers have to be externally visible.  */
+  TREE_PUBLIC (decl) = 1;
+  DECL_UNINLINABLE (decl) = 1;
+
+  /* Resolver is not external, body is generated.  */
+  DECL_EXTERNAL (decl) = 0;
+  DECL_EXTERNAL (dispatch_decl) = 0;
+
+  DECL_CONTEXT (decl) = NULL_TREE;
+  DECL_INITIAL (decl) = make_node (BLOCK);
+  DECL_STATIC_CONSTRUCTOR (decl) = 0;
+
+  if (DECL_COMDAT_GROUP (default_decl))
+    {
+      DECL_COMDAT (decl) = DECL_COMDAT (default_decl);
+      /*make_decl_one_only (decl, DECL_COMDAT_GROUP (default_decl));*/
+      make_decl_one_only (decl, DECL_ASSEMBLER_NAME (decl));
+    }
+  else if (TREE_PUBLIC (default_decl))
+    {
+      /* In this case, each translation unit with a call to this
+	 versioned function will put out a resolver.  Ensure it
+	 is comdat to keep just one copy.  */
+      DECL_COMDAT (decl) = 1;
+      make_decl_one_only (decl, DECL_ASSEMBLER_NAME (decl));
+    }
+  /* Build result decl and add to function_decl. */
+  t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, ptr_type_node);
+  DECL_ARTIFICIAL (t) = 1;
+  DECL_IGNORED_P (t) = 1;
+  DECL_RESULT (decl) = t;
+
+  gimplify_function_tree (decl);
+  push_cfun (DECL_STRUCT_FUNCTION (decl));
+  *empty_bb = init_lowered_empty_function (decl, false);
+
+  cgraph_add_new_function (decl, true);
+  cgraph_call_function_insertion_hooks (cgraph_get_create_node (decl));
+
+  if (DECL_COMDAT_GROUP (default_decl))
+    {
+      gcc_assert (cgraph_get_node (default_decl));
+      symtab_add_to_same_comdat_group (
+	(symtab_node) cgraph_get_node (decl),
+	(symtab_node) cgraph_get_node (default_decl));
+    }
+
+  pop_cfun ();
+
+  gcc_assert (dispatch_decl != NULL);
+  /* Mark dispatch_decl as "ifunc" with resolver as resolver_name.  */
+  DECL_ATTRIBUTES (dispatch_decl) 
+    = make_attribute ("ifunc", resolver_name, DECL_ATTRIBUTES (dispatch_decl));
+
+  /* Create the alias for dispatch to resolver here.  */
+  /*cgraph_create_function_alias (dispatch_decl, decl);*/
+  cgraph_same_body_alias (NULL, dispatch_decl, decl);
+  return decl;
+}
+
+/* Generate the dispatching code body to dispatch multi-versioned function
+   DECL.  The target hook is called to process the "target" attributes and
+   provide the code to dispatch the right function at run-time.  NODE_P points
+   to the dispatcher node whose body will be created.  */
+
+static tree 
+ix86_generate_version_dispatcher_body (void *node_p)
+{
+  tree resolver_decl;
+  basic_block empty_bb;
+  VEC (tree, heap) *fn_ver_vec = NULL;
+  tree default_ver_decl;
+  struct cgraph_node *versn;
+  struct cgraph_node *node;
+
+  struct cgraph_function_version_info *node_version_info = NULL;
+  struct cgraph_function_version_info *versn_info = NULL;
+
+  node = (struct cgraph_node *) node_p;
+
+  node_version_info = get_cgraph_node_version (node);
+  gcc_assert (node->dispatcher_function
+	      && node_version_info != NULL);
+
+  if (node_version_info->dispatcher_resolver)
+    return node_version_info->dispatcher_resolver;
+
+  /* The first version in the chain corresponds to the default version.  */
+  default_ver_decl = node_version_info->next->this_node->symbol.decl;
+
+  /* node is going to be an alias, so remove the finalized bit.  */
+  node->local.finalized = false;
+
+  resolver_decl = make_resolver_func (default_ver_decl,
+				      node->symbol.decl, &empty_bb);
+
+  node_version_info->dispatcher_resolver = resolver_decl;
+
+  push_cfun (DECL_STRUCT_FUNCTION (resolver_decl));
+
+  fn_ver_vec = VEC_alloc (tree, heap, 2);
+
+  for (versn_info = node_version_info->next; versn_info;
+       versn_info = versn_info->next)
+    {
+      versn = versn_info->this_node;
+      /* Check for virtual functions here again, as by this time it should
+	 have been determined if this function needs a vtable index or
+	 not.  This happens for methods in derived classes that override
+	 virtual methods in base classes but are not explicitly marked as
+	 virtual.  */
+      if (DECL_VINDEX (versn->symbol.decl))
+        error_at (DECL_SOURCE_LOCATION (versn->symbol.decl),
+		  "Virtual function multiversioning not supported");
+      VEC_safe_push (tree, heap, fn_ver_vec, versn->symbol.decl);
+    }
+
+  dispatch_function_versions (resolver_decl, fn_ver_vec, &empty_bb);
+
+  rebuild_cgraph_edges (); 
+  pop_cfun ();
+  return resolver_decl;
+}
 /* This builds the processor_model struct type defined in
    libgcc/config/i386/cpuinfo.c  */
 
@@ -28658,6 +29635,8 @@ fold_builtin_cpu (tree fndecl, tree *args)
     {
       tree ref;
       tree field;
+      tree final;
+
       unsigned int field_val = 0;
       unsigned int NUM_ARCH_NAMES
 	= sizeof (arch_names_table) / sizeof (struct _arch_names_table);
@@ -28697,14 +29676,17 @@ fold_builtin_cpu (tree fndecl, tree *args)
 		     field, NULL_TREE);
 
       /* Check the value.  */
-      return build2 (EQ_EXPR, unsigned_type_node, ref,
-		     build_int_cstu (unsigned_type_node, field_val));
+      final = build2 (EQ_EXPR, unsigned_type_node, ref,
+		      build_int_cstu (unsigned_type_node, field_val));
+      return build1 (CONVERT_EXPR, integer_type_node, final);
     }
   else if (fn_code == IX86_BUILTIN_CPU_SUPPORTS)
     {
       tree ref;
       tree array_elt;
       tree field;
+      tree final;
+
       unsigned int field_val = 0;
       unsigned int NUM_ISA_NAMES
 	= sizeof (isa_names_table) / sizeof (struct _isa_names_table);
@@ -28736,8 +29718,9 @@ fold_builtin_cpu (tree fndecl, tree *args)
 
       field_val = (1 << isa_names_table[i].feature);
       /* Return __cpu_model.__cpu_features[0] & field_val  */
-      return build2 (BIT_AND_EXPR, unsigned_type_node, array_elt,
-		     build_int_cstu (unsigned_type_node, field_val));
+      final = build2 (BIT_AND_EXPR, unsigned_type_node, array_elt,
+		      build_int_cstu (unsigned_type_node, field_val));
+      return build1 (CONVERT_EXPR, integer_type_node, final);
     }
   gcc_unreachable ();
 }
@@ -41225,6 +42208,9 @@ ix86_memmodel_check (unsigned HOST_WIDE_INT val)
 #undef TARGET_PROFILE_BEFORE_PROLOGUE
 #define TARGET_PROFILE_BEFORE_PROLOGUE ix86_profile_before_prologue
 
+#undef TARGET_MANGLE_DECL_ASSEMBLER_NAME
+#define TARGET_MANGLE_DECL_ASSEMBLER_NAME ix86_mangle_decl_assembler_name
+
 #undef TARGET_ASM_UNALIGNED_HI_OP
 #define TARGET_ASM_UNALIGNED_HI_OP TARGET_ASM_ALIGNED_HI_OP
 #undef TARGET_ASM_UNALIGNED_SI_OP
@@ -41318,6 +42304,17 @@ ix86_memmodel_check (unsigned HOST_WIDE_INT val)
 #undef TARGET_FOLD_BUILTIN
 #define TARGET_FOLD_BUILTIN ix86_fold_builtin
 
+#undef TARGET_COMPARE_VERSION_PRIORITY
+#define TARGET_COMPARE_VERSION_PRIORITY ix86_compare_version_priority
+
+#undef TARGET_GENERATE_VERSION_DISPATCHER_BODY
+#define TARGET_GENERATE_VERSION_DISPATCHER_BODY \
+  ix86_generate_version_dispatcher_body
+
+#undef TARGET_GET_FUNCTION_VERSIONS_DISPATCHER
+#define TARGET_GET_FUNCTION_VERSIONS_DISPATCHER \
+  ix86_get_function_versions_dispatcher
+
 #undef TARGET_ENUM_VA_LIST_P
 #define TARGET_ENUM_VA_LIST_P ix86_enum_va_list
 
@@ -41458,6 +42455,9 @@ ix86_memmodel_check (unsigned HOST_WIDE_INT val)
 #undef TARGET_OPTION_PRINT
 #define TARGET_OPTION_PRINT ix86_function_specific_print
 
+#undef TARGET_OPTION_FUNCTION_VERSIONS
+#define TARGET_OPTION_FUNCTION_VERSIONS ix86_function_versions
+
 #undef TARGET_CAN_INLINE_P
 #define TARGET_CAN_INLINE_P ix86_can_inline_p
 
Index: gcc/testsuite/g++.dg/mv1.C
===================================================================
--- gcc/testsuite/g++.dg/mv1.C	(revision 0)
+++ gcc/testsuite/g++.dg/mv1.C	(revision 0)
@@ -0,0 +1,130 @@
+/* Test case to check if Multiversioning works.  */
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-require-ifunc "" }  */
+/* { dg-options "-O2 -fPIC -mno-avx -mno-popcnt" } */
+
+#include <assert.h>
+
+/* Default version.  */
+int foo ();
+/* The other versions of foo.  Mix up the ordering and 
+   check if the dispatching does it in the order of priority. */
+/* Check combination of target attributes.  */
+int foo () __attribute__ ((target("arch=corei7,popcnt")));
+/* The target operands in this declaration and the definition are re-ordered.
+   This should still work.  */
+int foo () __attribute__ ((target("ssse3,avx2")));
+
+/* Check for all target attributes for which dispatchers are available.  */
+/* Check arch= */
+int foo () __attribute__((target("arch=core2")));
+int foo () __attribute__((target("arch=corei7")));
+int foo () __attribute__((target("arch=atom")));
+/* Check ISAs  */
+int foo () __attribute__((target("avx")));
+int foo () __attribute__ ((target("arch=core2,sse4.2")));
+/* Check more arch=.  */
+int foo () __attribute__((target("arch=amdfam10")));
+int foo () __attribute__((target("arch=bdver1")));
+int foo () __attribute__((target("arch=bdver2")));
+
+int (*p)() = &foo;
+int main ()
+{
+  int val = foo ();
+  assert (val ==  (*p)());
+
+  /* Check in the exact same order in which the dispatching
+     is expected to happen.  */
+  if (__builtin_cpu_is ("bdver1"))
+    assert (val == 1);
+  else if (__builtin_cpu_is ("bdver2"))
+    assert (val == 2);
+  else if (__builtin_cpu_supports ("avx2")
+	   && __builtin_cpu_supports ("ssse3"))
+    assert (val == 3);
+  else if (__builtin_cpu_supports ("avx"))
+    assert (val == 4);
+  else if (__builtin_cpu_is ("corei7")
+	   && __builtin_cpu_supports ("popcnt"))
+    assert (val == 5);
+  else if (__builtin_cpu_is ("corei7"))
+    assert (val == 6);
+  else if (__builtin_cpu_is ("amdfam10h"))
+    assert (val == 7);
+  else if (__builtin_cpu_is ("core2")
+	   && __builtin_cpu_supports ("sse4.2"))
+    assert (val == 8);
+  else if (__builtin_cpu_is ("core2"))
+    assert (val == 9);
+  else if (__builtin_cpu_is ("atom"))
+    assert (val == 10);
+  else
+    assert (val == 0);
+  
+  return 0;
+}
+
+int foo ()
+{
+  return 0;
+}
+
+int __attribute__ ((target("arch=corei7,popcnt")))
+foo ()
+{
+  return 5;
+}
+int __attribute__ ((target("avx2,ssse3")))
+foo ()
+{
+  return 3;
+}
+
+int __attribute__ ((target("arch=core2")))
+foo ()
+{
+  return 9;
+}
+
+int __attribute__ ((target("arch=corei7")))
+foo ()
+{
+  return 6;
+}
+
+int __attribute__ ((target("arch=atom")))
+foo ()
+{
+  return 10;
+}
+
+int __attribute__ ((target("avx")))
+foo ()
+{
+  return 4;
+}
+
+int __attribute__ ((target("arch=core2,sse4.2")))
+foo ()
+{
+  return 8;
+}
+
+int __attribute__ ((target("arch=amdfam10")))
+foo ()
+{
+  return 7;
+}
+
+int __attribute__ ((target("arch=bdver1")))
+foo ()
+{
+  return 1;
+}
+
+int __attribute__ ((target("arch=bdver2")))
+foo ()
+{
+  return 2;
+}
Index: gcc/testsuite/g++.dg/mv2.C
===================================================================
--- gcc/testsuite/g++.dg/mv2.C	(revision 0)
+++ gcc/testsuite/g++.dg/mv2.C	(revision 0)
@@ -0,0 +1,121 @@
+/* Test case to check if Multiversioning chooses the correct
+   dispatching order when versions are for various ISAs.  */
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-require-ifunc "" }  */
+/* { dg-options "-O2 -mno-sse -mno-mmx -mno-popcnt -mno-avx" } */
+
+#include <assert.h>
+#include <stdio.h>
+
+/* Default version.  */
+int foo ();
+/* The dispatch checks should be in the exact reverse order of the
+   declarations below.  */
+int foo () __attribute__ ((target ("mmx")));
+int foo () __attribute__ ((target ("sse")));
+int foo () __attribute__ ((target ("sse2")));
+int foo () __attribute__ ((target ("sse3")));
+int foo () __attribute__ ((target ("ssse3")));
+int foo () __attribute__ ((target ("sse4.1")));
+int foo () __attribute__ ((target ("sse4.2")));
+int foo () __attribute__ ((target ("popcnt")));
+int foo () __attribute__ ((target ("avx")));
+int foo () __attribute__ ((target ("avx2")));
+
+int main ()
+{
+
+  int val = foo ();
+  printf ("val = %d\n", val);
+
+  if (__builtin_cpu_supports ("avx2"))
+    assert (val == 1);
+  else if (__builtin_cpu_supports ("avx"))
+    assert (val == 2);
+  else if (__builtin_cpu_supports ("popcnt"))
+    assert (val == 3);
+  else if (__builtin_cpu_supports ("sse4.2"))
+    assert (val == 4);
+  else if (__builtin_cpu_supports ("sse4.1"))
+    assert (val == 5);
+  else if (__builtin_cpu_supports ("ssse3"))
+    assert (val == 6);
+  else if (__builtin_cpu_supports ("sse3"))
+    assert (val == 7);
+  else if (__builtin_cpu_supports ("sse2"))
+    assert (val == 8);
+  else if (__builtin_cpu_supports ("sse"))
+    assert (val == 9);
+  else if (__builtin_cpu_supports ("mmx"))
+    assert (val == 10);
+  else
+    assert (val == 0);
+
+  return 0;
+}
+
+int
+foo ()
+{
+  return 0;
+}
+
+int __attribute__ ((target("mmx")))
+foo ()
+{
+  return 10;
+}
+
+int __attribute__ ((target("sse")))
+foo ()
+{
+  return 9;
+}
+
+int __attribute__ ((target("sse2")))
+foo ()
+{
+  return 8;
+}
+
+int __attribute__ ((target("sse3")))
+foo ()
+{
+  return 7;
+}
+
+int __attribute__ ((target("ssse3")))
+foo ()
+{
+  return 6;
+}
+
+int __attribute__ ((target("sse4.1")))
+foo ()
+{
+  return 5;
+}
+
+int __attribute__ ((target("sse4.2")))
+foo ()
+{
+  return 4;
+}
+
+int __attribute__ ((target("popcnt")))
+foo ()
+{
+  return 3;
+}
+
+int __attribute__ ((target("avx")))
+foo ()
+{
+  return 2;
+}
+
+int __attribute__ ((target("avx2")))
+foo ()
+{
+  return 1;
+}
Index: gcc/testsuite/g++.dg/mv3.C
===================================================================
--- gcc/testsuite/g++.dg/mv3.C	(revision 0)
+++ gcc/testsuite/g++.dg/mv3.C	(revision 0)
@@ -0,0 +1,37 @@
+/* Test case to check if a call to a multiversioned function
+   is replaced with a direct call to the particular version when
+   the most specialized version's target attributes match the
+   caller.  
+  
+   In this program, foo is multiversioned but there is no default
+   function.  This is an error if the call has to go through a
+   dispatcher.  However, the call to foo in bar can be replaced
+   with a direct call to the popcnt version of foo.  Hence, this
+   test should pass.  */
+
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 -mno-sse -mno-popcnt" } */
+
+
+/* The sse version.  There is deliberately no default version.  */
+int __attribute__ ((target ("sse")))
+foo ()
+{
+  return 1;
+}
+int __attribute__ ((target ("popcnt")))
+foo ()
+{
+  return 0;
+}
+
+int __attribute__ ((target ("popcnt")))
+bar ()
+{
+  return foo ();
+}
+
+int main ()
+{
+  return bar ();
+}
Index: gcc/testsuite/g++.dg/mv4.C
===================================================================
--- gcc/testsuite/g++.dg/mv4.C	(revision 0)
+++ gcc/testsuite/g++.dg/mv4.C	(revision 0)
@@ -0,0 +1,23 @@
+/* Test case to check if the compiler generates an error message
+   when the default version of a multiversioned function is absent
+   and its pointer is taken.  */
+
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 -mno-sse -mno-popcnt" } */
+
+int __attribute__ ((target ("sse")))
+foo ()
+{
+  return 1;
+}
+int __attribute__ ((target ("popcnt")))
+foo ()
+{
+  return 0;
+}
+
+int main ()
+{
+  int (*p)() = &foo; /* { dg-error "Pointer to a multiversioned function without a default is not allowed" {} } */
+  return (*p)();
+}


* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-11-02  2:53                                                                                           ` Sriraman Tallam
@ 2012-11-06  2:38                                                                                             ` Sriraman Tallam
  2012-11-06 15:52                                                                                               ` Jason Merrill
  2012-11-06 22:15                                                                                               ` Gerald Pfeifer
  0 siblings, 2 replies; 93+ messages in thread
From: Sriraman Tallam @ 2012-11-06  2:38 UTC (permalink / raw)
  To: Jason Merrill, David Li, H.J. Lu
  Cc: gcc-patches List, Jan Hubicka, Diego Novillo

[-- Attachment #1: Type: text/plain, Size: 2117 bytes --]

Hi,

   I have now committed the attached patch.

Thanks,
-Sri.

On Thu, Nov 1, 2012 at 7:53 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> Hi Jason,
>
>    I have made all the changes you mentioned and attached the new
> patch.  Summary of the important things changed:
>
> * The versions are collected at declaration time itself now.
> * extern "C" functions are disallowed from being versions for now.
> extern "C" functions have to be handled exactly like how the C
> front-end would handle versioned functions. I will do this when I get
> to the C front-end.
> * Finalizing cgraph nodes is removed from front-end code.
>
>
> Thanks,
> -Sri.
>
>
> On Wed, Oct 31, 2012 at 7:02 AM, Jason Merrill <jason@redhat.com> wrote:
>> On 10/30/2012 05:49 PM, Sriraman Tallam wrote:
>>>
>>> AFAIU, this should not be a problem. For duplicate declarations,
>>> duplicate_decls should merge them and they should never be seen here.
>>> Did I miss something?
>>
>>
>> With extern "C" functions you can have multiple declarations of the same
>> function in different namespaces that are not duplicates, but still match.
>> And I can't think what that test is supposed to be catching, anyway.
>>
>>
>>> No, I thought about this but I did not want to handle this case in
>>> this iteration. The dispatcher is created only once and if more
>>> functions are declared later, they will not be dispatched at least in
>>> this iteration.
>>
>>
>> I still think that instead of collecting the set of functions in overload
>> resolution, they should be collected at declaration time and added to a
>> vector in the cgraph information for use when generating the body of the
>> dispatcher.
>>
>>
>>> You talked about doing the dispatcher
>>> building later, but I did it here since I am doing it only once.
>>
>>
>> I still don't think this is the right place for it.
>>
>>
>>> dispatcher_node does not have a body  until it is generated in
>>> cgraphunit.c, so cgraph does not mark this field before this is
>>> processed in cgraph_analyze_function.
>>
>>
>> That seems like something to address in your cgraph changes.
>>
>> Jason
>>

[-- Attachment #2: mv_fe_patch_11052012.txt --]
[-- Type: text/plain, Size: 83935 bytes --]

Function Multiversioning
========================

Overview of the patch which adds support to specify function versions.  This is
only enabled for target i386.

Example:

int foo ();  /* Default version */
int foo () __attribute__ ((target("avx,popcnt"))); /* Specialized for avx and popcnt */
int foo () __attribute__ ((target("arch=core2,ssse3"))); /* Specialized for core2 and ssse3 */

int main ()
{
 int (*p)() = &foo;
 return foo () + (*p)();
}

int foo ()
{
 return 0;
}

int __attribute__ ((target("avx,popcnt")))
foo ()
{
 return 0;
}

int __attribute__ ((target("arch=core2,ssse3")))
foo ()
{
 return 0;
}

The above example has foo defined 3 times, but all 3 definitions of foo are
different versions of the same function. The call to foo in main, directly and
via a pointer, are calls to the multi-versioned function foo which is dispatched
to the right foo at run-time.

Front-end changes:

The front-end changes are calls at appropriate places to target hooks that
determine the following:

* Determine if two function decls with the same signature are versions.
* Determine the assembler name of a function version.
* Generate the dispatcher function for a set of function versions.
* Compare versions to see if one has a higher priority over the other.

All the implementation happens in the target-specific config/i386/i386.c.

What does the patch do?

* Tracking decls that correspond to function versions of function
name, say "foo":

When the front-end sees more than one decl for "foo", it calls a target hook to
determine if they are versions. To prevent duplicate definition errors with
other versions of "foo", the "decls_match" function in cp/decl.c is made to
return false when 2 decls are deemed versions by the target. This causes all
function versions of "foo" to be added to the overload list of "foo".

* Change the assembler names of the function versions.

For i386, the target changes the assembler names of the function versions by
suffixing the sorted list of args to "target" to the function name of "foo".
For example, the assembler name of
"void foo () __attribute__ ((target ("sse4")))" will become _Z3foov.sse4.
The target hook mangle_decl_assembler_name is used for this.
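
The naming scheme above can be sketched roughly as follows. This is only an
illustration, not the code in i386.c: make_version_name is a hypothetical
helper, and the "_" used to join multiple sorted arguments is an assumption
(the text only gives the single-argument case, _Z3foov.sse4).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the patch): build a versioned assembler
// name by sorting the comma-separated "target" arguments and appending
// them to the base name after a '.'.
std::string make_version_name(const std::string &base,
                              const std::string &target_args)
{
  std::vector<std::string> args;
  std::stringstream ss(target_args);
  std::string item;

  // Split the attribute string on commas.
  while (std::getline(ss, item, ','))
    if (!item.empty())
      args.push_back(item);

  // Sorting makes "ssse3,avx2" and "avx2,ssse3" produce the same name.
  std::sort(args.begin(), args.end());

  std::string name = base + ".";
  for (std::size_t i = 0; i < args.size(); ++i)
    {
      if (i != 0)
        name += "_";  // Assumed separator between sorted arguments.
      name += args[i];
    }
  return name;
}
```

Sorting before suffixing is what lets a declaration and a definition list the
same target arguments in a different order and still refer to one version, as
mv1.C exercises with "ssse3,avx2" and "avx2,ssse3".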

* Overload resolution:

Function "build_over_call" in cp/call.c sees a call to function
"foo", which is multi-versioned. The overload resolution happens in
function "joust" in "cp/call.c". Here, the call to "foo" has all
possible versions of "foo" as candidates. All the candidates of "foo" are
stored in the cgraph side data structure. Each version of foo is chained in a 
doubly-linked list with the default function as the first element.  This allows
any pass to access all the semantically identical versions. A call to a
multi-versioned function will be replaced by a call to a dispatcher function,
determined by a target hook, to execute the right function version at run-time.
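
A minimal sketch of the chaining described above, assuming a simplified
stand-in for the cgraph side structure. The struct version_info and the
helpers chain_versions and first_version are illustrative names only; the
real structure in the patch carries more fields.

```cpp
#include <cassert>
#include <cstddef>

// Simplified stand-in for the per-version side data; only the links and
// a label are kept here.
struct version_info
{
  const char *name;
  version_info *prev;
  version_info *next;
};

// Append the chain containing AFTER to the end of the chain containing
// BEFORE.  The caller must not pass two nodes that are already in the
// same chain, or the list would become circular.
void chain_versions(version_info *before, version_info *after)
{
  while (before->next != NULL)
    before = before->next;   // Walk to the tail of the first chain.
  while (after->prev != NULL)
    after = after->prev;     // Walk to the head of the second chain.
  before->next = after;
  after->prev = before;
}

// Walk back to the first element of the chain, which by the convention
// described in the text is the default version.
version_info *first_version(version_info *v)
{
  while (v->prev != NULL)
    v = v->prev;
  return v;
}
```

With this layout any pass holding one version can reach all the others by
following prev and next, which is the property the paragraph above relies on.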

Optimization to directly call a version when possible:
Also, in joust, where overload resolution happens, a multiversioned function
resolution is made to return the most specialized version.  This is the version
that will be checked for dispatching first and is determined by the target.
Now, if the caller can inline this function version then a direct call is made
to this function version rather than go through the dispatcher. When a direct
call cannot be made, a call to the dispatcher function is created.

* Creating the dispatcher body.

The dispatcher body, called the resolver, is made only when there is a call to
a multiversioned function dispatcher or the address of a multiversioned
function is taken. The body is generated during cgraph_analyze_function by
another target hook.
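
The split between dispatcher and resolver can be illustrated in plain C++.
This is a sketch under stated assumptions, not what the compiler emits: the
patch relies on IFUNC to run the resolver, whereas here a function-local
static emulates the "resolve once" behaviour. The names foo_default,
foo_popcnt, foo_resolver, foo_dispatch and the cpu_has_popcnt flag are all
hypothetical.

```cpp
#include <cassert>

typedef int (*foo_fn)();

// Two hypothetical versions of the same function.
int foo_default() { return 0; }
int foo_popcnt() { return 1; }

// Stand-in for a CPU feature probe; a real resolver would query the
// processor instead of reading a flag.
bool cpu_has_popcnt = false;

// Counts how often the resolver runs, to show it is consulted only once.
int resolver_calls = 0;

// The resolver holds the selection logic and returns the chosen version.
foo_fn foo_resolver()
{
  ++resolver_calls;
  return cpu_has_popcnt ? foo_popcnt : foo_default;
}

// The dispatcher is what callers reach.  The selection is made on the
// first call and reused afterwards.
int foo_dispatch()
{
  static foo_fn resolved = foo_resolver();
  return resolved();
}
```

Because the choice is cached, later changes to the probe result do not affect
which version runs; only the first call pays for the selection.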

* Dispatch ordering.

The order in which the function versions are checked during dispatch is based
on a priority value assigned to the ISA that each version caters to. More
specialized versions are checked for dispatching first.  This is to mitigate
the ambiguity that can arise when more than one function version is valid for
execution on a particular platform.  This is not a perfect solution, and in the
future the user should be allowed to assign a dispatching priority value to
each version.
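
As a rough sketch of priority-ordered dispatch: the versions are sorted by
descending priority and the first one whose ISA is available wins. The struct,
the helper names and the numeric priorities below are assumptions made for the
example; only the relative order (avx above popcnt above sse2) and the return
values are taken from the mv2.C test. The CPU is simulated with a set of
feature names so the sketch does not depend on the host machine.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// One function version: the ISA it needs, its dispatch priority, and the
// value the version returns (used here to identify which one was picked).
struct version
{
  std::string isa;
  int priority;
  int result;
};

// Comparator: higher priority sorts first.
bool higher_priority(const version &a, const version &b)
{
  return a.priority > b.priority;
}

// Hypothetical helper to build a version record.
version make_version(const std::string &isa, int priority, int result)
{
  version v;
  v.isa = isa;
  v.priority = priority;
  v.result = result;
  return v;
}

// A small example set, deliberately listed out of priority order.
std::vector<version> example_versions()
{
  std::vector<version> v;
  v.push_back(make_version("sse2", 3, 8));
  v.push_back(make_version("avx", 9, 2));
  v.push_back(make_version("popcnt", 8, 3));
  return v;
}

// Check versions from highest to lowest priority and return the result of
// the first one the simulated CPU supports; fall back to the default.
int dispatch(std::vector<version> versions,
             const std::set<std::string> &cpu_features,
             int default_result)
{
  std::sort(versions.begin(), versions.end(), higher_priority);
  for (std::size_t i = 0; i < versions.size(); ++i)
    if (cpu_features.count(versions[i].isa) != 0)
      return versions[i].result;
  return default_result;
}
```

On a simulated CPU with both sse2 and popcnt, the popcnt version is chosen
because it is checked first; if no listed ISA is present, the default result
is returned.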

Function MV in the Intel compiler:

The Intel compiler supports function multiversioning and the syntax is
similar to the patch proposed here.  Here is an example of how to
generate multiple function versions with the intel compiler.

/* Create a stub function to specify the various versions of function that
   will be created, using declspec attribute cpu_dispatch.  */
__declspec (cpu_dispatch (core_i7_sse4_2, atom, generic))
void foo () {};

/* Bodies of each function version.  */

/* Intel Corei7 processor + SSE4.2 version.  */
__declspec (cpu_specific(core_i7_sse4_2))
void foo ()
{
  printf ("corei7 + sse4.2");
}

/* Atom processor.  */
__declspec (cpu_specific(atom))
void foo ()
{
  printf ("atom");
}

/* The generic or the default version.  */
__declspec (cpu_specific(generic))
void foo ()
{
  printf ("This is generic");
}

A new function version is generated by defining a new function with the same
signature but with a different cpu_specific declspec attribute string.  The
set of cpu_specific strings that are allowed is the following:

"core_2nd_gen_avx"
"core_aes_pclmulqdq"
"core_i7_sse4_2"
"core_2_duo_sse4_1"
"core_2_duo_ssse3"
"atom"
"pentium_4_sse3"
"pentium_4"
"pentium_m"
"pentium_iii"
"generic"

Comparison with the GCC MV implementation in this patch:

* Version creation syntax:

The implementation in this patch also has a similar syntax to specify function
versions. The first stub function is not needed.  Here is the code to generate
the function versions with this patch:

/* Intel Corei7 processor + SSE4.2 version.  */
__attribute__ ((target ("arch=corei7, sse4.2")))
void foo ()
{
  printf ("corei7 + sse4.2");
}

/* Atom processor.  */
__attribute__ ((target ("arch=atom")))
void foo ()
{
  printf ("atom");
}

void foo ()
{
}

The target attribute can have one of the following arch names:

"amd"
"intel"
"atom"
"core2"
"corei7"
"nehalem"
"westmere"
"sandybridge"
"amdfam10h"
"barcelona"
"shanghai"
"istanbul"
"amdfam15h"
"bdver1"
"bdver2"

and any number of the following ISA names:

"cmov"
"mmx"
"popcnt"
"sse"
"sse2"
"sse3"
"ssse3"
"sse4.1"
"sse4.2"
"avx"
"avx2"



	* doc/tm.texi.in (TARGET_OPTION_FUNCTION_VERSIONS): New hook
	description.
	* (TARGET_COMPARE_VERSION_PRIORITY): New hook description.
	* (TARGET_GET_FUNCTION_VERSIONS_DISPATCHER): New hook description.
	* (TARGET_GENERATE_VERSION_DISPATCHER_BODY): New hook description.
	* doc/tm.texi: Regenerate.
	* target.def (compare_version_priority): New target hook.
	* (generate_version_dispatcher_body): New target hook.
	* (get_function_versions_dispatcher): New target hook.
	* (function_versions): New target hook.
	* cgraph.c (cgraph_fnver_htab): New htab.
	(cgraph_fn_ver_htab_hash): New function.
	(cgraph_fn_ver_htab_eq): New function.
	(version_info_node): New pointer.
	(insert_new_cgraph_node_version): New function.
	(get_cgraph_node_version): New function.
	(delete_function_version): New function.
	(record_function_versions): New function.
	* cgraph.h (cgraph_node): New bitfield dispatcher_function.
	(cgraph_function_version_info): New struct.
	(get_cgraph_node_version): New function.
	(insert_new_cgraph_node_version): New function.
	(record_function_versions): New function.
	(delete_function_version): New function.
	(init_lowered_empty_function): Expose function.
	* tree.h (DECL_FUNCTION_VERSIONED): New macro.
	(tree_function_decl): New bit-field versioned_function.
	* cgraphunit.c (cgraph_analyze_function): Generate body of multiversion
	function dispatcher.
	(cgraph_analyze_functions): Analyze dispatcher function.
	(init_lowered_empty_function): Make non-static. New parameter in_ssa.
	(assemble_thunk): Add parameter to call to init_lowered_empty_function.
	* config/i386/i386.c (add_condition_to_bb): New function.
	(get_builtin_code_for_version): New function.
	(ix86_compare_version_priority): New function.
	(feature_compare): New function.
	(dispatch_function_versions): New function.
	(ix86_function_versions): New function.
	(attr_strcmp): New function.
	(ix86_mangle_function_version_assembler_name): New function.
	(ix86_mangle_decl_assembler_name): New function.
	(make_name): New function.
	(make_dispatcher_decl): New function.
	(is_function_default_version): New function.
	(ix86_get_function_versions_dispatcher): New function.
	(make_attribute): New function.
	(make_resolver_func): New function.
	(ix86_generate_version_dispatcher_body): New function.
	(fold_builtin_cpu): Return integer for cpu builtins.
	(TARGET_MANGLE_DECL_ASSEMBLER_NAME): New macro.
	(TARGET_COMPARE_VERSION_PRIORITY): New macro.
	(TARGET_GENERATE_VERSION_DISPATCHER_BODY): New macro.
	(TARGET_GET_FUNCTION_VERSIONS_DISPATCHER): New macro.
	(TARGET_OPTION_FUNCTION_VERSIONS): New macro.

	* class.c (add_method): Change assembler names of function versions.
	(mark_versions_used): New static function.
	(resolve_address_of_overloaded_function): Create dispatcher decl and
	return address of dispatcher instead.
	* decl.c (decls_match): Make decls unmatched for versioned
	functions.
	(duplicate_decls): Remove ambiguity for versioned functions.
	Delete versioned function data for merged decls. 
	* decl2.c (check_classfn): Check attributes of versioned functions
	for match.
	* call.c (get_function_version_dispatcher): New function.
	(mark_versions_used): New static function.
	(build_over_call): Make calls to multiversioned functions
	to call the dispatcher.
	(joust): For calls to multi-versioned functions, make the most
	specialized function version win.

	* testsuite/g++.dg/mv1.C: New test.
	* testsuite/g++.dg/mv2.C: New test.
	* testsuite/g++.dg/mv3.C: New test.
	* testsuite/g++.dg/mv4.C: New test.
	* testsuite/g++.dg/mv5.C: New test.
	* testsuite/g++.dg/mv6.C: New test.

Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi	(revision 193203)
+++ gcc/doc/tm.texi	(working copy)
@@ -9929,6 +9929,14 @@ changed via the optimize attribute or pragma, see
 @code{TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE}
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_OPTION_FUNCTION_VERSIONS (tree @var{decl1}, tree @var{decl2})
+This target hook returns @code{true} if @var{DECL1} and @var{DECL2} are
+versions of the same function.  @var{DECL1} and @var{DECL2} are function
+versions if and only if they have the same function signature and
+different target specific attributes, that is, they are compiled for
+different target machines.
+@end deftypefn
+
 @deftypefn {Target Hook} bool TARGET_CAN_INLINE_P (tree @var{caller}, tree @var{callee})
 This target hook returns @code{false} if the @var{caller} function
 cannot inline @var{callee}, based on target specific information.  By
@@ -10952,6 +10960,29 @@ The result is another tree containing a simplified
 call's result.  If @var{ignore} is true the value will be ignored.
 @end deftypefn
 
+@deftypefn {Target Hook} int TARGET_COMPARE_VERSION_PRIORITY (tree @var{decl1}, tree @var{decl2})
+This hook is used to compare the target attributes in two functions to
+determine which function's features get higher priority.  This is used
+during function multi-versioning to figure out the order in which two
+versions must be dispatched.  A function version with a higher priority
+is checked for dispatching earlier.  @var{decl1} and @var{decl2} are
+ the two function decls that will be compared.
+@end deftypefn
+
+@deftypefn {Target Hook} tree TARGET_GET_FUNCTION_VERSIONS_DISPATCHER (void *@var{decl})
+This hook is used to get the dispatcher function for a set of function
+versions.  The dispatcher function is called to invoke the right function
+version at run-time. @var{decl} is one version from a set of semantically
+identical versions.
+@end deftypefn
+
+@deftypefn {Target Hook} tree TARGET_GENERATE_VERSION_DISPATCHER_BODY (void *@var{arg})
+This hook is used to generate the dispatcher logic to invoke the right
+function version at run-time for a given set of function versions.
+@var{arg} points to the callgraph node of the dispatcher function whose
+body must be generated.
+@end deftypefn
+
 @deftypefn {Target Hook} {const char *} TARGET_INVALID_WITHIN_DOLOOP (const_rtx @var{insn})
 
 Take an instruction in @var{insn} and return NULL if it is valid within a
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in	(revision 193203)
+++ gcc/doc/tm.texi.in	(working copy)
@@ -9790,6 +9790,14 @@ changed via the optimize attribute or pragma, see
 @code{TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE}
 @end deftypefn
 
+@hook TARGET_OPTION_FUNCTION_VERSIONS
+This target hook returns @code{true} if @var{DECL1} and @var{DECL2} are
+versions of the same function.  @var{DECL1} and @var{DECL2} are function
+versions if and only if they have the same function signature and
+different target specific attributes, that is, they are compiled for
+different target machines.
+@end deftypefn
+
 @hook TARGET_CAN_INLINE_P
 This target hook returns @code{false} if the @var{caller} function
 cannot inline @var{callee}, based on target specific information.  By
@@ -10798,6 +10806,29 @@ The result is another tree containing a simplified
 call's result.  If @var{ignore} is true the value will be ignored.
 @end deftypefn
 
+@hook TARGET_COMPARE_VERSION_PRIORITY
+This hook is used to compare the target attributes in two functions to
+determine which function's features get higher priority.  This is used
+during function multi-versioning to figure out the order in which two
+versions must be dispatched.  A function version with a higher priority
+is checked for dispatching earlier.  @var{decl1} and @var{decl2} are
+ the two function decls that will be compared.
+@end deftypefn
+
+@hook TARGET_GET_FUNCTION_VERSIONS_DISPATCHER
+This hook is used to get the dispatcher function for a set of function
+versions.  The dispatcher function is called to invoke the right function
+version at run-time. @var{decl} is one version from a set of semantically
+identical versions.
+@end deftypefn
+
+@hook TARGET_GENERATE_VERSION_DISPATCHER_BODY
+This hook is used to generate the dispatcher logic to invoke the right
+function version at run-time for a given set of function versions.
+@var{arg} points to the callgraph node of the dispatcher function whose
+body must be generated.
+@end deftypefn
+
 @hook TARGET_INVALID_WITHIN_DOLOOP
 
 Take an instruction in @var{insn} and return NULL if it is valid within a
Index: gcc/target.def
===================================================================
--- gcc/target.def	(revision 193203)
+++ gcc/target.def	(working copy)
@@ -1298,6 +1298,37 @@ DEFHOOK
  tree, (tree fndecl, int n_args, tree *argp, bool ignore),
  hook_tree_tree_int_treep_bool_null)
 
+/* Target hook is used to compare the target attributes in two functions to
+   determine which function's features get higher priority.  This is used
+   during function multi-versioning to figure out the order in which two
+   versions must be dispatched.  A function version with a higher priority
+   is checked for dispatching earlier.  DECL1 and DECL2 are
+   the two function decls that will be compared. It returns positive value
+   if DECL1 is higher priority,  negative value if DECL2 is higher priority
+   and 0 if they are the same. */
+DEFHOOK
+(compare_version_priority,
+ "",
+ int, (tree decl1, tree decl2), NULL)
+
+/*  Target hook is used to generate the dispatcher logic to invoke the right
+    function version at run-time for a given set of function versions.
+    ARG points to the callgraph node of the dispatcher function whose body
+    must be generated.  */
+DEFHOOK
+(generate_version_dispatcher_body,
+ "",
+ tree, (void *arg), NULL) 
+
+/* Target hook is used to get the dispatcher function for a set of function
+   versions.  The dispatcher function is called to invoke the right function
+   version at run-time.  DECL is one version from a set of semantically
+   identical versions.  */
+DEFHOOK
+(get_function_versions_dispatcher,
+ "",
+ tree, (void *decl), NULL)
+
 /* Returns a code for a target-specific builtin that implements
    reciprocal of the function, or NULL_TREE if not available.  */
 DEFHOOK
@@ -2774,6 +2805,16 @@ DEFHOOK
  void, (void),
  hook_void_void)
 
+/* This function returns true if DECL1 and DECL2 are versions of the same
+   function.  DECL1 and DECL2 are function versions if and only if they
+   have the same function signature and different target specific attributes,
+   that is, they are compiled for different target machines.  */
+DEFHOOK
+(function_versions,
+ "",
+ bool, (tree decl1, tree decl2),
+ hook_bool_tree_tree_false)
+
 /* Function to determine if one function can inline another function.  */
 #undef HOOK_PREFIX
 #define HOOK_PREFIX "TARGET_"
Index: gcc/cgraph.c
===================================================================
--- gcc/cgraph.c	(revision 193203)
+++ gcc/cgraph.c	(working copy)
@@ -132,6 +132,144 @@ static GTY(()) struct cgraph_edge *free_edges;
 /* Did procss_same_body_aliases run?  */
 bool same_body_aliases_done;
 
+/* Map a cgraph_node to cgraph_function_version_info using this htab.
+   The cgraph_function_version_info has a THIS_NODE field that is the
+   corresponding cgraph_node.  */
+
+static htab_t GTY((param_is (struct cgraph_function_version_info *)))
+  cgraph_fnver_htab = NULL;
+
+/* Hash function for cgraph_fnver_htab.  */
+static hashval_t
+cgraph_fnver_htab_hash (const void *ptr)
+{
+  int uid = ((const struct cgraph_function_version_info *)ptr)->this_node->uid;
+  return (hashval_t)(uid);
+}
+
+/* eq function for cgraph_fnver_htab.  */
+static int
+cgraph_fnver_htab_eq (const void *p1, const void *p2)
+{
+  const struct cgraph_function_version_info *n1
+    = (const struct cgraph_function_version_info *)p1;
+  const struct cgraph_function_version_info *n2
+    = (const struct cgraph_function_version_info *)p2;
+
+  return n1->this_node->uid == n2->this_node->uid;
+}
+
+/* Mark as GC root all allocated nodes.  */
+static GTY(()) struct cgraph_function_version_info *
+  version_info_node = NULL;
+
+/* Get the cgraph_function_version_info node corresponding to node.  */
+struct cgraph_function_version_info *
+get_cgraph_node_version (struct cgraph_node *node)
+{
+  struct cgraph_function_version_info *ret;
+  struct cgraph_function_version_info key;
+  key.this_node = node;
+
+  if (cgraph_fnver_htab == NULL)
+    return NULL;
+
+  ret = (struct cgraph_function_version_info *)
+    htab_find (cgraph_fnver_htab, &key);
+
+  return ret;
+}
+
+/* Insert a new cgraph_function_version_info node into cgraph_fnver_htab
+   corresponding to cgraph_node NODE.  */
+struct cgraph_function_version_info *
+insert_new_cgraph_node_version (struct cgraph_node *node)
+{
+  void **slot;
+  
+  version_info_node = NULL;
+  version_info_node = ggc_alloc_cleared_cgraph_function_version_info ();
+  version_info_node->this_node = node;
+
+  if (cgraph_fnver_htab == NULL)
+    cgraph_fnver_htab = htab_create_ggc (2, cgraph_fnver_htab_hash,
+				         cgraph_fnver_htab_eq, NULL);
+
+  slot = htab_find_slot (cgraph_fnver_htab, version_info_node, INSERT);
+  gcc_assert (slot != NULL);
+  *slot = version_info_node;
+  return version_info_node;
+}
+
+/* Remove the cgraph_function_version_info and cgraph_node for DECL.  This
+   DECL is a duplicate declaration.  */
+void
+delete_function_version (tree decl)
+{
+  struct cgraph_node *decl_node = cgraph_get_create_node (decl);
+  struct cgraph_function_version_info *decl_v = NULL;
+
+  if (decl_node == NULL)
+    return;
+
+  decl_v = get_cgraph_node_version (decl_node);
+
+  if (decl_v == NULL)
+    return;
+
+  if (decl_v->prev != NULL)
+    decl_v->prev->next = decl_v->next;
+
+  if (decl_v->next != NULL)
+    decl_v->next->prev = decl_v->prev;
+
+  if (cgraph_fnver_htab != NULL)
+    htab_remove_elt (cgraph_fnver_htab, decl_v);
+
+  cgraph_remove_node (decl_node);
+}
+
+/* Record that DECL1 and DECL2 are semantically identical function
+   versions.  */
+void
+record_function_versions (tree decl1, tree decl2)
+{
+  struct cgraph_node *decl1_node = cgraph_get_create_node (decl1);
+  struct cgraph_node *decl2_node = cgraph_get_create_node (decl2);
+  struct cgraph_function_version_info *decl1_v = NULL;
+  struct cgraph_function_version_info *decl2_v = NULL;
+  struct cgraph_function_version_info *before;
+  struct cgraph_function_version_info *after;
+
+  gcc_assert (decl1_node != NULL && decl2_node != NULL);
+  decl1_v = get_cgraph_node_version (decl1_node);
+  decl2_v = get_cgraph_node_version (decl2_node);
+
+  if (decl1_v != NULL && decl2_v != NULL)
+    return;
+
+  if (decl1_v == NULL)
+    decl1_v = insert_new_cgraph_node_version (decl1_node);
+
+  if (decl2_v == NULL)
+    decl2_v = insert_new_cgraph_node_version (decl2_node);
+
+  /* Chain decl2_v and decl1_v.  All semantically identical versions
+     will be chained together.  */
+
+  before = decl1_v;
+  after = decl2_v;
+
+  while (before->next != NULL)
+    before = before->next;
+
+  while (after->prev != NULL)
+    after = after->prev;
+
+  before->next = after;
+  after->prev = before;
+}
+
 /* Macros to access the next item in the list of free cgraph nodes and
    edges. */
 #define NEXT_FREE_NODE(NODE) cgraph ((NODE)->symbol.next)
Index: gcc/cgraph.h
===================================================================
--- gcc/cgraph.h	(revision 193203)
+++ gcc/cgraph.h	(working copy)
@@ -280,6 +280,8 @@ struct GTY(()) cgraph_node {
   /* ?? We should be able to remove this.  We have enough bits in
      cgraph to calculate it.  */
   unsigned tm_clone : 1;
+  /* True if this decl is a dispatcher for function versions.  */
+  unsigned dispatcher_function : 1;
 };
 
 DEF_VEC_P(symtab_node);
@@ -292,6 +294,47 @@ DEF_VEC_P(cgraph_node_ptr);
 DEF_VEC_ALLOC_P(cgraph_node_ptr,heap);
 DEF_VEC_ALLOC_P(cgraph_node_ptr,gc);
 
+/* Function Multiversioning info.  */
+struct GTY(()) cgraph_function_version_info {
+  /* The cgraph_node for which the function version info is stored.  */
+  struct cgraph_node *this_node;
+  /* Chains all the semantically identical function versions.  The
+     first function in this chain is the version_info node of the
+     default function.  */
+  struct cgraph_function_version_info *prev;
+  /* If this version node corresponds to a dispatcher for function
+     versions, this points to the version info node of the default
+     function, the first node in the chain.  */
+  struct cgraph_function_version_info *next;
+  /* If this node corresponds to a function version, this points
+     to the dispatcher function decl, which is the function that must
+     be called to execute the right function version at run-time.
+
+     If this cgraph node is a dispatcher (if dispatcher_function is
+     true, in the cgraph_node struct) for function versions, this
+     points to the resolver function, which holds the function body of
+     the dispatcher.  The dispatcher decl is an alias to the resolver
+     function decl.  */
+  tree dispatcher_resolver;
+};
+
+/* Get the cgraph_function_version_info node corresponding to node.  */
+struct cgraph_function_version_info *
+  get_cgraph_node_version (struct cgraph_node *node);
+
+/* Insert a new cgraph_function_version_info node into cgraph_fnver_htab
+   corresponding to cgraph_node NODE.  */
+struct cgraph_function_version_info *
+  insert_new_cgraph_node_version (struct cgraph_node *node);
+
+/* Record that DECL1 and DECL2 are semantically identical function
+   versions.  */
+void record_function_versions (tree decl1, tree decl2);
+
+/* Remove the cgraph_function_version_info and cgraph_node for DECL.  This
+   DECL is a duplicate declaration.  */
+void delete_function_version (tree decl);
+
 /* A cgraph node set is a collection of cgraph nodes.  A cgraph node
    can appear in multiple sets.  */
 struct cgraph_node_set_def
@@ -638,6 +681,9 @@ void init_cgraph (void);
 bool cgraph_process_new_functions (void);
 void cgraph_process_same_body_aliases (void);
 void fixup_same_cpp_alias_visibility (symtab_node node, symtab_node target, tree alias);
+/* Initialize data structures so DECL is a function in lowered gimple form.
+   IN_SSA is true if the gimple is in SSA.  */
+basic_block init_lowered_empty_function (tree decl, bool in_ssa);
 
 /* In cgraphclones.c  */
 
Index: gcc/tree.h
===================================================================
--- gcc/tree.h	(revision 193203)
+++ gcc/tree.h	(working copy)
@@ -3480,6 +3480,12 @@ extern VEC(tree, gc) **decl_debug_args_insert (tre
 #define DECL_FUNCTION_SPECIFIC_OPTIMIZATION(NODE) \
    (FUNCTION_DECL_CHECK (NODE)->function_decl.function_specific_optimization)
 
+/* In FUNCTION_DECL, this is set if this function has other versions generated
+   using "target" attributes.  The default version is the one which does not
+   have any "target" attribute set.  */
+#define DECL_FUNCTION_VERSIONED(NODE) \
+   (FUNCTION_DECL_CHECK (NODE)->function_decl.versioned_function)
+
 /* FUNCTION_DECL inherits from DECL_NON_COMMON because of the use of the
    arguments/result/saved_tree fields by front ends.   It was either inherit
    FUNCTION_DECL from non_common, or inherit non_common from FUNCTION_DECL,
@@ -3524,8 +3530,8 @@ struct GTY(()) tree_function_decl {
   unsigned looping_const_or_pure_flag : 1;
   unsigned has_debug_args_flag : 1;
   unsigned tm_clone_flag : 1;
-
-  /* 1 bit left */
+  unsigned versioned_function : 1;
+  /* No bits left.  */
 };
 
 /* The source language of the translation-unit.  */
Index: gcc/cgraphunit.c
===================================================================
--- gcc/cgraphunit.c	(revision 193203)
+++ gcc/cgraphunit.c	(working copy)
@@ -630,6 +630,21 @@ cgraph_analyze_function (struct cgraph_node *node)
       cgraph_create_edge (node, cgraph_get_node (node->thunk.alias),
 			  NULL, 0, CGRAPH_FREQ_BASE);
     }
+  else if (node->dispatcher_function)
+    {
+      /* Generate the dispatcher body of multi-versioned functions.  */
+      struct cgraph_function_version_info *dispatcher_version_info
+	= get_cgraph_node_version (node);
+      if (dispatcher_version_info != NULL
+          && (dispatcher_version_info->dispatcher_resolver
+	      == NULL_TREE))
+	{
+	  tree resolver = NULL_TREE;
+	  gcc_assert (targetm.generate_version_dispatcher_body);
+	  resolver = targetm.generate_version_dispatcher_body (node);
+	  gcc_assert (resolver != NULL_TREE);
+	}
+    }
   else
     {
       push_cfun (DECL_STRUCT_FUNCTION (decl));
@@ -938,7 +953,8 @@ cgraph_analyze_functions (void)
 	      See gcc.c-torture/compile/20011119-1.c  */
 	      if (!DECL_STRUCT_FUNCTION (decl)
 		  && (!cnode->alias || !cnode->thunk.alias)
-		  && !cnode->thunk.thunk_p)
+		  && !cnode->thunk.thunk_p
+		  && !cnode->dispatcher_function)
 		{
 		  cgraph_reset_node (cnode);
 		  cnode->local.redefined_extern_inline = true;
@@ -1219,13 +1235,13 @@ mark_functions_to_output (void)
 }
 
 /* DECL is FUNCTION_DECL.  Initialize datastructures so DECL is a function
-   in lowered gimple form.
+   in lowered gimple form.  IN_SSA is true if the gimple is in SSA.
    
    Set current_function_decl and cfun to newly constructed empty function body.
    return basic block in the function body.  */
 
-static basic_block
-init_lowered_empty_function (tree decl)
+basic_block
+init_lowered_empty_function (tree decl, bool in_ssa)
 {
   basic_block bb;
 
@@ -1233,9 +1249,14 @@ mark_functions_to_output (void)
   allocate_struct_function (decl, false);
   gimple_register_cfg_hooks ();
   init_empty_tree_cfg ();
-  init_tree_ssa (cfun);
-  init_ssa_operands (cfun);
-  cfun->gimple_df->in_ssa_p = true;
+
+  if (in_ssa)
+    {
+      init_tree_ssa (cfun);
+      init_ssa_operands (cfun);
+      cfun->gimple_df->in_ssa_p = true;
+    }
+
   DECL_INITIAL (decl) = make_node (BLOCK);
 
   DECL_SAVED_TREE (decl) = error_mark_node;
@@ -1442,7 +1463,7 @@ assemble_thunk (struct cgraph_node *node)
       else
 	resdecl = DECL_RESULT (thunk_fndecl);
 
-      bb = then_bb = else_bb = return_bb = init_lowered_empty_function (thunk_fndecl);
+      bb = then_bb = else_bb = return_bb = init_lowered_empty_function (thunk_fndecl, true);
 
       bsi = gsi_start_bb (bb);
 
Index: gcc/cp/class.c
===================================================================
--- gcc/cp/class.c	(revision 193203)
+++ gcc/cp/class.c	(working copy)
@@ -1087,6 +1087,35 @@ add_method (tree type, tree method, tree using_dec
 	      || same_type_p (TREE_TYPE (fn_type),
 			      TREE_TYPE (method_type))))
 	{
+	  /* For function versions, their parms and types match
+	     but they are not duplicates.  Record function versions
+	     as and when they are found.  extern "C" functions are
+	     not treated as versions.  */
+	  if (TREE_CODE (fn) == FUNCTION_DECL
+	      && TREE_CODE (method) == FUNCTION_DECL
+	      && !DECL_EXTERN_C_P (fn)
+	      && !DECL_EXTERN_C_P (method)
+	      && (DECL_FUNCTION_SPECIFIC_TARGET (fn)
+		  || DECL_FUNCTION_SPECIFIC_TARGET (method))
+	      && targetm.target_option.function_versions (fn, method))
+ 	    {
+	      /* Mark functions as versions if necessary.  Modify the mangled
+		 decl name if necessary.  */
+	      if (!DECL_FUNCTION_VERSIONED (fn))
+		{
+		  DECL_FUNCTION_VERSIONED (fn) = 1;
+		  if (DECL_ASSEMBLER_NAME_SET_P (fn))
+		    mangle_decl (fn);
+		}
+	      if (!DECL_FUNCTION_VERSIONED (method))
+		{
+		  DECL_FUNCTION_VERSIONED (method) = 1;
+		  if (DECL_ASSEMBLER_NAME_SET_P (method))
+		    mangle_decl (method);
+		}
+	      record_function_versions (fn, method);
+	      continue;
+	    }
 	  if (DECL_INHERITED_CTOR_BASE (method))
 	    {
 	      if (DECL_INHERITED_CTOR_BASE (fn))
@@ -6951,6 +6980,38 @@ pop_lang_context (void)
 {
   current_lang_name = VEC_pop (tree, current_lang_base);
 }
+
+/* FN is a function version dispatcher that is marked used.  Mark all the
+   semantically identical function versions it will dispatch as used.  */
+
+static void
+mark_versions_used (tree fn)
+{
+  struct cgraph_node *node;
+  struct cgraph_function_version_info *node_v;
+  struct cgraph_function_version_info *it_v;
+
+  gcc_assert (TREE_CODE (fn) == FUNCTION_DECL);
+
+  node = cgraph_get_node (fn);
+  if (node == NULL)
+    return;
+
+  gcc_assert (node->dispatcher_function);
+
+  node_v = get_cgraph_node_version (node);
+  if (node_v == NULL)
+    return;
+
+  /* All semantically identical versions are chained.  Traverse and mark each
+     one of them as used.  */
+  it_v = node_v->next;
+  while (it_v != NULL)
+    {
+      mark_used (it_v->this_node->symbol.decl);
+      it_v = it_v->next;
+    }
+}
 \f
 /* Type instantiation routines.  */
 
@@ -7162,13 +7223,27 @@ resolve_address_of_overloaded_function (tree targe
     {
       /* There were too many matches.  First check if they're all
 	 the same function.  */
-      tree match;
+      tree match = NULL_TREE;
 
       fn = TREE_PURPOSE (matches);
-      for (match = TREE_CHAIN (matches); match; match = TREE_CHAIN (match))
-	if (!decls_match (fn, TREE_PURPOSE (match)))
-	  break;
 
+      /* For multi-versioned functions, more than one match is just fine.
+	 Call decls_match to make sure they are different because they are
+	 versioned.  */
+      if (DECL_FUNCTION_VERSIONED (fn))
+	{
+	  for (match = TREE_CHAIN (matches); match; match = TREE_CHAIN (match))
+	    if (!DECL_FUNCTION_VERSIONED (TREE_PURPOSE (match))
+		|| decls_match (fn, TREE_PURPOSE (match)))
+	      break;
+	}
+      else
+	{
+	  for (match = TREE_CHAIN (matches); match; match = TREE_CHAIN (match))
+	    if (!decls_match (fn, TREE_PURPOSE (match)))
+	      break;
+	}
+
       if (match)
 	{
 	  if (flags & tf_error)
@@ -7208,6 +7283,28 @@ resolve_address_of_overloaded_function (tree targe
 	}
     }
 
+  /* If a pointer to a function that is multi-versioned is requested, the
+     pointer to the dispatcher function is returned instead.  This works
+     well because indirectly calling the function will dispatch the right
+     function version at run-time.  */
+  if (DECL_FUNCTION_VERSIONED (fn))
+    {
+      tree dispatcher_decl = NULL;
+      gcc_assert (targetm.get_function_versions_dispatcher);
+      dispatcher_decl = targetm.get_function_versions_dispatcher (fn);
+      if (!dispatcher_decl)
+	{
+	  error_at (input_location, "Pointer to a multiversioned function"
+		    " without a default is not allowed");
+	  return error_mark_node;
+	}
+      retrofit_lang_decl (dispatcher_decl);
+      fn = dispatcher_decl;
+      /* Mark all the versions corresponding to the dispatcher as used.  */
+      if (!(flags & tf_conv))
+	mark_versions_used (fn);
+    }
+
   /* If we're doing overload resolution purely for the purpose of
      determining conversion sequences, we should not consider the
      function used.  If this conversion sequence is selected, the
Index: gcc/cp/decl.c
===================================================================
--- gcc/cp/decl.c	(revision 193203)
+++ gcc/cp/decl.c	(working copy)
@@ -53,6 +53,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "pointer-set.h"
 #include "splay-tree.h"
 #include "plugin.h"
+#include "cgraph.h"
 
 /* Possible cases of bad specifiers type used by bad_specifiers. */
 enum bad_spec_place {
@@ -981,6 +982,36 @@ decls_match (tree newdecl, tree olddecl)
       if (t1 != t2)
 	return 0;
 
+      /* The decls don't match if they correspond to two different versions
+	 of the same function.  Disallow extern "C" functions to be
+	 versions for now.  */
+      if (compparms (p1, p2)
+	  && same_type_p (TREE_TYPE (f1), TREE_TYPE (f2))
+	  && !DECL_EXTERN_C_P (newdecl)
+	  && !DECL_EXTERN_C_P (olddecl)
+	  && targetm.target_option.function_versions (newdecl, olddecl))
+	{
+	  /* Mark functions as versions if necessary.  Modify the mangled decl
+	     name if necessary.  */
+	  if (DECL_FUNCTION_VERSIONED (newdecl)
+	      && DECL_FUNCTION_VERSIONED (olddecl))
+	    return 0;
+	  if (!DECL_FUNCTION_VERSIONED (newdecl))
+	    {
+	      DECL_FUNCTION_VERSIONED (newdecl) = 1;
+	      if (DECL_ASSEMBLER_NAME_SET_P (newdecl))
+	        mangle_decl (newdecl);
+	    }
+	  if (!DECL_FUNCTION_VERSIONED (olddecl))
+	    {
+	      DECL_FUNCTION_VERSIONED (olddecl) = 1;
+	      if (DECL_ASSEMBLER_NAME_SET_P (olddecl))
+		mangle_decl (olddecl);
+	    }
+	  record_function_versions (olddecl, newdecl);
+	  return 0;
+	}
+
       if (CP_DECL_CONTEXT (newdecl) != CP_DECL_CONTEXT (olddecl)
 	  && ! (DECL_EXTERN_C_P (newdecl)
 		&& DECL_EXTERN_C_P (olddecl)))
@@ -1499,7 +1530,11 @@ duplicate_decls (tree newdecl, tree olddecl, bool
 	      error ("previous declaration %q+#D here", olddecl);
 	      return NULL_TREE;
 	    }
-	  else if (compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)),
+	  /* For function versions, params and types match, but they
+	     are not ambiguous.  */
+	  else if ((!DECL_FUNCTION_VERSIONED (newdecl)
+		    && !DECL_FUNCTION_VERSIONED (olddecl))
+		   && compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)),
 			      TYPE_ARG_TYPES (TREE_TYPE (olddecl))))
 	    {
 	      error ("new declaration %q#D", newdecl);
@@ -2272,6 +2307,15 @@ duplicate_decls (tree newdecl, tree olddecl, bool
   else if (DECL_PRESERVE_P (newdecl))
     DECL_PRESERVE_P (olddecl) = 1;
 
+  /* If the olddecl is a version, so is the newdecl.  */
+  if (TREE_CODE (newdecl) == FUNCTION_DECL
+      && DECL_FUNCTION_VERSIONED (olddecl))
+    {
+      DECL_FUNCTION_VERSIONED (newdecl) = 1;
+      /* newdecl will be purged and is no longer a version.  */
+      delete_function_version (newdecl);
+    }
+
   if (TREE_CODE (newdecl) == FUNCTION_DECL)
     {
       int function_size;
Index: gcc/cp/decl2.c
===================================================================
--- gcc/cp/decl2.c	(revision 193203)
+++ gcc/cp/decl2.c	(working copy)
@@ -674,9 +674,13 @@ check_classfn (tree ctype, tree function, tree tem
 	  if (is_template != (TREE_CODE (fndecl) == TEMPLATE_DECL))
 	    continue;
 
+	  /* While finding a match, same types and params are not enough
+	     if the function is versioned.  Also check version ("target")
+	     attributes.  */
 	  if (same_type_p (TREE_TYPE (TREE_TYPE (function)),
 			   TREE_TYPE (TREE_TYPE (fndecl)))
 	      && compparms (p1, p2)
+	      && !targetm.target_option.function_versions (function, fndecl)
 	      && (!is_template
 		  || comp_template_parms (template_parms,
 					  DECL_TEMPLATE_PARMS (fndecl)))
Index: gcc/cp/call.c
===================================================================
--- gcc/cp/call.c	(revision 193203)
+++ gcc/cp/call.c	(working copy)
@@ -40,6 +40,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "langhooks.h"
 #include "c-family/c-objc.h"
 #include "timevar.h"
+#include "cgraph.h"
 
 /* The various kinds of conversion.  */
 
@@ -6514,6 +6515,63 @@ magic_varargs_p (tree fn)
   return false;
 }
 
+/* Returns the decl of the dispatcher function if FN is a function version.  */
+
+static tree
+get_function_version_dispatcher (tree fn)
+{
+  tree dispatcher_decl = NULL;
+
+  gcc_assert (TREE_CODE (fn) == FUNCTION_DECL
+	      && DECL_FUNCTION_VERSIONED (fn));
+
+  gcc_assert (targetm.get_function_versions_dispatcher);
+  dispatcher_decl = targetm.get_function_versions_dispatcher (fn);
+
+  if (dispatcher_decl == NULL)
+    {
+      error_at (input_location, "Call to multiversioned function"
+                " without a default is not allowed");
+      return NULL;
+    }
+
+  retrofit_lang_decl (dispatcher_decl);
+  gcc_assert (dispatcher_decl != NULL);
+  return dispatcher_decl;
+}
+
+/* FN is a function version dispatcher that is marked used.  Mark all the
+   semantically identical function versions it will dispatch as used.  */
+
+static void
+mark_versions_used (tree fn)
+{
+  struct cgraph_node *node;
+  struct cgraph_function_version_info *node_v;
+  struct cgraph_function_version_info *it_v;
+
+  gcc_assert (TREE_CODE (fn) == FUNCTION_DECL);
+
+  node = cgraph_get_node (fn);
+  if (node == NULL)
+    return;
+
+  gcc_assert (node->dispatcher_function);
+
+  node_v = get_cgraph_node_version (node);
+  if (node_v == NULL)
+    return;
+
+  /* All semantically identical versions are chained.  Traverse and mark each
+     one of them as used.  */
+  it_v = node_v->next;
+  while (it_v != NULL)
+    {
+      mark_used (it_v->this_node->symbol.decl);
+      it_v = it_v->next;
+    }
+}
+
 /* Subroutine of the various build_*_call functions.  Overload resolution
    has chosen a winning candidate CAND; build up a CALL_EXPR accordingly.
    ARGS is a TREE_LIST of the unconverted arguments to the call.  FLAGS is a
@@ -6963,6 +7021,22 @@ build_over_call (struct z_candidate *cand, int fla
     return fold_convert (void_type_node, argarray[0]);
   /* FIXME handle trivial default constructor, too.  */
 
+  /* For calls to a multi-versioned function, overload resolution
+     returns the function with the highest target priority, that is,
+     the version that will be checked for dispatching first.  If this
+     version is inlinable, a direct call to this version can be made;
+     otherwise the call should go through the dispatcher.  */
+
+  if (DECL_FUNCTION_VERSIONED (fn)
+      && !targetm.target_option.can_inline_p (current_function_decl, fn))
+    {
+      fn = get_function_version_dispatcher (fn);
+      if (fn == NULL)
+	return NULL;
+      if (!already_used)
+	mark_versions_used (fn);
+    }
+
   if (!already_used)
     mark_used (fn);
 
@@ -8481,6 +8555,38 @@ joust (struct z_candidate *cand1, struct z_candida
 	}
     }
 
+  /* For candidates of a multi-versioned function, make the version with
+     the highest priority win.  This version will be checked for dispatching
+     first.  If this version can be inlined into the caller, the front-end
+     will simply make a direct call to this function.  */
+
+  if (TREE_CODE (cand1->fn) == FUNCTION_DECL
+      && DECL_FUNCTION_VERSIONED (cand1->fn)
+      && TREE_CODE (cand2->fn) == FUNCTION_DECL
+      && DECL_FUNCTION_VERSIONED (cand2->fn))
+    {
+      tree f1 = TREE_TYPE (cand1->fn);
+      tree f2 = TREE_TYPE (cand2->fn);
+      tree p1 = TYPE_ARG_TYPES (f1);
+      tree p2 = TYPE_ARG_TYPES (f2);
+     
+      /* Check if cand1->fn and cand2->fn are versions of the same function.  It
+         is possible that cand1->fn and cand2->fn are function versions but of
+         different functions.  Check types to see if they are versions of the same
+         function.  */
+      if (compparms (p1, p2)
+	  && same_type_p (TREE_TYPE (f1), TREE_TYPE (f2)))
+	{
+	  /* Always make the version with the higher priority, more
+	     specialized, win.  */
+	  gcc_assert (targetm.compare_version_priority);
+	  if (targetm.compare_version_priority (cand1->fn, cand2->fn) >= 0)
+	    return 1;
+	  else
+	    return -1;
+	}
+    }
+
   /* If the two function declarations represent the same function (this can
      happen with declarations in multiple scopes and arg-dependent lookup),
      arbitrarily choose one.  But first make sure the default args we're
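The rule added to joust above (among versions of the same function, the highest-priority version wins, and a tie goes to the first candidate) can be sketched outside the front end. This is a minimal standalone C sketch under stated assumptions: struct version, compare_version_priority, joust_versions and pick_winner are hypothetical names, and the enum is only a small subset of the feature priorities used by the i386 back end in this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of the i386 feature priorities; a greater value
   means the version is checked for dispatch earlier.  */
enum feature_priority
{
  P_ZERO = 0,
  P_SSE2,
  P_SSE4_2,
  P_AVX
};

/* Hypothetical stand-in for a versioned FUNCTION_DECL.  */
struct version
{
  const char *target;
  enum feature_priority priority;
};

/* Positive if V1 has higher priority, negative if V2 has higher
   priority, and 0 if they are the same.  */
int
compare_version_priority (const struct version *v1, const struct version *v2)
{
  return (int) v1->priority - (int) v2->priority;
}

/* Mirror the joust rule: return 1 if C1 wins (ties go to C1) and -1
   if C2 wins.  */
int
joust_versions (const struct version *c1, const struct version *c2)
{
  return compare_version_priority (c1, c2) >= 0 ? 1 : -1;
}

/* Run a tournament over the N candidates in V and return the winner.  */
const struct version *
pick_winner (const struct version *v, size_t n)
{
  const struct version *best;
  size_t i;

  assert (v != NULL && n > 0);

  best = &v[0];
  for (i = 1; i < n; i++)
    if (joust_versions (best, &v[i]) < 0)
      best = &v[i];
  return best;
}
```

With candidates for the default, "sse2", "avx" and "sse4.2" versions, pick_winner would select the "avx" entry, which is the version build_over_call then either calls directly (if it can be inlined into the caller) or replaces with the dispatcher.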
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 193203)
+++ gcc/config/i386/i386.c	(working copy)
@@ -62,6 +62,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "opts.h"
 #include "diagnostic.h"
 #include "dumpfile.h"
+#include "tree-pass.h"
+#include "tree-flow.h"
 
 enum upper_128bits_state
 {
@@ -28463,6 +28465,967 @@ ix86_init_mmx_sse_builtins (void)
     }
 }
 
+/* This adds a condition to the basic_block NEW_BB in function FUNCTION_DECL
+   to return a pointer to VERSION_DECL if the outcome of the expression
+   formed by PREDICATE_CHAIN is true.  This function will be called during
+   version dispatch to decide which function version to execute.  It returns
+   the basic block at the end, to which more conditions can be added.  */
+
+static basic_block
+add_condition_to_bb (tree function_decl, tree version_decl,
+		     tree predicate_chain, basic_block new_bb)
+{
+  gimple return_stmt;
+  tree convert_expr, result_var;
+  gimple convert_stmt;
+  gimple call_cond_stmt;
+  gimple if_else_stmt;
+
+  basic_block bb1, bb2, bb3;
+  edge e12, e23;
+
+  tree cond_var, and_expr_var = NULL_TREE;
+  gimple_seq gseq;
+
+  tree predicate_decl, predicate_arg;
+
+  push_cfun (DECL_STRUCT_FUNCTION (function_decl));
+
+  gcc_assert (new_bb != NULL);
+  gseq = bb_seq (new_bb);
+
+
+  convert_expr = build1 (CONVERT_EXPR, ptr_type_node,
+	     		 build_fold_addr_expr (version_decl));
+  result_var = create_tmp_var (ptr_type_node, NULL);
+  convert_stmt = gimple_build_assign (result_var, convert_expr); 
+  return_stmt = gimple_build_return (result_var);
+
+  if (predicate_chain == NULL_TREE)
+    {
+      gimple_seq_add_stmt (&gseq, convert_stmt);
+      gimple_seq_add_stmt (&gseq, return_stmt);
+      set_bb_seq (new_bb, gseq);
+      gimple_set_bb (convert_stmt, new_bb);
+      gimple_set_bb (return_stmt, new_bb);
+      pop_cfun ();
+      return new_bb;
+    }
+
+  while (predicate_chain != NULL)
+    {
+      cond_var = create_tmp_var (integer_type_node, NULL);
+      predicate_decl = TREE_PURPOSE (predicate_chain);
+      predicate_arg = TREE_VALUE (predicate_chain);
+      call_cond_stmt = gimple_build_call (predicate_decl, 1, predicate_arg);
+      gimple_call_set_lhs (call_cond_stmt, cond_var);
+
+      gimple_set_block (call_cond_stmt, DECL_INITIAL (function_decl));
+      gimple_set_bb (call_cond_stmt, new_bb);
+      gimple_seq_add_stmt (&gseq, call_cond_stmt);
+
+      predicate_chain = TREE_CHAIN (predicate_chain);
+      
+      if (and_expr_var == NULL)
+        and_expr_var = cond_var;
+      else
+	{
+	  gimple assign_stmt;
+	  /* Use MIN_EXPR to check whether any integer is zero:
+	     and_expr_var = min_expr <cond_var, and_expr_var>  */
+	  assign_stmt = gimple_build_assign (and_expr_var,
+			  build2 (MIN_EXPR, integer_type_node,
+				  cond_var, and_expr_var));
+
+	  gimple_set_block (assign_stmt, DECL_INITIAL (function_decl));
+	  gimple_set_bb (assign_stmt, new_bb);
+	  gimple_seq_add_stmt (&gseq, assign_stmt);
+	}
+    }
+
+  if_else_stmt = gimple_build_cond (GT_EXPR, and_expr_var,
+	  		            integer_zero_node,
+				    NULL_TREE, NULL_TREE);
+  gimple_set_block (if_else_stmt, DECL_INITIAL (function_decl));
+  gimple_set_bb (if_else_stmt, new_bb);
+  gimple_seq_add_stmt (&gseq, if_else_stmt);
+
+  gimple_seq_add_stmt (&gseq, convert_stmt);
+  gimple_seq_add_stmt (&gseq, return_stmt);
+  set_bb_seq (new_bb, gseq);
+
+  bb1 = new_bb;
+  e12 = split_block (bb1, if_else_stmt);
+  bb2 = e12->dest;
+  e12->flags &= ~EDGE_FALLTHRU;
+  e12->flags |= EDGE_TRUE_VALUE;
+
+  e23 = split_block (bb2, return_stmt);
+
+  gimple_set_bb (convert_stmt, bb2);
+  gimple_set_bb (return_stmt, bb2);
+
+  bb3 = e23->dest;
+  make_edge (bb1, bb3, EDGE_FALSE_VALUE); 
+
+  remove_edge (e23);
+  make_edge (bb2, EXIT_BLOCK_PTR, 0);
+
+  pop_cfun ();
+
+  return bb3;
+}
+
+/* This parses the attribute arguments to target in DECL and determines
+   the right builtin to use to match the platform specification.
+   It returns the priority value for this version decl.  If PREDICATE_LIST
+   is not NULL, it stores the list of cpu features that need to be checked
+   before dispatching this function.  */
+
+static unsigned int
+get_builtin_code_for_version (tree decl, tree *predicate_list)
+{
+  tree attrs;
+  struct cl_target_option cur_target;
+  tree target_node;
+  struct cl_target_option *new_target;
+  const char *arg_str = NULL;
+  const char *attrs_str = NULL;
+  char *tok_str = NULL;
+  char *token;
+
+  /* Priority of i386 features; a greater value is higher priority.  This is
+     used to decide the order in which function dispatch must happen.  For
+     instance, a version specialized for SSE4.2 should be checked for dispatch
+     before a version for SSE3, as SSE4.2 implies SSE3.  */
+  enum feature_priority
+  {
+    P_ZERO = 0,
+    P_MMX,
+    P_SSE,
+    P_SSE2,
+    P_SSE3,
+    P_SSSE3,
+    P_PROC_SSSE3,
+    P_SSE4_a,
+    P_PROC_SSE4_a,
+    P_SSE4_1,
+    P_SSE4_2,
+    P_PROC_SSE4_2,
+    P_POPCNT,
+    P_AVX,
+    P_AVX2,
+    P_FMA,
+    P_PROC_FMA
+  };
+
+  enum feature_priority priority = P_ZERO;
+
+  /* These are the target attribute strings for which a dispatcher is
+     available, from fold_builtin_cpu.  */
+
+  static struct _feature_list
+    {
+      const char *const name;
+      const enum feature_priority priority;
+    }
+  const feature_list[] =
+    {
+      {"mmx", P_MMX},
+      {"sse", P_SSE},
+      {"sse2", P_SSE2},
+      {"sse3", P_SSE3},
+      {"ssse3", P_SSSE3},
+      {"sse4.1", P_SSE4_1},
+      {"sse4.2", P_SSE4_2},
+      {"popcnt", P_POPCNT},
+      {"avx", P_AVX},
+      {"avx2", P_AVX2}
+    };
+
+
+  static unsigned int NUM_FEATURES
+    = sizeof (feature_list) / sizeof (struct _feature_list);
+
+  unsigned int i;
+
+  tree predicate_chain = NULL_TREE;
+  tree predicate_decl, predicate_arg;
+
+  attrs = lookup_attribute ("target", DECL_ATTRIBUTES (decl));
+  gcc_assert (attrs != NULL);
+
+  attrs = TREE_VALUE (TREE_VALUE (attrs));
+
+  gcc_assert (TREE_CODE (attrs) == STRING_CST);
+  attrs_str = TREE_STRING_POINTER (attrs);
+
+
+  /* Handle arch= if specified.  For priority, set it to be 1 more than
+     the best instruction set the processor can handle.  For instance, if
+     there is a version for atom and a version for ssse3 (the highest ISA
+     priority for atom), the atom version must be checked for dispatch
+     before the ssse3 version.  */
+  if (strstr (attrs_str, "arch=") != NULL)
+    {
+      cl_target_option_save (&cur_target, &global_options);
+      target_node = ix86_valid_target_attribute_tree (attrs);
+    
+      gcc_assert (target_node);
+      new_target = TREE_TARGET_OPTION (target_node);
+      gcc_assert (new_target);
+      
+      if (new_target->arch_specified && new_target->arch > 0)
+	{
+	  switch (new_target->arch)
+	    {
+	    case PROCESSOR_CORE2_32:
+	    case PROCESSOR_CORE2_64:
+	      arg_str = "core2";
+	      priority = P_PROC_SSSE3;
+	      break;
+	    case PROCESSOR_COREI7_32:
+	    case PROCESSOR_COREI7_64:
+	      arg_str = "corei7";
+	      priority = P_PROC_SSE4_2;
+	      break;
+	    case PROCESSOR_ATOM:
+	      arg_str = "atom";
+	      priority = P_PROC_SSSE3;
+	      break;
+	    case PROCESSOR_AMDFAM10:
+	      arg_str = "amdfam10h";
+	      priority = P_PROC_SSE4_a;
+	      break;
+	    case PROCESSOR_BDVER1:
+	      arg_str = "bdver1";
+	      priority = P_PROC_FMA;
+	      break;
+	    case PROCESSOR_BDVER2:
+	      arg_str = "bdver2";
+	      priority = P_PROC_FMA;
+	      break;
+	    }  
+	}    
+    
+      cl_target_option_restore (&global_options, &cur_target);
+	
+      if (predicate_list && arg_str == NULL)
+	{
+	  error_at (DECL_SOURCE_LOCATION (decl),
+	    	"No dispatcher found for the versioning attributes");
+	  return 0;
+	}
+    
+      if (predicate_list)
+	{
+          predicate_decl = ix86_builtins [(int) IX86_BUILTIN_CPU_IS];
+          /* For a C string literal the length includes the trailing NUL.  */
+          predicate_arg = build_string_literal (strlen (arg_str) + 1, arg_str);
+          predicate_chain = tree_cons (predicate_decl, predicate_arg,
+				       predicate_chain);
+	}
+    }
+
+  /* Process feature name.  */
+  tok_str = (char *) xmalloc (strlen (attrs_str) + 1);
+  strcpy (tok_str, attrs_str);
+  token = strtok (tok_str, ",");
+  predicate_decl = ix86_builtins [(int) IX86_BUILTIN_CPU_SUPPORTS];
+
+  while (token != NULL)
+    {
+      /* Do not process "arch=".  */
+      if (strncmp (token, "arch=", 5) == 0)
+	{
+	  token = strtok (NULL, ",");
+	  continue;
+	}
+      for (i = 0; i < NUM_FEATURES; ++i)
+	{
+	  if (strcmp (token, feature_list[i].name) == 0)
+	    {
+	      if (predicate_list)
+		{
+		  predicate_arg = build_string_literal (
+				  strlen (feature_list[i].name) + 1,
+				  feature_list[i].name);
+		  predicate_chain = tree_cons (predicate_decl, predicate_arg,
+					       predicate_chain);
+		}
+	      /* Find the maximum priority feature.  */
+	      if (feature_list[i].priority > priority)
+		priority = feature_list[i].priority;
+
+	      break;
+	    }
+	}
+      if (predicate_list && i == NUM_FEATURES)
+	{
+	  error_at (DECL_SOURCE_LOCATION (decl),
+		    "No dispatcher found for %s", token);
+	  return 0;
+	}
+      token = strtok (NULL, ",");
+    }
+  free (tok_str);
+
+  if (predicate_list && predicate_chain == NULL_TREE)
+    {
+      error_at (DECL_SOURCE_LOCATION (decl),
+	        "No dispatcher found for the versioning attributes : %s",
+	        attrs_str);
+      return 0;
+    }
+  else if (predicate_list)
+    {
+      predicate_chain = nreverse (predicate_chain);
+      *predicate_list = predicate_chain;
+    }
+
+  return priority; 
+}
+
+/* This compares the priority of target features in function DECL1
+   and DECL2.  It returns positive value if DECL1 is higher priority,
+   negative value if DECL2 is higher priority and 0 if they are the
+   same.  */
+
+static int
+ix86_compare_version_priority (tree decl1, tree decl2)
+{
+  unsigned int priority1 = 0;
+  unsigned int priority2 = 0;
+
+  if (lookup_attribute ("target", DECL_ATTRIBUTES (decl1)) != NULL)
+    priority1 = get_builtin_code_for_version (decl1, NULL);
+
+  if (lookup_attribute ("target", DECL_ATTRIBUTES (decl2)) != NULL)
+    priority2 = get_builtin_code_for_version (decl2, NULL);
+
+  return (int) priority1 - (int) priority2;
+}
+
+/* V1 and V2 point to function versions with different priorities
+   based on the target ISA.  This function compares their priorities.  */
+ 
+static int
+feature_compare (const void *v1, const void *v2)
+{
+  typedef struct _function_version_info
+    {
+      tree version_decl;
+      tree predicate_chain;
+      unsigned int dispatch_priority;
+    } function_version_info;
+
+  const function_version_info c1 = *(const function_version_info *)v1;
+  const function_version_info c2 = *(const function_version_info *)v2;
+  return (c2.dispatch_priority - c1.dispatch_priority);
+}
+
+/* This function generates the dispatch function for
+   multi-versioned functions.  DISPATCH_DECL is the function which will
+   contain the dispatch logic.  FNDECLS_P is a vector of the function
+   choices for dispatch.  EMPTY_BB is the basic block pointer
+   in DISPATCH_DECL in which the dispatch code is generated.  */
+
+static int
+dispatch_function_versions (tree dispatch_decl,
+			    void *fndecls_p,
+			    basic_block *empty_bb)
+{
+  tree default_decl;
+  gimple ifunc_cpu_init_stmt;
+  gimple_seq gseq;
+  int ix;
+  tree ele;
+  VEC (tree, heap) *fndecls;
+  unsigned int num_versions = 0;
+  unsigned int actual_versions = 0;
+  unsigned int i;
+
+  struct _function_version_info
+    {
+      tree version_decl;
+      tree predicate_chain;
+      unsigned int dispatch_priority;
+    }*function_version_info;
+
+  gcc_assert (dispatch_decl != NULL
+	      && fndecls_p != NULL
+	      && empty_bb != NULL);
+
+  /* FNDECLS_P is actually a vector.  */
+  fndecls = (VEC (tree, heap) *)fndecls_p;
+
+  /* At least one more version other than the default.  */
+  num_versions = VEC_length (tree, fndecls);
+  gcc_assert (num_versions >= 2);
+
+  function_version_info = (struct _function_version_info *)
+    XNEWVEC (struct _function_version_info, (num_versions - 1));
+
+  /* The first version in the vector is the default decl.  */
+  default_decl = VEC_index (tree, fndecls, 0);
+
+  push_cfun (DECL_STRUCT_FUNCTION (dispatch_decl));
+
+  gseq = bb_seq (*empty_bb);
+  /* Function version dispatch is via IFUNC.  IFUNC resolvers fire before
+     constructors, so explicitly call __builtin_cpu_init here.  */
+  ifunc_cpu_init_stmt = gimple_build_call_vec (
+                     ix86_builtins [(int) IX86_BUILTIN_CPU_INIT], NULL);
+  gimple_seq_add_stmt (&gseq, ifunc_cpu_init_stmt);
+  gimple_set_bb (ifunc_cpu_init_stmt, *empty_bb);
+  set_bb_seq (*empty_bb, gseq);
+
+  pop_cfun ();
+
+
+  for (ix = 1; VEC_iterate (tree, fndecls, ix, ele); ++ix)
+    {
+      tree version_decl = ele;
+      tree predicate_chain = NULL_TREE;
+      unsigned int priority;
+      /* Get attribute string, parse it and find the right predicate decl.
+         The predicate function could be a lengthy combination of many
+	 features, like arch-type and various isa-variants.  */
+      priority = get_builtin_code_for_version (version_decl,
+	 			               &predicate_chain);
+
+      if (predicate_chain == NULL_TREE)
+	continue;
+
+      actual_versions++;
+      function_version_info [ix - 1].version_decl = version_decl;
+      function_version_info [ix - 1].predicate_chain = predicate_chain;
+      function_version_info [ix - 1].dispatch_priority = priority;
+    }
+
+  /* Sort the versions according to descending order of dispatch priority.  The
+     priority is based on the ISA.  This is not a perfect solution.  There
+     could still be ambiguity.  If more than one function version is suitable
+     to execute, which one should be dispatched?  In future, allow the user
+     to specify a dispatch priority next to the version.  */
+  qsort (function_version_info, actual_versions,
+         sizeof (struct _function_version_info), feature_compare);
+
+  for (i = 0; i < actual_versions; ++i)
+    *empty_bb = add_condition_to_bb (dispatch_decl,
+				     function_version_info[i].version_decl,
+				     function_version_info[i].predicate_chain,
+				     *empty_bb);
+
+  /* Dispatch the default version at the end.  */
+  *empty_bb = add_condition_to_bb (dispatch_decl, default_decl,
+				   NULL, *empty_bb);
+
+  free (function_version_info);
+  return 0;
+}
+
+/* This function returns true if FN1 and FN2 are versions of the same function,
+   that is, the targets of the function decls are different.  This assumes
+   that FN1 and FN2 have the same signature.  */
+
+static bool
+ix86_function_versions (tree fn1, tree fn2)
+{
+  tree attr1, attr2;
+  struct cl_target_option *target1, *target2;
+
+  if (TREE_CODE (fn1) != FUNCTION_DECL
+      || TREE_CODE (fn2) != FUNCTION_DECL)
+    return false;
+
+  attr1 = DECL_FUNCTION_SPECIFIC_TARGET (fn1);
+  attr2 = DECL_FUNCTION_SPECIFIC_TARGET (fn2);
+
+  /* At least one function decl should have the target attribute specified.  */
+  if (attr1 == NULL_TREE && attr2 == NULL_TREE)
+    return false;
+
+  if (attr1 == NULL_TREE)
+    attr1 = target_option_default_node;
+  else if (attr2 == NULL_TREE)
+    attr2 = target_option_default_node;
+
+  target1 = TREE_TARGET_OPTION (attr1);
+  target2 = TREE_TARGET_OPTION (attr2);
+
+  /* target1 and target2 must be different in some way.  */
+  if (target1->x_ix86_isa_flags == target2->x_ix86_isa_flags
+      && target1->x_target_flags == target2->x_target_flags
+      && target1->arch == target2->arch
+      && target1->tune == target2->tune
+      && target1->x_ix86_fpmath == target2->x_ix86_fpmath
+      && target1->branch_cost == target2->branch_cost)
+    return false;
+
+  return true;
+}
+
+/* Comparator function to be used in qsort routine to sort attribute
+   specification strings to "target".  */
+
+static int
+attr_strcmp (const void *v1, const void *v2)
+{
+  const char *c1 = *(char *const*)v1;
+  const char *c2 = *(char *const*)v2;
+  return strcmp (c1, c2);
+}
+
+/* STR is the argument to target attribute.  This function tokenizes
+   the comma separated arguments, sorts them and returns a string which
+   is a unique identifier for the comma separated arguments.   It also
+   replaces non-identifier characters "=,-" with "_".  */
+
+static char *
+sorted_attr_string (const char *str)
+{
+  char **args = NULL;
+  char *attr_str, *ret_str;
+  char *attr = NULL;
+  unsigned int argnum = 1;
+  unsigned int i;
+
+  for (i = 0; i < strlen (str); i++)
+    if (str[i] == ',')
+      argnum++;
+
+  attr_str = (char *)xmalloc (strlen (str) + 1);
+  strcpy (attr_str, str);
+
+  /* Replace "=,-" with "_".  */
+  for (i = 0; i < strlen (attr_str); i++)
+    if (attr_str[i] == '=' || attr_str[i] == '-')
+      attr_str[i] = '_';
+
+  if (argnum == 1)
+    return attr_str;
+
+  args = XNEWVEC (char *, argnum);
+
+  i = 0;
+  attr = strtok (attr_str, ",");
+  while (attr != NULL)
+    {
+      args[i] = attr;
+      i++;
+      attr = strtok (NULL, ",");
+    }
+
+  qsort (args, argnum, sizeof (char*), attr_strcmp);
+
+  ret_str = (char *)xmalloc (strlen (str) + 1);
+  strcpy (ret_str, args[0]);
+  for (i = 1; i < argnum; i++)
+    {
+      strcat (ret_str, "_");
+      strcat (ret_str, args[i]);
+    }
+
+  free (args);
+  free (attr_str);
+  return ret_str;
+}
+
+/* This function changes the assembler name for functions that are
+   versions.  If DECL is a function version and has a "target"
+   attribute, it appends the attribute string to its assembler name.  */
+
+static tree
+ix86_mangle_function_version_assembler_name (tree decl, tree id)
+{
+  tree version_attr;
+  const char *orig_name, *version_string, *attr_str;
+  char *assembler_name;
+
+  if (DECL_DECLARED_INLINE_P (decl)
+      && lookup_attribute ("gnu_inline",
+			   DECL_ATTRIBUTES (decl)))
+    error_at (DECL_SOURCE_LOCATION (decl),
+	      "Function versions cannot be marked as gnu_inline,"
+	      " bodies have to be generated");
+
+  if (DECL_VIRTUAL_P (decl)
+      || DECL_VINDEX (decl))
+    error_at (DECL_SOURCE_LOCATION (decl),
+	      "Virtual function versioning not supported");
+
+  version_attr = lookup_attribute ("target", DECL_ATTRIBUTES (decl));
+
+  /* target attribute string is NULL for default functions.  */
+  if (version_attr == NULL_TREE)
+    return id;
+
+  orig_name = IDENTIFIER_POINTER (id);
+  version_string
+    = TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (version_attr)));
+
+  attr_str = sorted_attr_string (version_string);
+  assembler_name = (char *) xmalloc (strlen (orig_name)
+				     + strlen (attr_str) + 2);
+
+  sprintf (assembler_name, "%s.%s", orig_name, attr_str);
+
+  /* Allow assembler name to be modified if already set.  */
+  if (DECL_ASSEMBLER_NAME_SET_P (decl))
+    SET_DECL_RTL (decl, NULL);
+
+  return get_identifier (assembler_name);
+}
+
+static tree 
+ix86_mangle_decl_assembler_name (tree decl, tree id)
+{
+  /* For function version, add the target suffix to the assembler name.  */
+  if (TREE_CODE (decl) == FUNCTION_DECL
+      && DECL_FUNCTION_VERSIONED (decl))
+    return ix86_mangle_function_version_assembler_name (decl, id);
+
+  return id;
+}
+
+/* Return a new name by appending SUFFIX to the DECL name.  If make_unique
+   is true, append the full path name of the source file.  */
+
+static char *
+make_name (tree decl, const char *suffix, bool make_unique)
+{
+  char *global_var_name;
+  int name_len;
+  const char *name;
+  const char *unique_name = NULL;
+
+  name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
+
+  /* Get a unique name that can be used globally without any chances
+     of collision at link time.  */
+  if (make_unique)
+    unique_name = IDENTIFIER_POINTER (get_file_function_name ("\0"));
+
+  name_len = strlen (name) + strlen (suffix) + 2;
+
+  if (make_unique)
+    name_len += strlen (unique_name) + 1;
+  global_var_name = XNEWVEC (char, name_len);
+
+  /* Use '.' to concatenate names as it is demangler friendly.  */
+  if (make_unique)
+      snprintf (global_var_name, name_len, "%s.%s.%s", name,
+		unique_name, suffix);
+  else
+      snprintf (global_var_name, name_len, "%s.%s", name, suffix);
+
+  return global_var_name;
+}
+
+/* Make a dispatcher declaration for the multi-versioned function DECL.
+   Calls to DECL function will be replaced with calls to the dispatcher
+   by the front-end.  Return the decl created.  */
+
+static tree
+make_dispatcher_decl (const tree decl)
+{
+  tree func_decl;
+  char *func_name, *resolver_name;
+  tree fn_type, func_type;
+  bool is_uniq = false;
+
+  if (TREE_PUBLIC (decl) == 0)
+    is_uniq = true;
+
+  func_name = make_name (decl, "ifunc", is_uniq);
+  resolver_name = make_name (decl, "resolver", is_uniq);
+  gcc_assert (resolver_name);
+
+  fn_type = TREE_TYPE (decl);
+  func_type = build_function_type (TREE_TYPE (fn_type),
+				   TYPE_ARG_TYPES (fn_type));
+  
+  func_decl = build_fn_decl (func_name, func_type);
+  TREE_USED (func_decl) = 1;
+  DECL_CONTEXT (func_decl) = NULL_TREE;
+  DECL_INITIAL (func_decl) = error_mark_node;
+  DECL_ARTIFICIAL (func_decl) = 1;
+  /* Mark this func as external, the resolver will flip it again if
+     it gets generated.  */
+  DECL_EXTERNAL (func_decl) = 1;
+  /* This will be an IFUNC, and IFUNCs have to be externally visible.  */
+  TREE_PUBLIC (func_decl) = 1;
+
+  return func_decl;  
+}
+
+/* Returns true if DECL is multi-versioned and is the default version,
+   that is, it is not tagged with a target-specific optimization.  */
+
+static bool
+is_function_default_version (const tree decl)
+{
+  return (TREE_CODE (decl) == FUNCTION_DECL
+	  && DECL_FUNCTION_VERSIONED (decl)
+	  && DECL_FUNCTION_SPECIFIC_TARGET (decl) == NULL_TREE);
+}
+
+/* Make a dispatcher declaration for the multi-versioned function DECL.
+   Calls to DECL function will be replaced with calls to the dispatcher
+   by the front-end.  Returns the decl of the dispatcher function.  */
+
+static tree
+ix86_get_function_versions_dispatcher (void *decl)
+{
+  tree fn = (tree) decl;
+  struct cgraph_node *node = NULL;
+  struct cgraph_node *default_node = NULL;
+  struct cgraph_function_version_info *node_v = NULL;
+  struct cgraph_function_version_info *it_v = NULL;
+  struct cgraph_function_version_info *first_v = NULL;
+
+  tree dispatch_decl = NULL;
+  struct cgraph_node *dispatcher_node = NULL;
+  struct cgraph_function_version_info *dispatcher_version_info = NULL;
+
+  struct cgraph_function_version_info *default_version_info = NULL;
+ 
+  gcc_assert (fn != NULL && DECL_FUNCTION_VERSIONED (fn));
+
+  node = cgraph_get_node (fn);
+  gcc_assert (node != NULL);
+
+  node_v = get_cgraph_node_version (node);
+  gcc_assert (node_v != NULL);
+ 
+  if (node_v->dispatcher_resolver != NULL)
+    return node_v->dispatcher_resolver;
+
+  /* Find the default version and make it the first node.  */
+  first_v = node_v;
+  /* Go to the beginning of the chain.  */
+  while (first_v->prev != NULL)
+    first_v = first_v->prev;
+  default_version_info = first_v;
+  while (default_version_info != NULL)
+    {
+      if (is_function_default_version
+	    (default_version_info->this_node->symbol.decl))
+        break;
+      default_version_info = default_version_info->next;
+    }
+
+  /* If there is no default node, just return NULL.  */
+  if (default_version_info == NULL)
+    return NULL;
+
+  /* Make default info the first node.  */
+  if (first_v != default_version_info)
+    {
+      default_version_info->prev->next = default_version_info->next;
+      if (default_version_info->next)
+        default_version_info->next->prev = default_version_info->prev;
+      first_v->prev = default_version_info;
+      default_version_info->next = first_v;
+      default_version_info->prev = NULL;
+    }
+
+  default_node = default_version_info->this_node;
+
+#if defined (ASM_OUTPUT_TYPE_DIRECTIVE) && HAVE_GNU_INDIRECT_FUNCTION
+  /* Right now, the dispatching is done via ifunc.  */
+  dispatch_decl = make_dispatcher_decl (default_node->symbol.decl); 
+#else
+  error_at (DECL_SOURCE_LOCATION (default_node->symbol.decl),
+	    "Multiversioning needs ifunc which is not supported "
+	    "in this configuration");
+#endif
+
+  dispatcher_node = cgraph_get_create_node (dispatch_decl);
+  gcc_assert (dispatcher_node != NULL);
+  dispatcher_node->dispatcher_function = 1;
+  dispatcher_version_info
+    = insert_new_cgraph_node_version (dispatcher_node);
+  dispatcher_version_info->next = default_version_info;
+  dispatcher_node->local.finalized = 1;
+ 
+  /* Set the dispatcher for all the versions.  */ 
+  it_v = default_version_info;
+  while (it_v != NULL)
+    {
+      it_v->dispatcher_resolver = dispatch_decl;
+      it_v = it_v->next;
+    }
+
+  return dispatch_decl;
+}
+
+/* Makes a function attribute of the form NAME(ARG_NAME) and chains
+   it to CHAIN.  */
+
+static tree
+make_attribute (const char *name, const char *arg_name, tree chain)
+{
+  tree attr_name;
+  tree attr_arg_name;
+  tree attr_args;
+  tree attr;
+
+  attr_name = get_identifier (name);
+  attr_arg_name = build_string (strlen (arg_name), arg_name);
+  attr_args = tree_cons (NULL_TREE, attr_arg_name, NULL_TREE);
+  attr = tree_cons (attr_name, attr_args, chain);
+  return attr;
+}
+
+/* Make the resolver function decl to dispatch the versions of
+   a multi-versioned function, DEFAULT_DECL.  Create an
+   empty basic block in the resolver and store the pointer in
+   EMPTY_BB.  Return the decl of the resolver function.  */
+
+static tree
+make_resolver_func (const tree default_decl,
+		    const tree dispatch_decl,
+		    basic_block *empty_bb)
+{
+  char *resolver_name;
+  tree decl, type, decl_name, t;
+  bool is_uniq = false;
+
+  /* IFUNCs have to be globally visible.  So, if the default_decl is
+     not, then the name of the IFUNC should be made unique.  */
+  if (TREE_PUBLIC (default_decl) == 0)
+    is_uniq = true;
+
+  /* Append the filename to the resolver function if the versions are
+     not externally visible.  This is because the resolver function has
+     to be externally visible for the loader to find it.  So, appending
+     the filename will prevent conflicts with a resolver function from
+     another module which is based on the same version name.  */
+  resolver_name = make_name (default_decl, "resolver", is_uniq);
+
+  /* The resolver function should return a (void *). */
+  type = build_function_type_list (ptr_type_node, NULL_TREE);
+
+  decl = build_fn_decl (resolver_name, type);
+  decl_name = get_identifier (resolver_name);
+  SET_DECL_ASSEMBLER_NAME (decl, decl_name);
+
+  DECL_NAME (decl) = decl_name;
+  TREE_USED (decl) = 1;
+  DECL_ARTIFICIAL (decl) = 1;
+  DECL_IGNORED_P (decl) = 0;
+  /* IFUNC resolvers have to be externally visible.  */
+  TREE_PUBLIC (decl) = 1;
+  DECL_UNINLINABLE (decl) = 0;
+
+  /* Resolver is not external, body is generated.  */
+  DECL_EXTERNAL (decl) = 0;
+  DECL_EXTERNAL (dispatch_decl) = 0;
+
+  DECL_CONTEXT (decl) = NULL_TREE;
+  DECL_INITIAL (decl) = make_node (BLOCK);
+  DECL_STATIC_CONSTRUCTOR (decl) = 0;
+
+  if (DECL_COMDAT_GROUP (default_decl)
+      || TREE_PUBLIC (default_decl))
+    {
+      /* In this case, each translation unit with a call to this
+	 versioned function will put out a resolver.  Ensure it
+	 is comdat to keep just one copy.  */
+      DECL_COMDAT (decl) = 1;
+      make_decl_one_only (decl, DECL_ASSEMBLER_NAME (decl));
+    }
+  /* Build result decl and add to function_decl. */
+  t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, ptr_type_node);
+  DECL_ARTIFICIAL (t) = 1;
+  DECL_IGNORED_P (t) = 1;
+  DECL_RESULT (decl) = t;
+
+  gimplify_function_tree (decl);
+  push_cfun (DECL_STRUCT_FUNCTION (decl));
+  *empty_bb = init_lowered_empty_function (decl, false);
+
+  cgraph_add_new_function (decl, true);
+  cgraph_call_function_insertion_hooks (cgraph_get_create_node (decl));
+
+  pop_cfun ();
+
+  gcc_assert (dispatch_decl != NULL);
+  /* Mark dispatch_decl as "ifunc" with resolver as resolver_name.  */
+  DECL_ATTRIBUTES (dispatch_decl) 
+    = make_attribute ("ifunc", resolver_name, DECL_ATTRIBUTES (dispatch_decl));
+
+  /* Create the alias for dispatch to resolver here.  */
+  /*cgraph_create_function_alias (dispatch_decl, decl);*/
+  cgraph_same_body_alias (NULL, dispatch_decl, decl);
+  return decl;
+}
+
+/* Generate the dispatching code body to dispatch multi-versioned function
+   DECL.  The target hook is called to process the "target" attributes and
+   provide the code to dispatch the right function at run-time.  NODE_P points
+   to the dispatcher node whose body will be created.  */
+
+static tree 
+ix86_generate_version_dispatcher_body (void *node_p)
+{
+  tree resolver_decl;
+  basic_block empty_bb;
+  VEC (tree, heap) *fn_ver_vec = NULL;
+  tree default_ver_decl;
+  struct cgraph_node *versn;
+  struct cgraph_node *node;
+
+  struct cgraph_function_version_info *node_version_info = NULL;
+  struct cgraph_function_version_info *versn_info = NULL;
+
+  node = (cgraph_node *)node_p;
+
+  node_version_info = get_cgraph_node_version (node);
+  gcc_assert (node->dispatcher_function
+	      && node_version_info != NULL);
+
+  if (node_version_info->dispatcher_resolver)
+    return node_version_info->dispatcher_resolver;
+
+  /* The first version in the chain corresponds to the default version.  */
+  default_ver_decl = node_version_info->next->this_node->symbol.decl;
+
+  /* node is going to be an alias, so remove the finalized bit.  */
+  node->local.finalized = false;
+
+  resolver_decl = make_resolver_func (default_ver_decl,
+				      node->symbol.decl, &empty_bb);
+
+  node_version_info->dispatcher_resolver = resolver_decl;
+
+  push_cfun (DECL_STRUCT_FUNCTION (resolver_decl));
+
+  fn_ver_vec = VEC_alloc (tree, heap, 2);
+
+  for (versn_info = node_version_info->next; versn_info;
+       versn_info = versn_info->next)
+    {
+      versn = versn_info->this_node;
+      /* Check for virtual functions here again, as by this time it should
+	 have been determined if this function needs a vtable index or
+	 not.  This happens for methods in derived classes that override
+	 virtual methods in base classes but are not explicitly marked as
+	 virtual.  */
+      if (DECL_VINDEX (versn->symbol.decl))
+        error_at (DECL_SOURCE_LOCATION (versn->symbol.decl),
+		  "Virtual function multiversioning not supported");
+      VEC_safe_push (tree, heap, fn_ver_vec, versn->symbol.decl);
+    }
+
+  dispatch_function_versions (resolver_decl, fn_ver_vec, &empty_bb);
+
+  rebuild_cgraph_edges (); 
+  pop_cfun ();
+  return resolver_decl;
+}
 /* This builds the processor_model struct type defined in
    libgcc/config/i386/cpuinfo.c  */
 
@@ -28651,6 +29614,8 @@ fold_builtin_cpu (tree fndecl, tree *args)
     {
       tree ref;
       tree field;
+      tree final;
+
       unsigned int field_val = 0;
       unsigned int NUM_ARCH_NAMES
 	= sizeof (arch_names_table) / sizeof (struct _arch_names_table);
@@ -28690,14 +29655,17 @@ fold_builtin_cpu (tree fndecl, tree *args)
 		     field, NULL_TREE);
 
       /* Check the value.  */
-      return build2 (EQ_EXPR, unsigned_type_node, ref,
-		     build_int_cstu (unsigned_type_node, field_val));
+      final = build2 (EQ_EXPR, unsigned_type_node, ref,
+		      build_int_cstu (unsigned_type_node, field_val));
+      return build1 (CONVERT_EXPR, integer_type_node, final);
     }
   else if (fn_code == IX86_BUILTIN_CPU_SUPPORTS)
     {
       tree ref;
       tree array_elt;
       tree field;
+      tree final;
+
       unsigned int field_val = 0;
       unsigned int NUM_ISA_NAMES
 	= sizeof (isa_names_table) / sizeof (struct _isa_names_table);
@@ -28729,8 +29697,9 @@ fold_builtin_cpu (tree fndecl, tree *args)
 
       field_val = (1 << isa_names_table[i].feature);
       /* Return __cpu_model.__cpu_features[0] & field_val  */
-      return build2 (BIT_AND_EXPR, unsigned_type_node, array_elt,
-		     build_int_cstu (unsigned_type_node, field_val));
+      final = build2 (BIT_AND_EXPR, unsigned_type_node, array_elt,
+		      build_int_cstu (unsigned_type_node, field_val));
+      return build1 (CONVERT_EXPR, integer_type_node, final);
     }
   gcc_unreachable ();
 }
@@ -41218,6 +42187,9 @@ ix86_memmodel_check (unsigned HOST_WIDE_INT val)
 #undef TARGET_PROFILE_BEFORE_PROLOGUE
 #define TARGET_PROFILE_BEFORE_PROLOGUE ix86_profile_before_prologue
 
+#undef TARGET_MANGLE_DECL_ASSEMBLER_NAME
+#define TARGET_MANGLE_DECL_ASSEMBLER_NAME ix86_mangle_decl_assembler_name
+
 #undef TARGET_ASM_UNALIGNED_HI_OP
 #define TARGET_ASM_UNALIGNED_HI_OP TARGET_ASM_ALIGNED_HI_OP
 #undef TARGET_ASM_UNALIGNED_SI_OP
@@ -41311,6 +42283,17 @@ ix86_memmodel_check (unsigned HOST_WIDE_INT val)
 #undef TARGET_FOLD_BUILTIN
 #define TARGET_FOLD_BUILTIN ix86_fold_builtin
 
+#undef TARGET_COMPARE_VERSION_PRIORITY
+#define TARGET_COMPARE_VERSION_PRIORITY ix86_compare_version_priority
+
+#undef TARGET_GENERATE_VERSION_DISPATCHER_BODY
+#define TARGET_GENERATE_VERSION_DISPATCHER_BODY \
+  ix86_generate_version_dispatcher_body
+
+#undef TARGET_GET_FUNCTION_VERSIONS_DISPATCHER
+#define TARGET_GET_FUNCTION_VERSIONS_DISPATCHER \
+  ix86_get_function_versions_dispatcher
+
 #undef TARGET_ENUM_VA_LIST_P
 #define TARGET_ENUM_VA_LIST_P ix86_enum_va_list
 
@@ -41451,6 +42434,9 @@ ix86_memmodel_check (unsigned HOST_WIDE_INT val)
 #undef TARGET_OPTION_PRINT
 #define TARGET_OPTION_PRINT ix86_function_specific_print
 
+#undef TARGET_OPTION_FUNCTION_VERSIONS
+#define TARGET_OPTION_FUNCTION_VERSIONS ix86_function_versions
+
 #undef TARGET_CAN_INLINE_P
 #define TARGET_CAN_INLINE_P ix86_can_inline_p
 
Index: gcc/testsuite/g++.dg/mv1.C
===================================================================
--- gcc/testsuite/g++.dg/mv1.C	(revision 0)
+++ gcc/testsuite/g++.dg/mv1.C	(revision 0)
@@ -0,0 +1,130 @@
+/* Test case to check if Multiversioning works.  */
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-require-ifunc "" }  */
+/* { dg-options "-O2 -fPIC -mno-avx -mno-popcnt" } */
+
+#include <assert.h>
+
+/* Default version.  */
+int foo ();
+/* The other versions of foo.  Mix up the ordering and 
+   check if the dispatching does it in the order of priority. */
+/* Check combination of target attributes.  */
+int foo () __attribute__ ((target("arch=corei7,popcnt")));
+/* The target operands in this declaration and the definition are re-ordered.
+   This should still work.  */
+int foo () __attribute__ ((target("ssse3,avx2")));
+
+/* Check for all target attributes for which dispatchers are available.  */
+/* Check arch= */
+int foo () __attribute__((target("arch=core2")));
+int foo () __attribute__((target("arch=corei7")));
+int foo () __attribute__((target("arch=atom")));
+/* Check ISAs  */
+int foo () __attribute__((target("avx")));
+int foo () __attribute__ ((target("arch=core2,sse4.2")));
+/* Check more arch=.  */
+int foo () __attribute__((target("arch=amdfam10")));
+int foo () __attribute__((target("arch=bdver1")));
+int foo () __attribute__((target("arch=bdver2")));
+
+int (*p)() = &foo;
+int main ()
+{
+  int val = foo ();
+  assert (val ==  (*p)());
+
+  /* Check in the exact same order in which the dispatching
+     is expected to happen.  */
+  if (__builtin_cpu_is ("bdver1"))
+    assert (val == 1);
+  else if (__builtin_cpu_is ("bdver2"))
+    assert (val == 2);
+  else if (__builtin_cpu_supports ("avx2")
+	   && __builtin_cpu_supports ("ssse3"))
+    assert (val == 3);
+  else if (__builtin_cpu_supports ("avx"))
+    assert (val == 4);
+  else if (__builtin_cpu_is ("corei7")
+	   && __builtin_cpu_supports ("popcnt"))
+    assert (val == 5);
+  else if (__builtin_cpu_is ("corei7"))
+    assert (val == 6);
+  else if (__builtin_cpu_is ("amdfam10h"))
+    assert (val == 7);
+  else if (__builtin_cpu_is ("core2")
+	   && __builtin_cpu_supports ("sse4.2"))
+    assert (val == 8);
+  else if (__builtin_cpu_is ("core2"))
+    assert (val == 9);
+  else if (__builtin_cpu_is ("atom"))
+    assert (val == 10);
+  else
+    assert (val == 0);
+  
+  return 0;
+}
+
+int foo ()
+{
+  return 0;
+}
+
+int __attribute__ ((target("arch=corei7,popcnt")))
+foo ()
+{
+  return 5;
+}
+int __attribute__ ((target("avx2,ssse3")))
+foo ()
+{
+  return 3;
+}
+
+int __attribute__ ((target("arch=core2")))
+foo ()
+{
+  return 9;
+}
+
+int __attribute__ ((target("arch=corei7")))
+foo ()
+{
+  return 6;
+}
+
+int __attribute__ ((target("arch=atom")))
+foo ()
+{
+  return 10;
+}
+
+int __attribute__ ((target("avx")))
+foo ()
+{
+  return 4;
+}
+
+int __attribute__ ((target("arch=core2,sse4.2")))
+foo ()
+{
+  return 8;
+}
+
+int __attribute__ ((target("arch=amdfam10")))
+foo ()
+{
+  return 7;
+}
+
+int __attribute__ ((target("arch=bdver1")))
+foo ()
+{
+  return 1;
+}
+
+int __attribute__ ((target("arch=bdver2")))
+foo ()
+{
+  return 2;
+}
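
Not part of the patch: mv1.C declares one version as target("ssse3,avx2")
and defines it as target("avx2,ssse3"), and relies on both spellings naming
the same version.  The sketch below is a minimal standalone illustration,
under assumptions, of what sorted_attr_string and the mangling hook in
i386.c are described as doing for that case: replace '=' and '-' with '_',
sort the comma-separated options, join them with '_', and append the result
to the function name after a '.'.  The helper name mangle_version_name and
the use of plain malloc are hypothetical; the patch itself works on GCC
trees and uses xmalloc.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C strings.  */
static int
cmp_str (const void *a, const void *b)
{
  return strcmp (*(const char *const *) a, *(const char *const *) b);
}

/* Hypothetical helper, not in the patch: return a newly allocated
   "NAME.SUFFIX", where SUFFIX is the sorted, '_'-joined option list
   from TARGET.  Returns NULL on allocation failure.  Caller frees.  */
static char *
mangle_version_name (const char *name, const char *target)
{
  size_t len, n = 0, i;
  char *copy, *out, *tok;
  char **args;

  assert (name != NULL && target != NULL);
  len = strlen (target);

  copy = (char *) malloc (len + 1);
  /* A string of LEN characters holds at most LEN / 2 + 1 tokens.  */
  args = (char **) malloc ((len / 2 + 1) * sizeof *args);
  /* NAME, '.', at most LEN characters of options, and the NUL.  */
  out = (char *) malloc (strlen (name) + len + 2);
  if (copy == NULL || args == NULL || out == NULL)
    {
      free (copy);
      free (args);
      free (out);
      return NULL;
    }

  /* Replace '=' and '-' so the suffix stays identifier-like.  */
  strcpy (copy, target);
  for (i = 0; i < len; i++)
    if (copy[i] == '=' || copy[i] == '-')
      copy[i] = '_';

  /* Count tokens while splitting, so empty tokens are never sorted.  */
  for (tok = strtok (copy, ","); tok != NULL; tok = strtok (NULL, ","))
    args[n++] = tok;
  if (n > 1)
    qsort (args, n, sizeof *args, cmp_str);

  strcpy (out, name);
  for (i = 0; i < n; i++)
    {
      strcat (out, i == 0 ? "." : "_");
      strcat (out, args[i]);
    }

  free (args);
  free (copy);
  return out;
}
```

With this helper, "ssse3,avx2" and "avx2,ssse3" both give foo.avx2_ssse3,
and "arch=corei7,popcnt" gives foo.arch_corei7_popcnt.  In the compiler the
base name would be the assembler name of the function rather than plain
foo.
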
Index: gcc/testsuite/g++.dg/mv2.C
===================================================================
--- gcc/testsuite/g++.dg/mv2.C	(revision 0)
+++ gcc/testsuite/g++.dg/mv2.C	(revision 0)
@@ -0,0 +1,118 @@
+/* Test case to check if Multiversioning chooses the correct
+   dispatching order when versions are for various ISAs.  */
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-require-ifunc "" }  */
+/* { dg-options "-O2 -mno-sse -mno-mmx -mno-popcnt -mno-avx" } */
+
+#include <assert.h>
+
+/* Default version.  */
+int foo ();
+/* The dispatch checks should be in the exact reverse order of the
+   declarations below.  */
+int foo () __attribute__ ((target ("mmx")));
+int foo () __attribute__ ((target ("sse")));
+int foo () __attribute__ ((target ("sse2")));
+int foo () __attribute__ ((target ("sse3")));
+int foo () __attribute__ ((target ("ssse3")));
+int foo () __attribute__ ((target ("sse4.1")));
+int foo () __attribute__ ((target ("sse4.2")));
+int foo () __attribute__ ((target ("popcnt")));
+int foo () __attribute__ ((target ("avx")));
+int foo () __attribute__ ((target ("avx2")));
+
+int main ()
+{
+  int val = foo ();
+
+  if (__builtin_cpu_supports ("avx2"))
+    assert (val == 1);
+  else if (__builtin_cpu_supports ("avx"))
+    assert (val == 2);
+  else if (__builtin_cpu_supports ("popcnt"))
+    assert (val == 3);
+  else if (__builtin_cpu_supports ("sse4.2"))
+    assert (val == 4);
+  else if (__builtin_cpu_supports ("sse4.1"))
+    assert (val == 5);
+  else if (__builtin_cpu_supports ("ssse3"))
+    assert (val == 6);
+  else if (__builtin_cpu_supports ("sse3"))
+    assert (val == 7);
+  else if (__builtin_cpu_supports ("sse2"))
+    assert (val == 8);
+  else if (__builtin_cpu_supports ("sse"))
+    assert (val == 9);
+  else if (__builtin_cpu_supports ("mmx"))
+    assert (val == 10);
+  else
+    assert (val == 0);
+
+  return 0;
+}
+
+int
+foo ()
+{
+  return 0;
+}
+
+int __attribute__ ((target("mmx")))
+foo ()
+{
+  return 10;
+}
+
+int __attribute__ ((target("sse")))
+foo ()
+{
+  return 9;
+}
+
+int __attribute__ ((target("sse2")))
+foo ()
+{
+  return 8;
+}
+
+int __attribute__ ((target("sse3")))
+foo ()
+{
+  return 7;
+}
+
+int __attribute__ ((target("ssse3")))
+foo ()
+{
+  return 6;
+}
+
+int __attribute__ ((target("sse4.1")))
+foo ()
+{
+  return 5;
+}
+
+int __attribute__ ((target("sse4.2")))
+foo ()
+{
+  return 4;
+}
+
+int __attribute__ ((target("popcnt")))
+foo ()
+{
+  return 3;
+}
+
+int __attribute__ ((target("avx")))
+foo ()
+{
+  return 2;
+}
+
+int __attribute__ ((target("avx2")))
+foo ()
+{
+  return 1;
+}
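
Not part of the patch: mv2.C expects the dispatcher to test versions from
highest to lowest priority and to fall back to the default version when
nothing matches.  A minimal sketch of that selection rule follows, assuming
a made-up feature bitmask and made-up priority numbers.  struct version,
select_version and the bit values are hypothetical; the real dispatcher
builds GIMPLE conditions around __builtin_cpu_is and __builtin_cpu_supports
and runs in the IFUNC resolver.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for one function version.  */
struct version
{
  int id;                 /* Value the version returns, as in the tests.  */
  unsigned int priority;  /* Higher means checked earlier.  */
  unsigned int required;  /* Bitmask of CPU features the version needs.  */
};

/* qsort comparator: descending order of priority.  */
static int
by_priority_desc (const void *a, const void *b)
{
  const struct version *va = (const struct version *) a;
  const struct version *vb = (const struct version *) b;
  return (va->priority < vb->priority) - (va->priority > vb->priority);
}

/* Return the id of the first version, in descending priority order,
   whose required features are all present in CPU_FEATURES.  Return
   DEFAULT_ID if no version matches.  Sorts VERSIONS in place.  */
static int
select_version (struct version *versions, size_t n,
		unsigned int cpu_features, int default_id)
{
  size_t i;

  assert (versions != NULL || n == 0);
  if (n > 1)
    qsort (versions, n, sizeof *versions, by_priority_desc);

  for (i = 0; i < n; i++)
    if ((cpu_features & versions[i].required) == versions[i].required)
      return versions[i].id;

  /* The default version is always tried last.  */
  return default_id;
}
```

The sketch keeps the property the comment in dispatch_function_versions
points out: when several versions are suitable, the priority alone decides
which one runs, so the priority assignment has to be unambiguous.
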
Index: gcc/testsuite/g++.dg/mv3.C
===================================================================
--- gcc/testsuite/g++.dg/mv3.C	(revision 0)
+++ gcc/testsuite/g++.dg/mv3.C	(revision 0)
@@ -0,0 +1,36 @@
+/* Test case to check if a call to a multiversioned function
+   is replaced with a direct call to the particular version when
+   the most specialized version's target attributes match the
+   caller.  
+  
+   In this program, foo is multiversioned but there is no default
+   function.  This is an error if the call has to go through a
+   dispatcher.  However, the call to foo in bar can be replaced
+   with a direct call to the popcnt version of foo.  Hence, this
+   test should pass.  */
+
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 -mno-sse -mno-popcnt" } */
+
+
+int __attribute__ ((target ("sse")))
+foo ()
+{
+  return 1;
+}
+int __attribute__ ((target ("popcnt")))
+foo ()
+{
+  return 0;
+}
+
+int __attribute__ ((target ("popcnt")))
+bar ()
+{
+  return foo ();
+}
+
+int main ()
+{
+  return bar ();
+}
Index: gcc/testsuite/g++.dg/mv4.C
===================================================================
--- gcc/testsuite/g++.dg/mv4.C	(revision 0)
+++ gcc/testsuite/g++.dg/mv4.C	(revision 0)
@@ -0,0 +1,23 @@
+/* Test case to check if the compiler generates an error message
+   when the default version of a multiversioned function is absent
+   and its pointer is taken.  */
+
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 -mno-sse -mno-popcnt" } */
+
+int __attribute__ ((target ("sse")))
+foo ()
+{
+  return 1;
+}
+int __attribute__ ((target ("popcnt")))
+foo ()
+{
+  return 0;
+}
+
+int main ()
+{
+  int (*p)() = &foo; /* { dg-error "Pointer to a multiversioned function without a default is not allowed" {} } */
+  return (*p)();
+}
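
Not part of the patch: the error tested in mv4.C arises because
ix86_get_function_versions_dispatcher returns NULL when the version chain
has no default version.  Otherwise it unlinks the default version and
relinks it at the head of the chain.  A simplified sketch of that list
manipulation follows, assuming a hypothetical struct version_info in place
of cgraph_function_version_info.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for cgraph_function_version_info.  */
struct version_info
{
  struct version_info *prev;
  struct version_info *next;
  int is_default;  /* Nonzero for the version without a target attribute.  */
};

/* Move the default version to the head of the chain that contains NODE.
   Return the new head, or NULL if the chain has no default version.  */
static struct version_info *
default_to_front (struct version_info *node)
{
  struct version_info *first, *def;

  assert (node != NULL);

  /* Go to the beginning of the chain.  */
  for (first = node; first->prev != NULL; first = first->prev)
    ;

  /* Find the default version.  */
  for (def = first; def != NULL && !def->is_default; def = def->next)
    ;
  if (def == NULL)
    return NULL;

  if (def != first)
    {
      /* Unlink DEF, then relink it in front of FIRST.  */
      def->prev->next = def->next;
      if (def->next != NULL)
	def->next->prev = def->prev;
      first->prev = def;
      def->next = first;
      def->prev = NULL;
    }
  return def;
}
```
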
Index: gcc/testsuite/g++.dg/mv5.C
===================================================================
--- gcc/testsuite/g++.dg/mv5.C	(revision 0)
+++ gcc/testsuite/g++.dg/mv5.C	(revision 0)
@@ -0,0 +1,24 @@
+/* Test case to check if multiversioned functions are still generated if they are
+   marked comdat with inline keyword.  */
+
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2  -mno-popcnt" } */
+
+
+/* Default version.  */
+inline int
+foo ()
+{
+  return 0;
+}
+
+inline int __attribute__ ((target ("popcnt")))
+foo ()
+{
+  return 0;
+}
+
+int main ()
+{
+  return foo ();
+}
Index: gcc/testsuite/g++.dg/mv6.C
===================================================================
--- gcc/testsuite/g++.dg/mv6.C	(revision 0)
+++ gcc/testsuite/g++.dg/mv6.C	(revision 0)
@@ -0,0 +1,25 @@
+/* Test to check if member version multiversioning works correctly.  */
+
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+
+class Foo
+{
+ public:
+  /* Default version of foo.  */
+  int foo ()
+  {
+    return 0;
+  }
+  /* corei7 version of foo.  */
+  __attribute__ ((target("arch=corei7")))
+  int foo ()
+  {
+    return 0;
+  }
+};
+
+int main ()
+{
+  Foo f;
+  return f.foo ();
+}

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-11-06  2:38                                                                                             ` Sriraman Tallam
@ 2012-11-06 15:52                                                                                               ` Jason Merrill
  2012-11-06 18:17                                                                                                 ` Sriraman Tallam
  2012-11-10  1:33                                                                                                 ` Sriraman Tallam
  2012-11-06 22:15                                                                                               ` Gerald Pfeifer
  1 sibling, 2 replies; 93+ messages in thread
From: Jason Merrill @ 2012-11-06 15:52 UTC (permalink / raw)
  To: Sriraman Tallam
  Cc: David Li, H.J. Lu, gcc-patches List, Jan Hubicka, Diego Novillo

On 11/05/2012 09:38 PM, Sriraman Tallam wrote:
> +      /* For multi-versioned functions, more than one match is just fine.
> +	 Call decls_match to make sure they are different because they are
> +	 versioned.  */
> +      if (DECL_FUNCTION_VERSIONED (fn))
> +	{
> +          for (match = TREE_CHAIN (matches); match; match = TREE_CHAIN (match))
> +  	    if (!DECL_FUNCTION_VERSIONED (TREE_PURPOSE (match))
> +	        || decls_match (fn, TREE_PURPOSE (match)))
> +	      break;
> +	}

I still don't understand what this code is supposed to be doing.  Please 
remove it and instead modify the other loop to allow mismatches that are 
versions of the same function.

> +  /* If the olddecl is a version, so is the newdecl.  */
> +  if (TREE_CODE (newdecl) == FUNCTION_DECL
> +      && DECL_FUNCTION_VERSIONED (olddecl))
> +    {
> +      DECL_FUNCTION_VERSIONED (newdecl) = 1;
> +      /* newdecl will be purged and is no longer a version.  */
> +      delete_function_version (newdecl);
> +    }

Please make the comment clearer that the reason we're setting the flag 
on the newdecl is so that it'll be copied back into the olddecl; 
otherwise it seems odd to say it's a version and then it isn't a version.

> +  /* If a pointer to a function that is multi-versioned is requested, the
> +     pointer to the dispatcher function is returned instead.  This works
> +     well because indirectly calling the function will dispatch the right
> +     function version at run-time.  */
> +  if (DECL_FUNCTION_VERSIONED (fn))
> +    {
> +      tree dispatcher_decl = NULL;
> +      gcc_assert (targetm.get_function_versions_dispatcher);
> +      dispatcher_decl = targetm.get_function_versions_dispatcher (fn);
> +      if (!dispatcher_decl)
> +	{
> +	  error_at (input_location, "Pointer to a multiversioned function"
> +		    " without a default is not allowed");
> +	  return error_mark_node;
> +	}
> +      retrofit_lang_decl (dispatcher_decl);
> +      fn = dispatcher_decl;

This code should use the get_function_version_dispatcher function in 
cp/call.c.

Jason

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-11-06 15:52                                                                                               ` Jason Merrill
@ 2012-11-06 18:17                                                                                                 ` Sriraman Tallam
  2012-11-10  1:33                                                                                                 ` Sriraman Tallam
  1 sibling, 0 replies; 93+ messages in thread
From: Sriraman Tallam @ 2012-11-06 18:17 UTC (permalink / raw)
  To: Jason Merrill
  Cc: David Li, H.J. Lu, gcc-patches List, Jan Hubicka, Diego Novillo

On Tue, Nov 6, 2012 at 7:52 AM, Jason Merrill <jason@redhat.com> wrote:
> On 11/05/2012 09:38 PM, Sriraman Tallam wrote:
>>
>> +      /* For multi-versioned functions, more than one match is just fine.
>>
>> +        Call decls_match to make sure they are different because they are
>> +        versioned.  */
>> +      if (DECL_FUNCTION_VERSIONED (fn))
>> +       {
>> +          for (match = TREE_CHAIN (matches); match; match = TREE_CHAIN
>> (match))
>> +           if (!DECL_FUNCTION_VERSIONED (TREE_PURPOSE (match))
>> +               || decls_match (fn, TREE_PURPOSE (match)))
>> +             break;
>> +       }
>
>
> I still don't understand what this code is supposed to be doing.  Please
> remove it and instead modify the other loop to allow mismatches that are
> versions of the same function.

Ok, will do. I was trying to do for versioned functions what the other
loop was doing, though I could not come up with a test case to
exercise this code.


I will make all the other changes and get back asap.

Thanks,
-Sri.

>
>> +  /* If the olddecl is a version, so is the newdecl.  */
>> +  if (TREE_CODE (newdecl) == FUNCTION_DECL
>> +      && DECL_FUNCTION_VERSIONED (olddecl))
>> +    {
>> +      DECL_FUNCTION_VERSIONED (newdecl) = 1;
>> +      /* newdecl will be purged and is no longer a version.  */
>> +      delete_function_version (newdecl);
>> +    }
>
>
> Please make the comment clearer that the reason we're setting the flag on
> the newdecl is so that it'll be copied back into the olddecl; otherwise it
> seems odd to say it's a version and then it isn't a version.
>
>> +  /* If a pointer to a function that is multi-versioned is requested, the
>> +     pointer to the dispatcher function is returned instead.  This works
>> +     well because indirectly calling the function will dispatch the right
>> +     function version at run-time.  */
>>
>> +  if (DECL_FUNCTION_VERSIONED (fn))
>> +    {
>> +      tree dispatcher_decl = NULL;
>> +      gcc_assert (targetm.get_function_versions_dispatcher);
>> +      dispatcher_decl = targetm.get_function_versions_dispatcher (fn);
>> +      if (!dispatcher_decl)
>> +       {
>> +         error_at (input_location, "Pointer to a multiversioned function"
>> +                   " without a default is not allowed");
>> +         return error_mark_node;
>> +       }
>> +      retrofit_lang_decl (dispatcher_decl);
>> +      fn = dispatcher_decl;
>
>
> This code should use the get_function_version_dispatcher function in
> cp/call.c.
>
> Jason
>

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-11-06  2:38                                                                                             ` Sriraman Tallam
  2012-11-06 15:52                                                                                               ` Jason Merrill
@ 2012-11-06 22:15                                                                                               ` Gerald Pfeifer
  1 sibling, 0 replies; 93+ messages in thread
From: Gerald Pfeifer @ 2012-11-06 22:15 UTC (permalink / raw)
  To: Sriraman Tallam
  Cc: Jason Merrill, David Li, H.J. Lu, gcc-patches List, Jan Hubicka,
	Diego Novillo

On Mon, 5 Nov 2012, Sriraman Tallam wrote:
>    I have now committed the attached patch.

...and broke bootstrap on *-unknown-freebsd* and other targets
that way:

   /scratch2/tmp/gerald/gcc-HEAD/gcc/config/i386/i386.c:28820:1: error: 
   'tree_node* make_dispatcher_decl(tree)' defined but not used 
   [-Werror=unused-function]
    make_dispatcher_decl (const tree decl)
    ^
   cc1plus: all warnings being treated as errors

To restore bootstrap, I applied the patch below after testing on
i386-unknown-freebsd10.0.
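
For readers unfamiliar with this failure mode, the following is a small standalone sketch of the pattern, under the assumption that the only caller of a static helper sits inside a conditionally compiled region. `HAVE_DISPATCH_SUPPORT`, `make_dispatcher_name`, and `dispatcher_name_or_empty` are hypothetical names used for illustration only; they are not from i386.c.

```cpp
#include <cassert>
#include <string>

// Hypothetical feature macro; defaults to "not supported" here.
#ifndef HAVE_DISPATCH_SUPPORT
#define HAVE_DISPATCH_SUPPORT 0
#endif

#if HAVE_DISPATCH_SUPPORT
// The only caller is inside the same guard, so the definition is guarded
// too. Left unguarded, it would be "defined but not used" whenever the
// feature is off, which is an error under -Werror=unused-function.
static std::string make_dispatcher_name(const std::string &base) {
  return base + ".ifunc";
}
#endif

// Returns a dispatcher name when the feature is on, otherwise an empty string.
std::string dispatcher_name_or_empty(const std::string &base) {
#if HAVE_DISPATCH_SUPPORT
  return make_dispatcher_name(base);
#else
  (void) base;  // unused in this configuration
  return std::string();
#endif
}
```

Compiling this with `-Wall -Werror` should succeed with the macro set to either 0 or 1, which is the property the patch below restores for the real function.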

Gerald


2012-11-06  Gerald Pfeifer  <gerald@pfeifer.com>

	* config/i386/i386.c (make_dispatcher_decl): Guard with
	ASM_OUTPUT_TYPE_DIRECTIVE and HAVE_GNU_INDIRECT_FUNCTION.

Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c	(revision 193259)
+++ config/i386/i386.c	(working copy)
@@ -28813,6 +28813,8 @@
   return global_var_name;
 }
 
+#if defined (ASM_OUTPUT_TYPE_DIRECTIVE) && HAVE_GNU_INDIRECT_FUNCTION
+
 /* Make a dispatcher declaration for the multi-versioned function DECL.
    Calls to DECL function will be replaced with calls to the dispatcher
    by the front-end.  Return the decl created.  */
@@ -28850,6 +28852,8 @@
   return func_decl;  
 }
 
+#endif
+
 /* Returns true if decl is multi-versioned and DECL is the default function,
    that is it is not tagged with target specific optimization.  */
 

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-11-06 15:52                                                                                               ` Jason Merrill
  2012-11-06 18:17                                                                                                 ` Sriraman Tallam
@ 2012-11-10  1:33                                                                                                 ` Sriraman Tallam
  2012-11-12  5:04                                                                                                   ` Jason Merrill
  1 sibling, 1 reply; 93+ messages in thread
From: Sriraman Tallam @ 2012-11-10  1:33 UTC (permalink / raw)
  To: Jason Merrill
  Cc: David Li, H.J. Lu, gcc-patches List, Jan Hubicka, Diego Novillo

[-- Attachment #1: Type: text/plain, Size: 2376 bytes --]

Hi Jason,

   I made all the changes and attached the patch. OK to commit?

Thanks,
-Sri.

On Tue, Nov 6, 2012 at 7:52 AM, Jason Merrill <jason@redhat.com> wrote:
> On 11/05/2012 09:38 PM, Sriraman Tallam wrote:
>>
>> +      /* For multi-versioned functions, more than one match is just fine.
>>
>> +        Call decls_match to make sure they are different because they are
>> +        versioned.  */
>> +      if (DECL_FUNCTION_VERSIONED (fn))
>> +       {
>> +          for (match = TREE_CHAIN (matches); match; match = TREE_CHAIN
>> (match))
>> +           if (!DECL_FUNCTION_VERSIONED (TREE_PURPOSE (match))
>> +               || decls_match (fn, TREE_PURPOSE (match)))
>> +             break;
>> +       }
>
>
> I still don't understand what this code is supposed to be doing.  Please
> remove it and instead modify the other loop to allow mismatches that are
> versions of the same function.
>
>> +  /* If the olddecl is a version, so is the newdecl.  */
>> +  if (TREE_CODE (newdecl) == FUNCTION_DECL
>> +      && DECL_FUNCTION_VERSIONED (olddecl))
>> +    {
>> +      DECL_FUNCTION_VERSIONED (newdecl) = 1;
>> +      /* newdecl will be purged and is no longer a version.  */
>> +      delete_function_version (newdecl);
>> +    }
>
>
> Please make the comment clearer that the reason we're setting the flag on
> the newdecl is so that it'll be copied back into the olddecl; otherwise it
> seems odd to say it's a version and then it isn't a version.
>
>> +  /* If a pointer to a function that is multi-versioned is requested, the
>> +     pointer to the dispatcher function is returned instead.  This works
>> +     well because indirectly calling the function will dispatch the right
>> +     function version at run-time.  */
>>
>> +  if (DECL_FUNCTION_VERSIONED (fn))
>> +    {
>> +      tree dispatcher_decl = NULL;
>> +      gcc_assert (targetm.get_function_versions_dispatcher);
>> +      dispatcher_decl = targetm.get_function_versions_dispatcher (fn);
>> +      if (!dispatcher_decl)
>> +       {
>> +         error_at (input_location, "Pointer to a multiversioned function"
>> +                   " without a default is not allowed");
>> +         return error_mark_node;
>> +       }
>> +      retrofit_lang_decl (dispatcher_decl);
>> +      fn = dispatcher_decl;
>
>
> This code should use the get_function_version_dispatcher function in
> cp/call.c.
>
> Jason
>

[-- Attachment #2: mv_patch.txt --]
[-- Type: text/plain, Size: 8191 bytes --]


	* cgraph.c (delete_function_version): Use cgraph_get_node
	instead of cgraph_get_create_node.
	* cp/class.c (mark_versions_used): Remove.
	(resolve_address_of_overloaded_function): Do not call decls_match
	for versioned functions. Call get_function_version_dispatcher.
	* cp/decl.c (duplicate_decls): Add comments.
	* cp/call.c (get_function_version_dispatcher): Expose function.
	(mark_versions_used): Expose function.
	* cp/cp-tree.h (mark_versions_used): New declaration.
	(get_function_version_dispatcher): Ditto.
	* testsuite/g++.dg/mv4.C: Add require ifunc. Change error message.
	* testsuite/g++.dg/mv5.C: Add require ifunc.
	* testsuite/g++.dg/mv6.C: Add require ifunc.

Index: cgraph.c
===================================================================
--- cgraph.c	(revision 193385)
+++ cgraph.c	(working copy)
@@ -206,7 +206,7 @@ insert_new_cgraph_node_version (struct cgraph_node
 void
 delete_function_version (tree decl)
 {
-  struct cgraph_node *decl_node = cgraph_get_create_node (decl);
+  struct cgraph_node *decl_node = cgraph_get_node (decl);
   struct cgraph_function_version_info *decl_v = NULL;
 
   if (decl_node == NULL)
Index: testsuite/g++.dg/mv4.C
===================================================================
--- testsuite/g++.dg/mv4.C	(revision 193385)
+++ testsuite/g++.dg/mv4.C	(working copy)
@@ -3,6 +3,7 @@
    and its pointer is taken.  */
 
 /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-require-ifunc "" }  */
 /* { dg-options "-O2 -mno-sse -mno-popcnt" } */
 
 int __attribute__ ((target ("sse")))
@@ -18,6 +19,6 @@ foo ()
 
 int main ()
 {
-  int (*p)() = &foo; /* { dg-error "Pointer to a multiversioned function without a default is not allowed" {} } */
+  int (*p)() = &foo; /* { dg-error "Pointer/Call to multiversioned function without a default is not allowed" {} } */
   return (*p)();
 }
Index: testsuite/g++.dg/mv6.C
===================================================================
--- testsuite/g++.dg/mv6.C	(revision 193385)
+++ testsuite/g++.dg/mv6.C	(working copy)
@@ -1,6 +1,7 @@
 /* Test to check if member version multiversioning works correctly.  */
 
 /* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-require-ifunc "" }  */
 
 class Foo
 {
Index: testsuite/g++.dg/mv5.C
===================================================================
--- testsuite/g++.dg/mv5.C	(revision 193385)
+++ testsuite/g++.dg/mv5.C	(working copy)
@@ -2,6 +2,7 @@
    marked comdat with inline keyword.  */
 
 /* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-require-ifunc "" }  */
 /* { dg-options "-O2  -mno-popcnt" } */
 
 
Index: cp/class.c
===================================================================
--- cp/class.c	(revision 193385)
+++ cp/class.c	(working copy)
@@ -7068,38 +7068,6 @@ pop_lang_context (void)
 {
   current_lang_name = VEC_pop (tree, current_lang_base);
 }
-
-/* fn is a function version dispatcher that is marked used. Mark all the 
-   semantically identical function versions it will dispatch as used.  */
-
-static void
-mark_versions_used (tree fn)
-{
-  struct cgraph_node *node;
-  struct cgraph_function_version_info *node_v;
-  struct cgraph_function_version_info *it_v;
-
-  gcc_assert (TREE_CODE (fn) == FUNCTION_DECL);
-
-  node = cgraph_get_node (fn);
-  if (node == NULL)
-    return;
-
-  gcc_assert (node->dispatcher_function);
-
-  node_v = get_cgraph_node_version (node);
-  if (node_v == NULL)
-    return;
-
-  /* All semantically identical versions are chained.  Traverse and mark each
-     one of them as used.  */
-  it_v = node_v->next;
-  while (it_v != NULL)
-    {
-      mark_used (it_v->this_node->symbol.decl);
-      it_v = it_v->next;
-    }
-}
 \f
 /* Type instantiation routines.  */
 
@@ -7315,22 +7283,17 @@ resolve_address_of_overloaded_function (tree targe
 
       fn = TREE_PURPOSE (matches);
 
-      /* For multi-versioned functions, more than one match is just fine.
-	 Call decls_match to make sure they are different because they are
-	 versioned.  */
-      if (DECL_FUNCTION_VERSIONED (fn))
+      /* For multi-versioned functions, more than one match is just fine and
+	 decls_match will return false as they are different.  */
+      for (match = TREE_CHAIN (matches); match; match = TREE_CHAIN (match))
 	{
-          for (match = TREE_CHAIN (matches); match; match = TREE_CHAIN (match))
-  	    if (!DECL_FUNCTION_VERSIONED (TREE_PURPOSE (match))
-	        || decls_match (fn, TREE_PURPOSE (match)))
-	      break;
+	  /* Skip calling decls_match for versioned functions.  */
+          if (DECL_FUNCTION_VERSIONED (fn)
+	      && DECL_FUNCTION_VERSIONED (TREE_PURPOSE (match)))
+	    continue;
+	  if (!decls_match (fn, TREE_PURPOSE (match)))
+            break;
 	}
-      else
-	{
-          for (match = TREE_CHAIN (matches); match; match = TREE_CHAIN (match))
-  	    if (!decls_match (fn, TREE_PURPOSE (match)))
-	      break;
-	}
 
       if (match)
 	{
@@ -7377,17 +7340,9 @@ resolve_address_of_overloaded_function (tree targe
      function version at run-time.  */
   if (DECL_FUNCTION_VERSIONED (fn))
     {
-      tree dispatcher_decl = NULL;
-      gcc_assert (targetm.get_function_versions_dispatcher);
-      dispatcher_decl = targetm.get_function_versions_dispatcher (fn);
-      if (!dispatcher_decl)
-	{
-	  error_at (input_location, "Pointer to a multiversioned function"
-		    " without a default is not allowed");
-	  return error_mark_node;
-	}
-      retrofit_lang_decl (dispatcher_decl);
-      fn = dispatcher_decl;
+      fn = get_function_version_dispatcher (fn);
+      if (fn == NULL)
+	return error_mark_node;
       /* Mark all the versions corresponding to the dispatcher as used.  */
       if (!(flags & tf_conv))
 	mark_versions_used (fn);
Index: cp/decl.c
===================================================================
--- cp/decl.c	(revision 193385)
+++ cp/decl.c	(working copy)
@@ -2307,12 +2307,15 @@ duplicate_decls (tree newdecl, tree olddecl, bool
   else if (DECL_PRESERVE_P (newdecl))
     DECL_PRESERVE_P (olddecl) = 1;
 
-  /* If the olddecl is a version, so is the newdecl.  */
+  /* Merge the DECL_FUNCTION_VERSIONED information.  newdecl will be copied
+     to olddecl and deleted.  */
   if (TREE_CODE (newdecl) == FUNCTION_DECL
       && DECL_FUNCTION_VERSIONED (olddecl))
     {
+      /* Set the flag for newdecl so that it gets copied to olddecl.  */
       DECL_FUNCTION_VERSIONED (newdecl) = 1;
-      /* newdecl will be purged and is no longer a version.  */
+      /* newdecl will be purged after copying to olddecl and is no longer
+         a version.  */
       delete_function_version (newdecl);
     }
 
Index: cp/call.c
===================================================================
--- cp/call.c	(revision 193385)
+++ cp/call.c	(working copy)
@@ -6517,7 +6517,7 @@ magic_varargs_p (tree fn)
 
 /* Returns the decl of the dispatcher function if FN is a function version.  */
 
-static tree
+tree
 get_function_version_dispatcher (tree fn)
 {
   tree dispatcher_decl = NULL;
@@ -6530,8 +6530,8 @@ get_function_version_dispatcher (tree fn)
 
   if (dispatcher_decl == NULL)
     {
-      error_at (input_location, "Call to multiversioned function"
-                " without a default is not allowed");
+      error_at (input_location, "Call/Pointer to multiversioned function"
+                " without a default cannot be dispatched");
       return NULL;
     }
 
@@ -6543,7 +6543,7 @@ get_function_version_dispatcher (tree fn)
 /* fn is a function version dispatcher that is marked used. Mark all the 
    semantically identical function versions it will dispatch as used.  */
 
-static void
+void
 mark_versions_used (tree fn)
 {
   struct cgraph_node *node;
Index: cp/cp-tree.h
===================================================================
--- cp/cp-tree.h	(revision 193385)
+++ cp/cp-tree.h	(working copy)
@@ -4971,6 +4971,8 @@ extern bool is_list_ctor			(tree);
 #ifdef ENABLE_CHECKING
 extern void validate_conversion_obstack		(void);
 #endif /* ENABLE_CHECKING */
+extern void mark_versions_used			(tree);
+extern tree get_function_version_dispatcher	(tree);
 
 /* in class.c */
 extern tree build_vfield_ref			(tree, tree);

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-11-10  1:33                                                                                                 ` Sriraman Tallam
@ 2012-11-12  5:04                                                                                                   ` Jason Merrill
  2012-11-13  1:11                                                                                                     ` Sriraman Tallam
  0 siblings, 1 reply; 93+ messages in thread
From: Jason Merrill @ 2012-11-12  5:04 UTC (permalink / raw)
  To: Sriraman Tallam
  Cc: David Li, H.J. Lu, gcc-patches List, Jan Hubicka, Diego Novillo

On 11/09/2012 08:33 PM, Sriraman Tallam wrote:
> +	  /* Skip calling decls_match for versioned functions.  */
> +          if (DECL_FUNCTION_VERSIONED (fn)
> +	      && DECL_FUNCTION_VERSIONED (TREE_PURPOSE (match)))
> +	    continue;
> +	  if (!decls_match (fn, TREE_PURPOSE (match)))
> +            break;

This seems like it would allow multiple versioned functions from 
different namespaces; I want to allow mismatches only if they are 
versions of the same function.  I was thinking

          for (match = TREE_CHAIN (matches); match; match = TREE_CHAIN 
(match))
   	    if (!decls_match (fn, TREE_PURPOSE (match))
                 && !targetm.target_option.function_versions (fn, 
TREE_PURPOSE (match)))
	      break;
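
The difference between the two checks can be seen in a toy model. This is a standalone sketch, not GCC code: `Decl` and both predicates are hypothetical simplifications, assuming that two declarations are versions of the same function when they share scope and name but differ in target string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a FUNCTION_DECL; not GCC's tree type.
struct Decl {
  std::string scope;   // enclosing namespace
  std::string name;
  std::string target;  // empty for the default version
  bool versioned;      // models DECL_FUNCTION_VERSIONED
};

// Toy decls_match: the same entity only if everything agrees.
bool decls_match(const Decl &a, const Decl &b) {
  return a.scope == b.scope && a.name == b.name && a.target == b.target;
}

// Toy function_versions: same function, different target strings.
bool function_versions(const Decl &a, const Decl &b) {
  return a.scope == b.scope && a.name == b.name && a.target != b.target;
}

// Earlier variant: any pair of versioned functions is accepted.
bool is_ambiguous_flag_only(const std::vector<Decl> &matches) {
  assert(!matches.empty());
  const Decl &fn = matches.front();
  for (std::size_t i = 1; i < matches.size(); ++i) {
    if (fn.versioned && matches[i].versioned)
      continue;
    if (!decls_match(fn, matches[i]))
      return true;
  }
  return false;
}

// Suggested variant: a mismatch is tolerated only between versions of the
// same function.
bool is_ambiguous(const std::vector<Decl> &matches) {
  assert(!matches.empty());
  const Decl &fn = matches.front();
  for (std::size_t i = 1; i < matches.size(); ++i)
    if (!decls_match(fn, matches[i]) && !function_versions(fn, matches[i]))
      return true;
  return false;
}
```

With two versioned functions named `foo` in namespaces `a` and `b`, the flag-only check reports no ambiguity, while the suggested check does.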

> +      error_at (input_location, "Call/Pointer to multiversioned function"
> +                " without a default cannot be dispatched");

Let's just say "use of multiversioned function without a default".

Jason

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-11-12  5:04                                                                                                   ` Jason Merrill
@ 2012-11-13  1:11                                                                                                     ` Sriraman Tallam
  2012-11-13  2:39                                                                                                       ` Jason Merrill
  0 siblings, 1 reply; 93+ messages in thread
From: Sriraman Tallam @ 2012-11-13  1:11 UTC (permalink / raw)
  To: Jason Merrill
  Cc: David Li, H.J. Lu, gcc-patches List, Jan Hubicka, Diego Novillo

[-- Attachment #1: Type: text/plain, Size: 1204 bytes --]

Hi Jason,

   I made the changes and also fixed a segfault that occurred when ifunc is
not supported.
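
As a rough illustration of that fix (a standalone sketch, not the i386.c code): the idea, as I understand it, is that everything that touches the dispatcher must stay inside the guarded region, so the unsupported configuration only reports an error and returns a null result. `HAVE_DISPATCH_SUPPORT`, `Dispatcher`, and `get_dispatcher` are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Hypothetical feature macro; defaults to "not supported" here.
#ifndef HAVE_DISPATCH_SUPPORT
#define HAVE_DISPATCH_SUPPORT 0
#endif

// Hypothetical stand-in for the dispatcher declaration.
struct Dispatcher {
  bool registered = false;
};

// Returns the dispatcher, or nullptr with *error filled in when the
// configuration cannot dispatch.
Dispatcher *get_dispatcher(std::string *error) {
  Dispatcher *dispatch = nullptr;
#if HAVE_DISPATCH_SUPPORT
  static Dispatcher the_dispatcher;
  dispatch = &the_dispatcher;
  // Code that dereferences dispatch lives inside the guard, so it never
  // runs on a null pointer.
  dispatch->registered = true;
#else
  *error = "multiversioning needs ifunc which is not supported "
           "in this configuration";
#endif
  return dispatch;
}
```

If the `#else` branch ended before the line that dereferences `dispatch`, the unsupported configuration would fall through to it with a null pointer.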

Thanks,
-Sri.

On Sun, Nov 11, 2012 at 9:04 PM, Jason Merrill <jason@redhat.com> wrote:
> On 11/09/2012 08:33 PM, Sriraman Tallam wrote:
>>
>> +         /* Skip calling decls_match for versioned functions.  */
>> +          if (DECL_FUNCTION_VERSIONED (fn)
>> +             && DECL_FUNCTION_VERSIONED (TREE_PURPOSE (match)))
>> +           continue;
>> +         if (!decls_match (fn, TREE_PURPOSE (match)))
>> +            break;
>
>
> This seems like it would allow multiple versioned functions from different
> namespaces; I want to allow mismatches only if they are versions of the same
> function.  I was thinking
>
>
>          for (match = TREE_CHAIN (matches); match; match = TREE_CHAIN
> (match))
>             if (!decls_match (fn, TREE_PURPOSE (match))
>                 && !targetm.target_option.function_versions (fn,
> TREE_PURPOSE (match)))
>               break;
>
>> +      error_at (input_location, "Call/Pointer to multiversioned function"
>> +                " without a default cannot be dispatched");
>
>
> Let's just say "use of multiversioned function without a default".
>
> Jason
>

[-- Attachment #2: mv_patch.txt --]
[-- Type: text/plain, Size: 9215 bytes --]

	* cgraph.c (delete_function_version): Use cgraph_get_node
	instead of cgraph_get_create_node.
	* cp/class.c (mark_versions_used): Remove.
	(resolve_address_of_overloaded_function): Allow mismatches between
	versions of the same function, checked with
	targetm.target_option.function_versions. Call
	get_function_version_dispatcher.
	* cp/decl.c (duplicate_decls): Add comments.
	* cp/call.c (get_function_version_dispatcher): Expose function.
	(mark_versions_used): Expose function.
	* cp/cp-tree.h (mark_versions_used): New declaration.
	(get_function_version_dispatcher): Ditto.
	* config/i386/i386.c (ix86_get_function_versions_dispatcher): Move ifunc
	not supported code to the end.
	* testsuite/g++.dg/mv4.C: Add require ifunc. Change error message.
	* testsuite/g++.dg/mv5.C: Add require ifunc.
	* testsuite/g++.dg/mv6.C: Add require ifunc.

Index: cgraph.c
===================================================================
--- cgraph.c	(revision 193452)
+++ cgraph.c	(working copy)
@@ -206,7 +206,7 @@ insert_new_cgraph_node_version (struct cgraph_node
 void
 delete_function_version (tree decl)
 {
-  struct cgraph_node *decl_node = cgraph_get_create_node (decl);
+  struct cgraph_node *decl_node = cgraph_get_node (decl);
   struct cgraph_function_version_info *decl_v = NULL;
 
   if (decl_node == NULL)
Index: testsuite/g++.dg/mv4.C
===================================================================
--- testsuite/g++.dg/mv4.C	(revision 193452)
+++ testsuite/g++.dg/mv4.C	(working copy)
@@ -3,6 +3,7 @@
    and its pointer is taken.  */
 
 /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-require-ifunc "" }  */
 /* { dg-options "-O2 -mno-sse -mno-popcnt" } */
 
 int __attribute__ ((target ("sse")))
@@ -18,6 +19,6 @@ foo ()
 
 int main ()
 {
-  int (*p)() = &foo; /* { dg-error "Pointer to a multiversioned function without a default is not allowed" {} } */
+  int (*p)() = &foo; /* { dg-error "Use of multiversioned function without a default" {} } */
   return (*p)();
 }
Index: testsuite/g++.dg/mv6.C
===================================================================
--- testsuite/g++.dg/mv6.C	(revision 193452)
+++ testsuite/g++.dg/mv6.C	(working copy)
@@ -1,6 +1,7 @@
 /* Test to check if member version multiversioning works correctly.  */
 
 /* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-require-ifunc "" }  */
 
 class Foo
 {
Index: testsuite/g++.dg/mv5.C
===================================================================
--- testsuite/g++.dg/mv5.C	(revision 193452)
+++ testsuite/g++.dg/mv5.C	(working copy)
@@ -2,6 +2,7 @@
    marked comdat with inline keyword.  */
 
 /* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-require-ifunc "" }  */
 /* { dg-options "-O2  -mno-popcnt" } */
 
 
Index: cp/class.c
===================================================================
--- cp/class.c	(revision 193452)
+++ cp/class.c	(working copy)
@@ -7068,38 +7068,6 @@ pop_lang_context (void)
 {
   current_lang_name = VEC_pop (tree, current_lang_base);
 }
-
-/* fn is a function version dispatcher that is marked used. Mark all the 
-   semantically identical function versions it will dispatch as used.  */
-
-static void
-mark_versions_used (tree fn)
-{
-  struct cgraph_node *node;
-  struct cgraph_function_version_info *node_v;
-  struct cgraph_function_version_info *it_v;
-
-  gcc_assert (TREE_CODE (fn) == FUNCTION_DECL);
-
-  node = cgraph_get_node (fn);
-  if (node == NULL)
-    return;
-
-  gcc_assert (node->dispatcher_function);
-
-  node_v = get_cgraph_node_version (node);
-  if (node_v == NULL)
-    return;
-
-  /* All semantically identical versions are chained.  Traverse and mark each
-     one of them as used.  */
-  it_v = node_v->next;
-  while (it_v != NULL)
-    {
-      mark_used (it_v->this_node->symbol.decl);
-      it_v = it_v->next;
-    }
-}
 \f
 /* Type instantiation routines.  */
 
@@ -7315,22 +7283,13 @@ resolve_address_of_overloaded_function (tree targe
 
       fn = TREE_PURPOSE (matches);
 
-      /* For multi-versioned functions, more than one match is just fine.
-	 Call decls_match to make sure they are different because they are
-	 versioned.  */
-      if (DECL_FUNCTION_VERSIONED (fn))
-	{
-          for (match = TREE_CHAIN (matches); match; match = TREE_CHAIN (match))
-  	    if (!DECL_FUNCTION_VERSIONED (TREE_PURPOSE (match))
-	        || decls_match (fn, TREE_PURPOSE (match)))
-	      break;
-	}
-      else
-	{
-          for (match = TREE_CHAIN (matches); match; match = TREE_CHAIN (match))
-  	    if (!decls_match (fn, TREE_PURPOSE (match)))
-	      break;
-	}
+      /* For multi-versioned functions, more than one match is just fine and
+	 decls_match will return false as they are different.  */
+      for (match = TREE_CHAIN (matches); match; match = TREE_CHAIN (match))
+	if (!decls_match (fn, TREE_PURPOSE (match))
+	    && !targetm.target_option.function_versions (fn,
+	         TREE_PURPOSE (match)))
+          break;
 
       if (match)
 	{
@@ -7377,17 +7336,9 @@ resolve_address_of_overloaded_function (tree targe
      function version at run-time.  */
   if (DECL_FUNCTION_VERSIONED (fn))
     {
-      tree dispatcher_decl = NULL;
-      gcc_assert (targetm.get_function_versions_dispatcher);
-      dispatcher_decl = targetm.get_function_versions_dispatcher (fn);
-      if (!dispatcher_decl)
-	{
-	  error_at (input_location, "Pointer to a multiversioned function"
-		    " without a default is not allowed");
-	  return error_mark_node;
-	}
-      retrofit_lang_decl (dispatcher_decl);
-      fn = dispatcher_decl;
+      fn = get_function_version_dispatcher (fn);
+      if (fn == NULL)
+	return error_mark_node;
       /* Mark all the versions corresponding to the dispatcher as used.  */
       if (!(flags & tf_conv))
 	mark_versions_used (fn);
Index: cp/decl.c
===================================================================
--- cp/decl.c	(revision 193452)
+++ cp/decl.c	(working copy)
@@ -2307,12 +2307,15 @@ duplicate_decls (tree newdecl, tree olddecl, bool
   else if (DECL_PRESERVE_P (newdecl))
     DECL_PRESERVE_P (olddecl) = 1;
 
-  /* If the olddecl is a version, so is the newdecl.  */
+  /* Merge the DECL_FUNCTION_VERSIONED information.  newdecl will be copied
+     to olddecl and deleted.  */
   if (TREE_CODE (newdecl) == FUNCTION_DECL
       && DECL_FUNCTION_VERSIONED (olddecl))
     {
+      /* Set the flag for newdecl so that it gets copied to olddecl.  */
       DECL_FUNCTION_VERSIONED (newdecl) = 1;
-      /* newdecl will be purged and is no longer a version.  */
+      /* newdecl will be purged after copying to olddecl and is no longer
+         a version.  */
       delete_function_version (newdecl);
     }
 
Index: cp/call.c
===================================================================
--- cp/call.c	(revision 193452)
+++ cp/call.c	(working copy)
@@ -6517,7 +6517,7 @@ magic_varargs_p (tree fn)
 
 /* Returns the decl of the dispatcher function if FN is a function version.  */
 
-static tree
+tree
 get_function_version_dispatcher (tree fn)
 {
   tree dispatcher_decl = NULL;
@@ -6530,8 +6530,8 @@ get_function_version_dispatcher (tree fn)
 
   if (dispatcher_decl == NULL)
     {
-      error_at (input_location, "Call to multiversioned function"
-                " without a default is not allowed");
+      error_at (input_location, "Use of multiversioned function "
+				"without a default");
       return NULL;
     }
 
@@ -6543,7 +6543,7 @@ get_function_version_dispatcher (tree fn)
 /* fn is a function version dispatcher that is marked used. Mark all the 
    semantically identical function versions it will dispatch as used.  */
 
-static void
+void
 mark_versions_used (tree fn)
 {
   struct cgraph_node *node;
Index: cp/cp-tree.h
===================================================================
--- cp/cp-tree.h	(revision 193452)
+++ cp/cp-tree.h	(working copy)
@@ -4971,6 +4971,8 @@ extern bool is_list_ctor			(tree);
 #ifdef ENABLE_CHECKING
 extern void validate_conversion_obstack		(void);
 #endif /* ENABLE_CHECKING */
+extern void mark_versions_used			(tree);
+extern tree get_function_version_dispatcher	(tree);
 
 /* in class.c */
 extern tree build_vfield_ref			(tree, tree);
Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c	(revision 193452)
+++ config/i386/i386.c	(working copy)
@@ -28926,11 +28926,6 @@ ix86_get_function_versions_dispatcher (void *decl)
 #if defined (ASM_OUTPUT_TYPE_DIRECTIVE) && HAVE_GNU_INDIRECT_FUNCTION
   /* Right now, the dispatching is done via ifunc.  */
   dispatch_decl = make_dispatcher_decl (default_node->symbol.decl); 
-#else
-  error_at (DECL_SOURCE_LOCATION (default_node->symbol.decl),
-	    "Multiversioning needs ifunc which is not supported "
-	    "in this configuration");
-#endif
 
   dispatcher_node = cgraph_get_create_node (dispatch_decl);
   gcc_assert (dispatcher_node != NULL);
@@ -28947,7 +28942,11 @@ ix86_get_function_versions_dispatcher (void *decl)
       it_v->dispatcher_resolver = dispatch_decl;
       it_v = it_v->next;
     }
-
+#else
+  error_at (DECL_SOURCE_LOCATION (default_node->symbol.decl),
+	    "Multiversioning needs ifunc which is not supported "
+	    "in this configuration");
+#endif
   return dispatch_decl;
 }
 

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-11-13  1:11                                                                                                     ` Sriraman Tallam
@ 2012-11-13  2:39                                                                                                       ` Jason Merrill
  2012-11-13 21:57                                                                                                         ` Sriraman Tallam
  0 siblings, 1 reply; 93+ messages in thread
From: Jason Merrill @ 2012-11-13  2:39 UTC (permalink / raw)
  To: Sriraman Tallam
  Cc: David Li, H.J. Lu, gcc-patches List, Jan Hubicka, Diego Novillo

On 11/12/2012 08:11 PM, Sriraman Tallam wrote:
> +	    && !targetm.target_option.function_versions (fn,
> +	         TREE_PURPOSE (match)))

The second argument should be lined up with the left paren if it's on a 
different line.  Perhaps formatting this as

&& !(targetm.target_option.function_versions
      (fn, TREE_PURPOSE (match))))

would be better.

> +      error_at (input_location, "Use of multiversioned function "
> +	    "Multiversioning needs ifunc which is not supported "

We don't capitalize the first letter of a diagnostic.

OK with those changes.

Jason

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-11-13  2:39                                                                                                       ` Jason Merrill
@ 2012-11-13 21:57                                                                                                         ` Sriraman Tallam
  2012-11-17 22:23                                                                                                           ` H.J. Lu
  0 siblings, 1 reply; 93+ messages in thread
From: Sriraman Tallam @ 2012-11-13 21:57 UTC (permalink / raw)
  To: Jason Merrill
  Cc: David Li, H.J. Lu, gcc-patches List, Jan Hubicka, Diego Novillo

Patch committed now after making the changes.

Thanks,
-Sri.

On Mon, Nov 12, 2012 at 6:39 PM, Jason Merrill <jason@redhat.com> wrote:
> On 11/12/2012 08:11 PM, Sriraman Tallam wrote:
>>
>> +           && !targetm.target_option.function_versions (fn,
>> +                TREE_PURPOSE (match)))
>
>
> The second argument should be lined up with the left paren if it's on a
> different line.  Perhaps formatting this as
>
> && !(targetm.target_option.function_versions
>      (fn, TREE_PURPOSE (match))))
>
> would be better.
>
>> +      error_at (input_location, "Use of multiversioned function "
>> +           "Multiversioning needs ifunc which is not supported "
>
>
> We don't capitalize the first letter of a diagnostic.
>
> OK with those changes.
>
> Jason
>

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-11-13 21:57                                                                                                         ` Sriraman Tallam
@ 2012-11-17 22:23                                                                                                           ` H.J. Lu
  0 siblings, 0 replies; 93+ messages in thread
From: H.J. Lu @ 2012-11-17 22:23 UTC (permalink / raw)
  To: Sriraman Tallam
  Cc: Jason Merrill, David Li, gcc-patches List, Jan Hubicka, Diego Novillo

On Tue, Nov 13, 2012 at 1:57 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> Patch committed now after making the changes.
>

Your libgcc change caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55370

-- 
H.J.

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-11-07  1:16 ` Gerald Pfeifer
@ 2012-11-07  8:53   ` Dominique Dhumieres
  0 siblings, 0 replies; 93+ messages in thread
From: Dominique Dhumieres @ 2012-11-07  8:53 UTC (permalink / raw)
  To: gerald, dominiq; +Cc: tmsriram, gcc-patches

> This should be fixed by a patch I committed directly before you
> sent your mail (which is why you did not see it yet).  Can you
> please verify?

Bootstrap has completed at revision 193278 (with the patch for
dwarf2out.c).

Thanks,

Dominique

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-11-06 22:17 Dominique Dhumieres
@ 2012-11-07  1:16 ` Gerald Pfeifer
  2012-11-07  8:53   ` Dominique Dhumieres
  0 siblings, 1 reply; 93+ messages in thread
From: Gerald Pfeifer @ 2012-11-07  1:16 UTC (permalink / raw)
  To: Dominique Dhumieres; +Cc: gcc-patches, tmsriram

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1724 bytes --]

On Tue, 6 Nov 2012, Dominique Dhumieres wrote:
> /opt/gcc/build_a/./prev-gcc/g++ -B/opt/gcc/build_a/./prev-gcc/ -B/opt/gcc/gcc4.8a/x86_64-apple-darwin10.8.0/bin/ -nostdinc++ -B/opt/gcc/build_a/prev-x86_64-apple-darwin10.8.0/libstdc++-v3/src/.libs -B/opt/gcc/build_a/prev-x86_64-apple-darwin10.8.0/libstdc++-v3/libsupc++/.libs -I/opt/gcc/build_a/prev-x86_64-apple-darwin10.8.0/libstdc++-v3/include/x86_64-apple-darwin10.8.0 -I/opt/gcc/build_a/prev-x86_64-apple-darwin10.8.0/libstdc++-v3/include -I/opt/gcc/_clean/libstdc++-v3/libsupc++ -L/opt/gcc/build_a/prev-x86_64-apple-darwin10.8.0/libstdc++-v3/src/.libs -L/opt/gcc/build_a/prev-x86_64-apple-darwin10.8.0/libstdc++-v3/libsupc++/.libs -c  -DIN_GCC_FRONTEND -g -O2 -mdynamic-no-pic -gtoggle -DIN_GCC   -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror   -DHAVE_CONFIG_H -I. -Icp -I../../_clean/gcc -I../../_clean/gcc/cp -I../../_clean/gcc/../include -I./../intl -I../../_clean/gcc/../libcpp/include -I/opt/mp/include  -I../../_clean/gcc/../libdecnumber -I../../_clean/gcc/../libdecnumber/dpd -I../libdecnumber -I../../_clean/gcc/../libbacktrace -DCLOOG_INT_GMP  -I/opt/mp/include  ../../_clean/gcc/cp/class.c -o cp/class.o
> ../../_clean/gcc/config/i386/i386.c:28821:1: error: 'tree_node* make_dispatcher_decl(tree)' defined but not used [-Werror=unused-function]
>  make_dispatcher_decl (const tree decl)
>  ^
> cc1plus: all warnings being treated as errors

This should be fixed by a patch I committed directly before you
sent your mail (which is why you did not see it yet).  Can you
please verify?

Gerald

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
@ 2012-11-06 22:17 Dominique Dhumieres
  2012-11-07  1:16 ` Gerald Pfeifer
  0 siblings, 1 reply; 93+ messages in thread
From: Dominique Dhumieres @ 2012-11-06 22:17 UTC (permalink / raw)
  To: gcc-patches; +Cc: tmsriram

> I have now committed the attached patch.

This (r193204) breaks bootstrap on x86_64-apple-darwin10:

...
/opt/gcc/build_a/./prev-gcc/g++ -B/opt/gcc/build_a/./prev-gcc/ -B/opt/gcc/gcc4.8a/x86_64-apple-darwin10.8.0/bin/ -nostdinc++ -B/opt/gcc/build_a/prev-x86_64-apple-darwin10.8.0/libstdc++-v3/src/.libs -B/opt/gcc/build_a/prev-x86_64-apple-darwin10.8.0/libstdc++-v3/libsupc++/.libs -I/opt/gcc/build_a/prev-x86_64-apple-darwin10.8.0/libstdc++-v3/include/x86_64-apple-darwin10.8.0 -I/opt/gcc/build_a/prev-x86_64-apple-darwin10.8.0/libstdc++-v3/include -I/opt/gcc/_clean/libstdc++-v3/libsupc++ -L/opt/gcc/build_a/prev-x86_64-apple-darwin10.8.0/libstdc++-v3/src/.libs -L/opt/gcc/build_a/prev-x86_64-apple-darwin10.8.0/libstdc++-v3/libsupc++/.libs -c  -DIN_GCC_FRONTEND -g -O2 -mdynamic-no-pic -gtoggle -DIN_GCC   -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror   -DHAVE_CONFIG_H -I. -Icp -I../../_clean/gcc -I../../_clean/gcc/cp -I../../_clean/gcc/../include -I./../intl -I../../_clean/gcc/../libcpp/include -I/opt/mp/include  -I../../_clean/gcc/../libdecnumber -I../../_clean/gcc/../libdecnumber/dpd -I../libdecnumber -I../../_clean/gcc/../libbacktrace -DCLOOG_INT_GMP  -I/opt/mp/include  ../../_clean/gcc/cp/class.c -o cp/class.o
../../_clean/gcc/config/i386/i386.c:28821:1: error: 'tree_node* make_dispatcher_decl(tree)' defined but not used [-Werror=unused-function]
 make_dispatcher_decl (const tree decl)
 ^
cc1plus: all warnings being treated as errors
...
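
For context, -Werror=unused-function fires when a static function is
defined in a translation unit but every call to it has been compiled
out.  Presumably that is what happens here on a configuration without
ifunc support.  A minimal sketch of one common way to avoid this class
of failure follows; HAVE_IFUNC_SUPPORT, make_dispatcher_name and
get_dispatcher_name are hypothetical names for illustration, and this
is not necessarily what the committed fix does.

```c
#include <assert.h>
#include <stddef.h>

/* HAVE_IFUNC_SUPPORT is a hypothetical configuration macro.  The static
   helper is defined under the same guard as its only caller, so it is
   never "defined but not used" when the feature is compiled out.  */
#ifdef HAVE_IFUNC_SUPPORT
static const char *
make_dispatcher_name (void)
{
  return "foo.ifunc";
}
#endif

/* Return the dispatcher name, or NULL when the configuration has no
   ifunc support.  */
static const char *
get_dispatcher_name (void)
{
#ifdef HAVE_IFUNC_SUPPORT
  const char *name = make_dispatcher_name ();
  assert (name != NULL);
  return name;
#else
  return NULL;
#endif
}
```

Without the guard around make_dispatcher_name, building with
-Wall -Werror and HAVE_IFUNC_SUPPORT undefined would be expected to fail
in the same way as the log above.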

TIA

Dominique

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
  2012-10-26 16:54 Xinliang David Li
@ 2012-10-26 17:28 ` Sriraman Tallam
  0 siblings, 0 replies; 93+ messages in thread
From: Sriraman Tallam @ 2012-10-26 17:28 UTC (permalink / raw)
  To: Xinliang David Li
  Cc: Jan Hubicka, Diego Novillo, Jason Merrill, Jan Hubicka,
	Mark Mitchell, Nathan Sidwell, H.J. Lu, Richard Guenther,
	Uros Bizjak, reply, GCC Patches

On Fri, Oct 26, 2012 at 9:07 AM, Xinliang David Li <davidxl@google.com> wrote:
> On Fri, Oct 26, 2012 at 8:54 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> Hi,
>> sorry for jumping in late; for too long I did not have a chance to look at my TODO.
>> I have two comments...
>>> Index: gcc/cgraphbuild.c
>>> ===================================================================
>>> --- gcc/cgraphbuild.c (revision 192623)
>>> +++ gcc/cgraphbuild.c (working copy)
>>> @@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
>>>  #include "ipa-utils.h"
>>>  #include "except.h"
>>>  #include "ipa-inline.h"
>>> +#include "target.h"
>>>
>>>  /* Context of record_reference.  */
>>>  struct record_reference_ctx
>>> @@ -317,8 +318,23 @@ build_cgraph_edges (void)
>>>                                                        bb);
>>>             decl = gimple_call_fndecl (stmt);
>>>             if (decl)
>>> -             cgraph_create_edge (node, cgraph_get_create_node (decl),
>>> -                                 stmt, bb->count, freq);
>>> +             {
>>> +               struct cgraph_node *callee = cgraph_get_create_node (decl);
>>> +               /* If a call to a multiversioned function dispatcher is
>>> +                  found, generate the body to dispatch the right function
>>> +                  at run-time.  */
>>> +               if (callee->dispatcher_function)
>>> +                 {
>>> +                   tree resolver_decl;
>>> +                   gcc_assert (callee->function_version
>>> +                               && callee->function_version->next);
>>> +                   gcc_assert (targetm.generate_version_dispatcher_body);
>>> +                   resolver_decl
>>> +                      = targetm.generate_version_dispatcher_body (callee);
>>> +                   gcc_assert (resolver_decl != NULL_TREE);
>>> +                 }
>>> +               cgraph_create_edge (node, callee, stmt, bb->count, freq);
>>> +             }
>> I do not really think resolver generation belongs here, and I would prefer
>> build_cgraph_edges to really just build the edges.
>>> Index: gcc/cgraph.c
>>> ===================================================================
>>> --- gcc/cgraph.c      (revision 192623)
>>> +++ gcc/cgraph.c      (working copy)
>>> @@ -1277,6 +1277,16 @@ cgraph_mark_address_taken_node (struct cgraph_node
>>>    node->symbol.address_taken = 1;
>>>    node = cgraph_function_or_thunk_node (node, NULL);
>>>    node->symbol.address_taken = 1;
>>> +  /* If the address of a multiversioned function dispatcher is taken,
>>> +     generate the body to dispatch the right function at run-time.  This
>>> +     is needed as the address can be used to do an indirect call.  */
>>> +  if (node->dispatcher_function)
>>> +    {
>>> +      gcc_assert (node->function_version
>>> +               && node->function_version->next);
>>> +      gcc_assert (targetm.generate_version_dispatcher_body);
>>> +      targetm.generate_version_dispatcher_body (node);
>>> +    }
>>
>> Similarly here.  I also think this way you will miss aliases of the multiversioned
>> functions.
>
>>
>> I am not sure why the multiversioning is tied to the cgraph build and the
>> data structure is put into cgraph_node itself.  It seems to me that your
>> dispatchers are in a way related to thunks - i.e. they are inserted into
>> the callgraph and once they become reachable their body needs to be produced.
>> I think generate_version_dispatcher_body should thus probably be done from
>> cgraph_analyze_function.  (To make the function visible to analyze_function
>> you will need to finalize it at the time you set the
>> dispatcher_function flag.)
>
> This seems reasonable -- Sri, do you see any problems with this suggestion?

No, I will make this change asap.
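
A minimal sketch of the ordering being suggested, assuming a toy call
graph: edge construction only records calls, and the dispatcher body is
produced later, once, when analysis reaches the node.  The struct and
function names below are hypothetical stand-ins, not GCC's cgraph API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical miniature of a call-graph node; not GCC's cgraph_node.  */
struct node
{
  bool is_dispatcher;   /* set when the dispatcher decl is created */
  bool analyzed;        /* set once analysis has visited the node */
  bool has_body;        /* set once the dispatch body exists */
  int n_callers;        /* number of recorded incoming edges */
};

/* Stand-in for the target hook that emits the dispatch body.  */
static void
generate_dispatcher_body (struct node *n)
{
  assert (n->is_dispatcher && !n->has_body);
  n->has_body = true;
}

/* Edge construction only records the call; it never generates bodies.  */
static void
create_edge (struct node *caller, struct node *callee)
{
  (void) caller;
  callee->n_callers++;
}

/* Analysis visits each reachable node once, so the body is generated
   exactly once no matter how many edges or address-taken references
   lead to the dispatcher.  */
static void
analyze_node (struct node *n)
{
  if (n->analyzed)
    return;
  if (n->is_dispatcher)
    generate_dispatcher_body (n);
  n->analyzed = true;
}
```

In this arrangement a direct call, an indirect call through a taken
address, and a call through an alias all funnel into the same analysis
step, which is the property the review comments ask for.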

>
>>
>> I would also put the dispatcher data structure into an on-side hash keyed by
>> node->uid (i.e. these are rare and thus the data structure should be small).
>> The symbol table is critical for WPA stage memory use and I plan to remove as
>> much as possible from the nodes in the near future.  For this reason I would
>> prefer not to add too much stuff that is not going to be used by the majority
>> of nodes.
>>

OK, will change as suggested.
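
A minimal sketch of the on-side table idea, assuming a small chained
hash keyed by an integer uid: nodes that are not versioned pay nothing,
and only the rare versioned ones get a record.  The struct layout, the
bucket count and the function names are hypothetical, not what the
final patch uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-version record kept outside the call-graph node.  */
struct version_info
{
  int uid;                     /* key: uid of the owning node */
  int dispatcher_uid;          /* payload: uid of the dispatcher, or -1 */
  struct version_info *chain;  /* next record in the same bucket */
};

#define N_BUCKETS 64

static struct version_info *side_table[N_BUCKETS];

/* Map a non-negative UID to its bucket index.  */
static unsigned
bucket_for (int uid)
{
  assert (uid >= 0);
  return (unsigned) uid % N_BUCKETS;
}

/* Return the record for UID, or NULL if the node is not versioned.  */
static struct version_info *
lookup_version_info (int uid)
{
  struct version_info *p;
  for (p = side_table[bucket_for (uid)]; p != NULL; p = p->chain)
    if (p->uid == uid)
      return p;
  return NULL;
}

/* Return the record for UID, creating it on first use.  */
static struct version_info *
get_create_version_info (int uid)
{
  struct version_info *p = lookup_version_info (uid);
  if (p == NULL)
    {
      unsigned slot = bucket_for (uid);
      p = (struct version_info *) malloc (sizeof *p);
      if (p == NULL)
        abort ();
      p->uid = uid;
      p->dispatcher_uid = -1;
      p->chain = side_table[slot];
      side_table[slot] = p;
    }
  return p;
}
```

The node itself then needs no extra fields; code that cares about
versions asks the table, and a NULL result means "not versioned".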

>
> I was also concerned about increasing the size of the core data structure.

Thanks,
-Sri.


>
> thanks,
>
> David
>
>> Honza

* Re: User directed Function Multiversioning via Function Overloading (issue5752064)
@ 2012-10-26 16:54 Xinliang David Li
  2012-10-26 17:28 ` Sriraman Tallam
  0 siblings, 1 reply; 93+ messages in thread
From: Xinliang David Li @ 2012-10-26 16:54 UTC (permalink / raw)
  To: Jan Hubicka
  Cc: Sriraman Tallam, Diego Novillo, Jason Merrill, Jan Hubicka,
	Mark Mitchell, Nathan Sidwell, H.J. Lu, Richard Guenther,
	Uros Bizjak, reply, GCC Patches

On Fri, Oct 26, 2012 at 8:54 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
> Hi,
> sorry for jumping in late; for too long I did not have a chance to look at my TODO.
> I have two comments...
>> Index: gcc/cgraphbuild.c
>> ===================================================================
>> --- gcc/cgraphbuild.c (revision 192623)
>> +++ gcc/cgraphbuild.c (working copy)
>> @@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "ipa-utils.h"
>>  #include "except.h"
>>  #include "ipa-inline.h"
>> +#include "target.h"
>>
>>  /* Context of record_reference.  */
>>  struct record_reference_ctx
>> @@ -317,8 +318,23 @@ build_cgraph_edges (void)
>>                                                        bb);
>>             decl = gimple_call_fndecl (stmt);
>>             if (decl)
>> -             cgraph_create_edge (node, cgraph_get_create_node (decl),
>> -                                 stmt, bb->count, freq);
>> +             {
>> +               struct cgraph_node *callee = cgraph_get_create_node (decl);
>> +               /* If a call to a multiversioned function dispatcher is
>> +                  found, generate the body to dispatch the right function
>> +                  at run-time.  */
>> +               if (callee->dispatcher_function)
>> +                 {
>> +                   tree resolver_decl;
>> +                   gcc_assert (callee->function_version
>> +                               && callee->function_version->next);
>> +                   gcc_assert (targetm.generate_version_dispatcher_body);
>> +                   resolver_decl
>> +                      = targetm.generate_version_dispatcher_body (callee);
>> +                   gcc_assert (resolver_decl != NULL_TREE);
>> +                 }
>> +               cgraph_create_edge (node, callee, stmt, bb->count, freq);
>> +             }
> I do not really think resolver generation belongs here, and I would prefer
> build_cgraph_edges to really just build the edges.
>> Index: gcc/cgraph.c
>> ===================================================================
>> --- gcc/cgraph.c      (revision 192623)
>> +++ gcc/cgraph.c      (working copy)
>> @@ -1277,6 +1277,16 @@ cgraph_mark_address_taken_node (struct cgraph_node
>>    node->symbol.address_taken = 1;
>>    node = cgraph_function_or_thunk_node (node, NULL);
>>    node->symbol.address_taken = 1;
>> +  /* If the address of a multiversioned function dispatcher is taken,
>> +     generate the body to dispatch the right function at run-time.  This
>> +     is needed as the address can be used to do an indirect call.  */
>> +  if (node->dispatcher_function)
>> +    {
>> +      gcc_assert (node->function_version
>> +               && node->function_version->next);
>> +      gcc_assert (targetm.generate_version_dispatcher_body);
>> +      targetm.generate_version_dispatcher_body (node);
>> +    }
>
> Similarly here.  I also think this way you will miss aliases of the multiversioned
> functions.

>
> I am not sure why the multiversioning is tied to the cgraph build and the
> data structure is put into cgraph_node itself.  It seems to me that your
> dispatchers are in a way related to thunks - i.e. they are inserted into
> the callgraph and once they become reachable their body needs to be produced.
> I think generate_version_dispatcher_body should thus probably be done from
> cgraph_analyze_function.  (To make the function visible to analyze_function
> you will need to finalize it at the time you set the
> dispatcher_function flag.)

This seems reasonable -- Sri, do you see any problems with this suggestion?

>
> I would also put the dispatcher data structure into an on-side hash keyed by
> node->uid (i.e. these are rare and thus the data structure should be small).
> The symbol table is critical for WPA stage memory use and I plan to remove as
> much as possible from the nodes in the near future.  For this reason I would
> prefer not to add too much stuff that is not going to be used by the majority
> of nodes.
>

I was also concerned about increasing the size of the core data structure.

thanks,

David

> Honza
